Apache Hadoop 3 has been released

January 22, 2018/Johnson Darkwah

Late in 2017, the long-expected Apache Hadoop 3.0.0, a new version of the Hadoop framework designed for processing large amounts of both structured and unstructured data, was released. Our big data solution specialist Johnson Darkwah will tell you what's new about it and why we can't wait to start working with it.

Why is it so important that the new version has been released?

Apache Hadoop 3.0.0 is a brand new release whose commercial distribution will start roughly in the middle of 2018. Hadoop is a technology that strongly influenced the way we work with big data. And now its new version with crucial updates is out.

What's so important about the new version?

I think the most important new feature is HDFS Erasure Coding, which will decrease disk-space usage by about 50% compared to the traditional three-way replication. This means one very important thing: lower data storage costs.
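The arithmetic behind that figure can be sketched quickly. Classic HDFS keeps three full copies of every block (3.0x raw storage), while a Reed-Solomon erasure coding policy such as RS(6,3), one of the policies shipped with Hadoop 3, stores six data blocks plus three parity blocks (1.5x). The helper name below is illustrative, not part of any Hadoop API:

```python
def storage_overhead(data_units: int, total_units: int) -> float:
    """Raw storage consumed per unit of logical data."""
    return total_units / data_units

# Three-way replication: 1 logical block stored as 3 physical copies.
replication = storage_overhead(1, 3)        # 3.0x

# RS(6,3) erasure coding: 6 data blocks + 3 parity blocks.
rs_6_3 = storage_overhead(6, 6 + 3)         # 1.5x

# Relative savings in raw disk space.
savings = 1 - rs_6_3 / replication          # 0.5, i.e. about 50%
print(replication, rs_6_3, savings)
```

Both schemes tolerate the loss of any three block replicas/cells, but erasure coding does so with half the raw disk footprint, which is exactly where the cost reduction comes from.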

So is it primarily about higher speed and lower costs?

In recent years, Hadoop has usually been installed as on-premise software in customers' data centers, so it was necessary to update Hadoop and push it forward. Storing data in certain cloud storage services gives a better cost-to-performance ratio, so it is only logical that Hadoop aimed at cutting down initial and operating expenses.

What are other new features of the new version?

Changes in YARN are important for us, specifically support for Docker containers, which will hopefully enable faster development on Hadoop.
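As a rough illustration, enabling the Docker runtime in Hadoop 3's YARN is a NodeManager configuration change. The fragment below is a minimal sketch based on the properties documented for Hadoop 3; exact values (allowed networks, capabilities, etc.) depend on the cluster and should be taken from the official documentation:

```xml
<!-- yarn-site.xml (sketch): let NodeManagers launch containers via Docker -->
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
</property>
<property>
  <!-- Permit both the default runtime and Docker side by side -->
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default,docker</value>
</property>
<property>
  <!-- Keep privileged containers disabled unless explicitly required -->
  <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
  <value>false</value>
</property>
```

With this in place, individual applications opt in to Docker per container, so existing workloads keep running unchanged on the default runtime.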

What does the new version bring to customers?

Faster, cheaper and more secure processing of data on all levels. Customers should know that at Gauss Algorithmic, we are very carefully monitoring the development of the new Hadoop release, and once we consider the system mature enough, we will add it to our portfolio and start offering it as part of our data lake. We have a thing for Hadoop – we've been using it in most of our solutions, and our Czech and Slovak customers will be the first to be offered version 3.

What role does the company Cloudera play here?

Although Hadoop is an open-source technology, Cloudera strongly influences the direction of its development. Moreover, Doug Cutting, who is one of the founders of Hadoop and a board member of the Apache Software Foundation, works for Cloudera. Gauss Algorithmic is also Cloudera's official partner for Central and Eastern Europe, enabling us to deliver robust solutions in a very short time.

More information about the new version can be found on the Cloudera blog or on the Apache Hadoop website.
