Modern data infrastructure for Slovak Telekom and access to new interesting data
For a mobile operator today, the opportunity to use big data is too good to ignore. With the emergence of big data technologies, Slovak Telekom saw a chance to analyze and combine data sources that had previously been too costly to process at the speed the business needed. Plans were ambitious, as many departments were eager to use the new information, but resources and experience with big data technologies were very limited.
Slovak Telekom worked with us to reduce the risks of executing its new big data strategy.
Once a data governance model was in place, it was time to design an architecture flexible enough to scale with the needs of Slovak Telekom’s business. At the same time, resources were limited, so the solution needed to be manageable by a handful of engineers.
Suitable data architecture
We chose an architecture capable of processing both batch and real-time data without requiring extensive engineering work to set up data flows and process data. Within this architecture, we implemented the Cloudera distribution of Hadoop with Apache Kafka and Apache Spark. Tools like Apache Flume and Apache Sqoop were selected to reduce the amount of data engineering required at the beginning; today, the engineers have adopted Apache Spark version 2 for better performance.
The system runs as several logical clusters that scale independently. Data integration is handled by two clusters with Apache Kafka as the core open-source technology. They act mainly as a protection layer that reduces the amount of “dirty” data flowing into the storage layers. We run these integration layers on the operator's private cloud infrastructure, which allows us to add resources much faster and handle unexpected spikes in the data flows.
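To illustrate the idea of a protection layer, here is a minimal sketch in plain Python of how incoming records might be split into clean and dirty sets before anything reaches storage. It is not the implementation used at Slovak Telekom: the field names, the validation rules, and the functions is_clean and split_records are all assumptions made for illustration, and in a real deployment the same check would sit between a Kafka consumer and a producer rather than operate on in-memory lists.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Hypothetical schema: these field names and types are illustrative only,
# they do not come from the actual Slovak Telekom platform.
REQUIRED_FIELDS = {"subscriber_id": str, "event_type": str, "timestamp": int}


def is_clean(record: Dict[str, Any]) -> bool:
    """Return True if the record has every required field with a usable value."""
    for name, expected_type in REQUIRED_FIELDS.items():
        value = record.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            return False
        # Empty or whitespace-only strings count as missing.
        if isinstance(value, str) and not value.strip():
            return False
    # Assumed rule: timestamps are positive epoch seconds.
    return record["timestamp"] > 0


def split_records(
    records: Iterable[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Separate records into (clean, dirty) while preserving input order."""
    clean: List[Dict[str, Any]] = []
    dirty: List[Dict[str, Any]] = []
    for record in records:
        (clean if is_clean(record) else dirty).append(record)
    return clean, dirty
```

In this sketch, clean records would be forwarded to the storage layer, while dirty ones could be routed to a separate location for inspection instead of being silently dropped.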
Analytics and machine learning
The data lake itself handles further processing and analytics, including machine learning. This cluster runs on bare-metal hardware for better performance. Resource management is tuned to the needs of the different teams accessing the platform, such as IT and data scientists. Python notebooks and SQL-based access methods are available to these end users.
The solution complies with the required security and data protection levels. Today, the system runs critical workloads and draws on more than 10 internal and external data sources to support daily marketing and operational decisions.