Anomaly detection

The aim of our research was to find irregularities in the behavior of electricity consumers over a period of a year and a half. Anomaly detection, one of the most common areas of cyber security research, makes it possible to detect threats based on unusual behavior.


Deviations from the norm can be sought in virtually any data. A sound analysis starts with an initial statistical analysis that maps the distribution and state of the data being examined. For visual evaluation and easier orientation in the data, it helps to split users into clusters based on their observed electricity consumption. This can be done with commonly used dimensionality reduction methods such as PCA or t-SNE. Anomalous values that do not fit into any cluster may already show up at this stage. In this case, however, dimensionality reduction alone was not sufficient.
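The clustering step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the research: the hourly load profiles are synthetic, and the choice of PCA followed by k-means with two clusters, the variable names and the "five most distant users" cut-off are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: one row per user, 24 hourly mean loads.
hours = np.arange(24)
evening_peak = 0.3 + np.exp(-((hours - 19) ** 2) / 8.0)
daytime_peak = 0.3 + np.exp(-((hours - 12) ** 2) / 8.0)
profiles = np.vstack([
    evening_peak + rng.normal(0, 0.05, size=(50, 24)),
    daytime_peak + rng.normal(0, 0.05, size=(50, 24)),
])

# Standardize each hour, then project the 24-dimensional profiles to 2D.
scaled = StandardScaler().fit_transform(profiles)
embedding = PCA(n_components=2, random_state=0).fit_transform(scaled)

# Group users in the reduced space (two clusters is an assumption).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embedding)
labels = kmeans.labels_

# Distance of each user to the centre of its own cluster; the most
# distant users are candidates for a closer look, not confirmed anomalies.
distances = np.linalg.norm(embedding - kmeans.cluster_centers_[labels], axis=1)
candidates = np.argsort(distances)[-5:]
```

The two-dimensional `embedding` can be plotted directly for visual inspection. t-SNE could be swapped in for PCA when the aim is purely visual, since it tends to separate groups more sharply but does not preserve distances in the same way.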

Anomaly detection chart
Figure 1: The behavior of four anomalous users. Yellow shows the predicted electricity consumption, while green shows the actual consumption.

External factors also play a role

The electricity consumption of each user is strongly influenced by external factors, so enriching the data with additional information was also part of the research. These factors include the temperature on a particular day and the number of sunshine hours. Whether or not the day was a working day may also matter.
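A minimal sketch of such enrichment with Pandas might look like this. All values, column names and the list of public holidays are hypothetical and serve only to show the join and the working-day flag. In practice the weather columns would come from an external data source and the holiday dates from a calendar library.

```python
import pandas as pd

# Hypothetical daily consumption for one user (kWh).
consumption = pd.DataFrame({
    "date": pd.date_range("2021-12-23", periods=6, freq="D"),
    "user_id": 1,
    "kwh": [12.1, 14.8, 15.2, 13.9, 11.7, 11.9],
})

# Hypothetical weather observations for the same days.
weather = pd.DataFrame({
    "date": pd.date_range("2021-12-23", periods=6, freq="D"),
    "temperature_c": [-1.5, 0.2, -3.0, -4.1, -2.2, 1.0],
    "sunshine_hours": [1.0, 0.0, 2.5, 3.0, 0.5, 0.0],
})

# Illustrative public holidays, hard-coded for the example.
public_holidays = pd.to_datetime(["2021-12-24", "2021-12-25", "2021-12-26"])

# Attach the weather columns to every consumption record by date.
enriched = consumption.merge(weather, on="date", how="left")

# A day counts as a working day if it is neither a weekend nor a holiday.
is_weekend = enriched["date"].dt.dayofweek >= 5
is_holiday = enriched["date"].isin(public_holidays)
enriched["is_working_day"] = ~(is_weekend | is_holiday)
```

A left join keeps every consumption record even when a weather observation is missing for that day, which makes gaps in the external data visible as missing values instead of silently dropping rows.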

Prediction of electricity consumption

The enriched data was used to train several candidate regressors, which then predicted power consumption for each user. The GBRT (Gradient Boosted Regression Trees) regressor, which builds its prediction from an ensemble of decision trees, performed best. Users with the worst prediction results were identified as individuals with anomalous behavior. For inspiration on what information to enrich your data with, check the Open data website, where you'll find a large amount of digitized data.
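The idea of ranking users by prediction error can be sketched as below. This is an illustration under stated assumptions, not the pipeline from the research: the data is simulated, the two features (temperature and a working-day flag), the 75/25 chronological split, the use of mean absolute error and the helper `simulate_user` are all choices made for the example. The regressor is scikit-learn's `GradientBoostingRegressor` with default settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
n_days = 400

def simulate_user(erratic):
    """Return (features, consumption) for one synthetic user."""
    temperature = rng.uniform(-10, 30, n_days)
    working_day = rng.integers(0, 2, n_days)
    features = np.column_stack([temperature, working_day])
    consumption = (20 - 0.4 * temperature + 3 * working_day
                   + rng.normal(0, 0.5, n_days))
    if erratic:
        # Consumption unrelated to the external factors.
        consumption = rng.uniform(0, 40, n_days)
    return features, consumption

# Five synthetic users; user_3 is deliberately made unpredictable.
users = {f"user_{i}": simulate_user(erratic=(i == 3)) for i in range(5)}

errors = {}
split = int(0.75 * n_days)  # earlier days for training, later for evaluation
for user_id, (X, y) in users.items():
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X[:split], y[:split])
    errors[user_id] = mean_absolute_error(y[split:], model.predict(X[split:]))

# Users whose consumption the model predicts worst come first.
ranked = sorted(errors, key=errors.get, reverse=True)
most_anomalous = ranked[0]
```

With real data, an absolute error favours large consumers, so a relative measure (for example, the error divided by the user's mean consumption) may be a fairer basis for the ranking.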


To detect anomalies in electricity consumption, we first performed a statistical analysis, which resulted in splitting the data into clusters. We then took into account the external factors that affect electricity consumption, such as temperature and hours of sunshine. We used this enriched data to train the GBRT regressor, which uses decision trees for prediction. Electricity consumers with the worst prediction results were labelled as anomalous.

Tools used

  • Jupyter Notebook: The research was carried out in Jupyter Notebook. The tool makes the work easier and, if the defined principles are followed, makes it possible to write clear code.
  • Pandas: The Pandas library was used for the basic statistics and analysis on which the whole research was based.
  • Matplotlib and Plotly: The Matplotlib and Plotly libraries were used to visualize the individual steps and interim results.
  • Scikit-learn: The scikit-learn library was one of the most important in the research. It provided the methods for finding clusters of similar users as well as the regressors that predicted future electricity consumption.
  • Holidays: This library helped enrich the data with information on whether a given day was a working day.
  • Astral: Another library needed for data enrichment, used to determine the position of the Sun and the Moon, the length of the day and night, and other data relevant to the analysis of electricity consumption.
  • Grafana: Grafana, an open-source platform for time series analysis, was used to visualize the predictions and the detected anomalous electricity consumers.



