What is big data analysis?
Big data analysis is the collective name for the procedures and tools used to obtain, collate, and analyze very large data sets and find correlations within them.
When analyzing big data, you discover useful information hidden in the extensive data sets that emerge every day both inside and outside your organization. This may be data arising from contact with clients or customers, or data reflecting market developments. The information obtained not only lets you evaluate past behavior, but also helps you build predictive models of its likely future development.
For what and where is it used?
Thanks to a thorough knowledge of customer behavior and of the competition, you can increase the conversion rate of an e-shop, personalize a menu, or save on logistics.
Analyze investment risk, model the future development of financial markets, or build your own scoring model for evaluating companies.
What data can you analyze?
Website traffic, e-commerce orders, communication on social networks, stock movements, or developments in financial markets: data containing interesting and useful information is all around us. Ask us how to get it.
- Structured data is data stored in precisely defined and described data fields. When people talk about databases they usually have in mind structured data. A typical example is a customer database in which each record consists of a name, address, account number, and so on. Structured data has a clear model and description, and can therefore be efficiently stored, processed, and analyzed.
- Unstructured data, by contrast, has no precisely defined structure. This category includes all data without a constant, firmly defined structure, such as images, videos, websites, or the content of e-mail and other communications. Unstructured data forms the vast majority of all data generated, and the term "big data" mainly refers to it.
- Semi-structured data lies at the intersection of the two types above: it is structured data without a precise model. An example is an e-mail message as a whole, which combines unstructured data (the text body and attachments) with structured content (the mail header, with precisely defined fields such as sender, recipient, and date and time sent).
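The e-mail example above can be sketched in a few lines of Python using the standard library's `email` module. The message below is hypothetical; the point is that the header fields can be queried directly like database columns, while the body remains free text that needs further analysis.

```python
from email import message_from_string

# A hypothetical raw e-mail: the header is structured (fixed, named
# fields), while the body is unstructured free text.
raw = """From: alice@example.com
To: bob@example.com
Subject: Quarterly report
Date: Mon, 01 Jan 2024 09:00:00 +0000

Hi Bob, please find the report attached."""

msg = message_from_string(raw)

# Structured part: well-defined fields we can query directly.
print(msg["From"])     # alice@example.com
print(msg["Subject"])  # Quarterly report

# Unstructured part: free-form text requiring further processing.
print(msg.get_payload().strip())
```

The same split applies to web pages (structured tags around unstructured text) and many log formats.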
We use tools and methods developed and used by scientific teams dedicated to research that pushes the current boundaries of human knowledge. Our clients can benefit from the latest scientific methods used to process data obtained from spacecraft or from experiments on particle accelerators at CERN.
- SciPy is a Python library designed for scientific and engineering computation and data analysis. The number of functions and methods it contains is constantly growing thanks to the work of the scientific community as well as extensive support from software companies.
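As a minimal sketch of SciPy in the correlation-finding role described above, `scipy.stats.pearsonr` measures the linear relationship between two series. The data here is invented for illustration:

```python
from scipy import stats

# Hypothetical example: advertising spend vs. resulting sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 41, 55]

# Pearson correlation coefficient r and its p-value.
r, p = stats.pearsonr(ad_spend, sales)
print(round(r, 3))  # close to 1.0: strong positive correlation
```

A value of r near 1 indicates a strong positive linear relationship; the p-value estimates how likely such a correlation would arise by chance.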
- NumPy is an extension of the Python programming language that provides an extensive library of advanced mathematical functions for processing and analyzing large multi-dimensional arrays and matrices.
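A small sketch of what NumPy's multi-dimensional arrays look like in practice: operations apply element-wise to the whole array at once, with no explicit loops.

```python
import numpy as np

# A small 2-D array (matrix) of example measurements.
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Vectorized operations work over the whole array in one call.
print(a.mean())       # 3.5
print(a.sum(axis=0))  # column sums: [5. 7. 9.]
print(a @ a.T)        # matrix product with its own transpose
```

The same vectorized style scales from toy matrices like this one to arrays with millions of elements.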
- Scikit-learn is a Python library containing implementations of powerful algorithms for data analysis and machine learning. It uses the NumPy and SciPy libraries for computation and modeling and the Matplotlib library for visualizing data and results.
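A minimal sketch of the scikit-learn workflow, using invented data: fit a model on labeled examples, then predict labels for new inputs. Here a logistic regression separates a hypothetical customer feature into two classes (e.g. churn vs. no churn):

```python
from sklearn.linear_model import LogisticRegression

# Toy training data: one hypothetical feature -> binary label.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

# Every scikit-learn estimator follows the same fit/predict pattern.
model = LogisticRegression()
model.fit(X, y)
print(model.predict([[0.5], [4.5]]))
```

The uniform `fit`/`predict` interface means a classifier can be swapped for another (a decision tree, a support vector machine) with almost no code changes.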
- Pandas is a library for processing and analyzing data in the Python programming language. It was originally designed as a tool for quantitative analysis of financial data, but today it is a powerful tool for analyzing time series and working with large data tables.
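The time-series role mentioned above can be sketched with a hypothetical daily price series: pandas attaches a date index to the values, after which common financial transformations become one-liners.

```python
import pandas as pd

# A hypothetical daily closing-price series.
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# Typical time-series operations in one call each.
print(prices.pct_change())       # day-over-day returns
print(prices.rolling(2).mean())  # 2-day moving average
```

The same pattern extends to resampling (e.g. daily to monthly) and to tables with many columns.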
- IPython is a shell for interactive computing, and IPython Notebook is a web-based environment built on it that supports multiple programming languages. Thanks to their sophisticated and powerful architecture they are well suited to parallel and distributed computing, and they allow program code, its outputs, and multimedia documentation to be combined in a single document.
- Python is a dynamically typed, interpreted language. It is sometimes classed among the so-called scripting languages, but its capabilities go further: Python was designed to support the development of large, full-fledged applications, including graphical user interfaces. Combined with numerical libraries, Python calculations can be considerably faster than those in specialized statistical languages such as R.
- ROOT is an object-oriented framework developed at CERN for large-scale data analysis. It was originally designed for data analysis in particle physics and includes several features specific to that field, but it is also used in other areas such as astronomy and data mining.