A practical workshop on modern data analysis in Python. Take your data analysis skills to the next level. Adopt modern data analysis methods or machine learning and work out the implementation using the Python scientific stack and Jupyter Notebook. Data analysis is a complex, non-trivial process. Learn best practices, gotchas and tricks based on the lecturers' years of experience. Choose a date and sign up.
Thursday, 9 November 2017, 10:00 to 18:00
- Your company has decided to adopt modern data analysis methods or machine learning and has asked you to figure out the implementation.
- You’ve opted for open source and realized it’s not that easy to meet enterprise demands.
- You plan to use software in the Python scientific stack for your end solution.
- Basic understanding of programming in any language
- Basic knowledge of an operating system (Linux/Windows/macOS)
- Basic knowledge of Git (or any distributed version control system)
- Czech or English language
What you need for the workshop
- Laptop with working Wi-Fi
- GitHub account (private or public)
Python Scientific Stack
A brief introduction to the fundamentals of the Python scientific libraries for numerical computation, data manipulation and visualisation.
Overview of origin, architecture and basic usage of Jupyter notebook, an interactive computing environment that enables users to author notebook documents that include:
- Live code
- Interactive widgets
- Plots
- Narrative text
- Equations
- Images
- Video
Data Analysis Workflow
Data analysis is a complex, non-trivial process. Best practices, gotchas and tricks, based on the author's years of experience.
- Basic Data Types
- Flow Control
LABS: Simple Web Data Scraping
- Data Structures
- Data Sources
- Basic Data Operations
- Data Cleaning
LABS: California Housing
Predict the median house value for census block groups in California.
Machine Learning with scikit-learn
- Basic ML Algorithms
- Dimension Reduction and Feature Importance
- Model Evaluation
- Cross-validation and Hyperparameter tuning
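As a rough illustration of the model evaluation and cross-validation topics listed above, here is a minimal sketch using scikit-learn. The synthetic dataset, the choice of `LinearRegression` and the 5-fold setting are assumptions made for this example, not the workshop's actual lab material.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; not the workshop's dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = LinearRegression()

# 5-fold cross-validation: the model is fitted five times, each time
# scored on the fold that was held out. For regressors the default
# score is the coefficient of determination (R^2).
scores = cross_val_score(model, X, y, cv=5)
mean_r2 = scores.mean()

print("R^2 per fold:", scores.round(3))
print(f"mean R^2: {mean_r2:.3f}")
```

Averaging the score over several held-out folds gives a steadier estimate of how the model generalizes than a single train/test split.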
Dimension Reduction Application
Data are multidimensional in nature. Dimension reduction is an important tool for data visualization as well as for machine learning preprocessing. Different methods will be discussed and explored.
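One common dimension reduction method is principal component analysis (PCA). The sketch below is only an illustration under assumed choices: it uses the Iris dataset bundled with scikit-learn and projects its four features onto two components, which is enough for a 2D scatter plot. The workshop may cover other methods or datasets.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Iris ships with scikit-learn: 150 samples, 4 features (example data only).
X, y = load_iris(return_X_y=True)

# Standardize first so that no feature dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Project the 4-dimensional data onto its 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Fraction of the total variance retained by each component.
explained = pca.explained_variance_ratio_
print("reduced shape:", X_2d.shape)
print("explained variance ratio:", explained.round(3))
```

The two columns of `X_2d` can be plotted directly, or passed to a downstream model as a preprocessing step.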
The same kind of machine learning model may require different constraints, weights or learning rates to generalize to different data patterns. These settings are called hyperparameters, and they have to be tuned so that the model can best solve the machine learning problem.
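A simple way to tune a hyperparameter is an exhaustive grid search with cross-validation. The following sketch assumes a `Ridge` regression model, a hand-picked grid for its `alpha` regularization strength and synthetic data. All of these are illustrative choices rather than the workshop's own setup.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption for this example).
X, y = make_regression(n_samples=200, n_features=20, noise=15.0, random_state=0)

# Candidate values for the regularization strength (hypothetical grid).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# Every candidate is evaluated with 5-fold cross-validation; the model is
# then refitted on all data using the best-scoring value.
search = GridSearchCV(Ridge(), param_grid, cv=5)
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
print("best alpha:", best_alpha)
print(f"best mean CV score (R^2): {search.best_score_:.3f}")
```

Grid search is easy to reason about but grows quickly with the number of hyperparameters; for larger search spaces, scikit-learn's `RandomizedSearchCV` is a common alternative.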
Where: Rohanské nábřeží 671/15, Praha 8
Mgr. Jiří Polcar, Ph.D.
Chief Data Scientist, founder
Specialist in the development of applications for data analysis and machine learning, with a doctorate in astrophysics. Marathon runner.
Price: 7 490 CZK (excluding VAT)