Data Analysis and Machine Learning Introduction in Python

A practical workshop on modern data analysis in Python. Take your data analysis skills to the next level: adopt modern data analysis and machine learning methods and learn how to implement them with the Python scientific stack and Jupyter Notebook. Data analysis is a complex, non-trivial process, so the workshop covers best practices, gotchas and tricks based on the lecturers' years of experience. Choose the date and sign up.

When

Thursday, 9 November 2017, 10:00 to 18:00

Why attend

  • Your company has decided to adopt modern data analysis methods or machine learning and has asked you to figure out the implementation.
  • You’ve opted for open source and realized it’s not that easy to meet enterprise demands.
  • You plan to use software in the Python scientific stack for your end solution.

Required knowledge

  • Basic understanding of programming in any language
  • Basic knowledge of an operating system (Linux/Windows/macOS)
  • Basic knowledge of Git (or any distributed version control system)
  • Czech or English language 

What you need for the workshop

  • Laptop with working Wi-Fi
  • GitHub account (private or public)

Workshop outline

Introduction

  • Python Scientific Stack
    A brief introduction to the fundamentals of the Python scientific libraries for numerical computation, data manipulation and visualisation.
  • Jupyter Notebook
    Overview of the origin, architecture and basic usage of Jupyter Notebook, an interactive computing environment for authoring notebook documents that combine live code, interactive widgets, plots, narrative text, equations, images and video.
  • Data Analysis Workflow
    Data analysis is a complex, non-trivial process. Best practices, gotchas and tricks based on the lecturers' years of experience.

Python Introduction

  • Basic Data Types
  • Flow Control
  • Functions
  • Comprehensions
  • Exceptions
  • Debugging
  • Strings
  • Time

LABS: Simple Web Data Scraping
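The page does not say which libraries the scraping lab uses. As a hedged illustration only, the sketch below uses the standard library's html.parser to collect link targets and link texts from an HTML snippet. The sample HTML, the class LinkCollector and the helpers extract_links and fetch are hypothetical; fetch shows one way to download a page with urllib.request, and third-party packages such as requests and Beautiful Soup are common alternatives.

```python
from html.parser import HTMLParser
import urllib.request


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside, if any
        self._text = []     # text fragments seen inside the current <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link (including nested tags)
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Download a page as text (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Static sample so the example runs offline
SAMPLE_HTML = """
<ul>
  <li><a href="/workshops/python">Python workshop</a></li>
  <li><a href="/workshops/ml">Machine <b>learning</b></a></li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(href, "->", text)
```

On a live site you would call extract_links(fetch(url)) instead, after checking that the site permits scraping.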

NumPy/pandas Introduction

  • Data Structures
  • Data Sources
  • Basic Data Operations
  • Data Cleaning
  • Visualisation

LABS: California Housing
Predict the median house value for census block groups in California.


Machine Learning with scikit-learn

  • Motivation
  • Workflow
  • Basic ML Algorithms
  • Dimension Reduction and Feature Importance
  • Model Evaluation
  • Cross-validation and Hyperparameter Tuning

LABS:

Dimension Reduction Application
Real-world data are often high-dimensional. Dimension reduction is an important tool both for data visualisation and for machine learning preprocessing. Several methods will be discussed and explored.
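The page does not list which methods are covered. Principal component analysis (PCA) is one widely used option, so the sketch below assumes PCA and implements it with NumPy's singular value decomposition to project data onto a chosen number of components. In scikit-learn the same step is available as sklearn.decomposition.PCA; the function name pca and the synthetic data here are illustrative only.

```python
import numpy as np


def pca(X, n_components):
    """Project the rows of X onto the first n_components principal components.

    Returns (projected data, explained-variance ratio of each kept component).
    """
    X = np.asarray(X, dtype=float)
    # PCA is defined on mean-centred data, so centre each feature first.
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    # Note: the sign of each axis is arbitrary.
    _, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:n_components]
    projected = X_centred @ components.T
    # Variance along each axis is proportional to its squared singular value.
    explained = S**2 / np.sum(S**2)
    return projected, explained[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 3-D data that varies mostly along a single direction.
    t = rng.normal(size=200)
    X = np.column_stack([t, 2 * t, -t]) + 0.05 * rng.normal(size=(200, 3))
    Z, ratio = pca(X, n_components=2)
    print(Z.shape)  # (200, 2)
    print(ratio)    # the first component carries almost all of the variance
```

Plotting the two columns of Z against each other is a typical way to visualise such data in two dimensions.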

Hyperparameter Tuning
The same kind of machine learning model may require different constraints, weights or learning rates to generalize to different data patterns. These settings are called hyperparameters; unlike model parameters, they are not learned from the data during training, so they have to be tuned for the model to solve the machine learning problem as well as possible.
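As an example of a hyperparameter, the sketch below assumes ridge regression, whose regularization strength alpha is a constraint fixed before training rather than learned from the data. It selects alpha by a grid search scored with k-fold cross-validation, written in plain NumPy for transparency; with scikit-learn the equivalent would be GridSearchCV wrapped around sklearn.linear_model.Ridge. The function names, the alpha grid and the synthetic data are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y.

    No intercept term; this assumes roughly centred data.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def cv_mse(X, y, alpha, n_folds=5, seed=0):
    """Mean validation MSE of ridge(alpha) over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for k, val_idx in enumerate(folds):
        # Train on all folds except the k-th, validate on the k-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        w = ridge_fit(X[train_idx], y[train_idx], alpha)
        residual = X[val_idx] @ w - y[val_idx]
        errors.append(np.mean(residual**2))
    return float(np.mean(errors))


def grid_search(X, y, alphas, n_folds=5):
    """Return the alpha with the lowest cross-validated error, plus all scores."""
    scores = {alpha: cv_mse(X, y, alpha, n_folds) for alpha in alphas}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data: 40 samples, 30 features, only 3 of them informative.
    X = rng.normal(size=(40, 30))
    true_w = np.zeros(30)
    true_w[:3] = [3.0, -2.0, 1.5]
    y = X @ true_w + rng.normal(scale=1.0, size=40)
    best_alpha, scores = grid_search(X, y, alphas=[0.01, 0.1, 1.0, 10.0, 100.0])
    print("best alpha:", best_alpha)
    for alpha, score in scores.items():
        print(f"alpha={alpha:<6} CV MSE={score:.3f}")
```

The same fold split (fixed seed) is reused for every alpha, so the scores are compared on identical data; the chosen model would then be refitted on all data with the best alpha.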

Where: Rohanské nábřeží 671/15, Praha 8

Capacity: 6/10

Lecturers:

Mgr. Jiří Polcar, Ph.D.

Chief Data Scientist and founder, Gauss Algorithmic

Specialist in developing data analysis and machine learning applications, with a doctorate in astrophysics. Marathon runner.

Mgr. Jaroslav Vážný

Big Data Expert, Gauss Algorithmic

He mastered working with big data during his research in astronomy and physics. Sauna master.

Price: 7 490 CZK (excluding VAT)

Czech