Course Outline
Introduction
- Overview of Dask features and advantages
- Parallel computing in Python
Getting Started
- Installing Dask
- Dask libraries, components, and APIs
- Best practices and tips
Scaling NumPy, SciPy, and Pandas
- Dask arrays examples and use cases
- Chunks and blocked algorithms
- Overlapping computations
- SciPy stats and LinearOperator
- NumPy slicing and assignment
- DataFrames and Pandas
Dask Internals and Graphical UI
- Supported interfaces
- Scheduler and diagnostics
- Analyzing performance
- Graph computation
Optimizing and Deploying Dask
- Setting up adaptive deployments
- Connecting to remote data
- Debugging parallel programs
- Deploying Dask clusters
- Working with GPUs
- Deploying Dask on cloud environments
Troubleshooting
Summary and Next Steps
Requirements
- Experience with data analysis
- Python programming experience
Audience
- Data scientists
- Software engineers
Testimonials (2)
Examples/exercises perfectly adapted to our domain
Luc - CS Group
Course - Scaling Data Analysis with Python and Dask
Having more practical exercises that use data similar to what we work with in our projects (satellite images in raster format)