This is an interactive tutorial developed for a workshop at ETH Zurich UP focused on dask-powered xarray for high-resolution climate, weather, and ocean/atmosphere model outputs.
Projects such as EERIE, along with coupled climate model runs at sub-2.5 km resolution such as the ICON model output from EXCLAIM, have shown that efficient and effective use of dask is increasingly necessary.
Scalable data analysis workflows are crucial to unlocking the full potential of these rich datasets and advancing our understanding of the Earth system.
- Notebook 0: Fundamentals of task-based parallelism and dask, and creating/manipulating dask-backed xarray data.
- Notebook 1: Avoiding common pitfalls, and grasping the science/art of chunking.
- Notebook 2: Using the Dask Dashboard, design of task-based parallel algorithms, exploring advanced techniques and methods, and scaling up with Dask SLURM clusters.
To run the notebooks, you'll need Python installed along with the following libraries:
- xarray
- numpy
- dask
- dask_jobqueue (for Notebook 2)
- flox (for Notebook 2)
You can install the dependencies using pip:
pip install xarray numpy dask dask_jobqueue flox
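After installing, it can be useful to confirm that the libraries are visible to the Python environment your notebooks run in. The following is a minimal sketch using only the standard library. The helper name `missing_modules` and the split into two lists are illustrative choices, not part of the workshop material.

```python
from importlib.util import find_spec

# Import names of the libraries listed above.
REQUIRED = ["xarray", "numpy", "dask"]
# Only needed for Notebook 2.
NOTEBOOK_2_ONLY = ["dask_jobqueue", "flox"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Report what is missing without importing the (potentially heavy) libraries.
print("Missing core libraries:", missing_modules(REQUIRED) or "none")
print("Missing Notebook 2 libraries:", missing_modules(NOTEBOOK_2_ONLY) or "none")
```

An empty result for both lists suggests the environment is ready. Anything reported as missing can be installed with the pip command above.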
The notebooks use example climate datasets resampled from a model run that is part of the EERIE project, stored in Zarr and netCDF formats. Update the data_dir variable in each notebook to point to the location of your datasets.
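As a rough illustration of what setting `data_dir` might look like, the sketch below lists the datasets in a directory and opens one lazily. The path, the `.zarr` and `.nc` file extensions, and the helper names `find_datasets` and `open_lazily` are all assumptions made for this example, so check the notebooks for the actual file names. Opening Zarr stores also assumes the `zarr` package is installed, and opening netCDF files assumes a netCDF backend such as `netCDF4`. Neither appears in the dependency list above.

```python
from pathlib import Path

# Hypothetical location: replace with the directory holding your copy of the data.
data_dir = Path("/path/to/your/datasets")


def find_datasets(directory):
    """Return (zarr_stores, netcdf_files) found directly inside `directory`.

    Assumes Zarr stores are directories ending in .zarr and netCDF files end in .nc.
    """
    directory = Path(directory)
    zarr_stores = sorted(p for p in directory.glob("*.zarr") if p.is_dir())
    netcdf_files = sorted(p for p in directory.glob("*.nc") if p.is_file())
    return zarr_stores, netcdf_files


def open_lazily(path):
    """Open one dataset as dask-backed xarray data without loading it into memory."""
    import xarray as xr  # imported here so find_datasets works without xarray

    path = Path(path)
    if path.suffix == ".zarr":
        # open_zarr returns dask-backed variables by default.
        return xr.open_zarr(path)
    # chunks={} asks xarray to use dask with the backend's preferred chunking.
    return xr.open_dataset(path, chunks={})


if data_dir.is_dir():
    stores, files = find_datasets(data_dir)
    print("Zarr stores:", [p.name for p in stores])
    print("netCDF files:", [p.name for p in files])
else:
    print(f"{data_dir} does not exist; update data_dir first.")
```

Calling `open_lazily` on one of the listed paths should return an `xarray.Dataset` whose variables are dask arrays, so nothing is read from disk until a computation is triggered.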