Overview
Dask is an open-source parallel computing library for Python that scales pandas workflows from a single machine to a cluster. This guide explains how to use TimeGPT from Nixtla with Dask for distributed forecasting. Dask is a good choice when you already use pandas and need to go beyond single-machine memory limits, typically for datasets with 10 to 100 million observations across multiple time series. Unlike Spark, Dask requires only minimal changes to your existing pandas code.
Why Use Dask for Time Series Forecasting?
Dask offers several advantages for scaling time series forecasting:
- Pandas-like API: Minimal code changes from your existing pandas workflows
- Easy scaling: Convert pandas DataFrames to Dask with a single line of code
- Python-native: Pure Python implementation, no JVM required (unlike Spark)
- Flexible deployment: Run on your laptop or scale to a cluster
- Memory efficiency: Process datasets larger than RAM by splitting them into partitions that are processed piece by piece
In this guide, you will learn how to:
- Simplify distributed computing with Fugue
- Run TimeGPT at scale on a Dask cluster
- Convert pandas DataFrames to Dask
Prerequisites
Before proceeding, make sure you have an API key from Nixtla.
How to Use TimeGPT with Dask
Step 1: Install Fugue and Dask
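Fugue provides an easy-to-use interface for distributed computing over frameworks like Dask. You can install Fugue with Dask support using `pip install "fugue[dask]"`. If you run on a distributed Dask cluster, make sure the `nixtla` library is also installed on all worker nodes.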
Step 2: Load Your Data
You can start by loading data into a pandas DataFrame. In this example, we use hourly electricity prices from multiple markets:

| | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
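The rows above follow the long format TimeGPT works with: one row per series and timestamp, using the column names `unique_id`, `ds` and `y` (the `NixtlaClient` defaults for `id_col`, `time_col` and `target_col`). The sketch below rebuilds those five rows and normalizes their types. The helper `prepare_long_format` is hypothetical, and with real data you would load the full file instead, for example with `pd.read_csv` on your own path.

```python
import pandas as pd

# Default column names used by NixtlaClient (id_col, time_col, target_col).
REQUIRED_COLUMNS = ["unique_id", "ds", "y"]


def prepare_long_format(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: check the required columns and normalize their dtypes."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    out = df[REQUIRED_COLUMNS].copy()
    out["ds"] = pd.to_datetime(out["ds"])  # timestamps as datetime64
    out["y"] = out["y"].astype(float)      # target as float
    # Keep each series contiguous and in time order.
    return out.sort_values(["unique_id", "ds"]).reset_index(drop=True)


# The five rows shown in the table above.
raw = pd.DataFrame(
    {
        "unique_id": ["BE"] * 5,
        "ds": [
            "2016-10-22 00:00:00",
            "2016-10-22 01:00:00",
            "2016-10-22 02:00:00",
            "2016-10-22 03:00:00",
            "2016-10-22 04:00:00",
        ],
        "y": [70.00, 37.10, 37.10, 44.75, 37.10],
    }
)
df = prepare_long_format(raw)
print(df.head())
```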
Step 3: Import Dask
Convert the pandas DataFrame into a Dask DataFrame for parallel processing.
Step 4: Use TimeGPT on Dask
To use TimeGPT with Dask, pass a Dask DataFrame to Nixtla’s client methods instead of a pandas DataFrame. Instantiate the `NixtlaClient` class to interact with Nixtla’s API:
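A minimal sketch of creating the client without hard-coding the key. It assumes the key is stored in an environment variable named `NIXTLA_API_KEY` (a naming choice made for this sketch), and `make_nixtla_client` is a hypothetical helper; the only library call it relies on is the `NixtlaClient(api_key=...)` constructor.

```python
import os


def make_nixtla_client(api_key=None):
    """Hypothetical helper: build a nixtla.NixtlaClient from an explicit key or NIXTLA_API_KEY."""
    key = api_key or os.environ.get("NIXTLA_API_KEY")
    if not key:
        # Fail fast with a clear message before any request is attempted.
        raise ValueError("Pass api_key or set the NIXTLA_API_KEY environment variable")
    # Deferred import: keeps this helper importable where nixtla is not installed.
    from nixtla import NixtlaClient

    return NixtlaClient(api_key=key)


# Usage (requires `pip install nixtla` and a valid key):
#   nixtla_client = make_nixtla_client()
```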
You can then call any `NixtlaClient` method, such as `forecast` or `cross_validation`, on the Dask DataFrame.
Forecast with TimeGPT and Dask
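The forecast call looks the same as with pandas; the difference is that a Dask DataFrame goes in and a lazy Dask DataFrame comes back, so you call `.compute()` to collect it. A sketch, assuming `nixtla_client` and `dask_df` from the previous steps; the wrapper name `forecast_on_dask` and the horizon `h=12` are illustrative.

```python
def forecast_on_dask(client, dask_df, h=12):
    """Forecast the next h timestamps for every series in dask_df.

    client:  a nixtla.NixtlaClient instance
    dask_df: Dask DataFrame with unique_id, ds and y columns
    """
    # Same call as with pandas; with a Dask input the work runs on Dask via Fugue.
    fcst_ddf = client.forecast(df=dask_df, h=h)
    # The result is lazy; .compute() brings it back as a pandas DataFrame.
    return fcst_ddf.compute()


# Usage with the objects from the earlier steps:
#   fcst_df = forecast_on_dask(nixtla_client, dask_df, h=12)
#   fcst_df.head()
```

The collected result has `unique_id`, `ds` and `TimeGPT` columns, like the output shown below. If the forecasts are very large, you could skip `.compute()` and write the Dask result directly to storage instead, for example with `to_parquet`.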
| | unique_id | ds | TimeGPT |
|---|---|---|---|
| 0 | BE | 2016-12-31 00:00:00 | 45.190453 |
| 1 | BE | 2016-12-31 01:00:00 | 43.244446 |
| 2 | BE | 2016-12-31 02:00:00 | 41.958389 |
| 3 | BE | 2016-12-31 03:00:00 | 39.796486 |
| 4 | BE | 2016-12-31 04:00:00 | 39.204533 |
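Cross-validation follows the same pattern: pass the Dask DataFrame to `cross_validation` and compute the result. A sketch, where `h=12`, `n_windows=5` and `step_size=2` are illustrative values rather than recommendations; `n_windows` sets how many backtest windows are evaluated and `step_size` how many timestamps separate consecutive windows. The wrapper name is hypothetical.

```python
def cross_validate_on_dask(client, dask_df, h=12, n_windows=5, step_size=2):
    """Backtest TimeGPT on a Dask DataFrame over several forecast windows.

    client: a nixtla.NixtlaClient instance
    """
    cv_ddf = client.cross_validation(
        df=dask_df,
        h=h,                  # horizon of each window
        n_windows=n_windows,  # number of windows to evaluate
        step_size=step_size,  # timestamps between consecutive windows
    )
    # Collect the lazy Dask result as a pandas DataFrame.
    return cv_ddf.compute()


# Usage:
#   cv_df = cross_validate_on_dask(nixtla_client, dask_df)
#   cv_df.head()
```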
Working with Exogenous Variables
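A minimal sketch of a forecast with exogenous variables on Dask inputs. It assumes `dask_df` holds the history (`unique_id`, `ds`, `y` plus the exogenous columns) and `future_ex_ddf` is a hypothetical Dask DataFrame with the future values of the same exogenous columns for the `h` timestamps being forecast; the two go to `forecast` through its `df` and `X_df` arguments.

```python
def forecast_with_exogenous(client, dask_df, future_ex_ddf, h):
    """Forecast h steps using exogenous features, with Dask inputs throughout.

    client:        a nixtla.NixtlaClient instance
    dask_df:       history with unique_id, ds, y and the exogenous columns
    future_ex_ddf: unique_id, ds and the same exogenous columns for the
                   h future timestamps of each series (hypothetical name)
    """
    fcst_ddf = client.forecast(df=dask_df, X_df=future_ex_ddf, h=h)
    # Collect the lazy Dask result as a pandas DataFrame.
    return fcst_ddf.compute()
```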
TimeGPT with Dask also supports exogenous variables. Refer to the Exogenous Variables Tutorial for details, and pass Dask DataFrames wherever the tutorial uses pandas DataFrames; the API is otherwise identical.
Related Resources
Explore more distributed forecasting options:
- Distributed Computing Overview - Compare Spark, Dask, and Ray
- Spark Integration - For datasets with 100M+ observations
- Ray Integration - For ML pipeline integration
- Fine-tuning TimeGPT - Improve accuracy at scale
- Cross-Validation - Validate distributed forecasts