
Overview

Dask is an open-source parallel computing library for Python that scales pandas workflows. This guide explains how to use TimeGPT from Nixtla with Dask for distributed forecasting. Dask is ideal when you are already using pandas and need to scale beyond single-machine memory limits, typically for datasets with 10-100 million observations across multiple time series. Unlike Spark, Dask requires minimal changes to an existing pandas workflow.

Why Use Dask for Time Series Forecasting?

Dask offers unique advantages for scaling time series forecasting:
  • Pandas-like API: Minimal code changes from your existing pandas workflows
  • Easy scaling: Convert pandas DataFrames to Dask with a single line of code
  • Python-native: Pure Python implementation, no JVM required (unlike Spark)
  • Flexible deployment: Run on your laptop or scale to a cluster
  • Memory efficiency: Process datasets larger than RAM through intelligent chunking
Choose Dask when you need to scale from 10 million to 100 million observations and want the smoothest transition from pandas.

What you'll learn:
  • Simplify distributed computing with Fugue
  • Run TimeGPT at scale on a Dask cluster
  • Seamlessly convert pandas DataFrames to Dask

Prerequisites

Before proceeding, make sure you have an API key from Nixtla.

How to Use TimeGPT with Dask


Step 1: Install Fugue and Dask

Fugue provides an easy-to-use interface for distributed computing over frameworks like Dask. You can install Fugue with:
pip install "fugue[dask]"
If running on a distributed Dask cluster, ensure the nixtla library is installed on all worker nodes.

Step 2: Load Your Data

You can start by loading data into a pandas DataFrame. In this example, we use hourly electricity prices from multiple markets:
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
)
df.head()
Example pandas DataFrame:
|   | unique_id | ds                  | y     |
|---|-----------|---------------------|-------|
| 0 | BE        | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE        | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE        | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE        | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE        | 2016-10-22 04:00:00 | 37.10 |
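TimeGPT expects this long format: one row per series and timestamp, with Nixtla's default column names. A minimal sketch with synthetic values (the numbers here are illustrative, not from the electricity dataset):

```python
import pandas as pd

# Long-format input: one row per (series, timestamp) pair.
# Column names are Nixtla's defaults: unique_id, ds, y.
df = pd.DataFrame({
    "unique_id": ["BE", "BE", "DE", "DE"],  # series identifier
    "ds": pd.to_datetime(
        ["2016-10-22 00:00", "2016-10-22 01:00"] * 2
    ),  # timestamp
    "y": [70.0, 37.1, 28.0, 30.5],  # target value
})

# All three columns must be present before handing the frame to Dask.
assert {"unique_id", "ds", "y"}.issubset(df.columns)
```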

Step 3: Convert Your Data to a Dask DataFrame

Convert the pandas DataFrame into a Dask DataFrame for parallel processing.
import dask.dataframe as dd

dask_df = dd.from_pandas(df, npartitions=2)
dask_df
When converting to a Dask DataFrame, you can specify the number of partitions based on your data size or system resources.

Step 4: Use TimeGPT on Dask

To use TimeGPT with Dask, provide a Dask DataFrame to Nixtla’s client methods instead of a pandas DataFrame. Instantiate the NixtlaClient class to interact with Nixtla’s API:
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
You can use any method from the NixtlaClient, such as forecast or cross_validation.
Forecast with TimeGPT and Dask
fcst_df = nixtla_client.forecast(dask_df, h=12)
fcst_df.compute().head()
|   | unique_id | ds                  | TimeGPT   |
|---|-----------|---------------------|-----------|
| 0 | BE        | 2016-12-31 00:00:00 | 45.190453 |
| 1 | BE        | 2016-12-31 01:00:00 | 43.244446 |
| 2 | BE        | 2016-12-31 02:00:00 | 41.958389 |
| 3 | BE        | 2016-12-31 03:00:00 | 39.796486 |
| 4 | BE        | 2016-12-31 04:00:00 | 39.204533 |
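Cross-validation follows the same pattern as forecasting. A sketch with the call wrapped in a helper so the API key and Dask DataFrame are passed in explicitly (`h`, `n_windows`, and `step_size` are parameters of `NixtlaClient.cross_validation`; the values below are illustrative):

```python
def cross_validate_with_timegpt(api_key, dask_df, h=12, n_windows=3, step_size=12):
    """Sketch: run TimeGPT cross-validation on a Dask DataFrame.

    Requires the nixtla package on the client and on all worker nodes.
    """
    from nixtla import NixtlaClient

    client = NixtlaClient(api_key=api_key)
    # Like forecast, cross_validation on a Dask input returns a lazy
    # Dask DataFrame; call .compute() to materialize the results.
    cv_df = client.cross_validation(
        dask_df, h=h, n_windows=n_windows, step_size=step_size
    )
    return cv_df.compute()
```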

Working with Exogenous Variables

TimeGPT with Dask also supports exogenous variables; refer to the Exogenous Variables Tutorial for details. Simply substitute pandas DataFrames with Dask DataFrames: the API remains identical.