
Overview

Dask is an open-source parallel computing library for Python that scales pandas workflows. This guide explains how to use TimeGPT from Nixtla with Dask for distributed forecasting. Dask is ideal when you are already using pandas and need to scale beyond single-machine memory limits, typically for datasets with 10-100 million observations across multiple time series. Unlike Spark, Dask requires minimal changes to an existing pandas workflow.

Why Use Dask for Time Series Forecasting?

Dask offers unique advantages for scaling time series forecasting:
  • Pandas-like API: Minimal code changes from your existing pandas workflows
  • Easy scaling: Convert pandas DataFrames to Dask with a single line of code
  • Python-native: Pure Python implementation, no JVM required (unlike Spark)
  • Flexible deployment: Run on your laptop or scale to a cluster
  • Memory efficiency: Process datasets larger than RAM through intelligent chunking
Choose Dask when you need to scale from 10 million to 100 million observations and want the smoothest transition from pandas.

What you'll learn:
  • Simplify distributed computing with Fugue
  • Run TimeGPT at scale on a Dask cluster
  • Seamlessly convert pandas DataFrames to Dask

Prerequisites

Before proceeding, make sure you have an API key from Nixtla.

How to Use TimeGPT with Dask


Step 1: Install Fugue and Dask

Fugue provides an easy-to-use interface for distributed computing over frameworks like Dask. You can install Fugue with:
pip install "fugue[dask]"
If running on a distributed Dask cluster, ensure the nixtla library is installed on all worker nodes.

Step 2: Load Your Data

You can start by loading data into a pandas DataFrame. In this example, we use hourly electricity prices from multiple markets:
import pandas as pd

df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
    parse_dates=['ds'],
)
df.head()
Example pandas DataFrame:
|   | unique_id | ds                  | y     |
|---|-----------|---------------------|-------|
| 0 | BE        | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE        | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE        | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE        | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE        | 2016-10-22 04:00:00 | 37.10 |
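TimeGPT expects this long format: one row per series and timestamp, with Nixtla's default column names. A minimal sketch with synthetic values (the numbers here are illustrative, not from the electricity dataset):

```python
import pandas as pd

# Long-format input: one row per (series, timestamp) pair.
# Column names are Nixtla's defaults: unique_id, ds, y.
df = pd.DataFrame({
    "unique_id": ["BE", "BE", "DE", "DE"],  # series identifier
    "ds": pd.to_datetime(
        ["2016-10-22 00:00", "2016-10-22 01:00"] * 2
    ),  # timestamp
    "y": [70.0, 37.1, 28.0, 30.5],  # target value
})

# All three columns must be present before handing the frame to Dask.
assert {"unique_id", "ds", "y"}.issubset(df.columns)
```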

Step 3: Convert Your Data to a Dask DataFrame

Convert the pandas DataFrame into a Dask DataFrame for parallel processing.
import dask.dataframe as dd

dask_df = dd.from_pandas(df, npartitions=2)
dask_df
When converting to a Dask DataFrame, you can specify the number of partitions based on your data size or system resources.

Step 4: Use TimeGPT on Dask

To use TimeGPT with Dask, provide a Dask DataFrame to Nixtla’s client methods instead of a pandas DataFrame. Instantiate the NixtlaClient class to interact with Nixtla’s API:
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
You can use any method from the NixtlaClient, such as forecast or cross_validation.
Forecast with TimeGPT and Dask
fcst_df = nixtla_client.forecast(dask_df, h=12)
fcst_df.compute().head()
|   | unique_id | ds                  | TimeGPT   |
|---|-----------|---------------------|-----------|
| 0 | BE        | 2016-12-31 00:00:00 | 45.190453 |
| 1 | BE        | 2016-12-31 01:00:00 | 43.244446 |
| 2 | BE        | 2016-12-31 02:00:00 | 41.958389 |
| 3 | BE        | 2016-12-31 03:00:00 | 39.796486 |
| 4 | BE        | 2016-12-31 04:00:00 | 39.204533 |
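Cross-validation follows the same pattern as forecasting. A sketch with the call wrapped in a helper so the API key and Dask DataFrame are passed in explicitly (`h`, `n_windows`, and `step_size` are parameters of `NixtlaClient.cross_validation`; the values below are illustrative):

```python
def cross_validate_with_timegpt(api_key, dask_df, h=12, n_windows=3, step_size=12):
    """Sketch: run TimeGPT cross-validation on a Dask DataFrame.

    Requires the nixtla package on the client and on all worker nodes.
    """
    from nixtla import NixtlaClient

    client = NixtlaClient(api_key=api_key)
    # Like forecast, cross_validation on a Dask input returns a lazy
    # Dask DataFrame; call .compute() to materialize the results.
    cv_df = client.cross_validation(
        dask_df, h=h, n_windows=n_windows, step_size=step_size
    )
    return cv_df.compute()
```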

Working with Exogenous Variables

TimeGPT with Dask also supports exogenous variables; refer to the Exogenous Variables Tutorial for details. Simply substitute pandas DataFrames with Dask DataFrames: the API remains identical.