Overview
Ray is an open-source unified compute framework that helps scale Python workloads for distributed computing. This guide demonstrates how to distribute TimeGPT forecasting jobs on top of Ray. Ray is ideal for machine learning pipelines with complex task dependencies and for datasets with 10+ million observations. Its unified framework excels at orchestrating distributed ML workflows, making it a strong fit for integrating TimeGPT into broader AI applications.

Why Use Ray for Time Series Forecasting?
Ray offers unique advantages for ML-focused time series forecasting:

- ML pipeline integration: Seamlessly integrate TimeGPT into complex ML workflows with Ray Tune and Ray Serve
- Task parallelism: Handle complex task dependencies beyond data parallelism
- Python-native: Pure Python with minimal boilerplate code
- Flexible architecture: Scale from laptop to cluster with the same code
- Actor model: Stateful computations for advanced forecasting scenarios
In this guide, you will learn how to:

- Install Fugue with Ray support for distributed computing
- Initialize Ray clusters for distributed forecasting
- Run TimeGPT forecasting and cross-validation on Ray
Prerequisites
Before proceeding, make sure you have an API key from Nixtla. When executing on a distributed Ray cluster, ensure the nixtla library is installed on all workers.
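One way to satisfy the worker requirement is Ray's `runtime_env` option, which pip-installs packages on every worker when the cluster starts. A sketch (assumes workers have internet access to reach PyPI):

```python
import ray

# Install nixtla on every worker at cluster startup.
# In a real cluster, pass your head node address to ray.init()
# instead of starting a local instance.
ray.init(runtime_env={"pip": ["nixtla"]})
```

Alternatively, bake the dependency into your cluster's container image so workers do not install it at startup.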
How to Use TimeGPT with Ray
Step 1: Install Fugue and Ray
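Assuming the packages published on PyPI, Fugue's Ray support installs with a single pip command:

```shell
pip install "fugue[ray]"
```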
Fugue provides an easy-to-use interface for distributed computation across frameworks like Ray. Install Fugue with Ray support before continuing.

Step 2: Load Your Data
Load your dataset into a pandas DataFrame. This tutorial uses hourly electricity prices from various markets:

|   | unique_id | ds | y |
|---|---|---|---|
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
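A sketch of the expected shape, built inline rather than read from a file (in practice, load your own CSV into the same long format):

```python
import pandas as pd

# Long-format data: one row per (series, timestamp) pair.
# unique_id identifies the series (here, the BE market),
# ds is the timestamp, and y is the target value.
df = pd.DataFrame(
    {
        "unique_id": ["BE"] * 5,
        "ds": pd.date_range("2016-10-22", periods=5, freq="h"),
        "y": [70.00, 37.10, 37.10, 44.75, 37.10],
    }
)
print(df.head())
```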
Step 3: Initialize Ray
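Initializing a local cluster is a single call; a sketch, assuming Ray is installed:

```python
import ray

# Start a local Ray head node. In a real cluster, connect to the
# head node instead, e.g. ray.init(address="ray://<head_node_host>:10001").
ray.init()
```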
Create a Ray cluster locally by initializing a head node. You can scale this to multiple machines in a real cluster environment with the same code.

Step 4: Use TimeGPT on Ray
To use TimeGPT with Ray, provide a Ray Dataset to Nixtla’s client methods instead of a pandas DataFrame. The API remains the same as local usage. Instantiate the NixtlaClient class to interact with Nixtla’s API, then call any of its methods, such as forecast or cross_validation.
- Forecast Example
- Cross-validation Example
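A combined sketch of both examples, assuming the DataFrame `df` from Step 2, the running Ray instance from Step 3, and a placeholder API key:

```python
import ray.data
from nixtla import NixtlaClient

client = NixtlaClient(api_key="my_api_key_provided_by_nixtla")

# Wrap the pandas DataFrame in a Ray Dataset; the client calls
# are otherwise identical to the local pandas workflow.
ray_df = ray.data.from_pandas(df)

# Forecast the next 12 hours for every series.
fcst_df = client.forecast(df=ray_df, h=12, freq="H")

# Cross-validate with 5 windows of 12 hours each.
cv_df = client.cross_validation(df=ray_df, h=12, freq="H", n_windows=5)
```

Note that freq is passed explicitly here; with a distributed dataset the client cannot cheaply infer the series frequency the way it can from a pandas index.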
Two models are available: timegpt-1 (default) and timegpt-1-long-horizon. For long-horizon forecasting, see the long-horizon model tutorial.

Step 5: Shutdown Ray
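The shutdown itself is a single call:

```python
import ray

# Stop the Ray instance and release its workers and memory.
ray.shutdown()
```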
Always shut down Ray after you finish your tasks to free up resources.

Working with Exogenous Variables
TimeGPT with Ray also supports exogenous variables. Refer to the Exogenous Variables tutorial for details. Simply substitute Ray Datasets for pandas DataFrames; the API remains identical.

Related Resources
Explore more distributed forecasting options:

- Distributed Computing Overview - Compare Spark, Dask, and Ray
- Spark Integration - For datasets with 100M+ observations
- Dask Integration - For datasets with 10M-100M observations
- Fine-tuning TimeGPT - Improve accuracy at scale
- Cross-Validation - Validate distributed forecasts