We’re introducing a new Model Distillation offering to provide developers with an integrated workflow to manage the entire distillation pipeline directly within the OpenAI platform. This lets developers easily use the outputs of frontier models like o1‑preview and GPT‑4o to fine-tune and improve the performance of more cost-efficient models like GPT‑4o mini.
Model distillation involves fine-tuning smaller, cost-efficient models using outputs from more capable models, allowing them to match the performance of advanced models on specific tasks at a much lower cost. Until now, distillation has been a multi-step, error-prone process, which required developers to manually orchestrate multiple operations across disconnected tools, from generating datasets to fine-tuning models and measuring performance improvements. Since distillation is inherently iterative, developers needed to repeatedly run each step, adding significant effort and complexity.
Our new Model Distillation suite includes:
* **Stored Completions**, which let developers automatically capture and store the input-output pairs generated by models like GPT‑4o and o1‑preview to build distillation datasets.
* **Evals** (beta), which let developers create and run custom evaluations to measure model performance on specific tasks.
* **Fine-tuning**, now integrated with Stored Completions and Evals, so stored datasets can be used directly to fine-tune smaller models and the results can be measured without leaving the platform.
### How to use Model Distillation
First, create an evaluation to measure the performance of the model you want to distill into, which in this example will be GPT‑4o mini. This evaluation will be used to continuously test the distilled model’s performance, to help you decide whether to deploy it.
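To make the step concrete, here is a minimal local sketch of what such an evaluation measures on a question-answering task. The `exact_match_accuracy` helper and the sample data are hypothetical illustrations, not part of the platform; in practice you would define the grading criteria in the Evals product itself:

```python
def exact_match_accuracy(predictions, references):
    """Fraction of model answers that exactly match the reference answer.

    A deliberately simple grading criterion; real evaluations can use
    fuzzier or model-graded criteria.
    """
    matches = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return matches / len(references)


# Hypothetical batch of GPT-4o mini answers graded against references.
preds = ["Washington, D.C.", "paris", "Berlin"]
refs = ["Washington, D.C.", "Paris", "Madrid"]
print(exact_match_accuracy(preds, refs))  # 2 of 3 match
```

Running the same evaluation before and after fine-tuning gives a consistent yardstick for deciding whether the distilled model is ready to deploy.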
Next, use Stored Completions to create a distillation dataset of real-world examples using GPT‑4o’s outputs for the tasks on which you want to fine-tune GPT‑4o mini. You can do this by setting the `store=True` parameter in the Chat Completions API to automatically store these input-output pairs without any latency impact. These stored completions can be reviewed, filtered, and tagged to create high-quality datasets for fine-tuning or evaluation.
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "what's the capital of the USA?"
                }
            ]
        }
    ],
    store=True,
    metadata={"username": "user123", "user_id": "123", "session_id": "123"},
)
```
Finally, use this dataset to fine-tune GPT‑4o mini. Stored Completions can be used as a training file when creating a fine-tuned model. Once the model is fine-tuned, you can go back to Evals to test whether the fine-tuned GPT‑4o mini model meets your performance criteria when compared to GPT‑4o.
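The dashboard flow above lets you select Stored Completions directly as a training file. As a hedged sketch of the equivalent shape of the data, here is how one stored input-output pair maps onto the chat fine-tuning JSONL format; the `completion_to_training_example` helper and file name are hypothetical, and the commented-out job creation assumes you have uploaded the file with the Files API:

```python
import json


def completion_to_training_example(user_text, assistant_text):
    """Format one stored input/output pair as a chat fine-tuning record."""
    return {
        "messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]
    }


# Write the distillation dataset as a JSONL training file.
examples = [
    completion_to_training_example(
        "what's the capital of the USA?", "Washington, D.C."
    )
]
with open("distillation_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# With a real client and an uploaded file id, a fine-tuning job
# could then be started along these lines:
# job = client.fine_tuning.jobs.create(
#     training_file=uploaded_file_id,
#     model="gpt-4o-mini-2024-07-18",
# )
```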
Fine-tuning is an iterative process. If the initial results aren’t satisfactory, you may need to refine the dataset, adjust the training parameters, or capture more specific examples where the model is underperforming. The goal is to incrementally improve the distilled model until it performs well enough for production use.
## Availability & Pricing
Model Distillation is available today to all developers and can be used to distill any of our models, including GPT‑4o and o1‑preview. As a reminder, we’re also offering 2M free training tokens per day on GPT‑4o mini and 1M free training tokens per day on GPT‑4o until October 31 to help developers get started with distillation. Beyond that limit, the cost of training and running a distilled model is the same as our standard fine-tuning prices, which you can find on our API pricing page.
Stored Completions is available for free. Evals, which are available in beta, are charged at standard model prices based on the tokens used. Through the end of the year, developers can run evaluations for free (up to 7 per week) when they opt in to share their Evals with OpenAI. Evals shared with us will be used to help us improve and evaluate our future models.
For more information, check out our Model Distillation docs.