What this does

NVIDIA's Synthetic Data Generation for Agentic AI is an approach and a set of tools for producing high-quality, domain-specific synthetic data to train and fine-tune models for agentic workflows. A Nemotron instruct model generates candidate responses, and a Nemotron reward model scores them so they can be ranked and filtered, yielding training data that mimics the characteristics of real-world data.

Key capabilities

  • A synthetic data generation pipeline built on Nemotron instruct and reward models
  • Generation, ranking, and filtering of synthetic responses to build high-quality datasets
  • Demonstrations of multi-step task completion for fine-tuning agentic models
  • Open models, recipes, and data curation tooling covering the post-training lifecycle
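
The generate, rank, and filter steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not NVIDIA's implementation: build_dataset, toy_generate, toy_score, and every parameter name are hypothetical. The two toy functions stand in for calls to an instruct model and a reward model, and a real pipeline would replace them with actual model inference.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    prompt: str
    response: str
    score: float


def build_dataset(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],  # stand-in for an instruct model
    score: Callable[[str, str], float],         # stand-in for a reward model
    n_candidates: int = 4,
    keep_top: int = 1,
    min_score: float = 0.0,
) -> List[Candidate]:
    """Generate candidates per prompt, rank them by score, keep the best."""
    dataset: List[Candidate] = []
    for prompt in prompts:
        # 1. Generate several candidate responses for the prompt.
        responses = generate(prompt, n_candidates)
        # 2. Rank the candidates by their score, highest first.
        ranked = sorted(
            (Candidate(prompt, r, score(prompt, r)) for r in responses),
            key=lambda c: c.score,
            reverse=True,
        )
        # 3. Filter: keep the top-k candidates that clear the threshold.
        dataset.extend(c for c in ranked[:keep_top] if c.score >= min_score)
    return dataset


# Toy stand-ins (assumptions for illustration only, not real model calls).
def toy_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt} -> draft {i}" + " detail" * i for i in range(n)]


def toy_score(prompt: str, response: str) -> float:
    # Pretend that longer responses are better; a real reward model
    # would score qualities such as helpfulness and correctness.
    return float(len(response.split()))
```

Raising min_score or lowering keep_top trades dataset size for quality; the right setting depends on the reward model and the domain.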

Who it's for

This is aimed at developers and enterprises building specialized AI agents who need large volumes of high-quality training data but cannot rely solely on scarce real-world data. It also supports teams fine-tuning custom LLMs for reasoning and agentic AI use cases across domains.