Audio Scene Generation

Note: Audio scene generation is currently in beta.

An audio scene is a formal representation of a listening scenario in which one or more source recordings are positioned within a modeled acoustic environment and rendered to a listener using a specified device configuration. This capability enables the efficient generation of realistic, controllable, and reproducible datasets without manually constructing each mixture. Common applications include training data generation for speech enhancement and source separation, audio-AI benchmarking, and evaluation of hearing-device and headset algorithms.

Audio scene generation in the Treble SDK lets you define realistic audio mixtures from:

  • room acoustics (impulse responses),
  • recording material (speech, background noise, natural sounds, music, and other source recordings),
  • and a listener configuration (device, orientation, device noise specifications, and filters).
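Conceptually, these three ingredients can be modeled as plain data. The sketch below is illustrative only: the class and field names mirror the concepts above, not the actual Treble SDK types.

```python
from dataclasses import dataclass, field

# Illustrative data model of an audio scene's ingredients.
# These are NOT the Treble SDK classes, just a conceptual sketch.

@dataclass
class ImpulseResponse:
    """Room acoustics: an IR linking a source position to a receiver."""
    room_id: str
    source: str
    receiver: str

@dataclass
class Track:
    """Recording material (speech, noise, music, ...) placed on a source."""
    content_path: str
    gain_db: float = 0.0

@dataclass
class ListenerConfig:
    """Device, orientation, and device-noise settings for the listener."""
    device: str
    azimuth_deg: float = 0.0

@dataclass
class AudioScene:
    """One scene: impulse responses + tracks + a listener configuration."""
    irs: list = field(default_factory=list)
    tracks: list = field(default_factory=list)
    listener: ListenerConfig = None

# Example: a two-track scene (speech target plus background noise).
scene = AudioScene(
    irs=[ImpulseResponse("living_room", "talker_1", "listener")],
    tracks=[
        Track("speech/talker_1.wav"),
        Track("noise/hvac.wav", gain_db=-12.0),
    ],
    listener=ListenerConfig(device="headset", azimuth_deg=30.0),
)
```

Rendering such a scene amounts to convolving each track with its IR, applying gains, and summing at the listener's device.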

You can create audio scenes through two complementary workflows:

  1. Manual scene generation
    Explicitly define each AudioScene by assigning IRs, track content, and listener configurations.

  2. Automated bulk generation
    Define reusable SceneRules and use SceneGenerator to produce a SceneCollection of randomized AudioScene instances for scalable dataset generation and benchmarking.
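The bulk workflow can be sketched as rules plus a seeded random draw, which is what makes the generated collections reproducible. The sketch below uses hypothetical names (`SceneRules`, `generate_scenes`) inspired by the concepts above; the real Treble SDK `SceneRules`/`SceneGenerator` API may differ.

```python
import random
from dataclasses import dataclass

# Hypothetical sketch of rule-driven bulk scene generation.
# Not the Treble SDK API -- an illustration of the concept.

@dataclass
class SceneRules:
    """Reusable constraints from which randomized scenes are drawn."""
    rooms: list
    speech_clips: list
    noise_clips: list
    snr_range_db: tuple

def generate_scenes(rules, count, seed=0):
    """Draw `count` randomized scene specs; the seed makes runs reproducible."""
    rng = random.Random(seed)
    lo, hi = rules.snr_range_db
    scenes = []
    for _ in range(count):
        scenes.append({
            "room": rng.choice(rules.rooms),
            "speech": rng.choice(rules.speech_clips),
            "noise": rng.choice(rules.noise_clips),
            "snr_db": round(rng.uniform(lo, hi), 1),
        })
    return scenes

rules = SceneRules(
    rooms=["office", "kitchen"],
    speech_clips=["s1.wav", "s2.wav"],
    noise_clips=["hvac.wav", "street.wav"],
    snr_range_db=(0.0, 20.0),
)
collection = generate_scenes(rules, count=4, seed=42)
```

Because the generator is seeded, the same rules and seed always yield the same collection, which is essential when a benchmark dataset must be regenerated exactly.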