
Bulk scene generation

Building scenes one at a time is useful for understanding the components, but for training data you typically need hundreds or thousands of varied scenes. The SceneGenerator automates this by combining a set of rules with an IRCollection to produce randomized scenes at scale.

The process has three parts:

  1. Define source groups -- each SourceGroup specifies a TrackGenerator, how many sources to allocate, which IRs to select from (via Polars filter expressions on the collection), and a priority level for IR allocation
  2. Define listener rules -- device, orientation, and optional device noise
  3. Generate -- the SceneGenerator iterates over receiver positions in the collection and produces as many valid scenes as the IR budget allows

Define scene rules and source groups

SceneRules is the top-level container. Source groups are added to it, each describing a category of sources in the scene.

Each SourceGroup declares:

  • which tracks to generate (TrackGenerator),
  • candidate IR selection (source_selection with Polars expressions),
  • source count constraints (n_sources, min_n_sources),
  • semantic grouping (group_tag),
  • and allocation priority (priority_on_ir_collection).

Example:

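import polars as pl  # provides the pl.col(...) filter expressions

# `scene` and `speech_dataset` are assumed to be defined earlier in this guide.
scene_rules = scene.SceneRules(duration_s=20)

# Track generator for the talkers: a conversation preset with a narrow level range.
conversation = scene.TrackGenerator(
    audio_dataset=speech_dataset,
    rules=scene.ConversationRules.from_preset(
        scene.ConversationRulesPresets.sequential_talkers_increased_overlap,
        in_track_level_range_db_spl=(69, 70),
    ),
    talker_identifier="speaker_id",
)

# Three talkers (n_sources equals min_n_sources), placed only on IRs
# that are not flagged as noise sources.
scene_rules.add_source_group(
    scene.SourceGroup(
        name="conversation",
        tracks=conversation,
        source_selection=(~pl.col("IS_NOISE_SOURCE")),
        n_sources=3,
        min_n_sources=3,
        group_tag=scene.GroupTag.TARGET,
        priority_on_ir_collection=scene.GroupPriority.HIGH,
    )
)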

Example noise group:

hvac = scene.TrackGenerator(
    audio_dataset=hvac_dataset,
    rules=scene.NoiseSourceRules(
        free_field_level_db_spl=(55, 56),
        reuse_single_sample=True,
    ),
)

scene_rules.add_source_group(
    scene.SourceGroup(
        name="hvac_noise",
        tracks=hvac,
        min_n_sources=1,
        source_selection=pl.col("IS_NOISE_SOURCE"),
        group_tag=scene.GroupTag.BACKGROUND,
        priority_on_ir_collection=scene.GroupPriority.MEDIUM,
    )
)

Define listener rules

ListenerRules describes how generated scenes should be captured:

  • output device,
  • orientation or orientation ranges,
  • optional device noise/filter specifications.

Example:

device_specs = scene.DeviceSpecs(
    noise_rules=scene.StaticNoiseRules.from_noise_type_and_level(
        noise_type=scene.StaticNoiseType.mems_noise_profile,
        level_db_spl=(30, 32),
    ),
)
device = tsdk.device_library.get_device_by_name("KEMAR051123_1")
scene_rules.set_listener(
    scene.ListenerRules(
        device=device,
        orientation=scene.OrientationRange(
            azimuth_range=(-180, 180),
            elevation_range=(-10, 10),
            roll_range=(0, 0),
        ),
        device_specs=device_specs,
    )
)

Generate scenes

The SceneGenerator takes the IR collection and the scene rules, then produces a SceneCollection. It automatically handles receiver selection, source-to-IR assignment (respecting priorities and filter constraints), and track randomization.
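To build intuition for source-to-IR assignment under priorities, here is a minimal pure-Python sketch. It is not the SceneGenerator's implementation: GroupRequest, allocate, the integer priorities, and the rule that an unmet minimum invalidates the scene are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class GroupRequest:
    """Hypothetical, simplified stand-in for a SourceGroup."""
    name: str
    candidates: set      # IR ids that pass the group's filter
    n_sources: int       # preferred number of sources
    min_n_sources: int   # fewest sources needed for a valid scene
    priority: int        # larger values are served first (assumption)


def allocate(groups, rng):
    """Assign disjoint IR ids to groups, highest priority first.

    Returns {group name: [ir ids]}, or None when any group cannot reach
    its minimum, in which case the scene is treated as invalid.
    """
    used = set()
    assignment = {}
    for group in sorted(groups, key=lambda g: g.priority, reverse=True):
        # Only IRs not already taken by a higher-priority group remain.
        free = sorted(group.candidates - used)
        if len(free) < group.min_n_sources:
            return None
        chosen = rng.sample(free, min(group.n_sources, len(free)))
        used.update(chosen)
        assignment[group.name] = sorted(chosen)
    return assignment


rng = random.Random(0)
groups = [
    GroupRequest("hvac_noise", {2, 4}, n_sources=1, min_n_sources=1, priority=1),
    GroupRequest("conversation", {0, 1, 3, 5}, n_sources=3, min_n_sources=3, priority=2),
]
result = allocate(groups, rng)
print(result)

# A receiver with too few talker positions yields no valid scene.
too_small = [GroupRequest("conversation", {0, 1}, 3, 3, priority=2)]
print(allocate(too_small, rng))
```

Serving high-priority groups first means that when candidate sets overlap, the group that matters most gets first pick of the available IRs, and lower-priority groups take what is left.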

The max_n_scenes parameter caps the number of scenes to generate; the generator may return fewer if the IR budget is exhausted first. Each generated scene in the collection can be individually plotted, rendered, or serialized, just like the manually built scene in Scene rendering and targets.

scene_generator = scene.SceneGenerator(
    ir_collection=ir_collection,
    scene_rules=scene_rules,
)

scene_collection = scene_generator.generate_scenes(max_n_scenes=100)

The result is a SceneCollection that supports indexing, slicing, iteration, and dataframe access:

scene_collection[0].plot()
scene_collection.head(5)

[Figure: plot of a scene from the collection]

Each item in the collection is a standard AudioScene, so you can:

  • render the full mixture,
  • render target variants,
  • export metadata with to_struct(),
  • and extract transcripts.
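Since the collection supports iteration, per-scene metadata can be gathered in a single pass. The sketch below writes one JSON line per scene. It assumes that to_struct() returns a JSON-serializable dict, which is not confirmed here; StubScene and export_metadata are hypothetical names, and the stub stands in for AudioScene so the snippet runs without the SDK.

```python
import json


class StubScene:
    """Stand-in for an AudioScene; only to_struct() is imitated."""

    def __init__(self, scene_id, n_sources):
        self.scene_id = scene_id
        self.n_sources = n_sources

    def to_struct(self):
        # Assumption: the real method returns JSON-serializable metadata.
        return {"scene_id": self.scene_id, "n_sources": self.n_sources}


def export_metadata(scenes):
    """Serialize each scene's metadata as one JSON line (JSON Lines)."""
    return "\n".join(json.dumps(s.to_struct(), sort_keys=True) for s in scenes)


# With the SDK, the generated scene_collection would be passed instead.
stub_collection = [StubScene(0, 4), StubScene(1, 3)]
jsonl = export_metadata(stub_collection)
print(jsonl)
```

One record per line keeps the export easy to append to and to stream back in when assembling a training manifest.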