Rendering, targets, and metadata
An AudioScene is a lightweight, declarative definition until it is explicitly rendered. Rendering can be started either from within the interactive scene plot or by calling the render() method. During rendering, the required impulse responses (IRs) and audio samples are retrieved; leveling, filtering, and convolution are then applied, and the outputs are mixed according to the scene configuration.
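Conceptually, the per-track processing and mixing described above amounts to convolving each dry signal with its IR, applying a gain, and summing the results. The sketch below illustrates that idea only; `convolve` and `mix_tracks` are illustrative helpers, not part of the library API.

```python
def convolve(x, h):
    """Direct-form convolution of two sample sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def mix_tracks(tracks):
    """tracks: list of (dry_samples, ir, gain) tuples.
    Convolves each dry track with its IR, scales it, and sums into a mixture."""
    rendered = [[gain * s for s in convolve(dry, ir)] for dry, ir, gain in tracks]
    n = max(len(r) for r in rendered)
    return [sum(r[i] if i < len(r) else 0.0 for r in rendered) for i in range(n)]

# Two toy tracks with trivial IRs and different levels:
mixture = mix_tracks([([1.0, 0.0], [1.0], 1.0), ([0.0, 1.0], [1.0], 0.5)])
```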
After rendering, you can also play back the audio directly using the playback() method, or save the resulting audio to a WAV or HDF5 file for further use.
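If you want to persist the rendered samples yourself, a WAV file can be written with the standard-library wave module. This is a minimal sketch, assuming the rendered signal exposes mono float samples in [-1, 1]; `save_wav` and the sample list are stand-ins, not library API.

```python
import struct
import wave

def save_wav(path, samples, sampling_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sampling_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wf.writeframes(frames)

# Stand-in for the rendered scene samples: one second of silence at 32 kHz.
save_wav("scene.wav", [0.0] * 32000, 32000)
```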
Render a scene
Use render() to generate the mixture signal.
Set output_separated_tracks=True if you also want per-track rendered outputs.
scene_audio_signal, track_audio_signals = my_scene.render(
    sampling_rate=32000,
    output_separated_tracks=True,
)
scene_audio_signal.plot()
Extract target signals
For ML pipelines, you often need target signals aligned with the mixture. The render_target() method supports several rendering modes:
- TargetRenderMode.DRY_MONO: mono, dry source signal (no room/device processing)
- TargetRenderMode.WET_MONO: room-convolved mono target
- WetMonoWindowed(): the source convolved with the room IR, windowed around the direct sound (mono); the window length is configurable
- TargetRenderMode.WET_DEVICE: full device-rendered target
- WetDeviceWindowed(): same as WET_DEVICE but windowed around the direct sound; the window length is configurable
Example:
# First select the relevant track index:
target_track_idx = my_scene.get_track_indices_by_tag(scene.GroupTag.TARGET)[0]
# Render target
target_audio_signal = my_scene.render_target(
    track_index=target_track_idx,
    sampling_rate=32000,
    render_mode=scene.TargetRenderMode.DRY_MONO,
)
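To make the windowed modes concrete: "windowed around the direct sound" means locating the direct-path arrival in the IR (typically its largest peak) and keeping only a fixed-length span of the wet signal from that point. The helper below is illustrative only; the library's actual windowing logic and window placement may differ.

```python
def window_around_direct_sound(wet, ir, window_len):
    """Keep window_len samples of the wet signal starting at the IR's
    direct-sound peak (the sample with the largest magnitude)."""
    direct_idx = max(range(len(ir)), key=lambda i: abs(ir[i]))
    return wet[direct_idx:direct_idx + window_len]

ir = [0.0, 0.0, 1.0, 0.4, 0.1]        # direct sound arrives at index 2
wet = [0, 1, 2, 3, 4, 5, 6, 7]        # toy wet (room-convolved) signal
windowed = window_around_direct_sound(wet, ir, 3)  # -> [2, 3, 4]
```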
Export metadata and transcripts
Each scene carries rich metadata -- source positions, receiver properties, device info, track timing, and levels. This can be exported as a JSON-serializable dictionary via to_struct() for use in downstream data pipelines.
# Metadata
scene_metadata = my_scene.to_struct()
# Transcript
scene_full_transcript = my_scene.transcript(track_index=None)
target_transcript = my_scene.transcript(track_index=target_track_idx)
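Because to_struct() returns a JSON-serializable dictionary, the metadata can be written straight to disk with the standard json module. The dictionary below is a stand-in with made-up keys, not the real to_struct() output schema.

```python
import json

# Stand-in for my_scene.to_struct(); the real call returns a
# JSON-serializable dict of scene metadata (positions, timing, levels).
scene_metadata = {
    "tracks": [{"tag": "TARGET", "start_time": 0.5, "level_db": -20.0}],
    "sampling_rate": 32000,
}

with open("scene_metadata.json", "w") as f:
    json.dump(scene_metadata, f, indent=2)

# Round-tripping through JSON preserves the structure exactly.
round_trip = json.loads(json.dumps(scene_metadata))
```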