This guide covers all data transformation options in DeepFense, including padding, cropping, resampling, and augmentations.
DeepFense applies transforms to audio data in two stages:
- Base Transforms: Always applied (train/val/test) - preprocessing like padding, resampling
- Augmentations: Only during training (probabilistic) - data augmentation like noise, RIR, etc.
Base transforms are deterministic preprocessing steps applied to all data.
Purpose: Load audio files from disk and perform initial preprocessing
Parameters:
- `target_sr` (int, default: 16000): Target sample rate (audio is resampled if needed)
- `mono` (bool, default: True): Convert to mono (averages channels if multi-channel)
Example:

    data:
      train:
        base_transform:
          - type: "load_audio"
            args:
              target_sr: 16000
              mono: True

How to Check: Audio is automatically resampled if the file's sample rate differs from `target_sr`. Set `mono: False` to keep stereo.
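
The mono conversion described above (averaging channels) can be illustrated with a minimal sketch. It uses plain Python lists so it runs without dependencies; `to_mono` and `needs_resample` are hypothetical helper names for illustration, not DeepFense functions.

```python
def to_mono(channels):
    """Average equal-length channels sample by sample into one mono channel."""
    if not channels:
        raise ValueError("expected at least one channel")
    n_channels = len(channels)
    return [sum(frame) / n_channels for frame in zip(*channels)]


def needs_resample(file_sr, target_sr=16000):
    """Resampling is only needed when the file's rate differs from the target."""
    return file_sr != target_sr


stereo = [[1.0, 0.0], [0.0, 1.0]]   # two channels, two samples each
print(to_mono(stereo))              # [0.5, 0.5]
print(needs_resample(44100))        # True
```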
Purpose: Ensure all audio has the same length for batching
Parameters:
- `max_len` (int, required): Target length in samples
  - Example: `160000` = 10 seconds at 16kHz
  - Example: `64000` = 4 seconds at 16kHz
- `random_pad` (bool, default: False):
  - `False`: Crop from start if audio > `max_len`
  - `True`: Randomly crop (random start position) if audio > `max_len`
- `pad_type` (str, default: "repeat"):
  - `"repeat"`: Repeat the waveform to fill length if audio < `max_len`
  - Other types: Currently only "repeat" is supported
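
To make the interaction of these three parameters concrete, here is a minimal sketch of the behaviour described above: crop when the audio is longer than `max_len`, repeat when it is shorter. It works on plain lists, and `pad_or_crop` is a hypothetical name; the actual DeepFense implementation may differ in details such as how the random offset is drawn.

```python
import random


def pad_or_crop(wave, max_len, random_pad=False, rng=None):
    """Return exactly max_len samples: crop long input, repeat short input."""
    if not wave:
        raise ValueError("cannot pad an empty waveform")
    rng = rng if rng is not None else random.Random()
    n = len(wave)
    if n >= max_len:
        # Long input: crop from a random offset, or from the start.
        start = rng.randint(0, n - max_len) if random_pad else 0
        return wave[start:start + max_len]
    # Short input: tile the waveform, then trim to max_len ("repeat").
    repeats = -(-max_len // n)  # ceiling division
    return (wave * repeats)[:max_len]


print(pad_or_crop([1, 2, 3], 7))        # [1, 2, 3, 1, 2, 3, 1]
print(pad_or_crop([1, 2, 3, 4, 5], 3))  # [1, 2, 3]
```

With `random_pad=True` the output is still a contiguous window of `max_len` samples; only its start position varies between calls.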
Example:

    data:
      train:
        base_transform:
          - type: "pad"
            args:
              max_len: 160000     # 10 seconds at 16kHz
              random_pad: True    # Random crop if longer
              pad_type: "repeat"  # Repeat if shorter

Common Lengths:
- `160000` samples = 10 seconds @ 16kHz
- `64000` samples = 4 seconds @ 16kHz
- `32000` samples = 2 seconds @ 16kHz
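
These values are simply duration times sample rate. A small helper (hypothetical, for illustration only) makes it easy to compute `max_len` for other durations or sample rates:

```python
def seconds_to_samples(seconds, sample_rate=16000):
    """Convert a duration in seconds to a length in samples."""
    return int(round(seconds * sample_rate))


for seconds in (2, 4, 10, 15):
    print(seconds, "s ->", seconds_to_samples(seconds), "samples")
```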
How to Change:

    # For longer audio (e.g., 15 seconds)
    max_len: 240000  # 15 * 16000

    # For shorter audio (e.g., 5 seconds)
    max_len: 80000   # 5 * 16000

    # To always crop from start (no randomness)
    random_pad: False

Purpose: Randomly crop audio to fixed length (alternative to `pad` with `random_pad: True`)
Parameters:
output_size(int, required): Target length in samples
Example:

    data:
      train:
        base_transform:
          - type: "RandomCrop"
            args:
              output_size: 160000  # 10 seconds at 16kHz

Note: `RandomCrop` is similar to `pad` with `random_pad: True`, but it only truncates: audio shorter than `output_size` is not padded.
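
The difference from `pad` can be sketched as follows. This is an illustration under one assumption: that input shorter than `output_size` is returned unchanged. `random_crop` is a hypothetical stand-in, not the DeepFense class.

```python
import random


def random_crop(wave, output_size, rng=None):
    """Crop a random window of output_size samples; never pad."""
    rng = rng if rng is not None else random.Random()
    if len(wave) <= output_size:
        # Assumption: short input passes through unpadded.
        return list(wave)
    start = rng.randint(0, len(wave) - output_size)
    return wave[start:start + output_size]


print(len(random_crop(list(range(10)), 4)))  # 4
print(random_crop([1, 2], 4))                # [1, 2]
```

Under that assumption, clips shorter than `output_size` keep their original length, so a batch can still contain mixed lengths unless a `pad` step also runs.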
Augmentations are probabilistic transforms applied only during training to improve robustness.
Purpose: Advanced audio augmentation suite (noise, filtering, etc.)
Parameters:
- `noise_ratio` (float, 0.0-1.0, default: 1.0): Probability of applying augmentation
- `algo` (int, default: 5): Algorithm variant (1-5)
- `*` (various): Additional RawBoost-specific parameters
Example:

    data:
      train:
        augment_transform:
          - type: "rawboost"
            args:
              noise_ratio: 0.5  # Apply 50% of the time
              algo: 5

Purpose: Simulate room acoustics using impulse responses
Parameters:
- `noise_ratio` (float, 0.0-1.0): Probability of applying
- `csv_file` (str, required): Path to CSV file with RIR paths
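
Applying a room impulse response conventionally means convolving the waveform with it. The sketch below shows that operation on plain lists, truncated to the input length. It assumes DeepFense follows this standard approach; `apply_rir` is a hypothetical name, and loading RIRs from the CSV file is not shown.

```python
def apply_rir(wave, rir):
    """Convolve wave with rir and keep the first len(wave) samples."""
    out = [0.0] * len(wave)
    for i, h in enumerate(rir):
        if h == 0.0:
            continue
        # Each RIR tap adds a delayed, scaled copy of the input.
        for j in range(len(wave) - i):
            out[i + j] += h * wave[j]
    return out


# A click followed by a half-amplitude echo two samples later.
print(apply_rir([1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.5]))  # [1.0, 0.0, 0.5, 0.0]
```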
Example:

    data:
      train:
        augment_transform:
          - type: "rir"
            args:
              noise_ratio: 0.3
              csv_file: "/path/to/rir_files.csv"

Purpose: Simulate audio codec compression artifacts
Parameters:
- `noise_ratio` (float, 0.0-1.0): Probability of applying
- `*` (various): Codec-specific parameters
Example:

    data:
      train:
        augment_transform:
          - type: "codec"
            args:
              noise_ratio: 0.2

Purpose: Add Gaussian noise to audio
Parameters:
- `noise_ratio` (float, 0.0-1.0): Probability of applying
- `snr_range` (list, default: [5, 15]): Signal-to-noise ratio range [min, max] in dB
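
To show what an SNR in dB means for the added noise, here is a minimal sketch that scales Gaussian noise so the result hits a chosen SNR exactly. Drawing the SNR uniformly from `snr_range` is an assumption about DeepFense's behaviour, and `add_gaussian_noise` is a hypothetical name.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)


def add_gaussian_noise(wave, snr_range=(5, 15), rng=None):
    """Add Gaussian noise at an SNR (dB) drawn uniformly from snr_range."""
    rng = rng if rng is not None else random.Random()
    snr_db = rng.uniform(*snr_range)
    noise = [rng.gauss(0.0, 1.0) for _ in wave]
    # Scale noise so that 10*log10(P_signal / P_noise) equals snr_db.
    target_noise_power = power(wave) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    noisy = [s + scale * n for s, n in zip(wave, noise)]
    return noisy, snr_db


# One second of a 440 Hz tone at 16 kHz.
wave = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy, snr_db = add_gaussian_noise(wave, (10, 20), random.Random(0))
residual = [n - s for n, s in zip(noisy, wave)]
measured = 10 * math.log10(power(wave) / power(residual))
print(round(snr_db, 2), round(measured, 2))  # the two values agree
```

Each 10 dB drop in SNR multiplies the noise power by 10, which is why lowering `snr_range` makes the augmentation noticeably harsher.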
Example:

    data:
      train:
        augment_transform:
          - type: "add_noise"  # or "AdditiveNoise"
            args:
              noise_ratio: 0.3
              snr_range: [10, 20]  # SNR between 10-20 dB

Purpose: Vary playback speed (time stretching)
Parameters:
- `noise_ratio` (float, 0.0-1.0): Probability of applying
- `speed_range` (list): Speed variation range [min, max]
  - Example: `[0.9, 1.1]` = 90% to 110% speed
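
One simple way to change playback speed is to resample the waveform by linear interpolation, as sketched below. This is an illustration only: `change_speed` is a hypothetical name, this method also shifts pitch, and the source does not say which method DeepFense uses.

```python
def change_speed(wave, speed):
    """Resample by linear interpolation; speed > 1 shortens the clip."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not wave:
        return []
    out_len = max(1, int(round(len(wave) / speed)))
    out = []
    for i in range(out_len):
        pos = i * speed                    # fractional read position in the input
        lo = min(int(pos), len(wave) - 1)
        hi = min(lo + 1, len(wave) - 1)
        frac = pos - int(pos)
        out.append((1 - frac) * wave[lo] + frac * wave[hi])
    return out


ramp = [0.0, 1.0, 2.0, 3.0]
print(change_speed(ramp, 2.0))  # [0.0, 2.0]
print(change_speed(ramp, 0.5))  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.0]
```

Note that the output length is roughly `len(wave) / speed`, so a speed change applied after padding alters the clip length.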
Example:

    data:
      train:
        augment_transform:
          - type: "speed_perturb"
            args:
              noise_ratio: 0.5
              speed_range: [0.95, 1.05]

Other augmentations:
- `morph`: Audio morphing
- `add_babble`: Add babble noise
- `drop_freq`: Frequency dropout
- `drop_chunk`: Time dropout
- `do_clip`: Clipping augmentation

See Augmentations Documentation for complete list.

Complete configuration example:

    data:
      sampling_rate: 16000  # Global sample rate
      train:
        dataset_type: "DetectionDataset"
        parquet_files: ["/path/to/train.parquet"]
        # Base transforms (always applied)
        base_transform:
          - type: "load_audio"
            args:
              target_sr: 16000
              mono: True
          - type: "pad"
            args:
              max_len: 160000     # 10 seconds
              random_pad: True    # Random crop if longer
              pad_type: "repeat"  # Repeat if shorter
        # Augmentations (probabilistic, training only)
        augment_transform:
          - type: "rawboost"
            args:
              noise_ratio: 0.5
              algo: 5
          - type: "rir"
            args:
              noise_ratio: 0.3
              csv_file: "/path/to/rir.csv"
          - type: "add_noise"
            args:
              noise_ratio: 0.2
              snr_range: [5, 15]
          - type: "speed_perturb"
            args:
              noise_ratio: 0.3
              speed_range: [0.9, 1.1]
      val:
        # Validation: only base transforms, no augmentations
        base_transform:
          - type: "load_audio"
            args:
              target_sr: 16000
              mono: True
          - type: "pad"
            args:
              max_len: 160000
              random_pad: False   # No random crop for validation
              pad_type: "repeat"
        augment_transform: []     # No augmentations

To check which transforms are configured:

    # View your config file
    cat config/train.yaml | grep -A 20 "base_transform\|augment_transform"

After training, check the saved config:

    cat outputs/your_experiment/config.yaml | grep -A 20 "base_transform\|augment_transform"

Or inspect the config from Python:

    import yaml

    with open("config/train.yaml", "r") as f:
        config = yaml.safe_load(f)

    # Check base transforms
    print("Base Transforms:")
    for transform in config["data"]["train"]["base_transform"]:
        print(f"  - {transform['type']}: {transform.get('args', {})}")

    # Check augmentations
    print("\nAugmentations:")
    for aug in config["data"]["train"].get("augment_transform", []):
        print(f"  - {aug['type']}: {aug.get('args', {})}")

Fixed-length padding (10 seconds):

    base_transform:
      - type: "pad"
        args:
          max_len: 160000     # 10 seconds @ 16kHz
          random_pad: True    # Random crop if > 10s
          pad_type: "repeat"  # Repeat if < 10s

For variable-length batches, you can skip padding and handle it in the collate function (advanced):

    base_transform:
      - type: "load_audio"
        args:
          target_sr: 16000
          mono: True
      # No pad transform - handled in DataLoader

Short clips (4 seconds, always cropped from the start):

    base_transform:
      - type: "pad"
        args:
          max_len: 64000      # 4 seconds @ 16kHz
          random_pad: False   # Always crop from start
          pad_type: "repeat"

Long clips (15 seconds):

    base_transform:
      - type: "pad"
        args:
          max_len: 240000     # 15 seconds @ 16kHz
          random_pad: True
          pad_type: "repeat"

Heavier augmentation:

    augment_transform:
      - type: "rawboost"
        args:
          noise_ratio: 0.8    # Apply 80% of the time
      - type: "rir"
        args:
          noise_ratio: 0.6
      - type: "add_noise"
        args:
          noise_ratio: 0.5
          snr_range: [0, 10]  # Lower SNR = more noise
      - type: "speed_perturb"
        args:
          noise_ratio: 0.4
          speed_range: [0.85, 1.15]  # Wider range

Lighter augmentation:

    augment_transform:
      - type: "add_noise"
        args:
          noise_ratio: 0.2     # Apply 20% of the time
          snr_range: [15, 25]  # Higher SNR = less noise

Transforms are applied in the order specified:
- Base transforms are applied first (in order)
- Augmentations are applied after base transforms (in order)
- Each augmentation is applied independently with its `noise_ratio` probability
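
The ordering rules above can be sketched as a small pipeline function. This is a simplified illustration rather than DeepFense's actual code: `apply_transforms` is a hypothetical name, transforms are plain callables, and each augmentation is given as a `(probability, callable)` pair.

```python
import random


def apply_transforms(sample, base_transforms, augmentations, training, rng=None):
    """Run base transforms in order, then each augmentation with its own probability."""
    rng = rng if rng is not None else random.Random()
    for transform in base_transforms:
        sample = transform(sample)
    if training:
        for prob, augment in augmentations:
            # One independent random draw per augmentation.
            if rng.random() < prob:
                sample = augment(sample)
    return sample


def halve(x):
    """Toy base transform (stand-in for load/pad)."""
    return [v * 0.5 for v in x]


def negate(x):
    """Toy augmentation (stand-in for noise, RIR, etc.)."""
    return [-v for v in x]


train_out = apply_transforms([1.0, 2.0], [halve], [(1.0, negate)], training=True)
val_out = apply_transforms([1.0, 2.0], [halve], [(1.0, negate)], training=False)
print(train_out)  # [-0.5, -1.0]
print(val_out)    # [0.5, 1.0]
```

Because each draw is independent, a single training sample may receive several augmentations, one, or none.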
Example:

    base_transform:
      - type: "load_audio"  # 1. Load audio
      - type: "pad"         # 2. Pad/crop
    augment_transform:
      - type: "rawboost"    # 3. Apply RawBoost (50% chance)
      - type: "add_noise"   # 4. Apply noise (30% chance, independent)

Problem: "RuntimeError: Expected input batch_size (X) to match target batch_size (Y)"
Solution: Ensure all audio is padded to the same length:

    base_transform:
      - type: "pad"
        args:
          max_len: 160000  # Must match your target length

Problem: GPU out of memory during training
Solutions:
- Reduce `max_len` (shorter audio): `max_len: 80000  # 5 seconds instead of 10`
- Reduce `batch_size`
- Reduce the number of augmentations
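
As a rough guide to why the first two options help, the size of the raw waveform batch grows linearly with both `max_len` and `batch_size`. The sketch below counts only the input tensor, assuming 4-byte float32 samples; model activations usually dominate GPU memory, so treat it as a lower bound. `waveform_batch_bytes` is a hypothetical helper.

```python
def waveform_batch_bytes(batch_size, max_len, bytes_per_sample=4):
    """Bytes needed for one batch of mono waveforms (float32 by default)."""
    return batch_size * max_len * bytes_per_sample


print(waveform_batch_bytes(32, 160000))  # 20480000
print(waveform_batch_bytes(32, 80000))   # 10240000
```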
Problem: Augmentations seem to have no effect
Check:
- Verify `noise_ratio > 0.0`
- Ensure augmentations are in the `train` section, not `val`
- Check that the transform is registered: `deepfense list --component-type transforms`
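
The first two checks can be automated on a parsed config. The sketch below takes the dictionary you would get from `yaml.safe_load` and returns a list of warnings; `find_augmentation_issues` is a hypothetical helper, not part of DeepFense, and it assumes the config layout shown in this guide.

```python
def find_augmentation_issues(config):
    """Return warnings about augmentations that cannot take effect."""
    issues = []
    data = config.get("data", {})
    train_augs = data.get("train", {}).get("augment_transform") or []
    val_augs = data.get("val", {}).get("augment_transform") or []
    if not train_augs:
        issues.append("train: no augmentations configured")
    for aug in train_augs:
        ratio = aug.get("args", {}).get("noise_ratio")
        if ratio is not None and ratio <= 0.0:
            issues.append(f"train: {aug['type']} has noise_ratio={ratio}, so it never fires")
    if val_augs:
        issues.append("val: augment_transform is set; augmentations belong under train")
    return issues


config = {
    "data": {
        "train": {"augment_transform": [{"type": "add_noise", "args": {"noise_ratio": 0.0}}]},
        "val": {"augment_transform": [{"type": "rawboost", "args": {"noise_ratio": 0.5}}]},
    }
}
for issue in find_augmentation_issues(config):
    print(issue)
```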
Problem: Too much augmentation causing poor training
Solution: Reduce augmentation probabilities:

    augment_transform:
      - type: "rawboost"
        args:
          noise_ratio: 0.3     # Reduce from 0.5
      - type: "add_noise"
        args:
          noise_ratio: 0.1     # Reduce from 0.3
          snr_range: [15, 25]  # Increase SNR (less noise)

| Transform | Type | Purpose | Key Parameters | When Applied |
|---|---|---|---|---|
| `load_audio` | Base | Load & resample | `target_sr`, `mono` | Always |
| `pad` | Base | Pad/crop to fixed length | `max_len`, `random_pad`, `pad_type` | Always |
| `RandomCrop` | Base | Random crop | `output_size` | Always |
| `rawboost` | Aug | Advanced augmentation | `noise_ratio`, `algo` | Training (probabilistic) |
| `rir` | Aug | Room simulation | `noise_ratio`, `csv_file` | Training (probabilistic) |
| `codec` | Aug | Codec simulation | `noise_ratio` | Training (probabilistic) |
| `add_noise` | Aug | Add noise | `noise_ratio`, `snr_range` | Training (probabilistic) |
| `speed_perturb` | Aug | Speed variation | `noise_ratio`, `speed_range` | Training (probabilistic) |

- See Configuration Reference for all parameters
- See Augmentations Documentation for complete augmentation list
- See Adding Augmentations to create custom transforms
- See Data Preparation in README for parquet format