This project demonstrates the implementation of a Generative Adversarial Network (GAN) for generating audio samples. It uses PyTorch to train a generator and a discriminator capable of producing and evaluating audio data, respectively.
I wanted to generate novel drum samples for music production. Synths often include a randomization feature for obtaining unexpected results, and using deep learning to randomly generate new sounds can be similarly inspiring and useful for music creation.
To leverage GPU acceleration for faster training times, follow the instructions provided in this detailed guide.
- **Create a Conda Environment**

PyTorch requires Python version 3.7 or above. Create a new conda environment named `simple-gan` with Python 3.8:

```bash
conda create -n simple-gan python=3.8
```
- **Activate the Conda Environment**

Activate the newly created environment:

```bash
conda activate simple-gan
```

- **Install PyTorch**
Visit the PyTorch website to get the installation command tailored to your platform, package manager, and CUDA version. The commands for conda and pip installations typically look as follows:
- Conda:

```bash
conda install pytorch torchvision torchaudio pytorch-cuda=[CUDA_VERSION] -c pytorch -c nvidia
```

- Pip:

```bash
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu[CUDA_VERSION]
```

Replace `[CUDA_VERSION]` with the version compatible with your GPU drivers. Use `nvcc --version` to check the CUDA version installed on your system.
Note: `nvidia-smi` shows the GPU driver version, which is different from the CUDA runtime version. It's essential to install the PyTorch build corresponding to the CUDA runtime version for compatibility.
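Once PyTorch is installed, you can confirm which CUDA runtime the build was compiled against and compare it with the output of `nvcc --version`:

```python
import torch

# CUDA runtime version this PyTorch build was compiled against,
# e.g. "12.1"; None for CPU-only builds.
print(torch.version.cuda)
```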
- **Install Other Dependencies**

```bash
conda install -c conda-forge librosa
```

To check if PyTorch can access your GPU:
```python
import torch
print(torch.cuda.is_available())
```

This should return `True` if a GPU is detected.
If you encounter issues with GPU detection:
```python
print(torch.zeros(1).cuda())
```

This will attempt to create a tensor on the GPU and may provide useful error messages for troubleshooting.
If you need to start over:
```bash
conda activate base
conda remove -n simple-gan --all
```

After successful setup, you can use the following commands to get information about the GPUs:
```python
print(torch.cuda.current_device())   # The ID of the current GPU.
print(torch.cuda.get_device_name(0)) # The name of the first GPU.
print(torch.cuda.device_count())     # The number of GPUs available.
```

This project includes Python scripts and modules for training the GAN, along with a dataset module for loading and processing audio files. Here's a brief overview:
- `train.py`: Main script for training the GAN.
- `modules/`: Contains the Python modules for the generator (`generator.py`), discriminator (`discriminator.py`), and audio dataset (`AudioDataset.py`).
To train the GAN, ensure you are in the project's root directory and activate the `simple-gan` environment. Then run:
```bash
python train.py
```

Details about the generator and discriminator architectures are provided in the `modules/` directory.
- Generator: Uses transposed convolutional layers to generate audio samples from noise.
- Discriminator: Consists of convolutional layers to classify audio samples as real or fake.
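As a rough illustration of this architecture, here is a minimal sketch with assumed layer sizes and latent dimension, not the exact models in `modules/`:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Upsamples a latent noise vector into a 1-D waveform via transposed convolutions."""
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # (batch, latent_dim, 1) -> progressively longer signals
            nn.ConvTranspose1d(latent_dim, 128, 25, stride=4, padding=11, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(128, 64, 25, stride=4, padding=11, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(64, 1, 25, stride=4, padding=11, output_padding=1),
            nn.Tanh(),  # audio samples constrained to [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Downsamples a waveform with strided convolutions and scores it as real or fake."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 64, 25, stride=4, padding=11),
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, 25, stride=4, padding=11),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1),  # collapse time axis to a fixed size
            nn.Flatten(),
            nn.Linear(128, 1),
            nn.Sigmoid(),  # probability that the input is real
        )

    def forward(self, x):
        return self.net(x)

z = torch.randn(2, 100, 1)     # batch of 2 latent vectors
fake = Generator()(z)          # -> (2, 1, 64) waveforms
score = Discriminator()(fake)  # -> (2, 1) probabilities in (0, 1)
```

With these strides each transposed convolution roughly quadruples the signal length; a real model would stack more layers to reach audible clip lengths.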
The `AudioDataset` class in `AudioDataset.py` handles loading and preprocessing of audio files for training. I used the training set downloadable from FSDKaggle2018 for my experiments. However, you can experiment with any audio data you want, as long as it is organized as `/data/train/*.wav` so that the `AudioDataset` class can locate the files.
For background information on GANs and their applications in audio generation, this article provides a comprehensive introduction.
If you'd like to contribute, please fork the repository and open a pull request to the main branch.