RVC Data Prep: An Open-Source RVC Data Preparation Tool

a Dubverse Black initiative


Description

RVC Data Prep is a tool for transforming audio/video content into isolated vocals. If a video contains multiple speakers, it generates a separate file for each one. The core functionality uses Facebook's Demucs to isolate vocals and Pyannote embeddings to identify and differentiate speakers.
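To make the speaker-differentiation idea concrete, here is a minimal sketch of grouping segment embeddings by cosine similarity. It is an illustration only, not the method this repo or Pyannote uses: the function names, the greedy strategy, the 0.75 threshold, and the two-dimensional toy vectors are all assumptions (real speaker embeddings have hundreds of dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def assign_speakers(embeddings, threshold=0.75):
    """Greedily label each embedding with a speaker index.

    An embedding joins the most similar existing speaker if the similarity
    reaches `threshold` (a hypothetical value); otherwise it starts a new one.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of segments assigned to each speaker
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1)
                               for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
    return labels

# Toy embeddings for five segments: two clearly different "voices".
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
print(assign_speakers(toy_embeddings))
```

Segments that share a label would then be written to the same per-speaker file.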

Features

  1. Isolate vocals from YouTube videos
  2. Distinguish multiple speakers and provide separate files
  3. Trim silences greater than 300ms from the audio
  4. (Beta) Separate multi-singer Acapellas
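As a rough illustration of the silence-trimming feature, the sketch below drops every run of quiet samples longer than 300 ms. It is not the repo's implementation: the function name, the amplitude threshold of 0.01, and the choice to remove a long silence entirely (rather than shorten it) are assumptions.

```python
def trim_long_silences(samples, sample_rate, max_silence_ms=300, threshold=0.01):
    """Return `samples` without any quiet run longer than `max_silence_ms`.

    A sample counts as quiet when its absolute amplitude is below
    `threshold` (a hypothetical value). Shorter pauses are kept as-is.
    """
    max_run = int(sample_rate * max_silence_ms / 1000)
    kept = []
    run = []  # current run of consecutive quiet samples
    for s in samples:
        if abs(s) < threshold:
            run.append(s)
        else:
            if len(run) <= max_run:
                kept.extend(run)  # short pause: keep it
            run = []
            kept.append(s)
    if len(run) <= max_run:
        kept.extend(run)  # trailing pause, kept only if short
    return kept

# 1 kHz toy signal: a 400 ms gap (dropped) and a 100 ms gap (kept).
audio = [0.5] * 10 + [0.0] * 400 + [0.5] * 10 + [0.0] * 100 + [0.5] * 10
print(len(audio), "->", len(trim_long_silences(audio, sample_rate=1000)))
```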

Prerequisites

Before you start using this tool, ensure that you have the following installed:

How to use

Clone the repository

git clone https://github.com/dubverse-ai/rvc-data-prep.git

Change working directory, install dependencies and import the utils.py script

cd rvc-data-prep
pip install -r requirements.txt

The clean function in utils.py provides automatic processing of a given file (wav, mp3 and flac only). You need to specify different parameters depending on your needs. Parameters:

  • local (bool): Set this to True to process a local file; False to create a dataset from a YouTube link.

  • file_path (str): Either a local file path or a YouTube URL, depending on the value of local.

  • project_name (str): This will be the name of the project which the processed file will be saved under.

  • acapella_output (bool): (BETA) If this is True, the function inserts blank audio segments while separating speakers, so that the output files add up in the time domain to recreate the original file.

  • single_speaker_file (bool): If True, this will flag the file as having a single speaker.

  • token (str): The access token of your Hugging Face account. You only need this if you're working with files involving multiple speakers; otherwise you can leave it blank.
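The constraints in the parameter list can be checked before any processing starts. The helper below is a hypothetical sketch, not part of utils.py: `build_clean_kwargs`, its default values, and its error messages are assumptions. Only the six keyword names and the wav/mp3/flac restriction come from the description above.

```python
from pathlib import Path

# File types the clean function is documented to accept.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac"}

def build_clean_kwargs(local, file_path, project_name,
                       single_speaker_file=True, acapella_output=False,
                       token=""):
    """Validate the arguments and return them as a keyword dict.

    The defaults here are illustrative; the real defaults of clean
    may differ.
    """
    if local:
        extension = Path(file_path).suffix.lower()
        if extension not in SUPPORTED_EXTENSIONS:
            raise ValueError(f"unsupported file type: {extension or 'none'}")
    elif not file_path.startswith(("http://", "https://")):
        raise ValueError("expected a YouTube URL when local is False")
    if not single_speaker_file and not token:
        raise ValueError("a Hugging Face token is needed for multi-speaker files")
    return {
        "local": local,
        "file_path": file_path,
        "project_name": project_name,
        "acapella_output": acapella_output,
        "token": token,
        "single_speaker_file": single_speaker_file,
    }

# A local, single-speaker recording needs no token.
kwargs = build_clean_kwargs(True, "speech.wav", "myProject")
print(kwargs)
```

With the repository on the import path, the validated arguments could then be passed along as `clean(**kwargs)`.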

Here is an example to use the clean function:

from utils import clean

clean(local=False, 
      file_path="https://www.youtube.com/watch?v=someVideoId", 
      project_name="myProject", 
      acapella_output=True, 
      token="yourToken", 
      single_speaker_file=False)

In this example, we pass a YouTube video URL as file_path, set project_name to "myProject", and request acapella output by setting acapella_output to True. We indicate that there may be more than one speaker by setting single_speaker_file to False, and pass our Hugging Face account token as token.
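To illustrate what acapella output means for the resulting files, the sketch below pads each speaker's track with silence so that all tracks have the original length and sum back to the original signal. It assumes non-overlapping segments given as sample-index ranges; `split_with_padding` and the toy data are hypothetical and not taken from the repo.

```python
def split_with_padding(samples, segments, speakers):
    """Build one full-length track per speaker, silent outside their turns.

    `segments` is a list of (start, end, speaker) tuples with sample
    indices, assumed not to overlap.
    """
    tracks = {name: [0.0] * len(samples) for name in speakers}
    for start, end, speaker in segments:
        # Copy this speaker's turn; every other track stays silent here.
        tracks[speaker][start:end] = samples[start:end]
    return tracks

mix = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
segments = [(0, 2, "A"), (2, 5, "B"), (5, 6, "A")]
tracks = split_with_padding(mix, segments, speakers=["A", "B"])

# Adding the tracks sample by sample recreates the original signal.
rebuilt = [a + b for a, b in zip(tracks["A"], tracks["B"])]
print(tracks)
print(rebuilt == mix)
```

Because every track keeps the original timeline, the files can be layered in an editor without manual alignment, provided no segment is skipped.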

YouTube Tutorial

YOUTUBE TUTORIAL

Examples

Input Video | Separated Files
Shahrukh Khan's Speech | Vocals
Yeh Ladka Haaye Allah - Bollywood Song | Udit Narayan's Vocals, Alka Yagnik's Vocals, Chorus, Other ambiguous sounds
Perfect - Ed Sheeran Duet | Ed Sheeran's Vocals, Beyonce's Vocals

Known Issues

  • Speaker separation is unreliable when multiple people speak at the same time.
  • When acapella_output is True, some audio segments are occasionally skipped, which makes it hard to sync the outputs manually.

Contributing

We welcome contributions from anyone and everyone. Details about how to contribute, what we are looking for and how to get started can be found in our contributing guidelines.

Support

For any issues, queries, and suggestions, join our Discord server. We will be glad to help!

Future Scope

  • Add multispeaker Acapella support
  • Integrate this in the RVC workflow - base data preparation and creating AI covers
  • Improve the efficiency of speaker identification using other models like TitaNet

About Us

We, at Dubverse.ai, are a dedicated and passionate group of developers who have been working for over three years on generative AI with a specific emphasis on audio. We deeply believe in the potential of AI to revolutionize the fields of video, voiceover, podcasts and other media-related applications.

Our passion and dedication don't stop at development. We believe in sharing knowledge and nurturing a community of like-minded enthusiasts. That's why we maintain a deep tech blog where we talk about our latest research, development, trends in the field, and insights about generative AI and audio technologies.

Check out some of our RVC blog posts:

  1. Evals are all we need
  2. Running RVC Models on the Easy GUI

We are always open to hearing from others who share our passion. Whether you're an expert in the field, a hobbyist, or just someone intrigued by AI and audio, feel free to reach out and connect with us.

License

RVC Data Prep is licensed under the MIT License. See the LICENSE file for details.

Disclaimer: This repo is not affiliated with YouTube, Facebook AI Research, or Pyannote. All trademarks referred to are the property of their respective owners.

Acknowledgements

  1. Facebook Demucs, Pyannote Audio, Librosa, FFmpeg, and other audio-related libraries.
  2. The Dubverse Black Discord and the AI Hub Discord for quick and actionable feedback.

We value your feedback and encourage you to provide us with any suggestions or issues that you may encounter. Let's make this tool better together!
