{ "info": { "author": "Antoine CAILLON", "author_email": "caillon@ircam.fr", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "\n\n# RAVE: Realtime Audio Variational autoEncoder\n\nOfficial implementation of _RAVE: A variational autoencoder for fast and high-quality neural audio synthesis_ ([article link](https://arxiv.org/abs/2111.05011)) by Antoine Caillon and Philippe Esling.\n\nIf you use RAVE as a part of a music performance or installation, be sure to cite either this repository or the article !\n\nIf you want to share / discuss / ask things about RAVE you can do so in our [discord server](https://discord.gg/dhX73sPTBb) !\n\n## Previous versions\n\nThe original implementation of the RAVE model can be restored using\n\n```bash\ngit checkout v1\n```\n\n## Installation\n\nInstall RAVE using\n\n```bash\npip install acids-rave\n```\n\nYou will need **ffmpeg** on your computer. You can install it locally inside your virtual environment using\n\n```bash\nconda install ffmpeg\n```\n\n\n\n## Colab\n\nA colab to train RAVEv2 is now available thanks to [hexorcismos](https://github.com/moiseshorta) !\n[](https://colab.research.google.com/drive/1ih-gv1iHEZNuGhHPvCHrleLNXvooQMvI?usp=sharing)\n\n## Usage\n\nTraining a RAVE model usually involves 3 separate steps, namely _dataset preparation_, _training_ and _export_.\n\n### Dataset preparation\n\nYou can know prepare a dataset using two methods: regular and lazy. Lazy preprocessing allows RAVE to be trained directly on the raw files (i.e. mp3, ogg), without converting them first. **Warning**: lazy dataset loading will increase your CPU load by a large margin during training, especially on Windows. This can however be useful when training on large audio corpus which would not fit on a hard drive when uncompressed. In any case, prepare your dataset using\n\n```bash\nrave preprocess --input_path /audio/folder --output_path /dataset/path (--lazy)\n```\n\n### Training\n\nRAVEv2 has many different configurations. The improved version of the v1 is called `v2`, and can therefore be trained with\n\n```bash\nrave train --config v2 --db_path /dataset/path --out_path /model/out --name give_a_name\n```\n\nWe also provide a discrete configuration, similar to SoundStream or EnCodec\n\n```bash\nrave train --config discrete ...\n```\n\nBy default, RAVE is built with non-causal convolutions. If you want to make the model causal (hence lowering the overall latency of the model), you can use the causal mode\n\n```bash\nrave train --config discrete --config causal ...\n```\n\nNew in 2.3, data augmentations are also available to improve the model's generalization in low data regimes. You can add data augmentation by adding augmentation configuration files with the `--augment` keyword\n\n```bash\nrave train --config v2 --augment mute --augment compress\n```\n\nMany other configuration files are available in `rave/configs` and can be combined. Here is a list of all the available configurations & augmentations :\n\n
Type | \nName | \nDescription | \n
|---|---|---|
| Architecture | v1 | Original continuous model |
| | v2 | Improved continuous model (faster, higher quality) |
| | v2_small | v2 with a smaller receptive field, adapted adversarial training and a noise generator; adapted for timbre transfer on stationary signals |
| | v2_nopqmf | (experimental) v2 without pqmf in the generator (more efficient for bending purposes) |
| | v3 | v2 with Snake activation, Descript discriminator and Adaptive Instance Normalization for real style transfer |
| | discrete | Discrete model (similar to SoundStream or EnCodec) |
| | onnx | Noiseless v1 configuration for ONNX usage |
| | raspberry | Lightweight configuration compatible with realtime Raspberry Pi 4 inference |
| Regularization (v2 only) | default | Variational Auto Encoder objective (ELBO) |
| | wasserstein | Wasserstein Auto Encoder objective (MMD) |
| | spherical | Spherical Auto Encoder objective |
| Discriminator | spectral_discriminator | Uses the MultiScale discriminator from EnCodec |
| Others | causal | Uses causal convolutions |
| | noise | Enables the V2 noise synthesizer |
| | hybrid | Enables mel-spectrogram input |
| Augmentations | mute | Randomly mutes data batches (default prob: 0.1); forces the model to learn silence |
| | compress | Randomly compresses the waveform (equivalent to light non-linear amplification of batches) |
| | gain | Applies a random gain to the waveform (default range: [-6, 3]) |
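Since `--config` and `--augment` can each be passed several times, entries from this table compose into a single training command. A hedged example, where the dataset path, output path and run name are placeholders:

```bash
# Train a causal v2 model with the mute and compress augmentations.
# /dataset/path, /model/out and my_run are placeholders.
rave train --config v2 --config causal \
    --augment mute --augment compress \
    --db_path /dataset/path --out_path /model/out --name my_run
```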
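The third step mentioned above, _export_, turns a trained run into a file usable for inference. A minimal sketch of this step, assuming the `rave export` subcommand with a `--run` flag pointing at the training output (the run directory below is a placeholder):

```bash
# Hedged sketch: export a trained model for inference.
# /model/out/give_a_name is a placeholder run directory.
rave export --run /model/out/give_a_name
```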