{ "info": { "author": "Antoine CAILLON", "author_email": "caillon@ircam.fr", "bugtrack_url": null, "classifiers": [ "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3" ], "description": "# MSPrior\n\n### A multi(scale/stream) prior model for realtime temporal learning\n\n## Disclaimer\n\nThis is an experimental project that *will* be subject to lots of changes.\n\n## Installation\n\n```bash\npip install acids-msprior\n```\n\n## Usage\n\nMSPrior assumes you have\n\n1. A pretrained RAVE model exported **without streaming** as a torchscript `.ts` file\n2. The dataset on which RAVE has been trained (a folder of audio files).\n\n### 1. Preprocessing\n\nMSPrior operates on the latent representation yielded by RAVE. Therefore, we start by encoding the entirety of the audio dataset into a latent dataset. \n\n```bash\nmsprior preprocess --audio /path/to/audio/folder --out_path /path/to/output/folder --rave /path/to/pretrained/rave.ts\n```\n\n### 2. Training\n\nMSPrior has several possible configurations. The default is a ALiBi-Transformer with a skip prediction backend, which can run in realtime on powerful computers (e.g. Apple M1-2 chips, GPU enabled Linux stations). A less demanding configuration is a large GRU. Both configurations can launched be using\n\n```bash\nmsprior train --config configuration --db_path /path/to/preprocessed/dataset --name training_name --pretrained_embedding /path/to/pretrained/rave.ts\n```\n\nHere are the different configurations available\n\n\n
Name | Description\n
---|---\n
decoder_only | Unconditional autoregressive model, relying solely on previous samples to produce a prediction.\n
recurrent | Same as `decoder_only`, but uses a Gated Recurrent Unit instead of a Transformer; suitable for small datasets and lower computational requirements.\n
encoder_decoder | Encoder / decoder autoregressive mode, where the generation process is conditioned by an external input (aka seq2seq).\n
encoder_decoder_continuous | Same as `encoder_decoder`, but conditioned on continuous features instead of a discrete token sequence.\n
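\nFor example, assuming the dataset has already been preprocessed as above, the lighter GRU-based configuration from the table can be selected by name; the dataset path, training name, and RAVE path below are the same placeholders used earlier and should be replaced with your own values:\n
\n
```bash\n
# minimal sketch: launch training with the recurrent (GRU) configuration instead of the default Transformer\n
msprior train --config recurrent --db_path /path/to/preprocessed/dataset --name training_name --pretrained_embedding /path/to/pretrained/rave.ts\n
```\n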