Facebook open-sources VoiceLoop, a method for synthesizing speech in multiple speakers' voices

jopen · 2 years ago
   <p style="text-align:center"><img alt="Facebook open-sources VoiceLoop, synthesizing new voices from in-the-wild speech" src="https://simg.open-open.com/show/181c60b9fd639296266ac248df40e051.jpg" /></p>    <p><a href="/misc/goto?guid=4959010596809650528">A PyTorch implementation of the method described in "Voice Synthesis for in-the-Wild Speakers via a Phonological Loop".</a></p>    <p style="text-align:center"><img alt="Facebook open-sources VoiceLoop, synthesizing new voices from in-the-wild speech" src="https://simg.open-open.com/show/054f8903fdc37c0b4913bb545cbb3649.jpg" /></p>    <p>VoiceLoop is a neural text-to-speech (TTS) system that can convert text to speech in voices sampled in the wild. Some demo samples can be found <a href="/misc/goto?guid=4959010596904752839">here</a>.</p>    <h2>Quick Links</h2>    <ul>     <li><a href="/misc/goto?guid=4959010596904752839">Demo Samples</a></li>     <li><a href="/misc/goto?guid=4959010597005332521">Quick Start</a></li>     <li><a href="/misc/goto?guid=4959010597097717936">Setup</a></li>     <li><a href="/misc/goto?guid=4959010597186326568">Training</a></li>    </ul>    <h2>Quick Start</h2>    <p>Follow the instructions in Setup, then simply run:</p>    <pre>  python generate.py --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 13 --checkpoint models/vctk/bestmodel.pth</pre>    <p>The results will be placed in models/vctk/results. Two samples will be generated:</p>    <ul>     <li>The <a href="/misc/goto?guid=4959010597284644995">generated sample</a> will be saved with the gen_10.wav extension.</li>     <li>Its <a href="/misc/goto?guid=4959010597370135642">ground-truth (test) sample</a> is also generated and is saved with the orig.wav extension.</li>    </ul>    <p>You can also generate the same text with a different speaker, specifically:</p>    <pre>  python generate.py --npz data/vctk/numpy_features_valid/p318_212.npz --spkr 18 --checkpoint models/vctk/bestmodel.pth</pre>    <p>This will generate the following <a href="/misc/goto?guid=4959010597467728044">sample</a>.</p>    <p>Here is the corresponding attention plot:</p>    <h2>Setup</h2>    <p>Requirements: Linux/OSX, Python 2.7, and <a href="/misc/goto?guid=4959010597544472141">PyTorch 0.1.12</a>. The current version of the code requires CUDA support for training. 
Generation can be done on the CPU.</p>    <pre>  git clone https://github.com/facebookresearch/loop.git
  cd loop
  pip install -r scripts/requirements.txt</pre>    <h3>Data</h3>    <p>The data used to train the models in the paper can be downloaded via:</p>    <pre>  bash scripts/download_data.sh</pre>    <p>The script downloads and preprocesses a subset of <a href="/misc/goto?guid=4959010597633912820">VCTK</a>. This subset contains speakers with an American accent.</p>    <p>The dataset was preprocessed using <a href="/misc/goto?guid=4959010597722901893">Merlin</a> - from each audio clip we extracted vocoder features using the <a href="/misc/goto?guid=4959010597807645675">WORLD</a> vocoder. After downloading, the dataset will be located under subfolder <code>data</code> as follows:</p>    <pre>  <code>loop
  ├── data
      └── vctk
          ├── norm_info
          │   ├── norm.dat
          ├── numpy_features
          │   ├── p294_001.npz
          │   ├── p294_002.npz
          │   └── ...
          └── numpy_features_valid
  </code></pre>    <p>The preprocessing pipeline can be executed using the following script by Kyle Kastner: <a href="/misc/goto?guid=4959010597899575985">https://gist.github.com/kastnerkyle/cc0ac48d34860c5bb3f9112f4d9a0300</a>.</p>    <h3>Pretrained Models</h3>    <p>Pretrained models can be downloaded via:</p>    <pre>  bash scripts/download_models.sh</pre>    <p>After downloading, the models will be located under subfolder <code>models</code> as follows:</p>    <pre>  <code>loop
  ├── data
  ├── models
      ├── vctk
      │   ├── args.pth
      │   └── bestmodel.pth
      └── vctk_alt
  </code></pre>    <h3>SPTK and WORLD</h3>    <p>Finally, speech generation requires the <a href="/misc/goto?guid=4959010597985636440">SPTK 3.9</a> and <a href="/misc/goto?guid=4959010597807645675">WORLD</a> vocoder tools, as done in Merlin. 
To download the executables:</p>    <pre>  bash scripts/download_tools.sh</pre>    <p>This results in the following subdirectories:</p>    <pre>  <code>loop
  ├── data
  ├── models
  ├── tools
      ├── SPTK-3.9
      └── WORLD
  </code></pre>    <h2>Training</h2>    <p>To train a new model on VCTK, first train the model using a noise level of 4 and an input sequence length of 100:</p>    <pre>  python train.py --expName vctk --data data/vctk --noise 4 --seq-len 100 --epochs 90</pre>    <p>Then, continue training the model using a noise level of 2 and the full sequence length:</p>    <pre>  python train.py --expName vctk_noise_2 --data data/vctk --checkpoint checkpoints/vctk/bestmodel.pth --noise 2 --seq-len 1000 --epochs 90</pre>    <h2>Citation</h2>    <p>If you find this code useful in your research, please cite:</p>    <pre>  <code>@article{taigman2017voice,
    title           = {Voice Synthesis for in-the-Wild Speakers via a Phonological Loop},
    author          = {Taigman, Yaniv and Wolf, Lior and Polyak, Adam and Nachmani, Eliya},
    journal         = {ArXiv e-prints},
    archivePrefix   = "arXiv",
    eprinttype      = {arxiv},
    eprint          = {1707.06588},
    primaryClass    = "cs.CL",
    year            = {2017},
    month           = July,
  }
  </code></pre>    <h2>License</h2>    <p>Loop has a CC-BY-NC license.</p>    <p>Code: <a href="/misc/goto?guid=4959010598092446744">https://github.com/facebookresearch/loop</a></p>    <p>Paper: https://arxiv.org/abs/1707.06588</p>
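<p>The generation and training steps above both operate on per-utterance .npz feature files (e.g. data/vctk/numpy_features_valid/p318_212.npz). As a minimal sketch of how such an archive can be inspected before passing it to generate.py, here is a generic NumPy inspector; the actual key names inside VoiceLoop's feature files are not documented in this post, so the example creates a placeholder file with made-up keys rather than assuming them:</p>

```python
import numpy as np

def inspect_npz(path):
    """Return a {key: array shape} summary of an .npz archive."""
    with np.load(path) as archive:
        return {key: archive[key].shape for key in archive.files}

# Build a placeholder archive; "features" and "phonemes" are
# hypothetical key names, not VoiceLoop's actual schema.
np.savez("example_features.npz",
         features=np.zeros((100, 63)),          # e.g. frames x vocoder dims
         phonemes=np.zeros(40, dtype=np.int64)) # e.g. phoneme id sequence

print(inspect_npz("example_features.npz"))
```

<p>Running this kind of check on a real file from data/vctk/numpy_features_valid is a quick way to confirm the download and preprocessing produced non-empty feature arrays.</p>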