add JVS and VCTK models

Files changed (6)

models/tts/tungnaa_117_jvs.ckpt +3 -0
models/tts/tungnaa_117_jvs.md +45 -0
models/tts/tungnaa_119_vctk.ckpt +3 -0
models/tts/tungnaa_119_vctk.md +46 -0
models/vocoder/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts +3 -0
models/vocoder/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts +3 -0

models/tts/tungnaa_117_jvs.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2e30ae3127807958238196228b76d493de4f5f8483364b15161afddc43eaa80f
+size 1711942430
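The checkpoint is stored as a Git LFS pointer: three `key value` lines giving the spec version, the content hash, and the byte size. A minimal sketch of how a downloaded file could be checked against such a pointer is shown below. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository or of Git LFS itself.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash in `pointer`."""
    path = Path(path)
    # Cheap check first: a size mismatch means the hash cannot match.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so multi-gigabyte checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the hunk above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2e30ae3127807958238196228b76d493de4f5f8483364b15161afddc43eaa80f
size 1711942430
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

After fetching the real file, `matches_pointer("models/tts/tungnaa_117_jvs.ckpt", pointer)` would report whether the download is complete and uncorrupted. The same check applies to the other three pointers in this commit.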

models/tts/tungnaa_117_jvs.md ADDED

	@@ -0,0 +1,45 @@

+---
+block_size: 2048
+sample_rate: 44100
+latent_size: 12
+vocoder: "042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts"
+dataset: "John Van Stan (Hi-Fi TTS speaker 9017)"
+vocoder_type: "RAVE"
+alignment_type: "DCA"
+likelihood_type: "NSF"
+text_encoder_type: "CANINE"
+---
+# tungnaa_117_jvs
+### dimensions
+block size: 2048
+sample rate: 44100
+latent size: 12
+### dataset
+JVS (Hi-Fi TTS speaker 9017)
+### vocoder
+`models/vocoder/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts`
+### training
+tungnaa commit `09ecdcd532eac3d454a8b4e28e896bca5bccbf9f`
+```bash
+tungnaa trainer --experiment 117-jvs-e2emulti-mask-ends --model-dir /data/users/victor/ivoice-models --log-dir /data/users/victor/ivoice-logs --manifest /data/users/victor/tmp/ivoice_prep_100m_0abe_multi/9017_manifest_clean_train.json --rave-model /data/users/victor/rave-v2/runs/042-jvs-100m-xfermulti_0abe2b072b/version_0/checkpoints/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts --lr 3e-4 --lr-text 3e-5 --epoch-size 200 --save-epochs 20 --device cuda:0 train
+```
+### notes
+trained with full JVS dataset, no annotations.
+uses a 12-dimensional vocoder trained on a subset of JVS, fine-tuned from a multivoice model.
+this model uses a neural spline flow likelihood.
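The front matter of each model card carries the numbers needed to relate audio time to latent frames. The sketch below reads the simple `key: value` front matter without a YAML library and derives the latent frame rate. It assumes that `block_size` is the number of audio samples covered by one latent frame, which the card does not state explicitly. `parse_front_matter` and `latent_frame_rate` are hypothetical helpers, not tungnaa functions.

```python
def parse_front_matter(text):
    """Read flat `key: value` pairs between the leading `---` markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter block
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key/value pairs
        value = value.strip().strip('"')
        meta[key.strip()] = int(value) if value.isdigit() else value
    return meta


def latent_frame_rate(meta):
    """Latent frames per second, assuming one frame per `block_size` samples."""
    return meta["sample_rate"] / meta["block_size"]


# A subset of the front matter fields from the card above.
CARD = """\
---
block_size: 2048
sample_rate: 44100
latent_size: 12
vocoder: "042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts"
---
# model card body follows
"""

meta = parse_front_matter(CARD)
rate = latent_frame_rate(meta)
frame_ms = 1000.0 * meta["block_size"] / meta["sample_rate"]
print(f"{rate:.2f} frames/s, {frame_ms:.2f} ms per frame, {meta['latent_size']} dims")
```

Under that assumption, the 44.1 kHz JVS model runs at about 21.5 latent frames per second, and a 48 kHz model with the same block size runs at about 23.4.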

models/tts/tungnaa_119_vctk.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8f83755dfa999b03881a1a4386cad9d5ad3c89453c925ec020bba2a602906165
+size 1711642462

models/tts/tungnaa_119_vctk.md ADDED

	@@ -0,0 +1,46 @@

+---
+block_size: 2048
+sample_rate: 48000
+latent_size: 11
+vocoder: "046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts"
+dataset: "VCTK"
+vocoder_type: "RAVE"
+alignment_type: "DCA"
+likelihood_type: "NSF"
+text_encoder_type: "CANINE"
+---
+# tungnaa_119_vctk
+### dimensions
+block size: 2048
+sample rate: 48000
+latent size: 11
+### dataset
+VCTK
+### vocoder
+`models/vocoder/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts`
+### training
+```bash
+tungnaa prep --datasets '{kind:"vctk", path:"/data/datasets/VCTK"}' --rave-path /data/users/victor/rave-v2/runs/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc/version_0/checkpoints/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts --out-path /data/users/victor/tmp/ivoice_prep_824a/
+tungnaa trainer --experiment 119-vctk --model-dir /data/users/victor/ivoice-models --log-dir /data/users/victor/ivoice-logs --manifest /data/users/victor/tmp/ivoice_prep_824a/vctk.json --concat-speakers 2 --speaker-annotate --device cuda:1 --batch-size 32 --rave-model /data/users/victor/rave-v2/runs/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc/version_0/checkpoints/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts --lr 3e-4 --lr-text 3e-5 --epoch-size 200 --save-epochs 20 train
+```
+### notes
+trained with concatenation of utterance pairs plus speaker annotations. example syntax: `[p225] this is an utterance. [p330] this is another.`
+uses a multi-dataset vocoder which was *not* fine-tuned to only VCTK, so it should have a lot of play in the latent biases.
+this model uses a neural spline flow likelihood.
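The speaker-annotation syntax in the notes can be produced and read back with a few lines of string handling. This is a sketch based only on the single example given in the card; the exact format that tungnaa accepts (spacing, allowed speaker IDs) is an assumption, and `annotate` and `split_annotated` are hypothetical helpers.

```python
import re


def annotate(utterances):
    """Join (speaker_id, text) pairs into `[speaker] text` form."""
    return " ".join(f"[{speaker}] {text.strip()}" for speaker, text in utterances)


def split_annotated(prompt):
    """Recover (speaker_id, text) pairs from an annotated prompt."""
    # Each match is a bracketed speaker tag followed by text up to the next tag.
    pairs = re.findall(r"\[([^\]]+)\]\s*([^\[]*)", prompt)
    return [(speaker, text.strip()) for speaker, text in pairs]


prompt = annotate([
    ("p225", "this is an utterance."),
    ("p330", "this is another."),
])
print(prompt)
print(split_annotated(prompt))
```

Because training used `--concat-speakers 2`, pairs of utterances are the case the model has seen; longer sequences of speaker tags are untested territory as far as the card indicates.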

models/vocoder/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4ab33a3050e7269b20b455438811662231a42a825648400025e57720c39061ee
+size 149351311

models/vocoder/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5a1b28220e41a9148147286ea80f7286c8c249862639ba985f776492d4631845
+size 150205512