no-op-ul-se committed on
Commit
9a604e2
·
1 Parent(s): 97eec03

add JVS and VCTK models

models/tts/tungnaa_117_jvs.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e30ae3127807958238196228b76d493de4f5f8483364b15161afddc43eaa80f
+ size 1711942430
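The `.ckpt` entry above is a Git LFS pointer, not the checkpoint itself. As a minimal sketch, a downloaded file could be checked against the pointer's `oid` and `size` roughly like this. The helper names `parse_lfs_pointer` and `verify_lfs_object` are hypothetical and not part of this repository.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_lfs_object(path, pointer):
    """Return True if the file at `path` matches the pointer's size and hash."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-gigabyte checkpoints fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2e30ae3127807958238196228b76d493de4f5f8483364b15161afddc43eaa80f
size 1711942430
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `verify_lfs_object("models/tts/tungnaa_117_jvs.ckpt", pointer)` after fetching the real object would then confirm that the download is complete and uncorrupted.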
models/tts/tungnaa_117_jvs.md ADDED
@@ -0,0 +1,45 @@
+ ---
+ block_size: 2048
+ sample_rate: 44100
+ latent_size: 12
+ vocoder: "042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts"
+ dataset: "John Van Stan (LibriTTS)"
+ vocoder_type: "RAVE"
+ alignment_type: "DCA"
+ likelihood_type: "NSF"
+ text_encoder_type: "CANINE"
+ ---
+
+ # tungnaa_117_jvs
+
+ ### dimensions
+
+ block size: 2048
+
+ sample rate: 44100
+
+ latent size: 12
+
+ ### dataset
+
+ JVS (Hi-Fi TTS speaker 9017)
+
+ ### vocoder
+
+ `models/vocoder/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts`
+
+ ### training
+
+ tungnaa commit `09ecdcd532eac3d454a8b4e28e896bca5bccbf9f`
+
+ ```bash
+ tungnaa trainer --experiment 117-jvs-e2emulti-mask-ends --model-dir /data/users/victor/ivoice-models --log-dir /data/users/victor/ivoice-logs --manifest /data/users/victor/tmp/ivoice_prep_100m_0abe_multi/9017_manifest_clean_train.json --rave-model /data/users/victor/rave-v2/runs/042-jvs-100m-xfermulti_0abe2b072b/version_0/checkpoints/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts --lr 3e-4 --lr-text 3e-5 --epoch-size 200 --save-epochs 20 --device cuda:0 train
+ ```
+
+ ### notes
+
+ trained on the full JVS dataset, with no annotations.
+
+ uses a 12-dimensional vocoder trained on a subset of JVS, fine-tuned from a multivoice model.
+
+ this model uses a neural spline flow likelihood.
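The front matter of the card above carries the numbers needed to relate latent frames to audio time. Assuming `block_size` is the number of audio samples per vocoder latent frame (an interpretation, not something the card states), the latent frame rate is `sample_rate / block_size`. A minimal sketch with a hand-rolled front matter reader follows. The function `read_front_matter` is hypothetical and handles only flat `key: value` pairs.

```python
def read_front_matter(text):
    """Read flat `key: value` pairs between the first two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that are not key/value pairs
        value = value.strip().strip('"')
        # Keep integers as int, everything else as a plain string.
        meta[key.strip()] = int(value) if value.isdigit() else value
    return meta


# Excerpt of the front matter added in this commit.
CARD = """\
---
block_size: 2048
sample_rate: 44100
latent_size: 12
vocoder_type: "RAVE"
---
"""

meta = read_front_matter(CARD)
# Assumption: one latent frame covers `block_size` audio samples.
frames_per_second = meta["sample_rate"] / meta["block_size"]
print(f"{frames_per_second:.2f} latent frames per second")  # 21.53 latent frames per second
```

Under the same assumption, the VCTK model added in this commit (48000 Hz, block size 2048) would run at about 23.44 latent frames per second.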
models/tts/tungnaa_119_vctk.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f83755dfa999b03881a1a4386cad9d5ad3c89453c925ec020bba2a602906165
+ size 1711642462
models/tts/tungnaa_119_vctk.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ block_size: 2048
+ sample_rate: 48000
+ latent_size: 11
+ vocoder: "046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts"
+ dataset: "VCTK"
+ vocoder_type: "RAVE"
+ alignment_type: "DCA"
+ likelihood_type: "NSF"
+ text_encoder_type: "CANINE"
+ ---
+
+ # tungnaa_119_vctk
+
+ ### dimensions
+
+ block size: 2048
+
+ sample rate: 48000
+
+ latent size: 11
+
+ ### dataset
+
+ VCTK
+
+ ### vocoder
+
+ `models/vocoder/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts`
+
+
+ ### training
+
+ ```bash
+ tungnaa prep --datasets '{kind:"vctk", path:"/data/datasets/VCTK"}' --rave-path /data/users/victor/rave-v2/runs/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc/version_0/checkpoints/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts --out-path /data/users/victor/tmp/ivoice_prep_824a/
+
+ tungnaa trainer --experiment 119-vctk --model-dir /data/users/victor/ivoice-models --log-dir /data/users/victor/ivoice-logs --manifest /data/users/victor/tmp/ivoice_prep_824a/vctk.json --concat-speakers 2 --speaker-annotate --device cuda:1 --batch-size 32 --rave-model /data/users/victor/rave-v2/runs/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc/version_0/checkpoints/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts --lr 3e-4 --lr-text 3e-5 --epoch-size 200 --save-epochs 20 train
+ ```
+
+ ### notes
+
+ trained on concatenated utterance pairs with speaker annotations. example syntax: `[p225] this is an utterance. [p330] this is another.`
+
+ uses a multi-dataset vocoder which was *not* fine-tuned to only VCTK, so it should have a lot of play in the latent biases.
+
+ this model uses a neural spline flow likelihood.
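The annotated input format described in the notes above can be sketched as follows. This is an illustration only: `annotate` is a hypothetical helper, not a tungnaa function, and the preprocessing actually performed by `--concat-speakers` and `--speaker-annotate` is not shown in this commit.

```python
def annotate(utterances):
    """Join (speaker_id, text) pairs into one `[speaker] text` string."""
    return " ".join(f"[{speaker}] {text.strip()}" for speaker, text in utterances)


# Two utterances from different VCTK speakers, matching `--concat-speakers 2`.
prompt = annotate([
    ("p225", "this is an utterance."),
    ("p330", "this is another."),
])
print(prompt)  # [p225] this is an utterance. [p330] this is another.
```

The speaker IDs here are the ones used in the card's example; any other VCTK speaker ID would presumably be written the same way.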
models/vocoder/042-jvs-100m-xfermulti_0abe2b072b_streaming_norm.ts ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ab33a3050e7269b20b455438811662231a42a825648400025e57720c39061ee
+ size 149351311
models/vocoder/046-multivoice-2048-48k-vlobeta-specdis-noise_824a15d4dc_streaming_norm.ts ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5a1b28220e41a9148147286ea80f7286c8c249862639ba985f776492d4631845
+ size 150205512