bash utility tools¶
ESPnet provides several command-line bash tools under utils/. Each section below lists a tool's usage and options.
convert_fbank.sh¶
Usage: convert_fbank.sh [options] <data-dir> [<log-dir> [<fbank-dir>] ]
e.g.: convert_fbank.sh data/train exp/griffin_lim/train wav
Note: <log-dir> defaults to <data-dir>/log, and <fbank-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--fs <fs> # sampling rate
--fmax <fmax> # maximum frequency
--fmin <fmin> # minimum frequency
--n_fft <n_fft> # number of FFT points (default=1024)
--n_shift <n_shift> # shift size in points (default=256)
--win_length <win_length> # window length in points (default=)
--n_mels <n_mels> # number of mel bins (default=80)
--iters <iters> # number of Griffin-Lim iterations (default=64)
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
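For example, a call that overrides several of the options above might look like this (paths and parameter values are purely illustrative):
convert_fbank.sh --nj 16 --fs 22050 --fmin 80 --fmax 7600 --n_mels 80 data/train exp/griffin_lim/train wav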
data2json.sh¶
Usage: data2json.sh <data-dir> <dict>
e.g. data2json.sh data/train data/lang_1char/train_units.txt
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--feat <feat-scp> # feat.scp
--oov <oov-word> # Default: <unk>
--out <outputfile> # If omitted, write in stdout
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
--preprocess-conf <json> # Apply preprocess to feats when creating shape.scp
--verbose <num> # Default: 0
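For example, to build a JSON file from features listed in feats.scp (all paths are illustrative):
data2json.sh --nj 8 --feat data/train/feats.scp --oov "<unk>" --out data/train/data.json data/train data/lang_1char/train_units.txt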
download_from_google_drive.sh¶
Usage: download_from_google_drive.sh <share-url> [<download_dir> <file_ext>]
e.g.: download_from_google_drive.sh https://drive.google.com/open?id=1zF88bRNbJhw9hNBq3NrDg8vnGGibREmg downloads zip
Options:
<download_dir>: directory to save downloaded file. (Default=downloads)
<file_ext>: file extension of the file to be downloaded. (Default=zip)
dump.sh¶
Usage: dump.sh <scp> <cmvnark> <logdir> <dumpdir>
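Following the usage line, a call might look like this (paths are illustrative; the second argument is the CMVN statistics file referenced in the usage):
dump.sh data/train/feats.scp data/train/cmvn.ark exp/dump_feats/train dump/train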
dump_pcm.sh¶
Usage: dump_pcm.sh [options] <data-dir> [<log-dir> [<pcm-dir>] ]
e.g.: dump_pcm.sh data/train exp/dump_pcm/train pcm
Note: <log-dir> defaults to <data-dir>/log, and <pcm-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--write-utt2num-frames <true|false> # If true, write utt2num_frames file.
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
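For example, to dump PCM in HDF5 format and also record the number of frames per utterance (paths are illustrative):
dump_pcm.sh --nj 8 --filetype sound.hdf5 --write-utt2num-frames true data/train exp/dump_pcm/train pcm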
eval_source_separation.sh¶
Usage: eval_source_separation.sh <reffiles> <enhfiles> <dir>
e.g. eval_source_separation.sh reference.scp enhanced.scp outdir
Multiple sources are also supported:
e.g. eval_source_separation.sh "ref1.scp,ref2.scp" "enh1.scp,enh2.scp" outdir
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
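For example, the multi-source case can also be run with several parallel jobs (paths are illustrative):
eval_source_separation.sh --nj 8 "ref1.scp,ref2.scp" "enh1.scp,enh2.scp" outdir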
feat_to_shape.sh¶
Usage: feat_to_shape.sh [options] <input-scp> <output-scp> [<log-dir>]
e.g.: feat_to_shape.sh data/train/feats.scp data/train/shape.scp data/train/log
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
--preprocess-conf <json> # Apply preprocess to feats when creating shape.scp
--verbose <num> # Default: 0
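For example, to compute shapes from HDF5 features with a preprocessing configuration applied (paths are illustrative):
feat_to_shape.sh --nj 8 --filetype hdf5 --preprocess-conf conf/preprocess.json data/train/feats.scp data/train/shape.scp data/train/log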
generate_wav.sh¶
Usage:
generate_wav.sh [options] <model-path> <data-dir> [<log-dir> [<fbank-dir>] ]
Example:
generate_wav.sh ljspeech.wavenet.ns.v1/checkpoint-1000000.pkl data/train exp/wavenet_vocoder/train wav
Note:
<log-dir> defaults to <data-dir>/log, and <fbank-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--fs <fs> # sampling rate (default=22050)
--n_fft <n_fft> # number of FFT points (default=1024)
--n_shift <n_shift> # shift size in points (default=256)
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
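For example, combining the example model path with the default analysis parameters listed above (the job count is illustrative):
generate_wav.sh --nj 4 --fs 22050 --n_fft 1024 --n_shift 256 ljspeech.wavenet.ns.v1/checkpoint-1000000.pkl data/train exp/wavenet_vocoder/train wav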
make_fbank.sh¶
Usage: make_fbank.sh [options] <data-dir> [<log-dir> [<fbank-dir>] ]
e.g.: make_fbank.sh data/train exp/make_fbank/train fbank
Note: <log-dir> defaults to <data-dir>/log, and <fbank-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
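For example, to write filterbank features in HDF5 format with more parallel jobs (paths and job count are illustrative):
make_fbank.sh --nj 32 --filetype hdf5 data/train exp/make_fbank/train fbank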
make_stft.sh¶
Usage: make_stft.sh [options] <data-dir> [<log-dir> [<stft-dir>] ]
e.g.: make_stft.sh data/train exp/make_stft/train stft
Note: <log-dir> defaults to <data-dir>/log, and <stft-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
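For example (paths and job count are illustrative):
make_stft.sh --nj 32 --filetype hdf5 data/train exp/make_stft/train stft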
pack_model.sh¶
Usage: pack_model.sh <tr_conf> <dec_conf> <cmvn> <e2e>, for example:
<tr_conf>: conf/train.yaml
<dec_conf>: conf/decode.yaml
<cmvn>: data/tr_it/cmvn.ark
<e2e>: exp/tr_it_pytorch_train/results/model.last10.avg.best
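Putting the example arguments together in the order given by the usage line:
pack_model.sh conf/train.yaml conf/decode.yaml data/tr_it/cmvn.ark exp/tr_it_pytorch_train/results/model.last10.avg.best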
recog_wav.sh¶
Usage:
recog_wav.sh [options] <wav_file>
Options:
--backend <chainer|pytorch> # chainer or pytorch (Default: pytorch)
--ngpu <ngpu> # Number of GPUs (Default: 0)
--decode_dir <directory_name> # Name of directory to store decoding temporary data
--models <model_name> # Model name (e.g. tedlium2.rnn.v1)
--cmvn <path> # Location of cmvn.ark
--lang_model <path> # Location of language model
--recog_model <path> # Location of E2E model
--decode_config <path> # Location of configuration file
--api <api_version> # API version (v1 or v2, available only with the pytorch backend)
Example:
# Record audio from microphone input as example.wav
rec -c 1 -r 16000 example.wav trim 0 5
# Decode using model name
recog_wav.sh --models tedlium2.rnn.v1 example.wav
# Decode using model file
recog_wav.sh --cmvn cmvn.ark --lang_model rnnlm.model.best --recog_model model.acc.best --decode_config conf/decode.yaml example.wav
Available models:
- tedlium2.rnn.v1
- tedlium2.transformer.v1
- tedlium3.transformer.v1
- librispeech.transformer.v1
- commonvoice.transformer.v1
reduce_data_dir.sh¶
Usage: reduce_data_dir.sh <srcdir> <turnlist> <destdir>
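A sketch of a call, assuming <turnlist> is a file listing the utterance IDs to keep (all paths are illustrative):
reduce_data_dir.sh data/train data/train/uttlist data/train_subset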
remove_longshortdata.sh¶
Usage: remove_longshortdata.sh <olddatadir> <newdatadir>
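A sketch of a call (paths are illustrative; the filtered copy of the data directory is written to the second argument):
remove_longshortdata.sh data/train data/train_trim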
score_bleu.sh¶
No help found.
score_sclite.sh¶
Usage: score_sclite.sh <data-dir> <dict>
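For example (paths are illustrative; the first argument is typically a decoding result directory and the second the dictionary used for training):
score_sclite.sh exp/train_pytorch_train/decode_test data/lang_1char/train_units.txt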
show_result.sh¶
No help found.
synth_wav.sh¶
Usage:
$ synth_wav.sh <text>
Example:
# make a text file and then synthesize it
echo "This is a demonstration of text to speech." > example.txt
synth_wav.sh example.txt
# you can specify a pretrained model
synth_wav.sh --models ljspeech.tacotron2.v3 example.txt
# to try the WaveNet vocoder, extend the stop stage
synth_wav.sh --models ljspeech.tacotron2.v3 --stop_stage 4 example.txt
# you can also specify the vocoder model
synth_wav.sh --models ljspeech.tacotron2.v3 --vocoder_models ljspeech.wavenet.ns.v1.1000k_iters --stop_stage 4 example.txt
Available models:
- libritts.tacotron2.v1
- ljspeech.tacotron2.v1
- ljspeech.tacotron2.v2
- ljspeech.tacotron2.v3
- ljspeech.transformer.v1
- ljspeech.transformer.v2
- ljspeech.fastspeech.v1
- ljspeech.fastspeech.v2
- libritts.transformer.v1
Available vocoder models:
- ljspeech.wavenet.ns.v1.100k_iters
- ljspeech.wavenet.ns.v1.1000k_iters
spm_decode¶
usage: spm_decode [-h] --model MODEL [--input INPUT]
[--input_format {piece,id}]
optional arguments:
--model MODEL sentencepiece model to use for decoding
--input INPUT input file to decode
--input_format {piece,id}
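For example, to detokenize piece-level output back to raw text (file names are illustrative; the decoded text is written to standard output):
spm_decode --model bpe.model --input_format piece --input hyp.piece > hyp.txt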
spm_encode¶
usage: spm_encode [-h] --model MODEL [--inputs INPUTS [INPUTS ...]]
[--outputs OUTPUTS [OUTPUTS ...]]
[--output_format {piece,id}] [--min-len N] [--max-len N]
optional arguments:
--model MODEL sentencepiece model to use for encoding
--inputs INPUTS [INPUTS ...]
input files to filter/encode
--outputs OUTPUTS [OUTPUTS ...]
path to save encoded outputs
--output_format {piece,id}
--min-len N filter sentence pairs with fewer than N tokens
--max-len N filter sentence pairs with more than N tokens
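For example, to encode raw text into subword pieces (file names are illustrative):
spm_encode --model bpe.model --output_format piece --inputs train.txt --outputs train.bpe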