bash utility tools¶
ESPnet provides several command-line bash tools under utils/. Each section below lists a tool's usage and options.
convert_fbank.sh¶
Usage: convert_fbank.sh [options] <data-dir> [<log-dir> [<fbank-dir>] ]
e.g.: convert_fbank.sh data/train exp/griffin_lim/train wav
Note: <log-dir> defaults to <data-dir>/log, and <fbank-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--fs <fs> # sampling rate
--fmax <fmax> # maximum frequency
--fmin <fmin> # minimum frequency
--n_fft <n_fft> # number of FFT points (default=1024)
--n_shift <n_shift> # shift size in points (default=256)
--win_length <win_length> # window length in points (default=)
--n_mels <n_mels> # number of mel bins (default=80)
--iters <iters> # number of Griffin-Lim iterations (default=64)
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
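For example, a call that overrides several of the options above might look like this (paths and parameter values are purely illustrative):
convert_fbank.sh --nj 16 --fs 22050 --fmin 80 --fmax 7600 --n_mels 80 data/train exp/griffin_lim/train wav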
data2json.sh¶
Usage: data2json.sh <data-dir> <dict>
e.g. data2json.sh data/train data/lang_1char/train_units.txt
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--feat <feat-scp> # feat.scp
--oov <oov-word> # Default: <unk>
--out <outputfile> # If omitted, write in stdout
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
--preprocess-conf <json> # Apply preprocess to feats when creating shape.scp
--verbose <num> # Default: 0
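For example, to build a JSON file from features listed in feats.scp (all paths are illustrative):
data2json.sh --nj 8 --feat data/train/feats.scp --oov "<unk>" --out data/train/data.json data/train data/lang_1char/train_units.txt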
download_from_google_drive.sh¶
Usage: download_from_google_drive.sh <share-url> [<download_dir> <file_ext>]
e.g.: download_from_google_drive.sh https://drive.google.com/open?id=1zF88bRNbJhw9hNBq3NrDg8vnGGibREmg downloads zip
Options:
<download_dir>: directory to save downloaded file. (Default=downloads)
<file_ext>: file extension of the file to be downloaded. (Default=zip)
dump.sh¶
Usage: dump.sh <scp> <cmvnark> <logdir> <dumpdir>
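Following the usage line, a call might look like this (paths are illustrative; the second argument is the CMVN statistics file referenced in the usage):
dump.sh data/train/feats.scp data/train/cmvn.ark exp/dump_feats/train dump/train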
dump_pcm.sh¶
Usage: dump_pcm.sh [options] <data-dir> [<log-dir> [<pcm-dir>] ]
e.g.: dump_pcm.sh data/train exp/dump_pcm/train pcm
Note: <log-dir> defaults to <data-dir>/log, and <pcm-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--write-utt2num-frames <true|false> # If true, write utt2num_frames file.
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
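For example, to dump PCM in HDF5 format and also record the number of frames per utterance (paths are illustrative):
dump_pcm.sh --nj 8 --filetype sound.hdf5 --write-utt2num-frames true data/train exp/dump_pcm/train pcm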
eval_source_separation.sh¶
Usage: eval_source_separation.sh <reffiles> <enhfiles> <dir>
e.g. eval_source_separation.sh reference.scp enhanced.scp outdir
Multiple sources are also supported:
e.g. eval_source_separation.sh "ref1.scp,ref2.scp" "enh1.scp,enh2.scp" outdir
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
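For example, the multi-source case can also be run with several parallel jobs (paths are illustrative):
eval_source_separation.sh --nj 8 "ref1.scp,ref2.scp" "enh1.scp,enh2.scp" outdir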
feat_to_shape.sh¶
Usage: feat_to_shape.sh [options] <input-scp> <output-scp> [<log-dir>]
e.g.: feat_to_shape.sh data/train/feats.scp data/train/shape.scp data/train/log
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
--preprocess-conf <json> # Apply preprocess to feats when creating shape.scp
--verbose <num> # Default: 0
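For example, to compute shapes from HDF5 features with a preprocessing configuration applied (paths are illustrative):
feat_to_shape.sh --nj 8 --filetype hdf5 --preprocess-conf conf/preprocess.json data/train/feats.scp data/train/shape.scp data/train/log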
generate_wav.sh¶
Usage:
generate_wav.sh [options] <model-path> <data-dir> [<log-dir> [<fbank-dir>] ]
Example:
generate_wav.sh ljspeech.wavenet.ns.v1/checkpoint-1000000.pkl data/train exp/wavenet_vocoder/train wav
Note:
<log-dir> defaults to <data-dir>/log, and <fbank-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--fs <fs> # sampling rate (default=22050)
--n_fft <n_fft> # number of FFT points (default=1024)
--n_shift <n_shift> # shift size in points (default=256)
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
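For example, combining the example model path with the default analysis parameters listed above (the job count is illustrative):
generate_wav.sh --nj 4 --fs 22050 --n_fft 1024 --n_shift 256 ljspeech.wavenet.ns.v1/checkpoint-1000000.pkl data/train exp/wavenet_vocoder/train wav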
make_fbank.sh¶
Usage: make_fbank.sh [options] <data-dir> [<log-dir> [<fbank-dir>] ]
e.g.: make_fbank.sh data/train exp/make_fbank/train fbank
Note: <log-dir> defaults to <data-dir>/log, and <fbank-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
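For example, to write filterbank features in HDF5 format with more parallel jobs (paths and job count are illustrative):
make_fbank.sh --nj 32 --filetype hdf5 data/train exp/make_fbank/train fbank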
make_stft.sh¶
Usage: make_stft.sh [options] <data-dir> [<log-dir> [<stft-dir>] ]
e.g.: make_stft.sh data/train exp/make_stft/train stft
Note: <log-dir> defaults to <data-dir>/log, and <stft-dir> defaults to <data-dir>/data
Options:
--nj <nj> # number of parallel jobs
--cmd (utils/run.pl|utils/queue.pl <queue opts>) # how to run jobs.
--filetype <mat|hdf5|sound.hdf5> # Specify the format of feats file
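For example (paths and job count are illustrative):
make_stft.sh --nj 32 --filetype hdf5 data/train exp/make_stft/train stft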
pack_model.sh¶
Usage: pack_model.sh <tr_conf> <dec_conf> <cmvn> <e2e>, for example:
<tr_conf>: conf/train.yaml
<dec_conf>: conf/decode.yaml
<cmvn>: data/tr_it/cmvn.ark
<e2e>: exp/tr_it_pytorch_train/results/model.last10.avg.best
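Putting the example arguments together in the order given by the usage line:
pack_model.sh conf/train.yaml conf/decode.yaml data/tr_it/cmvn.ark exp/tr_it_pytorch_train/results/model.last10.avg.best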
recog_wav.sh¶
Usage:
recog_wav.sh [options] <wav_file>
Options:
--backend <chainer|pytorch> # chainer or pytorch (Default: pytorch)
--ngpu <ngpu> # Number of GPUs (Default: 0)
--decode_dir <directory_name> # Name of directory to store decoding temporary data
--models <model_name> # Model name (e.g. tedlium2.rnn.v1)
--cmvn <path> # Location of cmvn.ark
--lang_model <path> # Location of language model
--recog_model <path> # Location of E2E model
--decode_config <path> # Location of configuration file
--api <api_version> # API version (v1 or v2, available only with the pytorch backend)
Example:
# Record audio from microphone input as example.wav
rec -c 1 -r 16000 example.wav trim 0 5
# Decode using model name
recog_wav.sh --models tedlium2.rnn.v1 example.wav
# Decode using model file
recog_wav.sh --cmvn cmvn.ark --lang_model rnnlm.model.best --recog_model model.acc.best --decode_config conf/decode.yaml example.wav
Available models:
- tedlium2.rnn.v1
- tedlium2.transformer.v1
- tedlium3.transformer.v1
- librispeech.transformer.v1
- commonvoice.transformer.v1
reduce_data_dir.sh¶
Usage: reduce_data_dir.sh <srcdir> <turnlist> <destdir>
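A sketch of a call, assuming <turnlist> is a file listing the utterance IDs to keep (all paths are illustrative):
reduce_data_dir.sh data/train data/train/uttlist data/train_subset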
remove_longshortdata.sh¶
Usage: remove_longshortdata.sh <olddatadir> <newdatadir>
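A sketch of a call (paths are illustrative; the filtered copy of the data directory is written to the second argument):
remove_longshortdata.sh data/train data/train_trim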
score_bleu.sh¶
No help found.
score_sclite.sh¶
Usage: score_sclite.sh <data-dir> <dict>
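For example (paths are illustrative; the first argument is typically a decoding result directory and the second the dictionary used for training):
score_sclite.sh exp/train_pytorch_train/decode_test data/lang_1char/train_units.txt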
show_result.sh¶
No help found.
synth_wav.sh¶
Usage:
$ synth_wav.sh <text>
Example:
# make a text file and then synthesize it
echo "This is a demonstration of text to speech." > example.txt
synth_wav.sh example.txt
# you can specify a pretrained model
synth_wav.sh --models ljspeech.tacotron2.v3 example.txt
# to try the WaveNet vocoder, extend the stop stage
synth_wav.sh --models ljspeech.tacotron2.v3 --stop_stage 4 example.txt
# you can also specify the vocoder model
synth_wav.sh --models ljspeech.tacotron2.v3 --vocoder_models ljspeech.wavenet.ns.v1.1000k_iters --stop_stage 4 example.txt
Available models:
- libritts.tacotron2.v1
- ljspeech.tacotron2.v1
- ljspeech.tacotron2.v2
- ljspeech.tacotron2.v3
- ljspeech.transformer.v1
- ljspeech.transformer.v2
- ljspeech.fastspeech.v1
- ljspeech.fastspeech.v2
- libritts.transformer.v1
Available vocoder models:
- ljspeech.wavenet.ns.v1.100k_iters
- ljspeech.wavenet.ns.v1.1000k_iters
spm_decode¶
usage: spm_decode [-h] --model MODEL [--input INPUT]
[--input_format {piece,id}]
optional arguments:
--model MODEL sentencepiece model to use for decoding
--input INPUT input file to decode
--input_format {piece,id}
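For example, to detokenize piece-level output back to raw text (file names are illustrative; the decoded text is written to standard output):
spm_decode --model bpe.model --input_format piece --input hyp.piece > hyp.txt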
spm_encode¶
usage: spm_encode [-h] --model MODEL [--inputs INPUTS [INPUTS ...]]
[--outputs OUTPUTS [OUTPUTS ...]]
[--output_format {piece,id}] [--min-len N] [--max-len N]
optional arguments:
--model MODEL sentencepiece model to use for encoding
--inputs INPUTS [INPUTS ...]
input files to filter/encode
--outputs OUTPUTS [OUTPUTS ...]
path to save encoded outputs
--output_format {piece,id}
--min-len N filter sentence pairs with fewer than N tokens
--max-len N filter sentence pairs with more than N tokens
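For example, to encode raw text into subword pieces (file names are illustrative):
spm_encode --model bpe.model --output_format piece --inputs train.txt --outputs train.bpe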