espnet2.bin package¶
espnet2.bin.__init__¶
espnet2.bin.aggregate_stats_dirs¶
espnet2.bin.asr_align¶
espnet2.bin.asr_inference¶
class espnet2.bin.asr_inference.Speech2Text(asr_train_config: Union[pathlib.Path, str] = None, asr_model_file: Union[pathlib.Path, str] = None, transducer_conf: dict = None, lm_train_config: Union[pathlib.Path, str] = None, lm_file: Union[pathlib.Path, str] = None, ngram_scorer: str = 'full', ngram_file: Union[pathlib.Path, str] = None, token_type: str = None, bpemodel: str = None, device: str = 'cpu', maxlenratio: float = 0.0, minlenratio: float = 0.0, batch_size: int = 1, dtype: str = 'float32', beam_size: int = 20, ctc_weight: float = 0.5, lm_weight: float = 1.0, ngram_weight: float = 0.9, penalty: float = 0.0, nbest: int = 1, streaming: bool = False)[source]¶
Bases: object

Speech2Text class

Examples
>>> import soundfile
>>> speech2text = Speech2Text("asr_config.yml", "asr.pth")
>>> audio, rate = soundfile.read("speech.wav")
>>> speech2text(audio)
[(text, token, token_int, hypothesis object), ...]
static from_pretrained(model_tag: Optional[str] = None, **kwargs)[source]¶
Build a Speech2Text instance from a pretrained model.

- Parameters
model_tag (Optional[str]) – Model tag of the pretrained models. Currently, the tags of espnet_model_zoo are supported.
- Returns
Speech2Text instance.
- Return type
Speech2Text
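Keyword arguments passed to from_pretrained are, as the signature suggests, forwarded to the Speech2Text constructor, so decoding options can be set at load time. A minimal sketch; the model tag below is a placeholder, not a real espnet_model_zoo tag:

>>> speech2text = Speech2Text.from_pretrained(
>>>     model_tag="<espnet_model_zoo tag>",  # placeholder tag
>>>     beam_size=10,
>>>     ctc_weight=0.3,
>>> )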
espnet2.bin.asr_inference.inference(output_dir: str, maxlenratio: float, minlenratio: float, batch_size: int, dtype: str, beam_size: int, ngpu: int, seed: int, ctc_weight: float, lm_weight: float, ngram_weight: float, penalty: float, nbest: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], asr_train_config: Optional[str], asr_model_file: Optional[str], lm_train_config: Optional[str], lm_file: Optional[str], word_lm_train_config: Optional[str], word_lm_file: Optional[str], ngram_file: Optional[str], model_tag: Optional[str], token_type: Optional[str], bpemodel: Optional[str], allow_variable_data_keys: bool, transducer_conf: Optional[dict], streaming: bool)[source]¶
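All of the arguments above are required when calling inference directly from Python (the command-line entry point fills them from flags). A sketch with illustrative values, decoding from local config and model files; the paths and the ("speech", "sound") data triple are illustrative, not prescribed:

>>> from espnet2.bin.asr_inference import inference
>>> inference(
>>>     output_dir="exp/decode",
>>>     maxlenratio=0.0,
>>>     minlenratio=0.0,
>>>     batch_size=1,
>>>     dtype="float32",
>>>     beam_size=20,
>>>     ngpu=0,
>>>     seed=0,
>>>     ctc_weight=0.5,
>>>     lm_weight=1.0,
>>>     ngram_weight=0.9,
>>>     penalty=0.0,
>>>     nbest=1,
>>>     num_workers=1,
>>>     log_level="INFO",
>>>     data_path_and_name_and_type=[("dump/raw/test/wav.scp", "speech", "sound")],
>>>     key_file=None,
>>>     asr_train_config="exp/asr_train/config.yaml",
>>>     asr_model_file="exp/asr_train/valid.acc.ave.pth",
>>>     lm_train_config=None,
>>>     lm_file=None,
>>>     word_lm_train_config=None,
>>>     word_lm_file=None,
>>>     ngram_file=None,
>>>     model_tag=None,
>>>     token_type=None,
>>>     bpemodel=None,
>>>     allow_variable_data_keys=False,
>>>     transducer_conf=None,
>>>     streaming=False,
>>> )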
espnet2.bin.asr_inference_k2¶
espnet2.bin.asr_inference_maskctc¶
class espnet2.bin.asr_inference_maskctc.Speech2Text(asr_train_config: Union[pathlib.Path, str], asr_model_file: Union[pathlib.Path, str] = None, token_type: str = None, bpemodel: str = None, device: str = 'cpu', batch_size: int = 1, dtype: str = 'float32', maskctc_n_iterations: int = 10, maskctc_threshold_probability: float = 0.99)[source]¶
Bases: object

Speech2Text class

Examples
>>> import soundfile
>>> speech2text = Speech2Text("asr_config.yml", "asr.pth")
>>> audio, rate = soundfile.read("speech.wav")
>>> speech2text(audio)
[(text, token, token_int, hypothesis object), ...]
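The Mask-CTC specific options trade refinement quality against the number of decoding iterations. A minimal sketch using only arguments from the constructor signature above, with illustrative values:

>>> speech2text = Speech2Text(
>>>     "asr_config.yml",
>>>     "asr.pth",
>>>     maskctc_n_iterations=5,
>>>     maskctc_threshold_probability=0.95,
>>> )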
static from_pretrained(model_tag: Optional[str] = None, **kwargs)[source]¶
Build a Speech2Text instance from a pretrained model.

- Parameters
model_tag (Optional[str]) – Model tag of the pretrained models. Currently, the tags of espnet_model_zoo are supported.
- Returns
Speech2Text instance.
- Return type
Speech2Text
espnet2.bin.asr_inference_maskctc.inference(output_dir: str, batch_size: int, dtype: str, ngpu: int, seed: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], asr_train_config: str, asr_model_file: str, model_tag: Optional[str], token_type: Optional[str], bpemodel: Optional[str], allow_variable_data_keys: bool, maskctc_n_iterations: int, maskctc_threshold_probability: float)[source]¶
espnet2.bin.asr_inference_streaming¶
class espnet2.bin.asr_inference_streaming.Speech2TextStreaming(asr_train_config: Union[pathlib.Path, str], asr_model_file: Union[pathlib.Path, str] = None, lm_train_config: Union[pathlib.Path, str] = None, lm_file: Union[pathlib.Path, str] = None, token_type: str = None, bpemodel: str = None, device: str = 'cpu', maxlenratio: float = 0.0, minlenratio: float = 0.0, batch_size: int = 1, dtype: str = 'float32', beam_size: int = 20, ctc_weight: float = 0.5, lm_weight: float = 1.0, penalty: float = 0.0, nbest: int = 1, disable_repetition_detection=False, decoder_text_length_limit=0, encoded_feat_length_limit=0)[source]¶
Bases: object

Speech2TextStreaming class

Details in “Streaming Transformer ASR with Blockwise Synchronous Beam Search” (https://arxiv.org/abs/2006.14941)

Examples
>>> import soundfile
>>> speech2text = Speech2TextStreaming("asr_config.yml", "asr.pth")
>>> audio, rate = soundfile.read("speech.wav")
>>> speech2text(audio)
[(text, token, token_int, hypothesis object), ...]
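For simulated streaming, audio can be fed in fixed-size chunks rather than as a whole utterance (compare the sim_chunk_length argument of inference below). A sketch of that pattern; the is_final keyword marking the last chunk is an assumption here, so check the method signature for the exact name:

>>> import soundfile
>>> speech2text = Speech2TextStreaming("asr_config.yml", "asr.pth")
>>> audio, rate = soundfile.read("speech.wav")
>>> chunk = 2048  # illustrative chunk size in samples
>>> for start in range(0, len(audio), chunk):
>>>     results = speech2text(
>>>         audio[start:start + chunk],
>>>         is_final=start + chunk >= len(audio),
>>>     )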
espnet2.bin.asr_inference_streaming.inference(output_dir: str, maxlenratio: float, minlenratio: float, batch_size: int, dtype: str, beam_size: int, ngpu: int, seed: int, ctc_weight: float, lm_weight: float, penalty: float, nbest: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], asr_train_config: str, asr_model_file: str, lm_train_config: Optional[str], lm_file: Optional[str], word_lm_train_config: Optional[str], word_lm_file: Optional[str], token_type: Optional[str], bpemodel: Optional[str], allow_variable_data_keys: bool, sim_chunk_length: int, disable_repetition_detection: bool, encoded_feat_length_limit: int, decoder_text_length_limit: int)[source]¶
espnet2.bin.asr_train¶
espnet2.bin.diar_inference¶
class espnet2.bin.diar_inference.DiarizeSpeech(train_config: Union[pathlib.Path, str] = None, model_file: Union[pathlib.Path, str] = None, segment_size: Optional[float] = None, normalize_segment_scale: bool = False, show_progressbar: bool = False, num_spk: Optional[int] = None, device: str = 'cpu', dtype: str = 'float32')[source]¶
Bases: object

DiarizeSpeech class

Examples
>>> import soundfile
>>> diarization = DiarizeSpeech("diar_config.yaml", "diar.pth")
>>> audio, rate = soundfile.read("speech.wav")
>>> diarization(audio)
[(spk_id, start, end), (spk_id2, start2, end2)]
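If the number of speakers is known in advance, it can be fixed at construction time via num_spk (from the signature above). A minimal sketch:

>>> diarization = DiarizeSpeech("diar_config.yaml", "diar.pth", num_spk=2)
>>> audio, rate = soundfile.read("speech.wav")
>>> diarization(audio)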
static from_pretrained(model_tag: Optional[str] = None, **kwargs)[source]¶
Build a DiarizeSpeech instance from a pretrained model.

- Parameters
model_tag (Optional[str]) – Model tag of the pretrained models. Currently, the tags of espnet_model_zoo are supported.
- Returns
DiarizeSpeech instance.
- Return type
DiarizeSpeech
espnet2.bin.diar_inference.inference(output_dir: str, batch_size: int, dtype: str, fs: int, ngpu: int, seed: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], train_config: Optional[str], model_file: Optional[str], model_tag: Optional[str], allow_variable_data_keys: bool, segment_size: Optional[float], show_progressbar: bool, num_spk: Optional[int])[source]¶
espnet2.bin.diar_train¶
espnet2.bin.enh_inference¶
class espnet2.bin.enh_inference.SeparateSpeech(train_config: Union[pathlib.Path, str] = None, model_file: Union[pathlib.Path, str] = None, segment_size: Optional[float] = None, hop_size: Optional[float] = None, normalize_segment_scale: bool = False, show_progressbar: bool = False, ref_channel: Optional[int] = None, normalize_output_wav: bool = False, device: str = 'cpu', dtype: str = 'float32')[source]¶
Bases: object

SeparateSpeech class

Examples
>>> import soundfile
>>> separate_speech = SeparateSpeech("enh_config.yml", "enh.pth")
>>> audio, rate = soundfile.read("speech.wav")
>>> separate_speech(audio)
[separated_audio1, separated_audio2, ...]
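A sketch of saving each separated stream to disk. It assumes the returned list holds batched waveforms of shape (Batch, Nsamples) at the input rate, matching the shapes used by cal_permumation below, and that __call__ accepts an fs keyword as the inference function does; both are assumptions to verify against the actual method:

>>> import soundfile
>>> separate_speech = SeparateSpeech("enh_config.yml", "enh.pth")
>>> audio, rate = soundfile.read("mixture.wav")
>>> waves = separate_speech(audio[None, :], fs=rate)  # assumed (Batch, Nsamples) input and fs keyword
>>> for spk, wav in enumerate(waves):
>>>     soundfile.write(f"separated_spk{spk}.wav", wav[0], rate)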
cal_permumation(ref_wavs, enh_wavs, criterion='si_snr')[source]¶
Calculate the permutation between separated streams in two adjacent segments.

- Parameters
ref_wavs (List[torch.Tensor]) – [(Batch, Nsamples)]
enh_wavs (List[torch.Tensor]) – [(Batch, Nsamples)]
criterion (str) – one of (“si_snr”, “mse”, “corr”)
- Returns
permutation for enh_wavs (Batch, num_spk)
- Return type
perm (torch.Tensor)
static from_pretrained(model_tag: Optional[str] = None, **kwargs)[source]¶
Build a SeparateSpeech instance from a pretrained model.

- Parameters
model_tag (Optional[str]) – Model tag of the pretrained models. Currently, the tags of espnet_model_zoo are supported.
- Returns
SeparateSpeech instance.
- Return type
SeparateSpeech
espnet2.bin.enh_inference.inference(output_dir: str, batch_size: int, dtype: str, fs: int, ngpu: int, seed: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], train_config: Optional[str], model_file: Optional[str], model_tag: Optional[str], allow_variable_data_keys: bool, segment_size: Optional[float], hop_size: Optional[float], normalize_segment_scale: bool, show_progressbar: bool, ref_channel: Optional[int], normalize_output_wav: bool)[source]¶
espnet2.bin.enh_scoring¶
espnet2.bin.enh_train¶
espnet2.bin.gan_tts_train¶
espnet2.bin.hubert_train¶
espnet2.bin.launch¶
espnet2.bin.lm_calc_perplexity¶
espnet2.bin.lm_calc_perplexity.calc_perplexity(output_dir: str, batch_size: int, dtype: str, ngpu: int, seed: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], train_config: Optional[str], model_file: Optional[str], log_base: Optional[float], allow_variable_data_keys: bool)[source]¶
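calc_perplexity can be driven directly from Python; every argument name below comes from the signature above, while the paths and settings are illustrative:

>>> from espnet2.bin.lm_calc_perplexity import calc_perplexity
>>> calc_perplexity(
>>>     output_dir="exp/lm_ppl",
>>>     batch_size=1,
>>>     dtype="float32",
>>>     ngpu=0,
>>>     seed=0,
>>>     num_workers=1,
>>>     log_level="INFO",
>>>     data_path_and_name_and_type=[("dump/raw/test/text", "text", "text")],
>>>     key_file=None,
>>>     train_config="exp/lm_train/config.yaml",
>>>     model_file="exp/lm_train/valid.loss.best.pth",
>>>     log_base=10.0,
>>>     allow_variable_data_keys=False,
>>> )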
espnet2.bin.lm_train¶
espnet2.bin.mt_inference¶
class espnet2.bin.mt_inference.Text2Text(mt_train_config: Union[pathlib.Path, str] = None, mt_model_file: Union[pathlib.Path, str] = None, lm_train_config: Union[pathlib.Path, str] = None, lm_file: Union[pathlib.Path, str] = None, ngram_scorer: str = 'full', ngram_file: Union[pathlib.Path, str] = None, token_type: str = None, bpemodel: str = None, device: str = 'cpu', maxlenratio: float = 0.0, minlenratio: float = 0.0, batch_size: int = 1, dtype: str = 'float32', beam_size: int = 20, lm_weight: float = 1.0, ngram_weight: float = 0.9, penalty: float = 0.0, nbest: int = 1)[source]¶
Bases: object

Text2Text class

Examples
>>> text2text = Text2Text("mt_config.yml", "mt.pth")
>>> text2text(src_text)
[(text, token, token_int, hypothesis object), ...]
espnet2.bin.mt_inference.inference(output_dir: str, maxlenratio: float, minlenratio: float, batch_size: int, dtype: str, beam_size: int, ngpu: int, seed: int, lm_weight: float, ngram_weight: float, penalty: float, nbest: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], mt_train_config: Optional[str], mt_model_file: Optional[str], lm_train_config: Optional[str], lm_file: Optional[str], word_lm_train_config: Optional[str], word_lm_file: Optional[str], ngram_file: Optional[str], model_tag: Optional[str], token_type: Optional[str], bpemodel: Optional[str], allow_variable_data_keys: bool)[source]¶
espnet2.bin.mt_train¶
espnet2.bin.pack¶
class espnet2.bin.pack.ASRPackedContents[source]¶
Bases: espnet2.bin.pack.PackedContents
files = ['asr_model_file', 'lm_file']¶
yaml_files = ['asr_train_config', 'lm_train_config']¶

class espnet2.bin.pack.DiarPackedContents[source]¶
Bases: espnet2.bin.pack.PackedContents
files = ['model_file']¶
yaml_files = ['train_config']¶

class espnet2.bin.pack.EnhPackedContents[source]¶
Bases: espnet2.bin.pack.PackedContents
files = ['model_file']¶
yaml_files = ['train_config']¶

class espnet2.bin.pack.STPackedContents[source]¶
Bases: espnet2.bin.pack.PackedContents
files = ['st_model_file']¶
yaml_files = ['st_train_config']¶

class espnet2.bin.pack.TTSPackedContents[source]¶
Bases: espnet2.bin.pack.PackedContents
files = ['model_file']¶
yaml_files = ['train_config']¶
espnet2.bin.split_scps¶
espnet2.bin.st_inference¶
class espnet2.bin.st_inference.Speech2Text(st_train_config: Union[pathlib.Path, str] = None, st_model_file: Union[pathlib.Path, str] = None, lm_train_config: Union[pathlib.Path, str] = None, lm_file: Union[pathlib.Path, str] = None, ngram_scorer: str = 'full', ngram_file: Union[pathlib.Path, str] = None, token_type: str = None, bpemodel: str = None, device: str = 'cpu', maxlenratio: float = 0.0, minlenratio: float = 0.0, batch_size: int = 1, dtype: str = 'float32', beam_size: int = 20, lm_weight: float = 1.0, ngram_weight: float = 0.9, penalty: float = 0.0, nbest: int = 1)[source]¶
Bases: object

Speech2Text class

Examples
>>> import soundfile
>>> speech2text = Speech2Text("st_config.yml", "st.pth")
>>> audio, rate = soundfile.read("speech.wav")
>>> speech2text(audio)
[(text, token, token_int, hypothesis object), ...]
static from_pretrained(model_tag: Optional[str] = None, **kwargs)[source]¶
Build a Speech2Text instance from a pretrained model.

- Parameters
model_tag (Optional[str]) – Model tag of the pretrained models. Currently, the tags of espnet_model_zoo are supported.
- Returns
Speech2Text instance.
- Return type
Speech2Text
espnet2.bin.st_inference.inference(output_dir: str, maxlenratio: float, minlenratio: float, batch_size: int, dtype: str, beam_size: int, ngpu: int, seed: int, lm_weight: float, ngram_weight: float, penalty: float, nbest: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], st_train_config: Optional[str], st_model_file: Optional[str], lm_train_config: Optional[str], lm_file: Optional[str], word_lm_train_config: Optional[str], word_lm_file: Optional[str], ngram_file: Optional[str], model_tag: Optional[str], token_type: Optional[str], bpemodel: Optional[str], allow_variable_data_keys: bool)[source]¶
espnet2.bin.st_train¶
espnet2.bin.tokenize_text¶
espnet2.bin.tokenize_text.field2slice(field: Optional[str]) → slice[source]¶
Convert a field string to a slice.

Note that the field string accepts 1-based integers.

Examples
>>> field2slice("1-")
slice(0, None, None)
>>> field2slice("1-3")
slice(0, 3, None)
>>> field2slice("-3")
slice(None, 3, None)
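The resulting slice picks 1-based column ranges out of a split line. A sketch of that usage; applying the slice to whitespace-split tokens is an assumption based on the convention above:

>>> tokens = "uttid hello world again".split()
>>> tokens[field2slice("2-3")]
['hello', 'world']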
espnet2.bin.tokenize_text.tokenize(input: str, output: str, field: Optional[str], delimiter: Optional[str], token_type: str, space_symbol: str, non_linguistic_symbols: Optional[str], bpemodel: Optional[str], log_level: str, write_vocabulary: bool, vocabulary_size: int, remove_non_linguistic_symbols: bool, cutoff: int, add_symbol: List[str], cleaner: Optional[str], g2p: Optional[str])[source]¶
espnet2.bin.tts_inference¶
Script to run inference of a text-to-speech model.
class espnet2.bin.tts_inference.Text2Speech(train_config: Union[pathlib.Path, str] = None, model_file: Union[pathlib.Path, str] = None, threshold: float = 0.5, minlenratio: float = 0.0, maxlenratio: float = 10.0, use_teacher_forcing: bool = False, use_att_constraint: bool = False, backward_window: int = 1, forward_window: int = 3, speed_control_alpha: float = 1.0, noise_scale: float = 0.667, noise_scale_dur: float = 0.8, vocoder_config: Union[pathlib.Path, str] = None, vocoder_file: Union[pathlib.Path, str] = None, dtype: str = 'float32', device: str = 'cpu', seed: int = 777, always_fix_seed: bool = False)[source]¶
Bases: object

Text2Speech class.

Examples
>>> from espnet2.bin.tts_inference import Text2Speech
>>> # Case 1: Load the local model and use Griffin-Lim vocoder
>>> text2speech = Text2Speech(
>>>     train_config="/path/to/config.yml",
>>>     model_file="/path/to/model.pth",
>>> )
>>> # Case 2: Load the local model and the pretrained vocoder
>>> text2speech = Text2Speech.from_pretrained(
>>>     train_config="/path/to/config.yml",
>>>     model_file="/path/to/model.pth",
>>>     vocoder_tag="kan-bayashi/ljspeech_tacotron2",
>>> )
>>> # Case 3: Load the pretrained model and use Griffin-Lim vocoder
>>> text2speech = Text2Speech.from_pretrained(
>>>     model_tag="kan-bayashi/ljspeech_tacotron2",
>>> )
>>> # Case 4: Load the pretrained model and the pretrained vocoder
>>> text2speech = Text2Speech.from_pretrained(
>>>     model_tag="kan-bayashi/ljspeech_tacotron2",
>>>     vocoder_tag="parallel_wavegan/ljspeech_parallel_wavegan.v1",
>>> )
>>> # Run inference and save as wav file
>>> import soundfile as sf
>>> wav = text2speech("Hello, World")["wav"]
>>> sf.write("out.wav", wav.numpy(), text2speech.fs, "PCM_16")
Initialize Text2Speech module.
static from_pretrained(model_tag: Optional[str] = None, vocoder_tag: Optional[str] = None, **kwargs)[source]¶
Build a Text2Speech instance from a pretrained model.

- Parameters
model_tag (Optional[str]) – Model tag of the pretrained models. Currently, the tags of espnet_model_zoo are supported.
vocoder_tag (Optional[str]) – Vocoder tag of the pretrained vocoders. Currently, the tags of parallel_wavegan are supported, which should start with the prefix “parallel_wavegan/”.
- Returns
Text2Speech instance.
- Return type
Text2Speech
property fs¶
Return the sampling rate.

property use_lids¶
Return whether lid is needed in the inference.

property use_sids¶
Return whether sid is needed in the inference.

property use_speech¶
Return whether speech is needed in the inference.

property use_spembs¶
Return whether spemb is needed in the inference.
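These properties let a caller assemble only the inputs a given model needs before running inference. A minimal sketch, assuming the call keywords spembs and sids match the corresponding properties (the array shapes and values are illustrative):

>>> import numpy as np
>>> kwargs = {}
>>> if text2speech.use_spembs:
>>>     kwargs["spembs"] = np.zeros(192, dtype=np.float32)  # illustrative speaker embedding
>>> if text2speech.use_sids:
>>>     kwargs["sids"] = np.array([1])  # illustrative speaker id
>>> wav = text2speech("Hello, World", **kwargs)["wav"]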
espnet2.bin.tts_inference.inference(output_dir: str, batch_size: int, dtype: str, ngpu: int, seed: int, num_workers: int, log_level: Union[int, str], data_path_and_name_and_type: Sequence[Tuple[str, str, str]], key_file: Optional[str], train_config: Optional[str], model_file: Optional[str], model_tag: Optional[str], threshold: float, minlenratio: float, maxlenratio: float, use_teacher_forcing: bool, use_att_constraint: bool, backward_window: int, forward_window: int, speed_control_alpha: float, noise_scale: float, noise_scale_dur: float, always_fix_seed: bool, allow_variable_data_keys: bool, vocoder_config: Optional[str], vocoder_file: Optional[str], vocoder_tag: Optional[str])[source]¶
Run text-to-speech inference.