espnet.asr package¶

espnet.asr.asr_mix_utils¶

class espnet.asr.asr_mix_utils.PlotAttentionReport(att_vis_fn, data, outdir, converter, device, reverse=False)[source]¶

Bases: chainer.training.extension.Extension

Plot attention reporter.

Parameters

att_vis_fn (espnet.nets.*_backend.e2e_asr.calculate_all_attentions) – Function of attention visualization.
data (list[tuple(str, dict[str, dict[str, Any]])]) – List json utt key items.
outdir (str) – Directory to save figures.
converter (espnet.asr.*_backend.asr.CustomConverter) – CustomConverter object. Function to convert data.
device (torch.device) – The destination device to send tensor.
reverse (bool) – If True, input and output length are reversed.

draw_attention_plot(att_w)[source]¶

Visualize attention weights matrix.

Parameters: att_w (Tensor) – Attention weight matrix.
Returns: pyplot object with attention matrix image.
Return type: matplotlib.pyplot

get_attention_weight(idx, att_w, spkr_idx)[source]¶: Transform attention weight with regard to self.reverse.

get_attention_weights()[source]¶

Return attention weights.

Returns

Attention weights (dtype: float). The shape differs by backend: * pytorch-> 1) multi-head case => (B, H, Lmax, Tmax), 2) other case => (B, Lmax, Tmax). * chainer-> (B, Lmax, Tmax).

Return type

arr_ws_sd (numpy.ndarray)

log_attentions(logger, step)[source]¶: Add image files of attention matrix to tensorboard.

espnet.asr.asr_mix_utils.add_results_to_json(js, nbest_hyps_sd, char_list)[source]¶

Add N-best results to json.

Parameters

js (dict[str, Any]) – Groundtruth utterance dict.
nbest_hyps_sd (list[dict[str, Any]]) – List of hypotheses for multiple speakers (# Utts x # Spkrs).
char_list (list[str]) – List of characters.

Returns

N-best results added utterance dict.

Return type

dict[str, Any]

espnet.asr.asr_mix_utils.make_batchset(data, batch_size, max_length_in, max_length_out, num_batches=0, min_batch_size=1)[source]¶

Make batch set from json dictionary.

Parameters

data (Dict[str, List[Any]]) – Dictionary loaded from data.json.
batch_size (int) – Batch size.
max_length_in (int) – Maximum length of input to decide adaptive batch size.
max_length_out (int) – Maximum length of output to decide adaptive batch size.
num_batches (int) – Number of batches to use (for debug).
min_batch_size (int) – Minimum batch size (for multi-gpu).

Returns

List of batches.

Return type

List[Tuple(str, Dict[str, List[dict[str, Any]]])]
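The adaptive batch size controlled by max_length_in and max_length_out can be pictured with the sketch below. It is not the library's implementation: the helper name adaptive_batch_size and the shrink rule (dividing batch_size by one plus the larger of the two length ratios) are assumptions made for illustration.

```python
def adaptive_batch_size(ilen, olen, batch_size, max_length_in, max_length_out,
                        min_batch_size=1):
    """Hypothetical helper: shrink the batch size for long utterances.

    Assumption: the batch is divided by
    1 + max(ilen // max_length_in, olen // max_length_out)
    and never drops below min_batch_size.
    """
    factor = max(ilen // max_length_in, olen // max_length_out)
    return max(min_batch_size, batch_size // (1 + factor))


# Lengths below both limits: the full batch size is kept.
short = adaptive_batch_size(400, 50, batch_size=30,
                            max_length_in=800, max_length_out=150)
# Input twice as long as max_length_in: the batch is cut to a third.
long_ = adaptive_batch_size(1600, 50, batch_size=30,
                            max_length_in=800, max_length_out=150)
print(short, long_)
```

Under this rule, long utterances end up in smaller batches, which keeps the padded batch roughly bounded in size.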

espnet.asr.asr_utils¶

class espnet.asr.asr_utils.CompareValueTrigger(key, compare_fn, trigger=(1, 'epoch'))[source]¶

Bases: object

Trigger invoked when the value of key becomes larger or smaller than before.

Parameters

key (str) – Key of value.
compare_fn ((float, float) -> bool) – Function to compare the values.
trigger (tuple(int, str)) – Trigger that decides the comparison interval.
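The role of compare_fn can be illustrated with a small stand-in. BestValueTracker is a hypothetical class written for this example, not the ESPnet class. It assumes that compare_fn is called as compare_fn(previous_best, current) and returns True when the trigger should fire.

```python
class BestValueTracker:
    """Hypothetical stand-in that mimics the idea of a compare-value trigger."""

    def __init__(self, compare_fn):
        self.compare_fn = compare_fn
        self.best_value = None

    def __call__(self, current_value):
        # The first observation only initializes the reference value.
        if self.best_value is None:
            self.best_value = current_value
            return False
        # Assumption: compare_fn(best, current) is True when the trigger fires.
        if self.compare_fn(self.best_value, current_value):
            return True
        # Otherwise the current value becomes the new reference.
        self.best_value = current_value
        return False


# Fire when validation accuracy falls below the reference value.
acc_dropped = BestValueTracker(lambda best, current: best > current)
fired = [acc_dropped(acc) for acc in [0.70, 0.75, 0.74, 0.80]]
print(fired)
```

A trigger of this kind is typically paired with an extension that reacts to the degradation, for example by decaying an optimizer hyper-parameter.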

class espnet.asr.asr_utils.PlotAttentionReport(att_vis_fn, data, outdir, converter, transform, device, reverse=False, ikey='input', iaxis=0, okey='output', oaxis=0)[source]¶

Bases: chainer.training.extension.Extension

Plot attention reporter.

Parameters

att_vis_fn (espnet.nets.*_backend.e2e_asr.E2E.calculate_all_attentions) – Function of attention visualization.
data (list[tuple(str, dict[str, list[Any]])]) – List json utt key items.
outdir (str) – Directory to save figures.
converter (espnet.asr.*_backend.asr.CustomConverter) – Function to convert data.
device (int | torch.device) – Device.
reverse (bool) – If True, input and output length are reversed.
ikey (str) – Key to access input (for ASR ikey="input", for MT ikey="output".)
iaxis (int) – Dimension to access input (for ASR iaxis=0, for MT iaxis=1.)
okey (str) – Key to access output (for ASR okey="input", for MT okey="output".)

draw_attention_plot(att_w)[source]¶

Plot the att_w matrix.

Returns: pyplot object with attention matrix image.
Return type: matplotlib.pyplot

get_attention_weight(idx, att_w)[source]¶: Transform attention matrix with regard to self.reverse.

get_attention_weights()[source]¶

Return attention weights.

Returns

Attention weights (dtype: float). The shape differs by backend: * pytorch-> 1) multi-head case => (B, H, Lmax, Tmax), 2) other case => (B, Lmax, Tmax). * chainer-> (B, Lmax, Tmax).

Return type

numpy.ndarray

log_attentions(logger, step)[source]¶: Add image files of att_ws matrix to the tensorboard.

espnet.asr.asr_utils.adadelta_eps_decay(eps_decay)[source]¶

Extension to perform adadelta eps decay.

Parameters: eps_decay (float) – Decay rate of eps.
Returns: An extension function.
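As a rough picture of what an eps decay step does, the sketch below multiplies the eps value of every parameter group by eps_decay. The helper decay_eps and the multiplicative rule are assumptions for illustration. The list-of-dicts layout mirrors the param_groups attribute of PyTorch optimizers.

```python
def decay_eps(param_groups, eps_decay):
    """Hypothetical helper: scale the eps of each parameter group in place."""
    for group in param_groups:
        group["eps"] *= eps_decay
    return param_groups


# Stand-in for optimizer.param_groups with a single group.
groups = [{"eps": 1e-8}]
decay_eps(groups, eps_decay=0.01)
print(groups[0]["eps"])
```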

espnet.asr.asr_utils.add_gradient_noise(model, iteration, duration=100, eta=1.0, scale_factor=0.55)[source]¶

Adds noise from a standard normal distribution to the gradients.

The standard deviation (sigma) is controlled by the three hyper-parameters below. sigma goes to zero (no noise) with more iterations.

Parameters

model (torch.nn.Module) – Model.
iteration (int) – Number of iterations.
duration (int) – Number of iterations per interval; sigma changes once per interval.
eta (float) – The magnitude of sigma.
scale_factor (float) – The scale of sigma.
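One schedule consistent with the description above is sketched below. The formula sigma = eta / interval ** scale_factor, with interval = iteration // duration + 1, is an assumption for illustration, as are the helper names noise_sigma and add_noise. Gradients are modeled as plain floats rather than tensors.

```python
import random


def noise_sigma(iteration, duration=100, eta=1.0, scale_factor=0.55):
    """Assumed schedule: sigma shrinks each time `duration` iterations pass."""
    interval = iteration // duration + 1
    return eta * interval ** (-scale_factor)


def add_noise(grads, iteration, rng, duration=100, eta=1.0, scale_factor=0.55):
    """Add zero-mean Gaussian noise with the scheduled sigma to each gradient."""
    sigma = noise_sigma(iteration, duration, eta, scale_factor)
    return [g + sigma * rng.gauss(0.0, 1.0) for g in grads]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = add_noise([0.0, 0.0, 0.0], iteration=250, rng=rng)
sigmas = [noise_sigma(i) for i in (0, 100, 1000, 10000)]
print(sigmas)
```

With these defaults sigma stays at eta for the first duration iterations and then decreases step by step toward zero.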

espnet.asr.asr_utils.add_results_to_json(js, nbest_hyps, char_list)[source]¶

Add N-best results to json.

Parameters

js (dict[str, Any]) – Groundtruth utterance dict.
nbest_hyps (list[dict[str, Any]]) – List of hypotheses for multi_speakers: nutts x nspkrs.
char_list (list[str]) – List of characters.

Returns

N-best results added utterance dict.

Return type

dict[str, Any]

espnet.asr.asr_utils.chainer_load(path, model)[source]¶

Load chainer model parameters.

Parameters

path (str) – Model path or snapshot file path to be loaded.
model (chainer.Chain) – Chainer model.

espnet.asr.asr_utils.get_model_conf(model_path, conf_path=None)[source]¶

Get model config information by reading a model config file (model.json).

Parameters

model_path (str) – Model path.
conf_path (str) – Optional model config path.

Returns

Config information loaded from json file.

Return type

list[int, int, dict[str, Any]]

espnet.asr.asr_utils.parse_hypothesis(hyp, char_list)[source]¶

Parse hypothesis.

Parameters

hyp (list[dict[str, Any]]) – Recognition hypothesis.
char_list (list[str]) – List of characters.

Returns

tuple(str, str, str, float)

espnet.asr.asr_utils.plot_spectrogram(plt, spec, mode='db', fs=None, frame_shift=None, bottom=True, left=True, right=True, top=False, labelbottom=True, labelleft=True, labelright=True, labeltop=False, cmap='inferno')[source]¶

Plot spectrogram using matplotlib.

Parameters

plt (matplotlib.pyplot) – pyplot object.
spec (numpy.ndarray) – Input stft (Freq, Time)
mode (str) – db or linear.
fs (int) – Sample frequency. To convert y-axis to kHz unit.
frame_shift (int) – The frame shift of stft. To convert x-axis to second unit.
bottom (bool) – Whether to draw the respective ticks.
left (bool) –
right (bool) –
top (bool) –
labelbottom (bool) – Whether to draw the respective tick labels.
labelleft (bool) –
labelright (bool) –
labeltop (bool) –
cmap (str) – Colormap defined in matplotlib.

espnet.asr.asr_utils.restore_snapshot(model, snapshot, load_fn=<function load_npz>)[source]¶

Extension to restore snapshot.

Returns: An extension function.

espnet.asr.asr_utils.snapshot_object(target, filename)[source]¶

Returns a trainer extension to take snapshots of a given object.

Parameters

target (model) – Object to serialize.
filename (str) – Name of the file into which the object is serialized. It can be a format string, where the trainer object is passed to the str.format() method. For example, 'snapshot_{.updater.iteration}' is converted to 'snapshot_10000' at the 10,000th iteration.

Returns

An extension function.
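The format-string behaviour of filename can be tried with plain str.format. The trainer below is a stand-in built from SimpleNamespace and models only the attributes that the format strings read. It is not a real chainer trainer.

```python
from types import SimpleNamespace

# Stand-in trainer: just enough structure for the attribute lookups.
trainer = SimpleNamespace(updater=SimpleNamespace(iteration=10000, epoch=3))

# '{.updater.iteration}' looks up trainer.updater.iteration on the first argument.
by_iteration = "snapshot_{.updater.iteration}".format(trainer)
by_epoch = "snapshot.ep.{.updater.epoch}".format(trainer)

print(by_iteration)  # snapshot_10000
print(by_epoch)      # snapshot.ep.3
```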

espnet.asr.asr_utils.torch_load(path, model)[source]¶

Load torch model states.

Parameters

path (str) – Model path or snapshot file path to be loaded.
model (torch.nn.Module) – Torch model.

espnet.asr.asr_utils.torch_resume(snapshot_path, trainer)[source]¶

Resume from snapshot for pytorch.

Parameters

snapshot_path (str) – Snapshot file path.
trainer (chainer.training.Trainer) – Chainer’s trainer instance.

espnet.asr.asr_utils.torch_save(path, model)[source]¶

Save torch model states.

Parameters

path (str) – Model path to be saved.
model (torch.nn.Module) – Torch model.

espnet.asr.asr_utils.torch_snapshot(savefun=<function save>, filename='snapshot.ep.{.updater.epoch}')[source]¶

Extension to take snapshot of the trainer for pytorch.

Returns: An extension function.

espnet.asr.chainer_backend.asr¶

espnet.asr.chainer_backend.asr.recog(args)[source]¶

Decode with the given args.

Parameters: args (namespace) – The program arguments.

espnet.asr.chainer_backend.asr.train(args)[source]¶

Train with the given args.

Parameters: args (namespace) – The program arguments.

espnet.asr.pytorch_backend.asr¶

class espnet.asr.pytorch_backend.asr.CustomConverter(subsampling_factor=1, dtype=torch.float32)[source]¶

Bases: object

Custom batch converter for Pytorch.

Parameters

subsampling_factor (int) – The subsampling factor.
dtype (torch.dtype) – Data type to convert.
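The effect of subsampling_factor on a batch can be sketched without any framework. The helper subsample_and_pad is hypothetical. It assumes that the converter keeps every subsampling_factor-th frame of each utterance and then zero-pads the batch to a common length. Frames are modeled as single floats instead of feature vectors.

```python
def subsample_and_pad(xs, subsampling_factor=1, pad_value=0.0):
    """Hypothetical helper: subsample frames, then pad to the longest utterance."""
    if subsampling_factor > 1:
        # Keep every subsampling_factor-th frame of each utterance.
        xs = [x[::subsampling_factor] for x in xs]
    ilens = [len(x) for x in xs]
    max_len = max(ilens)
    padded = [list(x) + [pad_value] * (max_len - len(x)) for x in xs]
    return padded, ilens


batch = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
padded, ilens = subsample_and_pad(batch, subsampling_factor=2)
print(padded, ilens)
```

Returning the unpadded lengths alongside the padded batch lets the model mask out the padding later.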

class espnet.asr.pytorch_backend.asr.CustomEvaluator(model, iterator, target, converter, device, ngpu=None)[source]¶

Bases: espnet.utils.training.evaluator.BaseEvaluator

Custom Evaluator for Pytorch.

Parameters

model (torch.nn.Module) – The model to evaluate.
iterator (chainer.dataset.Iterator) – The train iterator.
target (link | dict[str, link]) – Link object or a dictionary of links to evaluate. If this is just a link object, the link is registered by the name 'main'.
converter (espnet.asr.pytorch_backend.asr.CustomConverter) – Converter function to build input arrays. Each batch extracted by the main iterator and the device option are passed to this function. chainer.dataset.concat_examples() is used by default.
device (torch.device) – The device used.
ngpu (int) – The number of GPUs.

evaluate()[source]¶: Main evaluate routine for CustomEvaluator.

class espnet.asr.pytorch_backend.asr.CustomUpdater(model, grad_clip_threshold, train_iter, optimizer, converter, device, ngpu, grad_noise=False, accum_grad=1, use_apex=False)[source]¶

Bases: chainer.training.updaters.standard_updater.StandardUpdater

Custom Updater for Pytorch.

Parameters

model (torch.nn.Module) – The model to update.
grad_clip_threshold (float) – The gradient clipping value to use.
train_iter (chainer.dataset.Iterator) – The training iterator.
optimizer (torch.optim.Optimizer) – The training optimizer.
converter (espnet.asr.pytorch_backend.asr.CustomConverter) – Converter function to build input arrays. Each batch extracted by the main iterator and the device option are passed to this function. chainer.dataset.concat_examples() is used by default.
device (torch.device) – The device to use.
ngpu (int) – The number of gpus to use.
use_apex (bool) – The flag to use Apex in backprop.

update()[source]¶

Updates the parameters of the target model.

This method implements an update formula for the training task, including data loading, forward/backward computations, and actual updates of parameters.

This method is called once at each iteration of the training loop.

update_core()[source]¶: Main update routine of the CustomUpdater.

espnet.asr.pytorch_backend.asr.enhance(args)[source]¶

Dump enhanced speech and mask.

Parameters: args (namespace) – The program arguments.

espnet.asr.pytorch_backend.asr.recog(args)[source]¶

Decode with the given args.

Parameters: args (namespace) – The program arguments.

espnet.asr.pytorch_backend.asr.train(args)[source]¶

Train with the given args.

Parameters: args (namespace) – The program arguments.

espnet.asr.pytorch_backend.asr_init¶

espnet.asr.pytorch_backend.asr_init.filter_modules(model_state_dict, modules)[source]¶

Filter non-matched modules in model_state_dict.

Parameters

model_state_dict (odict) – trained model state_dict
modules (list) – specified module list for transfer

Returns

the updated module list

Return type

new_mods (list)
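A minimal sketch of this kind of filtering is shown below. The helper filter_by_prefix and its matching rule (a module is kept when its name is a prefix of at least one state_dict key) are assumptions for illustration, and the values are placeholders rather than tensors.

```python
from collections import OrderedDict


def filter_by_prefix(model_state_dict, modules):
    """Hypothetical helper: keep module names that prefix at least one key."""
    kept = []
    for mod in modules:
        if any(key.startswith(mod) for key in model_state_dict):
            kept.append(mod)
    return kept


state = OrderedDict([("enc.l0.weight", None), ("dec.embed.weight", None)])
new_mods = filter_by_prefix(state, ["enc", "ctc"])
print(new_mods)  # ['enc']
```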

espnet.asr.pytorch_backend.asr_init.get_partial_asr_mt_state_dict(model_state_dict, modules)[source]¶

Create state_dict with specified modules matching input model modules.

Parameters

model_state_dict (odict) – trained model state_dict
modules (list) – specified module list for transfer

Returns

the updated state_dict

Return type

new_state_dict (odict)
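Selecting the entries that belong to the requested modules might look like the sketch below. The helper partial_state_dict and the prefix-matching rule are assumptions for illustration. Strings stand in for tensors.

```python
from collections import OrderedDict


def partial_state_dict(model_state_dict, modules):
    """Hypothetical helper: keep entries whose key starts with a listed module."""
    new_state_dict = OrderedDict()
    for key, value in model_state_dict.items():
        if any(key.startswith(mod) for mod in modules):
            new_state_dict[key] = value
    return new_state_dict


state = OrderedDict([
    ("enc.l0.weight", "W_enc"),
    ("enc.l0.bias", "b_enc"),
    ("dec.embed.weight", "W_dec"),
])
enc_only = partial_state_dict(state, ["enc"])
print(list(enc_only))  # ['enc.l0.weight', 'enc.l0.bias']
```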

espnet.asr.pytorch_backend.asr_init.get_partial_lm_state_dict(model_state_dict, modules)[source]¶

Create compatible ASR state_dict from model_state_dict (LM).

The keys for specified modules are modified to match ASR decoder modules keys.

Parameters

model_state_dict (odict) – trained model state_dict
modules (list) – specified module list for transfer

Returns

the updated state_dict, together with new_mods (list), the updated module list

Return type

new_state_dict (odict)

espnet.asr.pytorch_backend.asr_init.get_trained_model_state_dict(model_path)[source]¶

Extract the trained model state dict for pre-initialization.

Parameters: model_path (str) – Path to model.***.best
Returns: the loaded model state_dict, and a str giving the type of model (either ASR/MT or LM).
Return type: model.state_dict() (odict)

espnet.asr.pytorch_backend.asr_init.load_trained_model(model_path)[source]¶

Load the trained model for recognition.

Parameters: model_path (str) – Path to model.***.best

espnet.asr.pytorch_backend.asr_init.load_trained_modules(idim, odim, args)[source]¶

Load model encoder and/or decoder modules with ESPnet pre-trained model(s).

Parameters

idim (int) – initial input dimension.
odim (int) – initial output dimension.
args (namespace) – The initial model arguments.

Returns

The model with pretrained modules.

Return type

model (torch.nn.Module)

espnet.asr.pytorch_backend.asr_init.transfer_verification(model_state_dict, partial_state_dict, modules)[source]¶

Verify tuples (key, shape) for input model modules match specified modules.

Parameters

model_state_dict (odict) – the initial model state_dict
partial_state_dict (odict) – the trained model state_dict
modules (list) – specified module list for transfer

Returns

Whether the transfer is allowed.

Return type

(boolean)
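The (key, shape) comparison can be sketched as follows. The helper shapes_match is hypothetical, and the exact comparison (sorted pairs restricted to keys under the listed modules) is an assumption. Parameters are modeled with SimpleNamespace objects that carry only a shape attribute.

```python
from types import SimpleNamespace


def shapes_match(model_state_dict, partial_state_dict, modules):
    """Hypothetical check: do both dicts agree on (key, shape) for the modules?"""
    def pairs(state_dict):
        return sorted(
            (key, tuple(value.shape))
            for key, value in state_dict.items()
            if any(key.startswith(mod) for mod in modules)
        )
    return pairs(model_state_dict) == pairs(partial_state_dict)


def param(*shape):
    """Stand-in for a tensor: only the shape is modeled."""
    return SimpleNamespace(shape=shape)


model = {"enc.w": param(4, 8), "dec.w": param(8, 2)}
trained = {"enc.w": param(4, 8)}
mismatched = {"enc.w": param(4, 16)}

print(shapes_match(model, trained, ["enc"]))     # True
print(shapes_match(model, mismatched, ["enc"]))  # False
```

A check like this catches dimension mismatches before any weights are copied.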

espnet.asr.pytorch_backend.asr_mix¶

espnet.asr.pytorch_backend.recog¶

V2 backend for asr_recog.py using espnet.nets.beam_search.BeamSearch.

espnet.asr.pytorch_backend.recog.recog_v2(args)[source]¶

Decode with custom models that implement ScorerInterface.

Notes

The previous backend espnet.asr.pytorch_backend.asr.recog only supports E2E and RNNLM.

Parameters: args (namespace) – The program arguments. See espnet.bin.asr_recog.get_parser for details.