espnet2.layers package

espnet2.layers.__init__

espnet2.layers.abs_normalize

class espnet2.layers.abs_normalize.AbsNormalize

Bases: torch.nn.modules.module.Module, abc.ABC

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(input: torch.Tensor, input_lengths: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

espnet2.layers.global_mvn

class espnet2.layers.global_mvn.GlobalMVN(stats_file: Union[pathlib.Path, str], norm_means: bool = True, norm_vars: bool = True, eps: float = 1e-20)

Bases: espnet2.layers.abs_normalize.AbsNormalize, espnet2.layers.inversible_interface.InversibleInterface

Apply global mean and variance normalization.

TODO(kamo): Make this class portable somehow

Parameters

stats_file – Path to the .npy statistics file.
norm_means – Apply mean normalization.
norm_vars – Apply variance normalization.
eps – Lower bound on the variance, for numerical stability.

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]

Forward function.

Parameters

x – (B, L, …)
ilens – (B,)

inverse(x: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]

Undo the normalization applied by forward().
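A minimal pure-Python sketch of the arithmetic behind global MVN and its inverse, assuming the per-dimension mean and standard deviation have already been loaded from the statistics file. The helpers global_mvn and global_mvn_inverse are hypothetical, and zeroing frames beyond the valid length is a choice of this sketch; it is not the module's tensor code.

```python
from typing import List, Sequence

Frames = List[List[float]]  # one utterance: (Time, Dim)


def global_mvn(x: Frames, length: int, mean: Sequence[float], std: Sequence[float],
               norm_means: bool = True, norm_vars: bool = True) -> Frames:
    """Normalize the valid frames with corpus-level statistics (hypothetical helper)."""
    out = []
    for t, frame in enumerate(x):
        if t >= length:
            out.append([0.0] * len(frame))  # padded frame stays zero
            continue
        y = list(frame)
        if norm_means:
            y = [v - m for v, m in zip(y, mean)]
        if norm_vars:
            y = [v / s for v, s in zip(y, std)]
        out.append(y)
    return out


def global_mvn_inverse(x: Frames, length: int, mean: Sequence[float], std: Sequence[float],
                       norm_means: bool = True, norm_vars: bool = True) -> Frames:
    """Undo global_mvn on the valid frames by applying the steps in reverse order."""
    out = []
    for t, frame in enumerate(x):
        if t >= length:
            out.append([0.0] * len(frame))
            continue
        y = list(frame)
        if norm_vars:
            y = [v * s for v, s in zip(y, std)]
        if norm_means:
            y = [v + m for v, m in zip(y, mean)]
        out.append(y)
    return out


feats = [[1.0, 10.0], [3.0, 14.0], [0.0, 0.0]]  # last frame is padding
mean, std = [2.0, 12.0], [1.0, 2.0]
normed = global_mvn(feats, length=2, mean=mean, std=std)
print(normed)  # [[-1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
restored = global_mvn_inverse(normed, length=2, mean=mean, std=std)
print(restored)  # [[1.0, 10.0], [3.0, 14.0], [0.0, 0.0]]
```

Because the statistics are fixed rather than computed per utterance, the transform is exactly invertible, which is why this class also implements InversibleInterface.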

espnet2.layers.inversible_interface

class espnet2.layers.inversible_interface.InversibleInterface

Bases: abc.ABC

abstract inverse(input: torch.Tensor, input_lengths: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]

espnet2.layers.label_aggregation

class espnet2.layers.label_aggregation.LabelAggregate(win_length: int = 512, hop_length: int = 128, center: bool = True)

Bases: torch.nn.modules.module.Module

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(input: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, Optional[torch.Tensor]]

LabelAggregate forward function.

Parameters

input – (Batch, Nsamples, Label_dim)
ilens – (Batch)

Returns

output – (Batch, Frames, Label_dim)
output lengths – (Batch,), or None if ilens is None

Return type

Tuple[torch.Tensor, Optional[torch.Tensor]]
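The following pure-Python sketch illustrates frame-level label aggregation for a single label dimension. It assumes a majority-vote rule (a frame is active when more than half of the samples in its window are active) and a frame count computed like a center-padded STFT with win_length // 2 samples of padding per side; both are assumptions for illustration, and the helper names are hypothetical. The aggregation itself is shown with center=False so no padding values need to be chosen.

```python
from typing import List


def num_frames(n_samples: int, win_length: int, hop_length: int, center: bool) -> int:
    """Number of windows of win_length samples taken every hop_length samples."""
    if center:
        # Assumed: win_length // 2 padding samples on each side.
        n_samples += 2 * (win_length // 2)
    return (n_samples - win_length) // hop_length + 1


def aggregate_labels(labels: List[int], win_length: int, hop_length: int) -> List[float]:
    """Majority vote over each window (center=False, so no padding is applied)."""
    n = num_frames(len(labels), win_length, hop_length, center=False)
    out = []
    for f in range(n):
        window = labels[f * hop_length : f * hop_length + win_length]
        out.append(1.0 if sum(window) > win_length // 2 else 0.0)
    return out


print(aggregate_labels([0, 0, 1, 1, 1, 1, 0, 0], win_length=4, hop_length=2))  # [0.0, 1.0, 0.0]
print(num_frames(16000, 512, 128, center=True))  # 126
```

With the default win_length and hop_length, one second of 16 kHz sample-level labels maps to 126 frames under this assumed rule, which lines up with the frame count of an STFT using n_fft=512 and hop_length=128.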

espnet2.layers.log_mel

class espnet2.layers.log_mel.LogMel(fs: int = 16000, n_fft: int = 512, n_mels: int = 80, fmin: float = None, fmax: float = None, htk: bool = False, log_base: float = None)

Bases: torch.nn.modules.module.Module

Convert STFT features to log-Mel filterbank features.

Except for log_base, the arguments are the same as those of librosa.filters.mel.

Parameters

fs – number > 0 [scalar] sampling rate of the incoming signal
n_fft – int > 0 [scalar] number of FFT components
n_mels – int > 0 [scalar] number of Mel bands to generate
fmin – float >= 0 [scalar] lowest frequency (in Hz)
fmax – float >= 0 [scalar] highest frequency (in Hz). If None, use fmax = fs / 2.0
htk – use the HTK formula instead of Slaney
log_base – base of the logarithm applied to the Mel features; if None, the natural logarithm is used
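A small pure-Python sketch of the librosa conventions listed above: the Slaney mel scale (used when htk is False), the HTK formula, band edges spaced evenly on the mel scale between fmin and fmax (with fmax defaulting to fs / 2), and a logarithm with an optional base. The helper names are hypothetical; the module itself builds its filterbank with librosa, which also applies filter normalization not shown here.

```python
import math
from typing import List, Optional

F_SP = 200.0 / 3          # Slaney: 66.67 Hz per mel in the linear region
MIN_LOG_HZ = 1000.0       # start of the logarithmic region
MIN_LOG_MEL = MIN_LOG_HZ / F_SP
LOGSTEP = math.log(6.4) / 27.0


def hz_to_mel(f: float, htk: bool = False) -> float:
    """Hz -> mel with the Slaney (default) or HTK formula."""
    if htk:
        return 2595.0 * math.log10(1.0 + f / 700.0)
    if f < MIN_LOG_HZ:
        return f / F_SP
    return MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOGSTEP


def mel_to_hz(m: float, htk: bool = False) -> float:
    """Inverse of hz_to_mel."""
    if htk:
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    if m < MIN_LOG_MEL:
        return m * F_SP
    return MIN_LOG_HZ * math.exp(LOGSTEP * (m - MIN_LOG_MEL))


def mel_band_edges(fs: int, n_mels: int, fmin: float = 0.0,
                   fmax: Optional[float] = None, htk: bool = False) -> List[float]:
    """n_mels + 2 frequencies (Hz) evenly spaced on the mel scale."""
    if fmax is None:
        fmax = fs / 2.0
    lo, hi = hz_to_mel(fmin, htk), hz_to_mel(fmax, htk)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step, htk) for i in range(n_mels + 2)]


def log_with_base(v: float, log_base: Optional[float] = None) -> float:
    """Natural log when log_base is None, otherwise the log in the given base."""
    return math.log(v) if log_base is None else math.log(v) / math.log(log_base)


edges = mel_band_edges(fs=16000, n_mels=80)
print(len(edges), edges[0], round(edges[-1], 3))  # 82 0.0 8000.0
```

Each of the n_mels triangular filters spans three consecutive edges, which is why n_mels + 2 points are generated.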

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(feat: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

espnet2.layers.mask_along_axis

class espnet2.layers.mask_along_axis.MaskAlongAxis(mask_width_range: Union[int, Sequence[int]] = (0, 30), num_mask: int = 2, dim: Union[int, str] = 'time', replace_with_zero: bool = True)

Bases: torch.nn.modules.module.Module

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(spec: torch.Tensor, spec_lengths: torch.Tensor = None)

Forward function.

Parameters

spec – (Batch, Length, Freq)

class espnet2.layers.mask_along_axis.MaskAlongAxisVariableMaxWidth(mask_width_ratio_range: Union[float, Sequence[float]] = (0.0, 0.05), num_mask: int = 2, dim: Union[int, str] = 'time', replace_with_zero: bool = True)

Bases: torch.nn.modules.module.Module

Mask input spec along a specified axis with variable maximum width.

The maximum mask width scales with the sequence length: max_width = max_width_ratio * seq_len
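A short sketch of turning a ratio range into an absolute mask-width range for a given sequence length. Flooring the products and clipping to [0, seq_len] are assumptions of this sketch, and width_range_from_ratio is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def width_range_from_ratio(ratio_range: Sequence[float], seq_len: int) -> Tuple[int, int]:
    """Convert (min_ratio, max_ratio) into an integer (min_width, max_width)."""
    min_width = max(0, math.floor(seq_len * ratio_range[0]))
    max_width = min(seq_len, math.floor(seq_len * ratio_range[1]))
    return min_width, max_width


print(width_range_from_ratio((0.0, 0.05), 1000))  # (0, 50)
print(width_range_from_ratio((0.0, 0.05), 10))    # (0, 0): an empty range, so no mask would be drawn
```

The benefit over a fixed range is that long and short utterances lose a comparable fraction of their frames.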

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(spec: torch.Tensor, spec_lengths: torch.Tensor = None)

Forward function.

Parameters

spec – (Batch, Length, Freq)

espnet2.layers.mask_along_axis.mask_along_axis(spec: torch.Tensor, spec_lengths: torch.Tensor, mask_width_range: Sequence[int] = (0, 30), dim: int = 1, num_mask: int = 2, replace_with_zero: bool = True)

Apply masks along the specified axis.

Parameters

spec – (Batch, Length, Freq)
spec_lengths – (Batch,): not used in this implementation
mask_width_range – Range from which each mask width is drawn at random
dim – Axis to mask (1 for time, 2 for frequency)
num_mask – Number of masks to apply
replace_with_zero – If True, fill masked regions with zero; otherwise, with the mean value of spec
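The masking procedure can be sketched in pure Python for a single (Length, Freq) spectrogram. This is a simplified illustration, not the library code: widths are drawn from [low, high) (mirroring the exclusive upper bound of torch.randint), each start position is drawn so the mask fits on the axis, and the fill value is 0.0 or the mean of the whole spectrogram. The function mask_2d and its axis convention (0 = time, 1 = frequency) are hypothetical.

```python
import random
from typing import List, Optional, Sequence

Spec = List[List[float]]  # (Length, Freq)


def mask_2d(spec: Spec, mask_width_range: Sequence[int] = (0, 30), axis: int = 0,
            num_mask: int = 2, replace_with_zero: bool = True,
            rng: Optional[random.Random] = None) -> Spec:
    """Return a copy of spec with num_mask random bands masked along axis."""
    rng = rng or random.Random()
    out = [list(row) for row in spec]
    n_rows, n_cols = len(out), len(out[0])
    size = n_rows if axis == 0 else n_cols
    value = 0.0 if replace_with_zero else sum(map(sum, spec)) / (n_rows * n_cols)
    for _ in range(num_mask):
        width = rng.randrange(mask_width_range[0], mask_width_range[1])
        start = rng.randrange(0, max(1, size - width))
        for i in range(start, min(start + width, size)):
            if axis == 0:
                out[i] = [value] * n_cols  # mask a whole time frame
            else:
                for row in out:
                    row[i] = value  # mask one frequency bin in every frame
    return out


ones = [[1.0] * 3 for _ in range(10)]
masked = mask_2d(ones, mask_width_range=(2, 3), num_mask=1, rng=random.Random(0))
print(sum(map(sum, masked)))  # 24.0: exactly two of the ten frames were zeroed
```

Masks may overlap when num_mask > 1, so the total masked area can be smaller than num_mask times the width.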

espnet2.layers.sinc_conv

Sinc convolutions.

class espnet2.layers.sinc_conv.BarkScale

Bases: object

Bark frequency scale.

Has wider bandwidths at lower frequencies. See the critical bandwidth (Bark) scale of Zwicker and Terhardt, 1980.

classmethod bank(channels: int, fs: float) → torch.Tensor

Obtain initialization values for the Bark scale.

Parameters

channels – Number of channels.
fs – Sample rate.

Returns

Filter start frequencies and filter stop frequencies, stacked into a single tensor.

Return type

torch.Tensor

static convert(f)

Convert Hz to Bark.

static invert(x)

Convert Bark to Hz.
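Several closed-form Hz-to-Bark approximations exist. The sketch below uses Schroeder's form, bark = 7 asinh(f / 650), purely as an illustration; it is an assumption, and BarkScale.convert may use different constants.

```python
import math


def hz_to_bark(f: float) -> float:
    """Schroeder's approximation of the Bark scale: 7 * asinh(f / 650)."""
    return 7.0 * math.asinh(f / 650.0)


def bark_to_hz(z: float) -> float:
    """Exact inverse of hz_to_bark."""
    return 650.0 * math.sinh(z / 7.0)


for f in (100.0, 1000.0, 4000.0):
    print(f, round(hz_to_bark(f), 2))
```

The asinh form is roughly linear below a few hundred Hz and logarithmic above, matching the shape of the critical-band curve.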

class espnet2.layers.sinc_conv.LogCompression

Bases: torch.nn.modules.module.Module

Log Compression Activation.

Activation function log(abs(x) + 1).

The constructor takes no arguments.

forward(x: torch.Tensor) → torch.Tensor

Forward.

Applies the Log Compression function elementwise on tensor x.
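The activation is a one-liner. An elementwise pure-Python version, with a hypothetical function name:

```python
import math
from typing import List


def log_compression(xs: List[float]) -> List[float]:
    """Elementwise log(abs(x) + 1); log1p keeps precision for small |x|."""
    return [math.log1p(abs(x)) for x in xs]


print(log_compression([0.0, -1.0, math.e - 1.0]))
```

Because of the absolute value, x and -x map to the same output, and the result is always non-negative.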

class espnet2.layers.sinc_conv.MelScale

Bases: object

Mel frequency scale.

classmethod bank(channels: int, fs: float) → torch.Tensor

Obtain initialization values for the mel scale.

Parameters

channels – Number of channels.
fs – Sample rate.

Returns

Filter start frequencies and filter stop frequencies, stacked into a single tensor.

Return type

torch.Tensor

static convert(f)

Convert Hz to mel.

static invert(x)

Convert mel to Hz.
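A sketch of how start and stop frequencies for a bank of band-pass filters can be derived from a frequency scale. The following are assumptions, not taken from this page: points are spaced evenly on the scale between a lowest edge (30 Hz here, a hypothetical choice) and fs / 2, channels + 2 points are generated, and filter i spans from point i to point i + 2. The HTK-style mel formula is used; MelScale.convert may use different constants.

```python
import math
from typing import List, Tuple


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_bank(channels: int, fs: float, f_min: float = 30.0) -> List[Tuple[float, float]]:
    """Return (start_hz, stop_hz) for each of `channels` band-pass filters."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(fs / 2.0)
    step = (hi - lo) / (channels + 1)
    points = [mel_to_hz(lo + i * step) for i in range(channels + 2)]
    # Neighboring filters overlap: filter i spans points i .. i + 2.
    return list(zip(points[:-2], points[2:]))


bank = mel_bank(channels=4, fs=16000)
for start, stop in bank:
    print(f"{start:8.1f} Hz -> {stop:8.1f} Hz")
```

Swapping hz_to_mel and mel_to_hz for a Bark conversion yields the corresponding Bark-spaced bank with the same structure.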

class espnet2.layers.sinc_conv.SincConv(in_channels: int, out_channels: int, kernel_size: int, stride: int = 1, padding: int = 0, dilation: int = 1, window_func: str = 'hamming', scale_type: str = 'mel', fs: Union[int, float] = 16000)

Bases: torch.nn.modules.module.Module

Sinc Convolution.

This module performs a convolution using Sinc filters in time domain as kernel. Sinc filters function as band passes in spectral domain. The filtering is done as a convolution in time domain, and no transformation to spectral domain is necessary.

This implementation of the Sinc convolution is heavily inspired by Ravanelli et al. https://github.com/mravanelli/SincNet, and adapted for the ESPnet toolkit. Sinc convolutions can be combined with a log compression activation function, as in: https://arxiv.org/abs/2010.07597

Notes: Currently, the same filters are applied to all input channels. The windowing function is applied to the kernel to obtain a smoother filter, not to the input values, which differs from traditional ASR.
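The band-pass kernel described above can be sketched in pure Python following the SincNet formulation: the difference of two low-pass sinc filters with cutoffs f1 < f2 (given in cycles per sample), multiplied by a Hamming window applied to the kernel. This is an illustration under those assumptions; the module's exact parameterization and normalization may differ, and bandpass_kernel is a hypothetical helper.

```python
import math
from typing import List


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def hamming(n: int, size: int) -> float:
    """Symmetric Hamming window value at index n of a window with `size` taps."""
    return 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1))


def bandpass_kernel(f1: float, f2: float, kernel_size: int) -> List[float]:
    """Windowed FIR kernel passing roughly f1..f2 (cycles per sample)."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    half = kernel_size // 2
    kernel = []
    for n in range(kernel_size):
        t = n - half  # time index centered at zero
        # Ideal band-pass = low-pass(f2) - low-pass(f1).
        h = 2 * f2 * sinc(2 * f2 * t) - 2 * f1 * sinc(2 * f1 * t)
        kernel.append(h * hamming(n, kernel_size))
    return kernel


fs = 16000
k = bandpass_kernel(300 / fs, 3400 / fs, kernel_size=101)
```

Only the two cutoffs per filter need to be learned, which is why an odd kernel size (a symmetric kernel with a well-defined center tap) is required.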

Initialize Sinc convolutions.

Parameters

in_channels – Number of input channels.
out_channels – Number of output channels.
kernel_size – Sinc filter kernel size (needs to be an odd number).
stride – See torch.nn.functional.conv1d.
padding – See torch.nn.functional.conv1d.
dilation – See torch.nn.functional.conv1d.
window_func – Window function on the filter, one of [“hamming”, “none”].
scale_type – Scale used to initialize the filterbank, one of [“mel”, “bark”].
fs (int, float) – Sample rate of the input data.

forward(xs: torch.Tensor) → torch.Tensor

Sinc convolution forward function.

Parameters: xs – Batch in form of torch.Tensor (B, C_in, D_in).
Returns: xs – Batch in form of torch.Tensor (B, C_out, D_out).
Return type: torch.Tensor

get_odim(idim: int) → int

Obtain the output dimension of the filter.
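Assuming the output length follows the rule of torch.nn.functional.conv1d (which the stride, padding, and dilation parameters refer to), the output dimension can be computed as below. conv1d_out_len is a hypothetical sketch, not the method's source.

```python
def conv1d_out_len(idim: int, kernel_size: int, stride: int = 1,
                   padding: int = 0, dilation: int = 1) -> int:
    """Output length of a 1-D convolution (PyTorch Conv1d formula)."""
    return (idim + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


print(conv1d_out_len(400, kernel_size=101))            # 300
print(conv1d_out_len(400, kernel_size=101, stride=2))  # 150
```

With padding = kernel_size // 2 and stride 1, an odd kernel preserves the input length.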

static hamming_window(x: torch.Tensor) → torch.Tensor

Hamming windowing function.

init_filters()

Initialize filters with filterbank values.

static none_window(x: torch.Tensor) → torch.Tensor

Identity-like windowing function.

static sinc(x: torch.Tensor) → torch.Tensor

Sinc function.

espnet2.layers.stft

class espnet2.layers.stft.Stft(n_fft: int = 512, win_length: int = None, hop_length: int = 128, window: Optional[str] = 'hann', center: bool = True, normalized: bool = False, onesided: bool = True)

Bases: torch.nn.modules.module.Module, espnet2.layers.inversible_interface.InversibleInterface

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(input: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, Optional[torch.Tensor]]

STFT forward function.

Parameters

input – (Batch, Nsamples) or (Batch, Nsamples, Channels)
ilens – (Batch)

Returns

output – (Batch, Frames, Freq, 2) or (Batch, Frames, Channels, Freq, 2)
output lengths – (Batch,), or None if ilens is None

Return type

Tuple[torch.Tensor, Optional[torch.Tensor]]
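A naive pure-Python sketch of the quantities above for one channel: the number of frames (assuming n_fft // 2 samples of center padding per side, as torch.stft does when center=True), the number of frequency bins of a one-sided transform (n_fft // 2 + 1), and a direct DFT that produces the (Frames, Freq, 2) real/imaginary layout. It uses a periodic Hann window with win_length = n_fft and skips the center padding itself, so it illustrates the layout rather than reproducing the module.

```python
import math
from typing import List


def stft_num_frames(n_samples: int, n_fft: int, hop_length: int, center: bool = True) -> int:
    if center:
        n_samples += 2 * (n_fft // 2)
    return (n_samples - n_fft) // hop_length + 1


def naive_stft(x: List[float], n_fft: int, hop_length: int) -> List[List[List[float]]]:
    """One-sided STFT with center=False; returns [frame][bin] = [real, imag]."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    frames = []
    for f in range(stft_num_frames(len(x), n_fft, hop_length, center=False)):
        seg = [x[f * hop_length + n] * window[n] for n in range(n_fft)]
        spec = []
        for k in range(n_bins):
            re = sum(s * math.cos(2 * math.pi * k * n / n_fft) for n, s in enumerate(seg))
            im = -sum(s * math.sin(2 * math.pi * k * n / n_fft) for n, s in enumerate(seg))
            spec.append([re, im])
        frames.append(spec)
    return frames


out = naive_stft([1.0] * 16, n_fft=8, hop_length=4)
print(len(out), len(out[0]))                               # 3 5
print(stft_num_frames(16000, n_fft=512, hop_length=128))  # 126
```

With the default settings, one second of 16 kHz audio therefore yields an output of shape (Batch, 126, 257, 2) under these assumptions.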

inverse(input: Union[torch.Tensor, torch_complex.tensor.ComplexTensor], ilens: torch.Tensor = None) → Tuple[torch.Tensor, Optional[torch.Tensor]]

Inverse STFT.

Parameters

input – Tensor(batch, T, F, 2) or ComplexTensor(batch, T, F)
ilens – (batch,)

Returns

wavs – (batch, samples)
ilens – (batch,)

Return type

Tuple[torch.Tensor, Optional[torch.Tensor]]

espnet2.layers.time_warp

Time warp module.

class espnet2.layers.time_warp.TimeWarp(window: int = 80, mode: str = 'bicubic')

Bases: torch.nn.modules.module.Module

Time warping using torch.nn.functional.interpolate.

Parameters

window – Time warp parameter (maximum warp distance, in frames)
mode – Interpolation mode passed to torch.nn.functional.interpolate

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: torch.Tensor, x_lengths: torch.Tensor = None)

Forward function.

Parameters

x – (Batch, Time, Freq)
x_lengths – (Batch,)

espnet2.layers.time_warp.time_warp(x: torch.Tensor, window: int = 80, mode: str = 'bicubic')

Time warping using torch.nn.functional.interpolate.

Parameters

x – (Batch, Time, Freq)
window – Time warp parameter (maximum warp distance, in frames)
mode – Interpolation mode passed to torch.nn.functional.interpolate
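A pure-Python sketch of SpecAugment-style time warping for one (Time, Freq) spectrogram. Assumptions, labeled as such: a random center frame is picked at least window frames from both edges, a new position for it is drawn within window frames, and the left and right parts are resampled so the total length is unchanged. Linear interpolation is substituted for the bicubic default, and resize_time and time_warp here are hypothetical helpers, not the module's code.

```python
import random
from typing import List, Optional

Frames = List[List[float]]  # (Time, Freq)


def resize_time(x: Frames, new_len: int) -> Frames:
    """Linearly resample frames to new_len (align_corners=False-style mapping)."""
    old_len = len(x)
    out = []
    for i in range(new_len):
        pos = (i + 0.5) * old_len / new_len - 0.5
        pos = min(max(pos, 0.0), old_len - 1.0)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        w = pos - lo
        out.append([(1.0 - w) * a + w * b for a, b in zip(x[lo], x[hi])])
    return out


def time_warp(x: Frames, window: int = 80, rng: Optional[random.Random] = None) -> Frames:
    """Stretch one side of a random center frame and squeeze the other."""
    rng = rng or random.Random()
    t = len(x)
    if window < 1 or t - window <= window:
        return [list(frame) for frame in x]  # too short to warp
    center = rng.randrange(window, t - window)
    warped = rng.randrange(center - window, center + window) + 1
    left = resize_time(x[:center], warped)       # frames [0, center) -> warped frames
    right = resize_time(x[center:], t - warped)  # the rest fills the remaining frames
    return left + right


x = [[float(t)] for t in range(20)]
y = time_warp(x, window=5, rng=random.Random(0))
print(len(y))  # 20
```

Both resampled parts are guaranteed to be non-empty: the center is at least window frames from either edge, and the shifted position stays within window frames of it.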

espnet2.layers.utterance_mvn

class espnet2.layers.utterance_mvn.UtteranceMVN(norm_means: bool = True, norm_vars: bool = False, eps: float = 1e-20)

Bases: espnet2.layers.abs_normalize.AbsNormalize

extra_repr()

Set the extra representation of the module.

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]

Forward function.

Parameters

x – (B, L, …)
ilens – (B,)

espnet2.layers.utterance_mvn.utterance_mvn(x: torch.Tensor, ilens: torch.Tensor = None, norm_means: bool = True, norm_vars: bool = False, eps: float = 1e-20) → Tuple[torch.Tensor, torch.Tensor]

Apply utterance mean and variance normalization.

Parameters

x – (B, T, D), assumed zero padded
ilens – (B,)
norm_means – Apply mean normalization.
norm_vars – Apply variance normalization.
eps – Lower bound on the standard deviation, for numerical stability.
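A pure-Python sketch for a single utterance, assuming the statistics are computed over the first `length` (valid) frames only, with a population standard deviation floored at eps. It illustrates the normalization, not the module's tensor implementation; the single-utterance signature is hypothetical.

```python
import math
from typing import List, Tuple

Frames = List[List[float]]  # (T, D), zero padded after `length`


def utterance_mvn(x: Frames, length: int, norm_means: bool = True,
                  norm_vars: bool = False, eps: float = 1e-20) -> Tuple[Frames, int]:
    dim = len(x[0])
    valid = x[:length]
    mean = [sum(f[d] for f in valid) / length for d in range(dim)]
    centered = [[f[d] - mean[d] for d in range(dim)] for f in valid]
    if norm_vars:
        # Population std over the valid frames, floored at eps.
        std = [max(math.sqrt(sum(f[d] ** 2 for f in centered) / length), eps)
               for d in range(dim)]
    else:
        std = [1.0] * dim
    base = centered if norm_means else valid
    out = [[f[d] / std[d] for d in range(dim)] for f in base]
    out += [[0.0] * dim for _ in range(len(x) - length)]  # padded frames stay zero
    return out, length


y, n = utterance_mvn([[1.0], [3.0], [0.0]], length=2, norm_vars=True)
print(y, n)  # [[-1.0], [1.0], [0.0]] 2
```

Unlike GlobalMVN, the statistics come from the utterance itself, so no statistics file is needed and the transform is not inverted later.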
