espnet2.diar package¶

espnet2.diar.__init__¶

espnet2.diar.abs_diar¶

class espnet2.diar.abs_diar.AbsDiarization[source]¶

Bases: torch.nn.modules.module.Module, abc.ABC

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, collections.OrderedDict][source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

abstract forward_rawwav(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, collections.OrderedDict][source]¶

espnet2.diar.espnet_model¶

class espnet2.diar.espnet_model.ESPnetDiarizationModel(frontend: Optional[espnet2.asr.frontend.abs_frontend.AbsFrontend], specaug: Optional[espnet2.asr.specaug.abs_specaug.AbsSpecAug], normalize: Optional[espnet2.layers.abs_normalize.AbsNormalize], label_aggregator: torch.nn.modules.module.Module, encoder: espnet2.asr.encoder.abs_encoder.AbsEncoder, decoder: espnet2.diar.decoder.abs_decoder.AbsDecoder, attractor: Optional[espnet2.diar.attractor.abs_attractor.AbsAttractor], attractor_weight: float = 1.0)[source]¶

Bases: espnet2.train.abs_espnet_model.AbsESPnetModel

Speaker Diarization model

If attractor is None, SA-EEND is used; otherwise EEND-EDA is used. For details about SA-EEND and EEND-EDA, refer to the following papers:

SA-EEND – https://arxiv.org/pdf/1909.06247.pdf
EEND-EDA – https://arxiv.org/pdf/2005.09921.pdf, https://arxiv.org/pdf/2106.10654.pdf

attractor_loss(att_prob, label)[source]¶

static calc_diarization_error(pred, label, length)[source]¶
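As an illustration of what a frame-level diarization error computation typically counts, the sketch below tallies missed speech, false alarms, and speaker confusions for a single utterance. It is plain Python over nested lists, not the ESPnet implementation: the function name `frame_error_counts`, the 0.5 decision threshold on probabilities, and the returned dictionary keys are all assumptions made for this example.

```python
def frame_error_counts(pred, label, length, threshold=0.5):
    """Count frame-level diarization errors for one utterance (hypothetical helper).

    pred:   list of frames, each a list of per-speaker probabilities.
    label:  list of frames, each a list of 0/1 reference activities.
    length: number of valid (unpadded) frames to score.
    """
    scored = miss = false_alarm = confusion = 0
    for t in range(length):
        # Assumption: a speaker is "active" when the probability exceeds the threshold.
        decisions = [1 if p > threshold else 0 for p in pred[t]]
        n_ref = sum(label[t])
        n_sys = sum(decisions)
        n_match = sum(d & r for d, r in zip(decisions, label[t]))
        scored += n_ref
        miss += max(n_ref - n_sys, 0)
        false_alarm += max(n_sys - n_ref, 0)
        confusion += min(n_ref, n_sys) - n_match
    der = (miss + false_alarm + confusion) / scored if scored else 0.0
    return {
        "scored": scored,
        "miss": miss,
        "false_alarm": false_alarm,
        "confusion": confusion,
        "der": der,
    }


pred = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.6], [0.1, 0.2]]
label = [[1, 0], [1, 0], [1, 0], [0, 0]]
counts = frame_error_counts(pred, label, length=4)
```

In this toy input, the second frame contains one false alarm and the third frame one speaker confusion, so two of the three scored speaker-frames are in error.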

collect_feats(speech: torch.Tensor, speech_lengths: torch.Tensor, spk_labels: torch.Tensor = None, spk_labels_lengths: torch.Tensor = None) → Dict[str, torch.Tensor][source]¶

create_length_mask(length, max_len, num_output)[source]¶
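Judging by its signature, create_length_mask builds a mask that marks which frames of each padded utterance are valid. The sketch below shows that idea with nested lists instead of tensors; the function name `length_mask` and the choice of 1.0/0.0 values are assumptions for illustration, not the library's code.

```python
def length_mask(lengths, max_len, num_output):
    """Return a [batch][max_len][num_output] nested list (hypothetical helper).

    Entries are 1.0 for frames before each utterance's length and 0.0 for padding.
    """
    return [
        [[1.0 if t < n else 0.0] * num_output for t in range(max_len)]
        for n in lengths
    ]


mask = length_mask([3, 1], max_len=4, num_output=2)
```

Multiplying a per-frame loss by such a mask zeroes out the contribution of padded frames before averaging.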

encode(speech: torch.Tensor, speech_lengths: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶

Frontend + Encoder

Parameters

speech – (Batch, Length, …)
speech_lengths – (Batch,)

forward(speech: torch.Tensor, speech_lengths: torch.Tensor = None, spk_labels: torch.Tensor = None, spk_labels_lengths: torch.Tensor = None) → Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor][source]¶

Frontend + Encoder + Decoder + Calc loss

Parameters

speech – (Batch, samples)
speech_lengths – (Batch,). Defaults to None for the chunk iterator, because the chunk iterator does not return speech_lengths; see espnet2/iterators/chunk_iter_factory.py
spk_labels – (Batch, )

pit_loss(pred, label, lengths)[source]¶

pit_loss_single_permute(pred, label, length)[source]¶
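The names pit_loss and pit_loss_single_permute suggest a permutation-invariant training (PIT) loss: the loss is evaluated for every assignment of predicted speakers to reference speakers, and the smallest value is kept. The following is a minimal sketch of that general technique for one utterance, assuming probabilities rather than logits and a mean binary cross-entropy; the helper names `bce` and `pit_bce` are hypothetical, and the normalization used by ESPnet may differ.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one 0/1 target y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pit_bce(pred, label, length):
    """Return (lowest mean BCE, best permutation) over all speaker orderings.

    pred, label: lists of frames, each a list with one entry per speaker.
    perm[s] is the reference speaker assigned to predicted speaker s.
    """
    num_spk = len(label[0])
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(num_spk)):
        total = 0.0
        for t in range(length):  # only the valid frames are scored
            for s in range(num_spk):
                total += bce(pred[t][s], label[t][perm[s]])
        loss = total / (length * num_spk)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


pred = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
label = [[1, 0], [1, 0], [0, 1]]
loss, perm = pit_bce(pred, label, length=3)
```

Here the prediction is correct up to a swap of the two speaker columns, so the swapped permutation gives the lowest loss. Enumerating all permutations costs a factorial number of evaluations in the speaker count, which is acceptable only for a small number of speakers.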

espnet2.diar.label_processor¶

class espnet2.diar.label_processor.LabelProcessor(win_length: int = 512, hop_length: int = 128, center: bool = True)[source]¶

Bases: torch.nn.modules.module.Module

Label aggregator for speaker diarization

forward(input: torch.Tensor, ilens: torch.Tensor)[source]¶

Forward.

Parameters

input – (Batch, Nsamples, Label_dim)
ilens – (Batch)

Returns

output – (Batch, Frames, Label_dim)
olens – (Batch)
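The shapes above imply that sample-level labels (Nsamples) are reduced to frame-level labels (Frames) using the same win_length, hop_length, and center framing as the feature extraction. A minimal sketch of one way to do this is a majority vote inside each window, shown below with nested lists. The function name `aggregate_labels`, the zero padding used when center is true, and the strict-majority rule are assumptions for illustration; the actual module may pad and threshold differently.

```python
def aggregate_labels(labels, win_length=512, hop_length=128, center=True):
    """Majority-vote sample-level 0/1 labels into frame-level labels.

    labels: list of samples, each a list of per-speaker 0/1 values.
    Returns a list of frames, each a list of per-speaker 0/1 values.
    """
    label_dim = len(labels[0])
    if center:
        pad = win_length // 2
        # Assumption: pad both ends with zeros (no speaker active).
        zeros = [[0] * label_dim for _ in range(pad)]
        labels = zeros + labels + zeros
    n_frames = (len(labels) - win_length) // hop_length + 1
    frames = []
    for f in range(n_frames):
        window = labels[f * hop_length : f * hop_length + win_length]
        # A speaker is active in the frame if active in more than half the window.
        frames.append(
            [
                1 if sum(row[d] for row in window) > win_length // 2 else 0
                for d in range(label_dim)
            ]
        )
    return frames


# 16 samples, one speaker: active for the first 8 samples, silent afterwards.
samples = [[1] for _ in range(8)] + [[0] for _ in range(8)]
frames = aggregate_labels(samples, win_length=4, hop_length=2, center=True)
```

With center padding and an even window, the number of frames is Nsamples // hop_length + 1, which is 9 for this input.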

espnet2.diar.attractor.__init__¶

espnet2.diar.attractor.abs_attractor¶

class espnet2.diar.attractor.abs_attractor.AbsAttractor[source]¶

Bases: torch.nn.modules.module.Module, abc.ABC

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(enc_input: torch.Tensor, ilens: torch.Tensor, dec_input: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

espnet2.diar.attractor.rnn_attractor¶

class espnet2.diar.attractor.rnn_attractor.RnnAttractor(encoder_output_size: int, layer: int = 1, unit: int = 512, dropout: float = 0.1, attractor_grad: bool = True)[source]¶

Bases: espnet2.diar.attractor.abs_attractor.AbsAttractor

Encoder-decoder attractor for speaker diarization

forward(enc_input: torch.Tensor, ilens: torch.Tensor, dec_input: torch.Tensor)[source]¶

Forward.

Parameters

enc_input (torch.Tensor) – hidden_space [Batch, T, F]
ilens (torch.Tensor) – input lengths [Batch]
dec_input (torch.Tensor) – decoder input (zeros) [Batch, num_spk + 1, F]

Returns

attractor – [Batch, num_spk + 1, F]
att_prob – [Batch, num_spk + 1, 1]
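In the EEND-EDA papers cited for ESPnetDiarizationModel, each attractor is a vector in the encoder's feature space, and per-frame speaker activity is obtained from the inner product between encoder frames and attractors, followed by a sigmoid. The sketch below illustrates only that step for one utterance with nested lists. The function name `speaker_posteriors` is hypothetical, and how the model handles the extra (num_spk + 1)-th attractor and att_prob is not covered here.

```python
import math


def speaker_posteriors(enc, attractors):
    """Compute per-frame speaker probabilities from attractors (illustrative only).

    enc:        [T][F] encoder frame embeddings.
    attractors: [S][F] one attractor vector per speaker.
    Returns a [T][S] nested list of sigmoid(frame . attractor).
    """

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [
        [sigmoid(sum(e * a for e, a in zip(frame, att))) for att in attractors]
        for frame in enc
    ]


enc = [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
attractors = [[3.0, -3.0], [-3.0, 3.0]]
post = speaker_posteriors(enc, attractors)
```

Frames that align with one attractor get a probability near 1 for that speaker and near 0 for the other, while the third frame is equally close to both attractors and scores 0.5 for each.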

espnet2.diar.decoder.__init__¶

espnet2.diar.decoder.abs_decoder¶

class espnet2.diar.decoder.abs_decoder.AbsDecoder[source]¶

Bases: torch.nn.modules.module.Module, abc.ABC

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(input: torch.Tensor, ilens: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

abstract property num_spk¶

espnet2.diar.decoder.linear_decoder¶

class espnet2.diar.decoder.linear_decoder.LinearDecoder(encoder_output_size: int, num_spk: int = 2)[source]¶

Bases: espnet2.diar.decoder.abs_decoder.AbsDecoder

Linear decoder for speaker diarization

forward(input: torch.Tensor, ilens: torch.Tensor)[source]¶

Forward.

Parameters

input (torch.Tensor) – hidden_space [Batch, T, F]
ilens (torch.Tensor) – input lengths [Batch]

property num_spk¶
