Trainers

Criterions

Contains different loss functions used in various speech recognition models.

CTCLoss: Connectionist Temporal Classification loss function.
CrossEntropyLoss: Cross-entropy loss function.
NLLLoss: Negative log-likelihood loss function.
RNNTLoss: Recurrent Neural Network Transducer loss function.

class speeq.trainers.criterions.CTCLoss(blank_id: int, reduction='mean', zero_infinity=False, *args, **kwargs)[source]

Bases: CTCLoss

The CTC loss.

Args:

blank_id (int): The blank id.

reduction (str, optional): Specifies the reduction to apply to the output. Defaults to “mean”.

zero_infinity (bool, optional): Whether to zero infinite losses and the associated gradients. Defaults to False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.

blank: int

zero_infinity: bool
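As an illustration of when an infinite CTC loss can arise: a target can be aligned only if the input has at least one frame per label, plus one extra frame for the blank that must separate each pair of identical adjacent labels. The helpers below are a minimal, library-independent sketch of that check; min_ctc_input_length and is_alignable are hypothetical names and are not part of speeq.

```python
from typing import Sequence


def min_ctc_input_length(target: Sequence[int]) -> int:
    """Smallest number of input frames that can emit `target` under CTC.

    Each label needs one frame, and every pair of identical adjacent
    labels needs a blank frame between them.
    """
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def is_alignable(input_length: int, target: Sequence[int]) -> bool:
    """True if a finite CTC loss is possible for this input length."""
    return input_length >= min_ctc_input_length(target)
```

Samples for which such a check fails are the ones whose loss becomes infinite; with zero_infinity=True those losses and their gradients are zeroed instead.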

class speeq.trainers.criterions.CrossEntropyLoss(pad_id: int, reduction='mean', label_smoothing=0.0, *args, **kwargs)[source]

Bases: CrossEntropyLoss

Computes the cross-entropy loss between the input logits and the target.

Args:

pad_id (int): The padding id.

reduction (str, optional): Specifies the reduction to apply to the output. Defaults to “mean”.

label_smoothing (float, optional): A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss. Defaults to 0.0.

forward(input, target, *args, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

ignore_index: int

label_smoothing: float
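To make the roles of pad_id and label_smoothing concrete, here is a small pure-Python sketch of a mean cross-entropy that skips padded positions and mixes the usual negative log-likelihood with a uniform-distribution term. It follows the common definition of label smoothing and is an assumption about the intended behaviour, not a copy of the speeq or PyTorch implementation; smoothed_cross_entropy is a hypothetical name.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def smoothed_cross_entropy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    pad_id: int,
    label_smoothing: float = 0.0,
) -> float:
    """Mean cross-entropy over the non-padding positions."""
    total, count = 0.0, 0
    for row, target in zip(logits, targets):
        if target == pad_id:
            continue  # padded positions contribute nothing
        log_probs = log_softmax(row)
        nll = -log_probs[target]
        # Loss against a uniform distribution over all classes.
        uniform = -sum(log_probs) / len(log_probs)
        total += (1.0 - label_smoothing) * nll + label_smoothing * uniform
        count += 1
    return total / count if count else 0.0
```

With label_smoothing=0.0 this reduces to the plain cross-entropy over non-padded targets; a positive value penalizes over-confident predictions.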

class speeq.trainers.criterions.NLLLoss(pad_id: int, reduction='mean', *args, **kwargs)[source]

Bases: NLLLoss

Computes the negative log-likelihood loss.

Args:

pad_id (int): The padding id.

reduction (str, optional): Specifies the reduction to apply to the output. Defaults to “mean”.

forward(input, target, *args, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

ignore_index: int

class speeq.trainers.criterions.RNNTLoss(blank_id: int, reduction='mean', *args, **kwargs)[source]

Bases: RNNTLoss

Computes the RNNT loss.

Args:

blank_id (int): The blank id.

reduction (str, optional): Specifies the reduction to apply to the output. Defaults to “mean”.

forward(logits: Tensor, logits_len: Tensor, targets: Tensor, target_len: Tensor) → Tensor[source]

Args:

logits (Tensor): Tensor of dimension (batch, max seq length, max target length + 1, class) containing the output from the joiner.

logits_len (Tensor): Tensor of dimension (batch) containing the length of each sequence from the encoder.

targets (Tensor): Tensor of dimension (batch, max target length) containing the zero-padded targets.

target_len (Tensor): Tensor of dimension (batch) containing the length of the targets for each sequence.

Returns:

Tensor: Loss with the reduction option applied. If reduction is "none", then size (batch), otherwise scalar.

training: bool

speeq.trainers.criterions.get_flatten_results(input: Tensor, target: Tensor) → Tuple[Tensor, Tensor][source]

Flattens the results by reshaping the input from shape [B, M, C] to [B * M, C] and the target from shape [B, M] to [B * M].

Args:

input (Tensor): The predictions of shape [B, M, C].

target (Tensor): The target tensor of shape [B, M].

Returns:

Tuple[Tensor, Tensor]: A tuple of the flattened results.
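The reshaping described above can be sketched with nested Python lists standing in for tensors. This is only an illustration of merging the batch (B) and time (M) axes; flatten_batch is a hypothetical name, and with real tensors the same effect is typically obtained with input.reshape(-1, C) and target.reshape(-1).

```python
from typing import List, Tuple


def flatten_batch(
    inputs: List[List[List[float]]], targets: List[List[int]]
) -> Tuple[List[List[float]], List[int]]:
    """Merge the batch (B) and time (M) axes of nested lists.

    inputs:  B x M x C  ->  (B * M) x C
    targets: B x M      ->  (B * M)
    """
    flat_inputs = [step for sequence in inputs for step in sequence]
    flat_targets = [label for sequence in targets for label in sequence]
    return flat_inputs, flat_targets
```

After flattening, every row of the input lines up with exactly one target label, which is the layout a per-token cross-entropy or NLL loss expects.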

speeq.trainers.criterions.remove_positionals(input: Tensor, target: Tensor) → Tuple[Tensor, Tensor][source]

Removes the SOS token from the target and the EOS prediction from the input.

Args:

input (Tensor): The input tensor of shape [B, M, C].

target (Tensor): The target tensor of shape [B, M].

Returns:

Tuple[Tensor, Tensor]: The input and target.
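A plausible reading of this alignment step, sketched with nested lists in place of tensors: the target drops its first position (the SOS token) and the input drops its last time step (the prediction made after the EOS token), so that prediction i is compared with target token i + 1. The slicing below is an assumption about the behaviour, not the library's code, and drop_positionals is a hypothetical name.

```python
from typing import List, Tuple


def drop_positionals(
    inputs: List[List[List[float]]], targets: List[List[int]]
) -> Tuple[List[List[List[float]]], List[List[int]]]:
    """Align predictions with next-token targets.

    inputs:  B x M x C  ->  B x (M - 1) x C  (last time step removed)
    targets: B x M      ->  B x (M - 1)      (leading SOS removed)
    """
    trimmed_inputs = [sequence[:-1] for sequence in inputs]
    trimmed_targets = [sequence[1:] for sequence in targets]
    return trimmed_inputs, trimmed_targets
```

With tensors, the equivalent slices would be input[:, :-1, :] and target[:, 1:].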

Registry

A factory for creating speech task trainers.

This module provides functions to create various objects needed for training speech models, such as loss functions, optimizers, and trainers. The functions are implemented as factory methods, which allow for abstracting object creation and facilitate the creation of customized trainers.

Functions:

get_criterion(name: str, blank_id: int, pad_id: int) -> torch.nn.Module: Returns a PyTorch module that computes the loss for a speech recognition task. The name argument specifies the type of loss to use, and the blank_id and pad_id arguments are used to configure the loss function.
get_optimizer(model: torch.nn.Module, trainer_config) -> Union[torch.optim.Optimizer, IScheduler]: Returns a PyTorch optimizer or learning rate scheduler for training a speech model. The model argument is the PyTorch module to be trained, and the trainer_config argument is a configuration object containing the hyperparameters for training.
get_asr_trainer(trainer_config, data_config, model_config, rank=0, world_size=1) -> ITrainer: Returns a speech task trainer object. The trainer_config argument is a configuration object containing the hyperparameters for training, the data_config argument is a configuration object containing the parameters for the training data, the model_config argument is a configuration object containing the parameters for building the speech model, and the rank and world_size arguments are used for distributed training.

speeq.trainers.registry.get_asr_trainer(trainer_config: TrainerConfig, data_config: ASRDataConfig, model_config: ModelConfig, rank: int = 0, world_size: int = 1) → ITrainer[source]

Creates an ASR trainer object for training a speech recognition model.

Args:

trainer_config (TrainerConfig): A configuration object that specifies settings for the trainer.

data_config (ASRDataConfig): A configuration object that specifies settings for the data used in training.

model_config (ModelConfig): A configuration object that specifies settings for the model architecture.

rank (int, optional): The rank of the current process, for distributed training. Defaults to 0.

world_size (int, optional): The number of processes for distributed training. Defaults to 1.

Returns:

ITrainer: An object that encapsulates the ASR trainer functionality.

speeq.trainers.registry.get_criterion(name: str, blank_id: int, pad_id: int, *args, **kwargs)[source]

This function generates and returns a module representing a criterion.

Args:

name (str): The name of the criterion.

blank_id (int): The ID for the blank symbol used in the criterion.

pad_id (int): The ID for the padding symbol used in the criterion.

Returns:

Module: The desired criterion module.
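The factory idea behind get_criterion can be sketched as a name-to-builder lookup. Everything in this example is hypothetical: the stand-in classes, the registry keys "ctc" and "ce", and build_criterion are illustrative only, since the names actually accepted by speeq are not listed here.

```python
from typing import Callable, Dict


class DummyCTCLoss:
    """Stand-in for a loss configured with the blank id."""

    def __init__(self, blank_id: int):
        self.blank_id = blank_id


class DummyCrossEntropyLoss:
    """Stand-in for a loss configured with the padding id."""

    def __init__(self, pad_id: int):
        self.pad_id = pad_id


# Hypothetical registry: criterion name -> builder taking both ids.
CRITERIA: Dict[str, Callable[[int, int], object]] = {
    "ctc": lambda blank_id, pad_id: DummyCTCLoss(blank_id=blank_id),
    "ce": lambda blank_id, pad_id: DummyCrossEntropyLoss(pad_id=pad_id),
}


def build_criterion(name: str, blank_id: int, pad_id: int) -> object:
    """Look up the builder by name and pass it only the ids it needs."""
    try:
        builder = CRITERIA[name]
    except KeyError:
        raise ValueError(
            f"Unknown criterion {name!r}; choose from {sorted(CRITERIA)}"
        ) from None
    return builder(blank_id, pad_id)
```

Passing both ids to every builder keeps the call site uniform, while each builder forwards only the id its loss actually uses.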

speeq.trainers.registry.get_optimizer(model, trainer_config) → Union[Optimizer, IScheduler][source]

This function generates and provides an optimizer or scheduler, based on the input model and training configuration.

Args:

model (Module): The model.

trainer_config (object): The configuration object for training.

Returns:

Union[Optimizer, IScheduler]: The optimizer or scheduler object that will be used for training.

Schedulers

This module provides various scheduler classes for adjusting learning rates during training, including the base Scheduler class and its implementation NoamScheduler. The SqueezeformerNoamScheduler is a modified version of the NoamScheduler specific to the Squeezeformer model.

Classes:

Scheduler: Implements the base scheduler class.
NoamScheduler: Implements the Noam scheduler.
SqueezeformerNoamScheduler: Implements the Noam scheduler with modifications for the Squeezeformer model.

class speeq.trainers.schedulers.NoamScheduler(params, optimizer: str, optimizer_args: dict, warmup_staps: int, d_model: int, *args, **kwargs)[source]

Bases: Scheduler

Implements the Noam scheduler proposed in https://arxiv.org/abs/1706.03762

Args:

params (Iterable): The model’s parameters.

optimizer (str): The name of the optimizer.

optimizer_args (dict): The optimizer’s arguments.

warmup_staps (int): The warmup steps.

d_model (int): The model dimension.

get_lr() → float[source]

state_dict() → dict[source]
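For reference, the learning-rate rule from the cited paper is lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5): a linear warmup followed by inverse-square-root decay. The function below is a standalone sketch of that formula; noam_lr is a hypothetical helper, and whether get_lr() applies any extra scaling is not stated here.

```python
def noam_lr(step: int, d_model: int, warmup_steps: int) -> float:
    """Learning rate of the Noam schedule at a given (1-based) step."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear growth during warmup, then decay proportional to step^-0.5.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The two branches meet at step == warmup_steps, which is where the learning rate peaks.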

class speeq.trainers.schedulers.Scheduler(params: Iterable, optimizer: str, optimizer_args: dict)[source]

Bases: IScheduler

Implements the base scheduler class.

Args:

params (Iterable): The model’s parameters.

optimizer (str): The name of the optimizer to be used.

optimizer_args (dict): The optimizer’s arguments.

load_state_dict(state_dict: dict) → None[source]

state_dict()[source]

step() → None[source]

zero_grad() → None[source]

class speeq.trainers.schedulers.SqueezeformerNoamScheduler(params: Iterable, optimizer: str, optimizer_args: dict, warmup_staps: int, lr_peak: Number, decay_rate: Number, t_peak: int, *args, **kwargs)[source]

Bases: NoamScheduler

Implements the Noam scheduler with the modifications presented in https://arxiv.org/abs/2206.00888

Args:

params (Iterable): The model’s parameters.

optimizer (str): The name of the optimizer.

optimizer_args (dict): The optimizer’s arguments.

warmup_staps (int): The warmup steps.

lr_peak (Number): The peak value of the learning rate.

decay_rate (Number): The decay rate of the learning rate.

t_peak (int): The number of steps to keep the peak learning rate for.

get_lr() → float[source]

state_dict() → dict[source]
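The parameters above suggest a three-phase schedule: linear warmup to lr_peak, a plateau of t_peak steps, then polynomial decay controlled by decay_rate. The sketch below follows the piecewise form given in the Squeezeformer paper as best understood; it is an assumption, the library's get_lr() may differ in detail, and squeezeformer_noam_lr is a hypothetical name.

```python
def squeezeformer_noam_lr(
    step: int,
    warmup_steps: int,
    lr_peak: float,
    decay_rate: float,
    t_peak: int,
) -> float:
    """Warmup, hold at the peak, then decay (assumed piecewise form)."""
    if step <= warmup_steps:
        # Phase 1: linear warmup from 0 to lr_peak.
        return lr_peak * step / warmup_steps
    if step <= warmup_steps + t_peak:
        # Phase 2: hold the peak learning rate.
        return lr_peak
    # Phase 3: decay, shifted so it starts from lr_peak after the plateau.
    return lr_peak * warmup_steps ** decay_rate / (step - t_peak) ** decay_rate
```

Setting t_peak to 0 and decay_rate to 0.5 gives a warmup followed by inverse-square-root decay, matching the shape of the original Noam schedule.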

Templates

Defines configuration templates for learning rate schedulers.

Classes:

BaseSchedulerTemplate(ITemplate): Base template for scheduler configuration.
NoamSchedulerTemp(BaseSchedulerTemplate): Template for Noam learning rate scheduler.
SqueezeformerNoamSchedulerTemp(BaseSchedulerTemplate): Template for modified Noam scheduler used in Squeezeformer models.

class speeq.trainers.templates.BaseSchedulerTemplate[source]

Bases: ITemplate

get_dict() → dict[source]

property name

property type

class speeq.trainers.templates.NoamSchedulerTemp(warmup_staps: int, d_model: int)[source]

Bases: BaseSchedulerTemplate

Noam scheduler template

Args:

warmup_staps (int): The warmup steps.

d_model (int): The model dimension.

d_model: int

warmup_staps: int

class speeq.trainers.templates.SqueezeformerNoamSchedulerTemp(warmup_staps: int, lr_peak: Number, decay_rate: Number, t_peak: int)[source]

Bases: BaseSchedulerTemplate

Template for the Noam scheduler with the changes proposed in the Squeezeformer paper.

Args:

warmup_staps (int): The warmup steps.

lr_peak (Number): The peak value of the learning rate.

decay_rate (Number): The decay rate of the learning rate.

t_peak (int): The number of steps to keep the peak learning rate for.

decay_rate: Number

lr_peak: Number

t_peak: int

warmup_staps: int
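A configuration template of this kind can be pictured as a small dataclass that stores the scheduler's hyperparameters and exposes them as a dictionary. The class below is a hypothetical stand-in, not the speeq implementation, and what get_dict() actually returns in the library is not specified here.

```python
from dataclasses import asdict, dataclass


@dataclass
class NoamTemplateSketch:
    """Stand-in for a scheduler template; not the speeq class."""

    warmup_staps: int  # spelling follows the documented parameter name
    d_model: int

    def get_dict(self) -> dict:
        """Return the stored hyperparameters as a plain dictionary."""
        return asdict(self)
```

A dictionary like this can be unpacked as keyword arguments when the matching scheduler is constructed.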

Trainers

This module contains different trainer classes, some of which utilize distributed data parallelism (DDP), as well as a launch_training_job function.

Trainers:

BaseTrainer: A basic trainer module.
BaseDistTrainer: A basic distributed data parallel trainer module that is a subclass of BaseTrainer.
CTCTrainer: A trainer module for CTC-based models that is a subclass of BaseTrainer.
DistCTCTrainer: A trainer module for CTC models that utilizes distributed data parallelism, which is a subclass of both BaseDistTrainer and CTCTrainer.
Seq2SeqTrainer: A trainer module for Seq2Seq models that is a subclass of BaseTrainer.
DistSeq2SeqTrainer: A trainer module for Seq2Seq models that utilizes distributed data parallelism, which is a subclass of both BaseDistTrainer and Seq2SeqTrainer.
TransducerTrainer: A trainer module for transducer-based models that is a subclass of BaseTrainer.
DistTransducerTrainer: A trainer module for transducer models that utilizes distributed data parallelism, which is a subclass of both BaseDistTrainer and TransducerTrainer.

Function:

launch_training_job: A function that launches a training job for a given configuration of trainer, data, and model objects. It takes in three arguments: trainer_config which is an object containing the configuration for the trainer, data_config which is an object containing the configuration for the data, and model_config which is an object containing the configuration for the model. The function returns None.

class speeq.trainers.trainers.BaseDistTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, logger: ILogger, outdir: Union[str, Path], rank: int, world_size: int, dist_address: str, dist_port: int, dist_backend: str, grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history={})[source]

Bases: BaseTrainer

Builds the basic distributed data parallel trainer module

Args:

optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.

criterion (Module): The loss function that will be used during the training process.

model (Module): The model.

train_loader (IDataLoader): The loader for the training data.

test_loader (IDataLoader): The loader for the testing data.

epochs (int): The number of epochs.

log_steps_frequency (int): The frequency at which to log results.

logger (ILogger): The logger to be used.

outdir (Union[str, Path]): The directory to save checkpoints.

rank (int): The process index.

world_size (int): The number of nodes/processes.

dist_address (str): The address of the master node.

dist_port (int): The port of the master node.

dist_backend (str): The backend used for DDP.

grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.

grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.

grad_clip_norm_type (float): The type of p-norm used. Default 2.0.

history (dict): The training history, if available. Default {}.

backward_pass(loss: Tensor) → None[source]

This method performs a backward pass on the model parameters to update them based on the provided loss tensor.

Args:

loss (Tensor): The loss tensor.

fit()[source]

Fits the model on the training data, and logs the results on the master node only.

init_dist()[source]

Initializes the distributed training process.

property is_master

train() → float[source]

The main training loop that runs on one of the processes. It iterates over the training examples and performs a forward and a backward pass on each batch.

Returns:

float: The average loss over all training examples from all processes.
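The rank, world_size, dist_address, dist_port and dist_backend arguments map naturally onto the arguments of PyTorch's process-group initialization. The helper below only assembles those keyword arguments, so it runs without PyTorch; build_dist_init_kwargs is a hypothetical name, and the use of a tcp:// init method is an assumption (the library could equally rely on environment variables).

```python
from typing import Any, Dict


def build_dist_init_kwargs(
    rank: int,
    world_size: int,
    dist_address: str,
    dist_port: int,
    dist_backend: str,
) -> Dict[str, Any]:
    """Assemble keyword arguments for process-group initialization."""
    return {
        "backend": dist_backend,
        # Every process connects to the master node at this address.
        "init_method": f"tcp://{dist_address}:{dist_port}",
        "rank": rank,
        "world_size": world_size,
    }


# Assumed usage inside each process (requires PyTorch):
#   torch.distributed.init_process_group(**build_dist_init_kwargs(...))
```

Each process passes its own rank, while the address, port, backend and world size are shared by all of them.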

class speeq.trainers.trainers.BaseTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, logger: ILogger, outdir: Union[str, Path], grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]

Bases: ITrainer

Builds the basic trainer module

Args:

optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.

criterion (Module): The loss function that will be used during the training process.

model (Module): The model.

train_loader (IDataLoader): The loader for the training data.

test_loader (IDataLoader): The loader for the testing data.

epochs (int): The number of epochs.

log_steps_frequency (int): The frequency at which to log results.

logger (ILogger): The logger to be used.

outdir (Union[str, Path]): The directory to save checkpoints.

grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.

grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.

grad_clip_norm_type (float): The type of p-norm used. Default 2.0.

history (dict): The training history, if available. Default {}.

backward_pass(loss: Tensor) → None[source]

This method performs a backward pass on the model parameters to update them based on the provided loss tensor.

Args:

loss (Tensor): The loss tensor.

fit()[source]

Fits the model on the training data.

inline_log(key: str, category: str, value: int)[source]

property is_master

test() → float[source]

Tests the model on the testing data.

Returns:

float: The average test loss.

train() → float[source]

The main training loop. It iterates over the training examples and performs a forward and a backward pass on each batch.

Returns:

float: The average loss over all training examples.

train_step(batch: Tuple[Tensor]) → float[source]

This method represents a single step in the training process. It performs a forward pass, calculates the loss, and then performs a backward pass to update the model parameters.

Args:

batch (Tuple[Tensor]): The input batch to be processed.

Returns:

float: The loss value for this step.
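The grad_acc_steps, grad_clip_thresh and grad_clip_norm_type arguments correspond to two common techniques: updating the parameters only every few batches, and rescaling the gradients when their p-norm exceeds a threshold. The pure-Python sketch below illustrates both on a flat list of gradient values; the function names are hypothetical, and how the trainer schedules these steps internally is an assumption. In PyTorch, norm clipping is usually done with torch.nn.utils.clip_grad_norm_.

```python
from typing import List


def clip_by_norm(
    grads: List[float], max_norm: float, norm_type: float = 2.0
) -> List[float]:
    """Rescale gradients so that their p-norm does not exceed max_norm."""
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    # The small constant guards against division by zero.
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


def should_step(batch_index: int, grad_acc_steps: int) -> bool:
    """True when the optimizer should update after this (0-based) batch."""
    return (batch_index + 1) % grad_acc_steps == 0
```

With grad_acc_steps set to 1 every batch triggers an update; larger values emulate a bigger batch size at the cost of fewer updates per epoch.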

class speeq.trainers.trainers.CTCTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, device: str, logger: ILogger, outdir: Union[str, Path], grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]

Bases: BaseTrainer

A trainer module for CTC-based models.

Args:

optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.

criterion (Module): The loss function that will be used during the training process.

model (Module): The model.

train_loader (IDataLoader): The loader for the training data.

test_loader (IDataLoader): The loader for the testing data.

epochs (int): The number of epochs.

log_steps_frequency (int): The frequency at which to log results.

device (str): The device to run the training on.

logger (ILogger): The logger to be used.

outdir (Union[str, Path]): The directory to save checkpoints.

grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.

grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.

grad_clip_norm_type (float): The type of p-norm used. Default 2.0.

history (dict): The training history, if available. Default {}.

forward_pass(batch: Tuple[Tensor]) → Tensor[source]

class speeq.trainers.trainers.DistCTCTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, logger: ILogger, outdir: Union[str, Path], log_steps_frequency: int, rank: int, world_size: int, dist_address: int, dist_port: int, dist_backend: str, grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]

Bases: BaseDistTrainer, CTCTrainer

A trainer module for CTC models that utilizes distributed data parallelism.

Args:

optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.

criterion (Module): The loss function that will be used during the training process.

model (Module): The model.

train_loader (IDataLoader): The loader for the training data.

test_loader (IDataLoader): The loader for the testing data.

epochs (int): The number of epochs.

log_steps_frequency (int): The frequency at which to log results.

logger (ILogger): The logger to be used.

outdir (Union[str, Path]): The directory to save checkpoints.

rank (int): The process index.

world_size (int): The number of nodes/processes.

dist_address (str): The address of the master node.

dist_port (int): The port of the master node.

dist_backend (str): The backend used for DDP.

grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.

grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.

grad_clip_norm_type (float): The type of p-norm used. Default 2.0.

history (dict): The training history, if available. Default {}.

class speeq.trainers.trainers.DistSeq2SeqTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, logger: ILogger, outdir: Union[str, Path], log_steps_frequency: int, rank: int, world_size: int, dist_address: int, dist_port: int, dist_backend: str, grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]

Bases: BaseDistTrainer, Seq2SeqTrainer

A trainer module for Seq2Seq models that utilizes distributed data parallelism.

Args:

optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.

criterion (Module): The loss function that will be used during the training process.

model (Module): The model.

train_loader (IDataLoader): The loader for the training data.

test_loader (IDataLoader): The loader for the testing data.

epochs (int): The number of epochs.

log_steps_frequency (int): The frequency at which to log results.

logger (ILogger): The logger to be used.

outdir (Union[str, Path]): The directory to save checkpoints.

rank (int): The process index.

world_size (int): The number of nodes/processes.

dist_address (str): The address of the master node.

dist_port (int): The port of the master node.

dist_backend (str): The backend used for DDP.

grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.

grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.

grad_clip_norm_type (float): The type of p-norm used. Default 2.0.

history (dict): The training history, if available. Default {}.

class speeq.trainers.trainers.DistTransducerTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, logger: ILogger, outdir: Union[str, Path], log_steps_frequency: int, rank: int, world_size: int, dist_address: int, dist_port: int, dist_backend: str, grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]

Bases: BaseDistTrainer, TransducerTrainer

A trainer module for transducer models that utilizes distributed data parallelism.

Args:

optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.

criterion (Module): The loss function that will be used during the training process.

model (Module): The model.

train_loader (IDataLoader): The loader for the training data.

test_loader (IDataLoader): The loader for the testing data.

epochs (int): The number of epochs.

log_steps_frequency (int): The frequency at which to log results.

logger (ILogger): The logger to be used.

outdir (Union[str, Path]): The directory to save checkpoints.

rank (int): The process index.

world_size (int): The number of nodes/processes.

dist_address (str): The address of the master node.

dist_port (int): The port of the master node.

dist_backend (str): The backend used for DDP.

grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.

grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.

grad_clip_norm_type (float): The type of p-norm used. Default 2.0.

history (dict): The training history, if available. Default {}.

class speeq.trainers.trainers.Seq2SeqTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, device: str, logger: ILogger, outdir: Union[str, Path], grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]

Bases: BaseTrainer

A trainer module for Seq2Seq models.

Args:

optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.

criterion (Module): The loss function that will be used during the training process.

model (Module): The model.

train_loader (IDataLoader): The loader for the training data.

test_loader (IDataLoader): The loader for the testing data.

epochs (int): The number of epochs.

log_steps_frequency (int): The frequency at which to log results.

device (str): The device to run the training on.

logger (ILogger): The logger to be used.

outdir (Union[str, Path]): The directory to save checkpoints.

grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.

grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.

grad_clip_norm_type (float): The type of p-norm used. Default 2.0.

history (dict): The training history, if available. Default {}.

forward_pass(batch: Tuple[Tensor]) → Tensor[source]

class speeq.trainers.trainers.TransducerTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, device: str, logger: ILogger, outdir: Union[str, Path], grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]

Bases: BaseTrainer

A trainer module for transducer-based models.

Args:

optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.

criterion (Module): The loss function that will be used during the training process.

model (Module): The model.

train_loader (IDataLoader): The loader for the training data.

test_loader (IDataLoader): The loader for the testing data.

epochs (int): The number of epochs.

log_steps_frequency (int): The frequency at which to log results.

device (str): The device to run the training on.

logger (ILogger): The logger to be used.

outdir (Union[str, Path]): The directory to save checkpoints.

grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.

grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.

grad_clip_norm_type (float): The type of p-norm used. Default 2.0.

history (dict): The training history, if available. Default {}.

forward_pass(batch: Tuple[Tensor]) → Tensor[source]

This method conducts a forward pass on the transducer model.

Args:

batch (Tuple[Tensor]): The input batch containing the speech, speech length, text, and text length tensors, in that order.

Returns:

Tensor: A tensor representing the loss.

speeq.trainers.trainers.launch_training_job(trainer_config: object, data_config: object, model_config: object) → None[source]

Launches an ASR training job by constructing a trainer from the passed configurations and running it on a single GPU or on multiple GPUs.

Args:

trainer_config (object): Trainer configuration object.

data_config (object): Data configuration object.

model_config (object): Model configuration object.