Trainers
Criterions
Contains different loss functions used in various speech recognition models.
CTCLoss: Connectionist Temporal Classification loss function.
CrossEntropyLoss: Cross-entropy loss function.
NLLLoss: Negative log-likelihood loss function.
RNNTLoss: Recurrent Neural Network Transducer loss function.
- class speeq.trainers.criterions.CTCLoss(blank_id: int, reduction='mean', zero_infinity=False, *args, **kwargs)[source]
Bases: CTCLoss
The CTC loss.
Args:
blank_id (int): The blank id.
reduction (str, optional): Specifies the reduction to apply to the output. Defaults to “mean”.
zero_infinity (bool, optional): Whether to zero infinite losses and the associated gradients. Infinite losses mainly occur when the inputs are too short to be aligned to the targets. Defaults to False.
- blank: int
- zero_infinity: bool
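The note on zero_infinity can be made concrete with a small check. The sketch below is not part of speeq; the helper names are hypothetical. It relies on the standard CTC constraint that every label needs at least one input frame and that two identical adjacent labels need a blank frame between them.

```python
def min_ctc_input_length(target):
    """Smallest number of input frames that can be aligned to `target` under CTC."""
    # One frame per label, plus one blank frame between each pair of
    # identical adjacent labels.
    repeats = sum(1 for prev, cur in zip(target, target[1:]) if prev == cur)
    return len(target) + repeats


def is_alignable(input_length, target):
    """True when at least one valid CTC alignment exists, so the loss is finite."""
    return input_length >= min_ctc_input_length(target)


print(min_ctc_input_length([3, 3, 5]))  # 4
print(is_alignable(2, [3, 3, 5]))       # False
print(is_alignable(4, [3, 3, 5]))       # True
```

Pairs for which is_alignable returns False are the ones that produce an infinite loss, which zero_infinity=True would replace with zero.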
- class speeq.trainers.criterions.CrossEntropyLoss(pad_id: int, reduction='mean', label_smoothing=0.0, *args, **kwargs)[source]
Bases: CrossEntropyLoss
Computes the cross entropy loss between input logits and target.
Args:
pad_id (int): The padding id.
reduction (str, optional): Specifies the reduction to apply to the output. Defaults to “mean”.
label_smoothing (float, optional): A float in [0.0, 1.0]. Specifies the amount of smoothing when computing the loss. Defaults to 0.0.
- forward(input, target, *args, **kwargs)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- ignore_index: int
- label_smoothing: float
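To illustrate what label_smoothing changes, here is a minimal pure-Python sketch for a single example. It assumes the usual formulation in which the one-hot target is mixed with a uniform distribution over the classes. The function names are hypothetical, and the sketch does not cover padding (pad_id).

```python
import math


def smoothed_targets(target_index, num_classes, label_smoothing):
    """Mix the one-hot target with a uniform distribution over all classes."""
    dist = [label_smoothing / num_classes] * num_classes
    dist[target_index] += 1.0 - label_smoothing
    return dist


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def cross_entropy(logits, target_index, label_smoothing=0.0):
    """Cross entropy between the logits and the (optionally smoothed) target."""
    log_probs = log_softmax(logits)
    targets = smoothed_targets(target_index, len(logits), label_smoothing)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


logits = [2.0, 0.5, -1.0]
print(cross_entropy(logits, 0))                       # plain cross entropy
print(cross_entropy(logits, 0, label_smoothing=0.1))  # larger: some mass moves to other classes
```

When the prediction favours the correct class, as here, smoothing raises the loss, which discourages over-confident outputs.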
- class speeq.trainers.criterions.NLLLoss(pad_id: int, reduction='mean', *args, **kwargs)[source]
Bases: NLLLoss
Computes the negative log likelihood loss.
Args:
pad_id (int): The padding id.
reduction (str, optional): Specifies the reduction to apply to the output. Defaults to “mean”.
- forward(input, target, *args, **kwargs)[source]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- ignore_index: int
- class speeq.trainers.criterions.RNNTLoss(blank_id: int, reduction='mean', *args, **kwargs)[source]
Bases: RNNTLoss
Computes the RNNT loss.
Args:
blank_id (int): The blank id.
reduction (str, optional): Specifies the reduction to apply to the output. Defaults to “mean”.
- forward(logits: Tensor, logits_len: Tensor, targets: Tensor, target_len: Tensor) Tensor[source]
- Args:
logits (Tensor): Tensor of dimension (batch, max seq length, max target length + 1, class) containing the output from the joiner.
logits_len (Tensor): Tensor of dimension (batch) containing the length of each sequence from the encoder.
targets (Tensor): Tensor of dimension (batch, max target length) containing the zero-padded targets.
target_len (Tensor): Tensor of dimension (batch) containing the length of the targets for each sequence.
- Returns:
Tensor: Loss with the reduction option applied. If reduction is "none", then size (batch), otherwise scalar.
- training: bool
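The shape requirements of forward are easy to get wrong, in particular the "+ 1" on the target axis. As a rough aid, the hypothetical helper below (not part of speeq) derives the joiner output shape that the documented dimensions imply from the per-example lengths, using plain Python lists in place of tensors.

```python
def expected_joiner_shape(logits_len, target_len, num_classes):
    """Shape (batch, max seq length, max target length + 1, class) implied by the lengths."""
    if len(logits_len) != len(target_len):
        raise ValueError("logits_len and target_len must have one entry per batch item")
    batch = len(logits_len)
    # The target axis has one extra position for the step before any label is emitted.
    return (batch, max(logits_len), max(target_len) + 1, num_classes)


print(expected_joiner_shape([50, 42], [7, 9], num_classes=30))  # (2, 50, 10, 30)
```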
- speeq.trainers.criterions.get_flatten_results(input: Tensor, target: Tensor) Tuple[Tensor, Tensor][source]
Flattens the results by reshaping the input from [B, M, C] to [B * M, C] and the target from [B, M] to [B * M].
Args:
input (Tensor): The predictions of shape [B, M, C].
target (Tensor): The target tensor of shape [B, M]
Returns:
Tuple[Tensor, Tensor]: A tuple of the flattened results.
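The reshaping described above can be sketched with nested Python lists standing in for tensors. This illustrates the shape change only and is not the library's implementation; the function name is hypothetical.

```python
def flatten_results(predictions, target):
    """Merge the batch (B) and sequence (M) axes into a single axis of size B * M."""
    # predictions: B lists of M class-score vectors of length C.
    flat_predictions = [frame for sequence in predictions for frame in sequence]
    # target: B lists of M label ids.
    flat_target = [label for sequence in target for label in sequence]
    return flat_predictions, flat_target


# B = 2, M = 3, C = 2
predictions = [
    [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]],
    [[0.3, 0.7], [0.6, 0.4], [0.2, 0.8]],
]
target = [[1, 0, 1], [1, 0, 1]]

flat_predictions, flat_target = flatten_results(predictions, target)
print(len(flat_predictions), len(flat_predictions[0]))  # 6 2
print(flat_target)                                      # [1, 0, 1, 1, 0, 1]
```

After flattening, every row of the predictions lines up with one target label, which is the layout that per-token losses such as cross entropy expect.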
- speeq.trainers.criterions.remove_positionals(input: Tensor, target: Tensor) Tuple[Tensor, Tensor][source]
Removes the SOS from the target and EOS prediction from the input
Args:
input (Tensor): The input tensor of shape [B, M, C].
target (Tensor): The target tensor of shape [B, M].
Returns:
Tuple[Tensor, Tensor]: The input and target.
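One plausible reading of this function, sketched with nested lists, is shown below. It assumes that each target starts with the SOS token and that the last prediction step is the one produced after the decoder has consumed EOS. Both are assumptions about the data layout, and the function name is hypothetical.

```python
def drop_positionals(predictions, target):
    """Drop the last prediction step and the leading SOS label of every sequence."""
    trimmed_predictions = [sequence[:-1] for sequence in predictions]
    trimmed_target = [sequence[1:] for sequence in target]
    return trimmed_predictions, trimmed_target


SOS, EOS = 1, 2
target = [[SOS, 7, 8, EOS]]
# One score vector per decoder step (C = 2 here, values are placeholders).
predictions = [[[0.1, 0.9], [0.4, 0.6], [0.7, 0.3], [0.5, 0.5]]]

trimmed_predictions, trimmed_target = drop_positionals(predictions, target)
print(trimmed_target)               # [[7, 8, 2]]
print(len(trimmed_predictions[0]))  # 3
```

After trimming, step i of the predictions is compared with the token that follows input token i, which is the usual teacher-forcing alignment.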
Registry
A factory for creating speech task trainers.
This module provides functions to create various objects needed for training speech models, such as loss functions, optimizers, and trainers. The functions are implemented as factory methods, which allow for abstracting object creation and facilitate the creation of customized trainers.
- Functions:
- get_criterion(name: str, blank_id: int, pad_id: int) -> torch.nn.Module:
Returns a PyTorch module that computes the loss for a speech recognition task. The name argument specifies the type of loss to use, and the blank_id and pad_id arguments are used to configure the loss function.
- get_optimizer(model: torch.nn.Module, trainer_config) -> Union[torch.optim.Optimizer, IScheduler]:
Returns a PyTorch optimizer or learning rate scheduler for training a speech model. The model argument is the PyTorch module to be trained, and the trainer_config argument is a configuration object containing the hyperparameters for training.
- get_trainer(trainer_config, data_config, model_config, rank=0, world_size=1) -> ITrainer:
Returns a speech task trainer object. The trainer_config argument is a configuration object containing the hyperparameters for training, the data_config argument is a configuration object containing the parameters for the training data, the model_config argument is a configuration object containing the parameters for building the speech model, and the rank and world_size arguments are used for distributed training.
- speeq.trainers.registry.get_asr_trainer(trainer_config: TrainerConfig, data_config: ASRDataConfig, model_config: ModelConfig, rank: int = 0, world_size: int = 1) ITrainer[source]
Creates an ASR trainer object for training a speech recognition model.
Args:
trainer_config (TrainerConfig): A configuration object that specifies settings for the trainer.
data_config (ASRDataConfig): A configuration object that specifies settings for the data used in training.
model_config (ModelConfig): A configuration object that specifies settings for the model architecture.
rank (int, optional): The rank of the current process, for distributed training. Defaults to 0.
world_size (int, optional): The number of processes for distributed training. Defaults to 1.
- Returns:
ITrainer: An object that encapsulates the ASR trainer functionality.
- speeq.trainers.registry.get_criterion(name: str, blank_id: int, pad_id: int, *args, **kwargs)[source]
This function generates and returns a module representing a criterion.
Args:
name (str): The name of the criterion.
blank_id (int): The ID for the blank symbol used in the criterion.
pad_id (int): The ID for the padding symbol used in the criterion.
Returns:
Module: The desired criterion module.
- speeq.trainers.registry.get_optimizer(model, trainer_config) Union[Optimizer, IScheduler][source]
This function generates and provides an optimizer or scheduler, based on the input model and training configuration.
Args:
model (Module): The model.
trainer_config (object): The configuration object for training.
Returns:
Union[Optimizer, IScheduler]: The optimizer or scheduler object that will be used for training.
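The factory approach described for this module can be illustrated with a small name-to-builder registry. The sketch below is generic and is not speeq's code: the registered names ("ctc", "cross_entropy") and the dictionaries returned in place of real loss modules are placeholders.

```python
from typing import Callable, Dict

# Maps a criterion name to a function that builds it (hypothetical registry).
_CRITERIA: Dict[str, Callable[..., dict]] = {}


def register(name):
    """Decorator that records a builder function under `name`."""
    def decorator(factory):
        _CRITERIA[name] = factory
        return factory
    return decorator


@register("ctc")
def _build_ctc(blank_id, pad_id):
    # A real builder would return a loss module; a dict stands in for it here.
    return {"loss": "ctc", "blank_id": blank_id}


@register("cross_entropy")
def _build_cross_entropy(blank_id, pad_id):
    return {"loss": "cross_entropy", "ignore_index": pad_id}


def get_criterion_sketch(name, blank_id, pad_id):
    """Look up the builder by name and call it with the shared arguments."""
    try:
        factory = _CRITERIA[name]
    except KeyError:
        raise ValueError(
            f"unknown criterion {name!r}; choose from {sorted(_CRITERIA)}"
        ) from None
    return factory(blank_id=blank_id, pad_id=pad_id)


print(get_criterion_sketch("ctc", blank_id=0, pad_id=1))
# {'loss': 'ctc', 'blank_id': 0}
```

Passing both blank_id and pad_id to every builder keeps the call site uniform; each builder uses only the argument that is relevant to its loss.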
Schedulers
This module provides various scheduler classes for adjusting learning rates during training, including the base Scheduler class and its implementation NoamScheduler. The SqueezeformerNoamScheduler is a modified version of the NoamScheduler specific to the Squeezeformer model.
Classes:
Scheduler: Implements the base scheduler class.
NoamScheduler: Implements the Noam scheduler.
SqueezeformerNoamScheduler: Implements the Noam scheduler with modifications for the Squeezeformer model.
- class speeq.trainers.schedulers.NoamScheduler(params, optimizer: str, optimizer_args: dict, warmup_staps: int, d_model: int, *args, **kwargs)[source]
Bases: Scheduler
Implements the Noam scheduler proposed in https://arxiv.org/abs/1706.03762
Args:
params (Iterable): The model’s parameters.
optimizer (str): The name of the optimizer.
optimizer_args (dict): The optimizer’s arguments.
warmup_staps (int): The warmup steps.
d_model (int): The model dimension.
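For reference, the learning-rate rule from the cited paper is lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5). The standalone sketch below evaluates that formula. It is not the class above; the function name and the warmup_steps spelling are choices made for the example.

```python
def noam_lr(step, d_model, warmup_steps):
    """Learning rate at a 1-based `step` under the Noam schedule."""
    if step < 1:
        raise ValueError("step must be >= 1")
    # Linear warmup until warmup_steps, then inverse square-root decay.
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


for step in (1, 2000, 4000, 8000):
    print(step, noam_lr(step, d_model=512, warmup_steps=4000))
```

The two branches of min meet at step == warmup_steps, where the rate peaks at (d_model * warmup_steps)^-0.5.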
- class speeq.trainers.schedulers.Scheduler(params: Iterable, optimizer: str, optimizer_args: dict)[source]
Bases: IScheduler
Implements the base scheduler class.
Args:
params (Iterable): The model’s parameters.
optimizer (str): The name of the optimizer to be used.
optimizer_args (dict): The optimizer’s arguments.
- class speeq.trainers.schedulers.SqueezeformerNoamScheduler(params: Iterable, optimizer: str, optimizer_args: dict, warmup_staps: int, lr_peak: Number, decay_rate: Number, t_peak: int, *args, **kwargs)[source]
Bases: NoamScheduler
Implements the Noam scheduler with the modifications presented in https://arxiv.org/abs/2206.00888
Args:
params (Iterable): The model’s parameters.
optimizer (str): The name of the optimizer.
optimizer_args (dict): The optimizer’s arguments.
warmup_staps (int): The warmup steps.
lr_peak (Number): The peak value of the learning rate.
decay_rate (Number): The decay rate of the learning rate.
t_peak (int): The number of steps to keep the peak learning rate for.
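The arguments above suggest a three-phase schedule: linear warmup, a plateau at the peak, then a power-law decay. The sketch below follows the piecewise form as understood from the Squeezeformer paper. It is an assumption that the class behaves identically, and the function name is hypothetical.

```python
def squeezeformer_lr(step, warmup_steps, lr_peak, decay_rate, t_peak):
    """Piecewise learning rate: warmup, hold at the peak, then decay."""
    if step <= warmup_steps:
        # Phase 1: linear warmup from 0 to lr_peak.
        return lr_peak * step / warmup_steps
    if step <= warmup_steps + t_peak:
        # Phase 2: hold the peak for t_peak steps.
        return lr_peak
    # Phase 3: power-law decay, continuous with the plateau at its last step.
    return lr_peak * warmup_steps ** decay_rate / (step - t_peak) ** decay_rate


for step in (50, 100, 120, 150, 250):
    print(step, squeezeformer_lr(step, warmup_steps=100, lr_peak=1e-3,
                                 decay_rate=1.0, t_peak=50))
```

With decay_rate=1.0 the rate halves once step - t_peak reaches twice warmup_steps, which is step 250 in this example.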
Templates
Defines configuration templates for learning rate schedulers.
Classes:
BaseSchedulerTemplate(ITemplate): Base template for scheduler configuration.
NoamSchedulerTemp(BaseSchedulerTemplate): Template for Noam learning rate scheduler.
SqueezeformerNoamSchedulerTemp(BaseSchedulerTemplate): Template for modified Noam scheduler used in Squeezeformer models.
- class speeq.trainers.templates.BaseSchedulerTemplate[source]
Bases: ITemplate
- property name
- property type
- class speeq.trainers.templates.NoamSchedulerTemp(warmup_staps: int, d_model: int)[source]
Bases: BaseSchedulerTemplate
Noam scheduler template.
Args:
warmup_staps (int): The warmup steps.
d_model (int): The model dimension.
- d_model: int
- warmup_staps: int
- class speeq.trainers.templates.SqueezeformerNoamSchedulerTemp(warmup_staps: int, lr_peak: Number, decay_rate: Number, t_peak: int)[source]
Bases: BaseSchedulerTemplate
Template for the Noam scheduler with the changes proposed in the Squeezeformer paper.
Args:
warmup_staps (int): The warmup steps.
lr_peak (Number): The peak value of the learning rate.
decay_rate (Number): The decay rate of the learning rate.
t_peak (int): The number of steps to keep the peak learning rate for.
- decay_rate: Number
- lr_peak: Number
- t_peak: int
- warmup_staps: int
Trainers
This module contains different trainer classes, some of which utilize distributed data parallelism (DDP), as well as a launch_training_job function.
Trainers:
BaseTrainer: A basic trainer module.
BaseDistTrainer: A basic distributed data parallel trainer module that is a subclass of BaseTrainer.
CTCTrainer: A trainer module for CTC-based models that is a subclass of BaseTrainer.
DistCTCTrainer: A trainer module for CTC models that utilizes distributed data parallelism, which is a subclass of both BaseDistTrainer and CTCTrainer.
Seq2SeqTrainer: A trainer module for Seq2Seq models that is a subclass of BaseTrainer.
DistSeq2SeqTrainer: A trainer module for Seq2Seq models that utilizes distributed data parallelism, which is a subclass of both BaseDistTrainer and Seq2SeqTrainer.
TransducerTrainer: A trainer module for transducer-based models that is a subclass of BaseTrainer.
DistTransducerTrainer: A trainer module for transducer models that utilizes distributed data parallelism, which is a subclass of both BaseDistTrainer and TransducerTrainer.
Function:
launch_training_job: A function that launches a training job for a given configuration of trainer, data, and model objects. It takes in three arguments: trainer_config which is an object containing the configuration for the trainer, data_config which is an object containing the configuration for the data, and model_config which is an object containing the configuration for the model. The function returns None.
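Because each distributed trainer inherits from both BaseDistTrainer and a task-specific trainer, Python's method resolution order decides which implementation of a shared method is used. The stub classes below reuse the class names from the list above, but their bodies are placeholders written for this illustration.

```python
class BaseTrainer:
    def backward_pass(self, loss):
        return "BaseTrainer.backward_pass"


class BaseDistTrainer(BaseTrainer):
    def backward_pass(self, loss):
        # Stands in for a DDP-aware override.
        return "BaseDistTrainer.backward_pass"


class CTCTrainer(BaseTrainer):
    pass  # Stands in for the CTC-specific forward logic.


class DistCTCTrainer(BaseDistTrainer, CTCTrainer):
    pass


print([cls.__name__ for cls in DistCTCTrainer.__mro__])
# ['DistCTCTrainer', 'BaseDistTrainer', 'CTCTrainer', 'BaseTrainer', 'object']
print(DistCTCTrainer().backward_pass(loss=None))
# BaseDistTrainer.backward_pass
```

Listing BaseDistTrainer first means its overrides take precedence over those of the task-specific trainer and of BaseTrainer.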
- class speeq.trainers.trainers.BaseDistTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, logger: ILogger, outdir: Union[str, Path], rank: int, world_size: int, dist_address: str, dist_port: int, dist_backend: str, grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history={})[source]
Bases: BaseTrainer
Builds the basic distributed data parallel trainer module.
- Args:
optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.
criterion (Module): The loss function that will be used during the training process.
model (Module): The model.
train_loader (ILoader): The loader for the training data.
test_loader (ILoader): The loader for the testing data.
epochs (int): The number of epochs.
log_steps_frequency (int): The frequency at which to log results.
logger (ILogger): The logger to be used.
outdir (Union[str, Path]): The directory to save checkpoints.
rank (int): The process index.
world_size (int): The number of nodes/processes.
dist_address (str): The address of the master node.
dist_port (int): The port of the master node.
dist_backend (str): The backend used for DDP.
grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.
grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.
grad_clip_norm_type (float): The type of p-norm used. Default 2.0.
history (dict): The training history, if available. Default {}.
- backward_pass(loss: Tensor) None[source]
This method performs a backward pass on the model parameters to update them based on the provided loss tensor.
- Args:
loss (Tensor): The loss tensor.
- property is_master
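The rank and world_size arguments follow the usual data-parallel convention. The sketch below shows, under assumptions, how they are commonly used: rank 0 is treated as the master process, and each rank works on a disjoint slice of the data. Both helpers are hypothetical and do not show how speeq partitions data.

```python
def is_master(rank):
    """By convention, the process with rank 0 handles logging and checkpointing."""
    return rank == 0


def shard(items, rank, world_size):
    """Give each rank every world_size-th item, starting at its own rank."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    return items[rank::world_size]


examples = list(range(10))
for rank in range(4):
    print(rank, is_master(rank), shard(examples, rank, world_size=4))
# 0 True [0, 4, 8]
# 1 False [1, 5, 9]
# 2 False [2, 6]
# 3 False [3, 7]
```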
- class speeq.trainers.trainers.BaseTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, logger: ILogger, outdir: Union[str, Path], grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]
Bases: ITrainer
Builds the basic trainer module.
- Args:
optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.
criterion (Module): The loss function that will be used during the training process.
model (Module): The model.
train_loader (ILoader): The loader for the training data.
test_loader (ILoader): The loader for the testing data.
epochs (int): The number of epochs.
log_steps_frequency (int): The frequency at which to log results.
logger (ILogger): The logger to be used.
outdir (Union[str, Path]): The directory to save checkpoints.
grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.
grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.
grad_clip_norm_type (float): The type of p-norm used. Default 2.0.
history (dict): The training history, if available. Default {}.
- backward_pass(loss: Tensor) None[source]
This method performs a backward pass on the model parameters to update them based on the provided loss tensor.
- Args:
loss (Tensor): The loss tensor.
- property is_master
- test() float[source]
Tests the model on the testing data.
- Returns:
float: The average test loss.
- train() float[source]
The main training loop, where the function iterates over the training examples and performs the forward and backward passes.
Returns:
float: The average loss over all training examples.
- train_step(batch: Tuple[Tensor]) float[source]
This method represents a single step in the training process. It performs a forward pass, calculates the loss, and then performs a backward pass to update the model parameters.
Args:
batch (Tuple[Tensor]): The input batch to be processed.
Returns:
float: The loss value for this step.
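The roles of grad_acc_steps, grad_clip_thresh and grad_clip_norm_type can be sketched on a single scalar weight. This is an illustrative toy, not the trainer's code. The clipping rule mirrors the common "scale by max_norm / total_norm" approach, and all function names are hypothetical.

```python
def clip_by_norm(grads, max_norm, norm_type=2.0):
    """Scale the gradients so that their p-norm does not exceed max_norm."""
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    scale = min(1.0, max_norm / (total_norm + 1e-6))
    return [g * scale for g in grads]


def train_with_accumulation(batch_grads, grad_acc_steps=1,
                            grad_clip_thresh=None, lr=0.1):
    """Apply one update per grad_acc_steps batches; returns (weight, updates)."""
    weight, accumulated, updates = 0.0, 0.0, 0
    for step, grad in enumerate(batch_grads, start=1):
        # Average the gradient over the accumulation window.
        accumulated += grad / grad_acc_steps
        if step % grad_acc_steps == 0:
            if grad_clip_thresh is not None:
                (accumulated,) = clip_by_norm([accumulated], grad_clip_thresh)
            weight -= lr * accumulated
            accumulated = 0.0
            updates += 1
    return weight, updates


print(clip_by_norm([3.0, 4.0], max_norm=1.0))   # roughly [0.6, 0.8]
print(train_with_accumulation([1.0] * 4, grad_acc_steps=2))
```

In this sketch, batches left over after the last full accumulation window are discarded. A real trainer may handle them differently.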
- class speeq.trainers.trainers.CTCTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, device: str, logger: ILogger, outdir: Union[str, Path], grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]
Bases: BaseTrainer
A trainer module for CTC-based models.
- Args:
optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.
criterion (Module): The loss function that will be used during the training process.
model (Module): The model.
train_loader (ILoader): The loader for the training data.
test_loader (ILoader): The loader for the testing data.
epochs (int): The number of epochs.
log_steps_frequency (int): The frequency at which to log results.
logger (ILogger): The logger to be used.
outdir (Union[str, Path]): The directory to save checkpoints.
grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.
grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.
grad_clip_norm_type (float): The type of p-norm used. Default 2.0.
history (dict): The training history, if available. Default {}.
- class speeq.trainers.trainers.DistCTCTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, logger: ILogger, outdir: Union[str, Path], log_steps_frequency: int, rank: int, world_size: int, dist_address: int, dist_port: int, dist_backend: str, grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]
Bases: BaseDistTrainer, CTCTrainer
A trainer module for CTC models that utilizes distributed data parallelism.
- Args:
optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.
criterion (Module): The loss function that will be used during the training process.
model (Module): The model.
train_loader (ILoader): The loader for the training data.
test_loader (ILoader): The loader for the testing data.
epochs (int): The number of epochs.
log_steps_frequency (int): The frequency at which to log results.
logger (ILogger): The logger to be used.
outdir (Union[str, Path]): The directory to save checkpoints.
rank (int): The process index.
world_size (int): The number of nodes/processes.
dist_address (str): The address of the master node.
dist_port (int): The port of the master node.
dist_backend (str): The backend used for DDP.
grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.
grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.
grad_clip_norm_type (float): The type of p-norm used. Default 2.0.
history (dict): The training history, if available. Default {}.
- class speeq.trainers.trainers.DistSeq2SeqTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, logger: ILogger, outdir: Union[str, Path], log_steps_frequency: int, rank: int, world_size: int, dist_address: int, dist_port: int, dist_backend: str, grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]
Bases: BaseDistTrainer, Seq2SeqTrainer
A trainer module for Seq2Seq models that utilizes distributed data parallelism.
- Args:
optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.
criterion (Module): The loss function that will be used during the training process.
model (Module): The model.
train_loader (ILoader): The loader for the training data.
test_loader (ILoader): The loader for the testing data.
epochs (int): The number of epochs.
log_steps_frequency (int): The frequency at which to log results.
logger (ILogger): The logger to be used.
outdir (Union[str, Path]): The directory to save checkpoints.
rank (int): The process index.
world_size (int): The number of nodes/processes.
dist_address (str): The address of the master node.
dist_port (int): The port of the master node.
dist_backend (str): The backend used for DDP.
grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.
grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.
grad_clip_norm_type (float): The type of p-norm used. Default 2.0.
history (dict): The training history, if available. Default {}.
- class speeq.trainers.trainers.DistTransducerTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, logger: ILogger, outdir: Union[str, Path], log_steps_frequency: int, rank: int, world_size: int, dist_address: int, dist_port: int, dist_backend: str, grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]
Bases: BaseDistTrainer, TransducerTrainer
A trainer module for transducer models that utilizes distributed data parallelism.
- Args:
optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.
criterion (Module): The loss function that will be used during the training process.
model (Module): The model.
train_loader (ILoader): The loader for the training data.
test_loader (ILoader): The loader for the testing data.
epochs (int): The number of epochs.
log_steps_frequency (int): The frequency at which to log results.
logger (ILogger): The logger to be used.
outdir (Union[str, Path]): The directory to save checkpoints.
rank (int): The process index.
world_size (int): The number of nodes/processes.
dist_address (str): The address of the master node.
dist_port (int): The port of the master node.
dist_backend (str): The backend used for DDP.
grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.
grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.
grad_clip_norm_type (float): The type of p-norm used. Default 2.0.
history (dict): The training history, if available. Default {}.
- class speeq.trainers.trainers.Seq2SeqTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, device: str, logger: ILogger, outdir: Union[str, Path], grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]
Bases: BaseTrainer
A trainer module for Seq2Seq models.
- Args:
optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.
criterion (Module): The loss function that will be used during the training process.
model (Module): The model.
train_loader (ILoader): The loader for the training data.
test_loader (ILoader): The loader for the testing data.
epochs (int): The number of epochs.
log_steps_frequency (int): The frequency at which to log results.
logger (ILogger): The logger to be used.
outdir (Union[str, Path]): The directory to save checkpoints.
grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.
grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.
grad_clip_norm_type (float): The type of p-norm used. Default 2.0.
history (dict): The training history, if available. Default {}.
- class speeq.trainers.trainers.TransducerTrainer(optimizer: Union[Optimizer, IScheduler], criterion: Module, model: Module, train_loader: IDataLoader, test_loader: IDataLoader, epochs: int, log_steps_frequency: int, device: str, logger: ILogger, outdir: Union[str, Path], grad_acc_steps: int = 1, grad_clip_thresh: Union[None, float] = None, grad_clip_norm_type: float = 2.0, history: dict = {})[source]
Bases: BaseTrainer
A trainer module for transducer-based models.
- Args:
optimizer (Union[Optimizer, IScheduler]): The optimizer or the wrapped optimizer that will be used during the training.
criterion (Module): The loss function that will be used during the training process.
model (Module): The model.
train_loader (ILoader): The loader for the training data.
test_loader (ILoader): The loader for the testing data.
epochs (int): The number of epochs.
log_steps_frequency (int): The frequency at which to log results.
logger (ILogger): The logger to be used.
outdir (Union[str, Path]): The directory to save checkpoints.
grad_acc_steps (int): The number of steps to accumulate gradients over. Default 1.
grad_clip_thresh (Union[None, float]): The maximum norm of the gradients. Default None.
grad_clip_norm_type (float): The type of p-norm used. Default 2.0.
history (dict): The training history, if available. Default {}.
- speeq.trainers.trainers.launch_training_job(trainer_config: object, data_config: object, model_config: object) None[source]
Launches an ASR training job by constructing a trainer from the passed configuration and running it on a single GPU or multiple GPUs.
- Args:
trainer_config (object): Trainer configuration object.
data_config (object): Data configuration object.
model_config (object): Model configuration object.