Data

Augmenters

This module offers a variety of data augmentation techniques that operate in either the time domain or the frequency domain. Every augmenter in this module is a subclass of either StochasticProcess or IProcess and exposes a run method, which applies the augmentation to the input signal.

The classes implemented in this module can be divided into two groups based on the domain they operate on.

The time domain augmenters include:

WhiteNoiseInjector: Adds white noise to the input signal.
VolumeChanger: Changes the volume of the input signal by applying a random gain.
ConsistentAttenuator: Attenuates the amplitude of the input signal by a single random gain.
VariableAttenuator: Attenuates the amplitude of the input signal by applying a random gain, where the gain varies across time steps.
Reverberation: Adds a reverberation effect to the input signal.

The frequency domain augmenters are:

FrequencyMasking: Masks random frequency bins in the input spectrogram.
TimeMasking: Masks a random time segment in the input spectrogram.

# Import the module
import torch
from speeq.data import augmenters

# creating dummy signal
signal = torch.randn(1, 100)

# Create an instance of the augmenter
# Will use WhiteNoiseInjector example for illustration
noise_injector = augmenters.WhiteNoiseInjector()

# Apply the augmentation to the signal
augmented_signal = noise_injector.run(signal)

class speeq.data.augmenters.ConsistentAttenuator(ratio=1.0, min_gain=0.1)[source]

Bases: VolumeChanger

Applies amplitude attenuation to the input signal by multiplying it by a random gain that is less than 1, such that the gain is consistent across all time steps. The augmented signal x_augmented is given by the following equation:

\[x_{augmented} = x \cdot U\]

where U is a random number between min_gain and 1, and x is the input time-domain signal.

Args:

ratio (float): The ratio/rate that the augmentation will be applied to the data. Default 1.0

min_gain (float): The minimum gain that will be multiplied by the signal. Default 0.1

class speeq.data.augmenters.FrequencyMasking(n: int, max_length: int, ratio=1.0)[source]

Bases: _BaseMasking

Masks the input spectrogram along the frequency axis.

Args:

n (int): The number of times to apply the masking operation.

max_length (int): The maximum masking length.

ratio (float): The ratio/rate that the augmentation will be applied to the data. Default 1.0

func(x: Tensor) → Tensor[source]: x (Tensor): the input spectrogram to be augmented of shape […, time, freq].
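
The masking operation can be illustrated with a small dependency-free sketch. It is not the speeq implementation: the mask value of 0.0, the way the band width and start are sampled, and the name mask_frequency are all assumptions made for illustration, and the spectrogram is a plain nested list laid out as [time][freq].

```python
import random

def mask_frequency(spec, n, max_length, mask_value=0.0, rng=random):
    """Return a copy of spec ([time][freq]) with n random frequency bands masked."""
    n_freq = len(spec[0])
    masked = [row[:] for row in spec]  # copy so the input stays untouched
    for _ in range(n):
        # Assumption: band width is drawn uniformly from 0..max_length.
        width = rng.randint(0, min(max_length, n_freq))
        start = rng.randint(0, n_freq - width)
        for row in masked:
            # The same frequency band is masked at every time step.
            row[start:start + width] = [mask_value] * width
    return masked

spec = [[1.0] * 8 for _ in range(5)]  # 5 time steps, 8 frequency bins
masked = mask_frequency(spec, n=2, max_length=3, rng=random.Random(0))
```

Masking along the time axis follows the same idea with whole rows (time steps) blanked instead of columns.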

class speeq.data.augmenters.Reverberation(ratio=1.0, min_len=1000, max_len=4000, start_val=-10, end_val=10, eps=0.001)[source]

Bases: StochasticProcess

Reverberates the input signal by generating an impulse response and convolving it with the speech signal.

Args:

ratio (float): The ratio/rate that the augmentation will be applied to the data. Default 1.0

min_len (int): The minimum impulse response length. Default 1000.

max_len (int): The maximum impulse response length. Default 4000.

start_val (int): The starting value of the impulse response generation function. Default -10.

end_val (int): The end value of the impulse response generation function. Default 10.

eps (float): Smoothing value, to prevent division by 0. Default 1e-3.

func(x: Tensor)[source]
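
The convolution step can be sketched without any dependencies. The impulse response below is a made-up exponential decay; how speeq actually derives it from start_val, end_val, and eps is not described here, and whether the output is trimmed back to the input length is also unknown. The names make_impulse_response and convolve are hypothetical.

```python
import math

def make_impulse_response(length, decay=5.0):
    # Hypothetical impulse response: an exponential decay starting at 1.0.
    return [math.exp(-decay * i / length) for i in range(length)]

def convolve(signal, impulse):
    """Full discrete convolution of two 1-D sequences."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out

impulse = make_impulse_response(length=4)
reverberated = convolve([1.0, 0.0, 0.0, 0.5], impulse)
```

Each input sample leaves a decaying tail in the output, which is what produces the echo-like effect.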

class speeq.data.augmenters.TimeMasking(n: int, max_length: int, ratio=1.0)[source]

Bases: _BaseMasking

Masks the input spectrogram along the time axis.

Args:

n (int): The number of times to apply the masking operation.

max_length (int): The maximum masking length.

ratio (float): The ratio/rate that the augmentation will be applied to the data. Default 1.0

func(x: Tensor) → Tensor[source]: x (Tensor): the input spectrogram to be augmented of shape […, time, freq].

class speeq.data.augmenters.VariableAttenuator(ratio=1.0, noise_mul=0.5)[source]

Bases: StochasticProcess

Applies random attenuation to an input signal by multiplying it by a random gain less than 1. The amount of attenuation varies across time steps. The attenuation is applied using the following equation:

\[x_{augmented} = x \cdot U \cdot noise\_mul\]

where x is the input time-domain signal, and U is random Gaussian noise with values between 0 and 1 that has the same shape as x.

Args:

ratio (float): The ratio/rate that the augmentation will be applied to the data. Default 1.0

noise_mul (float): The noise multiplier. Default 0.5

func(x: Tensor)[source]

class speeq.data.augmenters.VolumeChanger(min_gain: float, max_gain: float, ratio=1.0)[source]

Bases: StochasticProcess

Amplifies the input signal by a random gain.

This changes the amplitude of the input time-domain signal x by multiplying it by a random gain, gain, which is computed using the following equation:

\[gain = (max\_gain - min\_gain) \cdot U + min\_gain\]

where U is a random number between 0 and 1. The resulting amplified signal x_augmented can be computed as follows:

\[x_{augmented} = x \cdot gain\]

Args:

ratio (float): The ratio/rate that the augmentation will be applied to the data. Default 1.0

min_gain (float): The minimum gain that will be multiplied by the signal.

max_gain (float): The maximum gain that will be multiplied by the signal.

func(x: Tensor) → Tensor[source]
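
The two equations above can be worked through in a few lines of plain Python. This is only a sketch: it assumes U is drawn uniformly from [0, 1), and the names sample_gain and change_volume are hypothetical rather than part of speeq.

```python
import random

def sample_gain(min_gain, max_gain, rng=random):
    u = rng.random()  # U in [0, 1), assumed uniform
    return (max_gain - min_gain) * u + min_gain

def change_volume(x, min_gain, max_gain, rng=random):
    """Multiply every sample by one gain shared across all time steps."""
    gain = sample_gain(min_gain, max_gain, rng)
    return [sample * gain for sample in x], gain

signal = [0.5, -1.0, 0.25]
louder, gain = change_volume(signal, 0.5, 2.0, rng=random.Random(0))
# With max_gain fixed at 1.0 the gain can only attenuate the signal.
attenuated, att_gain = change_volume(signal, 0.1, 1.0, rng=random.Random(1))
```

Setting max_gain to 1.0 keeps the gain between min_gain and 1, which matches the behaviour described for ConsistentAttenuator.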

class speeq.data.augmenters.WhiteNoiseInjector(ratio=1.0, gain_mul=0.05)[source]

Bases: StochasticProcess

Injects random Gaussian noise into the original signal. The input signal x is added to randomly generated Gaussian noise multiplied by a random gain, as the equation below shows:

\[x_{augmented} = x + noise \cdot gain \cdot gain\_mul\]

where gain is a random number between 0 and 1, and x is a signal in the time domain.

Args:

ratio (float): The ratio/rate that the augmentation will be applied to the data. Default 1.0

gain_mul (float): The gain multiplier factor to control the strength of the noise. Default 0.05

func(x: Tensor) → Tensor[source]
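
A minimal sketch of the noise injection equation follows. It assumes standard normal noise, a single uniform gain per signal, and a plain list of samples; the function name inject_white_noise is hypothetical and this is not the speeq implementation.

```python
import random

def inject_white_noise(x, gain_mul=0.05, rng=random):
    gain = rng.random()  # random gain in [0, 1), assumed uniform
    noise = [rng.gauss(0.0, 1.0) for _ in x]  # one noise value per sample
    return [s + n * gain * gain_mul for s, n in zip(x, noise)]

signal = [0.0, 0.5, -0.5, 1.0]
noisy = inject_white_noise(signal, gain_mul=0.05, rng=random.Random(0))
```

A smaller gain_mul keeps the perturbation closer to the original signal; with gain_mul set to 0 the signal is returned unchanged.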

Data Loaders

This module contains classes for loading and building data loaders.

Dataset Classes:

CSVDataset: A base dataset class for handling CSV datasets.
SpeechTextDataset: A dataset class for speech-text pairs.

Data loader classes

SpeechTextLoader: An iterable data loader class for speech-text pairs.

The CSVDataset class provides a generic base class for handling CSV datasets, while the SpeechTextDataset class is specifically designed for speech-text pairs. The SpeechTextLoader class builds an iterable data loader for speech-text pairs, which can be used for training speech recognition models.

class speeq.data.loaders.CSVDataset(data_path: Union[str, Path], sep: str = ',', encoding='utf-8', sort_key: Optional[str] = '', reverse: bool = False)[source]

Bases: IDataset

A base dataset class for handling CSV datasets.

Args:

data_path (Union[str, Path]): The file path of the CSV dataset.

sep (str): The separator used in the CSV file. Default is ‘,’.

encoding (str): The encoding of the CSV file. Default is “utf-8”.

sort_key (Optional[str]): The key to sort the data on. Default is an empty string.

reverse (bool): Used to specify the sorting order. If set to False, data will be sorted in ascending order. If set to True, data will be sorted in descending order. Default is False.
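
To illustrate how sep, sort_key, and reverse interact, here is a small sketch built on the standard csv module. It is not how CSVDataset is implemented: the duration column is hypothetical, and sorting the key numerically is an assumption.

```python
import csv
import io

def load_rows(csv_text, sep=",", sort_key="", reverse=False):
    """Read CSV rows as dicts and optionally sort them by one column."""
    rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=sep))
    if sort_key:  # an empty sort_key means the file order is kept
        rows.sort(key=lambda row: float(row[sort_key]), reverse=reverse)
    return rows

csv_text = "file_path,text,duration\na.wav,hello,2.5\nb.wav,hi,1.0\n"
ascending = load_rows(csv_text, sort_key="duration")
descending = load_rows(csv_text, sort_key="duration", reverse=True)
```

Sorting by a length-like column is a common way to group similarly sized examples into the same batch.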

class speeq.data.loaders.SpeechTextDataset(data_path: Union[str, Path], tokenizer: ITokenizer, speech_processor: IProcessor, text_processor: IProcessor, sep: str, add_sos=False, add_eos=False, encoding='utf-8', text_key: Optional[str] = 'text', speech_key: Optional[str] = 'file_path', sort_key: Optional[str] = '', reverse: bool = False)[source]

Bases: CSVDataset

Implements a basic dataset for speech-text pairs to be used in speech-recognition.

Args:

data_path (Union[str, Path]): The file path for the data in CSV format.

tokenizer (ITokenizer): The tokenizer that will be used to process the text data.

speech_processor (IProcessor): The speech processor, where the run method returns the speech data with shape [B] or [1, M], or […, M, F].

text_processor (IProcessor): The text processor.

sep (str): The separator used in the CSV file.

add_sos (bool): A flag that indicates whether to add the Start of Sequence (SOS) token to the text sequence. Default is False.

add_eos (bool): A flag that indicates whether to add the End of Sequence (EOS) token to the text sequence. Default is False.

encoding (Optional[str]): The file encoding. Default “utf-8”.

text_key (Optional[str]): The name of the column that holds the text data. Default ‘text’.

speech_key (Optional[str]): The name of the column that holds the audio file path. Default ‘file_path’

sort_key (Optional[str]): The key to sort the data on. Default ‘’.

reverse (bool): A flag used if a sorting key is passed. If set to False, data will be sorted in ascending order. If set to True, data will be sorted in descending order. Default is False.

Example:

# Import the module
from speeq.data.loaders import SpeechTextDataset
from speeq.data.tokenizers import CharTokenizer
from speeq.data.processors import OrderedProcessor
from speeq.data.processes import AudioLoader
sample_rate = 16000
sep = ','
file_path = 'file.csv'

# creating a dummy tokenizer and processors
tokenizer = CharTokenizer()
speech_processor = OrderedProcessor(
    [
        AudioLoader(sample_rate=sample_rate),
    ]
    )
text_processor = OrderedProcessor([])
tokenizer.add_sos_token().add_eos_token()

# Create an instance of the dataset
dataset = SpeechTextDataset(
    data_path=file_path,
    tokenizer=tokenizer,
    speech_processor=speech_processor,
    text_processor=text_processor,
    sep=sep,
    add_sos=True
    )

# to get the first item of the dataset
speech, speech_len, text, text_len = dataset[0]

# to get the number of examples in the dataset
length = len(dataset)

# to iterate over the dataset
for speech, speech_len, text, text_len in dataset:
    pass

class speeq.data.loaders.SpeechTextLoader(dataset: object, batch_size: int, text_padder: IPadder, speech_padder: IPadder, rank: int = 0, world_size: int = 1, shuffle: bool = False)[source]

Bases: _DataLoader

Builds an iterable data loader for speech-text pairs.

Args:

dataset (object): The dataset to be loaded. The __getitem__ method of the dataset should return a tuple containing the following, in order:

The speech tensor of shape [1, M, f]
The speech length as integer value equal to M
The text tensor of shape [N]
The text length as integer value equal to N

batch_size (int): The size of each batch.

text_padder (IPadder): The padder for the text data.

speech_padder (IPadder): The padder for the speech data.

rank (int): The process rank used in distributed data-parallel setting. Default is 0.

world_size (int): The number of total processes used in distributed data-parallel settings. Default is 1.

shuffle (bool): A flag indicating whether the dataset should be shuffled at each iteration. Default is False.

Example:

# Import the module
from speeq.data.loaders import SpeechTextDataset, SpeechTextLoader
from speeq.data.padders import DynamicPadder
from speeq.data.tokenizers import CharTokenizer
from speeq.data.processors import OrderedProcessor
from speeq.data.processes import AudioLoader, FeatExtractor
batch_size = 4
sample_rate = 16000
sep = ','
file_path = 'clean_data.csv'

# creating a dummy tokenizer, processors, and padders
tokenizer = CharTokenizer()
speech_processor = OrderedProcessor(
    [
        AudioLoader(sample_rate=sample_rate),
        FeatExtractor(feat_ext_name='mfcc', feat_ext_args={})
    ]
    )
text_processor = OrderedProcessor([])
tokenizer.add_sos_token().add_eos_token()
speech_padder = DynamicPadder(dim=1, pad_val=0.0)
text_padder = DynamicPadder(dim=0, pad_val=-1)

# Create an instance of a dataset
dataset = SpeechTextDataset(
    data_path=file_path,
    tokenizer=tokenizer,
    speech_processor=speech_processor,
    text_processor=text_processor,
    sep=sep,
    add_sos=True
    )

# Create an instance of the data loader
loader = SpeechTextLoader(
    dataset=dataset,
    batch_size=batch_size,
    text_padder=text_padder,
    speech_padder=speech_padder
)

# to get the number of batches
n_batches = len(loader)

# to iterate over the loader
for batch in loader:
    speech, speech_len, text, text_len = batch
    break

get_batch() → Tuple[Tensor, Tensor, Tensor, Tensor][source]

Prepares and returns a batch of examples

Returns:: Tuple[Tensor, Tensor, Tensor, Tensor]: A tuple containing the following tensors in order: speech tensor of shape [B, M, d], speech mask tensor of shape [B, M], text tensor of shape [B, N], and text mask tensor of shape [B, N].
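
The relationship between padded sequences and their masks can be sketched with plain lists. The mask convention used here (True for real positions, False for padding) is an assumption, as is the helper name pad_batch; the actual loader works on tensors and delegates padding to the padders.

```python
def pad_batch(seqs, pad_val):
    """Pad variable-length sequences to a common length and build a mask."""
    max_len = max(len(s) for s in seqs)
    padded = [s + [pad_val] * (max_len - len(s)) for s in seqs]
    # Assumption: True marks real tokens, False marks padding.
    mask = [[i < len(s) for i in range(max_len)] for s in seqs]
    return padded, mask

text_batch, text_mask = pad_batch([[5, 6, 7], [8]], pad_val=-1)
```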

Padders

The padders module provides two classes for padding input sequences: DynamicPadder and StaticPadder.

DynamicPadder pads an input sequence along a specified dimension up to a maximum length passed at call time, while StaticPadder is a subclass of DynamicPadder whose maximum length is fixed when the padder is created.

Both classes have a pad method that returns the padded tensor and the length of the padding added. DynamicPadder.pad accepts the input tensor and the maximum length to pad to, whereas StaticPadder.pad accepts only the input tensor.

Usage:

import torch
from speeq.data.padders import DynamicPadder, StaticPadder

# create a dummy input
input_tensor = torch.randn(1, 3, 7)

# Example usage of DynamicPadder
dynamic_padder = DynamicPadder(dim=1, pad_val=0)
padded_tensor, padding_length = dynamic_padder.pad(input_tensor, max_len=10)

# Example usage of StaticPadder
static_padder = StaticPadder(dim=1, pad_val=0, max_len=10)
padded_tensor, padding_length = static_padder.pad(input_tensor)

class speeq.data.padders.DynamicPadder(dim: int, pad_val: Union[int, Tensor, float], left_pad=False, *args, **kwargs)[source]

Bases: IPadder

Pads the input sequence along a given dimension up to the maximum length.

Args:

dim (int): The dimension to do the padding across.

pad_val (Union[int, Tensor, float]): The padding value that will be used to fill the padding sequence.

left_pad (bool): Whether to add the padding on the left side of the sequence. Default False.

pad(x: Tensor, max_len: int) → Tuple[Tensor, int][source]

Pads the input tensor to match the specified maximum length along the pre-defined dimension.

Args:

x (Tensor): The input tensor to be padded.

max_len (int): The maximum length to pad the input tensor to.

Returns:

Tuple[Tensor, int]: A tuple containing the padded tensor and the length of the padding added.
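
The return contract of pad (padded data plus the amount of padding added) can be sketched for a one-dimensional list. This is an illustration only: the helper name is hypothetical, and leaving an input longer than max_len unchanged is an assumption, since that case is not documented here.

```python
def pad(seq, max_len, pad_val=0, left_pad=False):
    """Return (padded sequence, number of padding elements added)."""
    pad_len = max(max_len - len(seq), 0)  # assumption: never truncate
    padding = [pad_val] * pad_len
    padded = padding + seq if left_pad else seq + padding
    return padded, pad_len

right_padded, right_len = pad([1, 2, 3], max_len=5)
left_padded, left_len = pad([1, 2, 3], max_len=5, left_pad=True)
```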

class speeq.data.padders.StaticPadder(dim: int, pad_val: Union[int, Tensor, float], max_len: int, left_pad=False, *args, **kwargs)[source]

Bases: DynamicPadder

A subclass of DynamicPadder that pads an input sequence to match a pre-defined maximum length along a specified dimension.

Args:

dim (int): The dimension to pad across.

pad_val (Union[int, Tensor, float]): The value used to fill the padded sequence.

max_len (int): The maximum length of the sequence to pad to.

left_pad (bool): Whether to add the padding on the left side of the sequence. Default False.

pad(x: Tensor, *args, **kwargs)[source]

Pads the input tensor to match the specified maximum length along the pre-defined dimension.

Args:: x (Tensor): The input tensor to be padded.
Returns:: Tuple[Tensor, int]: A tuple containing the padded tensor and the length of the padding added.

Data Processes

This module contains classes for speech processing that implement the IProcess interface.

Classes:

AudioLoader: Loads and resamples an audio file to the targeted sample rate.
FeatExtractor: Extracts frequency features from a given time domain signal, supporting mfcc and mel spectrogram.
FeatStacker: Implements feature stacking operation by stacking consecutive time stamps along the feature space.
FrameContextualizer: Implements frame contextualizer through time as described in https://arxiv.org/abs/1412.5567

All classes implement the run method, which applies the process to its input.

Example usage:

# Import required packages and modules
import torch
from speeq.data.processes import AudioLoader, FeatExtractor, FeatStacker, FrameContextualizer

# Define the audio file path
audio_path = 'path/to/audio.wav'

# Create an instance of AudioLoader
audio_loader = AudioLoader(sample_rate=16000)

# Load the audio file using AudioLoader
audio_tensor = audio_loader.run(audio_path)

# Create an instance of FeatExtractor
feat_extractor = FeatExtractor(feat_ext_name='mfcc', feat_ext_args={'n_mfcc': 13})

# Extract the MFCC features of the audio tensor using FeatExtractor
feat_tensor = feat_extractor.run(audio_tensor)

# Create an instance of FeatStacker
feat_stacker = FeatStacker(feat_stack_factor=2)

# Stack the features using FeatStacker
stacked_feat_tensor = feat_stacker.run(feat_tensor)

# Create an instance of FrameContextualizer
frame_contextualizer = FrameContextualizer(contex_size=2)

# Add context to the features using FrameContextualizer
contextualized_feat_tensor = frame_contextualizer.run(stacked_feat_tensor)

class speeq.data.processes.AudioLoader(sample_rate: int)[source]

Bases: IProcess

Loads and resamples audio to the specified sample rate.

Note

This class uses the load function provided by the torchaudio framework for loading audio. For details on supported file formats and further information, please refer to the torchaudio documentation.

Args:: sample_rate (int): The target sampling rate.

run(file_path: Union[Path, str]) → Tensor[source]

Load and resample an audio file.

Args:: file_path (Union[Path, str]): The path to the audio file to be loaded.
Returns:: Tensor: A tensor containing the speech data of shape [C, M].

class speeq.data.processes.FeatExtractor(feat_ext_name: str, feat_ext_args: dict)[source]

Bases: IProcess

A class for extracting frequency features from a given time domain signal, supporting mfcc and mel spectrogram features.

Note

This class uses the transforms.MelSpectrogram and transforms.MFCC classes provided by the torchaudio framework for feature extraction. For additional details and parameter information, please refer to the torchaudio documentation.

Args:

feat_ext_name (str): The name of the feature extractor to use, either mfcc or melspec.

feat_ext_args (dict): The arguments to be passed to the specified feature extractor. For more information on parameters, please refer to the torchaudio documentation.

run(x: Tensor) → Tensor[source]

Transforms the input signal x from time domain to frequency domain using the predefined feature extractor.

Args:: x (Tensor): A time domain tensor of shape […, T, F].
Returns:: Tensor: A tensor containing the frequency domain features of shape […, T, F].

class speeq.data.processes.FeatStacker(feat_stack_factor: int)[source]

Bases: IProcess

A class that implements feature stacking by stacking n consecutive time stamps along the feature space.

Args:

feat_stack_factor (int): The factor by which to stack the features.

Example:

# Import required packages
import torch
from speeq.data.processes import FeatStacker

batch_size = 3
max_len = 10
feat_size = 15
stacking_factor = 2
# creating dummy data
input = torch.randn(batch_size, max_len, feat_size)

# Create an instance of the class
stacker = FeatStacker(feat_stack_factor=stacking_factor)

# Apply the process to the input
result = stacker.run(input)

# Print the result's shape
print(result.shape)  # torch.Size([3, 5, 30])

run(x: Tensor)[source]

Applies feature stacking to the input tensor x by stacking n consecutive time frames along the feature space.

Args:: x (Tensor): The input tensor of shape […, T, F]
Returns:: Tensor: The result tensor after applying feature stacking, of shape […, T // n, F * n], where n is feat_stack_factor.
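
The shape change can be traced on a small nested list laid out as [T][F]. This sketch assumes that trailing frames which do not fill a complete group are dropped; how FeatStacker handles that case is not stated here, and the name stack_features is hypothetical.

```python
def stack_features(frames, n):
    """Concatenate every n consecutive frames: [T][F] -> [T // n][F * n]."""
    usable = len(frames) - len(frames) % n  # assumption: drop leftover frames
    return [
        [value for frame in frames[i:i + n] for value in frame]
        for i in range(0, usable, n)
    ]

frames = [[t, t + 0.5] for t in range(6)]  # T = 6, F = 2
stacked = stack_features(frames, n=2)      # 3 frames of 4 features each
```

Stacking trades time resolution for feature width, so later layers see fewer but richer frames.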

class speeq.data.processes.FrameContextualizer(contex_size: int)[source]

Bases: IProcess

Implements frame contextualization through time, as described in https://arxiv.org/abs/1412.5567

Args:

contex_size (int): The context size, i.e., the number of left or right frames to consider with the current frame.

Example:

# Import required packages
import torch
from speeq.data.processes import FrameContextualizer

max_len = 10
feat_size = 15

# 2 to the left, the current time step and 2 to the right
contex_size = 2

# creating dummy data
input = torch.randn(1, max_len, feat_size)

# Create an instance of the class
contextualizer = FrameContextualizer(contex_size=contex_size)

# Apply the process to the input
result = contextualizer.run(input)

# Print the result's shape
print(result.shape)  # torch.Size([1, 10, 75])

run(x: Tensor) → Tensor[source]

Applies frame contextualization on the input tensor x.

Args:: x (Tensor): The input tensor of shape [1, M, F]
Returns:: Tensor: The output tensor of shape [1, M, F * (2 * contex_size + 1)]
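
The output width F * (2 * contex_size + 1) comes from concatenating each frame with its left and right neighbours. The sketch below shows this on a [M][F] nested list; padding the edges with zeros is an assumption, and add_context is a hypothetical name, not the speeq implementation.

```python
def add_context(frames, contex_size, pad_val=0.0):
    """Concatenate each frame with contex_size neighbours on each side."""
    blank = [pad_val] * len(frames[0])
    # Assumption: missing neighbours at the edges are filled with pad_val.
    padded = [blank] * contex_size + frames + [blank] * contex_size
    window = 2 * contex_size + 1
    return [
        [value for frame in padded[t:t + window] for value in frame]
        for t in range(len(frames))
    ]

frames = [[1.0], [2.0], [3.0]]  # M = 3, F = 1
contextualized = add_context(frames, contex_size=1)
```

The number of frames is preserved while each frame grows to include its surrounding context.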

class speeq.data.processes.StochasticProcess(ratio: float)[source]

Bases: IProcess

An interface that applies the process functionality based on the ratio provided.

Args:: ratio (float): The rate of applying the process on the input.

abstract func()[source]

run(x)[source]
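
One plausible reading of this interface is that run calls func with probability ratio and otherwise returns the input unchanged. The sketch below follows that reading; it is an assumption about the behaviour, and the class names StochasticSketch and Negate are hypothetical.

```python
import random
from abc import ABC, abstractmethod

class StochasticSketch(ABC):
    def __init__(self, ratio, rng=None):
        self.ratio = ratio
        self.rng = rng or random.Random()

    @abstractmethod
    def func(self, x):
        """The actual transformation, supplied by subclasses."""

    def run(self, x):
        # Apply func with probability ratio, otherwise pass x through.
        if self.rng.random() < self.ratio:
            return self.func(x)
        return x

class Negate(StochasticSketch):
    def func(self, x):
        return [-value for value in x]

always = Negate(ratio=1.0).run([1, 2])
never = Negate(ratio=0.0).run([1, 2])
```

Under this reading, ratio=1.0 (the default of the augmenters) applies the augmentation to every input.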

Processors

This module includes implementations of classes that implement the IProcessor abstract class and define the execute method as their interface. The following classes are available:

OrderedProcessor: applies a series of processes in a specific order.
StochasticProcessor: applies a sequence of processes in a randomized order.
SpeechProcessor: a higher-level class that wraps a sequence of processors used for speech processing.

Usage:

The classes in this module are designed to be used together to process speech signals. The SpeechProcessor class provides a high-level interface to the processing pipeline, while the OrderedProcessor and StochasticProcessor classes can be used to construct custom processing pipelines. The classes can be used as follows:

OrderedProcessor:

This class applies a sequence of processes in order.

Example:

from typing import Any

from speeq.interfaces import IProcess
from speeq.data.processors import OrderedProcessor

class MyProcess1(IProcess):
    def run(self, x: Any) -> Any:
        # Process x here
        return x

class MyProcess2(IProcess):
    def run(self, x: Any) -> Any:
        # Process x here
        return x

processes = [MyProcess1(), MyProcess2()]
processor = OrderedProcessor(processes)
input_data = [0.1, 0.2, 0.3]  # placeholder input; both processes return it unchanged
output = processor.execute(input_data)

StochasticProcessor:

This class applies a sequence of processes in a randomized order.

from typing import Any

from speeq.interfaces import IProcess
from speeq.data.processors import StochasticProcessor

class MyProcess1(IProcess):
    def run(self, x: Any) -> Any:
        # Process x here
        return x

class MyProcess2(IProcess):
    def run(self, x: Any) -> Any:
        # Process x here
        return x

processes = [MyProcess1(), MyProcess2()]
processor = StochasticProcessor(processes)
input_data = [0.1, 0.2, 0.3]  # placeholder input; both processes return it unchanged
output = processor.execute(input_data)

Note

All of the classes in this module inherit from the IProcessor abstract class and implement the execute method. This allows them to be used interchangeably in processing pipelines.

class speeq.data.processors.OrderedProcessor(processes: List[IProcess])[source]

Bases: IProcessor

Applies a list of provided processes in a specific order. The order of the processes is determined by their position in the list.

Args:: processes (List[IProcess]): A list of IProcess objects representing the processes to be applied in order.

Example:

# Import required packages
from speeq.data.processors import OrderedProcessor
from speeq.data.processes import AudioLoader, FeatExtractor

input_data = 'path/to/file.wav'

# Define a list of processes
processes = [
    AudioLoader(sample_rate=16000),
    FeatExtractor(feat_ext_name='mfcc', feat_ext_args={'n_mfcc': 10})
]

# Create an instance of the OrderedProcessor class
processor = OrderedProcessor(processes=processes)

# Apply the list of processes in order to some input data
processed_data = processor.execute(input_data)

execute(x: Any) → Any[source]

Executes all processes on the input x in the order they were provided. The output of the previous process is used as the input for the next process.

Args:: x (Any): The input
Returns:: Any: The output data after applying all the processes in order.

class speeq.data.processors.SpeechProcessor(audio_processor: OrderedProcessor, audio_augmenter: Optional[Union[OrderedProcessor, StochasticProcessor]] = None, spec_processor: Optional[OrderedProcessor] = None, spec_augmenter: Optional[Union[OrderedProcessor, StochasticProcessor]] = None)[source]

Bases: IProcessor

Speech processor that applies a series of processing steps to audio data. The processing steps can be described as: spec_augmenter(spec_processor(audio_augmenter(audio_processor(file_path))))

Note

If feature extraction is needed, it has to be part of the spectrogram processor.

Args:

audio_processor (OrderedProcessor): The audio processor that takes the audio file path as input.

audio_augmenter (Optional[Union[OrderedProcessor, StochasticProcessor]]): The time-domain augmentation processor. Defaults to None.

spec_processor (Optional[OrderedProcessor]): The spectrogram processor that takes the signal in the time domain as input. If any feature extraction is needed, it has to be part of this processor. Defaults to None.

spec_augmenter (Optional[Union[OrderedProcessor, StochasticProcessor]]): The frequency-domain augmentation processor. Defaults to None.

execute(file_path: Union[str, Path])[source]
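
The composition spec_augmenter(spec_processor(audio_augmenter(audio_processor(file_path)))) can be sketched with plain callables standing in for the processors. Skipping stages that are None is an assumption based on their Optional type, and run_pipeline is a hypothetical name.

```python
def run_pipeline(file_path, audio_processor, audio_augmenter=None,
                 spec_processor=None, spec_augmenter=None):
    """Apply the stages in order, skipping any stage that is None."""
    x = audio_processor(file_path)
    for stage in (audio_augmenter, spec_processor, spec_augmenter):
        if stage is not None:
            x = stage(x)
    return x

# Stand-in stages that just record the order in which they ran.
trace = run_pipeline(
    "audio.wav",
    audio_processor=lambda path: [path, "loaded"],
    spec_processor=lambda x: x + ["features"],
    spec_augmenter=lambda x: x + ["masked"],
)
```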

class speeq.data.processors.StochasticProcessor(processes: List[IProcess])[source]

Bases: OrderedProcessor

Applies the provided processes in a stochastic order. The order in which the processes are applied is randomized for each input, making this class suitable for data augmentation.

Args:: processes (List[IProcess]): A list of processes to be applied in a stochastic order.

execute(x: Any) → Any[source]

Executes all the processes on the input x in a randomly shuffled order.

Args:: x (Any): The input to be processed.
Returns:: Any: The output of the processed input.
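
The difference between ordered and shuffled execution can be shown with two toy processes whose order matters. This is a sketch, not the speeq code: Add, Mul, execute_in_order, and execute_shuffled are hypothetical names, and shuffling a copy of the list is an assumption.

```python
import random

class Add:
    def __init__(self, k):
        self.k = k
    def run(self, x):
        return x + self.k

class Mul:
    def __init__(self, k):
        self.k = k
    def run(self, x):
        return x * self.k

def execute_in_order(processes, x):
    for process in processes:  # each output feeds the next process
        x = process.run(x)
    return x

def execute_shuffled(processes, x, rng=random):
    order = list(processes)  # copy so the caller's list keeps its order
    rng.shuffle(order)
    return execute_in_order(order, x)

processes = [Add(1), Mul(2)]
ordered = execute_in_order(processes, 3)  # (3 + 1) * 2
shuffled = execute_shuffled(processes, 3, rng=random.Random(0))
```

Because the order changes the result, shuffling only makes sense for processes that are valid in any order, such as independent augmentations.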

Registry

This module serves as a factory for creating objects/instances from other classes by abstracting the creation process. It contains several functions that return instances of classes such as tokenizer, ASR datasets, and data loaders, making it easier to manage dependencies and abstract away the details of object creation.

Functions:

get_tokenizer: Returns an instance of a tokenizer object.
load_tokenizer: Loads and returns a pre-trained tokenizer instance.
get_asr_datasets: Returns instances of training and testing datasets for ASR tasks.
get_text_padder: Returns an instance of a text padder object.
get_speech_padder: Returns an instance of a speech padder object.
get_asr_loaders: Returns instances of training and testing data loaders for ASR tasks.

speeq.data.registry.get_asr_datasets(data_config: object, tokenizer: ITokenizer) → Tuple[IDataset, IDataset][source]

Creates train and test dataset objects based on the provided data configuration and tokenizer.

Args:

data_config (object): Data configuration object.

tokenizer (ITokenizer): The tokenizer used to tokenize the text data.

Returns:

Tuple[IDataset, IDataset]: A tuple containing the train and test dataset objects.

speeq.data.registry.get_asr_loaders(data_config: object, tokenizer: ITokenizer, batch_size: int, world_size: int, rank: int) → Tuple[IDataLoader, IDataLoader][source]

Builds training and testing dataloaders.

Args:

data_config (object): Data configuration object.

tokenizer (ITokenizer): the text tokenizer.

batch_size (int): The batch size.

world_size (int): The number of nodes/gpus.

rank (int): The index of the current process/GPU that will use the data loaders.

Returns:

Tuple[IDataLoader, IDataLoader]: The training and testing data loaders.

speeq.data.registry.get_speech_padder(data_config) → IPadder[source]

Creates a speech padding object.

Args:

data_config (object): The data configuration object.

pad_val (Union[float, int]): The value that will be used for padding.

Returns:

IPadder: A padder object that can be used to pad sequences of variable length to a fixed length, by adding padding values at the beginning or end of the sequence. The padding is applied along the first dimension of the input tensor.

speeq.data.registry.get_text_padder(data_config: object, pad_val: Union[float, int]) → IPadder[source]

Creates a text padding object.

Args:

data_config (object): The data configuration object.

pad_val (Union[float, int]): The value that will be used for padding.

Returns:

IPadder: A padder object that can be used to pad sequences of variable length to a fixed length, by adding padding values at the beginning or end of the sequence. The padding is applied along the zeroth dimension of the input tensor.

speeq.data.registry.get_tokenizer(data_config: object, data: Optional[List[str]] = None) → ITokenizer[source]

Creates a tokenizer based on the provided data configuration, or loads a pre-trained tokenizer from a file. If a pre-trained tokenizer path is not provided, the function trains the tokenizer on the provided data.

Args:

data_config (object): An object that contains the configuration for the data.

data (List[str], optional): A list of strings to train the tokenizer on. Defaults to None.

Returns:

ITokenizer: A tokenizer object.

speeq.data.registry.load_tokenizer(tokenizer_path: Union[Path, str]) → ITokenizer[source]

Loads a pre-trained tokenizer from the specified path.

Args:: tokenizer_path (Union[Path, str]): A path to the pre-trained tokenizer file.
Returns:: ITokenizer: An object representing the loaded tokenizer.

Tokenizers

class speeq.data.tokenizers.BaseTokenizer[source]

Bases: ITokenizer

add_blank_token(token='<BLANK>') → ITokenizer[source]: Adds BLANK token

add_eos_token(token='<EOS>') → ITokenizer[source]: Adds EOS token

add_oov_token(token='<OOV>') → ITokenizer[source]: Adds OOV token

add_pad_token(token='<PAD>') → ITokenizer[source]: Adds PAD token

add_sos_token(token='<SOS>') → ITokenizer[source]: Adds SOS token

add_token(token: str) → int[source]

Adds the provided token to the tokenizer.

Args:: token (str): The token to be added.
Returns:: int: The id of the token.

batch_detokenizer(data: List[int]) → list[source]

batch_tokenizer(data: List[str], add_sos=False, add_eos=False) → list[source]

ids2tokens(ids: List[int]) → List[str][source]

Converts a list of integers to a list of strings

Args:: ids (List[int]): The list of tokens ids.
Returns:: List[str]: A list of token strings.

load_tokenizer(tokenizer_path: Union[str, Path], *args, **kwargs) → ITokenizer[source]

Loads a pre-trained tokenizer.

Args:: tokenizer_path (Union[str, Path]): The pre-trained tokenizer path.
Returns:: ITokenizer: The loaded tokenizer.

load_tokenizer_from_dict(data: dict) → ITokenizer[source]

Loads a pre-trained tokenizer from a dictionary.

Args:: data (dict): The pre-trained tokenizer dictionary.
Returns:: ITokenizer: The loaded tokenizer.

save_tokenizer(save_path: Union[str, Path], *args, **kwargs) → None[source]

Saves the tokenizer to a json file

Args:: save_path (Union[str, Path]): The path to save the tokenizer to.

set_tokenizer(data: List[str], *args, **kwargs) → ITokenizer[source]

Sets/trains the tokenizer on the provided data.

Args:: data (List[str]): A list of all text sentences.
Returns:: ITokenizer: The trained tokenizer.

tokenize(sentence: str, add_sos=False, add_eos=False) → List[int][source]

Tokenizes the input sentence.

Args:

sentence (str): The sentence to be tokenized.

add_sos (bool, optional): A flag indicating whether to add the SOS token at the start of the sequence. Defaults to False.

add_eos (bool, optional): A flag indicating whether to add the EOS token at the end of the sequence. Defaults to False.

Returns:

List[int]: The tokenized sequence.
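
The effect of add_sos and add_eos can be sketched with a tiny character-level tokenizer. It only mirrors the method names documented above; the class TinyCharTokenizer is hypothetical, and assigning ids in insertion order is an assumption rather than the documented behaviour.

```python
class TinyCharTokenizer:
    def __init__(self):
        self.token2id = {}

    def add_token(self, token):
        # Assumption: ids are handed out in insertion order.
        if token not in self.token2id:
            self.token2id[token] = len(self.token2id)
        return self.token2id[token]

    def set_tokenizer(self, data):
        for sentence in data:
            for char in sentence:
                self.add_token(char)
        return self

    def tokenize(self, sentence, add_sos=False, add_eos=False):
        ids = [self.token2id[char] for char in sentence]
        if add_sos:
            ids.insert(0, self.token2id["<SOS>"])
        if add_eos:
            ids.append(self.token2id["<EOS>"])
        return ids

tok = TinyCharTokenizer()
tok.add_token("<SOS>")
tok.add_token("<EOS>")
tok.set_tokenizer(["ab", "ba"])
ids = tok.tokenize("ab", add_sos=True, add_eos=True)
```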

property vocab_size: int

class speeq.data.tokenizers.CharTokenizer[source]

Bases: BaseTokenizer

Implements a character-based tokenizer.

get_tokens(data: List[str])[source]

preprocess_tokens(sentence: str) → List[str][source]

class speeq.data.tokenizers.WordTokenizer(sep=' ')[source]

Bases: BaseTokenizer

Implements a whitespace-based tokenizer.

get_tokens(data: List[str])[source]

preprocess_tokens(sentence: str) → List[str][source]