parlai.utils

ParlAI provides a number of utilities, roughly organized by function.

parlai.utils.bpe

Byte pair encoding (BPE).

Helpers for byte pair encoding (BPE) tokenization in ParlAI.

parlai.utils.bpe.bpe_factory(opt: Opt, shared: TShared) BPEHelper[source]

BPE Helper Factory.

Returns the appropriate BPE helper given the opt as well as available libraries.

Parameters
  • opt – options

  • shared – shared dict

Return BPEHelper

returns the appropriate BPEHelper object
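
A minimal usage sketch, assuming opt is a ParlAI Opt that already carries the usual dictionary settings (the dict_tokenizer value shown is illustrative):

>>> from parlai.utils.bpe import bpe_factory
>>> # assumes `opt` is a ParlAI Opt with dictionary settings,
>>> # e.g. opt['dict_tokenizer'] == 'gpt2' (illustrative)
>>> helper = bpe_factory(opt, shared={})
>>> tokens = helper.encode("Hello world!")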

class parlai.utils.bpe.BPEHelper(opt: Opt, shared: Optional[TShared] = None)[source]

Bases: ABC

Abstract BPE Helper.

BPE Helper subclasses must implement appropriate abstractmethods.

__init__(opt: Opt, shared: Optional[TShared] = None)[source]

Subclasses _should_ override __init__ to initialize other things.

enable_bpe_dropout(enabled: bool)[source]

Used to toggle BPE dropout on (True) or off (False).

encode(text: str) List[str][source]

Tokenize text.

Checks for add_prefix_space and handles it accordingly.

NOTE: DO NOT OVERRIDE

Parameters

text – text to tokenize

Return tokens

A list of tokens

abstract helper_encode(text: str) List[str][source]

Tokenize text.

Subclasses should override this method for encoding.

Parameters

text – text to tokenize

Return tokens

A list of tokens

decode(tokens: List[str], token_ids: List[int], delimiter: str = ' ') str[source]

Decode list of tokens into a text string.

NOTE: DO NOT OVERRIDE

Parameters
  • tokens – list of tokens

  • token_ids – list of token ids

  • delimiter – string delimiter for tokens

Return text

decoded text

abstract helper_decode(tokens: List[str], token_ids: List[int], delimiter: str) str[source]

Decode list of tokens into text string.

Subclasses should override this method for decoding.

Parameters
  • tokens – list of tokens

  • token_ids – list of token ids

  • delimiter – string delimiter for tokens

Return text

decoded text

abstract sync_with_dict(dict_agent)[source]

Sync BPE Helper dictionary with dict_agent dict.

Parameters

dict_agent – agent with which we are syncing the dictionary

add_special_tokens(dict_agent, special_tokens: List[str])[source]

Add special tokens to the tokenizer.

These tokens are never split, and prioritized over the BPE tokenization.

finalize(frequencies: Dict[str, int], num_symbols: int, minfreq: int) bool[source]

Build the codecs.

Default helpers are pre-trained and thus do not build their own codecs

Parameters
  • frequencies – dictionary of (token: frequency) pairs

  • num_symbols – Number of BPE symbols. Recommend 30000-40000. If <= 0, default 30000 will be used.

  • minfreq – Minimum frequency of a token before forced BPE decomposition. If <= 0 will use subword-nmt default of 2.

Return did_finalize

whether the codecs were finalized on this call

copy_codecs_file(target_file: str)[source]

Copy the codecs file to a new location.

Default behavior is to do nothing.

Parameters

target_file – where to copy the codecs.

should_sort() bool[source]

Return whether tokens should be sorted for this particular helper.

DictionaryAgent sorts tokens upon saving; we don’t generally want to sort with our pre-trained dictionaries, so default is False.

class parlai.utils.bpe.SubwordBPEHelper(opt: Opt, shared: Optional[TShared] = None)[source]

Bases: BPEHelper

Helper class for performing BPE subword tokenization.

For technical details, please refer to https://arxiv.org/abs/1508.07909. This class just wraps around the official subword-nmt repository.

This API expects the user to call tokenize() (encode) onto the training data, then call finalize() to learn the encodings, and then iterate over the data in a second pass, calling tokenize() again to get processed output.
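
A hedged sketch of that two-pass workflow, assuming helper is a constructed SubwordBPEHelper and corpus is an iterable of raw training strings (both hypothetical):

>>> from collections import Counter
>>> freqs = Counter()
>>> for line in corpus:
...     freqs.update(helper.encode(line))          # first pass: plain split tokens
>>> helper.finalize(freqs, num_symbols=30000, minfreq=2)   # learn the codecs
>>> bpe_corpus = [helper.encode(line) for line in corpus]  # second pass: BPE tokens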

__init__(opt: Opt, shared: Optional[TShared] = None)[source]

Initialize the BPE module.

Parameters
  • opt – options

  • shared – shared dictionary

add_special_tokens(dict_agent, special_tokens: List[str])[source]

Add special tokens to the tokenizer.

These tokens are never split, and prioritized over the BPE tokenization.

helper_encode(text: str) List[str][source]

Tokenize the text with bpe if codecs are already finalized.

Otherwise, returns the regularly split tokens that will train the bpe.

Parameters

text – Raw text to tokenize.

Returns

a list of tokens. Will use BPE once finalized.

helper_decode(tokens: List[str], token_ids: List[int], delimiter: str) str[source]

Decode list of tokens into text string.

Parameters
  • tokens – list of tokens

  • token_ids – list of token ids

  • delimiter – string delimiter for tokens

Return text

decoded text

finalize(frequencies: Dict[str, int], num_symbols: int = 30000, minfreq: int = 2) bool[source]

Build the codecs.

Parameters
  • frequencies – dictionary of (token: frequency) pairs

  • num_symbols – Number of BPE symbols. Recommend 30000-40000. If <= 0, default 30000 will be used.

  • minfreq – Minimum frequency of a token before forced BPE decomposition. If <= 0 will use subword-nmt default of 2.

Return did_finalize

whether the codecs were finalized on this call

copy_codecs_file(target_file: str)[source]

Copy the codecs file to a new location.

Parameters

target_file – where to copy the codecs.

sync_with_dict(dict_agent)[source]

No need to sync subword BPE.

should_sort() bool[source]

Return whether tokens should be sorted for this particular helper.

We want to sort with SubwordBPEHelper.

class parlai.utils.bpe.Gpt2BpeHelper(opt: Opt, shared: Optional[TShared] = None)[source]

Bases: BPEHelper

BPE Helper for GPT2 Models.

Original source:

https://github.com/openai/gpt-2/blob/main/src/encoder.py

Original license: MIT

This is a modified implementation from that of fairseq:

https://github.com/pytorch/fairseq/blob/main/fairseq/data/encoders/gpt2_bpe_utils.py

Fairseq license: MIT

__init__(opt: Opt, shared: Optional[TShared] = None)[source]

Override init to build the data.

bytes_to_unicode() Dict[int, str]

Return a mapping between utf-8 bytes and corresponding unicode strings.

The reversible BPE codes work on unicode strings. This means you need a large number of unicode characters in your vocab if you want to avoid UNKs. When you're at something like a 10B token dataset you end up needing around 5K for decent coverage. This is a significant percentage of your normal, say, 32K BPE vocab. To avoid that, we want lookup tables between utf-8 bytes and unicode strings, and we avoid mapping to whitespace/control characters that the BPE code barfs on.
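
For intuition, a small sketch of the lookup table, assuming helper is a constructed Gpt2BpeHelper (the 'Ġ' mapping for the space byte follows GPT-2's published byte-to-unicode scheme):

>>> table = helper.bytes_to_unicode()
>>> len(table)          # one printable unicode character per possible byte value
256
>>> table[32]           # the space byte maps to a visible character
'Ġ'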

get_pairs(word: Tuple[str, ...]) Set[Tuple[str, str]][source]

Return set of symbol pairs in a word.

Word is represented as tuple of symbols (symbols being variable-length strings).

Parameters

word – word to symbolize

Return pairs

set of tuples of symbols
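
For example, assuming helper is a constructed Gpt2BpeHelper (set ordering may differ):

>>> helper.get_pairs(('l', 'o', 'w', 'e', 'r'))
{('l', 'o'), ('o', 'w'), ('w', 'e'), ('e', 'r')}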

bpe(token: str) str[source]

Convert token to BPE.

Parameters

token – token to convert

Return bpe_encoding

string bpe encoding

helper_encode(text: str) List[str][source]

Tokenize text.

Parameters

text – text to tokenize

Return tokens

A list of tokens

helper_decode(tokens: List[str], token_ids: List[int], delimiter: str) str[source]

Decode list of tokens into text string.

Parameters
  • tokens – list of tokens

  • token_ids – list of token ids

  • delimiter – string delimiter for tokens

Return text

decoded text

sync_with_dict(dict_agent)[source]

Sync with dictionary agent.

Just add all of the tokens to the dict

NOTE: How does this handle special tokens?

Parameters

dict_agent – A DictionaryAgent instantiation

save(dir_name: str, file_name: str)[source]

Save appropriate files.

Parameters
  • dir_name – directory to save.

  • file_name – file to save.

class parlai.utils.bpe.HuggingFaceBpeHelper(opt: Opt, shared: Optional[TShared] = None)[source]

Bases: BPEHelper

HuggingFace’s ByteLevelBPE Tokenizer.

Fast because Rust.

__init__(opt: Opt, shared: Optional[TShared] = None)[source]

Subclasses _should_ override __init__ to initialize other things.

helper_encode(text: str) List[str][source]

Tokenize text.

Parameters

text – text to tokenize

Return tokens

A list of tokens

helper_decode(tokens: List[str], token_ids: List[int], delimiter: str) str[source]

Decode list of tokens into text string.

Parameters
  • tokens – list of tokens

  • token_ids – list of token ids

  • delimiter – string delimiter for tokens

Return text

decoded text

add_special_tokens(dict_agent, special_tokens: List[str])[source]

Add special tokens to the tokenizer and dict_agent.

sync_with_dict(dict_agent)[source]

Sync the dictionary agent with Hugging Face tokenizer’s BPE dict.

Called only once on initialization.

save(dir_name: str, file_name: str)[source]

Save appropriate files.

Parameters
  • dir_name – directory to save.

  • file_name – file to save.

class parlai.utils.bpe.SlowBytelevelBPE(opt: Opt, shared: Optional[TShared] = None)[source]

Bases: Gpt2BpeHelper

Stand-in for HuggingFace if we do not have access to tokenizers.

Only EVER used for a model used in interactive mode that was previously trained with HF BPE.

sync_with_dict(dict_agent)[source]

Essentially a combination of the HuggingFace dictionary sync and the GPT2 standard sync, with the direction reversed.

Parameters

dict_agent – Dictionary Agent

parlai.utils.conversations

Utility methods for conversations format.

class parlai.utils.conversations.Metadata(datapath)[source]

Bases: object

Utility class for conversation metadata.

Metadata should be saved at <datapath>.metadata.

__init__(datapath)[source]
read()[source]

Read the relevant metadata.

classmethod save_metadata(datapath, opt, self_chat=False, speakers=None, **kwargs)[source]

Dump conversation metadata to file.

class parlai.utils.conversations.Turn(id=None, text=None, **kwargs)[source]

Bases: AttrDict

Utility class for a dialog turn.

__init__(id=None, text=None, **kwargs)[source]

Initialize AttrDict using input dict.

class parlai.utils.conversations.Conversation(episode)[source]

Bases: object

Utility class for iterating through a single episode.

Used in the context of the Conversations class.

__init__(episode)[source]
class parlai.utils.conversations.Conversations(datapath)[source]

Bases: object

Utility class for reading and writing from ParlAI Conversations format.

Conversations should be saved in JSONL format, where each line is a JSON object representing a single conversation.

WARNING: each conversation must be on ONE LINE of the file or it will not load!

__init__(datapath)[source]
classmethod save_conversations(act_list, datapath, opt, save_keys='all', context_ids='context', self_chat=False, **kwargs)[source]

Write Conversations to file from an act list.

Conversations assume the act list is of the following form: a list of episodes, each of which is comprised of a list of act pairs (i.e., a list of dictionaries returned from one parley).

parlai.utils.data

Utilities related to handling data.

class parlai.utils.data.DatatypeHelper[source]

Bases: object

Helper class to determine properties from datatype strings.

classmethod fold(datatype: str) str[source]

Extract the fold part of the datatype.

Parameters

datatype – parlai datatype

Returns

the fold

>>> DatatypeHelper.fold("train:ordered")
"train"
classmethod strip_stream(datatype: str) str[source]

Remove :stream from the datatype.

Used by ChunkTeacher where behavior does not change based on streaming.

Parameters

datatype – parlai datatype

Returns

a non-streaming version of the datatype.

>>> DatatypeHelper.strip_stream("train:stream")
"train"
>>> DatatypeHelper.strip_stream("train")
"train"
classmethod should_cycle(datatype: str) bool[source]

Return whether we should cycle data based on the datatype.

Parameters

datatype – parlai datatype

Return should_cycle

given datatype, return whether we should cycle

classmethod should_shuffle(datatype: str) bool[source]

Return whether we should shuffle data based on the datatype.

Parameters

datatype – parlai datatype

Return should_shuffle

given datatype, return whether we should shuffle

classmethod is_training(datatype: str) bool[source]

Return whether we should return eval_labels or labels.

Parameters

datatype – parlai datatype

Return is_training

bool indicating whether we should return eval_labels or labels
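
For example:

>>> DatatypeHelper.is_training("train")
True
>>> DatatypeHelper.is_training("valid")
False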

classmethod is_streaming(datatype: str) bool[source]

Return whether this is streaming.

Parameters

datatype – parlai datatype

Returns

bool indicating whether we are streaming

classmethod split_data_by_fold(fold: str, data: List, train_frac: float, valid_frac: float, test_frac: float, seed: int = 42)[source]

Splits a list of data into train/valid/test folds. The members of these folds are randomized (in a consistent manner) by a seed. This is a convenience function for datasets that do not have a canonical split.

Parameters
  • fold – parlai fold/datatype

  • data – List of data examples to be split

  • train_frac – Fraction of data to be used for the “train” fold. train_frac, valid_frac, and test_frac should sum to 1.

  • valid_frac – Fraction of data to be used for the “valid” fold. train_frac, valid_frac, and test_frac should sum to 1.

  • test_frac – Fraction of data to be used for the “test” fold. train_frac, valid_frac, and test_frac should sum to 1.

  • seed – Seed for shuffling
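
A hedged usage sketch; the exact membership of each fold depends on the seed, but the same seed always yields the same split:

>>> data = list(range(100))
>>> train = DatatypeHelper.split_data_by_fold("train", data, 0.8, 0.1, 0.1)
>>> valid = DatatypeHelper.split_data_by_fold("valid", data, 0.8, 0.1, 0.1)
>>> # roughly 80 examples land in train and 10 in valid, shuffled consistently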

classmethod split_subset_data_by_fold(fold: str, subsets: List[List], train_frac: float, valid_frac: float, test_frac: float, seed: int = 42)[source]

Splits a list of subsets of data, where we want equal samples from each subset, into train/valid/test folds, ensuring that samples from a given subset are not moved to another fold as more subsets are added.

For example, say a dataset has domains A and B, and we have an experiment where we train and validate a model on domain A, then on domains A + B. If we naively concatenate the subsets of data from A + B, randomize the result, and split it into train, valid, and test folds, there is no guarantee that valid or test examples from the A-only split will not end up in the train fold of the A + B split.

The members of these folds are randomized (but in a fixed manner) by a seed.

Parameters
  • fold – parlai fold/datatype

  • subsets – List of subsets of data examples to be split

  • train_frac – Fraction of data to be used for the “train” fold. train_frac, valid_frac, and test_frac should sum to 1.

  • valid_frac – Fraction of data to be used for the “valid” fold. train_frac, valid_frac, and test_frac should sum to 1.

  • test_frac – Fraction of data to be used for the “test” fold. train_frac, valid_frac, and test_frac should sum to 1.

  • seed – Seed for shuffling

parlai.utils.distributed

Useful utilities for training in distributed mode.

Many of these functions act as wrappers which perform no-ops if code is running in non-distributed mode.

parlai.utils.distributed.is_distributed()[source]

Return whether we are in distributed mode.

parlai.utils.distributed.num_workers()[source]

Get the total number of workers.

parlai.utils.distributed.is_primary_worker()[source]

Determine if we are the primary (rank 0) worker.

Returns False if we are a secondary worker. Returns True if we are either (1) not in distributed mode, or (2) the primary (rank 0) worker.

parlai.utils.distributed.get_rank()[source]

Returns the rank of the current worker.

Returns 0 if not in distributed mode.

parlai.utils.distributed.override_print(suppress=False, prefix=None)[source]

Context manager to override the print to suppress or modify output.

Recommended usage is to call this with suppress=True for all non-primary workers, or call with a prefix of rank on all workers.

Parameters
  • suppress (bool) – if True, all future print statements are no-ops.

  • prefix (str) – if not None, this string is prefixed to all future print statements.

>>> with override_print(prefix="rank{}".format(rank)):
...     my_computation()
parlai.utils.distributed.all_gather_list(data)[source]

Gather arbitrary data from all nodes into a list.

Similar to torch.distributed.all_gather but for arbitrary Python data. Note that data must be picklable.

Parameters

data – data from the local worker to be gathered on other workers

Returns

a list containing [data1, data2, …] of all workers
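
An illustrative sketch from inside a distributed run (num_examples_here is a hypothetical per-worker count):

>>> stats = {'examples': num_examples_here}
>>> all_stats = all_gather_list(stats)
>>> total = sum(s['examples'] for s in all_stats)   # combined count across workers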

parlai.utils.distributed.sync_object(data)[source]

Sync an object among all workers.

All workers will return the same value for data when returning from this method, always using the primary worker’s version. Useful for ensuring control flow decisions are made the same.

Parameters

data (object) – The object to synchronize. Must be pickleable.

Returns

the synchronized data

parlai.utils.distributed.sync_parameters(model: Module) bool[source]

Sync all parameters across all workers and check that they are the same.

Always returns True, or raises an AssertionError if there was a failure.

Parameters

model – A pytorch model.

Returns

always True

parlai.utils.distributed.distributed_context(rank, opt, rank_offset=0, gpu=None, init_method='tcp://localhost:61337')[source]

A context which wraps initialization of a distributed/multiprocessing run.

Every process in the distributed run should launch with this. In a true distributed setting, you may wish to use slurm_distributed_context instead.

Parameters
  • rank (int) – This process’s rank, less rank_offset.

  • rank_offset (int) – Used as an offset of rank. Used between multiprocessing vs true distributed, and a hack around torch.multiprocessing.spawn being only used for the non-primary workers.

  • opt – command line options

  • gpu (int) – Which GPU to use. Defaults to using rank and local devices, but must be manually specified when using many-hosts.

  • init_method (str) – Init method, such as tcp://localhost:61337. See torch.distributed docs.

parlai.utils.distributed.get_dist_group()[source]

Find the default pytorch distributed group.

Used within FSDP to mark which workers are participating. Important to manually call this because FSDP will cache old groups, but our test suite will instantiate new groups per test.

parlai.utils.distributed.slurm_distributed_context(opt)[source]

Initialize a distributed context, using the SLURM environment.

Does some work to read the environment to find a list of participating nodes and the main node.

Parameters

opt – Command line options.

parlai.utils.distributed.find_free_port() int[source]

Find a free port we can bind to locally.

Credit: https://stackoverflow.com/questions/1365265/on-localhost-how-do-i-pick-a-free-port-number
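
A hedged sketch of how this can be paired with distributed_context when launching a local multiprocessing run:

>>> port = find_free_port()
>>> init_method = "tcp://localhost:{}".format(port)
>>> # pass `init_method` to distributed_context for each spawned worker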

parlai.utils.fp16

Utility methods for mixed precision training.

class parlai.utils.fp16.FP16SafeCrossEntropy(weight: Optional[Tensor] = None, ignore_index: int = -100, reduction: str = 'none')[source]

Bases: Module

FP16-safe cross entropy loss.

This avoids overflow in the softmax by doing the operation in FP32.

__init__(weight: Optional[Tensor] = None, ignore_index: int = -100, reduction: str = 'none')[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(scores, targets)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

parlai.utils.fp16.clip_grad_norm(params, max_norm: float = 0, sync: bool = False)[source]

Clips grad norms.

During combination with FSDP, will also ensure that grad norms are aggregated across all workers, since each worker only stores their shard of the gradients.

Parameters
  • params – Parameters whose gradients we wish to clip

  • max_norm – Maximum norm we wish the gradients to have. If non-positive, then we will not perform clipping.

  • sync – Boolean indicating whether we should aggregate across the distributed group. Used only in combination with FSDP.

Returns

The gradient norm across all parameters, before clipping.
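
A minimal sketch of a training step with clipping; model, loss, and optimizer are assumed to exist:

>>> loss.backward()
>>> grad_norm = clip_grad_norm(model.parameters(), max_norm=1.0)
>>> optimizer.step()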

parlai.utils.fp16.has_overflow(grad_norm)[source]

Detect inf and NaN in grad_norm.

class parlai.utils.fp16.SafeFP16Optimizer(optimizer, aggregate_gnorms=False)[source]

Bases: Optimizer

__init__(optimizer, aggregate_gnorms=False)[source]
state_dict()[source]

Return the optimizer’s state dict.

load_state_dict(state_dict)[source]

Load an optimizer state dict.

In general we should prefer the configuration of the existing optimizer instance (e.g., learning rate) over that found in the state_dict. This allows us to resume training from a checkpoint using a new set of optimizer args.

backward(loss, update_main_grads=False, retain_graph=False)[source]

Computes the sum of gradients of the given tensor w.r.t. graph leaves.

Compared to fairseq.optim.FairseqOptimizer.backward(), this function additionally dynamically scales the loss to avoid gradient underflow.

multiply_grads(c)[source]

Multiplies grads by a constant c.

clip_main_grads(max_norm)[source]

Clips gradient norm and updates dynamic loss scaler.

step(closure=None)[source]

Performs a single optimization step.

zero_grad()[source]

Clears the gradients of all optimized parameters.

property loss_scale

Convenience function which TorchAgent calls to get current scale value.
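
A hedged end-to-end sketch of a single FP16 update using the methods above; model, loss, and a base torch optimizer are assumed:

>>> fp16_optim = SafeFP16Optimizer(base_optimizer)
>>> fp16_optim.backward(loss)          # dynamically scales the loss
>>> fp16_optim.clip_main_grads(1.0)    # clips gradients and updates the loss scaler
>>> fp16_optim.step()
>>> fp16_optim.zero_grad()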

class parlai.utils.fp16.DynamicLossScaler(init_scale: float = 32768.0, scale_factor: float = 2.0, scale_window: int = 2000, tolerance: float = 0.0, threshold: Optional[float] = None)[source]

Bases: object

Dynamically adjusts the loss scaling factor.

Dynamic loss scalers are important in mixed-precision training. They help us avoid underflows and overflows in low-precision gradients.

See here for information: <https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html#lossscaling>

Shamelessly stolen and adapted from Fairseq. <https://github.com/pytorch/fairseq/blob/main/fairseq/optim/fp16_optimizer.py>

__init__(init_scale: float = 32768.0, scale_factor: float = 2.0, scale_window: int = 2000, tolerance: float = 0.0, threshold: Optional[float] = None)[source]
Parameters
  • init_scale – Initial loss scale.

  • scale_factor – Factor by which to increase or decrease loss scale.

  • scale_window – If we do not experience overflow in scale_window iterations, loss scale will increase by scale_factor.

  • tolerance – Pct of iterations that have overflowed after which we must decrease the loss scale

  • threshold – If not None, loss scale will decrease below this threshold

update_scale(overflow: bool)[source]

Update the loss scale.

If overflow exceeds our tolerance, we decrease the loss scale. If the number of iterations since the last overflow exceeds the scale window, we increase the loss scale.
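
A small sketch of the adjustment loop; grad_norm is assumed to come from the current backward pass:

>>> scaler = DynamicLossScaler(init_scale=2.0 ** 15)
>>> overflow = has_overflow(grad_norm)   # inf/NaN check from this module
>>> scaler.update_scale(overflow)        # shrink on overflow, grow after a quiet scale_window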

class parlai.utils.fp16.MemoryEfficientFP16Optimizer(init_optimizer: Optimizer, aggregate_gnorms: bool = False, loss_initial_scale: float = 131072.0, min_loss_scale: float = 0.0001)[source]

Bases: Optimizer

Wrap an optimizer to perform memory-efficient mixed precision training.

This class wraps an optimizer to perform FP16 training. This implementation is heavily based on the Fairseq implementation of MemoryEfficientFP16Optimizer, which can be found here: <https://github.com/pytorch/fairseq/blob/main/fairseq/optim/fp16_optimizer.py#L382>

This allows you to train bigger models on a single GPU, but can be unstable. Prefer the SafeFP16 implementation if you do not have concerns about memory.

Parameters
  • params – Model parameters

  • optimizer – Any torch optimizer

  • loss_initial_scale (float) – Initial loss scaling. Default chosen empirically, but models with very low or high loss values may need this adjusted. Stick with powers of 2

  • min_loss_scale (float) – Throws an error if your loss scale goes below this threshold

__init__(init_optimizer: Optimizer, aggregate_gnorms: bool = False, loss_initial_scale: float = 131072.0, min_loss_scale: float = 0.0001)[source]
static compatible_optimizers()[source]

List of compatible optimizers.

property params

Return an iterable of the parameters held by the optimizer.

add_param_group(param_group)[source]

Add a param group to the Optimizer's param_groups.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.

Parameters

param_group (dict) – Specifies what Tensors should be optimized along with group-specific optimization options.

clip_main_grads(gradient_clip)[source]

Clips gradient norm and updates dynamic loss scaler.

Returns -1 if the most recently computed gradients overflowed.

multiply_grads(c)[source]

Multiplies grads by a constant c.

backward(loss, update_main_grads=False)[source]

Computes the sum of gradients of the given tensor w.r.t. graph leaves.

Compared to a regular backward call, this function additionally dynamically scales the loss to avoid gradient underflow.

step(closure=None)[source]

Performs a single optimization step.

state_dict()[source]

Return the optimizer’s state dict.

load_state_dict(state_dict)[source]

Load an optimizer state dict.

Override from PyTorch implementation to avoid casting to FP32.

property loss_scale

Convenience function which TorchAgent calls to get current scale value.

zero_grad()[source]

Clears the gradients of all optimized parameters.

class parlai.utils.fp16.MemoryEfficientFP16Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False, *, foreach: Optional[bool] = None, maximize: bool = False, capturable: bool = False, differentiable: bool = False, fused: Optional[bool] = None)[source]

Bases: Adam

Override of the PyTorch implementation to ensure aggregations are done in FP32.

step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

class parlai.utils.fp16.Adafactor(params, lr=None, eps=(1e-30, 0.001), clip_threshold=1.0, decay_rate=-0.8, beta1=None, weight_decay=0.0, warmup_init=False)[source]

Bases: Optimizer

Implements Adafactor algorithm.

This implementation is based on: Adafactor: Adaptive Learning Rates with Sublinear Memory Cost (see https://arxiv.org/abs/1804.04235)

Taken from the fairseq implementation, which can be found here: <https://github.com/pytorch/fairseq/blob/main/fairseq/optim/adafactor.py>.

Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – external learning rate (default: None)

  • eps (tuple[float, float]) – regularization constants for square gradient and parameter scale respectively (default: (1e-30, 1e-3))

  • clip_threshold (float) – threshold of root mean square of final gradient update (default: 1.0)

  • decay_rate (float) – coefficient used to compute running averages of square gradient (default: -0.8)

  • beta1 (float) – coefficient used for computing running averages of gradient (default: None)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • scale_parameter (bool) – if true, learning rate is scaled by root mean square of parameter (default: True)

  • relative_step (bool) – if true, time-dependent learning rate is computed instead of external learning rate (default: True)

  • warmup_init (bool) – time-dependent learning rate computation depends on whether warm-up initialization is being used (default: False)

__init__(params, lr=None, eps=(1e-30, 0.001), clip_threshold=1.0, decay_rate=-0.8, beta1=None, weight_decay=0.0, warmup_init=False)[source]
step(closure=None)[source]

Performs a single optimization step.

Parameters

closure (callable, optional) – A closure that reevaluates the model and returns the loss.

parlai.utils.logging

class parlai.utils.logging.ParlaiLogger(name, console_level=20)[source]

Bases: Logger

__init__(name, console_level=20)[source]

Initialize the logger object.

Parameters
  • name – Name of the logger

  • console_level – minimum level of messages logged to console

log(msg, level=20)[source]

Default Logging function.

add_format_prefix(prefix)[source]

Include prefix in all future logging statements.

mute()[source]

Stop logging to stdout.

unmute()[source]

Resume logging to stdout.

parlai.utils.misc

File for miscellaneous utility functions and constants.

parlai.utils.misc.maintain_dialog_history(history, observation, reply='', historyLength=1, useReplies='label_else_model', dict=None, useStartEndIndices=True, splitSentences=False)[source]

Keep track of dialog history, up to a truncation length.

Either includes replies from the labels, the model, or neither, as controlled by the useReplies parameter.

DEPRECATED. USE PARLAI.CORE.TORCH_AGENT INSTEAD.

parlai.utils.misc.load_cands(path, lines_have_ids=False, cands_are_replies=False)[source]

Load global fixed set of candidate labels that the teacher provides.

Every example will include these as candidates. The true labels for a specific example are also added to this set, so that it’s possible to get the right answer.

class parlai.utils.misc.Timer[source]

Bases: object

Computes elapsed time.

__init__()[source]

Initialize timer.

reset()[source]

Reset timer to zero.

resume()[source]

Resume timer.

stop()[source]

Pause timer.

time()[source]

Get current timer time.

class parlai.utils.misc.TimeLogger[source]

Bases: object

Class for logging time progress against a goal.

__init__()[source]

Set up timer.

total_time()[source]

Return time elapsed at last log call.

time()[source]

Return current timer time.

log(done, total, report=None)[source]

Log report, time elapsed, and percentage progress towards goal.

Parameters
  • done – number of examples completed so far

  • total – total number of elements to be completed. if total > 0, calculates the time remaining and percentage complete.

  • report – dict of pairs to log

Returns

a tuple (log string, log dict). The log string contains the time elapsed and a string representation of the log dict; the log dict contains pairs of all items to log, including percentage complete and projected time left if total > 0.

class parlai.utils.misc.AttrDict(*args, **kwargs)[source]

Bases: dict

Helper class to have a dict-like object with dot access.

For example, instead of d = {‘key’: ‘value’} use d = AttrDict(key=’value’). To access keys, instead of doing d[‘key’] use d.key.

While this has some limitations on the possible keys (for example, do not set the key items or you will lose access to the items() method), this can make some code more clear.

__init__(*args, **kwargs)[source]

Initialize AttrDict using input dict.

class parlai.utils.misc.SimpleCounter(value=0)[source]

Bases: object

Simple counter object.

__init__(value=0)[source]
parlai.utils.misc.float_formatter(f: Union[float, int]) str[source]

Format a float as a pretty string.

parlai.utils.misc.nice_report(report) str[source]

Render an agent Report as a beautiful string.

If pandas is installed, we will use it to render as a table, with multitask metrics shown per row.

If pandas is not available, we will use a dict with like-metrics placed next to each other.

parlai.utils.misc.round_sigfigs(x: Union[float, Tensor], sigfigs=4) float[source]

Round value to specified significant figures.

Parameters
  • x – input number

  • sigfigs – number of significant figures to return

Returns

float number rounded to specified sigfigs
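
For example:

>>> round_sigfigs(3.14159, sigfigs=3)
3.14
>>> round_sigfigs(0.00123456, sigfigs=2)
0.0012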

parlai.utils.misc.clip_text(text, max_len)[source]

Clip text to max length, adding ellipses.

parlai.utils.misc.display_messages(msgs: List[Dict[str, Any]], prettify: bool = False, ignore_agent_reply: bool = False, add_fields: str = '', max_len: int = 1000, verbose: bool = False) Optional[str][source]

Return a string describing the set of messages provided.

If prettify is true, candidates are displayed using prettytable. add_fields provides a list of fields in the msgs which should be displayed if verbose is off.

parlai.utils.misc.str_to_msg(txt, ignore_fields='')[source]

Convert formatted string to ParlAI message dict.

Parameters
  • txt – formatted string to convert. String format is tab-separated fields, with colon separating field name and contents.

  • ignore_fields – (default ‘’) comma-separated field names to not include in the msg dict even if they’re in the string.

parlai.utils.misc.msg_to_str(msg, ignore_fields='')[source]

Convert ParlAI message dict to string.

Parameters
  • msg – dict to convert into a string.

  • ignore_fields – (default ‘’) comma-separated field names to not include in the string even if they’re in the msg dict.

parlai.utils.misc.set_namedtuple_defaults(namedtuple, default=None)[source]

Set all of the fields for a given namedtuple to a single default value.

Additionally removes the default docstring for each field. Modifies the tuple in place, but returns it anyway.

More info: https://stackoverflow.com/a/18348004

Parameters
  • namedtuple – A constructed collections.namedtuple

  • default – The default value to set.

Returns

the modified namedtuple
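
For example, following the linked recipe:

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point = set_namedtuple_defaults(Point, default=None)
>>> Point()
Point(x=None, y=None)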

parlai.utils.misc.warn_once(msg: str) None[source]

Log a warning, but only once.

Parameters

msg (str) – Message to display

parlai.utils.misc.error_once(msg: str) None[source]

Log an error, but only once.

Parameters

msg (str) – Message to display

parlai.utils.misc.recursive_getattr(obj, attr, *args)[source]

Recursive call to getattr for nested attributes.

parlai.utils.pickle

ParlAI’s custom unpickler.

As modules move around or are renamed, old torch model files become invalid, since they look for modules in all the wrong places. Furthermore, we occasionally use APEX for performance reasons, but we don't want to outright die if the user has not installed it.

This module handles both of these issues. It is used like this:

>>> import parlai.utils.pickle
>>> state_dict = torch.load(filename, pickle_module=parlai.utils.pickle)
class parlai.utils.pickle.Unpickler(file, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=None)[source]

Bases: _Unpickler

Custom unpickler to handle moved classes and optional libraries.

parlai.utils.safety

Utility functions and classes for detecting offensive language.

class parlai.utils.safety.OffensiveLanguageClassifier(shared: Optional[TShared] = None, custom_model_file='zoo:dialogue_safety/single_turn/model')[source]

Bases: object

Load a model trained to detect offensive language in the context of single-turn dialogue utterances.

This model was trained to be robust to adversarial examples created by humans. See <http://parl.ai/projects/dialogue_safety/> for more information.

__init__(shared: Optional[TShared] = None, custom_model_file='zoo:dialogue_safety/single_turn/model')[source]
contains_offensive_language(text)[source]

Returns the probability that a message is safe according to the classifier.

class parlai.utils.safety.OffensiveStringMatcher(datapath: Optional[str] = None)[source]

Bases: object

Detects offensive language using a list of offensive language and phrases from https://github.com/LDNOOBW.

__init__(datapath: Optional[str] = None)[source]

Get data from external sources and build data representation.

If datapath ends in ‘.txt’ it is assumed a custom model file is already given.

add_phrase(phrase)[source]

Add a single phrase to the filter.

add_words(phrase_list)[source]

Add list of custom phrases to the filter.

contains_offensive_language(text)[source]

Determine if text contains any offensive words in the filter.

find_all_offensive_language(text)[source]

Find all offensive words from text in the filter.

parlai.utils.strings

Utility functions and classes for handling text strings.

parlai.utils.strings.normalize_reply(text: str, version=1) str[source]

Standardize the capitalization and punctuation spacing of the input text.

Version 1: Fix sentence start casing, and punctuation.

Version 2: Add trailing period, if missing.

parlai.utils.strings.uppercase(string: str) str[source]

Make the first character of the string uppercase, if the string is non-empty.
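
For example:

>>> uppercase("parlai is a dialogue platform")
'Parlai is a dialogue platform'
>>> uppercase("")
''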

parlai.utils.testing

General utilities for helping writing ParlAI unit and integration tests.

parlai.utils.testing.is_this_circleci()[source]

Return if we are currently running in CircleCI.

parlai.utils.testing.skipUnlessTorch(testfn, reason='pytorch is not installed')[source]

Decorate a test to skip if torch is not installed.

parlai.utils.testing.skipIfGPU(testfn, reason='Test is CPU-only')[source]

Decorate a test to skip if a GPU is available.

Useful for disabling hogwild tests.

parlai.utils.testing.skipUnlessGPU(testfn, reason='Test requires a GPU')[source]

Decorate a test to skip if no GPU is available.

parlai.utils.testing.skipUnlessBPE(testfn, reason='Test requires subword NMT')[source]

Decorate a test to skip if BPE is not installed.

parlai.utils.testing.skipIfCircleCI(testfn, reason='Test disabled in CircleCI')[source]

Decorate a test to skip if running on CircleCI.

parlai.utils.testing.skipUnlessVision(testfn, reason='torchvision not installed')[source]

Decorate a test to skip unless torchvision is installed.

parlai.utils.testing.skipUnlessFairseq(testfn, reason='fairseq not installed')[source]

Decorate a test to skip unless fairseq is installed.

parlai.utils.testing.skipUnlessMephisto(testfn, reason='mephisto not installed')[source]

Decorate a test to skip unless mephisto is installed.

parlai.utils.testing.skipUnlessClearML(testfn, reason='clearml not installed')[source]

Decorate a test to skip unless clearml is installed.

class parlai.utils.testing.retry(ntries=3, log_retry=False)[source]

Bases: object

Decorator for flaky tests. Test is run up to ntries times, retrying on failure.

Parameters
  • ntries – the number of tries to attempt

  • log_retry – if True, prints to stdout on retry to avoid being seen as “hanging”

On the last time, the test will simply fail.

>>> @retry(ntries=10)
... def test_flaky(self):
...     import random
...     self.assertLess(0.5, random.random())
__init__(ntries=3, log_retry=False)[source]
parlai.utils.testing.git_ls_files(root=None, skip_nonexisting=True)[source]

List all files tracked by git.

parlai.utils.testing.git_ls_dirs(root=None)[source]

List all folders tracked by git.

parlai.utils.testing.git_changed_files(skip_nonexisting=True)[source]

List all the changed files in the git repository.

Parameters

skip_nonexisting (bool) – If true, ignore files that don’t exist on disk. This is useful for disregarding files created in main, but don’t exist in HEAD.

parlai.utils.testing.git_commit_messages()[source]

Output each commit message between here and main.

parlai.utils.testing.is_new_task_filename(filename)[source]

Check if a given filename counts as a new task.

Used in tests and test triggers, and only here to avoid redundancy.

parlai.utils.testing.capture_output()[source]

Suppress all logging output into a single buffer.

Use as a context manager.

>>> with capture_output() as output:
...     print('hello')
>>> output.getvalue()
'hello'
parlai.utils.testing.tempdir()[source]

Create a temporary directory.

Use as a context manager so the directory is automatically cleaned up.

>>> with tempdir() as tmpdir:
...    print(tmpdir)  # prints a folder like /tmp/randomname
parlai.utils.testing.timeout(time: int = 30)[source]

Raise a timeout if a function does not return within time seconds.

Use as a context manager, so that the signal class can reset its alarm for SIGALRM.

Parameters

time (int) – Time in seconds to wait for timeout. Default is 30 seconds.

parlai.utils.testing.train_model(opt: Opt) Tuple[Dict[str, Any], Dict[str, Any]][source]

Run through a TrainLoop.

If model_file is not in opt, then this helper will create a temporary directory to store the model, dict, etc.

Returns

(valid_results, test_results)

Return type

(dict, dict)

parlai.utils.testing.eval_model(opt, skip_valid=False, skip_test=False, valid_datatype='valid', test_datatype='test')[source]

Run through an evaluation loop.

Parameters
  • opt – Any non-default options you wish to set.

  • skip_valid (bool) – If true skips the valid evaluation, and the first return value will be None.

  • skip_test (bool) – If true skips the test evaluation, and the second return value will be None.

  • valid_datatype (str) – If custom datatype required for valid, e.g. train:evalmode, specify here

Returns

(valid_results, test_results)

Return type

(dict, dict)

If model_file is not in opt, then this helper will create a temporary directory to store the model files, and clean up afterwards. You can keep the directory by disabling autocleanup

parlai.utils.testing.display_data(opt)[source]

Run through a display data run.

Returns

(stdout_train, stdout_valid, stdout_test)

Return type

(str, str, str)

parlai.utils.testing.display_model(opt) Tuple[str, str, str][source]

Run display_model.py.

Returns

(stdout_train, stdout_valid, stdout_test)

parlai.utils.torch

Utility methods for dealing with torch code.

parlai.utils.torch.neginf(dtype: dtype) float[source]

Return a representable finite number near -inf for a dtype.

parlai.utils.torch.atomic_save(state_dict: Any, path: str) None[source]

Like torch.save, but atomic.

Useful for preventing trouble coming from being pre-empted or killed while writing to disk. Works by writing to a temporary file, and then renaming the file to the final name.

parlai.utils.torch.padded_tensor(items: List[Union[List[int], LongTensor]], pad_idx: int = 0, left_padded: bool = False, max_len: Optional[int] = None, fp16friendly: bool = False) Tuple[LongTensor, List[int]][source]

Create a padded matrix from an uneven list of lists.

Returns (padded, lengths), where padded is the padded matrix, and lengths is a list containing the lengths of each row.

Matrix is right-padded (filled to the right) by default, but can be left-padded if left_padded is set to True.

Parameters
  • items (list[iter[int]]) – List of items

  • pad_idx (int) – the value to use for padding

  • left_padded (bool) – if True, pad on the left instead of the right

  • max_len (int) – if None, the max length is the maximum item length

  • fp16friendly (bool) – if True, pads the time dimension to be a multiple of 4.

Returns

(padded, lengths) tuple

Return type

(Tensor[int64], list[int])
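
A small sketch of the default (right-padded) behavior:

>>> padded, lengths = padded_tensor([[1, 2, 3], [4, 5]], pad_idx=0)
>>> padded
tensor([[1, 2, 3],
        [4, 5, 0]])
>>> lengths
[3, 2]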

parlai.utils.torch.padded_3d(tensors: List[LongTensor], pad_idx: int = 0, dtype: Optional[dtype] = torch.int64, fp16friendly: bool = False)[source]

Make 3D padded tensor for list of lists of 1D tensors or lists.

Will keep items on the same device as originally.

Parameters
  • tensors – list of lists of 1D tensors (or lists)

  • pad_idx – padding to fill tensor with

  • fp16friendly (bool) – if True, pads the final dimension to be a multiple of 8.

Returns

3D tensor with the maximum dimensions of the inputs

parlai.utils.torch.concat_without_padding(text_idx, cand_idx, use_cuda, null_idx=0)[source]

Concatenate two right padded tensors and move padding to the right.

For example, if text_idx = [[1, 2, 3, 4, 0, 0]] and cand_idx = [[5, 6, 7, 8, 0, 0]], then result = (tokens, segments) where

tokens = [[1, 2, 3, 4, 5, 6, 7, 8, 0, 0, 0, 0]]
segments = [[0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]]

parlai.utils.torch.argsort(keys: List[Any], *lists: List[List[Any]], descending: bool = False)[source]

Reorder each list in lists by the (descending) sorted order of keys.

Parameters
  • keys (iter) – Keys to order by.

  • lists (list[list]) – Lists to be reordered by keys' order. Correctly handles lists and 1-D tensors.

  • descending (bool) – Use descending order if true.

Returns

The reordered items.
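
A sketch of the expected behavior, reordering a parallel list by its keys:

>>> keys = [3, 1, 2]
>>> names = ['c', 'a', 'b']
>>> argsort(keys, names)
[['a', 'b', 'c']]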

parlai.utils.torch.compute_grad_norm(parameters, norm_type=2.0)[source]

Compute norm over gradients of model parameters.

Parameters
  • parameters – the model parameters for gradient norm calculation. Iterable of Tensors or single Tensor

  • norm_type – type of p-norm to use

Returns

the computed gradient norm

class parlai.utils.torch.IdentityLayer(*args, **kwargs)[source]

Bases: Module

Identity layer module.

Useful for decoder-only Torch Generator agents.

forward(xs)[source]

Identity.

parlai.utils.torch.total_parameters(model: Module) int[source]

Count the total number of parameters in the model.

Parameters

model – the model whose parameters we wish to count.

Returns

total number of parameters in the model.

parlai.utils.torch.trainable_parameters(model: Module) int[source]

Count the total number of trainable parameters in the model.

Parameters

model – the model whose parameters we wish to count.

Returns

total number of trainable parameters in the model.
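
For example, a single linear layer with a 10x2 weight and a bias of size 2 has 22 parameters, all trainable:

>>> import torch.nn as nn
>>> layer = nn.Linear(10, 2)
>>> total_parameters(layer)
22
>>> trainable_parameters(layer)
22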

class parlai.utils.torch.PipelineWorkItem(chunk_idx, layer_nos, next_device)

Bases: tuple

chunk_idx

Alias for field number 0

layer_nos

Alias for field number 1

next_device

Alias for field number 2

class parlai.utils.torch.PipelineHelper[source]

Bases: object

PipelineHelper assists with implementing pipelining in model parallelism.

For a tutorial on model parallelism, as it’s implemented in parts of ParlAI, see https://pytorch.org/tutorials/intermediate/model_parallel_tutorial.html.

Usage:

>>> my_model = PipelineHelper().make_parallel(my_model)

Note that you will need to manually implement logic which handles the moved layers.

__init__()[source]
check_compatibility(opt)[source]

Check compatibility for opts.

Really just used to raise an error message if the user mixes multiprocessing and model parallelism.

make_parallel(model: Module) Module[source]

Allocate specific layers in a model to be ModelParallel.

Limited to only ModuleLists within the model. Uses some heuristics to attempt to evenly distribute layers across GPUs, in order to balance memory usage. They are:

  • Assume the 0th GPU will host the optimizer, word embeddings, etc.

  • Assume activation memory is linear with the number of parameters.

  • All layers are approximately equal in size.

static guess_split_size(item: Chunk, num_gpus: Optional[int] = None, dim=0) int[source]

Estimate the number of chunks we should split the batch into via heuristics.

static split(item: Chunk, split_size: Optional[int] = None, dim=0) List[Chunk][source]

Split a tensor or group of tensors into smaller chunks of the same type.

Parameters
  • item – The item being split. May be a Tensor, a tuple of Tensors, or a dictionary mapping str -> Tensor.

  • split_size – The maximum size of each output chunk. If None, we will guess using heuristics

  • dim – The dimension to split along.

static join(items: List[Chunk], dim=0) Chunk[source]

Join chunks back together, the inverse of split.

Parameters
  • items – All the output chunks. Each chunk may be a tensor or a group of tensors.

  • dim – The dimension to join along.

static chunk_to(chunk: Chunk, device: str) Chunk[source]

Move the chunk to the device.

Handles chunks which are groups of tensors.

static schedule_work_items(layers: ModuleList, chunks: List[Chunk])[source]

Iterate through chunks and layers that should be pipelined.

Each iteration of this generator yields the following properties:

  • layer_nos: a list of indices of layers for you to forward through

  • chunk_idx: the index of the chunk we are manipulating. Use this if you need to update chunk representations.

  • next_device: where the chunk should be moved to AFTER the layer computation is done.

parlai.utils.typing

Definitions of general ParlAI types.

parlai.utils.typing.TScalar

ParlAI type to represent an object that is theoretically expressible as a scalar value. Ints and floats are clearly scalars, and torch.Tensors can be represented by a scalar if Tensor.numel() == 1. Used as input type for classes derived from Metric.

alias of Union[int, float, Tensor]

parlai.utils.world_logging

Useful utilities for logging actions/observations in a world.

class parlai.utils.world_logging.WorldLogger(opt)[source]

Bases: object

Logs actions/observations in a world and saves in a given format.

__init__(opt)[source]
log(world)[source]

Log acts from a world.