parlai.core.torch_agent

Torch Agent implements much of the boilerplate necessary for creating a neural dialogue agent, so you can focus on modeling. Torch Agent limits its functionality to maintaining dialogue history, transforming text into vectors of indices, and loading/saving models. The user is required to implement their own logic in methods like train_step and eval_step.
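
As a rough orientation, a minimal subclass might look like the following sketch. MyModel is a placeholder torch.nn.Module and self.criterion is assumed to be built by the subclass; neither is provided by Torch Agent.

from parlai.core.torch_agent import TorchAgent, Output

class MyAgent(TorchAgent):
    def build_model(self):
        # MyModel is hypothetical; any torch.nn.Module works here
        return MyModel(len(self.dict))

    def train_step(self, batch):
        self.zero_grad()
        # self.criterion is assumed to be constructed by this subclass
        loss = self.criterion(self.model(batch.text_vec), batch.label_vec)
        self.backward(loss)
        self.update_params()
        return Output()

    def eval_step(self, batch):
        preds = self.model(batch.text_vec).argmax(dim=-1)
        return Output(text=[self.dict.vec2txt(p.tolist()) for p in preds])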

Torch Ranker Agent and Torch Generator Agent have more specialized stub methods, and provide many rich features and benefits. Torch Ranker Agent assumes your model ranks possible responses from a set of possible candidates, and provides options around negative sampling, candidate sampling, and large-scale candidate prediction. Torch Generator Agent assumes your model generates utterances auto-regressively, and provides generic implementations of beam search.

Torch Agent

General utility code for building PyTorch-based agents in ParlAI.

Contains the following main utilities:

  • TorchAgent class which serves as a useful parent class for other model agents

  • Batch namedtuple which is the input type of the main abstract methods of the TorchAgent class

  • Output namedtuple which is the expected output type of the main abstract methods of the TorchAgent class

  • History class which handles tracking the dialogue state over the course of an episode.

See below for documentation on each specific tool.

class parlai.core.torch_agent.Batch(text_vec=None, text_lengths=None, label_vec=None, label_lengths=None, labels=None, valid_indices=None, candidates=None, candidate_vecs=None, reward=None, image=None, is_training: Optional[bool] = None, _context_original_length: Optional[LongTensor] = None, _context_truncate_rate: Optional[LongTensor] = None, _context_truncated_length: Optional[LongTensor] = None, _label_original_length: Optional[LongTensor] = None, _label_truncate_rate: Optional[LongTensor] = None, _label_truncated_length: Optional[LongTensor] = None, **kwargs)[source]

Bases: AttrDict

Batch is a namedtuple containing data being sent to an agent.

This is the input type of the train_step and eval_step functions. Agents can override the batchify function to return a Batch with additional fields if they would like, though we recommend calling the parent function to set up these fields as a base.

Batch objects contain some magic semantics when dealing with CUDA. Namely, Batch objects have a to() method that can be used to send all tensors to a particular device (GPU). This is undesirable in some instances, as some fields may be used only for accumulating metrics, or are only used on CPU. Prefixing a field with an underscore will prevent it from being transferred to GPU.

Note that in upcoming versions of ParlAI, we will enable features for getting speedups in training which work best when the number of non-Tensor objects in a batch is minimal.

Parameters
  • text_vec – bsz x seqlen tensor containing the parsed text data.

  • label_vec – bsz x seqlen tensor containing the parsed label (one per batch row).

  • labels – list of length bsz containing the selected label for each batch row (some datasets have multiple labels per input example).

  • valid_indices – tensor of length bsz containing the original indices of each example in the batch. we use these to map predictions back to their proper row, since, e.g., we may sort examples by their length, or some examples may be invalid.

  • candidates – list of lists of text. outer list has size bsz, inner lists vary in size based on the number of candidates for each row in the batch.

  • candidate_vecs – list of lists of tensors. outer list has size bsz, inner lists vary in size based on the number of candidates for each row in the batch.

  • image – list of image features in the format specified by the --image-mode arg.

  • reward – Tensor containing the “reward” field of observations, if present

__init__(text_vec=None, text_lengths=None, label_vec=None, label_lengths=None, labels=None, valid_indices=None, candidates=None, candidate_vecs=None, reward=None, image=None, is_training: Optional[bool] = None, _context_original_length: Optional[LongTensor] = None, _context_truncate_rate: Optional[LongTensor] = None, _context_truncated_length: Optional[LongTensor] = None, _label_original_length: Optional[LongTensor] = None, _label_truncate_rate: Optional[LongTensor] = None, _label_truncated_length: Optional[LongTensor] = None, **kwargs)[source]

Initialize AttrDict using input dict.

to(dev)[source]

Move all tensors in the batch to a device.

NOT in place.

Note that valid_indices and fields starting with an underscore are always kept on CPU.

Returns

self
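
For example (a small sketch; the underscore-prefixed field name is illustrative):

import torch
from parlai.core.torch_agent import Batch

batch = Batch(
    text_vec=torch.zeros(4, 16, dtype=torch.long),
    _context_original_length=torch.full((4,), 16),
)
batch = batch.to('cuda')  # requires a GPU
# batch.text_vec is now on the GPU; the underscore-prefixed field stays on CPU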

class parlai.core.torch_agent.Output(text=None, text_candidates=None, **kwargs)[source]

Bases: AttrDict

Output is an object containing agent predictions.

This is the expected return type of the train_step and eval_step functions, though agents can choose to return None if they do not want to answer.

Parameters
  • text (List[str]) – list of strings of length bsz containing the predictions of the model

  • text_candidates (List[List[str]]) – list of lists of length bsz containing ranked predictions of the model. each sub-list is an ordered ranking of strings, of variable length.

__init__(text=None, text_candidates=None, **kwargs)[source]

Initialize AttrDict using input dict.

class parlai.core.torch_agent.History(opt, field='text', maxlen=None, size=-1, p1_token='__p1__', p2_token='__p2__', dict_agent=None)[source]

Bases: object

History handles tracking the dialogue state over the course of an episode.

History may also be used to track the history of any field.

Parameters
  • field – field in the observation to track over the course of the episode (defaults to ‘text’)

  • maxlen – sets the maximum number of turns

  • p1_token – token indicating ‘person 1’; opt must have ‘person_tokens’ set to True for this to be added

  • p2_token – token indicating ‘person 2’; opt must have ‘person_tokens’ set to True for this to be added

  • dict_agent – DictionaryAgent object for tokenizing the history

__init__(opt, field='text', maxlen=None, size=-1, p1_token='__p1__', p2_token='__p2__', dict_agent=None)[source]
parse(text)[source]

Tokenize text with the given dictionary.

reset()[source]

Clear the history.

add_reply(text)[source]

Add your own response to the history.

update_history(obs: Message, temp_history: Optional[str] = None)[source]

Update the history with the given observation.

Parameters
  • obs – Observation used to update the history.

  • temp_history – Optional temporary string. If it is not None, this string will be appended to the end of the history. It will not be in the history on the next dialogue turn. Set to None to stop adding to the history.

get_history_str() Optional[str][source]

Return the string version of the history.

get_history_vec()[source]

Return a vectorized version of the history.

get_history_vec_list()[source]

Return a list of history vecs.
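
A short usage sketch (the delimiter joining turns depends on the agent’s options; agent here is an existing TorchAgent instance):

from parlai.core.message import Message

history = agent.build_history()
history.update_history(Message({'text': 'hello there', 'episode_done': False}))
history.add_reply('hi, how are you?')
print(history.get_history_str())  # e.g. 'hello there\nhi, how are you?'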

class parlai.core.torch_agent.TorchAgent(opt: Opt, shared=None)[source]

Bases: ABC, Agent

A provided abstract base agent for any model that wants to use Torch.

Exists to make it easier to implement a new agent. Not necessary, but reduces duplicated code.

Many methods are intended to be either used as-is when the default is acceptable, or to be overridden and called with super(), with the extra functionality added to the initial result. See the method comment for recommended behavior.

This agent serves as a common framework for all ParlAI models which want to use PyTorch.

classmethod optim_opts()[source]

Fetch optimizer selection.

By default, collects everything in torch.optim, as well as importing:

  • qhm / qhmadam, if installed from github.com/facebookresearch/qhoptim

Override this (and probably call super()) to add your own optimizers.

static dictionary_class()[source]

Return the dictionary class that this agent expects to use.

Can be overridden if a more complex dictionary is required.

classmethod history_class()[source]

Return the history class that this agent expects to use.

Can be overridden if a more complex history is required.

classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser[source]

Add the default commandline args we expect most agents to want.

__init__(opt: Opt, shared=None)[source]

Initialize agent.

build_history()[source]

Return the constructed history object.

build_dictionary()[source]

Return the constructed dictionary, which will be set to self.dict.

If you need to add additional tokens to the dictionary, this is likely the right place to do it.

abstract build_model()[source]

Construct the model and return it.

init_optim(params, optim_states=None, saved_optim_type=None, is_finetune: bool = False) bool[source]

Initialize optimizer with model parameters.

Parameters
  • params – parameters from the model

  • optim_states – optional argument providing states of optimizer to load

  • saved_optim_type – type of optimizer being loaded, if changed will skip loading optimizer states

  • is_finetune – bool indicating whether this training run is a fine-tune or not

Returns

boolean indicating whether the optimizer failed to initialize with optim_states.

build_lr_scheduler(states=None, hard_reset=False)[source]

Create the learning rate scheduler, and assign it to self.scheduler. This scheduler will be updated upon a call to receive_metrics. May also create self.warmup_scheduler, if appropriate.

Parameters
  • states (state_dict) – Possible state_dict provided by model checkpoint, for restoring LR state

  • hard_reset (bool) – If true, the LR scheduler should ignore the state dictionary.

record_local_metric(keyname: str, values: List[Metric])[source]

Record an example-level metric for all items in the batch.

Local metrics may be recorded anywhere within batch_act. They will automatically be collated and returned at the end of batch_act. The beginning of batch_act resets these, so you may not use them during observe.

Example local metrics include ppl, token_acc, and other agent-specific metrics.

report()[source]

Report metrics.

Report includes learning rate and number of training updates.

share()[source]

Share fields from parent as well as useful objects in this class.

Subclasses will likely want to share their model as well.

vectorize(obs, history, add_start=True, add_end=True, text_truncate=None, label_truncate=None)[source]

Make vectors out of observation fields and store in the observation.

In particular, the ‘text’ and ‘labels’/’eval_labels’ fields are processed and a new field is added to the observation with the suffix ‘_vec’.

If you want to use additional fields on your subclass, you can override this function, call super().vectorize(…) to process the text and labels, and then process the other fields in your subclass.

Additionally, if you want to override some of these default parameters, then we recommend using a pattern like:

def vectorize(self, *args, **kwargs):
    kwargs['add_start'] = False
    return super().vectorize(*args, **kwargs)
Parameters
  • obs – Single observation from observe function.

  • add_start – default True, adds the start token to each label.

  • add_end – default True, adds the end token to each label.

  • text_truncate – default None, if set truncates text vectors to the specified length.

  • label_truncate – default None, if set truncates label vectors to the specified length.

Returns

the input observation, with ‘text_vec’, ‘label_vec’, and ‘cands_vec’ fields added.

is_valid(obs)[source]

Determine if an observation is valid or not.

batchify(obs_batch, sort=False)[source]

Create a batch of valid observations from an unchecked batch.

A valid observation is one that passes the lambda provided to the function; by default, this checks whether the preprocessed ‘text_vec’ field is present, which would have been set by this agent’s ‘vectorize’ function.

Returns a namedtuple Batch. See original definition above for in-depth explanation of each field.

If you want to include additional fields in the batch, you can subclass this function and return your own “Batch” namedtuple: copy the Batch namedtuple at the top of this class, and then add whatever additional fields that you want to be able to access. You can then call super().batchify(…) to set up the original fields and then set up the additional fields in your subclass and return that batch instead.

Parameters
  • obs_batch – List of vectorized observations

  • sort – Default False, orders the observations by length of vectors. Set to true when using torch.nn.utils.rnn.pack_padded_sequence. Uses the text vectors if available, otherwise uses the label vectors if available.
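
A sketch of the override pattern described above, where extra_field is a hypothetical addition (other required TorchAgent methods are omitted):

from parlai.core.torch_agent import Batch, TorchAgent

class MyBatch(Batch):
    def __init__(self, extra_field=None, **kwargs):
        super().__init__(extra_field=extra_field, **kwargs)

class MyAgent(TorchAgent):
    def batchify(self, obs_batch, sort=False):
        batch = super().batchify(obs_batch, sort=sort)
        # valid_indices maps batch rows back to positions in obs_batch
        extra = [obs_batch[i].get('extra') for i in batch.valid_indices]
        return MyBatch(extra_field=extra, **batch)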

match_batch(batch_reply, valid_inds, output=None)[source]

Match sub-batch of predictions to the original batch indices.

Batches may be only partially filled (i.e., when completing the remainder at the end of the validation or test set), or we may want to sort by, e.g., the length of the input sequences if using pack_padded_sequence.

This matches rows back with their original row in the batch for calculating metrics like accuracy.

If output is None (model choosing not to provide any predictions), we will just return the batch of replies.

Otherwise, output should be a parlai.core.torch_agent.Output object. This is a namedtuple, which can provide text predictions and/or text_candidates predictions. If you would like to map additional fields into the batch_reply, you can override this method as well as providing your own namedtuple with additional fields.

Parameters
  • batch_reply – Full-batchsize list of message dictionaries to put responses into.

  • valid_inds – Original indices of the predictions.

  • output – Output namedtuple which contains sub-batchsize list of text outputs from model. May be None (default) if model chooses not to answer. This method will check for text and text_candidates fields.

get_temp_history(observation) Optional[str][source]

Return a string to temporarily insert into history for a single turn.

NOTE: This does NOT attempt to provide any sort of delimiter or spacing between the original history and the temporary history. If you require such a delimiter or spacing, you should include it in the temp history.

Intentionally overridable so more complex models can insert temporary history strings, i.e. strings that are removed from the history after a single turn.

observe(observation)[source]

Process incoming message in preparation for producing a response.

This includes remembering the past history of the conversation.

self_observe(self_message: Message) None[source]

Observe one’s own utterance.

This is used so that the agent can incorporate its own response into the dialogue history after a batch_act. Failure to implement this will result in an agent that cannot hear itself speak.

Parameters

self_message – The message corresponding to the output from batch_act.

state_dict()[source]

Get the state dict for saving.

Override this method for more specific saving.

save_nonprimary(path=None)[source]

Save model parameters, when you are working on the non-primary worker.

For models or optimizers that shard parameters, this ensures we sync.

save(path=None)[source]

Save model parameters to path (or default to model_file arg).

Please try to refrain from overriding this function, and instead override state_dict(self) for more specific saving.

load_state_dict(state_dict)[source]

Load the state dict into model.

This is easily overridable to facilitate transfer of state dicts.

load(path: str) Dict[str, Any][source]

Return opt and model states.

Override this method for more specific loading.

classmethod upgrade_opt(opt_from_disk: Opt)[source]

Upgrade legacy options when loading an opt file from disk.

This is primarily made available to provide a safe space to handle backwards-compatible behavior. For example, perhaps we introduce a new option today, which wasn’t previously available. We can have the argument have a new default, but fall back to the “legacy” compatibility behavior if the option doesn’t exist.

upgrade_opt provides an opportunity for such checks for backwards compatibility. It is called shortly after loading the opt file from disk, and is called before the Agent is initialized.

Other possible examples include:

  1. Renaming an option,

  2. Deprecating an old option,

  3. Splitting coupled behavior, etc.

Implementations of upgrade_opt should conform to high standards, due to the risk of these methods becoming complicated and difficult to reason about. We recommend the following behaviors:

  1. upgrade_opt should only be used to provide backwards compatibility. Other behavior should find a different location.

  2. Children should always call the parent’s upgrade_opt first.

  3. upgrade_opt should always warn when an option was overwritten.

  4. Include comments annotating the date and purpose of each upgrade.

  5. Add an integration test which ensures your old work behaves appropriately.

Parameters

opt_from_disk (Opt) – The opt file, as loaded from the .opt file on disk.

Returns

The modified options

Return type

Opt
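
A sketch of a conforming implementation; the flag rename is invented for illustration:

from parlai.utils.misc import warn_once

@classmethod
def upgrade_opt(cls, opt_from_disk):
    opt_from_disk = super().upgrade_opt(opt_from_disk)
    # 2021-01: hypothetical example of renaming --old-flag to --new-flag
    if 'old_flag' in opt_from_disk:
        warn_once('--old-flag is deprecated; moving its value to --new-flag')
        opt_from_disk['new_flag'] = opt_from_disk.pop('old_flag')
    return opt_from_disk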

reset()[source]

Clear internal states.

reset_metrics()[source]

Reset all TorchAgentMetrics.

act()[source]

Call batch_act with the singleton batch.

batch_act(observations)[source]

Process a batch of observations (batchsize list of message dicts).

These observations have been preprocessed by the observe method.

Subclasses can override this for special functionality, but if the default behaviors are fine then just override the train_step and eval_step methods instead. The former is called when labels are present in the observations batch; otherwise, the latter is called.

abstract train_step(batch)[source]

[Abstract] Process one batch with training labels.

abstract eval_step(batch)[source]

[Abstract] Process one batch but do not train on it.

set_interactive_mode(mode, shared)[source]

Set interactive mode on or off.

backward(loss, **kwargs)[source]

Perform a backward pass.

It is recommended you use this instead of loss.backward(), for integration with distributed training and FP16 training.

update_params()[source]

Perform step of optimization.

Handles clipping gradients and adjusting the LR schedule if needed. Gradient accumulation is also performed if the agent is called with --update-freq.

It is recommended (but not forced) that you call this in train_step.

zero_grad()[source]

Zero out optimizer.

It is recommended you call this in train_step. It automatically handles gradient accumulation if the agent is called with --update-freq.
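
Putting the three hooks together, a typical train_step body follows this sketch (compute_loss stands in for your own loss computation):

def train_step(self, batch):
    self.zero_grad()
    loss = self.compute_loss(batch)  # your own method; not defined by TorchAgent
    self.backward(loss)              # integrates FP16 and distributed training
    self.update_params()             # clipping, LR schedule, --update-freq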

Torch Generator Agent

Generic PyTorch-based Generator agent.

Implements quite a bit of boilerplate, including forced-decoding loss and a tree search.

Contains the following utilities:

  • TorchGeneratorAgent class, which serves as a useful parent for generative torch agents.

  • Beam class which provides some generic beam functionality for classes to use

class parlai.core.torch_generator_agent.SearchBlocklist(dict_agent: DictionaryAgent)[source]

Bases: object

Search block list facilitates blocking ngrams from being generated.

__init__(dict_agent: DictionaryAgent) None[source]
class parlai.core.torch_generator_agent.TorchGeneratorModel(padding_idx=0, start_idx=1, end_idx=2, unknown_idx=3, input_dropout=0, longest_label=1, **kwargs)[source]

Bases: Module, ABC

Abstract TorchGeneratorModel.

This interface expects you to implement a model with the following requirements:

Attribute model.encoder

takes input returns tuple (enc_out, enc_hidden, attn_mask)

Attribute model.decoder

takes decoder params and returns decoder outputs after attn

Attribute model.output

takes decoder outputs and returns distr over dictionary

__init__(padding_idx=0, start_idx=1, end_idx=2, unknown_idx=3, input_dropout=0, longest_label=1, **kwargs)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

decode_forced(encoder_states, ys)[source]

Decode with a fixed, true sequence, computing loss.

Useful for training, or ranking fixed candidates.

Parameters
  • ys (LongTensor[bsz, time]) – the prediction targets. Contains both the start and end tokens.

  • encoder_states (model specific) – Output of the encoder. Model specific types.

Returns

pair (logits, choices) containing the logits and MLE predictions

Return type

(FloatTensor[bsz, ys, vocab], LongTensor[bsz, ys])

abstract reorder_encoder_states(encoder_states, indices)[source]

Reorder encoder states according to a new set of indices.

This is an abstract method, and must be implemented by the user.

Its purpose is to provide a model-agnostic interface for beam search: for example, this method is used to sort hypotheses, expand beams, etc.

For example, assume that encoder_states is a bsz x 1 tensor of values

indices = [0, 2, 2]
encoder_states = [[0.1]
                  [0.2]
                  [0.3]]

then the output will be

output = [[0.1]
          [0.3]
          [0.3]]
Parameters
  • encoder_states (model specific) – output from encoder. type is model specific.

  • indices (list[int]) – the indices to select over. The user must support non-tensor inputs.

Returns

The re-ordered encoder states. It should be of the same type as encoder states, and it must be a valid input to the decoder.

Return type

model specific
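
For instance, if encoder_states is the (enc_out, enc_hidden, attn_mask) tuple described above, a sketch might be (assuming torch is imported):

def reorder_encoder_states(self, encoder_states, indices):
    enc_out, enc_hidden, attn_mask = encoder_states
    if not torch.is_tensor(indices):
        indices = torch.LongTensor(indices).to(enc_out.device)
    return (
        enc_out.index_select(0, indices),
        enc_hidden.index_select(0, indices),
        attn_mask.index_select(0, indices),
    )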

abstract reorder_decoder_incremental_state(incremental_state, inds)[source]

Reorder incremental state for the decoder.

Used to expand selected beams in beam search. Unlike reorder_encoder_states, implementing this method is optional. However, without incremental decoding, decoding a single beam becomes O(n^2) instead of O(n), which can make beam search impractically slow.

In order to fall back to non-incremental decoding, just return None from this method.

Parameters
  • incremental_state (model specific) – second output of model.decoder

  • inds (LongTensor[n]) – indices to select and reorder over.

Returns

The re-ordered decoder incremental states. It should be the same type as incremental_state, and usable as an input to the decoder. This method should return None if the model does not support incremental decoding.

Return type

model specific
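
For a model whose incremental state is a flat dict of batch-first tensors, a sketch might be:

def reorder_decoder_incremental_state(self, incremental_state, inds):
    # return None here instead to fall back to non-incremental decoding
    return {
        key: tensor.index_select(0, inds)
        for key, tensor in incremental_state.items()
    }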

forward(*xs, ys=None, prev_enc=None, maxlen=None, bsz=None)[source]

Get output predictions from the model.

Parameters
  • xs (LongTensor[bsz, seqlen]) – input to the encoder

  • ys (LongTensor[bsz, outlen]) – Expected output from the decoder. Used for teacher forcing to calculate loss.

  • prev_enc – if you know you’ll pass in the same xs multiple times, you can pass in the encoder output from the last forward pass to skip recalculating the same encoder output.

  • maxlen – max number of tokens to decode. if not set, will use the length of the longest label this model has seen. ignored when ys is not None.

  • bsz – if ys is not provided, then you must specify the bsz for greedy decoding.

Returns

(scores, candidate_scores, encoder_states) tuple

  • scores contains the model’s predicted token scores. (FloatTensor[bsz, seqlen, num_features])

  • candidate_scores are the scores the model assigned to each candidate. (FloatTensor[bsz, num_cands])

  • encoder_states are the output of model.encoder. Model specific types. Feed this back in to skip encoding on the next call.
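
A usage sketch of the caching behavior:

scores, preds, encoder_states = model(xs, ys=ys)
# a second pass over the same xs can reuse the cached encoder output
scores2, preds2, _ = model(xs, ys=other_ys, prev_enc=encoder_states)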

class parlai.core.torch_generator_agent.PPLMetric(numer: Union[int, float, Tensor], denom: Union[int, float, Tensor] = 1)[source]

Bases: AverageMetric

value()[source]

Return the value of the metric as a float.

class parlai.core.torch_generator_agent.TorchGeneratorAgent(opt: Opt, shared=None)[source]

Bases: TorchAgent, ABC

Abstract Generator agent; only meant to be extended.

TorchGeneratorAgent aims to handle much of the bookkeeping and infrastructure work for any generative model, like seq2seq or transformer. It implements train_step and eval_step. The only requirement is that your model implements the TorchGeneratorModel interface.

classmethod upgrade_opt(opt_from_disk: Opt)[source]

Upgrade legacy options when loading an opt file from disk.

This is primarily made available to provide a safe space to handle backwards-compatible behavior. For example, perhaps we introduce a new option today, which wasn’t previously available. We can have the argument have a new default, but fall back to the “legacy” compatibility behavior if the option doesn’t exist.

upgrade_opt provides an opportunity for such checks for backwards compatibility. It is called shortly after loading the opt file from disk, and is called before the Agent is initialized.

Other possible examples include:

  1. Renaming an option,

  2. Deprecating an old option,

  3. Splitting coupled behavior, etc.

Implementations of upgrade_opt should conform to high standards, due to the risk of these methods becoming complicated and difficult to reason about. We recommend the following behaviors:

  1. upgrade_opt should only be used to provide backwards compatibility. Other behavior should find a different location.

  2. Children should always call the parent’s upgrade_opt first.

  3. upgrade_opt should always warn when an option was overwritten.

  4. Include comments annotating the date and purpose of each upgrade.

  5. Add an integration test which ensures your old work behaves appropriately.

Parameters

opt_from_disk (Opt) – The opt file, as loaded from the .opt file on disk.

Returns

The modified options

Return type

Opt

classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser[source]

Add command line arguments.

__init__(opt: Opt, shared=None)[source]

Initialize agent.

build_criterion()[source]

Construct and return the loss function.

By default torch.nn.CrossEntropyLoss.

If overridden, this method should produce a sum that can be used for a per-token loss.
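
For example, a subclass might override it roughly like this, keeping reduction='none' so per-token losses remain available (the label_smoothing value is illustrative):

def build_criterion(self):
    return torch.nn.CrossEntropyLoss(
        ignore_index=self.NULL_IDX,  # padding index set up by TorchAgent
        reduction='none',
        label_smoothing=0.1,
    )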

set_interactive_mode(mode, shared=False)[source]

Turn on interactive mode.

reset_metrics()[source]

Reset metrics for reporting loss and perplexity.

share()[source]

Share internal states between parent and child instances.

vectorize(*args, **kwargs)[source]

Override vectorize for generative models.

batchify(obs_batch, sort=False)[source]

Create a batch of valid observations from an unchecked batch.

A valid observation is one that passes the lambda provided to the function; by default, this checks whether the preprocessed ‘text_vec’ field is present, which would have been set by this agent’s ‘vectorize’ function.

Returns a namedtuple Batch. See original definition above for in-depth explanation of each field.

If you want to include additional fields in the batch, you can subclass this function and return your own “Batch” namedtuple: copy the Batch namedtuple at the top of this class, and then add whatever additional fields that you want to be able to access. You can then call super().batchify(…) to set up the original fields and then set up the additional fields in your subclass and return that batch instead.

Parameters
  • obs_batch – List of vectorized observations

  • sort – Default False, orders the observations by length of vectors. Set to true when using torch.nn.utils.rnn.pack_padded_sequence. Uses the text vectors if available, otherwise uses the label vectors if available.

record_per_token_metrics(batch, loss_per_token)[source]

Override this method for custom loss values that require loss_per_token.

compute_loss(batch, return_output=False)[source]

Compute and return the loss for the given batch.

Easily overridable for customized loss functions.

If return_output is True, the full output from the call to self.model() is also returned, via a (loss, model_output) pair.

train_step(batch)[source]

Train on a single batch of examples.

rank_eval_label_candidates(batch, batchsize)[source]

Rank label_candidates during eval_step.

Can be overridden to allow for different ways of ranking candidates. Must have --rank-candidates set to True. By default, we roughly compute PPL to rank the candidates.

eval_step(batch)[source]

Evaluate a single batch of examples.

get_prefix_tokens(batch: Batch) Optional[LongTensor][source]

Set prefix tokens to seed decoding at generation time.

By default, we do not utilize prefix tokens, but this is left overridable by child classes.

Returned tensor should be of dimension bsz x len(prefix)
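
A sketch that forces every generation to open with the same (illustrative) prefix:

def get_prefix_tokens(self, batch):
    prefix = self.dict.txt2vec('my forced prefix')
    bsz = batch.text_vec.size(0)
    return torch.LongTensor([prefix] * bsz).to(batch.text_vec.device)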

class parlai.core.torch_generator_agent.TreeSearch(beam_size, block_ngram=-1, context_block_ngram=-1, padding_token=0, bos_token=1, eos_token=2, min_length=3, device='cpu', length_penalty=0.65, verbose=False, gpu_beam_blocking=False, dict=None)[source]

Bases: object

Abstract Tree Search class.

It keeps information about beam_size concurrent, developing hypotheses. Concrete implementations make choices about which token to explore next at each point in the tree. Different choices result in different generation algorithms.

__init__(beam_size, block_ngram=-1, context_block_ngram=-1, padding_token=0, bos_token=1, eos_token=2, min_length=3, device='cpu', length_penalty=0.65, verbose=False, gpu_beam_blocking=False, dict=None)[source]

Instantiate Beam object.

Parameters
  • beam_size – number of hypotheses in the beam

  • block_ngram – size of ngrams to block.

  • context_block_ngram – size of context ngrams to block

  • padding_token – padding token ID

  • bos_token – beginning of sentence token ID

  • eos_token – end of sentence token ID

  • min_length – minimum length of the predicted sequence

  • device – What device to use for computations

  • dict – dictionary, if necessary

set_context(context: LongTensor) TSType[source]

Set the internal context representation and return self.

Parameters

context – a LongTensor representing the input context; used for context ngram blocking, if supplied

set_batch_context(batch_context_list: LongTensor, batch_idx: int, gpu_beam_blocking: bool) TSType[source]

Version of .set_context() that operates on a single element of a batch.

Set the internal context representation and return self.

Parameters
  • batch_context_list – a list of lists, each one containing the context for one member of the batch

  • batch_idx – index of the batch

  • gpu_beam_blocking – whether we are using the GPU kernel for beam blocking; if so, return a tensor, else return a list.

get_output_from_current_step()[source]

Get the output at the current step.

get_backtrack_from_current_step()[source]

Get the backtrack at the current step.

abstract select_paths(logprobs, prior_scores, current_length) _PathSelection[source]

Select the next vocabulary item in these beams.

Parameters
  • logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.

  • prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.

  • current_length – the current length in tokens

Returns

a {hypothesis_ids, token_ids, scores, token_details}, where:

  • hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.

  • token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.

  • scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.

  • token_details is a (beamsize) list of objects with metadata about each generated token.
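
For intuition, greedy selection (beam size 1) fills this contract roughly as follows, where _PathSelection is the container holding the fields named above:

def select_paths(self, logprobs, prior_scores, current_length):
    tok_scores, tok_ids = logprobs.max(dim=1)
    best_scores = tok_scores + prior_scores
    hyp_ids = torch.arange(logprobs.size(0), device=logprobs.device)
    return _PathSelection(
        hypothesis_ids=hyp_ids,
        token_ids=tok_ids,
        scores=best_scores,
        token_details=None,
    )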

advance(logprobs, step)[source]

Advance the beam one step.

is_done()[source]

Return whether beam search is complete.

get_rescored_finished(n_best=None)[source]

Return finished hypotheses according to adjusted scores.

Score adjustment is done according to the Google NMT paper, which penalizes long utterances.

Parameters

n_best – number of finalized hypotheses to return

Returns

list of (tokens, score, token_metadata) 3-tuples, in sorted order, where:
  • tokens is a tensor of token ids

  • score is the adjusted log probability of the entire utterance

  • token_metadata dictionary:

    token_logprobs -> a tensor of conditional log probabilities of tokens

    token_ranks -> a tensor of ranks of tokens in the vocabulary, by probability, when sampled
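
Concretely, the Google NMT paper normalizes a hypothesis Y by a length penalty of roughly lp(Y) = ((5 + |Y|) / 6) ** alpha and ranks hypotheses by score / lp(Y), where alpha corresponds to the length_penalty constructor argument (default 0.65).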

class parlai.core.torch_generator_agent.GreedySearch(*args, **kwargs)[source]

Bases: TreeSearch

Greedy search.

Picks the highest probability utterance at each step. Only works with --beam-size 1.

__init__(*args, **kwargs)[source]

Instantiate Beam object.

Parameters
  • beam_size – number of hypotheses in the beam

  • block_ngram – size of ngrams to block.

  • context_block_ngram – size of context ngrams to block

  • padding_token – padding token ID

  • bos_token – beginning of sentence token ID

  • eos_token – end of sentence token ID

  • min_length – minimum length of the predicted sequence

  • device – What device to use for computations

  • dict – dictionary, if necessary

select_paths(logprobs, prior_scores, current_length) _PathSelection[source]

Select the next vocabulary item in these beams.

Parameters
  • logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.

  • prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.

  • current_length – the current length in tokens

Returns

a {hypothesis_ids, token_ids, scores, token_details}, where:

  • hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.

  • token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.

  • scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.

  • token_details is a (beamsize) list of objects with metadata about each generated token.

class parlai.core.torch_generator_agent.BeamSearch(beam_size, block_ngram=-1, context_block_ngram=-1, padding_token=0, bos_token=1, eos_token=2, min_length=3, device='cpu', length_penalty=0.65, verbose=False, gpu_beam_blocking=False, dict=None)[source]

Bases: TreeSearch

Beam search.

select_paths(logprobs, prior_scores, current_length) _PathSelection[source]

Select the next vocabulary item in these beams.

class parlai.core.torch_generator_agent.DelayedBeamSearch(k, delay, *args, **kwargs)[source]

Bases: TreeSearch

DelayedBeam: Top-K sampling followed by beam search (Massarelli et al., 2019).

Samples from a truncated distribution where only the most probable K words are considered at each step for the first N tokens, then switches to beam search after N steps.

See https://arxiv.org/abs/1911.03587 for details.

__init__(k, delay, *args, **kwargs)[source]

Instantiate Beam object.

Parameters
  • beam_size – number of hypotheses in the beam

  • block_ngram – size of ngrams to block.

  • context_block_ngram – size of context ngrams to block

  • padding_token – padding token ID

  • bos_token – beginning of sentence token ID

  • eos_token – end of sentence token ID

  • min_length – minimum length of the predicted sequence

  • device – What device to use for computations

  • dict – dictionary, if necessary

select_paths(logprobs, prior_scores, current_length) _PathSelection[source]

Select the next vocabulary item in these beams.

Parameters
  • logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.

  • prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.

  • current_length – the current length in tokens

Returns

a {hypothesis_ids, token_ids, scores, token_details}, where:

  • hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.

  • token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.

  • scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.

  • token_details is a (beamsize) list of objects with metadata about each generated token.

class parlai.core.torch_generator_agent.DelayedNucleusBeamSearch(p, delay, *args, **kwargs)[source]

Bases: TreeSearch

__init__(p, delay, *args, **kwargs)[source]

Instantiate Beam object.

Parameters
  • beam_size – number of hypotheses in the beam

  • block_ngram – size of ngrams to block.

  • context_block_ngram – size of context ngrams to block

  • padding_token – padding token ID

  • bos_token – beginning of sentence token ID

  • eos_token – end of sentence token ID

  • min_length – minimum length of the predicted sequence

  • device – What device to use for computations

  • dict – dictionary, if necessary

select_paths(logprobs, prior_scores, current_length) _PathSelection[source]

Select the next vocabulary item in these beams.

Parameters
  • logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.

  • prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.

  • current_length – the current length in tokens

Returns

a {hypothesis_ids, token_ids, scores, token_details}, where:

  • hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.

  • token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.

  • scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.

  • token_details is a (beamsize) list of objects with metadata about each generated token.

class parlai.core.torch_generator_agent.TopKSampling(k, *args, **kwargs)[source]

Bases: TreeSearch

Top-K sampling (Fan et al., 2018).

Samples from a truncated distribution where only the most probable K words are considered at each step.

Typical values of k are 2, 10, 50.

See https://arxiv.org/abs/1805.04833 for details.
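
A single step of top-k filtering can be sketched as follows (k comes from the constructor; logprobs is as in select_paths):

import torch

values, indices = logprobs.topk(k, dim=-1)           # keep the k best tokens
choices = torch.multinomial(values.exp(), 1)         # sample within the truncation
token_ids = indices.gather(-1, choices).squeeze(-1)  # map back to vocabulary ids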

__init__(k, *args, **kwargs)[source]

Instantiate Beam object.

Parameters
  • beam_size – number of hypotheses in the beam

  • block_ngram – size of ngrams to block.

  • context_block_ngram – size of context ngrams to block

  • padding_token – padding token ID

  • bos_token – beginning of sentence token ID

  • eos_token – end of sentence token ID

  • min_length – minimum length of the predicted sequence

  • device – What device to use for computations

  • dict – dictionary, if necessary

select_paths(logprobs, prior_scores, current_length) _PathSelection[source]

Select the next vocabulary item in these beams.

Parameters
  • logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.

  • prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.

  • current_length – the current length in tokens

Returns

a {hypothesis_ids, token_ids, scores, token_details}, where:

  • hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.

  • token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.

  • scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.

  • token_details is a (beamsize) list of objects with metadata about each generated token.

class parlai.core.torch_generator_agent.NucleusSampling(p, *args, **kwargs)[source]

Bases: TreeSearch

Nucleus, aka top-p sampling (Holtzman et al., 2019).

Samples from a truncated distribution which covers a fixed CDF proportion of the original distribution.

Typical values of p are 0.3 and 0.9.

See https://arxiv.org/abs/1904.09751 for details.
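
A single step of nucleus filtering can be sketched as follows (p comes from the constructor; probs is a (beamsize x vocab) tensor of probabilities):

import torch

sprobs, sinds = probs.sort(dim=-1, descending=True)
# zero out every token after cumulative mass reaches p
mask = (sprobs.cumsum(dim=-1) - sprobs) >= p
sprobs = sprobs.masked_fill(mask, 0.0)
choices = torch.multinomial(sprobs, 1)
token_ids = sinds.gather(-1, choices).squeeze(-1)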

__init__(p, *args, **kwargs)[source]

Instantiate Beam object.

Parameters
  • beam_size – number of hypotheses in the beam

  • block_ngram – size of ngrams to block.

  • context_block_ngram – size of context ngrams to block

  • padding_token – padding token ID

  • bos_token – beginning of sentence token ID

  • eos_token – end of sentence token ID

  • min_length – minimum length of the predicted sequence

  • device – What device to use for computations

  • dict – dictionary, if necessary

get_mask(sorted_probs: Tensor) Tensor[source]

Get probability mask.

Parameters

sorted_probs – sorted probabilities

Returns

a mask that masks out tokens below the p value when sampling.

select_paths(logprobs, prior_scores, current_length) _PathSelection[source]

Select the next vocabulary item in these beams.

Parameters
  • logprobs – a (beamsize x vocab) tensor of log probabilities. If this is the first turn in the dialogue, it will be a (1 x vocab) tensor.

  • prior_scores – a (beamsize) tensor of weights with the cumulative running log-probability of each beam. If the first turn, it will be a (1) tensor.

  • current_length – the current length in tokens

Returns

a {hypothesis_ids, token_ids, scores, token_details}, where:

  • hypothesis_ids is a LongTensor of hypotheses we’re extending. May have repeats, but should always be (beamsize) long.

  • token_ids is a (beamsize) LongTensor of next-token choices for each of the hypotheses.

  • scores is a (beamsize) Tensor with the updated cumulative log-probs of each beam.

  • token_details is a (beamsize) list of objects with metadata about each generated token.

class parlai.core.torch_generator_agent.FactualNucleusSampling(p, lambda_decay, omega_bound, p_reset, beam_size, *args, **kwargs)[source]

Bases: NucleusSampling

Factual Nucleus Sampling.

See https://arxiv.org/pdf/2206.04624.pdf for more information

__init__(p, lambda_decay, omega_bound, p_reset, beam_size, *args, **kwargs)[source]

Instantiate Beam object.

Parameters
  • beam_size – number of hypotheses in the beam

  • block_ngram – size of ngrams to block.

  • context_block_ngram – size of context ngrams to block

  • padding_token – padding token ID

  • bos_token – beginning of sentence token ID

  • eos_token – end of sentence token ID

  • min_length – minimum length of the predicted sequence

  • device – What device to use for computations

  • dict – dictionary, if necessary

update_p(tokens: Tensor)[source]

Update the sampling p value according to the tokens generated.

When tokens are not punctuation, p is decayed by the lambda_decay factor; otherwise, the p value is reset.

Parameters

tokens – sampled tokens.

get_mask(sorted_probs: Tensor) Tensor[source]

Get probability mask.

Parameters

sorted_probs – sorted probabilities

Returns

a mask that masks out tokens below the p value when sampling.

Torch Ranker Agent

Torch Ranker Agents provide functionality for building ranking models.

See the TorchRankerAgent tutorial for examples.

class parlai.core.torch_ranker_agent.TorchRankerAgent(opt: Opt, shared=None)[source]

Bases: TorchAgent

Abstract TorchRankerAgent class; only meant to be extended.

TorchRankerAgents aim to provide convenient functionality for building ranking models. This includes:

  • Training/evaluating on candidates from a variety of sources.

  • Computing hits@1, hits@5, mean reciprocal rank (MRR), and other metrics.

  • Caching representations for fast runtime when deploying models to production.

classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser[source]

Add CLI args.

__init__(opt: Opt, shared=None)[source]

Initialize agent.

build_criterion()[source]

Construct and return the loss function.

By default torch.nn.CrossEntropyLoss.

set_interactive_mode(mode, shared=False)[source]

Set interactive mode defaults.

In interactive mode, we set ignore_bad_candidates to True. Additionally, we change the eval_candidates to the option specified in --interactive-candidates, which defaults to False.

Interactive mode may also change the fixed candidates path if it does not exist, automatically creating a candidates file from the specified task.

abstract score_candidates(batch, cand_vecs, cand_encs=None)[source]

Given a batch and candidate set, return scores (for ranking).

Parameters
  • batch (Batch) – a Batch object (defined in torch_agent.py)

  • cand_vecs (LongTensor) – padded and tokenized candidates

  • cand_encs (FloatTensor) – encoded candidates. If these are passed into the function (in cases where we cache the candidate encodings), you do not need to call self.model on cand_vecs.
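
A sketch for a bi-encoder style model; encode_context and encode_candidates are assumed methods of your own model, not part of TorchRankerAgent:

def score_candidates(self, batch, cand_vecs, cand_encs=None):
    context_h = self.model.encode_context(batch.text_vec)    # (bsz, dim)
    if cand_encs is None:
        cand_encs = self.model.encode_candidates(cand_vecs)  # (num_cands, dim)
    # dot-product scores over the candidate set: (bsz, num_cands)
    return context_h.mm(cand_encs.t())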

is_valid(obs)[source]

Override from TorchAgent.

Check to see if label candidates contain the label.

train_step(batch)[source]

Train on a single batch of examples.

eval_step(batch)[source]

Evaluate a single batch of examples.

block_repeats(cand_preds)[source]

Heuristic to block a model repeating a line from the history.

share()[source]

Share model parameters.

set_vocab_candidates(shared)[source]

Load the tokens from the vocab as candidates.

self.vocab_candidates will contain a [num_cands] list of strings. self.vocab_candidate_vecs will contain a [num_cands, 1] LongTensor.

set_fixed_candidates(shared)[source]

Load a set of fixed candidates and their vectors (or vectorize them here).

self.fixed_candidates will contain a [num_cands] list of strings. self.fixed_candidate_vecs will contain a [num_cands, seq_len] LongTensor.

See the note on the --fixed-candidate-vecs flag for an explanation of the ‘reuse’, ‘replace’, or path options.

Note: TorchRankerAgent by default converts candidates to vectors by vectorizing in the common sense (i.e., replacing each token with its index in the dictionary). If a child model wants to additionally perform encoding, it can overwrite the vectorize_fixed_candidates() method to produce encoded vectors instead of just vectorized ones.

load_candidates(path, cand_type='vectors')[source]

Load fixed candidates from a path.

encode_candidates(padded_cands)[source]

Convert the given candidates to vectors.

This is an abstract method that must be implemented by the user.

Parameters

padded_cands – The padded candidates.

vectorize_fixed_candidates(cands_batch, add_start=False, add_end=False)[source]

Convert a batch of candidates from text to vectors.

Parameters

cands_batch – a [batchsize] list of candidates (strings)

Returns

a [num_cands] list of candidate vectors

By default, candidates are simply vectorized (tokens replaced by token ids). A child class may choose to overwrite this method to perform vectorization as well as encoding if so desired.

Torch Classifier Agent

Torch Classifier Agents classify text into a fixed set of labels.

class parlai.core.torch_classifier_agent.ConfusionMatrixMetric(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0)[source]

Bases: Metric

Class that keeps count of the confusion matrix for classification.

Also provides helper methods to compute precision, recall, f1, and weighted_f1 for classification.

property macro_average: bool

Indicates whether this metric should be macro-averaged when globally reported.

__init__(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0) None[source]
class parlai.core.torch_classifier_agent.PrecisionMetric(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0)[source]

Bases: ConfusionMatrixMetric

Class that takes in a ConfusionMatrixMetric and computes precision for the classifier.

value() float[source]

Return the value of the metric as a float.

class parlai.core.torch_classifier_agent.RecallMetric(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0)[source]

Bases: ConfusionMatrixMetric

Class that takes in a ConfusionMatrixMetric and computes recall for the classifier.

value() float[source]

Return the value of the metric as a float.

class parlai.core.torch_classifier_agent.ClassificationF1Metric(true_positives: Union[int, float, Tensor] = 0, true_negatives: Union[int, float, Tensor] = 0, false_positives: Union[int, float, Tensor] = 0, false_negatives: Union[int, float, Tensor] = 0)[source]

Bases: ConfusionMatrixMetric

Class that takes in a ConfusionMatrixMetric and computes f1 for the classifier.

value() float[source]

Return the value of the metric as a float.

class parlai.core.torch_classifier_agent.AUCMetrics(class_name: Union[int, str], max_bucket_dec_places: int = 3, pos_dict: Optional[Counter[float]] = None, neg_dict: Optional[Counter[float]] = None)[source]

Bases: Metric

Computes Area Under ROC Curve (AUC) metrics.

Does so by keeping track of positives’ and negatives’ probability score counts in Counters or dictionaries. Note the introduction of max_bucket_dec_places; this integer determines the number of decimal places to keep for the probability scores. A higher max_bucket_dec_places will give a more accurate estimate of the exact AUC metric, but may also use more space.

NOTE: currently only used for classifiers in the eval_model script; to use, add the argument -auc <max_bucket_dec_places> when calling the eval_model script.

property macro_average: bool

Indicates whether this metric should be macro-averaged when globally reported.

__init__(class_name: Union[int, str], max_bucket_dec_places: int = 3, pos_dict: Optional[Counter[float]] = None, neg_dict: Optional[Counter[float]] = None)[source]
update_raw(true_labels: List[Union[int, str]], pos_probs: List[float], class_name)[source]

Given the true/gold labels and the probabilities of the positive class, update our bucket dictionaries of positives and negatives (based on the class_name); max_bucket_dec_places is also used here to round the probabilities.

value() float[source]

Return the value of the metric as a float.

class parlai.core.torch_classifier_agent.WeightedF1Metric(metrics: Dict[str, ClassificationF1Metric])[source]

Bases: Metric

Class that represents the weighted f1 from ClassificationF1Metric.

property macro_average: bool

Indicates whether this metric should be macro-averaged when globally reported.

__init__(metrics: Dict[str, ClassificationF1Metric]) None[source]
value() float[source]

Return the value of the metric as a float.

class parlai.core.torch_classifier_agent.TorchClassifierAgent(opt: Opt, shared=None)[source]

Bases: TorchAgent

Abstract Classifier agent. Only meant to be extended.

TorchClassifierAgent aims to handle much of the bookkeeping for any classification model.

classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser[source]

Add CLI args.

__init__(opt: Opt, shared=None)[source]

Initialize agent.

share()[source]

Share model parameters.

train_step(batch)[source]

Train on a single batch of examples.

eval_step(batch)[source]

Evaluate a single batch of examples.

score(batch)[source]

Given a batch and labels, returns the scores.

Parameters

batch – a Batch object (defined in torch_agent.py)

Returns

a [bsz, num_classes] FloatTensor containing the score of each class.

Torch Image Agent

Subclass of TorchAgent used for handling image features.

class parlai.core.torch_image_agent.TorchImageAgent(opt, shared=None)[source]

Bases: TorchAgent

Subclass of TorchAgent that allows for encoding image features.

Provides flags and utility methods.

classmethod add_cmdline_args(parser: ParlaiParser, partial_opt: Optional[Opt] = None) ParlaiParser[source]

Add command-line arguments specifically for this agent.

__init__(opt, shared=None)[source]

Initialize agent.

batchify(obs_batch: List[Message], sort: bool = False) Batch[source]

Override to handle image features.

abstract batchify_image_features(batch: Batch) Batch[source]

Put this batch of images into the correct format for this agent.

self._process_image_features() will likely be useful for this.