core.dialog_teacher

class parlai.core.dialog_teacher.DialogTeacher(opt, shared=None)

A base teacher class for doing dialog with fixed chat logs.

This class provides a set of basic functionality:

  • uses data class to store and query text data
  • generates action tables to send to the student agent from the data
  • tracks metrics: the count of sent vs. correctly answered queries

If you have opt.numthreads > 1, this also activates a shared memory array for the data and lock-protected shared-memory metrics.

In order to subclass this class, you must implement setup_data() in your class (or subclass another class which does, like FbDialogTeacher), which reads your data file as an iterator.
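As a rough illustration of what "reads your data file as an iterator" can look like, here is a minimal sketch of a setup_data-style generator. It is written as a standalone function so it runs without ParlAI; in a real subclass it would be a method on your teacher. The file format (one tab-separated query and label per line, blank line between episodes) is a hypothetical one chosen for the example, not something this module prescribes.

```python
import os
import tempfile


def setup_data(path):
    """Yield ((query, labels), new_episode) pairs from a text file.

    Hypothetical format: one "query<TAB>label" pair per line, with a
    blank line separating episodes.
    """
    new_episode = True
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                # A blank line means the next example starts a new episode.
                new_episode = True
                continue
            query, label = line.split("\t")
            yield (query, [label]), new_episode
            new_episode = False


# Write a tiny two-episode file and read it back.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("hello\thi\nhow are you?\tfine\n\nbye\tgoodbye\n")
examples = list(setup_data(tmp.name))
os.remove(tmp.name)
```

Writing the reader as a generator means the file is consumed lazily, one example at a time, which matches the iterator contract described above.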

label_candidates()

Returns None by default, but override this in children (such as FbDialogTeacher) to load up candidate labels for every example.

observe(observation)

Process observation for metrics.

act()

Send new dialog message.

class parlai.core.dialog_teacher.DialogData(opt, data_loader, cands=None, shared=None)

Provides a data structure for accessing textual dialog data. This can be used whenever the dialog data is a fixed log of chats (i.e., not a simulator setting). The logs can include dialog text and, optionally, supervised labels, candidate labels, and rewards.

All of these are stored in an internal data format that is used by the DialogTeacher class.

data_loader is an iterable, with each item having the form:

(x, ...), new_episode?

Where

  • x is a query and possibly context

... can contain additional fields, specifically

  • y is an iterable of label(s) for that query
  • r is the str reward for getting that query correct
  • c is an iterable of label candidates that the student can choose from
  • i is a str path to an image on disk, which will be loaded by the data class at request-time. It should always point to the raw image file.
  • new_episode? is a boolean value specifying whether that example is the start of a new episode. If you don’t use episodes, set this to True every time.
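The tuple layout above can be sketched with a small in-memory loader. The positional order x, y, r, c, i follows the list; the use of None to skip a field and the omission of trailing fields are assumptions made for this sketch, and name_fields is a hypothetical helper, not part of ParlAI.

```python
# Field letters in the positional order used in the description above.
FIELDS = ("x", "y", "r", "c", "i")


def toy_loader():
    """Yield ((x, y, r, c, i), new_episode) items for two examples."""
    # Full entry: query, labels, reward, candidates, no image (assumed None).
    yield ("What color is the sky?", ["blue"], "1", ["blue", "green"], None), True
    # Shorter entry: only query and labels (trailing fields assumed optional).
    yield ("And grass?", ["green"]), False


def name_fields(entry):
    """Map a positional entry to a dict keyed by field letter, dropping None."""
    return {name: value for name, value in zip(FIELDS, entry) if value is not None}


named = [(name_fields(entry), new_episode) for entry, new_episode in toy_loader()]
```

Here both examples belong to the same episode, because only the first one is flagged with new_episode set to True.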

cands can be set to provide a list of candidate labels for every example in this dataset, which the agent can choose from (the correct answer should be in this set).

random tells the data class whether or not to visit episodes sequentially or randomly when returning examples to the caller.

__len__()

Returns total number of entries available. Each episode has at least one entry, but might have many more.

num_episodes()

Returns the number of episodes in the dataset.

get(episode_idx, entry_idx=0)

Returns a specific entry from the dataset.
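To make the relationship between entries and episodes concrete, the following toy stand-in groups a data_loader stream into episodes and exposes the three accessors described above. ToyDialogData is a hypothetical class for illustration only; the real DialogData may store and return entries in a different form.

```python
class ToyDialogData:
    """Toy episode store; a sketch, not the ParlAI implementation."""

    def __init__(self, data_loader):
        self.data = []  # list of episodes, each a list of entries
        for entry, new_episode in data_loader:
            # Start a new episode when flagged (or for the very first entry).
            if new_episode or not self.data:
                self.data.append([])
            self.data[-1].append(entry)

    def __len__(self):
        # Total number of entries across all episodes.
        return sum(len(episode) for episode in self.data)

    def num_episodes(self):
        return len(self.data)

    def get(self, episode_idx, entry_idx=0):
        # This toy version simply returns the stored tuple.
        return self.data[episode_idx][entry_idx]


stream = [
    (("hello", ["hi"]), True),
    (("how are you?", ["fine"]), False),
    (("bye", ["goodbye"]), True),
]
data = ToyDialogData(stream)
total_entries = len(data)
episode_count = data.num_episodes()
second_entry = data.get(0, 1)
first_of_second_episode = data.get(1)
```

With this stream, the first episode holds two entries and the second holds one, so the entry count is larger than the episode count.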