Authors: Alexander Holden Miller, Margaret Li
As an alternative to this quick start tutorial, you may also consider our Google Colab tutorial, which takes you through fine-tuning the small version of BlenderBot (90M).
First, make sure you have Python 3. Now open a terminal and run the following.
Clone ParlAI Repository:
git clone https://github.com/facebookresearch/ParlAI.git ~/ParlAI
cd ~/ParlAI; python setup.py develop
This will add the parlai command to your system.
Several models have additional requirements, such as PyTorch.
View a task & train a model
Let’s start by printing out the first few examples of the bAbI tasks, task 1.
# display examples from bAbI 10k task 1
parlai display_data --task babi:task10k:1
Now let’s try to train a model on it (even on your laptop, this should train fast).
# train MemNN using batch size 1 and for 5 epochs
parlai train_model --task babi:task10k:1 --model-file /tmp/babi_memnn --batchsize 1 --num-epochs 5 --model memnn --no-cuda
Let’s print some of its predictions to make sure it’s working.
# display predictions for model saved at specified file on bAbI task 1
parlai display_model --task babi:task10k:1 --model-file /tmp/babi_memnn --eval-candidates vocab
The “eval_labels” and “MemNN” lines should (usually) match!
Let’s try asking the model a question ourselves.
# interact with saved model
parlai interactive --model-file /tmp/babi_memnn --eval-candidates vocab
...
Enter your message: John went to the hallway.\n Where is John?
Hopefully the model gets this right!
Train a Transformer on Twitter
Now let’s try training a Transformer (Vaswani et al., 2017) ranker model. Make sure to complete this section on a GPU with PyTorch installed.
We’ll be training on the Twitter task, which is a dataset of tweets and
replies. There’s more information on tasks in these docs, including a
full list of tasks and instructions on specifying arguments for training
and evaluation (like the --task <task> argument used here, which can
also be written -t <task>).
Let’s begin again by printing the first few examples.
# display first examples from twitter dataset
parlai display_data --task twitter
Now, we’ll train the model. This will take a while to reach convergence.
# train transformer ranker
parlai train_model --task twitter --model-file /tmp/tr_twitter --model transformer/ranker --batchsize 16 --validation-every-n-secs 3600 --candidates batch --eval-candidates batch --data-parallel True
You can modify some of the command-line arguments we use here: we set the batch size to 16, run validation every 3600 seconds, and take candidates from the batch for training and evaluation.
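To make "candidates from the batch" concrete, here is a minimal sketch in plain Python. It is not ParlAI's implementation: the helper names and the toy vectors are hypothetical, and a real ranker would produce the encodings with a trained Transformer. The idea it illustrates is that each context is scored against every label in the same batch, so the other examples' labels serve as negative candidates.

```python
# Illustrative sketch of in-batch candidate ranking.
# All names here are hypothetical; this is not ParlAI's actual code.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_batch_candidates(context_vecs, label_vecs):
    """For each context, return the index of the best-scoring label.

    Every label in the batch is a candidate for every context, so the
    correct answer for context i is label i and the rest act as negatives.
    """
    predictions = []
    for ctx in context_vecs:
        scores = [dot(ctx, cand) for cand in label_vecs]
        predictions.append(max(range(len(scores)), key=scores.__getitem__))
    return predictions


def batch_accuracy(predictions):
    """Fraction of contexts whose top-ranked candidate is their own label."""
    correct = sum(1 for i, p in enumerate(predictions) if p == i)
    return correct / len(predictions)


# Toy encodings for a batch of three (context, label) pairs.
contexts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.2, 1.0]]
labels = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.7]]

preds = rank_batch_candidates(contexts, labels)
print(preds, batch_accuracy(preds))
```

Under this scheme a larger batch gives each example more negative candidates, which is one reason the batch size matters for a ranker trained this way.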
By default, the train_model script saves the model whenever it achieves
its best validation result so far. The Twitter task is quite large, and
validation runs by default after each epoch (a full pass through the
training data), but we want to save our model more frequently, so we set
validation to run once an hour with --validation-every-n-secs 3600.
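The interaction between timed validation and best-model saving can be sketched as follows. This is an illustration under stated assumptions, not ParlAI's training loop: the BestModelTracker class, its method names, and the accuracy values are all hypothetical, and a fake clock stands in for real elapsed time so the example runs instantly.

```python
import time


class BestModelTracker:
    """Hypothetical helper: decides when to validate and whether to save."""

    def __init__(self, validation_every_n_secs, clock=time.monotonic):
        self.interval = validation_every_n_secs
        self.clock = clock
        self.last_validation = clock()
        self.best_metric = None

    def should_validate(self):
        """True once at least `interval` seconds have passed since the last run."""
        return self.clock() - self.last_validation >= self.interval

    def record_validation(self, metric):
        """Store the result; return True if it is the best so far (higher is better)."""
        self.last_validation = self.clock()
        if self.best_metric is None or metric > self.best_metric:
            self.best_metric = metric
            return True
        return False


# Simulate training with a fake clock (seconds) and made-up accuracies.
fake_now = [0.0]
tracker = BestModelTracker(3600, clock=lambda: fake_now[0])
saved = []
for elapsed, accuracy in [(1800, 0.10), (3600, 0.30), (7200, 0.25), (10800, 0.40)]:
    fake_now[0] = elapsed
    if tracker.should_validate() and tracker.record_validation(accuracy):
        saved.append((elapsed, accuracy))  # a real loop would write the model file here

print(saved)
```

In this simulation no validation happens at the 1800-second mark, and the run at 7200 seconds does not trigger a save because its accuracy is below the best seen so far.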
The train_model script evaluates the model on the valid and test sets at the end of training. If we wanted to evaluate a saved model, perhaps to compare the results of our newly trained Transformer against the BlenderBot 90M baseline from our Model Zoo, we could do the following:
# Evaluate the tiny BlenderBot model on twitter data
parlai eval_model --task twitter --model-file zoo:blender/blender_90M/model
Finally, let’s print some of our transformer’s predictions with the same display_model script from above.
# display predictions for model saved at specific file on twitter
parlai display_model --task twitter --model-file /tmp/tr_twitter --eval-candidates batch
Add a simple model
Let’s put together a super simple model which will print the parsed version of what is said to it.
First let’s set it up.
mkdir parlai/agents/parrot
touch parlai/agents/parrot/parrot.py
We’ll inherit the TorchAgent parsing code so we don’t have to write it ourselves. Open parrot.py and copy the following:
from parlai.core.torch_agent import TorchAgent, Output


class ParrotAgent(TorchAgent):
    def train_step(self, batch):
        # The parrot has nothing to learn, so training is a no-op.
        pass

    def eval_step(self, batch):
        # for each row in batch, convert tensor back to text strings
        return Output([self.dict.vec2txt(row) for row in batch.text_vec])

    def build_model(self, batch):
        # Our agent doesn't have a real model, so we will return a placeholder
        # here.
        return None
Now let’s test it out:
parlai display_model --task babi:task10k:1 --model parrot
You’ll notice the model always outputs the “unknown” token. The dictionary falls back to this token because it doesn’t recognize any tokens: we haven’t built a dictionary yet. Let’s do that now.
parlai build_dict --task babi:task10k:1 --dict-file /tmp/parrot.dict
Now let’s try our Parrot agent again.
parlai display_model --task babi:task10k:1 --model parrot --dict-file /tmp/parrot.dict
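The unknown-token behavior described above can be reproduced with a toy dictionary. This is a minimal sketch, not ParlAI's dictionary class: TinyDict, its methods, and the "__unk__" string are stand-ins chosen for illustration. It shows why an empty dictionary maps every token to the unknown token, and why the text round-trips once the dictionary has been built.

```python
class TinyDict:
    """Hypothetical whitespace-tokenizing dictionary with an unknown token."""

    UNK = "__unk__"  # stand-in name for the "unknown" token

    def __init__(self):
        # Index 0 is reserved for the unknown token.
        self.tok2ind = {self.UNK: 0}
        self.ind2tok = [self.UNK]

    def add_text(self, text):
        """Add every new whitespace-separated token to the vocabulary."""
        for tok in text.split():
            if tok not in self.tok2ind:
                self.tok2ind[tok] = len(self.ind2tok)
                self.ind2tok.append(tok)

    def txt2vec(self, text):
        """Map tokens to indices; unseen tokens fall back to index 0."""
        return [self.tok2ind.get(tok, 0) for tok in text.split()]

    def vec2txt(self, vec):
        """Map indices back to a space-joined string."""
        return " ".join(self.ind2tok[i] for i in vec)


sentence = "John went to the hallway ."

# Before building: nothing is recognized, so everything becomes the unknown token.
empty = TinyDict()
print(empty.vec2txt(empty.txt2vec(sentence)))

# After building: the same sentence survives the round trip.
built = TinyDict()
built.add_text(sentence)
print(built.vec2txt(built.txt2vec(sentence)))
```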
This ParrotAgent implements
eval_step, one of two abstract functions
in TorchAgent. The other is
train_step. You can easily and quickly
build a model agent by creating a class which implements only these two
functions with the most typical custom code for a model, and inheriting
vectorization and batching from TorchAgent.
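The division of labor described above, where the base class owns vectorization and batching and the subclass supplies only two step functions, can be sketched without ParlAI installed. Everything below is hypothetical: SketchAgent and EchoAgent are illustrative names, and the real TorchAgent does considerably more than this stand-in.

```python
from abc import ABC, abstractmethod


class SketchAgent(ABC):
    """Hypothetical base class that owns vectorization and batching."""

    def __init__(self, vocab):
        self.ind2tok = list(vocab)
        self.tok2ind = {tok: i for i, tok in enumerate(self.ind2tok)}

    def vectorize(self, text):
        """Turn one string into a list of token indices."""
        return [self.tok2ind[tok] for tok in text.split()]

    def batchify(self, texts):
        """Turn a list of strings into a list of index vectors."""
        return [self.vectorize(t) for t in texts]

    def act(self, texts, training=False):
        """Build the batch, then dispatch to the subclass's step function."""
        batch = self.batchify(texts)
        return self.train_step(batch) if training else self.eval_step(batch)

    @abstractmethod
    def train_step(self, batch):
        """Update the model on one batch."""

    @abstractmethod
    def eval_step(self, batch):
        """Produce predictions for one batch."""


class EchoAgent(SketchAgent):
    """Subclass that only fills in the two abstract step functions."""

    def train_step(self, batch):
        return None  # nothing to learn

    def eval_step(self, batch):
        # Convert each row of indices back into text.
        return [" ".join(self.ind2tok[i] for i in row) for row in batch]


agent = EchoAgent(["where", "is", "john", "?"])
print(agent.act(["where is john ?", "john ?"]))
```

Because both step functions are declared abstract, Python refuses to instantiate a subclass that omits either one, which catches an incomplete agent early.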
As needed, you can also override any functions to change the default argument values or to override the behavior with your own. For example, you could change the vectorizer to return numpy arrays instead of Torch Tensors.
To see more details about ParlAI’s general structure, how tasks and models are set up, or how to use Mechanical Turk, Messenger, Tensorboard, and more, check out the other tutorials.