### AmazonQA

This dataset contains Question and Answer data from Amazon, totaling around 1.4 million answered questions.

### AQuA

Dataset containing algebraic word problems with rationales for their answers.

### bAbI 1k

20 synthetic tasks that each test a unique aspect of text and reasoning, and hence test different capabilities of learning models.

Notes

### bAbI 10k

20 synthetic tasks that each test a unique aspect of text and reasoning, and hence test different capabilities of learning models.

Notes

CoQA is a large-scale dataset for building Conversational Question Answering systems. The goal of the CoQA challenge is to measure the ability of machines to understand a text passage and answer a series of interconnected questions that appear in a conversation. CoQA is pronounced as coca.

### HotpotQA

HotpotQA is a dataset for multi-hop question answering. The overall setting is that given some context paragraphs (e.g., a few paragraphs, or the entire Web) and a question, a QA system answers the question by extracting a span of text from the context. It is necessary to perform multi-hop reasoning to correctly answer the question.

### NarrativeQA

Notes

You can access the summaries-only task for NarrativeQA by using the task ‘narrative_qa:summaries’. By default, only stories are provided.

### Natural Questions

An open-domain question answering dataset. Each example contains a real question that people searched for on Google and the content of a Wikipedia article that was among the top 5 search results for that query, along with its annotations. The annotations can include a long answer selected from spans of major content entities in the Wikipedia article (e.g., paragraphs, tables), a short answer selected from one or more short spans of words in the article, or ‘yes/no’. Whether any of these answer formats is present depends on whether the main question can be answered given the article; if not, they are left empty.

Notes

Since this task uses ChunkTeacher, it should be used with streaming.

Question Answering in Context is a dataset for modeling, understanding, and participating in information seeking dialog. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text. QuAC introduces challenges not found in existing machine comprehension datasets: its questions are often more open-ended, unanswerable, or only meaningful within the dialog context.

### Simple Questions

Open-domain QA dataset based on Freebase triples.

Open-domain QA dataset answerable from a given paragraph from Wikipedia.

Open-domain QA dataset answerable from a given paragraph from Wikipedia.

### TriviaQA

Open-domain QA dataset with question-answer-evidence triples.

### Web Questions

Open-domain QA dataset from Web queries.

### WikiQA

Open-domain QA dataset from Wikipedia.

### InsuranceQA

Task which requires agents to identify high-quality answers composed by professionals with deep domain knowledge.

### MS_MARCO

A large-scale Machine Reading Comprehension Dataset with questions sampled from real anonymized user queries and contexts from web documents.

### QAngaroo

Multi-hop reading comprehension. Includes two datasets: WIKIHOP, built on Wikipedia, and MEDHOP, built on paper abstracts from PubMed.

### ELI5

This dataset contains Question and Answer data from Reddit explainlikeimfive posts and comments.

### DREAM

A multiple-choice answering dataset based on multi-turn, multi-party dialogue.

### C3

A multiple-choice answering dataset in Chinese based on a prior passage.

### CommonSenseQA

CommonSenseQA is a multiple-choice Q-A dataset that relies on commonsense knowledge to predict correct answers.

### BookTest

Sentence completion given a few sentences as context from a book. A larger version of CBT.

### Children’s Book Test (CBT)

Sentence completion given a few sentences as context from a children’s book.

### QA CNN

Cloze dataset based on a missing (anonymized) entity phrase from a CNN article.

### QA Daily Mail

Cloze dataset based on a missing (anonymized) entity phrase from a Daily Mail article.

### Coached Conversational Preference Elicitation

A dataset consisting of 502 dialogs with 12,000 annotated utterances between a user and an assistant discussing movie preferences in natural language. It was collected using a Wizard-of-Oz methodology between two paid crowd-workers, where one worker plays the role of an ‘assistant’, while the other plays the role of a ‘user’.

### Dialog Based Language Learning: bAbI Task

Short dialogs based on the bAbI tasks, but in the form of a question from a teacher, the answer from the student, and finally a comment on the answer from the teacher. The aim is to find learning models that use the comments to improve.

Notes

Tasks can be accessed with a format like ‘parlai display_data -t dbll_babi:task:2_p0.5’, which specifies task 2 and a policy with 0.5 of the answers correct. See the paper for more details of the tasks.
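The task string in the note above packs several fields into one colon-separated value. As a rough illustration of how it breaks down, the sketch below splits it with plain Python. `parse_dbll_task` and `DbllTask` are hypothetical names written for this example, not part of the library, and the reading of the last field as `<task number>_p<fraction correct>` is an assumption inferred from the single example in the note.

```python
from typing import NamedTuple


class DbllTask(NamedTuple):
    dataset: str             # e.g. "dbll_babi"
    task_number: int         # e.g. 2
    fraction_correct: float  # e.g. 0.5


def parse_dbll_task(task: str) -> DbllTask:
    """Split a string such as 'dbll_babi:task:2_p0.5' into its parts.

    Hypothetical helper for illustration only; the format is assumed
    from the one example given in the note.
    """
    dataset, keyword, spec = task.split(":")
    if keyword != "task":
        raise ValueError(f"expected 'task' as the middle field, got {keyword!r}")
    number, _, policy = spec.partition("_p")
    return DbllTask(dataset, int(number), float(policy))


parsed = parse_dbll_task("dbll_babi:task:2_p0.5")
print(parsed.task_number, parsed.fraction_correct)
```

Keeping the parsed fields in a named tuple makes it easy to loop over several task numbers or policies and rebuild the corresponding strings.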

### Dialog Based Language Learning: WikiMovies Task

Short dialogs based on WikiMovies, but in the form of a question from a teacher, the answer from the student, and finally a comment on the answer from the teacher. The aim is to find learning models that use the comments to improve.

### Dialog bAbI

Simulated dialogs of restaurant booking.

### Dialog bAbI+

bAbI+ is an extension of the bAbI Task 1 dialogues with everyday incremental dialogue phenomena (hesitations, restarts, and corrections) which model the disfluencies and communication problems in everyday spoken interaction in real-world environments.

### MutualFriends

Task where two agents must discover which friend of theirs is mutual based on the friends’ attributes.

### Movie Dialog QA Recommendations

Dialogs discussing questions about movies as well as recommendations.

### Personalized Dialog Full Set

Simulated dataset of restaurant booking focused on personalization based on user profiles.

### Personalized Dialog Small Set

Simulated dataset of restaurant booking focused on personalization based on user profiles.

Dataset of synthetic shapes described by attributes, for agents to play a cooperative QA game.

### SCAN

SCAN is a set of simple language-driven navigation tasks for studying compositional learning and zero-shot generalization. The SCAN tasks were inspired by the CommAI environment, which is the origin of the acronym (Simplified versions of the CommAI Navigation tasks).

### MultiWOZ 2.0

A fully labeled collection of human-written conversations spanning over multiple domains and topics.

### MultiWOZ 2.1

A fully labeled collection of human-written conversations spanning over multiple domains and topics.

### OneCommon

A collaborative referring task which requires advanced skills of common grounding under continuous and partially-observable context. This code also includes reference-resolution annotation.

### AirDialogue

Task for goal-oriented dialogue using airplane booking conversations between agents and customers.

### ReDial

Annotated dataset of dialogues where users recommend movies to each other.

The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between a human and a virtual assistant.

The second version of TaskMaster, containing Wizard-of-Oz dialogues for task-oriented dialogue in 7 domains.

### Blended Skill Talk

A dataset of 7k conversations explicitly designed to exhibit multiple conversation modes: displaying personality, having empathy, and demonstrating knowledge.

### Cornell Movie

Fictional conversations extracted from raw movie scripts.

### Dialogue NLI

Dialogue NLI is a dataset that addresses the issue of consistency in dialogue models.

### DSTC7 subtrack 1 - ubuntu

DSTC7 is a competition which provided a dataset of dialogs very similar to the ubuntu dataset. In particular, subtrack 1 consists of predicting the next utterance.

### Movie Dialog Reddit

Dialogs discussing Movies from Reddit (the Movies SubReddit).

### Open Subtitles

Dataset of dialogs from movie scripts.

### Ubuntu

Dialogs between an Ubuntu user and an expert trying to fix an issue. We use the V2 version, which cleaned the data to some extent.

### ConvAI2

A chit-chat dataset based on PersonaChat for a NIPS 2018 competition.

### ConvAI_ChitChat

Human-bot dialogues containing free discussions of randomly chosen paragraphs from SQuAD.

### Persona-Chat

A chit-chat dataset where paired Turkers are given assigned personas and chat to try to get to know each other.

A chit-chat dataset by GoogleAI providing high-quality goal-oriented conversations. The dataset hopes to provoke interest in written vs. spoken language. Both datasets consist of two-person dialogs:

• Spoken: Created using Wizard of Oz methodology.

• Written: Created by crowdsourced workers who were asked to write the full conversation themselves, playing the roles of both the user and the assistant.

Twitter data found on GitHub. No train/valid/test split was provided, so 10k for valid and 10k for test were chosen at random.

### ConvAI2_wild_evaluation

Dataset collected during the wild evaluation of ConvAI2 participants’ bots. 60% train, 20% valid and 20% test are chosen at random from the whole dataset.
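A random 60/20/20 split like the one described above can be sketched in a few lines of standard-library Python. This is only an illustration of the idea, not the code used to build the dataset: the function name `random_split`, the fixed seed, and the choice to give any rounding remainder to the test set are all assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T], seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items and cut them into 60% train, 20% valid, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = len(shuffled) * 6 // 10
    n_valid = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to test
    return train, valid, test


train, valid, test = random_split(range(100))
print(len(train), len(valid), len(test))
```

Fixing the seed matters when a dataset ships without official splits, since otherwise results from different runs would not be comparable.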

### Image_Chat

202k dialogues and 401k utterances over 202k images from the YFCC100m dataset using 215 possible personality traits.

Notes

If you have already downloaded the images, please specify their location with the --yfcc-path flag, as the image download script takes a very long time to run.

### Image_Chat_Generation

Image Chat task to train generative models.

### Wizard_of_Wikipedia

A dataset with conversations directly grounded with knowledge retrieved from Wikipedia. Contains 201k utterances from 22k dialogues spanning over 1300 diverse topics, split into train, test, and valid sets. The test and valid sets are split into two sets each: one with overlapping topics with the train set, and one with unseen topics.

Notes

To access the different valid/test splits (unseen/seen), specify the corresponding split (random_split for seen, topic_split for unseen) after the last colon in the task. E.g. wizard_of_wikipedia:WizardDialogKnowledgeTeacher:random_split
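The mapping between seen/unseen and the split names in the note above is easy to get backwards, so a small helper can make it explicit. The sketch below only builds the task string; `wow_task` and `SPLIT_SUFFIX` are hypothetical names introduced for this example, and the default teacher name is the one shown in the note.

```python
# Split names as given in the note: random_split = seen, topic_split = unseen.
SPLIT_SUFFIX = {"seen": "random_split", "unseen": "topic_split"}


def wow_task(split: str, teacher: str = "WizardDialogKnowledgeTeacher") -> str:
    """Build a Wizard of Wikipedia task string for the seen or unseen split."""
    if split not in SPLIT_SUFFIX:
        raise ValueError(f"split must be 'seen' or 'unseen', got {split!r}")
    return f"wizard_of_wikipedia:{teacher}:{SPLIT_SUFFIX[split]}"


print(wow_task("seen"))
print(wow_task("unseen"))
```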

### Wizard_of_Wikipedia_Generator

Wizard of Wikipedia task to train generative models.

### Daily Dialog

A dataset of chitchat dialogues with strong annotations for topic, emotion and utterance act. This version contains both sides of every conversation, and uses the official train/valid/test splits from the original authors.

### Empathetic Dialogues

A dataset of 25k conversations grounded in emotional situations to facilitate training and evaluating dialogue systems. Dataset has been released under the CC BY-NC license.

Notes

EmpatheticDialoguesTeacher returns examples like so:

• [text]: context line (previous utterance by ‘speaker’)

• [labels]: label line (current utterance by ‘listener’)

• [situation]: a 1-3 sentence description of the situation that the conversation is

• [emotion]: one of 32 emotion words

Other optional fields:

• [prepend_ctx]: fasttext prediction on context line - or None

• [prepend_cand]: fasttext prediction on label line (candidate) - or None

• [deepmoji_ctx]: vector encoding from deepmoji penultimate layer - or None

• [deepmoji_cand]: vector encoding from deepmoji penultimate layer for label line (candidate) - or None
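To make the field list above concrete, the following sketch lays the fields out as a Python dictionary and checks that the required ones are present. It is a hedged illustration: the values are placeholders rather than real data, `missing_fields` is a hypothetical helper, and the exact value types returned by the teacher (for example, whether labels arrive as a string or a list) are not specified here and are left as an assumption.

```python
REQUIRED_FIELDS = ("text", "labels", "situation", "emotion")
OPTIONAL_FIELDS = ("prepend_ctx", "prepend_cand", "deepmoji_ctx", "deepmoji_cand")


def missing_fields(example: dict) -> list:
    """Return the required field names that are absent from an example."""
    return [name for name in REQUIRED_FIELDS if name not in example]


# Placeholder values only; real examples come from the teacher.
example = {
    "text": "<previous utterance by the speaker>",
    "labels": "<current utterance by the listener>",
    "situation": "<1-3 sentence situation description>",
    "emotion": "<one of 32 emotion words>",
    # Optional fields may be None, as the list above notes.
    **{name: None for name in OPTIONAL_FIELDS},
}

print(missing_fields(example))
print(missing_fields({"text": "hi"}))
```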

### Image Grounded Conversations

A dataset of (image, context, question, answer) tuples, comprised of eventful images taken from Bing, Flickr, and COCO.

### Holl-E

Sequence of utterances and responses with background knowledge about movies. From the Holl-E dataset.

### ReDial

Annotated dataset of dialogues where users recommend movies to each other.

### Style-Controlled Generation

Dialogue datasets (BlendedSkillTalk, ConvAI2, EmpatheticDialogues, and Wizard of Wikipedia) labeled with personalities taken from the Image-Chat dataset. Used for the style-controlled generation project.

### Deal or No Deal

End-to-end negotiation task which requires two agents to agree on how to divide a set of items, with each agent assigning different values to each item.

### FVQA

FVQA is a VQA dataset which requires, and supports, much deeper reasoning. We extend a conventional visual question answering dataset, which contains image-question-answer triplets, through additional image-question-answer-supporting fact tuples. The supporting fact is represented as a structural triplet, such as <Cat,CapableOf,ClimbingTrees>.
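The structural triplet mentioned above can be read as a (subject, relation, object) record. The sketch below parses the textual form shown in the description into a named tuple. The field names and the `parse_fact` helper are assumptions made for illustration; the dataset's own storage format may differ.

```python
from typing import NamedTuple


class SupportingFact(NamedTuple):
    subject: str
    relation: str
    obj: str


def parse_fact(text: str) -> SupportingFact:
    """Parse a fact written as '<Subject,Relation,Object>'."""
    inner = text.strip()
    if not (inner.startswith("<") and inner.endswith(">")):
        raise ValueError(f"expected '<a,b,c>', got {text!r}")
    parts = [part.strip() for part in inner[1:-1].split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected three comma-separated parts, got {len(parts)}")
    return SupportingFact(*parts)


fact = parse_fact("<Cat,CapableOf,ClimbingTrees>")
print(fact.subject, fact.relation, fact.obj)
```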

### VQAv2

Bigger, more balanced version of the original VQA dataset.

### VisDial

Task which requires agents to hold a meaningful dialog about visual content.

### MNIST_QA

Task which requires agents to identify which number they are seeing. From the MNIST dataset.

### CLEVR

A visual reasoning dataset that tests abilities such as attribute identification, counting, comparison, spatial relationships, and logical operations.

### nlvr

Cornell Natural Language Visual Reasoning (NLVR) is a language grounding dataset based on pairs of natural language statements grounded in synthetic images.

### Flickr30k

30k captioned images pulled from Flickr compiled by UIUC.

### COCO_Captions

COCO annotations derived from the 2015 COCO Caption Competition.

### Personality_Captions

200k images from the YFCC100m dataset with captions conditioned on one of 215 personalities.

Notes

If you have already downloaded the images, please specify their location with the --yfcc-path flag, as the image download script takes a very long time to run.

### Image_Chat

202k dialogues and 401k utterances over 202k images from the YFCC100m dataset using 215 possible personality traits.

Notes

If you have already downloaded the images, please specify their location with the --yfcc-path flag, as the image download script takes a very long time to run.

### Image_Chat_Generation

Image Chat task to train generative models.

### Image Grounded Conversations

A dataset of (image, context, question, answer) tuples, comprised of eventful images taken from Bing, Flickr, and COCO.

### MultiNLI

A dataset designed for use in the development and evaluation of machine learning models for sentence understanding. Each example contains a premise and a hypothesis. The model has to predict whether the premise and hypothesis entail, contradict, or are neutral to each other.
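The premise/hypothesis/label structure described above can be sketched as a small record type. This is an illustrative layout only: the class name `NliExample`, the label strings, and the example sentences are assumptions written for this sketch and are not taken from the dataset.

```python
from dataclasses import dataclass

# Assumed label names for the three relations described above.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass(frozen=True)
class NliExample:
    premise: str
    hypothesis: str
    label: str

    def __post_init__(self) -> None:
        # Reject anything outside the three-way label set.
        if self.label not in LABELS:
            raise ValueError(f"label must be one of {LABELS}, got {self.label!r}")


# Made-up sentences, for illustration only.
example = NliExample(
    premise="A man is playing a guitar on stage.",
    hypothesis="A person is performing music.",
    label="entailment",
)
print(example.label)
```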

### IWSLT14

2014 International Workshop on Spoken Language Translation task. Currently only includes en_de and de_en.

### ConvAI_ChitChat

Human-bot dialogues containing free discussions of randomly chosen paragraphs from SQuAD.

### SST Sentiment Analysis

Dataset containing sentiment trees of movie reviews. We use the modified binary sentence analysis subtask given by the DecaNLP paper here.

### CNN/DM Summarisation

Dataset collected from CNN and the Daily Mail with summaries as labels. Implemented as part of the DecaNLP task.

### QA-SRL Semantic Role Labeling

QA dataset implemented as part of the DecaNLP task.

### QA-ZRE Relation Extraction

Zero Shot relation extraction task implemented as part of the DecaNLP task.

### WOZ restaurant reservation (Goal-Oriented Dialogue)

Dataset containing dialogues negotiating a restaurant reservation. Implemented as part of the DecaNLP task, focused on the change in the dialogue state.