What is DREAM?
DREAM is a multiple-choice Dialogue-based REAding comprehension exaMination dataset. In contrast to existing reading comprehension datasets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding.
DREAM contains 10,197 multiple-choice questions for 6,444 dialogues, collected from English-as-a-foreign-language examinations designed by human experts. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge.
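To make the task concrete, here is a minimal sketch of scoring a system on DREAM-style data. It assumes the JSON layout used by the official release (each entry pairing a list of dialogue turns with a list of question dicts and an ID); the sample entry below is invented for illustration, not taken from the dataset.

```python
# Minimal accuracy scorer for DREAM-style multiple-choice data.
# Assumption: each entry is [dialogue_turns, questions, dialogue_id],
# where each question dict has "question", "choice", and "answer" keys.
# The sample below is a hypothetical entry, not real DREAM data.
sample = [
    [
        ["W: I think I'll take the bus to the airport.",
         "M: The bus? It stops everywhere. You'd better take a taxi."],
        [{"question": "How will the woman probably get to the airport?",
          "choice": ["By bus.", "By taxi.", "On foot."],
          "answer": "By taxi."}],
        "sample-1",
    ]
]

def accuracy(data, predict):
    """Fraction of questions where predict(dialogue, question) matches the answer."""
    correct = total = 0
    for dialogue, questions, _dialogue_id in data:
        for q in questions:
            total += 1
            if predict(dialogue, q) == q["answer"]:
                correct += 1
    return correct / total

# A trivial always-pick-the-first-choice baseline; with three options per
# question, a random guesser would land near 33%.
first_choice = lambda dialogue, q: q["choice"][0]
print(accuracy(sample, first_choice))
```

Real systems replace `first_choice` with a model that reads the dialogue turns; the leaderboard below tracks how far such models still fall short of human performance.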
Report Your Results
If you have new results, please send an email to dream@dataset.org with a link to your paper!
Leaderboard
| Report Time | Model | Accuracy (%) |
| --- | --- | --- |
| | Human Ceiling Performance (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 98.6 |
| | Human Performance (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 95.5 |
| Oct 01, 2019 | RoBERTa-Large + MMM (MIT & Amazon Alexa AI, Jin et al., 2019) | 88.9 |
| Jul 21, 2019 | XLNet-Large (River Valley High School, Singapore; https://github.com/NoviScl/XLNet_DREAM) | 72.0 |
| Apr 25, 2019 | BERT-Large (https://github.com/nlpdata/mrc_bert_baseline) | 66.8 |
| Apr 23, 2019 | BERT-Base (https://github.com/nlpdata/mrc_bert_baseline) | 63.2 |
| Feb 01, 2019 | GBDT++ and FTLM++ (ensemble) (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 59.5 |
| Feb 23, 2019 | EER + FT (Tencent & TTIC & Cornell & UPenn, Wang et al., 2019) | 57.7 |
| Feb 01, 2019 | FTLM++ (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 57.4 |
| Feb 01, 2019 | Finetuned Transformer LM (\*) (OpenAI, Radford et al., 2018) | 55.5 |
| Feb 01, 2019 | GBDT++ (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 52.8 |
| Feb 01, 2019 | DSW++ (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 50.1 |
| Feb 01, 2019 | Co-Matching (\*) (Singapore Management University & IBM Research, Wang et al., 2018) | 45.5 |
| Feb 01, 2019 | Distance-Based Sliding Window (\*) (Microsoft Research, Richardson et al., 2013) | 44.6 |
| Feb 01, 2019 | Sliding Window (\*) (Microsoft Research, Richardson et al., 2013) | 42.5 |
| Feb 01, 2019 | Word Matching (\*) (Microsoft Research, Yih et al., 2013) | 42.0 |
| Feb 01, 2019 | Gated-Attention Reader (\*) (Carnegie Mellon University, Dhingra et al., 2017) | 41.3 |
| Feb 01, 2019 | Stanford Attentive Reader (\*) (Stanford University, Chen et al., 2016) | 39.8 |
*: Run and reported by Sun et al., 2019.