What is DREAM?
DREAM is a multiple-choice Dialogue-based REAding comprehension exaMination dataset. In contrast to existing reading comprehension datasets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding.
DREAM contains 10,197 multiple-choice questions for 6,444 dialogues, collected from English-as-a-foreign-language examinations designed by human experts. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge.
Sample Problem
W: Tom, look at your shoes. How dirty they are! You must clean them.
M: Oh, mum, I just cleaned them yesterday.
W: They are dirty now. You must clean them again.
M: I do not want to clean them today. Even if I clean them today, they will get dirty again tomorrow.
W: All right, then.
M: Mum, give me something to eat, please.
W: You had your breakfast in the morning, Tom, and you had lunch at school.
M: I am hungry again.
W: Oh, hungry? But if I give you something to eat today, you will be hungry again tomorrow.
Q1: Why did the woman say that she wouldn't give him anything to eat?
A. Because his mother wants to correct his bad habit. ✔
B. Because he had lunch at school.
C. Because his mother wants to leave him hungry.
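A problem like the one above pairs one dialogue with one or more multiple-choice questions. The sketch below shows one way such an item could be represented and scored in Python; the field names (`dialogue`, `questions`, `choices`, `answer`) are illustrative assumptions, not the exact schema of the official data release.

```python
# A DREAM item: a multi-turn dialogue plus multiple-choice questions.
# Field names here are illustrative, not the official release schema.
sample = {
    "dialogue": [
        "W: Tom, look at your shoes. How dirty they are! You must clean them.",
        "M: Oh, mum, I just cleaned them yesterday.",
        "W: They are dirty now. You must clean them again.",
        # ... remaining turns omitted for brevity ...
    ],
    "questions": [
        {
            "question": "Why did the woman say that she wouldn't give him anything to eat?",
            "choices": [
                "Because his mother wants to correct his bad habit.",
                "Because he had lunch at school.",
                "Because his mother wants to leave him hungry.",
            ],
            "answer": "Because his mother wants to correct his bad habit.",
        },
    ],
}

def accuracy(predictions, items):
    """Fraction of questions answered correctly.

    predictions: one list of predicted answer strings per item,
    aligned with that item's questions.
    """
    correct = total = 0
    for item, preds in zip(items, predictions):
        for q, pred in zip(item["questions"], preds):
            correct += int(pred == q["answer"])
            total += 1
    return correct / total
```

Accuracy over all questions is the metric reported on the leaderboard below, so a system only needs to emit one choice per question to be evaluated.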
Report Your Results
If you have new results, please send an email to dream@dataset.org with a link to your paper!
Sibling Dataset
Check C3 (https://dataset.org/c3) if you are interested in free-form multiple-choice machine reading comprehension for Chinese.
Leaderboard
| Report Time | Model | Accuracy (%) |
| --- | --- | --- |
| | Human Ceiling Performance (Tencent & Cornell & UW & AI2; Sun et al., 2019) | 98.6 |
| | Human Performance (Tencent & Cornell & UW & AI2; Sun et al., 2019) | 95.5 |
| Jun 20, 2022 | ALBERT-xxlarge + HRCA+ + Multi-Task Learning (Waseda University; Zhang and Yamana, 2022) | 92.6 |
| Feb 26, 2020 | ALBERT-xxlarge + DUMA + Multi-Task Learning (IBM Research AI; Wan et al., 2020) | 91.8 |
| Jun 20, 2022 | ALBERT-xxlarge + HRCA+ (Waseda University; Zhang and Yamana, 2022) | 91.6 |
| Feb 05, 2020 | ALBERT-xxlarge + DUMA (SJTU & Huawei Noah's Ark Lab; Zhu et al., 2020) | 90.4 |
| Nov 07, 2021 | ALBERT-xxlarge + DUMA + Retraining (CAS & UCAS & China Merchants Bank; Ju et al., 2021) | 90.2 |
| Nov 07, 2021 | ALBERT-xxlarge + Retraining (CAS & UCAS & China Merchants Bank; Ju et al., 2021) | 90.0 |
| Jul 14, 2021 | ALBERT-xxlarge + RekNet (Shanghai Jiao Tong University; Zhao et al., 2021) | 89.6 |
| Oct 01, 2019 | RoBERTa-Large + MMM (MIT & Amazon Alexa AI; Jin et al., 2019) | 88.9 |
| Jul 21, 2019 | XLNet-Large (River Valley High School, Singapore; https://github.com/NoviScl/XLNet_DREAM) | 72.0 |
| Dec 19, 2019 | BERT-Large + WAE (The Hong Kong University of Science and Technology; Kim et al., 2020) | 69.0 |
| Apr 25, 2019 | BERT-Large (https://github.com/nlpdata/mrc_bert_baseline) | 66.8 |
| Dec 19, 2019 | BERT-Base + WAE (The Hong Kong University of Science and Technology; Kim et al., 2020) | 64.7 |
| Apr 23, 2019 | BERT-Base (https://github.com/nlpdata/mrc_bert_baseline) | 63.2 |
| Feb 01, 2019 | GBDT++ and FTLM++ (ensemble) (Tencent & Cornell & UW & AI2; Sun et al., 2019) | 59.5 |
| Feb 23, 2019 | EER + FT (Tencent & TTIC & Cornell & UPenn; Wang et al., 2019) | 57.7 |
| Feb 01, 2019 | FTLM++ (Tencent & Cornell & UW & AI2; Sun et al., 2019) | 57.4 |
| Feb 01, 2019 | Finetuned Transformer LM (*) (OpenAI; Radford et al., 2018) | 55.5 |
| Feb 01, 2019 | GBDT++ (Tencent & Cornell & UW & AI2; Sun et al., 2019) | 52.8 |
| Feb 01, 2019 | DSW++ (Tencent & Cornell & UW & AI2; Sun et al., 2019) | 50.1 |
| Feb 01, 2019 | Co-Matching (*) (Singapore Management University & IBM Research; Wang et al., 2018) | 45.5 |
| Feb 01, 2019 | Distance-Based Sliding Window (*) (Microsoft Research; Richardson et al., 2013) | 44.6 |
| Feb 01, 2019 | Sliding Window (*) (Microsoft Research; Richardson et al., 2013) | 42.5 |
| Feb 01, 2019 | Word Matching (*) (Microsoft Research; Yih et al., 2013) | 42.0 |
| Feb 01, 2019 | Gated-Attention Reader (*) (Carnegie Mellon University; Dhingra et al., 2017) | 41.3 |
| Feb 01, 2019 | Stanford Attentive Reader (*) (Stanford University; Chen et al., 2016) | 39.8 |
*: Run and reported by Sun et al., 2019.