What is DREAM?
DREAM is a multiple-choice Dialogue-based REAding comprehension exaMination dataset. In contrast to existing reading comprehension datasets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding.
DREAM contains 10,197 multiple-choice questions for 6,444 dialogues, collected from English-as-a-foreign-language examinations designed by human experts. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge.
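To make the task concrete, here is a minimal sketch of scoring a system on DREAM-style data. It assumes the JSON layout used by the official release (each entry pairing a list of dialogue turns with a list of question dicts and an ID); the sample entry below is invented for illustration, not taken from the dataset.

```python
# Minimal accuracy scorer for DREAM-style multiple-choice data.
# Assumption: each entry is [dialogue_turns, questions, dialogue_id],
# where each question dict has "question", "choice", and "answer" keys.
# The sample below is a hypothetical entry, not real DREAM data.
sample = [
    [
        ["W: I think I'll take the bus to the airport.",
         "M: The bus? It stops everywhere. You'd better take a taxi."],
        [{"question": "How will the woman probably get to the airport?",
          "choice": ["By bus.", "By taxi.", "On foot."],
          "answer": "By taxi."}],
        "sample-1",
    ]
]

def accuracy(data, predict):
    """Fraction of questions where predict(dialogue, question) matches the answer."""
    correct = total = 0
    for dialogue, questions, _dialogue_id in data:
        for q in questions:
            total += 1
            if predict(dialogue, q) == q["answer"]:
                correct += 1
    return correct / total

# A trivial always-pick-the-first-choice baseline; with three options per
# question, a random guesser would land near 33%.
first_choice = lambda dialogue, q: q["choice"][0]
print(accuracy(sample, first_choice))
```

Real systems replace `first_choice` with a model that reads the dialogue turns; the leaderboard below tracks how far such models still fall short of human performance.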
Report Your Results
If you have new results, please send an email to dream@dataset.org with a link to your paper!
Leaderboard
| Report Time | Model | Accuracy (%) |
| --- | --- | --- |
| | Human Ceiling Performance (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 98.6 |
| | Human Performance (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 95.5 |
| Oct 01, 2019 | RoBERTa-Large + MMM (MIT & Amazon Alexa AI, Jin et al., 2019) | 88.9 |
| Jul 21, 2019 | XLNet-Large (River Valley High School, Singapore; https://github.com/NoviScl/XLNet_DREAM) | 72.0 |
| Apr 25, 2019 | BERT-Large (https://github.com/nlpdata/mrc_bert_baseline) | 66.8 |
| Apr 23, 2019 | BERT-Base (https://github.com/nlpdata/mrc_bert_baseline) | 63.2 |
| Feb 01, 2019 | GBDT++ and FTLM++ (ensemble) (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 59.5 |
| Feb 23, 2019 | EER + FT (Tencent & TTIC & Cornell & UPenn, Wang et al., 2019) | 57.7 |
| Feb 01, 2019 | FTLM++ (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 57.4 |
| Feb 01, 2019 | Finetuned Transformer LM (\*) (OpenAI, Radford et al., 2018) | 55.5 |
| Feb 01, 2019 | GBDT++ (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 52.8 |
| Feb 01, 2019 | DSW++ (Tencent & Cornell & UW & AI2, Sun et al., 2019) | 50.1 |
| Feb 01, 2019 | Co-Matching (\*) (Singapore Management University & IBM Research, Wang et al., 2018) | 45.5 |
| Feb 01, 2019 | Distance-Based Sliding Window (\*) (Microsoft Research, Richardson et al., 2013) | 44.6 |
| Feb 01, 2019 | Sliding Window (\*) (Microsoft Research, Richardson et al., 2013) | 42.5 |
| Feb 01, 2019 | Word Matching (\*) (Microsoft Research, Yih et al., 2013) | 42.0 |
| Feb 01, 2019 | Gated-Attention Reader (\*) (Carnegie Mellon University, Dhingra et al., 2017) | 41.3 |
| Feb 01, 2019 | Stanford Attentive Reader (\*) (Stanford University, Chen et al., 2016) | 39.8 |
*: Run and reported by Sun et al., 2019.