DREAM

A Challenge Dataset and Models for Dialogue-Based Reading Comprehension


What is DREAM?

DREAM is a multiple-choice Dialogue-based REAding comprehension exaMination dataset. In contrast to existing reading comprehension datasets, DREAM is the first to focus on in-depth multi-turn multi-party dialogue understanding.


DREAM contains 10,197 multiple choice questions for 6,444 dialogues, collected from English-as-a-foreign-language examinations designed by human experts. DREAM is likely to present significant challenges for existing reading comprehension systems: 84% of answers are non-extractive, 85% of questions require reasoning beyond a single sentence, and 34% of questions also involve commonsense knowledge.
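
For concreteness, the sketch below shows how these counts could be tallied from the downloaded data. It assumes the dataset is distributed as JSON files (e.g., train.json, dev.json, and test.json) in which each record pairs a dialogue's turns with its list of question entries; the file names and record layout are assumptions about the release format, not something stated above.

```python
# Minimal sketch, assuming each JSON record looks like
# [dialogue_turns, question_entries, dialogue_id], where question_entries
# is a list of {"question": ..., "choice": [...], "answer": ...} dicts.
import json

def count_dialogues_and_questions(paths):
    n_dialogues = n_questions = 0
    for path in paths:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        for record in data:
            question_entries = record[1]   # assumed position of the question list
            n_dialogues += 1
            n_questions += len(question_entries)
    return n_dialogues, n_questions

# Expected totals over all splits: 6,444 dialogues and 10,197 questions.
print(count_dialogues_and_questions(["train.json", "dev.json", "test.json"]))
```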

Sample Problem

Dialogue

W: Tom, look at your shoes. How dirty they are! You must clean them.

M: Oh, mum, I just cleaned them yesterday.

W: They are dirty now. You must clean them again.

M: I do not want to clean them today. Even if I clean them today, they will get dirty again tomorrow.

W: All right, then.

M: Mum, give me something to eat, please.

W: You had your breakfast in the morning, Tom, and you had lunch at school.

M: I am hungry again.

W: Oh, hungry? But if I give you something to eat today, you will be hungry again tomorrow.

Q1 Why did the woman say that she wouldn’t give him anything to eat?

A. Because his mother wants to correct his bad habit. ✔

B. Because he had lunch at school.

C. Because his mother wants to leave him hungry.
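
To illustrate why such questions are hard for surface-level methods, here is a small word-overlap heuristic in the spirit of the Word Matching and Sliding Window baselines listed in the leaderboard below (not their exact implementations). On this sample it picks option B, whose words appear almost verbatim in the dialogue, while the correct answer A requires commonsense reasoning about the mother's intent.

```python
# Word-overlap heuristic applied to the sample problem above: score each
# choice by how many of its word types also occur in the dialogue.
import re

dialogue = [
    "W: Tom, look at your shoes. How dirty they are! You must clean them.",
    "M: Oh, mum, I just cleaned them yesterday.",
    "W: They are dirty now. You must clean them again.",
    "M: I do not want to clean them today. Even if I clean them today, "
    "they will get dirty again tomorrow.",
    "W: All right, then.",
    "M: Mum, give me something to eat, please.",
    "W: You had your breakfast in the morning, Tom, and you had lunch at school.",
    "M: I am hungry again.",
    "W: Oh, hungry? But if I give you something to eat today, you will be "
    "hungry again tomorrow.",
]
choices = [
    "Because his mother wants to correct his bad habit.",  # A (gold answer)
    "Because he had lunch at school.",                     # B
    "Because his mother wants to leave him hungry.",       # C
]

def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))

dialogue_vocab = tokenize(" ".join(dialogue))
overlap = [len(tokenize(c) & dialogue_vocab) for c in choices]
print(overlap)                                         # [1, 4, 2]
print("picked:", "ABC"[overlap.index(max(overlap))])   # picks B, not the gold A
```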

Report Your Results

If you have new results, please send an email to dream@dataset.org with a link to your paper!

Sibling Dataset

Check C3 (https://dataset.org/c3) if you are interested in free-form multiple-choice machine reading comprehension for Chinese.

Leaderboard

Report Time | Model | Team | Reference | Accuracy (%)
- | Human Ceiling Performance | Tencent & Cornell & UW & AI2 | Sun et al., 2019 | 98.6
- | Human Performance | Tencent & Cornell & UW & AI2 | Sun et al., 2019 | 95.5
Jun 20, 2022 | ALBERT-xxlarge + HRCA+ + Multi-Task Learning | Waseda University | Zhang and Yamana, 2022 | 92.6
Feb 26, 2020 | ALBERT-xxlarge + DUMA + Multi-Task Learning | IBM Research AI | Wan et al., 2020 | 91.8
Jun 20, 2022 | ALBERT-xxlarge + HRCA+ | Waseda University | Zhang and Yamana, 2022 | 91.6
Feb 05, 2020 | ALBERT-xxlarge + DUMA | SJTU & Huawei Noah’s Ark Lab | Zhu et al., 2020 | 90.4
Nov 07, 2021 | ALBERT-xxlarge + DUMA + Retraining | CAS & UCAS & China Merchants Bank | Ju et al., 2021 | 90.2
Nov 07, 2021 | ALBERT-xxlarge + Retraining | CAS & UCAS & China Merchants Bank | Ju et al., 2021 | 90.0
Jul 14, 2021 | ALBERT-xxlarge + RekNet | Shanghai Jiao Tong University | Zhao et al., 2021 | 89.6
Oct 01, 2019 | RoBERTa-Large + MMM | MIT & Amazon Alexa AI | Jin et al., 2019 | 88.9
Jul 21, 2019 | XLNet-Large | River Valley High School, Singapore | https://github.com/NoviScl/XLNet_DREAM | 72.0
Dec 19, 2019 | BERT-Large + WAE | The Hong Kong University of Science and Technology | Kim et al., 2020 | 69.0
Apr 25, 2019 | BERT-Large | - | https://github.com/nlpdata/mrc_bert_baseline | 66.8
Dec 19, 2019 | BERT-Base + WAE | The Hong Kong University of Science and Technology | Kim et al., 2020 | 64.7
Apr 23, 2019 | BERT-Base | - | https://github.com/nlpdata/mrc_bert_baseline | 63.2
Feb 01, 2019 | GBDT++ and FTLM++ (ensemble) | Tencent & Cornell & UW & AI2 | Sun et al., 2019 | 59.5
Feb 23, 2019 | EER + FT | Tencent & TTIC & Cornell & UPenn | Wang et al., 2019 | 57.7
Feb 01, 2019 | FTLM++ | Tencent & Cornell & UW & AI2 | Sun et al., 2019 | 57.4
Feb 01, 2019 | Finetuned Transformer LM (*) | OpenAI | Radford et al., 2018 | 55.5
Feb 01, 2019 | GBDT++ | Tencent & Cornell & UW & AI2 | Sun et al., 2019 | 52.8
Feb 01, 2019 | DSW++ | Tencent & Cornell & UW & AI2 | Sun et al., 2019 | 50.1
Feb 01, 2019 | Co-Matching (*) | Singapore Management University & IBM Research | Wang et al., 2018 | 45.5
Feb 01, 2019 | Distance-Based Sliding Window (*) | Microsoft Research | Richardson et al., 2013 | 44.6
Feb 01, 2019 | Sliding Window (*) | Microsoft Research | Richardson et al., 2013 | 42.5
Feb 01, 2019 | Word Matching (*) | Microsoft Research | Yih et al., 2013 | 42.0
Feb 01, 2019 | Gated-Attention Reader (*) | Carnegie Mellon University | Dhingra et al., 2017 | 41.3
Feb 01, 2019 | Stanford Attentive Reader (*) | Stanford University | Chen et al., 2016 | 39.8

*: Run and reported by Sun et al., 2019.