Dr. Christian M. Meyer

Beyond Generic Summarization:

A Multi-faceted Hierarchical Summarization Corpus of Large Heterogeneous Data

Abstract. Automatic summarization has so far focused on datasets of ten to twenty fairly short documents, mostly news articles. In principle, however, automatic systems could analyze hundreds of documents from a range of sources and give the interested reader an overview. Such a summary would ideally present the most general issues of a specific topic and give access to more in-depth information on specific aspects within that topic. In this paper, we present a new approach for creating hierarchical summarization corpora: first, we extract relevant content from large, heterogeneous document collections using crowdsourcing; second, trained annotators order the relevant information hierarchically. The resulting corpus can be used to develop and evaluate hierarchical summarization systems.
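A summary that leads with the most general issues and lets the reader drill down into specific aspects can be pictured as a tree. The Python sketch below is illustrative only. It assumes a simple tree representation, and `SummaryNode`, `render`, and the placeholder texts are hypothetical names, not the corpus format described in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SummaryNode:
    """One statement in a hypothetical hierarchical summary (illustrative only)."""
    text: str
    children: list[SummaryNode] = field(default_factory=list)


def render(node: SummaryNode, max_depth: int, depth: int = 0) -> list[str]:
    """Return indented lines down to max_depth (0 = only the most general issue)."""
    if depth > max_depth:
        return []
    lines = ["  " * depth + node.text]
    for child in node.children:
        lines.extend(render(child, max_depth, depth + 1))
    return lines


# Placeholder tree: one general issue, two aspects, one detail (hypothetical content).
example = SummaryNode(
    "General issue",
    [SummaryNode("Aspect 1", [SummaryNode("Detail 1a")]), SummaryNode("Aspect 2")],
)
```

Calling `render(example, 1)` yields only the general issue and its two aspects. Raising `max_depth` to 2 also reveals the more specific detail, which mirrors the reader choosing how deep to go.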

Submitted: 02.10.2017 | Published: 09.05.2018
Crowdsourced summarization task.