The results of toy example summaries and the DUC corpus evaluated with different Python libraries based on the ROUGE metric

Sandyguh04/NLP_summarization

Evaluating automatic summarization with different ROUGE libraries

We use several ROUGE libraries, most of them implemented in Python, to report the scores of different experiments. Although each library's description claims to be a faithful implementation of the original ROUGE measure, the scores differ from library to library. The settings used for each library are described in the "Evaluation settings of libraries" folder.

We begin with very simple toy examples, forming sentences with repeated words in order to observe how changes in a document affect its ROUGE scores. The sentences for these toy examples were extracted from the DUC 2002 corpus. The results of these examples are in the "Toy examples" folder.

Additionally, we calculated the ROUGE measure by hand for each experiment, and we conclude that the only library that reproduces these hand-computed results is pyrouge. Since these are preliminary conclusions, we also present experiment results using the DUC corpora.
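The by-hand calculation can be illustrated with a minimal ROUGE-1 sketch in pure Python. This is a simplified version assuming whitespace tokenization and clipped unigram counts, not the official ROUGE-1.5.5 implementation; settings such as stemming or stopword removal change the scores, which is part of why different libraries disagree.

```python
from collections import Counter

def rouge_1(candidate, reference):
    """Simplified ROUGE-1: unigram overlap with clipped counts.

    Returns (recall, precision, F1). Assumes lowercase whitespace
    tokenization; the official Perl script applies extra preprocessing.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Each candidate unigram counts at most as often as it appears
    # in the reference (count clipping).
    overlap = sum(min(cand[w], ref[w]) for w in cand)
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1

# Toy example: 5 of 6 candidate unigrams overlap with the reference,
# so recall = precision = F1 = 5/6 here.
r, p, f = rouge_1("the cat sat on the mat", "the cat is on the mat")
```

Comparing such hand-computed values against each library's output on the same sentence pair makes the discrepancies between implementations easy to spot.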

Finally, to our knowledge, pyrouge is the only Python-compatible library that obtains the same results as the standard ROUGE metric. For future work, we recommend using the pyrouge library (or the original Perl-based implementation) for ROUGE evaluation and specifying the evaluation settings explicitly.
