NewsEye/Article-Separation

Article Separation

Description

Article separation (AS), also called article segmentation, is the process of dividing a newspaper page into its individual articles. Existing systems still require considerable input from human users for this task, because they rely on handcrafted rules whose parameters must be tuned to the layout of each newspaper. In the context of the NewsEye project, the aim is to create an automated workflow that is independent of any user input.
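One way to make the task precise: if a page is viewed as a set of text lines, the result of AS is a partition of those lines into articles, where each line belongs to exactly one article. The minimal sketch below encodes this view. The function name and the line/article IDs are hypothetical and not taken from the NewsEye code, and it assumes that every text line belongs to some article.

```python
from typing import Dict, List, Set


def is_valid_segmentation(line_ids: List[str], articles: Dict[str, List[str]]) -> bool:
    """Return True if `articles` partitions `line_ids`: every text line
    belongs to exactly one article and no unknown line IDs appear."""
    seen: Set[str] = set()
    for article_lines in articles.values():
        for line_id in article_lines:
            if line_id in seen:  # line assigned twice (e.g. to two articles)
                return False
            seen.add(line_id)
    return seen == set(line_ids)  # every line covered, nothing extra


page = ["l1", "l2", "l3", "l4"]
print(is_valid_segmentation(page, {"a1": ["l1", "l2"], "a2": ["l3", "l4"]}))  # True
print(is_valid_segmentation(page, {"a1": ["l1", "l2"], "a2": ["l2", "l3"]}))  # False
```

Under this view, an AS system's output can be checked for consistency before it is compared with a ground-truth segmentation.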

Workflow

The preceding tasks 2.1 Layout Analysis (LA) and 2.2 Automated Text Recognition (ATR) of the NewsEye project provide geometric information in the form of text lines and their baselines, together with the corresponding text transcriptions. We want to use this information to solve the AS task by combining traditional LA methods, semantic information, and machine-learning-based approaches.
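The toy sketch below illustrates how geometric information (baselines from LA) and semantic information (transcriptions from ATR) might be combined to group text lines into articles. It is not the NewsEye/CITlab method: the `TextLine` class, both scoring functions, the weights, and the threshold are all hypothetical, and the machine-learning component is replaced by a hand-weighted score. Pairs of lines whose combined score reaches the threshold are merged with a union-find structure, so each connected component is treated as one article.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class TextLine:
    """Hypothetical text line: geometry from LA, text from ATR."""
    id: str
    baseline: List[Point]  # polyline of (x, y) points; assumed non-empty
    text: str


def geometric_score(a: TextLine, b: TextLine, max_gap: float = 100.0) -> float:
    """1.0 for horizontally overlapping lines at the same height, falling
    linearly to 0.0 as their vertical distance approaches max_gap."""
    ax = [x for x, _ in a.baseline]
    bx = [x for x, _ in b.baseline]
    overlap = min(max(ax), max(bx)) - max(min(ax), min(bx))
    if overlap <= 0:  # no horizontal overlap, e.g. different columns
        return 0.0
    ay = sum(y for _, y in a.baseline) / len(a.baseline)
    by = sum(y for _, y in b.baseline) / len(b.baseline)
    return max(0.0, 1.0 - abs(ay - by) / max_gap)


def semantic_score(a: TextLine, b: TextLine) -> float:
    """Jaccard similarity of the lowercased word sets of both transcriptions."""
    wa, wb = set(a.text.lower().split()), set(b.text.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def separate_articles(lines: List[TextLine], w_geo: float = 0.8,
                      w_sem: float = 0.2, threshold: float = 0.5) -> List[List[str]]:
    """Merge every pair of lines whose weighted score reaches the threshold;
    each connected component of the resulting graph is one article."""
    parent = list(range(len(lines)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            score = (w_geo * geometric_score(lines[i], lines[j])
                     + w_sem * semantic_score(lines[i], lines[j]))
            if score >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[str]] = {}
    for i, line in enumerate(lines):
        groups.setdefault(find(i), []).append(line.id)
    return list(groups.values())


# Two columns with two lines each (pixel coordinates, y grows downwards).
lines = [
    TextLine("l1", [(0, 100), (400, 100)], "Storm hits the harbour"),
    TextLine("l2", [(0, 130), (400, 130)], "the harbour was closed"),
    TextLine("l3", [(500, 100), (900, 100)], "Election results announced"),
    TextLine("l4", [(500, 130), (900, 130)], "results were counted"),
]
articles = separate_articles(lines)
print(articles)  # [['l1', 'l2'], ['l3', 'l4']]
```

With these illustrative weights, geometry dominates and word overlap only tips borderline pairs. In a learned system, the hand-set score could be replaced by a model that predicts whether two lines belong to the same article, while the clustering step stays the same.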

Tasks 2.1 and 2.2 are carried out and further developed in the Transkribus platform, which has its roots in the FP7 project tranScriptorium (2013-2015) and was further developed in the H2020 project READ (2016-2019).

The Transkribus GitHub repository can be found at https://github.com/transkribus/.

Models and algorithms used

Data

To our knowledge, there is no general AS dataset of newspaper pages that would allow a comparison with other existing workflows.

CITlab AS GitHub Repository:

The article separation code is available in the following GitHub repository, which was updated for M45: https://github.com/CITlabRostock/citlab-article-separation-new

About

Article separation for newspapers (WP2, Task 2.3)