hopper-project/wikify

wikify

Prerequisites: Python 3.4+ and pyahocorasick (install with pip3 install pyahocorasick)

The scripts in this folder implement the following pipeline:

  • Processing of Wikipedia (preprocessor.py)
    • Input: Wikipedia XML dump
    • Output: Extracted XML dumps of math articles
  • Generation of metadata (extractor.py)
    • Input: XML dumps generated by preprocessor.py
    • Output: data.p, ranks.p
  • Generation of topranks.tsv file (repickler.py)
    • Input: Directory containing data.p
    • Output: topranks.tsv
  • Bibdoc wikification (bibdoc_wikifier.py)
    • Input: Directory containing topranks.tsv, LaTeX files
    • Output: One correspondingly named TSV file per input document, listing conceptually relevant Wikipedia article titles and their associated metrics (see bibdoc_wikifier.py for details)
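
The final wikification step can be pictured with a minimal sketch like the one below. It is not the code in bibdoc_wikifier.py: the function names load_titles and count_title_mentions are hypothetical, the assumption that topranks.tsv holds the article title in its first column is a guess, and the matching is plain case-insensitive substring counting rather than whatever metrics the real script computes. It uses pyahocorasick when installed and falls back to a slower standard-library scan otherwise.

```python
import csv
import io
from collections import Counter

try:
    import ahocorasick  # third-party: pip3 install pyahocorasick
except ImportError:
    ahocorasick = None  # fall back to a naive scan below


def load_titles(tsv_file):
    """Read titles from the first TSV column (column layout is an assumption)."""
    reader = csv.reader(tsv_file, delimiter="\t")
    return [row[0] for row in reader if row and row[0]]


def count_title_mentions(titles, text):
    """Count case-insensitive substring occurrences of each title in text.

    Hypothetical helper: overlapping matches are counted and word
    boundaries are ignored.
    """
    counts = Counter()
    if not titles:
        return counts
    haystack = text.lower()
    if ahocorasick is not None:
        # Build one automaton for all titles, then scan the text once.
        automaton = ahocorasick.Automaton()
        for title in titles:
            automaton.add_word(title.lower(), title)
        automaton.make_automaton()
        for _end_index, title in automaton.iter(haystack):
            counts[title] += 1
    else:
        # Naive fallback: one pass over the text per title.
        for title in titles:
            needle = title.lower()
            start = haystack.find(needle)
            while start != -1:
                counts[title] += 1
                start = haystack.find(needle, start + 1)
    return counts


if __name__ == "__main__":
    # Stand-in for topranks.tsv; the second column is made up for illustration.
    sample_tsv = io.StringIO("Abelian group\t0.9\nGroup\t0.8\nRing\t0.7\n")
    sample_text = "Every abelian group is a group; a ring is an abelian group under addition."
    for title, n in count_title_mentions(load_titles(sample_tsv), sample_text).most_common():
        print(title, n, sep="\t")
```

Building a single automaton over all titles lets the text be scanned once regardless of how many titles there are, which is presumably why pyahocorasick is listed as a prerequisite.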
