wikify

Prerequisites: Python 3.4+, pyahocorasick (pip3 install pyahocorasick)
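A quick way to confirm the prerequisites before running the pipeline is sketched below. It is not part of this repository, and the helper name check_prerequisites is hypothetical. It relies on the fact that pyahocorasick is installed under the package name pyahocorasick but imported as the module ahocorasick.

```python
import importlib.util
import sys


def check_prerequisites():
    """Hypothetical helper: return a list of problems; an empty list means the environment looks ready."""
    problems = []
    # The pipeline scripts are documented as requiring Python 3.4 or newer.
    if sys.version_info < (3, 4):
        problems.append("Python 3.4+ is required, found %d.%d" % sys.version_info[:2])
    # pyahocorasick is installed with pip but imported under the module name "ahocorasick".
    # find_spec returns None (instead of raising) when the module is absent.
    if importlib.util.find_spec("ahocorasick") is None:
        problems.append("pyahocorasick is missing: run 'pip3 install pyahocorasick'")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```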

The scripts in this folder implement the following pipeline:

Preprocessing of Wikipedia (preprocessor.py)
- Input: Wikipedia XML dump
- Output: Extracted XML dumps of math articles
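To illustrate the kind of filtering this step performs, here is a minimal streaming sketch over a MediaWiki XML dump. It is not the code in preprocessor.py: the function names are hypothetical, and the test for "math article" (the page text containing a marker such as "[[Category:Mathemat") is an assumed stand-in for whatever criterion preprocessor.py actually applies.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # MediaWiki dumps put every element in an XML namespace such as
    # "{http://www.mediawiki.org/xml/export-0.10/}page"; keep only the local part.
    return tag.rsplit("}", 1)[-1]


def iter_math_pages(dump_path, marker="[[Category:Mathemat"):
    """Hypothetical sketch: yield (title, wikitext) for pages whose text contains `marker`.

    The category-based `marker` test is an assumption, not the real selection rule.
    """
    title, text = None, None
    # iterparse streams the file, so the whole dump never has to fit in memory.
    for _, elem in ET.iterparse(dump_path, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if title is not None and text is not None and marker in text:
                yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's children to limit memory use
```

Streaming with iterparse rather than parsing the whole file is the main design point here, since a full Wikipedia dump is far larger than typical RAM.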
Generation of metadata (extractor.py)
- Input: XML dumps generated by preprocessor.py
- Output: data.p, ranks.p
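The .p extension (and the name repickler.py) suggests that data.p and ranks.p are Python pickle files; that is an assumption, and their internal layout is not documented here. The hypothetical helpers below only load each file and report the type and size of its top-level object, which can help when checking the output of extractor.py.

```python
import os
import pickle


def describe_pickle(path):
    """Hypothetical helper: load a pickle file and summarize its top-level object."""
    with open(path, "rb") as f:
        # Only unpickle files you produced yourself; pickle is unsafe for untrusted input.
        obj = pickle.load(f)
    summary = type(obj).__name__
    if hasattr(obj, "__len__"):
        summary += " with %d entries" % len(obj)
    return summary


def describe_outputs(directory):
    # data.p and ranks.p are the two files extractor.py is documented to write;
    # files that are not present are simply skipped.
    return {name: describe_pickle(os.path.join(directory, name))
            for name in ("data.p", "ranks.p")
            if os.path.exists(os.path.join(directory, name))}
```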
Generation of topranks.tsv file (repickler.py)
- Input: Directory containing data.p
- Output: topranks.tsv
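As a rough illustration of this step, the sketch below turns a pickle into a ranked TSV. It rests on an explicit assumption that data.p holds a dict mapping article title to a numeric score; the real structure, ranking and column layout used by repickler.py may differ. The function name write_topranks, the top_n parameter and the header row are hypothetical.

```python
import csv
import os
import pickle


def write_topranks(directory, top_n=None):
    """Hypothetical sketch: ASSUMES data.p is a dict {article title: score}."""
    with open(os.path.join(directory, "data.p"), "rb") as f:
        scores = pickle.load(f)
    # Highest score first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    if top_n is not None:
        ranked = ranked[:top_n]
    out_path = os.path.join(directory, "topranks.tsv")
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["title", "score"])  # hypothetical header row
        writer.writerows(ranked)
    return out_path
```

Using the csv module with a tab delimiter, rather than joining strings by hand, keeps titles that contain quotes or tabs correctly escaped.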
Bibdoc wikification (bibdoc_wikifier.py)
- Input: Directory containing topranks.tsv, plus the LaTeX files to wikify
- Output: One correspondingly named TSV file per document, listing conceptually relevant Wikipedia article titles and the corresponding metrics (see bibdoc_wikifier.py for details)
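The core idea of this step (finding known article titles inside a LaTeX document and writing one TSV per document) can be sketched as below. This is a simplified stand-in, not the logic of bibdoc_wikifier.py: it swaps in plain regular-expression matching where the real script presumably uses pyahocorasick, reports only a raw occurrence count rather than the script's metrics, and assumes the first column of topranks.tsv is the title with a header row. All function names are hypothetical.

```python
import csv
import os
import re


def load_titles(topranks_path):
    # ASSUMPTION: first column of topranks.tsv is the article title, first row is a header.
    with open(topranks_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    return [row[0] for row in rows[1:] if row]


def strip_latex(source):
    source = re.sub(r"(?<!\\)%.*", " ", source)      # drop comments (unescaped % to end of line)
    source = re.sub(r"\\[a-zA-Z]+\*?", " ", source)  # drop command names such as \emph, keep arguments
    source = re.sub(r"[{}$]", " ", source)           # drop braces and math-mode delimiters
    return re.sub(r"\s+", " ", source)               # collapse whitespace so multi-word titles match


def wikify_document(tex_path, titles, out_dir):
    """Hypothetical sketch: count whole-word, case-insensitive title matches in one document."""
    with open(tex_path, encoding="utf-8") as f:
        text = strip_latex(f.read()).lower()
    counts = []
    for title in titles:
        # \b boundaries work for titles that start and end with word characters.
        n = len(re.findall(r"\b" + re.escape(title.lower()) + r"\b", text))
        if n:
            counts.append((title, n))
    counts.sort(key=lambda item: (-item[1], item[0]))
    # Output file is named after the input document, e.g. paper.tex -> paper.tsv.
    base = os.path.splitext(os.path.basename(tex_path))[0]
    out_path = os.path.join(out_dir, base + ".tsv")
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(counts)
    return out_path
```

Looping over every title with a regex is quadratic in practice for large title lists; an Aho-Corasick automaton, which is what pyahocorasick provides, matches all titles in a single pass over the text.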
