mbernat/wiktionary-parser

Wiktionary parser

These are some tools for processing Wiktionary XML dumps.

To obtain such a dump, download enwiktionary-latest-pages-articles.xml.bz2 from the Wikimedia dumps site.

NOTE: As of 5 Nov 2019, the dump is 684 MB compressed and 5.7 GB uncompressed.
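Because the uncompressed dump runs to several gigabytes, it is usually read as a stream rather than loaded whole. The following is a minimal Python sketch of that idea and is not part of this project's code. It assumes the standard MediaWiki export layout, in which each `<page>` element has a direct `<title>` child; the function names `local_name`, `iter_titles`, and `iter_dump_titles` are hypothetical.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(stream):
    """Yield the <title> text of each <page> in a MediaWiki XML stream.

    Assumes (not verified against this project) that <title> is a direct
    child of <page>, as in the standard MediaWiki export format.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == "page":
            title = next(
                (child.text for child in elem if local_name(child.tag) == "title"),
                None,
            )
            yield title
            root.clear()  # discard pages already processed


def iter_dump_titles(path):
    """Stream page titles from a .xml.bz2 dump without unpacking it to disk."""
    with bz2.open(path, "rb") as stream:
        yield from iter_titles(stream)
```

Calling `iter_dump_titles("enwiktionary-latest-pages-articles.xml.bz2")` would then yield titles one at a time. Clearing the root after each page is what keeps the parsed tree from growing with the size of the dump.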

Usage

You need Esy. You can install the beta using npm:

% npm install -g esy@latest

NOTE: Make sure esy --version returns at least 0.5.4 for this project to build.

Then run the esy command from this project root to install and build dependencies.

% esy

Now you can run your editor within the environment (which also includes merlin):

% esy $EDITOR
% esy vim

Alternatively you can try vim-reasonml which loads esy project environments automatically.

After you change the source code, re-run the project's build with the same esy command:

% esy

Test the compiled executable (this runs scripts.tests specified in package.json):

% esy test

Documentation for the libraries in the project can be generated with:

% esy doc
% esy open '#{self.target_dir}/default/_doc/_html/index.html'

Open a shell inside the project environment:

% esy shell

Create Prebuilt Release:

esy allows creating prebuilt binary packages for your current platform, with no dependencies.

% esy npm-release
% cd _release
% npm publish
