Limit Request: sudachidict-{core,full} - {75, 160MB} #131
Hi @sorami! It appears that the projects you've linked to (and also https://pypi.org/project/SudachiDict-small/) do not include any Python code. They just package a data file, as the contents of the wheel show:

    $ unzip -l ./SudachiDict_small-20191030-py3-none-any.whl
    Archive:  ./SudachiDict_small-20191030-py3-none-any.whl
       Length      Date    Time    Name
    ---------  ---------- -----   ----
            0  11-01-2019 09:47   sudachidict_small/__init__.py
    122041043  11-01-2019 09:47   sudachidict_small/resources/system.dic
    ...
    ---------                     -------
    122056130                     7 files

Unfortunately, distributing large data files isn't what PyPI is intended for. In addition, large packages stress PyPI's infrastructure as well as that of its mirrors, and they tend to produce a poor user experience. That's especially true if the large package is updated frequently.

Since you seem to already have a mechanism for distributing these data files (as witnessed by the links to ZIP files on the project pages), I would encourage leveraging that instead. There are a few ways in which that's commonly done.

One is to have your package on PyPI simply include a command that the user is expected to run in order to download or update the data; this command could be a module or a setuptools console entry point.

Another common approach is to have your setup.py download the data when the package is installed:

    # setup.py
    import os
    import urllib.request
    import setuptools

    # Fetch the data file once, before the package is built/installed.
    if not os.path.exists('path/to/data'):
        with urllib.request.urlopen('https://location/of/data') as src:
            with open('path/to/data', 'wb') as dest:
                dest.write(src.read())

    setuptools.setup(…)

I hope this helps.
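For illustration, the first approach (a download command shipped inside the PyPI package) might look roughly like the sketch below. It is not taken from the Sudachi project: the module layout, the function names `download` and `main`, the `--force` flag, and the placeholder URL and path (reused from the setup.py snippet above) are all assumptions.

```python
"""Hypothetical download helper, e.g. a download.py module inside the package."""
import argparse
import shutil
import urllib.request
from pathlib import Path

# Placeholders reused from the setup.py snippet; the real values are not given here.
DATA_URL = "https://location/of/data"
DATA_PATH = Path("path/to/data")


def download(url=DATA_URL, dest=DATA_PATH, force=False):
    """Fetch url into dest unless dest already exists. Return True if data was fetched."""
    dest = Path(dest)
    if dest.exists() and not force:
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    # Stream into a temporary file first, so an interrupted download
    # never leaves a truncated dictionary at the final location.
    with urllib.request.urlopen(url) as src, open(tmp, "wb") as out:
        shutil.copyfileobj(src, out)
    tmp.replace(dest)
    return True


def main(argv=None):
    """Command-line wrapper around download()."""
    parser = argparse.ArgumentParser(description="Download the dictionary data file.")
    parser.add_argument("--force", action="store_true",
                        help="re-download even if the file is already present")
    args = parser.parse_args(argv)
    fetched = download(force=args.force)
    print("downloaded" if fetched else "already present", DATA_PATH)
```

To expose `main` as a command, it could be registered through setuptools' `entry_points={"console_scripts": [...]}` argument to `setup()`; adding the usual `if __name__ == "__main__": main()` guard would also allow running the module with `python -m`. The command name is left to the package author.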
Hi @jamadden, thank you very much for your reply! I understand that PyPI is not intended for hosting large data files. Thank you also for the detailed explanation of other ways we can distribute the files. We will consider these approaches.
Project
Size of release
Which indexes
Both PyPI and Test PyPI.
Reasons for the request
Sudachi is a Japanese natural language processing tool. These packages include a large amount of vocabulary information for language analysis, so the binaries are large.
We update this language resource regularly (every few months). Each update adds new vocabulary but also refines and removes existing entries, so we believe it won't exceed the above size limit in the future.