Split up canonical.json? #2243
It might be worth going with the second option so things are easier to organize later on, when a lot more companies are inevitably added to the database. Plus, it already takes a lot of time to get through a single category like banks. So the less we have to sift through entries that are unrelated to the category being worked on, or that already have information, the better. It might also make tagging projects based on the database easier at some point.
Also... burying the lede a bit, but this project can expand to do more than just brands. I'm thinking of maybe doing other stuff with it too, like:
From a local data maintainer's point of view, I prefer separate country files:
Pros:
Cons:
If there is no country separation, I need some way to exclude country codes. (
Sorry @ImreSamu but I don't see us ever splitting this repo by country. The cons you listed above are pretty significant. Anyway, Wikidata is not split up by country; it's universal.
I think nesting by key/value makes the most sense since that's essentially how the entries are grouped already - which means there's potential to reduce the amount of repetition induced by the
This has also crossed my mind once or twice
I do wonder if there's a good way we could mirror this and capture multinational brand information in a single entry 🤔 but that's a thought for another time.
Got bored tonight and wrote a quick Python script that will split canonical into sub-directories:

    import json
    import os

    root = 'config/brands/'
    files = {}

    # Load the single canonical file; the keys are split below as "key/value|name".
    with open('config/canonical.json', encoding="utf8") as canonical:
        data = json.load(canonical)

    # Group entries by tag key, then tag value, then brand name.
    for path in data:
        tagging, name = path.split('|', 1)
        key, value = tagging.split('/', 1)
        files.setdefault(key, {}).setdefault(value, {})[name] = data[path]

    # Write one file per key/value pair under config/brands/<key>/<value>.json.
    for key in files:
        os.makedirs(root + key, exist_ok=True)
        for value in files[key]:
            with open(root + key + "/" + value + ".json", "w", encoding="utf8") as out:
                json.dump(files[key][value], out, ensure_ascii=False, indent=2)
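For anyone who wants to check that nothing is lost in the split, here is a rough sketch of the reverse direction. It assumes the layout the script above writes (`<root>/<key>/<value>.json`, each file mapping brand name to entry) and the `key/value|name` key format the script parses. `merge_brands` is a hypothetical helper name, not something that exists in the repo.

```python
import json
import os


def merge_brands(root):
    """Rebuild a canonical-style dict from <root>/<key>/<value>.json files.

    Assumes each file maps brand name -> entry, as written by the split script.
    """
    merged = {}
    for key in sorted(os.listdir(root)):
        key_dir = os.path.join(root, key)
        if not os.path.isdir(key_dir):
            continue  # ignore stray files next to the key directories
        for filename in sorted(os.listdir(key_dir)):
            value, ext = os.path.splitext(filename)
            if ext != ".json":
                continue
            with open(os.path.join(key_dir, filename), encoding="utf8") as f:
                entries = json.load(f)
            for name, entry in entries.items():
                # Recreate the original "key/value|name" identifier.
                merged[f"{key}/{value}|{name}"] = entry
    return merged
```

Comparing `merge_brands('config/brands/')` against the dict loaded from the original file would be one way to confirm the split round-trips.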
For me it is easier to edit a single file. What kind of problems does it present? It is not so large.
It's pretty slow to load when you click on the file in GitHub, and sometimes it will only load half of it. At least for me, and I have a pretty good computer/internet. It's also not exactly easy to browse through, and it's really easy to lose your place. The only way I can reliably edit stuff is by using search in whatever code editor I'm using. There would probably be other benefits to splitting it up too. Off the top of my head, it would allow anyone else that uses the index to more easily include only the POI types they want in their software. Modularity in general is a good thing.
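To make the modularity point concrete, here is a minimal sketch of how a consumer might load only the POI types it cares about. It assumes the nested `<root>/<key>/<value>.json` layout proposed in this issue; `load_categories` is a hypothetical helper, not an existing API of the project.

```python
import json
import os


def load_categories(root, wanted):
    """Load only the requested (key, value) categories from a split brands tree.

    `wanted` is an iterable of (key, value) pairs such as ("amenity", "bank").
    Categories without a matching file are skipped.
    """
    loaded = {}
    for key, value in wanted:
        path = os.path.join(root, key, value + ".json")
        if not os.path.exists(path):
            continue
        with open(path, encoding="utf8") as f:
            loaded[(key, value)] = json.load(f)
    return loaded
```

With a single large file, the same consumer would have to parse everything and filter on the key prefix afterwards.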
It would be very beneficial to have the data split by key/value as shown in option 2. I just updated the allnames file locally, and the updates to the count values in canonical.json were so numerous that they crashed both SourceTree and Atom. It would be a lot easier to manage large changes like this with smaller files in the future.
Just a heads up that I'm going to start splitting up All the PRs are merged or cancelled so there shouldn't be any huge conflicts from doing this. |
I just did this! I've updated the documentation too, but let me know if anything is confusing or broken. There are still around 600 or so "Research Needed" issues open that contain old instructions, but I hope people will be able to figure it out.
canonical.json is about 34000 lines now, and the file is starting to get kind of cumbersome to edit. Should we split it up?
Maybe create a folder hierarchy like:
    brands/
      amenity.json
      shop.json
or nest another level by key/value:
    brands/
      amenity/
        bank.json
        fuel.json
        pharmacy.json
      shop/
        car.json
        car_repair.json
        chemist.json
        clothes.json
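As a rough illustration of the two layouts, the sketch below maps an entry identifier to the file it would land in under each option. It assumes identifiers of the form `key/value|name` (the format the split script in this thread parses); `target_path` and its `nested` flag are hypothetical names used only for this example.

```python
def target_path(canonical_key, root="brands", nested=True):
    """Return the file an entry would live in under either proposed layout.

    nested=False -> one file per key:        brands/<key>.json
    nested=True  -> one file per key/value:  brands/<key>/<value>.json
    """
    tagging, _name = canonical_key.split("|", 1)
    key, value = tagging.split("/", 1)
    if nested:
        return f"{root}/{key}/{value}.json"
    return f"{root}/{key}.json"
```

The nested variant keeps each file limited to a single category, which is the property most commenters here are asking for.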