Brand names are case sensitive #678

Open
odin-h opened this issue Jul 5, 2024 · 3 comments
Labels
🐛 bug Something isn't working Data quality

Comments

@odin-h
Collaborator

odin-h commented Jul 5, 2024

What

The same brand returns different results depending on whether its name is capitalised in the URL:
https://prices.openfoodfacts.org/app/brands/Samyang (130 products)
https://prices.openfoodfacts.org/app/brands/samyang (6 products)

Expected behavior

All the products should be grouped together regardless of the casing used in the URL.
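One way to get this behaviour is to normalise the brand name before grouping or filtering. The snippet below is only a minimal sketch of that idea: `group_by_brand` and the sample data are hypothetical, and how the app or API actually stores and matches brand names is not shown in this issue.

```python
from collections import defaultdict


def group_by_brand(products):
    """Group product codes by brand, ignoring case and surrounding whitespace.

    `products` is an iterable of (code, brand) pairs (hypothetical shape).
    """
    groups = defaultdict(list)
    for code, brand in products:
        # casefold() is a more aggressive lower() intended for caseless matching
        groups[brand.strip().casefold()].append(code)
    return dict(groups)


# Hypothetical sample data, not taken from the Open Prices database.
sample = [
    ("0001", "Samyang"),
    ("0002", "samyang"),
    ("0003", "SAMYANG "),
    ("0004", "Lidl"),
]
grouped = group_by_brand(sample)
print(grouped)
```

With this normalisation, `/brands/Samyang` and `/brands/samyang` would resolve to the same key and therefore the same product list.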

Platform (Desktop, Mobile, Hunger Games)

  • OS: Windows 10
  • Platform: Desktop

@odin-h odin-h added the 🐛 bug Something isn't working label Jul 5, 2024
@raphodn
Member

raphodn commented Aug 17, 2024

Yeah, I'm pushing for a brands taxonomy, which would make navigating brands much easier and cleaner 🙏

See this issue in the server repository to track progress: openfoodfacts/openfoodfacts-server#5208

@raphodn
Member

raphodn commented Aug 19, 2024

Somewhat related to this issue: I ran a quick analysis of how many products have at least one price in Open Prices (OP) for each brand.

Top 100 brands on OFF fr : https://fr.openfoodfacts.org/brands

Disclaimer: the counts contain many errors caused by missing accents, apostrophes and hyphens, which is another reason to have a brand taxonomy 😍
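To illustrate the kind of normalisation that would make accents, apostrophes and hyphens stop mattering, here is a rough sketch that turns a display name into a tag-like identifier. `brand_to_tag` is a hypothetical helper, not the real Open Food Facts taxonomy logic, and the input spellings are illustrative assumptions.

```python
import re
import unicodedata


def brand_to_tag(name: str) -> str:
    """Return a lowercase, accent-free, hyphen-separated identifier."""
    # Decompose accented characters, then drop the combining marks
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Collapse every run of non-alphanumeric characters into one hyphen
    return re.sub(r"[^a-z0-9]+", "-", without_accents.lower()).strip("-")


# Illustrative spellings (assumed, not taken from the database)
examples = ["Marque Repère", "Kellogg's", "Léa Nature", "E.Leclerc"]
for raw in examples:
    print(raw, "->", brand_to_tag(raw))
```

Matching on such an identifier, rather than on the free-text name, means that differently accented or punctuated spellings of a brand end up under one key.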

// Brand : Number of products with at least 1 price in OP / Number of products in OFF
U : 629 / 9654
Carrefour : 732 / 9272
Marque-repere : 1 / 7261
Casino : 47 / 6514
Auchan : 1083 / 6478
Nestle : 18 / 5678
Leader-price : 2 / 5042
Le-gaulois : 1 / 4286
Monoprix : 174 / 4271
Cora : 15 / 4211
Lidl : 103 / 4032
Picard : 43 / 3466
Leclerc : 29 / 2666
Thiriet : 0 / 2665
Marks-spencer : 0 / 2467
La-vie-claire : 5 / 2403
Danone : 69 / 2210
Franprix : 8 / 2208
E-leclerc : 0 / 2145
Belle-france : 24 / 2030
Haribo : 46 / 1894
Bio-village : 34 / 1879
Fleury-michon : 43 / 1782
Netto : 33 / 1768
Knorr : 52 / 1749
Maitre-coq : 1 / 1672
Sans-marque : 0 / 1654
Charal : 15 / 1639
Coop : 221 / 1584
Delhaize : 1 / 1558
Yoplait : 36 / 1555
Lindt : 64 / 1469
Intermarche : 0 / 1465
Bonduelle : 53 / 1396
Myprotein : 0 / 1358
La-nouvelle-agriculture : 1 / 1275
Hema : 3 / 1258
Lipton : 26 / 1235
Savencia : 65 / 1193
U-bio : 35 / 1139
Prozis : 0 / 1130
Palais-des-thes : 0 / 1118
Jardin-bio : 41 / 1114
Milka : 55 / 1108
Ferrero : 107 / 1102
Migros : 10 / 1101
Biocoop : 36 / 1095
Andros : 31 / 1083
Maggi : 49 / 1050
Kellogg-s : 0 / 1041
Bonne-maman : 49 / 1033
Tesco : 3 / 1023
Carrefour-bio : 155 / 1010
Lu : 145 / 1003
Moulin-des-moines : 6 / 972
Monoprix-gourmet : 11 / 954
Nos-regions-ont-du-talent : 1 / 943
Aldi : 77 / 924
Panzani : 95 / 917
Dia : 3 / 912
U-saveurs : 10 / 907
Bledina : 1 / 901
Herta : 54 / 898
Labeyrie : 10 / 887
Heinz : 32 / 881
Casino-bio : 17 / 870
Lea-nature : 1 / 867
Delpeyrat : 1 / 864
Le-comptoir-de-mathilde : 0 / 850
Bjorg : 75 / 838
Milbona : 34 / 837
Coca-cola : 5 / 823
Barilla : 94 / 817
Ducros : 34 / 811
Lustucru : 48 / 806
Toupargel : 0 / 794
Primeal : 9 / 783
Unilever : 54 / 766
Dr-oetker : 1 / 764
Deluxe : 2 / 758
Materne : 4 / 751
Eco : 64 / 748
Paturages : 17 / 731
Grand-jury : 0 / 719
Auchan-bio : 77 / 705
Mondelez : 131 / 694
Saint-alby : 1 / 694
Lucien-georgelin : 10 / 688
Kinder : 76 / 682
St-michel : 40 / 676
Bonneterre : 4 / 675
Volae : 2 / 671
Tendre-plus : 0 / 669
Tipiak : 25 / 667
Rochambeau : 0 / 664
Monoprix-bio : 20 / 664
Metro-chef : 0 / 636
Alnatura : 22 / 630
Gerble : 0 / 621
Tropicana : 18 / 616
# Script used to produce the list above.
import json
import os
import time

import requests

# First download https://fr.openfoodfacts.org/brands.json and save it as
# "brands.fr.json" next to this script.
API_URL = "https://prices.openfoodfacts.org/api/v1/products"

with open(os.path.join(os.path.dirname(__file__), "brands.fr.json"), "r") as file:
    brands = json.load(file)

for brand in brands["tags"]:
    brand_name = brand["name"]
    # Turn the hyphenated name into a title-cased one, e.g. "Le-gaulois" -> "Le Gaulois"
    brand_name_without_hyphen = brand_name.replace("-", " ").title()
    brand_products_off_count = brand["products"]
    params = {
        "page": 1,
        "size": 10,
        "brands__like": brand_name_without_hyphen,
        "order_by": "-price_count",
        "price_count__gte": 1,  # only products with at least one price
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    brand_products_op_count = response.json()["total"]
    print(brand_name, ":", brand_products_op_count, "/", brand_products_off_count)
    time.sleep(1)  # stay gentle with the API

@raphodn
Member

raphodn commented Aug 20, 2024

Updated numbers after openfoodfacts/open-prices#390:

u : 255 / 9654
carrefour : 656 / 9272
marque-repere : 184 / 7261
casino : 42 / 6514
auchan : 931 / 6478
nestle : 182 / 5678
leader-price : 2 / 5042
le-gaulois : 3 / 4286
monoprix : 163 / 4271
cora : 12 / 4211
lidl : 104 / 4032
picard : 44 / 3466
leclerc : 17 / 2666
thiriet : 0 / 2665
marks-spencer : 3 / 2467
la-vie-claire : 5 / 2403
danone : 72 / 2210
franprix : 8 / 2208
e-leclerc : 3 / 2145
belle-france : 24 / 2030
haribo : 48 / 1894
bio-village : 35 / 1879
fleury-michon : 46 / 1782
netto : 26 / 1768
knorr : 52 / 1749
maitre-coq : 9 / 1672
sans-marque : 32 / 1654
charal : 15 / 1639
coop : 210 / 1584
delhaize : 1 / 1558
yoplait : 39 / 1555
lindt : 58 / 1469
intermarche : 81 / 1465
bonduelle : 53 / 1396
myprotein : 0 / 1358
la-nouvelle-agriculture : 3 / 1275
hema : 3 / 1258
lipton : 27 / 1235
savencia : 64 / 1193
u-bio : 39 / 1139
prozis : 0 / 1130
palais-des-thes : 0 / 1118
jardin-bio : 126 / 1114
milka : 57 / 1108
ferrero : 103 / 1102
migros : 7 / 1101
biocoop : 37 / 1095
andros : 30 / 1083
maggi : 43 / 1050
kellogg-s : 34 / 1041
bonne-maman : 61 / 1033
tesco : 3 / 1023
carrefour-bio : 202 / 1010
lu : 128 / 1003
moulin-des-moines : 54 / 972
monoprix-gourmet : 10 / 954
nos-regions-ont-du-talent : 24 / 943
aldi : 74 / 924
panzani : 99 / 917
dia : 1 / 912
u-saveurs : 11 / 907
bledina : 6 / 901
herta : 57 / 898
labeyrie : 10 / 887
heinz : 32 / 881
casino-bio : 17 / 870
lea-nature : 96 / 867
delpeyrat : 1 / 864
le-comptoir-de-mathilde : 0 / 850
bjorg : 80 / 838
milbona : 35 / 837
coca-cola : 60 / 823
barilla : 93 / 817
ducros : 34 / 811
lustucru : 48 / 806
toupargel : 0 / 794
primeal : 16 / 783
unilever : 49 / 766
dr-oetker : 16 / 764
deluxe : 1 / 758
materne : 9 / 751
eco : 46 / 748
paturages : 49 / 731
grand-jury : 0 / 719
auchan-bio : 72 / 705
mondelez : 120 / 694
saint-alby : 1 / 694
lucien-georgelin : 14 / 688
kinder : 76 / 682
st-michel : 45 / 676
bonneterre : 4 / 675
volae : 2 / 671
tendre-plus : 0 / 669
tipiak : 27 / 667
rochambeau : 0 / 664
monoprix-bio : 35 / 664
metro-chef : 0 / 636
alnatura : 22 / 630
gerble : 33 / 621
tropicana : 18 / 616
# Script used to produce the list above.
import json
import os
import time

import requests

# First download https://fr.openfoodfacts.org/brands.json and save it as
# "brands.fr.json" next to this script.
API_URL = "https://prices.openfoodfacts.org/api/v1/products"

with open(os.path.join(os.path.dirname(__file__), "brands.fr.json"), "r") as file:
    brands = json.load(file)

for brand in brands["tags"]:
    brand_id = brand["id"]
    brand_products_off_count = brand["products"]
    params = {
        "page": 1,
        "size": 10,
        "brands_tags__icontains": brand_id,  # match on the brand tag, not the name
        "order_by": "-price_count",
        "price_count__gte": 1,  # only products with at least one price
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    brand_products_op_count = response.json()["total"]
    print(brand_id, ":", brand_products_op_count, "/", brand_products_off_count)
    time.sleep(1)  # stay gentle with the API
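To see which brands moved the most between the two runs, the Open Prices counts can be diffed. This is just a small sketch using a handful of figures copied from the two lists in this thread; the variable names are made up for the example. Note that the first run printed capitalised names and the second lowercase IDs, so the keys have to be lowercased before they can be joined.

```python
# OP product counts copied from the first list (Aug 19, brands__like)
before = {
    "U": 629,
    "Marque-repere": 1,
    "Nestle": 18,
    "Intermarche": 0,
    "Jardin-bio": 41,
    "Lea-nature": 1,
}
# OP product counts copied from the second list (Aug 20, brands_tags__icontains)
after = {
    "u": 255,
    "marque-repere": 184,
    "nestle": 182,
    "intermarche": 81,
    "jardin-bio": 126,
    "lea-nature": 96,
}

# Lowercase the first run's keys so both runs share the same identifiers
before_by_id = {name.lower(): count for name, count in before.items()}

deltas = sorted(
    ((brand, after[brand] - before_by_id[brand]) for brand in after),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for brand, delta in deltas:
    print(f"{brand}: {delta:+d}")
```

For this sample, the largest change is a drop for `u`, while tags with accents or hyphens in their display name (`marque-repere`, `lea-nature`) gain the most.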

Projects
Status: Backlog
Development

No branches or pull requests

2 participants