
Twitter Crop Emoji Bias

The Twitter app for Android allows users to edit images before they are tweeted: they can apply filters, add an alternative description, or simply add stickers. When a user wants to add a sticker, the application asks them to choose from a fixed set of images developed by Twitter itself, named "Twemoji" [1].

When a user adds a sticker from the app, its content is rasterized onto the original image, and the final result is then posted to the user's timeline. Since the emoji is rasterized, its content becomes part of the image itself and is therefore visible to backend algorithms (e.g. the cropping algorithm), which may behave incorrectly because of this new content. Furthermore, the Twemoji set contains many versions of the same person-related image (e.g. a smiling face) under several skin tones (light, medium-light, medium, medium-dark, dark), in an effort to fairly represent the skin tones present in the world. Therefore, incorrect behavior of an algorithm on a specific subset of emoji may be interpreted as racial discrimination, causing harm to users or to Twitter itself.
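For context, a minimal sketch of what this rasterization amounts to, using Pillow; the file names, sticker size, and position are hypothetical:

```python
# Minimal sketch of sticker rasterization with Pillow.
# File names, size, and position are hypothetical.
from PIL import Image

base = Image.open("photo.jpg").convert("RGBA")
emoji = Image.open("twemoji_smile_light.png").convert("RGBA")

# Scale the sticker and alpha-composite it onto the photo: after this step,
# the emoji pixels are part of the image that backend models see.
emoji = emoji.resize((128, 128))
base.alpha_composite(emoji, dest=(50, 50))
base.convert("RGB").save("photo_with_sticker.jpg", "JPEG")
```

Once composited, a saliency model has no way to distinguish the sticker from native image content.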

The notebook emoji-bias.ipynb demonstrates that, when asked to choose between two emoji, the image-cropping algorithm used by Twitter [2] is biased towards light-skin-tone emoji. In particular, the following chart shows that light-skin-tone emoji have a +17% probability of being chosen relative to the expected baseline, and that in general the lighter the emoji, the more likely it is to be picked:

[Chart: probability of each skin tone being preferred, relative to the expected baseline]
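The notebook's exact decision logic is not reproduced here, but for a vertical two-emoji collage it plausibly reduces to checking which half of the image the model's saliency point falls in; a hypothetical sketch:

```python
# Assumed decision rule for a vertical two-emoji collage: the cropping model
# returns a saliency point (x, y), and whichever half it falls in is "preferred".
def preferred_emoji(saliency_y: float, collage_height: int) -> str:
    """Return 'top' if the saliency point lies in the upper half, else 'bottom'."""
    return "top" if saliency_y < collage_height / 2 else "bottom"
```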

Methodology: all Twitter emoji have been downloaded from [2], keeping only the ones with a specified skin tone; then random "vertical collages" of emoji pairs have been synthesized (~50000) and processed by the cropping algorithm. Results were analyzed by computing the probability that each skin tone is preferred over the other (i.e. where the saliency point lies) and using confidence intervals to assess significance.
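A sketch of the two steps described above, collage synthesis and the significance test. The paths, canvas size, and white background are assumptions; the notebook remains the authoritative version:

```python
# Sketch of collage synthesis and the confidence-interval computation.
# Paths and sizes are assumptions; see the notebook for the actual pipeline.
import math
import random
from pathlib import Path
from PIL import Image

def make_vertical_collage(top_path: Path, bottom_path: Path, out_path: Path,
                          size: int = 128) -> None:
    """Stack two emoji vertically on a white canvas and save as JPEG."""
    canvas = Image.new("RGB", (size, 2 * size), "white")
    for i, p in enumerate([top_path, bottom_path]):
        icon = Image.open(p).convert("RGBA").resize((size, size))
        canvas.paste(icon, (0, i * size), mask=icon)
    canvas.save(out_path, "JPEG")

def preference_ci(wins: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% normal-approximation interval for the probability that one
    skin tone is preferred over the other."""
    p = wins / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return p - half, p + half

emoji_files = list(Path("twemoji").glob("*.png"))  # hypothetical folder
a, b = random.sample(emoji_files, 2)
make_vertical_collage(a, b, Path("collage_0.jpg"))
```

Given the sample size (~50000 collages), the normal approximation is a reasonable choice for the interval.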

Note:

  • I've noticed that emoji are rasterized when posted; however, since automatic cropping is now disabled on Twitter [3], I can't be sure that the input to the cropping tool was the image+emoji composite.
  • This analysis is not based on real data (i.e. images created with the official Twitter app): the collages are synthesized.
  • The natural next step would be to check whether this bias disappears when emoji are applied to natural images. It is likely that the algorithm will prefer patterns from natural-looking images (e.g. real faces) over emoji.
  • This analysis lacks further experiments to understand why this happens, but it is likely that light-skin-tone images have better contrast. This may also explain why dark-skin-tone emoji are preferred over medium-skin-tone ones.
  • The risk of harm is extremely low, but it is interesting to ask whether it is right to consider stickers (which are icons with a meaning) in the saliency-detection process, or whether they should be applied after cropping.

Data:

  • experiment_results.csv: each row specifies a pair of emoji; the last column specifies which emoji was preferred by the cropping algorithm (see the sketch after this list).
  • twitter_emoji_pairs_experiment.csv: same as experiment_results.csv, but without the result of the cropping algorithm.
  • twitter_emoji_collage: folder containing the ~50000 synthesized emoji pairs (JPEG) and the output of the cropping model (TXT). Only the .txt files are shared in the repo.
  • twemoji.html and twemoji_links.txt: the web page from [2] and the links to the emoji extracted from that page.
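As referenced above, a hypothetical way to aggregate experiment_results.csv with pandas; the column names ('tone_a', 'tone_b', 'preferred') are assumptions, not the file's actual schema:

```python
# Hypothetical aggregation of experiment_results.csv; the column names
# ('tone_a', 'tone_b', 'preferred') are assumptions, not the actual schema.
import pandas as pd

df = pd.read_csv("experiment_results.csv")

# For each skin tone, count how often it was the preferred emoji in a pair,
# then divide by how often it appeared at all.
wins = df["preferred"].value_counts()
appearances = df["tone_a"].value_counts().add(df["tone_b"].value_counts(), fill_value=0)
print((wins / appearances).sort_values(ascending=False))
```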

Following the notebook should be sufficient to obtain the data and reproduce the results.

Self Evaluation

  • Type of Harm: unintentional underrepresentation (20 points); it could also be intentional, since the only tool required is the native Android client.

  • Multiplier Factors:

    • Damage or impact: Low impact on a person's well-being; harm is measured along a single axis of identity and disproportionately affects a marginalized community (avg(1.1, 1.2) = 1.15)
    • Affected Users: This may impact any user of Twitter (Affected Users Score = 1.2)
    • Likelihood or Exploitability: Extremely rare but it could occur on Twitter (Likelihood = 1.0)
    • Exploitability: No programming skills are needed; automated exploit tools exist (Exploitability = 1.3)
    • Clarity: The submission included detailed instructions and notebooks but lacks convincing evidence regarding the methodology used (Clarity = 1.0)
    • Justification: Findings were considered strong but could have been improved with a deeper analysis, especially using data created with the official Twitter client rather than synthesized data. The submission contained limited details about how harms impacted affected people. (Justification = 1.0)
    • Creativity: Did not qualify for additional creativity.

    Result: 20 x 1.15 x 1.2 x 1 x 1.3 x 1 x 1 = 35.88
