[insert period joke] …continued

Cross-language Image Analysis on Wikipedia

This research was part of the DMI Summer School project I did in week two: Menstrual Issues Across Language Spaces

What images are shared across language spaces on Wikipedia regarding menstruation?

> what content specific images are/aren’t shared across languages?
> what wikipedia specific images are/aren’t shared across languages?
> what do the images say about the status of the topic ‘menstruation’ on wikipedia across languages?

Astrid Bigoni | Zuzana Karascakova | Emily Stacey | Sarah McMonagle

with a big thanks to:
Federica Bardelli (designs) | Giulia de Amicis (designs) | Han-Teng Liao (technical and linguistic advice)

Wikipedia Cross-Lingual Image Analysis Tool (DMI)


  • Run the Wikipedia Cross-Lingual Image Analysis on the topic of ‘menstruation’

  • Separate the content-specific and the Wikipedia-specific images into two folders and two spreadsheets


  • Find which images occur in more than one entry, save a hi-res version of the image and route all paths to this one image in the spreadsheet (similar images are often saved under different filenames).

  • Visualize what images are shared across which languages and which are unique: show what the user-generated content (images) says about the topic of menstruation across languages.


  • Interpret what the images mean in each entry

  • Group images with the same meaning and give them a label

  • Visualize the image network and color code the labels to show what Wikipedia (as a technical platform) says about the topic
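
The steps of finding images that occur in more than one entry and separating shared from unique ones can be sketched in a few lines of Python. This is only a rough sketch, not the DMI tool's own code: it assumes the tool's .csv has been reduced to (language, image URL) pairs, and the names `canonical_name`, `languages_per_image`, `split_shared_unique` and the `aliases` mapping are all hypothetical. The `aliases` mapping stands in for the manual step of routing similar images saved under different filenames to one file.

```python
from collections import defaultdict
from urllib.parse import unquote, urlparse


def canonical_name(image_url):
    """Return the file name an image URL points to, ignoring thumbnail sizes."""
    parts = [p for p in urlparse(image_url).path.split("/") if p]
    if not parts:
        return ""
    # Wikimedia thumbnail paths look like .../thumb/a/ab/Name.svg/120px-Name.svg.png,
    # so the original file name is the second-to-last path segment.
    if "thumb" in parts and len(parts) >= 2:
        return unquote(parts[-2])
    return unquote(parts[-1])


def languages_per_image(rows, aliases=None):
    """Map each image name to the set of languages whose article uses it.

    rows    -- iterable of (language, image_url) pairs (assumed input shape)
    aliases -- hand-made mapping from a duplicate file name to the one
               file name chosen to represent it (hypothetical helper)
    """
    aliases = aliases or {}
    languages = defaultdict(set)
    for language, image_url in rows:
        name = canonical_name(image_url)
        name = aliases.get(name, name)
        languages[name].add(language)
    return dict(languages)


def split_shared_unique(languages):
    """Separate images used in several languages from those used in one."""
    shared = {n: sorted(l) for n, l in languages.items() if len(l) > 1}
    unique = {n: sorted(l) for n, l in languages.items() if len(l) == 1}
    return shared, unique
```

Images that only look alike still have to be matched by eye; the sketch only catches thumbnails of the same file and whatever has been entered in `aliases` by hand.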


Infographic: Network of Wikipedia-specific images across languages

Limitations and issues

  • The tool outputs a table with the language, the name of the article, the URL and the images found on that page. However, the .csv file you can download contains more links than are shown in the table. So the tool reads all the images and saves their links in the .csv file, but it doesn’t scrape the images themselves.

  • Many different images may be used to indicate the same thing on Wikipedia; conversely, the same image may indicate many different things on different pages. E.g. the gender symbols can mean “related to sexology”, “may not be suitable for people under 16”, or a link to the “women’s portal”. Because of this, dealing with the data requires a lot of manual transcoding, translating and interpreting, which for languages such as Kazakh, Tagalog and Malayan doesn’t always go smoothly without a native speaker.

  • Grouping multiple images under the same label is an abstraction that helps to visualize the relationships, but some interesting details and nuanced differences get lost.
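
Because the same image can carry different meanings and the same meaning can be carried by different images, the hand-coded labels are probably best kept per (language, image) pair rather than per image. Below is a minimal sketch of such a coding table. It assumes the manual interpretation has already produced (language, image, label) triples; the function names are hypothetical and no project data is included.

```python
from collections import defaultdict


def build_indexes(codings):
    """Index hand-coded (language, image, label) triples in both directions."""
    images_by_label = defaultdict(set)
    labels_by_image = defaultdict(set)
    for _language, image, label in codings:
        images_by_label[label].add(image)
        labels_by_image[image].add(label)
    return dict(images_by_label), dict(labels_by_image)


def ambiguous_images(labels_by_image):
    """Return the images that were given more than one label across pages."""
    return {
        image: sorted(labels)
        for image, labels in labels_by_image.items()
        if len(labels) > 1
    }
```

Listing the ambiguous images next to the grouped network is one way to keep track of the nuances that the labels flatten out.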


Content-specific images

  1. Only the Finnish article has an image of a tampon with an applicator, although applicator tampons are very popular in many countries because of an apparent hygiene wave (women prefer to insert tampons with an applicator rather than with their fingers).

  2. The Tanner scale, showing the phases of breast and pubic hair growth, is only shown in Catalan: perhaps indicating a younger target group? The Hebrew article shows how tampons and diaphragms are inserted.

  3. Only the Indonesian article shows a girl in pain (remember that the Kazakh article asks for additional illustrations? Maybe they want these).

  4. Red cloth pad: reused 5 times, although such pads are hardly used in many countries (perhaps in Norway as part of the eco movement? Unlikely. Definitely not common in England).

Wikipedia-specific images

> the wikipedia specific images seem to say a lot more about the state of the topic ‘menstruation’ on wikipedia

  1. The article is frequently both incomplete and marked as one of the 1000 pages every Wikipedia should have

  2. ibid

  3. “See main article” points to relatedness to other articles on the topic. We can’t say that this points to more elaborated topics in a certain language, because it doesn’t take into account how long the ‘menstruation’ article itself already is (in English the entry is both very complete AND well-connected, but contains no images to indicate that). We would have to run an inlink analysis to find out about article ecologies.

  4. Medical disclaimer: might point to a rather pathological discourse around menstruation. If you need a warning, you take it very seriously, medically.

  5. What does the category say about the discourse? In Welsh it is listed under disease; medical is mentioned most, sexology/anatomy only once. Is menstruation a medical condition? >> medicalized discourse

  6. Uniques: not suitable for under-16s (?), Islamic version available, illustrations needed, article of the week (Esperanto?)
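
The inlink analysis mentioned in point 3 was not part of this project, but a starting point could be the `list=backlinks` query of the MediaWiki Action API, which returns the pages that link to a given article. The sketch below only builds the request URL and reads the titles out of a decoded JSON response; the function names are hypothetical, fetching the URL is left out, and longer lists are paged through the `blcontinue` value, which this sketch ignores.

```python
from urllib.parse import urlencode


def backlinks_url(language, title, limit=500):
    """Build an Action API URL listing articles that link to `title`."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "blnamespace": 0,  # article namespace only
        "bllimit": limit,
        "format": "json",
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"


def inlink_titles(response):
    """Extract the linking page titles from a decoded JSON response."""
    backlinks = response.get("query", {}).get("backlinks", [])
    return [page["title"] for page in backlinks]
```

Comparing the number of inlinks per language edition would give a first impression of how well connected each ‘menstruation’ article is, independent of the images on the page.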

Further research and discussion:

How much of this is the work of bots?

Articles with no images are omitted: making an article ecology or a manual analysis would give more complete results

Gallery with content specific images: