Using Artificial Intelligence to Fix Wikipedia’s Gender Problem

A software program from Primer scours news articles and scientific journals for women scientists who don’t have entries in the online encyclopedia.

Miriam Adelson is an accomplished physician who has published around a hundred research papers on the physiology and treatment of addiction. She also runs a high-profile substance-abuse clinic in Las Vegas. Oh, and she’s the publisher of Israel’s largest newspaper and, with her billionaire husband Sheldon, a philanthropist and influential Republican Party donor.

Yet Wikipedia does not have an entry for her.

Adelson was among thousands of names flagged by Quicksilver, a software tool by San Francisco startup Primer designed to help Wikipedia editors fill in blind spots in the crowdsourced encyclopedia. Its underrepresentation of women in science is a particular target. The world’s fifth-most-visited website has a long-running problem with gender bias: Only 18 percent of its biographies are of women. Surveys estimate that between 84 and 90 percent of Wikipedia editors are male.

Quicksilver uses machine-learning algorithms to scour news articles and scientific citations to find notable scientists missing from Wikipedia, and then write fully sourced draft entries for them. The draft for Miriam Adelson looks like this:

Miriam Adelson is a doctor and chairman of The Dr. Miriam & Sheldon G. Adelson Clinic for Drug Abuse Treatment and Research.[1] With her husband, Sheldon Adelson, she owns the Las Vegas Review-Journal and Israel Hayom.[2] She was listed by Forbes in June 2015 as having a fortune of $28 billion, making him[sic] the 18th richest person in the world.[3] She has frequently been cited in media reports as the newspaper’s owner, including by JTA.[4]

Quicksilver has already produced 40,000 summaries like that—some are longer and minor glitches are the norm—for both men and women scientists missing from Wikipedia. Primer released a sample of 100 today. The bot doesn’t automatically add its output to Wikipedia. Rather, the summaries it generates are intended to provide a starting point for Wikipedia editors, who can clean up errors and check the sources to prevent any algorithmic slip-ups contaminating the site.

John Bohannon, who led work on Quicksilver at Primer, says the humans toiling to tend Wikipedia need some algorithmic help to make significant progress on filling in the project’s sizeable lacunae. “We can accelerate their production,” he says.

It’s early, but Primer’s software has begun to have an impact. Jessica Wade, a physicist at Imperial College London, got a preview of Quicksilver from Bohannon. She was prompted to write an entry for Joëlle Pineau, head of Facebook’s Montreal AI lab, who Quicksilver noted was missing from the site. “Wikipedia is incredibly biased and the underrepresentation of women in science is particularly bad,” says Wade, who personally added nearly 300 women scientists to the site over the past year. “With Quicksilver, you don’t have to trawl around to find missing names, and you get a huge amount of well-sourced information very quickly.”

Quicksilver can also help editors keep existing Wikipedia articles up to date. An early version was tested in New York City this spring at an edit-a-thon aimed at improving entries on women scientists hosted by the American Museum of Natural History. Quicksilver provided facts it had scraped from the web, including links to the sources, on women scientists with sparse Wikipedia bios. Maria Strangas, the museum researcher who organized the event, says it helped the 25 first-time editors update the pages for roughly 70 women scientists in just two hours. “It magnified the effect that event had on Wikipedia,” Strangas says.

Quicksilver is a spinoff from tools and data that Primer uses to serve clients including US intelligence agencies and large finance companies. The startup offers software that ingests internal or external data—think news feeds or internal documents—and generates graphics or written reports.

Primer’s project began when Bohannon, at a conference last year, met Wade and others trying to improve the representation of women on Wikipedia and began to wonder if algorithms could help. He later took advice from the Wikimedia Foundation, the nonprofit that hosts Wikipedia.

The first step was to collect 30,000 Wikipedia articles about scientists to train algorithms to detect the signals in news articles that correlate with a researcher having an entry on the site. Quicksilver uses that knowledge to find notable missing names by cross-referencing existing Wikipedia entries against a list of 200,000 scientific authors drawn from an academic search engine called Semantic Scholar. The software sources the facts needed to write missing entries from a collection of 500 million news articles and feeds them into a system trained to generate biographical entries from past examples.
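The cross-referencing step can be pictured with a small sketch. This is not Primer's code: the function names `normalize` and `find_missing` and the sample data are hypothetical, and the learned notability model and the news-article collection are left out entirely. It only shows the idea of comparing a list of scientific authors against existing Wikipedia titles after smoothing over accents, capitalization, and spacing.

```python
import unicodedata


def normalize(name: str) -> str:
    """Strip accents, ignore case, and collapse whitespace so name variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def find_missing(authors, wikipedia_titles):
    """Return the authors (in their original order) with no matching Wikipedia title."""
    existing = {normalize(title) for title in wikipedia_titles}
    return [author for author in authors if normalize(author) not in existing]


# Hypothetical sample data, for illustration only.
authors = ["Joëlle Pineau", "Miriam Adelson", "Marie Curie"]
wikipedia_titles = ["Marie Curie", "Albert Einstein"]

missing = find_missing(authors, wikipedia_titles)
print(missing)
```

A real system would also have to handle different people who share a name and the same person published under several names, which simple string matching cannot resolve.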

Quicksilver is far from the first attempt to have machines make Wikipedia’s ambitious aims more tractable. Bots already automatically fix typos or vandalism, for example. The Wikimedia Foundation is investigating how to automatically fill out Wikipedia by drawing on articles that exist in one language but not in others.

Primer is working to make Quicksilver multilingual too, initially expanding into Russian and Chinese, and to expand it to cover other subjects, such as politicians. But it doesn’t plan to ever let Quicksilver autonomously add to the site. “There are always humans in the loop,” says Sean Gourley, Primer’s CEO. “This project is about asking, How do you best use the finite number of engaged humans that you have?”

Wikipedia’s notoriously punctilious community will likely keep a close eye on content generated with Quicksilver’s help. One question is whether this tool aimed at fixing blind spots has any blind spots of its own. Wade has already noticed that the tool’s suggestions seem skewed towards Americans, echoing a shortcoming of Wikipedia itself. “We need to be super careful that we’re not passing on whatever biases are in that machine-learning system,” says Wade.



Posted by News Monkey