Is AI Research a ‘Cesspit’?

Matthew Hutson
7 min read · Apr 30, 2021


  • Computer scientists worry about the social impacts of AI, which can invade privacy or write fake news.
  • Many think peer review should consider such risks alongside scientific merit.
  • Others believe scientists and engineers should spread information freely.

Recently The New Yorker published a feature I wrote on AI ethics, specifically about whether peer review at artificial intelligence conferences (the main venues for publication in the field) should consider social impact in addition to scientific and technical merit. AI's risks include invading privacy and producing fake content. Some of what I wrote was trimmed for space. I'd like to share parts of that material now.

Papers beyond those mentioned in the article have caused controversy. Last May, Harrisburg University posted a press release announcing an algorithm that could purportedly predict criminality from faces. The internet soon reacted (one tweet called it “Phrenology 2.0”). In June, hundreds of researchers across disciplines signed an open letter asking that a paper describing the work not be published.

The paper had been provisionally accepted to a conference called CSCE, but ultimately rejected. (The authors of the open letter had apparently not checked on its publication status.) Hamid Arabnia, a computer scientist at the University of Georgia and the conference's main coordinator, told me they'd also rejected a paper that proposed technologies for cyber-harassment. The authors hoped other researchers would develop defenses. "We decided that if this paper is read by people who tend to be antisocial and suffer from low integrity, they're going to utilize it, and attack others," he said. "So we ended up rejecting that paper, not because the technical content of it was flawed (in fact, it was a pretty good paper) but due to our concerns we rejected it, in a nice way." The authors agreed with those concerns. Most scientists are aware of ethical issues, Arabnia said. "But unfortunately, most of us are not formally trained in that area. Therefore, we use our common sense and judgment calls. We could do better."

Even knowing when to raise questions about a paper can require a reviewer to have a certain fluency with the relevant issues. Alex Hanna, a sociologist at Google who studies AI ethics, recommends ethics training for all AI researchers, and not just a weeklong workshop. “Ethics education really needs to be integrated from the get-go,” she told me. As a step in that direction, the Harvard Embedded EthiCS program integrates ethics into the curriculum by having philosophers teach modules in computer science courses.

Last year, the NeurIPS conference added a requirement that papers discuss “the potential broader impact of their work … both positive and negative.” Their announcement immediately stirred debate. One researcher tweeted, “I don’t think this is a positive step. Societal impacts of AI is a tough field, and there are researchers and organizations that study it professionally. Most authors do not have expertise in the area and won’t do good enough scholarship to say something meaningful.” Others attacked: “So most authors should suck it up and stop creating things unless they get that expertise. It’s that simple.” “Morally vacuous white guy contingent argues that they don’t have enough ‘expertise’ to consider the impact of their work! AI research is a cesspit.”

While questioning a paper’s potential impact can highlight neglected issues, sometimes people can be too quick to judge a project’s risk. Last July, a paper accepted for the International Conference on Intelligent Robots and Systems (IROS) described an algorithm for estimating crowd density from drone footage. “Should conferences have a policy for papers that clearly have harmful applications?” someone wrote on Reddit. “This paper I saw today … will appear in IROS, and it seems like it cannot possibly be used for good or even other related problems.” Someone else replied, “Managing crowd density and movement is a valid need for every major public event that involves crowds.” Another: “If you’re looking for sketchy papers, this isn’t it.”

Others have criticized decisions to withhold technology. In 2019, OpenAI decided not to immediately release its GPT-2 language model (software that writes) for fear it could "generate deceptive, biased, or abusive language at scale." At the AAAI conference last year, Geoffrey Hinton, one of the "godfathers" of modern AI, who conducts research at the University of Toronto and Google Brain, said that OpenAI's move appeared to be a "publicity stunt," echoing a popular sentiment. (When I asked him about Google's decision, just two weeks prior, not to release a sophisticated chatbot because of safety concerns, he allowed that OpenAI probably had safety in mind, too.)

"Taking the position that inventors should conceal some technologies from public view is weirdly elitist," David Forsyth, a computer scientist at the University of Illinois at Urbana-Champaign, told me. "It implies there is an establishment that knows better. This line of thinking has not tended to end well in the past. I do think that we're mostly better off having potentially annoying, inconvenient, improper, or offensive technologies out in the open and debated publicly, as opposed to hidden in laboratories and possibly deployed without debate by either governments or companies."

Michael Kearns, a computer scientist at the University of Pennsylvania, noted one paper that helped raise alarms about what’s possible with AI. Michal Kosinski, a professor of organizational behavior at Stanford University, published a journal article showing that an algorithm, using nothing more than people’s Facebook likes, could predict attributes including drug use, sexual orientation, and even whether one’s parents were still together when one turned 21. “People didn’t realize until that paper that something they do in public that they thought of as innocuous, like ‘Oh, I like this cat photo,’ statistically identifies a lot of things that they might have wanted to keep private,” Kearns said.

Still, many agree the field needs some process for withholding certain technologies, or at least shaping how they're presented. Where is the best place in a research project's pipeline to place gatekeepers? One option is not at the level of journal or conference peer review, but at the level of institutional review boards, before a project starts. But that's not ideal. "The law that underpins what IRBs do was written for other disciplines, like medicine and psychology and social sciences, where there's this assumption that you're in touch with the people that you're experimenting on," Ben Zevenbergen, a research scientist at Google, who was an academic at Oxford and Princeton University when we spoke, told me.

Let's say you train your algorithm on faces found on the internet, those of so-called data subjects. "If you are in touch with [these people]," Zevenbergen said, "the data subject will maintain some sort of self-determination, because they can actually understand and ask questions about this research and decide whether it's beneficial for them to take part, or whether it's something that they think is politically correct to do." Referring to tweets that raised concerns about transphobia in an algorithm that predicts faces from voices, he said, "those issues could surface if the researchers bothered speaking to the data subjects."

There are several reasons one might want the ethical gatekeepers not to be at the institution level (whether an IRB or a board dedicated to the impact of AI) but at the publication level. One is that journals and conferences can set field-wide norms. "Computer science is being done in so many different places with really different ethical norms," Katie Shilton, an information scientist at the University of Maryland, told me. Considering the use of public data, for example, "It's a much more complicated picture in China." Second, if conferences require that papers receive approval from institutional boards, researchers from smaller schools or countries will be disadvantaged. Third, Kearns said, "it's better at the conference level because that's where the experts are." Even a school like Penn might not have enough people with expertise in the ethical application of machine learning, plus its more technical aspects, to make up a full board.

“In an ideal world, we would rather have this kind of discussion and deliberation happening throughout the research process,” David Danks, a philosopher at Carnegie Mellon who studies machine learning and has headed CMU’s IRB, told me. “Part of what makes something research is we don’t know how it’s going to turn out. You can’t make a decision like this just at the outset and then never again revisit it. At the same time, if you wait until the research is done and you try to publish, then in many ways the proverbial cat is out of the bag.” If he had to choose between evaluating impact at the institution level before research starts or at the publication level after it’s done, he’d pick the former. “Part of that is because it is getting harder to know what constitutes publication,” he said. If you circulate a working paper, or write a blog post, or post a PDF on the preprint server arXiv, does that count as publication? These outlets escape peer review entirely.

A lot of research at conferences comes from researchers at tech companies like Google and Facebook, presenting another possible checkpoint for tech ethics. But companies might fear including impact statements that raise questions about tech at the heart of their own businesses. Last year, Timnit Gebru lost her job as co-lead of Google’s AI ethics team because the company had asked her to withhold a paper on the risks of language models, and she resisted. Google uses such models in its search algorithms.

My article stirred a lot of good discussion on Twitter and elsewhere. I hope it continues.
