One major concern about the rise of artificial intelligence and machine learning is the perpetuation of human biases. This could be racism (the widely reported 2015 incident in which Google Photos tagged black people as gorillas) or sexism (see the tweet below) or any other form of discrimination (objective-looking data that actually represents certain divisions).
Turkish is a gender neutral language. There is no “he” or “she” – everything is just “o”. But look what happens when Google translates to English. Thread: pic.twitter.com/mIWjP4E6xw
— Alex Shams (@seyyedreza) November 27, 2017
In other words, mainstream concern about artificial intelligence is that it is too human, and such systems should somehow be “cured” of their human biases in order to be fair.
My concern, though, is the opposite: that many artificial intelligence and machine learning systems are not “human enough”. In other words, most present-day artificial intelligence and machine learning systems would not pass the Turing Test.
To remind you of the test, here is an extract from Wikipedia:
The Turing test, developed by Alan Turing in 1950, is a test of a machine’s ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human. Turing proposed that a human evaluator would judge natural language conversations between a human and a machine designed to generate human-like responses. The evaluator would be aware that one of the two partners in conversation is a machine, and all participants would be separated from one another. The conversation would be limited to a text-only channel such as a computer keyboard and screen so the result would not depend on the machine’s ability to render words as speech.[2] If the evaluator cannot reliably tell the machine from the human, the machine is said to have passed the test. The test does not check the ability to give correct answers to questions, only how closely answers resemble those a human would give.
The test was introduced by Turing in his paper, “Computing Machinery and Intelligence”, while working at the University of Manchester (Turing, 1950; p. 460).
Think of any recommender system, for example. With a little effort, a reasonably intelligent human can realise that the recommendations are being made by a machine. Even the most carefully designed recommender systems give away the fact that their intelligence is artificial once in a while.
To take a familiar example, people talk about the joy of discovering books in bookshops, and about the quality of recommendations given by an expert bookseller who gets his customers. Now, Amazon perhaps collects more data about its customers than any such bookseller, and uses it to recommend books. However, even a little scrolling reveals that the recommendations are rather mechanical and predictable.
It’s similar with my recommendations on Netflix – after a point you know the mechanics behind them.
In some sense this predictability exists because the designers possibly think it’s a good thing – Netflix, for example, tells you why it has recommended a particular video. The designers of these algorithms possibly think that explaining their decisions might give their human customers more reason to trust them.
(As an aside, it is common for people to rant against the “opaque” algorithms that drive systems as diverse as Facebook’s News Feed and Uber’s Surge Pricing. So perhaps some algorithm designers do see reason in wanting to explain themselves.)
The way I see it, though, by attempting to explain themselves these algorithms are giving themselves away, and willingly failing the Turing test. Whenever recommendations sound purely mechanical, there is reason for people to start trusting them less. And when equally mechanical reasons are given for these mechanical recommendations, the desire to trust the recommendations falls further.
If I were to design a recommendation system, I’d introduce some irrationality, some hard-to-determine randomness, to try to make the customer believe that there is actually a person behind the recommendations. I believe it is a necessary condition for recommendations to become truly personalised!
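To make the idea a little more concrete, here is a minimal sketch of what "introducing some randomness" could look like. It is not how Amazon or Netflix work; the function name recommend, the surprise_rate parameter and the idea of swapping ranked picks for random catalogue items are all assumptions made up for illustration.

```python
import random


def recommend(ranked_items, catalogue, k=5, surprise_rate=0.2, rng=None):
    """Return k recommendations, occasionally swapping a ranked pick
    for a random item from the rest of the catalogue.

    ranked_items  -- items already sorted by some scoring model (assumed to exist)
    catalogue     -- every item that could be recommended
    surprise_rate -- hypothetical knob: chance that a slot gets a random item
    """
    rng = rng or random.Random()
    picks = list(ranked_items[:k])
    # Anything not already in the top-k list can fill a "surprise" slot.
    pool = [item for item in catalogue if item not in picks]
    for i in range(len(picks)):
        if pool and rng.random() < surprise_rate:
            # Remove the chosen item from the pool so it cannot appear twice.
            picks[i] = pool.pop(rng.randrange(len(pool)))
    return picks
```

With surprise_rate set to 0 this behaves like an ordinary, fully predictable recommender; raising it makes the list harder to reverse-engineer, at the cost of the occasional odd suggestion.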
Wouldn’t the randomness cause some obscure results and still fail the Turing test?