Harry Potter and the Sorcerer’s AI
The only sounds drifting from Hagrid’s hut were the disdainful shrieks of his own furniture. Magic: It was something that Harry Potter thought was very good.
Readers who spend considerable time on nerdier corners of the internet probably already know what’s going on. To everyone else: No, you aren’t reading the work of a Harry Potter fan with wildly fluctuating prose. In fact, you’re not reading the writing of any human at all.
These are instead the opening lines of a story titled “Harry Potter and the Portrait of What Looked Like a Large Pile of Ash.” The story was penned entirely by an AI that a team of programmers at Botnik Studios trained on J.K. Rowling’s Harry Potter novels.
The Semantic Void
It’s not hard to see why the story wound up being shared across every social media platform. It features Ron tap dancing, the evil Death Eaters politely applauding a confession of love, and Harry flinging his own eyeballs at Voldemort.
Unfortunately, it lacks the most important quality in any piece of writing: basic coherence. It’s perfectly grammatical — impressively so — but semantically void.
Natural language processing has made significant strides in composing its own works based on given data sets, producing copy that usually contains correctly formulated sentences (some of which even make sense) and is almost always downright hilarious to its human readers.
What AI can’t yet do is assemble those sentences into a coherent narrative, unless given a strict template to follow, like in a standardized report. So what do we need to do before AI can author book chapters that don’t contain phrases like “Ron saw Harry and immediately began to eat Hermione’s family”?
The only sounds drifting from Hagrid’s hut were the disdainful shrieks of his own furniture.
Computational linguist William McNeill, a machine learning engineer at AI solutions provider SparkCognition, has spent a lot of time thinking about the problem of natural language generation.
The first thing to understand about NLG is that it’s not what’s under the hood in all those charmingly chatty digital assistants like Amazon’s Alexa, or even automated service agents and chatbots.
“Chatbots, and Siri and Alexa, they follow a script. You can’t really compare them to what Botnik’s doing, because it’s completely different,” McNeill says. Digital assistants don’t actually compose sentences of their own. They simply select the most appropriate pre-written response based on cues in the user’s input for which they’ve been programmed to listen.
“Right now, they’re much more useful than real NLG, but much less sophisticated,” McNeill says. “Chatbots have existed since the ’60s, and the basic technology behind them hasn’t changed.”
So what is NLG then? When scientists talk about NLG now, they’re usually referring to machine learning-driven text generation, the kind of highly advanced technology that produced “Harry Potter and the Portrait of What Looked Like a Large Pile of Ash.”
The favored approach to this sort of NLG is recurrent neural networks. Essentially, an RNN differs from a traditional artificial neural network in that its output is not solely dependent on the most recent input.
It also factors in every input it’s been fed previously. RNNs use loops to retain and use information from previous steps, allowing them to process sequences of inputs that are dependent on one another.
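That loop is easiest to see in code. The sketch below is a toy, not Botnik’s actual model: the weights are arbitrary values chosen for illustration, and a real RNN would learn them from data. The point is only that the hidden state is fed back into each step, so every output depends on the entire history of inputs, not just the latest one.

```python
import math

W_IN, W_HIDDEN = 0.5, 0.8  # toy weights (hypothetical values, not learned)

def rnn_step(hidden, x):
    """One recurrence: mix the new input with the running hidden state."""
    return math.tanh(W_IN * x + W_HIDDEN * hidden)

def run_sequence(inputs):
    hidden = 0.0  # the state starts empty...
    states = []
    for x in inputs:
        hidden = rnn_step(hidden, x)  # ...and is fed back in at every step
        states.append(hidden)
    return states

# Feed the network the same input three times: each output differs,
# because the hidden state remembers everything that came before.
print(run_sequence([1.0, 1.0, 1.0]))
```

A feed-forward network given the same input three times would produce the same output three times; here, the recurrence makes each step's output a function of the whole sequence so far.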
This capability has been key to enabling more advanced NLG. It’s what allows the glimpses of continuity in paragraphs such as “Ron was going to be spiders. He just was. He wasn’t proud of that, but it was going to be hard to not have spiders all over his body after all is said and done.”
Not exactly Nobel Prize-worthy literature, but each sentence successively builds on a single idea in a way that, before RNNs, was unthinkable.
To further improve this capacity for continuity, the most commonly employed type of RNN is a long short-term memory, or LSTM, network. LSTMs can keep track of dependencies over greater distances, allowing continuity in generated text to extend beyond just a few sentences.
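The mechanism behind that longer reach is a separate cell state controlled by gates. A bare-bones, scalar sketch of a single LSTM cell follows; the weights are toy values shared across all gates for brevity (a trained network would learn distinct weights for each), so it illustrates the plumbing rather than any real model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w=0.5, u=0.5, b=0.0):
    """One LSTM step (toy shared weights; real gates have separate weights)."""
    f = sigmoid(w * x + u * h + b)          # forget gate: how much old memory to keep
    i = sigmoid(w * x + u * h + b)          # input gate: how much new info to write
    o = sigmoid(w * x + u * h + b)          # output gate: how much memory to expose
    c_tilde = math.tanh(w * x + u * h + b)  # candidate new memory content
    c = f * c + i * c_tilde                 # cell state: gated, additive update
    h = o * math.tanh(c)                    # hidden output drawn from the memory
    return h, c

# Feed one signal, then silence: the cell state still carries a trace
# of the first input several steps later.
h, c = 0.0, 0.0
for x in [1.0, 0.0, 0.0, 0.0]:
    h, c = lstm_step(x, h, c)
```

Because the cell state is updated additively rather than overwritten, information can survive many steps, which is what lets generated text stay on topic beyond a sentence or two.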
Writing Through Statistics
How exactly do LSTMs create written works? In the simplest terms, through statistics. When fed a body of text, a model working at the word level learns the probability that any given word or string of words is followed by another word or string of words.
For instance, a noun such as “door” or “hat” is likely to be preceded by “a,” “an,” or “the.” The word “hermetically” will almost never appear without being immediately followed by “sealed.” An opening parenthesis will always be matched by a closing one somewhere later in the text.
Stretch this statistical analysis across larger and larger blocks of text, as is enabled by LSTMs, and you’ll have AI-produced works of surprising grammatical accuracy.
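The word-level statistics described above can be demonstrated with a far simpler cousin of an LSTM: a bigram model that counts which word follows which and then samples from those counts. The tiny corpus and seed word below are invented for illustration; no real training data is involved.

```python
import random
from collections import defaultdict

# Toy corpus (hypothetical, chosen only to illustrate the counting step).
corpus = ("harry opened the door . harry opened the hat . "
          "the door was sealed . the hat was old .").split()

# Count the observed successors of every word; repeats weight the sampling.
successors = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev].append(nxt)

def generate(seed, length, rng=random.Random(0)):
    """Walk the chain: repeatedly sample a successor of the latest word."""
    words = [seed]
    for _ in range(length - 1):
        options = successors.get(words[-1])
        if not options:
            break  # dead end: the word never appeared mid-corpus
        words.append(rng.choice(options))
    return " ".join(words)

print(generate("harry", 6))
```

Every adjacent pair in the output was seen in the training text, so the result is locally plausible, which is exactly why such text can be grammatical yet drift with no overall plan. An LSTM does the same job with a learned, compressed memory instead of a raw lookup table, letting it condition on far more context than one preceding word.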
This capability is already being put to work to accomplish some incredible — and incredibly time-saving — feats. Working from human-created templates, NLG is capable of generating formulaic standardized reports of many kinds, including stock updates, weather reports, product descriptions, and summaries of data analysis.
But will it ever be able to do anything more?
The Surreal Frontier
McNeill isn’t sure what the future holds for natural language generation.
“[LSTM networks] are great at poetry and surreal movie scripts,” he says. “If something is enough like natural language, we’ll read meaning into it. We’re good at pattern recognition, and we’ll infer a lot from the surface. The subjectivity is what makes it fun. It’s giving you the space to read meaning into it, like a horoscope.”
Much like a horoscope, though, this illusion of meaning doesn’t go past the surface. The surreal nature of these more creative works simply makes it easier for them to sound superficially like the real thing.
“Machines can’t intentionally play with rules and expressions. There’s no underlying meaning,” he says.
Getting better at creating these sorts of cute party tricks isn’t hard, according to McNeill. It’s a question of inputting ever more data, and continuing to refine the ways models encounter that data. That, of course, requires even more advanced mathematical techniques.
The bigger question is how to bridge the gap between the scripted but useful tools, like chatbots, and the fluid but impractical creative text generation.
One of the fundamental issues is how to teach AI to apply general, real-world knowledge. To successfully write a short story about a criminal trial, for example, it’s not enough to understand how to put a sentence together.
The AI needs to know how the legal system works, what actions constitute a crime, and how humans might reasonably be expected to respond to a wide range of emotional stimuli. It needs to know innumerable details that could be relevant to a case, be it normal speeds for a car or the price of a haircut.
“There’s so much basic, contextual knowledge about the world that you need, and we don’t know how to do that,” McNeill says.
He’s not the only one to recognize this problem. It’s arguably one of the biggest challenges for current deep learning approaches, with their heavy reliance on statistics and little else.
A growing school of thought argues that NLG, and deep learning AI in general, is hitting a wall in its capabilities, because genuine human intelligence is built out of more than just pattern recognition and statistical analysis.
Humans possess a body of knowledge about the world and are capable of making logical assumptions based on that knowledge.
We know that objects stay in place unless an outside force moves them. We know that if you’re in danger, you should try to get out of danger. These ideas are so simple we rarely even bother to articulate them, but they’re a necessity for writing even a basic news article.
There are several research projects working to fill this gap in AI understanding. Seattle’s Allen Institute for Artificial Intelligence has been experimenting with teaching common sense to an AI. They do this by crowdsourcing hard-coded rules, inferences, and causal knowledge via Amazon’s Mechanical Turk.
These efforts have shown some promise, but they’re also labor-intensive, time-consuming and, so far, hit or miss. Still, such combinations of hard-coding and statistical learning — or, put another way, of nature and nurture — may be the most likely path toward a more genuine model of intelligence.
Ron was going to be spiders. He just was. He wasn’t proud of that, but it was going to be hard to not have spiders all over his body after all is said and done.
Creating Something New
McNeill still isn’t certain how much NLG can improve, even with the introduction of knowledge and improved logical reasoning. After all, machines can be taught how to write, but that doesn’t mean they have anything to say.
“Machine learning is all about replicating what it’s seen. But language generation also asks it to create something new,” he says.
For an AI, which has no intent or meaning behind its communication, the creative aspect is randomly generated. In that sense, it’s not generating language; it’s generating something that has the same shape as language.
“It’s almost a philosophical question,” McNeill says. “Even if machine language became indistinguishable from human language, without intent, is that still something fundamentally different? If it’s not communication, can it really be called language?”
McNeill believes an AI may never be able to write its own Harry Potter novel until we develop AI with a rough equivalent of sentience. For now, it seems, we may have to content ourselves with classic prose like this:
Harry looked around and then fell down the spiral staircase for the rest of the summer. ‘I’m Harry Potter,’ Harry began yelling. ‘The dark arts better be worried, oh boy!’