Will an Open Source Algorithm Soon Be Writing the News?

February 18, 2019

February 18, 2019 – In the past week OpenAI, an open-source, not-for-profit artificial intelligence research company, demonstrated a language-learning algorithm that can write the news, translate text, and answer questions without supervision. In testing, the machine learning system achieved state-of-the-art results on 7 of 8 language-modeling datasets, and its output contained coherent text. In other words, OpenAI demonstrated an AI that could do what reporters do, and succeed at it without supervision.

OpenAI describes itself as a “non-profit AI research company, discovering and enacting the path to safe artificial general intelligence.” But some would argue that this latest machine learning capability is anything but safe.

In its demonstration, OpenAI showed that GPT-2, its machine learning algorithm, could take a human-written opening statement and then, drawing on the dataset it had been exposed to, write somewhat convincingly on the topic. OpenAI doesn’t want you to be too alarmed, noting that GPT-2 works best on short pieces of prose of a few hundred words; after that, it starts to “drift off topic.” It further stated in the demonstration that GPT-2 cannot write a news story that requires going out and asking questions to get the right facts and figures. Working only from its training data, GPT-2 largely writes fiction.

But could it become a “fake news” writing system? That’s a concern OpenAI takes seriously, and it is why the company has decided not to release the full GPT-2 model to the public, in contrast to much of its other work. As OpenAI’s policy director, Jack Clark, put it: “I’m worried about… actors generating arbitrarily large amounts of garbage opinion content… actors who do stuff like disinformation.”

So what is the secret to GPT-2’s abilities? What differentiates it from most AI language-learning systems is that its learning is unsupervised, whereas most other machine learning systems are supervised. In supervised learning, an AI is trained on a huge dataset in which every example is labeled, so that the inputs and the expected outputs are clearly defined.

Humans tend not to learn this way. Rather, we take in data from many different sources and figure things out as we go. That’s unsupervised learning, and that’s what GPT-2 does. To build it, OpenAI used a neural network architecture known as the Transformer, invented by researchers at Google Brain.
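The difference can be made concrete with a small Python sketch. It is an illustration, not OpenAI’s code: a supervised dataset needs a person to attach a label to every example, whereas a system that learns to predict the next word lets raw text supply its own answers. The function name `make_next_word_pairs`, the `context_size` parameter, and the sample sentence are hypothetical, chosen only for this example.

```python
from typing import List, Tuple


def make_next_word_pairs(text: str, context_size: int = 3) -> List[Tuple[Tuple[str, ...], str]]:
    """Turn raw, unlabeled text into (preceding words, next word) training pairs.

    Nobody has to label anything: every word in the text serves as the
    target "answer" for the words that come just before it.
    """
    words = text.split()
    pairs: List[Tuple[Tuple[str, ...], str]] = []
    for i in range(1, len(words)):
        # Up to `context_size` preceding words form the input...
        context = tuple(words[max(0, i - context_size):i])
        # ...and the word that actually follows them is the target.
        pairs.append((context, words[i]))
    return pairs


if __name__ == "__main__":
    raw = "the unicorns spoke perfect English"
    for context, target in make_next_word_pairs(raw, context_size=2):
        print(context, "->", target)
```

Run on the sample sentence with `context_size=2`, it yields pairs such as `('the', 'unicorns') -> spoke`, each of which a supervised system would otherwise need a human to label.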

GPT-2 contains 1.5 billion parameters and was trained on a dataset of 8 million web pages. OpenAI set it a simple objective: read the previous words of a text and predict the word that comes next. Training on this unstructured data exposed GPT-2 to an enormous number of words, giving it, in OpenAI’s words, “the ability to generate synthetic text samples of unprecedented quality.”
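GPT-2 pursues that objective with a 1.5-billion-parameter Transformer. The Python sketch below swaps in something far simpler, a word-bigram counter, purely to illustrate the same predict-the-next-word loop: it learns which word tends to follow which, then extends a human-written prompt one word at a time. The class name `BigramModel`, its methods, and the toy corpus are hypothetical and are not how GPT-2 is implemented.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Optional


class BigramModel:
    """Toy next-word predictor that counts which word follows which.

    Illustration only: GPT-2 uses a Transformer network that looks at a long
    span of preceding words, not just the single previous word as here.
    """

    def __init__(self) -> None:
        # counts[w] maps each word seen after w to how often it followed w.
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, text: str) -> None:
        words = text.split()
        # Every adjacent pair of words is one (input, target) example;
        # no human labels are required.
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prev: str) -> Optional[str]:
        """Return the word most often seen after `prev`, or None if `prev` is unseen."""
        if prev not in self.counts:
            return None
        return self.counts[prev].most_common(1)[0][0]

    def generate(self, prompt: str, max_words: int, rng: random.Random) -> str:
        """Extend a human-written prompt one sampled word at a time."""
        words = prompt.split()
        for _ in range(max_words):
            if not words:
                break  # nothing to condition on
            followers = self.counts.get(words[-1])
            if not followers:
                break  # this word was never followed by anything in training
            # Sample in proportion to observed counts rather than always taking
            # the top word, so different random seeds can give different text.
            nxt = rng.choices(list(followers), weights=list(followers.values()), k=1)[0]
            words.append(nxt)
        return " ".join(words)


if __name__ == "__main__":
    corpus = ("the unicorns spoke perfect english and the unicorns lived in "
              "the valley and the scientists found the unicorns")
    model = BigramModel()
    model.train(corpus)
    print(model.predict("the"))  # prints "unicorns"
    for seed in range(3):
        print(model.generate("the", max_words=6, rng=random.Random(seed)))
```

Because `generate` samples rather than always picking the most likely word, calling it with different seeds can produce different continuations of the same prompt, loosely analogous to asking a language model for several tries and keeping the most coherent one.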

The following is a sample of GPT-2 output provided by OpenAI.

What humans at OpenAI wrote:

“In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.”

What GPT-2 wrote after ten tries (apparently it takes several attempts before the output reaches a level of coherence that at times seems downright spooky, though other sections remain incoherent):

“The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

“Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.

“Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.

“Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

“Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns.

“While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.”

“Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America.

“While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”

“However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. “But they seem to be able to communicate in English quite well, which I believe is a sign of evolution or at least a change in social organization,” said the scientist.”

There is a lot of fanciful material here, and GPT-2 clearly isn’t sure what mythical unicorns really are. Nonetheless, this represents a considerable advance in machine learning attempts to generate synthetic text. OpenAI has released a smaller version of GPT-2 for the public to try on its GitHub site.

OpenAI exposed its natural-language text generator, GPT-2, to a dataset of 8 million web pages to let it learn, in an unstructured way, how to generate stories that some might call “fake news.” (Image credit: CNET News)

Postscript: Elon Musk co-founded OpenAI along with Sam Altman, president of the seed-stage venture capital firm Y Combinator; Greg Brockman, CTO and co-founder, who studied at Harvard and MIT; Reid Hoffman, co-founder of LinkedIn; Jessica Livingston of Y Combinator; and Peter Thiel, who co-founded PayPal with Musk. They created OpenAI to ensure that the field didn’t stray into becoming an existential threat to humanity. Musk, who resigned from OpenAI two days ago to devote, in his words, more of his time to SpaceX and Tesla, must have had second thoughts about the direction OpenAI has taken in creating GPT-2. As an Australian reporter wrote of the program on the News.com.au site, “this technology could absolutely devastate the Internet as we know it.” I don’t think that was what Elon Musk had in mind when he critically assessed the efforts of Microsoft, Google, IBM, and others in their pursuit of AI. Musk’s resignation from OpenAI may be better explained by that quote.