What is GPT-3?

GPT-3 is a Generative Pre-trained Transformer[1]. The "Transformer" in its name refers to the specific neural network architecture it employs. Transformers were invented at Google, and are particularly adept at understanding words in the full context of all the surrounding words in a sample of text.
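To make the "full context" idea a little more concrete, here is a minimal sketch of the self-attention step at the heart of a transformer. This is a toy illustration in Python/NumPy (tiny dimensions, random weights), not GPT-3's actual implementation: every token's representation gets updated as a weighted mix of every other token's representation.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    # Project each token's embedding into query, key and value vectors
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Score every token against every other token, scaled for stability
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Softmax turns scores into attention weights that sum to 1 per token
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    # Each output row is a context-weighted blend of all the value vectors
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                     # 4 tokens, 8-dim embeddings (toy sizes)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)      # -> (4, 8): one context-aware vector per token
```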

At its core, GPT-3 is the ultimate autocomplete tool. The model has ingested a vast corpus of words from the internet, and uses this data to make predictions about what words or sentences are most likely to follow a given input sequence of text. You feed GPT-3 a snippet of text, and its deep learning model predicts the most probable continuation of that text, essentially "autocompleting" it.
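As a hedged illustration of what "autocompleting" looks like in practice, here is roughly how you would ask GPT-3 for a continuation via OpenAI's original Completions API. The model name, prompt and settings are just examples, and this is the legacy pre-1.0 `openai` Python interface, which has since been superseded.

```python
import openai  # legacy pre-1.0 interface; expects OPENAI_API_KEY in the environment

response = openai.Completion.create(
    engine="davinci",                            # original GPT-3 base model (illustrative choice)
    prompt="The Transformer architecture was invented at",
    max_tokens=20,                               # length of the continuation to generate
    temperature=0.7,                             # higher values give more varied completions
)
print(response["choices"][0]["text"])            # the model's predicted continuation
```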

GPT-3 has been trained on hundreds of billions of words - for context, all of Wikipedia makes up only around 0.6% of its training data. It utilizes 175 billion parameters in its neural network, compared to just 1.5 billion for its predecessor GPT-2, and is estimated to have required several million dollars' worth of compute to train.

GPT-3 is the flagship product of OpenAI, an AI research company founded with the vision of creating artificial general intelligence (AGI). Their mission is to ensure this AGI benefits humanity as a whole. OpenAI started as a pure non-profit but later pivoted to a commercial model with a $1B investment from Microsoft. Its ongoing commitment to openness is, shall we say, (cough) unclear.

Why is it such a big deal?

Natural language AI models based on neural nets have been around for quite some time[2], but there are several reasons why GPT-3 represents a big leap forward in capabilities.

Firstly, GPT-3 is versatile and task-agnostic. It ingested a huge variety of text during training, without tuning for any particular domain. This makes it adaptable to a wide range of applications. Previous models were specialized for certain tasks like translation or text summarization.

Secondly, GPT-3 can mimic styles of text and express novel concepts far beyond what was previously thought possible. It can easily write haikus, rap songs or Shakespearean sonnets. It can generate text that convincingly copies the style of a famous philosopher like Bertrand Russell, or even a less famous philosopher[3] like Taylor Swift. Even more surprising, GPT-3 can take simple prompts and express completely new concepts it was never directly trained on[4].

Thirdly, GPT-3 approaches skills previously thought to be distant frontiers of AI. Tasks like 2-digit arithmetic, completing analogies and logical reasoning were seen as far beyond the capabilities of current systems. Yet GPT-3 exhibits basic competence at many of these. Some are even emergent in a zero-shot setting, i.e. you don't even need to show it examples of what you want it to do first. This raises profound questions about the nature of intelligence and how close AI systems are to more advanced reasoning.
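To illustrate the zero-shot versus few-shot distinction, here is a sketch of the two prompt styles. The strings below are illustrative only, loosely echoing the well-known translation example from the GPT-3 paper; they are not real evaluation data.

```python
# Zero-shot: describe the task, give no worked examples, and let the model complete it.
zero_shot_prompt = "Translate English to French:\ncheese =>"

# Few-shot: show a handful of worked examples before the query to be completed.
few_shot_prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
```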

Finally, and perhaps most importantly, GPT-3's generated text appears to be remarkably human-like and coherent. Its output is indistinguishable from human-generated text in many conversational settings. The Turing Test (producing responses indistinguishable from a human's) was until recently seen as the ultimate test of an AI. GPT-3 appears to have soared past this mark with nary a backwards glance.

Philosophical implications

Given its impressive mastery of language, GPT-3 provokes deep philosophical questions:

If an AI system can generate coherent text, does this entail some form of understanding, or is it just a "meaningless word salad" that happens to accidentally mimic human language? This question strikes at the heart of debates about understanding and consciousness: what is the difference between this and how a human brain works, and does it matter? GPT-3's ability to pass constrained Turing tests raises a further question: should we declare victory in AI based on this older benchmark, or does it suggest we need fundamentally new tests to assess true intelligence?

Since GPT-3 merely reflects its training data, it inherently perpetuates the societal biases therein. Do we really want to train AIs based on the depravity that exists on the internet? Without careful controls, could the spread of systems like GPT-3 exacerbate issues like gender and racial prejudice? This speaks to the urgent need for ethics and oversight in AI development.

Looking ahead, GPT-3 demonstrates a versatility and fluency with language that represents a new layer in the tech stack with immense disruptive potential. If GPT-3 gets better at tasks that were the traditional domain of humans, what will the impact be on jobs? Will new roles emerge from this technology, or will it render existing skills obsolete? This question applies to any profession where language is the intermediate medium for data processing (hint: that encompasses a lot of knowledge work).

The future of natural language AIs

Despite its prodigious language skills, GPT-3 seems unlikely to be the final model or architecture when it comes to natural language AI. Its outputs still seem brittle and lacking deeper comprehension in many cases. We may need more radical innovations to achieve human-level artificial general intelligence. Many of the incredible examples we see today are clearly cherry-picked results from within a set of less impressive answers. GPT-3 provides a glimpse of the near future, but not necessarily the end game.

Yet this makes GPT-3's achievements all the more impressive and exciting. Its versatility could unlock a productivity bonanza in the near term. GPT-3 represents one of the most thought-provoking AI systems created to date, both for its tangible potential applications today and for its role in advancing us towards an artificial general intelligence tomorrow.

I, for one, welcome our new transformer overlords.


  1. I discussed GPT-3 in August 2020 on this podcast episode https://startupification.fm/episodes/20-gpt-3. This article was co-authored by AI based on the transcript of that episode. It represents my views and knowledge at the time of recording - not necessarily my views and knowledge now (July 2023) - and as such I have dated the article to the time of recording. Notably, we didn't get our first glimpse of ChatGPT until over 2 years later.
  2. One of the earliest and most influential neural network models in the field of natural language processing is Word2Vec, developed by researchers at Google and introduced in 2013.
  3. Arguably
  4. A 'gangster rap in the style of Bertrand Russell about the Crufts dog show' was unlikely to be in the training data set but will be straightforward for GPT-3.