🧠 Large Language Models, ChatGPT, and You

Our brains are organic GPUs, liquid cooled to 98.6 degrees.

Mar 09, 2023

English and math have always been two different subjects in school. But what if I were to tell you that English is math? Or, should I say, the way we compose sentences is nothing more than pattern recognition, statistical analysis, and some handy repetition. Now, I’m not going to get into the infinitely complex topic of how humans learn and use language, but I’ll pose an idea: at their core, computers are super-powered calculators. If a computer can successfully communicate with humans using written language, does that mean we can use math to speak? The answer is yes.

If you’ve been on the internet recently, you’ve heard of something called ChatGPT. The name explains a lot about the concept. GPT stands for Generative Pre-trained Transformer, which is a fancy way of saying “computer program trained to generate human-like text.” ChatGPT enables you to chat with a GPT model. It’s completely broken the internet, and the conversations can sometimes feel like we’ve finally reached the future that Terminator depicted. But they also leave you wondering, “How the heck does this GPT thing work?”

The Generative Pre-trained Transformer is many things in one. It’s Generative (it generates new text), Pre-trained (it has already seen a lot of text from various places), and a Transformer (the name of the neural network design that transforms input text into output text). GPT is one family of Large Language Models (LLMs). And a key ingredient that enables all of these LLMs to work is called embedding. We’ll start there.

An embedding model is a black box that converts words into coordinates on a graph. It can take single words, phrases, sentences, or even whole paragraphs. The goal of embedding is to be the bridge between what we read and what computers read. When you compare the coordinates of similar phrases (“dog”, “puppy”, “golden retriever”), they land near each other. Sentences need one more thing: the order of the words matters, so each word’s coordinates are also stamped with its position in the sentence, through a process called positional encoding. I can’t explain how this works, but let’s continue knowing we have this magical black box.
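For the curious, here is a minimal sketch of one published positional encoding recipe: the sine-and-cosine scheme from the original Transformer paper. Treat it as an illustration only. GPT models learn their position vectors during training rather than computing them with a formula, and the four-number word vectors and the function names below are made up for the example.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal position vector of length `dim` for one word position."""
    vec = []
    for i in range(dim):
        # Dimensions come in pairs (0,1), (2,3), ... that share a wavelength.
        angle = position / (10000 ** ((i - i % 2) / dim))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

def add_position(word_vectors):
    """Add a position vector to each word vector, element by element."""
    dim = len(word_vectors[0])
    return [
        [w + p for w, p in zip(vec, positional_encoding(pos, dim))]
        for pos, vec in enumerate(word_vectors)
    ]

# Hypothetical embeddings: the same word appearing twice in a row.
sentence = [[0.5, 0.1, -0.3, 0.8], [0.5, 0.1, -0.3, 0.8]]
encoded = add_position(sentence)
print(encoded[0])
print(encoded[1])
```

The two printed vectors differ even though the word is the same, which is the whole point: the model can now tell the first occurrence from the second.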

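We use cosine similarity to determine if two embedded vectors are similar. It measures the angle between them, so vectors pointing in nearly the same direction score close to 1.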
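Cosine similarity is simple enough to write by hand. The sketch below uses made-up three-number "embeddings" (real ones have hundreds or thousands of numbers and come out of an embedding model, not out of my head), so the variable names and values are purely hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented coordinates, chosen so the two dog words sit near each other.
dog = [0.9, 0.8, 0.1]
puppy = [0.85, 0.9, 0.15]
sudoku = [0.1, 0.05, 0.95]

print(cosine_similarity(dog, puppy))   # similar meaning, higher score
print(cosine_similarity(dog, sudoku))  # unrelated meaning, lower score
```

With these toy numbers, "dog" scores much higher against "puppy" than against "sudoku", which is exactly the "graphed near each other" idea expressed as a single number.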

When you can turn words into numbers, Wikipedia becomes a Sudoku puzzle. Researchers at companies like OpenAI (creators of ChatGPT) were able to scrape billions of lines of text from all over the internet, put them through an embedding model, and feed them into a machine learning model (the transformer) that could learn the abstract patterns we use when communicating ideas through language. That’s why these models are called Large Language Models, because they’re very large and very language.

When you ask ChatGPT something like “What is the meaning of life?”, does it actually know the answer? Sadly, the answer is no. ChatGPT doesn’t think; it graphs. In a rapid-fire sequence of linear algebra, ChatGPT takes the question you asked, encodes it, and builds a reply one word at a time, choosing each word because it is statistically likely given your question and the answers to similar questions it saw during training.

A depressing way to look at this would be to evaluate how I’m writing this memo right now. Each word I type on this paper is my brain making the best statistical choice, given the words I’ve written so far, my historical knowledge of how I should structure these words and the vocabulary at my disposal. In the end, writing is just the regurgitation of twenty-eight years of words and sentences that I’ve been exposed to, strung together in a way that satisfies some innate optimization problem wired up inside of my premotor cortex.
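The "best statistical choice, given the words so far" idea can be sketched in a few lines. This is a toy, not how ChatGPT works: it only looks at the single previous word and keeps a table of counts, where a real LLM looks at far more context and uses a neural network. The corpus and function names are invented for the example.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, length, seed=0):
    """Build a sentence by repeatedly sampling a likely next word."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:
            break  # this word was never followed by anything
        words, counts = zip(*options.items())
        out.append(rng.choices(words, weights=counts)[0])
    return " ".join(out)

corpus = "the dog chased the ball and the dog caught the ball"
model = train_bigrams(corpus)
print(generate(model, "the", 6))
```

Every word it prints was picked only because it tended to follow the previous word in the training text. Swap the eleven-word corpus for a large chunk of the internet and the count table for a transformer, and you are in the neighborhood of an LLM.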

I’ve been having a lot of fun playing with ChatGPT and the other models OpenAI has available. Learning more about these foundational technologies has been intellectually enjoyable, but it’s also pushed me to wonder about more fundamental philosophical questions. How does the world change if large language models become widespread? What does that mean for our understanding of language, consciousness, and creativity? I don’t have any definitive answers yet, but maybe I’ll find some insights by asking ChatGPT.
