2022 has been called the “year of AI” by many media outlets (Forbes, Washington Post, and many others). In particular, ChatGPT has sparked both the imagination and the fears of the public. I personally find that many misunderstand what ChatGPT actually is and instead get caught up in what it seems to be. That is why I am writing this blog piece: to do my part to help clear up these misconceptions. ChatGPT is a powerful landmark AI application that garners both undue fear and exaggerated claims – I hope that you, dear reader, will be better placed to navigate between these two extremes after reading this article.
What is ChatGPT?
ChatGPT (Chat Generative Pre-trained Transformer) is a large language model (LLM) developed by OpenAI for conversation modeling. What the heck is a large language model?
First – what’s a language model? A language model is a model that is shown many examples (“trained”) of how to continue a sentence. (What is a model you ask? I will return to the word “model” in a bit.) For example, if I typed:
At the stroke
a language model could continue it as:
At the stroke of midnight [70%]
At the stroke of a pen [30%]
where the percentages reflect how often those sequences of words occur. Your Google searches, smartphone, and email applications already incorporate language models to do this. This is because these language models have been shown enough text (remember – Apple, Google, etc. have so much data on you) that they have learned which words are likely to follow others. But what if the sentence before “At the stroke” contained the name “Brian Gutekunst” (the General Manager of the Green Bay Packers, for the non-sports fans out there), or “Cinderella”? How does that change how a language model generates text? If the previous sentence contains the name “Brian Gutekunst”, the continuation is more likely to be:
At the stroke of a pen, Brian Gutekunst made Aaron Rodgers the highest paid quarterback in the NFL
However, if the previous sentence contains “Cinderella”, the continuation would more likely be something like:
At the stroke of midnight, Cinderella’s carriage turns into a pumpkin
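To make the idea concrete, here is a toy sketch of next-word prediction using raw frequency counts. This is nothing like how ChatGPT actually works (real models use neural networks trained on billions of sentences), and the three-sentence “corpus” is entirely made up for illustration:

```python
from collections import Counter, defaultdict

# A tiny, made-up training corpus (real models see billions of sentences).
corpus = [
    "at the stroke of midnight the carriage turns into a pumpkin",
    "at the stroke of midnight the clock chimed",
    "at the stroke of a pen the deal was done",
]

# Count which word follows each three-word context.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 3):
        context = tuple(words[i:i + 3])
        follow_counts[context][words[i + 3]] += 1

def continuation_probs(context):
    """Estimate P(next word | context) from raw frequencies."""
    counts = follow_counts[tuple(context.split())]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

print(continuation_probs("the stroke of"))
# -> {'midnight': 0.666..., 'a': 0.333...}, mirroring the 70%/30% example above
```

Even this crude counting scheme captures the core intuition: a language model assigns probabilities to possible continuations based on what it has seen before. What LLMs add, as discussed next, is the ability to condition on much longer and richer context than three words.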
Language models that take context into account, like the examples I gave above, are called “Large Language Models” – LLMs. So – ChatGPT is an LLM that has been trained for conversation modeling – how to continue a conversation. I should mention that ChatGPT is more specific/narrow in scope than OpenAI’s GPT-3 (Generative Pre-trained Transformer – 3rd generation), which was released in June 2020. If you are interested in the differences between ChatGPT and GPT-3, feel free to reach out to me. For the purposes of this blog, GPT-3 “can do more things” than ChatGPT. (Also, GPT-4 is coming later this year.)
So, basically, ChatGPT is an LLM “fine-tuned” from GPT-3 to produce text in a dialogue system (the “Chat” in ChatGPT) that meets human standards. OpenAI did this by scraping internet conversations and bringing humans (specifically, a bunch of hourly contractors they hired) into the loop to provide feedback on which possible word continuations would be more convincing to a human.
ChatGPT Can’t Reason
ChatGPT sounds pretty cool, and in many cases, it is! But it also has many, let’s say, weaknesses that I will get to later. I’ve personally noticed that much of the fear and many of the exaggerated claims around ChatGPT are rooted in misinterpreting the impressive chat modeling as demonstrating some sort of human-like reasoning. I suppose two reasons contributing to the misunderstanding are that A) the dialogue interactions can be convincing and B) ChatGPT, like most other LLMs, is built with algorithms called neural networks that are loosely inspired by the connections between neurons in the brain.
ChatGPT cannot reason. It cannot do math. It cannot solve logic puzzles. It cannot invent new knowledge. It’s definitely not sentient. Ask it standardized math questions for high school students – specifically AMC10/AMC12 – and watch it fail horribly. My favorite is this hilarious example from AI luminary Andrew Ng, where ChatGPT argues that an abacus is faster than a GPU. (Which is equivalent to saying an abacus is faster than a PlayStation 5.) The conversation itself is complete nonsense.
ChatGPT isn’t close to demonstrating human-level reasoning, and I am personally skeptical that the AI field is even close. Both OpenAI and DeepMind want to build machines with human-like reasoning capabilities through research in what has been coined “Artificial General Intelligence” – AGI. But there is no universal agreement on how AGI should be defined or how reasoning should even be quantified at the level of a machine. The community is split on whether AGI should even be a goal of the field, which you can read more about in this excellent discussion at MIT Technology Review. How can a scientific community make progress if it can’t agree on its foundational premises and most pressing problems? It will take some time to cut through the current level of confusion around AGI. At the other, skeptical extreme, I’ve heard the claim that ChatGPT simply memorized every interaction on the internet. This is definitely not the case – it is mathematically impossible. Information theory tells us that an LLM memorizing all the text on the internet would take more space than storing the entirety of the internet in compressed form.
ChatGPT Approximates, Which is Very Useful
So if ChatGPT is not reasoning, nor is it memorizing, what is it doing, exactly? It’s approximating. Specifically, what makes neural networks so powerful is that they are fantastic function approximators. I suppose “buy our awesome function approximator” isn’t a great marketing strategy, and it’s perhaps hard to capture the imagination of the media and the general public that way. But just as the Force gives a Jedi its power, the Universal Approximation Theorem (which is quite deep and profound) gives neural networks theirs. “Oh no – not a math lecture,” you may be saying, dear reader, but stay with me; I promise I’ll be brief, and cute animal pictures are coming. By the way – for the purposes of this blog you can consider “function approximations” and “models” (remember Large Language Models, LLMs, from earlier? Of course you do) as synonymous.
Much of science, engineering, economics, and finance have functional relationships that can be explicitly written down. For example, the market price of a house usually exhibits a linear relationship with the size of the house:
You can easily fit a straight line to the above example. But what if you want to write a function to automatically recognize pictures of cats and dogs?
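The house-price case really is this simple: fitting the line takes a few lines of ordinary least squares. The sizes and prices below are made-up illustration numbers, chosen to lie exactly on a line:

```python
# Ordinary least squares for one feature, done by hand (no libraries needed).
# The house sizes and prices below are made up for illustration.
sizes = [1000, 1500, 2000, 2500, 3000]                   # square feet
prices = [200_000, 300_000, 400_000, 500_000, 600_000]   # dollars

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# slope = covariance(size, price) / variance(size); the intercept makes
# the fitted line pass through the point of means.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / \
        sum((x - mean_x) ** 2 for x in sizes)
intercept = mean_y - slope * mean_x

predicted = slope * 1800 + intercept
print(f"price ≈ {slope:.0f} * size + {intercept:.0f}; 1800 sqft -> ${predicted:,.0f}")
# -> price ≈ 200 * size + 0; 1800 sqft -> $360,000
```

The point of the contrast: for a linear relationship like this, the functional form can be written down explicitly. For “is this a cat or a dog?”, nobody can write the function down, which is exactly where neural networks come in.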
How do you even begin to write a function (or – if you prefer – build a model) to do this? With neural networks, you don’t have to: a network will approximate the function for you if shown enough examples, i.e., it is “trained”. At the risk of oversimplifying, there are two broad model classes, discriminative and generative:
In our cats and dogs example, a discriminative neural network model will approximate a function that separates cats from dogs. For a brilliant illustration and satire, Mike Judge nails it in this short and hilarious clip from HBO’s Silicon Valley. A generative neural network model will attempt to learn the underlying probability distribution of cats and of dogs separately. This is where the “G” in “ChatGPT” comes from – Chat Generative Pre-trained Transformer.
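Here is a toy sketch of that discriminative/generative distinction, stripped down to one made-up numeric feature (call it a “whisker length” score) instead of actual pictures. The data and feature are entirely hypothetical; the point is only the contrast between the two approaches:

```python
import math

# Made-up 1-D feature values for the two classes.
cats = [4.0, 4.5, 5.0, 5.5, 6.0]
dogs = [7.0, 7.5, 8.0, 8.5, 9.0]

def gaussian_fit(xs):
    """Generative view: model each class with its own Gaussian."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

cat_mu, cat_var = gaussian_fit(cats)
dog_mu, dog_var = gaussian_fit(dogs)

def classify_generative(x):
    # Compare p(x | cat) against p(x | dog), assuming equal priors.
    if gaussian_pdf(x, cat_mu, cat_var) > gaussian_pdf(x, dog_mu, dog_var):
        return "cat"
    return "dog"

# Discriminative view: skip modeling each class and learn only the boundary.
# For equal-variance Gaussians, the best boundary is the midpoint of the means.
boundary = (cat_mu + dog_mu) / 2

def classify_discriminative(x):
    return "cat" if x < boundary else "dog"

print(classify_generative(4.8), classify_discriminative(8.2))
# -> cat dog
```

The generative model learns what each class “looks like” (a distribution per class) and can, in principle, generate new samples from those distributions; the discriminative model only learns the dividing line. ChatGPT is in the generative camp: it models the distribution of text itself.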
So what exactly is ChatGPT approximating? It’s approximating conversations from both the internet and transcriptions from OpenAI contractors, emulating the style and word choices of these conversations. ChatGPT guesses what words and phrases should come next in a conversation, assigning a probability to each candidate word/phrase based on the conversations it has previously seen. We don’t know for sure exactly what data was used in training, but it is some combination of guided dialogue transcriptions sourced from OpenAI’s contractors and data gathered from the internet (the “Common Crawl”), filtered for internet conversations from sources such as Twitter, Reddit, Stack Overflow, and Stack Exchange. This is the “P” in ChatGPT: Chat Generative Pre-trained Transformer.
So what is ChatGPT useful for? Generally, neural network models are useful for automation: making difficult or mundane tasks much easier/faster. (As an “elder millennial”, I remember the days of physical maps and MapQuest. The seamless integration of models in Google Maps has automated navigation, for example.) As ChatGPT is currently the most sophisticated conversation modeler, it’s immediately useful for automating many use cases revolving around question answering (Q&A). However, since conversation modeling is quite broad, it can also be used for many things outside of Q&A. Some early applications include:
- Customer Service and Triage. Customer service is one of the bread-and-butter applications of Q&A. There are already many chatbots in the wild for automated customer service, and they will likely get more sophisticated and ubiquitous as LLMs such as ChatGPT mature.
- Content Creation, such as outlining and form letters. ChatGPT has seen many outlines during training. So, ask ChatGPT to create a structure of an article, or a book, or a thesis, or a proposal, or a letter of intent, and it will do quite well. This also works for the complete content of things like form letters.
- Code snippets. This seems surprising, but remember that since ChatGPT approximates conversations from its training set, it has seen many lines of code on Stack Overflow, which also links to open source programming documentation. Ask ChatGPT to write code that corresponds to the type of code that exists in such documentation and it will do quite well at reproducing it. The actual code may not do what you want it to do (indeed, Stack Overflow banned generated code for this reason), so you should test/edit the generated code in small pieces. Still, this will change programming education, possibly lowering the bar to entry. Which leads me to…
- Education. As ChatGPT is very good at drafting content (see the content-creation and code-snippet items above), the applications to education are obvious. However, this is currently quite controversial.
- Domain-specific assistants. This is beyond the current scope of ChatGPT, but LLMs have a lot of future promise as domain-specific assistants.
- Language Assistants. ChatGPT isn’t there yet, but the components are there to make Babel Fish a reality.
- Generating Better Product Copy. Related to content creation above; I do see ChatGPT being used for generating product descriptions that make products easier to sell.
ChatGPT is still new, so there are likely to be other innovative applications.
Is ChatGPT a Glorified BS Artist? Weaknesses, Pitfalls, and the Future
The limitations and pitfalls of ChatGPT are many. I’ll start here:
- The model does much better in areas well represented as digital text (computer code, politics, science) than in areas that are not. The more sentences there are of a topic in the training set, the better ChatGPT is able to generate responses. (The more aligned the conversation is with the “P” in ChatGPT, the better it does) Remember – ChatGPT is a very sophisticated function approximator, and it can only approximate responses based on the data it has seen.
- ChatGPT will reproduce misinformation from any of its input sources — it is not an intelligent system that tries to balance or weight different perspectives.
- Because the choice of words captures tone and reveals biases, ChatGPT will tend to reproduce the tone and bias of the articles in its input corpus. Confident, scientific, or racist, it will reproduce anything.
- Speaking of bias – as ChatGPT is a function approximator (sorry for repeating myself), the bias-variance tradeoff is unavoidable.
- The sources of individual fragments are lost. This is not information retrieval. (And because this is not information retrieval, declarations regarding the end of Google are premature)
- Generative models (the G in ChatGPT) are notoriously hard to verify.
What does one do to verify the truth of responses from ChatGPT? How do you handle its bias? Since ChatGPT cannot do information retrieval, how can you verify its sources? These are serious questions. What do you call a system that sounds confident about what it says regardless of whether what it’s saying is true? Harry Frankfurt wrote an entire book cautioning against such a person – the Bullshitter. Some folks react to this fact with a hint of nihilism – Wired Magazine argues that of course ChatGPT is a BS artist because the internet is inherently BS. Blind trust in ChatGPT can cause a lot of confusion and harm: perpetuating harmful stereotypes and biases, increasing political polarization, leading business leaders to make costly, ill-informed decisions, and enabling scam artists to rip off the unsuspecting public, which is already happening.
LLMs such as ChatGPT are here to stay. What does that mean moving forward? ChatGPT is clearly disruptive, with many useful applications, but I want to leave three parting thoughts about ChatGPT moving forward. My first comment is a sobering reality check: we will inhabit an AI future that offers the most significant returns to investors. Microsoft just invested 10 billion dollars, taking a 49% stake in OpenAI. (For a company that started as a nonprofit, how times have changed.) My second comment is that I hope more leaders will speak out on the ethical concerns of LLMs specifically and AI in general moving forward. I am encouraged that Demis Hassabis of DeepMind spoke to Time Magazine yesterday urging caution. I anticipate that the messaging may be mixed moving forward, but I hope more leaders follow in Demis’s footsteps. My third and final comment is that I advocate what Wired Magazine calls “AI supply chain transparency” to build public trust. It is not a panacea, but it’s a place to start.
Post Script: A Cheers to Google.
To quote Isaac Newton – “If I have seen further [than others], it is by standing on the shoulders of giants.” The “T” in ChatGPT stands for Transformer, the architecture invented by Google Research in their landmark 2017 paper, Attention Is All You Need – one of the most influential AI discoveries in the history of the field. Large Language Models are possible in large part due to their work. So, cheers!