Generative AI

What is generative AI? How does it work? What are the implications?

ChatGPT, Bard, and other large language models (LLMs) have proliferated at an unprecedented rate compared with the rollouts of social media or even internet search. Within just two months, ChatGPT grew from zero to over 100 million users, and it now attracts over 1.8 billion visits per month. By comparison, TikTok reached 100 million users after roughly nine months, and Instagram took about two and a half years to hit that milestone. Why has ChatGPT been so much more successful in attracting users? Before answering this question, we first answer another, more fundamental one: what, exactly, is ChatGPT?

ChatGPT is a form of language model structured to make accurate associations between words, detecting not just patterns but patterns that coincide with the meaning of language. The models themselves are built using layers, different sections that each perform a specific function. For example, many models have an encoding layer, which converts words into a numerical representation (specifically, a vector). Once text has been encoded, it is often passed to an embedding layer, where the information is transformed into a "condensed" representation that typically carries some context. For example, if the encoded input has five (5) components, the embedded output might have two (2). This representation captures information about syntax and semantics given the words that come before and after whatever word or phrase is the point of focus; in other words, it can represent meaning and context. Another layer, called an attention layer, helps the model focus on the parts of the text that are most relevant given the context. Identifying this relevance is similar to getting the gist of a query and enables a more accurate response. Yet another layer, a feedforward layer, applies a mathematical function to the representations created by the embedding layer, enabling the model to learn additional context, or 'meta-context', also referred to as high-level abstractions. And finally, recurrent layers enable the model to learn dependencies in text. These models are forms of deep learning models (so named for their many layers of processing), whose design was originally inspired by the human brain. But while these models can process an incredible amount of information and, in some domains, recognize patterns as well as or better than humans, their architecture does not line up neatly with how our brains work. Surprisingly to many, these types of models were not invented by OpenAI, but OpenAI's ChatGPT is the first such model to achieve commercial success and swift adoption.
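
To make the layered structure described above concrete, here is a minimal, illustrative sketch in Python of two of those pieces: a single-head self-attention step and a feedforward step applied to a handful of token embeddings. It is a toy under simplifying assumptions, not the architecture of any particular model; the weights are random and untrained, and real transformers add learned query/key/value projections, multiple heads, residual connections, and normalization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Toy single-head self-attention over a sequence of embeddings X (seq_len x d)."""
    d = X.shape[-1]
    # In a real model, queries, keys, and values come from learned weight matrices;
    # here we use the embeddings directly to keep the idea visible.
    scores = X @ X.T / np.sqrt(d)        # how relevant each token is to every other token
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 for each token
    return weights @ X                   # each output is a relevance-weighted mix of the inputs

def feedforward(X, W1, b1, W2, b2):
    """Position-wise feedforward layer: a small neural network applied to each token."""
    return np.maximum(0, X @ W1 + b1) @ W2 + b2   # ReLU nonlinearity between two linear maps

# Toy example: 4 tokens, 8-dimensional embeddings, random (untrained) weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)

out = feedforward(self_attention(X), W1, b1, W2, b2)
print(out.shape)   # (4, 8): same shape as the input
```

Because each layer returns output of the same shape as its input, layers like these can be stacked many times over, which is where the "deep" in deep learning comes from.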

ChatGPT and other chatbots are built on a specific kind of model called a transformer. These models generate one word at a time, in a probabilistic manner, based on the words that came before. For example, consider the sentence: "Upon seeing his mother for the first time in years, Brady ran into her arms and gave her a huge _____". The final word could be kiss, hug, smile, and so forth. The model can quickly match this pattern against the vast data it has been trained on and return the highest-probability word, in this case, hug. How do these models do this? Within the model are many components, which are often compared to the members of a massive orchestra (see Calin Cretu's highly accessible explanation here: An ML Engineer Explains ChatGPT). Each musician can be thought of as a "node" that contributes a "weight", which can either increase or decrease the strength of a signal. When an orchestra plays together for the first time, the result may not be, well, music to our ears, but as each "node" learns and adjusts, the orchestra turns into an incredibly well-coordinated, well-oiled machine that can produce wonderful symphonies. Likewise, LLMs require a lot of training (or learning) before they become accurate. As Calin notes in her article, transformers can have many, many layers that can "be stacked on top of each other, run in parallel, merged, and so on" to generate the most accurate output possible. The training process for many LLMs typically involves guesses by the model, which are compared with correct responses (supervised learning) so that the model is "rewarded" for a correct response and not rewarded for an incorrect one. If a response is incorrect, the weights across many nodes are adjusted until the correct answer is produced.
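
As a rough illustration of that word-by-word, probabilistic process, the sketch below assigns made-up scores (logits) to a few candidate next words for the sentence above and converts them into probabilities with a softmax. The candidate list and the numbers are invented for illustration; a real model scores its entire vocabulary at every step.

```python
import numpy as np

# Hypothetical scores (logits) a model might assign to candidate next words for
# "...Brady ran into her arms and gave her a huge ___". Values are made up.
candidates = ["hug", "kiss", "smile", "sandwich"]
logits = np.array([6.2, 4.1, 3.0, -1.5])

# Softmax turns raw scores into a probability distribution over the candidates.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

for word, p in zip(candidates, probs):
    print(f"{word:8s} {p:.3f}")

# "hug" receives the highest probability; taking the argmax (or sampling) yields the next word.
print("prediction:", candidates[int(np.argmax(probs))])
```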

From Input to Output

The specific transformer architecture used by GPT is complicated and requires some technical background to follow fully, but it is described in the original published papers and on the OpenAI blog. We'll break down the process step by step, using the original GPT for simplicity. First, a word is encoded. Encoding simply means giving a word a value (a number) that corresponds to it. For example, we can assign "the" to 1, "it" to 2, and so forth. GPT models use vocabularies of tens of thousands of entries; GPT-2 and GPT-3, for example, use 50,257. Rather than always encoding full words, these models encode sets of characters, a process often called tokenization: all text gets transformed into tokens, which can be full words or parts of words. If we examine one token at a time and view its associated vector, we'd see a vector with tens of thousands of zeroes and a single 1 somewhere. The embedding process takes that giant, mostly-zero vector and represents it with far fewer numbers. How many numbers, and how do we determine how many? Each number can be conceived of as a dimension, or property, of a word (these might correspond to concepts we recognize, like "positive", or to more abstract concepts that we would not). A model can have any number of dimensions; GPT-3, for example, uses 12,288. In the model, the embedding layer is a matrix of learned weights; multiplying the encoding by this embedding matrix picks out the corresponding row, a dense vector of numbers that represents the token. Note that in language, order matters, so the model also encodes each token's position in the sequence.
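
The sketch below walks through those steps on a tiny, made-up vocabulary: a token is encoded as a mostly-zero (one-hot) vector, the embedding matrix condenses it into a short dense vector, and a simple positional signal is added so that word order is preserved. The vocabulary, dimensions, and positional formula are all simplified stand-ins for what real models use.

```python
import numpy as np

# Tiny invented vocabulary; real models use tens of thousands of subword tokens.
vocab = {"the": 0, "it": 1, "hug": 2, "huge": 3, "a": 4}
vocab_size, d_model = len(vocab), 3   # real models use thousands of dimensions (e.g. 12,288)

rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(vocab_size, d_model))  # weights learned during training

def one_hot(token_id, size):
    """Encode a token as a vector of zeroes with a single 1."""
    v = np.zeros(size)
    v[token_id] = 1.0
    return v

token_id = vocab["huge"]
encoded = one_hot(token_id, vocab_size)      # mostly zeroes, with a 1 at the token's index
embedded = encoded @ embedding_matrix        # picks out one row: a short dense vector

# Order matters, so a positional signal is added to each token's embedding.
# (This formula is a simplified stand-in for the sinusoidal encoding real models use.)
position = 5                                 # e.g. the 6th token in the sentence
positional = np.sin(position / (10.0 ** (np.arange(d_model) / d_model)))
model_input = embedded + positional

print(encoded)      # e.g. [0. 0. 0. 1. 0.]
print(model_input)  # the small dense vector the rest of the network operates on
```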

LLMs are not exactly new; these models have been around for several years and, until recently, had success only in a limited set of domains. What is different this time? Why only now are we seeing a phase transition in AI? While OpenAI has its own proprietary model (as do Google, Facebook, and other companies), the major differences between current models and past ones are: 1) the amount of data the model has "consumed", or been trained on; that is, the model is fed just about all the catalogued information available on the internet in order to refine all of its parameters (values that shape how information is represented and transformed as it passes from layer to layer in the model); and 2) the number of parameters, which is massively larger than in any prior model: roughly 175 billion for GPT-3 (GPT-4 is rumored to have trillions, with some claims reaching 170 trillion), compared with predecessors that had hundreds of millions to a few billion parameters. The breakthroughs appear to occur somewhere between 50 and 100 billion parameters, so long as enough training data has been fed into the model. Both factors matter greatly for the performance of LLMs: a model with 70 billion parameters can outperform one with 175 billion parameters if the former has been trained on significantly more data.
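
To give a sense of where a number like 175 billion comes from, here is a back-of-the-envelope estimate using the configuration reported for the largest GPT-3 model (96 layers, a model width of 12,288, and a vocabulary of 50,257 tokens). The 12 × width² term per layer is a common rough approximation for the attention and feedforward weight matrices, not an exact accounting.

```python
# Rough parameter count for a GPT-3-style transformer.
n_layers, d_model, vocab_size = 96, 12_288, 50_257

per_layer = 12 * d_model ** 2                 # approx. attention (4*d^2) + feedforward (8*d^2)
embeddings = vocab_size * d_model             # token embedding matrix
total = n_layers * per_layer + embeddings

print(f"{total / 1e9:.0f} billion parameters")  # ~175 billion
```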

The result is that ChatGPT can be thought of as a reflection of the language created (mostly) by humans that resides on the catalogued internet. It is capable of reproducing what it has seen: it can mimic the words that indicate emotion, including anxiety, stress, even elation or worry, because it has seen countless examples of such expression across an incredible array of circumstances and voices. Humans function through language, so providing a sophisticated entity like an LLM with a nearly endless catalog of language is akin to handing it the keys to human essence. It should not be surprising, then, that given enough training and complexity, the model can produce language that is, for the most part, indistinguishable from human-generated content. And further, since the model is trained in our likeness, many people assign sentience to various models (however, for many reasons, any form of potential sentience 1) is unlikely; and 2) would be fundamentally different from human sentience, making it both difficult to define and even harder to recognize with confidence).

ChatGPT and other chatbots also now serve as a different kind of conduit to information on the internet. While most of us still use Google's search engine, a growing number of people are beginning to default to Bing, ChatGPT, and other chatbots as the sole source of information for many searches. Generative AI programs have a distinct advantage in that they can replace most of the search process. While both chatbots and Google search require input from the user, the output generated is fundamentally different. Google simply requires more effort from the user. In addition to generating a query, the user is faced with the task of assessing and selecting from among the results. This process requires at least some (even if cursory) evaluation of the source and form of the content, to ensure that the results match what the user expected. In contrast, chatbots do away with the evaluation and vetting of sources and results altogether, providing a single response that presumably represents the best answer to the query.

Google, Generative AI and Psychology

The point at which the processes of searching, assessing, and obtaining information diverge, vis-à-vis Google or chatbots, is the point at which psychologists stick a proverbial pin. Limited work, including that done by researchers at CBMS, has identified the patterns of behavior involved in Google-style and "menu"-style searching, which begin either with a typed query (Google style) or with a selection from among prepopulated queries (menu style). Prior beliefs about a topic or person, which can be related to many factors, including membership in groups that bear on one's identity and how the person or topic is treated broadly in the media, influence how search is performed. For example, if one is inclined to believe that genetically modified foods are harmful, search queries reflect that bias. Or, if one is inclined to believe that vaccines are dangerous, again, search queries are likely to reflect that bias. Is the same true for ChatGPT and other chatbots? To get at this question, we need to understand how Google search and ChatGPT (and other chatbots) differ.

Google is essentially a giant library with its own categorization system. So long as Google has catalogued a website, that website is a candidate to turn up in search results. However, unlike a real library (in the historical sense), the results do not come in a simple order, such as alphabetical. Google's results emerge from a process that is not fully transparent to users or, for that matter, to non-Google developers. The results are sorted and displayed as a function of "relevance". Several factors contribute to Google's concept of relevance, and in many instances the idea works very well and is not controversial. If we are traveling and wish to search for nearby restaurants, Google will attempt to determine our location, find restaurants within a reasonable distance of that location, and return those results. In this instance, relevance is easily defined. But what about cases in which the search isn't so black and white? Google attempts to infer the searcher's intent, based on the keywords and on similar searches by other users, and sifts through billions of webpages to return results that reflect that intent. At this point the user views the search results, which can run any number of pages in length. The user is then free to select from among the displayed choices or to attempt another search. Here, Google hands responsibility over to the user to take the next step: content selection and consumption. Depending on the nature of the query, the user may see highly similar information among the results, or, in some cases, potentially conflicting information from different pages. Note, however, that simply presenting a set of results in a particular order often has consequences of its own, which are covered in our How Search Works article.

Chatbots remove some of the steps required in traditional search. In fact, chatbots become our proxies, conducting the entire search process post-query. When we do this, we give up some of the control that we can have (but do not always effectively exercise) with Google-style search. We put all of our eggs in one proverbial basket; that is, instead of receiving what amounts to a menu in Google search, we rely on the chatbot to selectively filter and sift through content to produce a single text response in a single voice. Like Google, the AI decides what is relevant, but the information we are exposed to is drastically different. Google hands us the original voices from existing webpages, sorted in the order the algorithm believes will be most useful for us. Chatbots summarize any number of sources into a conversational answer, even making affirmative or negative claims, in their own voice. This is perhaps the definitive difference between these platforms: Google, while centralized in the same sense that a particular library could be considered centralized, offers access to many different sources, at times with differing perspectives and viewpoints (however, Google's apparent strength in this regard is also its weakness; see our piece on How Search Works for more). In this sense, information sources remain highly decentralized, even as Google controls which pages are featured and more likely to receive attention (outcomes that can be manipulated through search engine optimization, or SEO). In contrast, chatbots simply provide one answer, swinging the pendulum firmly back into the centralization camp. Thus, with generative AI, we are explicitly relying on the model to provide the one best answer to our query. Sometimes we are provided with source information, sometimes not. Nonetheless, these AI systems are likely to have profound implications for how we search for and interact with information, both of which affect important psychological functions and outcomes.

As with the introduction of Google and the plethora of other digital applications and platforms that came to dominate daily life, the U.S. and most other countries have no standard training or guidelines for use. While Google and OpenAI provide descriptions of their technologies, most users are far more concerned with convenience and simply using these platforms than with understanding the benefits and risks, including the assumptions being made as answers and output are accepted, used, or committed to memory. Accepting assumptions, which in the case of Google and generative AI are many, is a cognitive decision, even if we are not consciously aware of participating in the decision process. Our brains make prolific use of heuristics, or shortcuts, to help us navigate the world, including both our physical and non-physical environments, such as our information environments. Among the major assumptions most of us accept are that generative AI is expert, that we should accept its output at face value, and that we should place it at the top of the ladder as a source of authority on almost any matter. We then take the output as authoritative, often committing it to memory as if we had read it in original source material.

The issue, however, is not that we are making implicit assumptions. Heuristics can be very efficient, saving us time, and for many endeavors they keep us on the right track and help us achieve our goals. The issue is that we often do not realize we are deploying them; we are not even aware of the assumptions we are making as we navigate various platforms. We simply take for granted important considerations, such as reliability and accuracy, and assume that the models are in fact providing a comprehensive representation of the best available information. In other words, we are quick to adopt new technologies and platforms without a shred of critical thinking regarding how they work, how responses are generated, how our input relates to the output, and the related implications, among other things. Chatbots can provide quick and generally reliable information, but they do not provide, and cannot replace, original source material.

Another overlooked aspect of using generative AI relates to the centralized nature of these platforms. If we begin to rely solely on chatbots for our information queries, we will reduce the diversity of presentation in our information diets. Further, we reduce our capacity to discover the individual voices that make up most of the information floating on the web. In contrast, traditional web search provides an opportunity to discover authors and experts who differ in style and manner of communicating information. In this sense, traditional search contains more of a social component, with potential for either social or parasocial connection with real human beings. Chatbots, at least in their current instantiation, do not provide this social possibility with real humans to the same extent, or at all. Instead, chatbots offer what many have called synthetic relationships (Synthetic humanity), which are often defined as being exclusively parasocial. Parasocial relationships are one-way: people may feel a connection to, say, an influencer on social media, but the influencer rarely, if ever, develops a real, meaningful two-way relationship with the follower. Relationships with chatbots are similarly one-sided, in that chatbots such as ChatGPT do not have explicit goals, emotions, or feelings that they ruminate on and that might prompt them to seek out interaction. Chatbots can, however, mimic human interests and emotions to such an extent that people will perceive the chatbot as a bona fide companion. Since we have no precedent to draw upon, the psychosocial consequences of the proliferation of synthetic relationships are highly uncertain.

The implications of moving back towards a centralized information ecosystem (generative AI in the form of ChatGPT and others), after some two decades of a shift towards a decentralized and highly fragmented system (search engines), have not been examined in detail. However, to the extent that more citizens begin using chatbots that provide consistent information, people may increasingly have access to, and draw from, a more similar information base. Among the potential benefits are increased agreement on facts and a sturdier foundation from which trust can be rebuilt. Trust, not just between citizens and the government but between citizens themselves, has eroded since the 1960s (Trust, a history). The fragmented information ecosystem, coupled with the dominant economic model that emerged as the web matured, incentivized the stoking of division, which further eroded trust in recent decades. Distrust among citizens has been linked to less willingness to engage in community and social activities, which can result in fewer and less meaningful social connections. Social connections and support are vital for our sense of well-being and for our physical and mental health. Could finding common ground via a shift in our information ecosystem recalibrate the nature of our interactions? And could increasingly starting our interactions from more common ground rehabilitate our sense of shared reality? If trust can be reestablished between citizens at a higher rate, the declines in meaningful social engagement may be reversed. The results of research targeting these questions may have far-reaching implications for repairing our increasingly fractured, distrusting society.

We believe that education efforts explaining these technologies, including how they work, how people use them, and how to compare and test their output against source information, among other aspects, should be mandated, especially for children and adolescents. Younger people are spilling themselves out onto every digital street, safe and unsafe, with very little training or guidance regarding the benefits or risks, and little understanding of how interacting with these technologies and platforms may affect them.
