The recent spate of announcements by tech titans such as Microsoft, Google, Apple, OpenAI and NVIDIA has started a serious buzz among technology gurus and business leaders. This buzz is a continuation of the overarching headlines emanating from Davos 2024, where the consensus was that AI, and Generative AI in particular (it was specifically mentioned), is the means to, firstly, transform society and, secondly, achieve greater revenues. While computer science graduates are revelling in the availability of new AI technologies, most of us are not sure what the buzz is about. Sure, we are all using ChatGPT, but how is this going to transform our lives? This article attempts to unpack the technologies associated with AI, especially Generative AI, which is at the heart of the buzz. In Part I, the technical complexities of Gen AI are unpacked; in Part II, the business use cases of Generative AI are discussed.
What is Generative AI? We’ve all heard of AI, but Generative AI? Is this something else?
To answer this, we need to take a step back and properly understand Artificial Intelligence (AI). Broadly speaking, AI can be equated to a discipline. Think of science as a discipline: within science we get chemistry, physics, microbiology, etc.; in the same way AI is a broad discipline, and within it there are several subsets such as Machine Learning (ML), algorithms that perform specific tasks, Expert Systems (which mimic human expertise in specific topics to support decision making), Generative AI, and so on.
In recent times the last-named, Generative AI (or Gen AI), has been making huge waves, especially since December 2022. On 30 November 2022 a startup outfit, OpenAI, announced the public release of ChatGPT, and since then Generative AI has become a rage. To put this into perspective, Google Translate took 78 months to reach 100 million users; Instagram took 20 months; TikTok took 9 months. ChatGPT took just 2 months to reach 100 million users! Generative AI is a big deal, folks. It may be prudent, at this stage, to briefly define the term: Generative AI refers to a type of Artificial Intelligence that generates new or original content in the form of text, images, language translations, audio speech, music, programming code, etc. It’s still early days for Gen AI; at present most Gen AI models are centred on the outputs named above (text, images, language translation). However, the range of outputs could be endless – perhaps it could include urban planning, specialised therapies, virtual church sermons, esoteric sciences and more; it will no doubt grow to eventually cover almost every aspect of human endeavour. To the question ‘is Generative AI different from AI?’, the answer is that Generative AI is a manifested form of AI, or a subset of AI, or an avatar of AI, just as chemistry is a subset of science. The general term used to describe an AI system is ‘model’; ChatGPT can be called a model.
The word ‘Chat’ in ChatGPT means just that: a conversation – voice, text, or a combination – between the user and ChatGPT. It’s useful to unpack ‘GPT’; therein, in fact, lies the technical understanding of AI and Generative AI. G stands for Generative, which has already been explained (the generation of original or new content). P stands for Pre-trained. This needs to be understood, as it’s one of the core concepts of AI. Since a machine cannot think intuitively, it can, in the AI world, be ‘trained’ to ‘think’ in a particular way on a particular subject. For example, it can be trained to translate between, say, German, English, French, Chinese and Zulu – from any one of the five to another – a translation model. Such a Gen AI model cannot tell you how fast a Ferrari can go, but it can tell you that ‘Ferrari’ comes from the Italian word ‘ferraro’, which means ‘blacksmith’ in English. This is based on ‘training’ the tool on large sets of data, using Deep Learning technologies. In order for the app to produce the output ‘he put his head on the pillow and slept’, it needs to know from its data sets about gender (‘he’), pillows, and the association with sleep (this is referred to as ‘context’). Part of the pre-training involves learning the sequence of words in the context of man, pillow and sleep. The developer keeps ‘training’ the model until it is able to spit out ‘he put his head on a pillow and slept’. From this knowledge of many such items, in context, it predicts the word that follows the preceding words. During the process of learning, it isn’t inconceivable that it could have output ‘the pillow is a tasty rice dish’ – this is called ‘hallucination’ – yup, machines hallucinate without taking drugs, folks.
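The ‘next word prediction’ idea above can be sketched in a few lines of code. The following toy example simply counts which word follows which in a tiny made-up corpus; real models such as ChatGPT use neural networks trained on billions of words, so this is purely an illustration of the concept, not of how GPT actually works.

```python
from collections import Counter, defaultdict

# A tiny, made-up training corpus (illustrative only).
corpus = (
    "he put his head on the pillow and slept . "
    "she put her head on the pillow and slept . "
    "the pillow is soft ."
).split()

# Count how often each word follows each other word (a 'bigram' model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the word most frequently seen after `word` during training."""
    return following[word].most_common(1)[0][0]

print(predict_next("head"))    # → 'on'  ('on' always follows 'head' here)
print(predict_next("pillow"))  # → 'and' ('and' follows 'pillow' twice, 'is' once)
```

Feed such a predictor its own output repeatedly and it will generate whole sentences – which is also why, with poor or thin training data, it can confidently generate nonsense (the ‘hallucinations’ mentioned above).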
The key here is that the model has to be trained on, firstly, vast amounts of data and, secondly, with meticulous attention. This leads us to another common phrase in the AI world – Large Language Models, or LLMs. In fact, ChatGPT is a Large Language Model! If we had to define an LLM, it could be defined as a next-word prediction tool. From where do the developers of LLMs get the data to carry out the pre-training? They download entire corpora of data, mainly from websites such as Wikipedia, Quora, public social media, GitHub, Reddit, etc. It is worth mentioning here that it cost OpenAI $1b (yup, one billion USD) to create and train ChatGPT – funded by Elon Musk, Microsoft and others. Perhaps that is why it is not an open-source model!
Let’s now unpack the ‘T’ of ‘GPT’. It stands for Transformer – the ‘brain’ of Gen AI. A Transformer is a machine learning model: a neural network that contains two important components, an Encoder and a Decoder.
Here’s a simple question that could be posed to ChatGPT: “What is a ciabatta loaf?”. Upon typing the question into ChatGPT, it goes into the Transformer’s Encoder. The two operative words in the question are ‘ciabatta’ and ‘loaf’. The word ‘ciabatta’ has two possible contexts – footwear and Italian sourdough bread (‘ciabatta’ means ‘slipper’ in Italian; since the bread is shaped like a slipper, it is called ciabatta).
The context in this question is provided by the term ‘loaf’, which refers to a food item – such as a loaf of bread, or a meatloaf. ChatGPT is a pre-trained model; it will therefore select the food item instead of footwear, given the context of ‘loaf’ in the question, and then further find that bread is the context to be chosen instead of meatloaf – ‘ciabatta bread’ or ‘ciabatta loaf’ is a known expression. It continues to process the words in sequence (in fact, all the words are processed in parallel) and is able to predict that ciabatta is a bread – continued sequencing is likely to spit out something to the effect of “Ciabatta is an Italian sourdough bread”. It has to be understood that ChatGPT’s answer may not always be correct, as it depends on the quality of the training and fine-tuning it has undergone. In most cases, though, the outputs are stunningly accurate – a testament to the meticulous way the model has been developed, and to a mechanism the industry refers to as ‘attention’, which lets the model weigh the most relevant words in the input.
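The way context picks the right meaning of ‘ciabatta’ can be sketched with a toy attention-style calculation. The hand-made two-dimensional ‘word vectors’ below are invented for illustration (real models learn vectors with thousands of dimensions from data); the sketch only shows the principle of scoring each candidate meaning against the context word ‘loaf’ and choosing the best match.

```python
import math

# Toy word vectors, hand-made for illustration.
# Each dimension loosely encodes: [food-ness, footwear-ness].
vectors = {
    "loaf":             [0.9, 0.0],
    "ciabatta_bread":   [0.8, 0.1],
    "ciabatta_slipper": [0.1, 0.9],
}

def dot(a, b):
    """Similarity score between two vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Attention-style step: score each sense of 'ciabatta' against
# the context word 'loaf', then normalise the scores into weights.
senses = ["ciabatta_bread", "ciabatta_slipper"]
scores = [dot(vectors["loaf"], vectors[s]) for s in senses]
weights = softmax(scores)

best = senses[weights.index(max(weights))]
print(best)  # → 'ciabatta_bread' (the food sense wins, given 'loaf')
```

Because ‘loaf’ points strongly at food, the bread sense receives the higher weight – which is, in miniature, how attention lets the model ‘focus’ on the relevant meaning.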
Did you know that Gen AI was in use well before the advent of ChatGPT? In 2006 Google Translate became the first Gen AI tool available to the public; if you fed in, for example, “Directeur des Ventes” and asked Google Translate to translate the French into English, it would return “Sales Manager”. (By the way, the Transformer architecture was first introduced by Google.) Then, in 2011, we were mesmerised by Siri, which was initially such a popular ‘toy’ among iPhone users. Amazon’s Alexa followed, together with the chatbots and virtual assistants that became a ubiquitous feature of our lives – these are all Gen AI models. As can be seen, we’ve been using Gen AI for a while; however, no one told us that these ‘things’ were Generative AI models!