
What Will Be the Lasting Value of LLMs Beyond the AI Hype Cycle?


This article originally appeared in Intelligent CIO. Republished with permission.

An essential characteristic of wisdom is the alignment between three worlds: the world that we think exists, the world that really exists, and the world that we would like to exist. The greater the alignment, the more likely it is that we will make good decisions.

It’s been more than two years since the introduction of ChatGPT and the unprecedented mix of confusion, inflated expectations, and exaggerated valuations associated with the “AI” hype cycle. As with most innovations, that hype cycle is now beginning to subside. However, large consulting companies and a variety of software vendors, who aim to maximize the value they can extract from all the commotion, have no interest in letting it subside. They work to keep the hype going rather than help the industry align on the world that really exists and ground itself in what is possible. We may have just lived through a historic case study of what extreme hype does to markets when the subject is one that most people know very little about.

“AI” is in some ways a catch-all label, often invoked as a panacea for all problems. Some refer to it as a revolution that will affect the world the way the invention of the steam engine, the generation and efficient transmission of electricity, the invention of the computer, and other major paradigm shifts affected our global economy.

Despite the chaos of the moment, there is much to celebrate. I too celebrate the progress that has been made; it is significant. However, it is crucial to clarify and align the three worlds so that we can act based on the world that really exists as we strive for the world we would like to exist. Decisions based on a lack of clarity and understanding of AI can have large and lasting economic and human consequences.

Artificial intelligence is a (very broad) field of study that has been around since at least 1956, and it makes no sense to attribute to it anything different from the kinds of benefits that other major fields of study provide.

Computational systems and techniques that stem from it have been contributing to progress in many business and scientific areas. In the past decade, deep neural network architectures (one of many approaches in machine learning) have achieved extraordinary results in areas such as natural language processing (NLP), computer vision, speech recognition and synthesis, and scientific computing. Systems based on these so-called “deep learning” architectures dominate the news, and there is little, if any, debate that these architectures produce state-of-the-art (SOTA) results for specific tasks in their respective areas. That last part is important to understand. The software system that is SOTA in one area is not the same system that produces SOTA results in another area. Moreover, although multimodal models are on the rise, they are not equally adept at all tasks, and it is unlikely that they will be capable of replicating the success of their specialized, purpose-built ‘cousins’.

In the last couple of years, AI hype has largely been related to a specific area of deep learning: so-called generative AI (GenAI). In the context of NLP, the main idea behind these systems is that tasks such as answering questions, producing templates for writing, and summarizing text can be accomplished by generating content as sequences of tokens, one sequence at a time (a ‘sequence’ here can range from 1 to N tokens, and a token is typically a fragment of one or more words).
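To make token-by-token generation concrete, here is a minimal, purely illustrative sketch in Python. The “model” is a toy stand-in that returns a probability distribution over a five-token vocabulary; nothing in it resembles a real LLM, but the surrounding loop shows the essential mechanism: sample a token, append it to the context, and repeat until an end-of-sequence token appears.

```python
# Toy sketch of autoregressive generation (illustrative only; not a real model).
import random

random.seed(0)
VOCAB = ["the", "answer", "is", "42", "<eos>"]

def toy_next_token_distribution(context):
    """Stand-in for a trained LLM: returns P(next token | context)."""
    # A real model computes this with billions of parameters; here we
    # simply favor one fixed continuation so the example stays readable.
    preferred = {(): "the", ("the",): "answer",
                 ("the", "answer"): "is", ("the", "answer", "is"): "42"}
    likely = preferred.get(tuple(context), "<eos>")
    return {tok: (0.8 if tok == likely else 0.05) for tok in VOCAB}

def generate(prompt, max_new_tokens=10):
    context = list(prompt)
    for _ in range(max_new_tokens):
        dist = toy_next_token_distribution(context)
        tokens, weights = zip(*dist.items())
        next_token = random.choices(tokens, weights=weights, k=1)[0]
        if next_token == "<eos>":
            break
        context.append(next_token)
    return context

print(" ".join(generate([])))  # most likely output: "the answer is 42"
```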

Historical progress in NLP evolved from structural to symbolic, to statistical, to (neural network) pre-trained language models (PLMs), and lastly to LLMs. Lately we have also seen techniques for the distillation of LLMs and the creation of small language models, but I don’t want to digress.

Language modeling before the era of deep learning focused on training task-specific models through supervision, whereas PLMs are trained through self-supervision with the aim of learning representations that are common across different NLP tasks. As the size of PLMs increased, so did their performance on tasks. That led to LLMs, which dramatically increased both the number of model parameters and the size of the training datasets.
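For readers who prefer a formula, the self-supervised training objective behind these models is commonly written as the negative log-likelihood of each token given the tokens that precede it (a standard textbook formulation, not tied to any particular model):

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_{\theta}\!\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

The data itself supplies the labels, since the next token of every training sequence is already known; that is what makes the training self-supervised rather than supervised.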

GPT-3 was the first model to achieve, purely via text interaction with the model, “strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.”

Today’s LLMs respond accurately to task queries when prompted with task descriptions and examples. However, purely pre-trained LLMs often fail to follow user intent and perform worse in zero-shot settings than in few-shot settings. Fine-tuning is known to enhance generalization to unseen tasks, improving zero-shot performance significantly. Other improvements come from either task-specific training or better prompting.
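The difference between zero-shot and few-shot prompting is easiest to see side by side. The snippet below is illustrative only; the prompts are invented and no particular model or API is assumed.

```python
# Two prompts for the same sentiment-classification task (illustrative only).

zero_shot_prompt = (
    "Classify the sentiment of the following review as positive or negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

few_shot_prompt = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: Great screen and very fast.\nSentiment: positive\n"
    "Review: The hinge broke within a week.\nSentiment: negative\n"
    "Review: The battery died after two days.\nSentiment:"
)

# A purely pre-trained model typically does better with the few-shot prompt;
# a fine-tuned, instruction-following model usually handles the zero-shot
# prompt well too.
print(zero_shot_prompt)
print(few_shot_prompt)
```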

The ability of LLMs to solve diverse tasks with human-level performance comes at the cost of slow training and inference, extensive hardware requirements, and high running costs. Such constraints are hard to accept, and they have led to better architectures and training strategies. Parameter-efficient tuning, pruning, quantization, knowledge distillation, and context length interpolation are some of the methods widely studied for efficient LLM utilization.
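As one concrete illustration of the techniques listed above, the sketch below shows the basic idea behind post-training weight quantization: storing weights in 8 bits instead of 32 and accepting a small approximation error. It is a deliberately simplified toy; production schemes use per-channel scales, calibration data, and lower-bit formats.

```python
# Toy post-training quantization of a weight matrix (illustrative only).
import numpy as np

def quantize_int8(weights):
    """Map float32 weights to int8 values plus a single scale factor."""
    scale = np.abs(weights).max() / 127.0
    quantized = np.round(weights / scale).astype(np.int8)
    return quantized, scale

def dequantize(quantized, scale):
    return quantized.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)   # pretend these are layer weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("memory: 32-bit =", w.nbytes, "bytes; 8-bit =", q.nbytes, "bytes")
print("max absolute reconstruction error:", np.abs(w - w_hat).max())
```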

The strengths and, more importantly, the weaknesses of these language models are not yet fully explored. Industry benchmarks are crucially important as we navigate to the world that really exists. 

These benchmarks are important because (1) they ground us in terms of what is feasible, and (2) they point out areas where a model either requires improvement or is not the most effective or efficient choice.

The lasting value of LLMs will be the ability to create computer interfaces based on natural language. People who previously were unable to perform certain tasks because they lacked domain-specific knowledge or technical skills will be able to invoke actions that span several computational systems and obtain information that was previously inaccessible to them. LLMs will retain their desirable natural language processing characteristics while their size continues to shrink, ultimately finding their way onto every device.

Beyond executing NLP tasks, primarily as the natural-language input/output interface for computing systems, LLMs will be integrated with other software components to which they will delegate actions rather than executing those actions themselves. These delegations will be made to more specialized software that can execute tasks (other than NLP) more efficiently and effectively. At a high level, that could be achieved by augmenting the design of applications that are LLM-centric today with extensions that enhance comprehension.
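The sketch below illustrates the shape of that delegation pattern. Every name in it is hypothetical (llm_parse_intent, TOOLS, handle), and the “LLM” is faked with keyword matching; the point is only the architecture: the model turns free-form language into a structured intent, and specialized components do the actual work.

```python
# Hypothetical delegation pattern: LLM as the natural-language interface,
# specialized components as the executors (illustrative only).

def llm_parse_intent(user_request):
    """Stand-in for an LLM call that turns free text into a structured intent."""
    if "revenue" in user_request.lower():
        return {"tool": "sql_report", "args": {"metric": "revenue", "period": "Q3"}}
    return {"tool": "calculator", "args": {"expression": "2 + 2"}}

# Purpose-built components that execute the delegated work.
TOOLS = {
    # Toy arithmetic evaluator (restricted eval, for illustration only).
    "calculator": lambda args: eval(args["expression"], {"__builtins__": {}}),
    # Builds (but does not run) a query for a hypothetical reporting service.
    "sql_report": lambda args: f"SELECT SUM({args['metric']}) ... -- period {args['period']}",
}

def handle(user_request):
    intent = llm_parse_intent(user_request)          # language in, structure out
    result = TOOLS[intent["tool"]](intent["args"])   # specialized software does the task
    return f"Result: {result}"                       # an LLM could phrase this back as prose

print(handle("What was our revenue in Q3?"))
print(handle("What is 2 + 2?"))
```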

As AI hype subsides, innovation continues, and the winners in this next phase of innovation will make decisions based on the world that really exists. That is a world where we recognize that LLMs are not the path to “general artificial intelligence” but are instead powerful foundations for computer interfaces based on natural language. That is going to be the lasting value of LLMs.


Author: Haralambos Marmanis

Dr. Haralambos Marmanis is Executive Vice President & CTO at CCC, where he is responsible for driving the product and technology vision as well as the implementation of all software systems. Babis has over 30 years of experience in computing and leading software teams. Before CCC, he was the CTO at Emptoris (IBM), a leader in supply and contract management software solutions. He is a pioneer in the adoption of machine learning techniques in enterprise software. Babis is the author of the book "Algorithms of the Intelligent Web," which introduced machine learning to a wide audience of practitioners working on everyday software applications. He is also an expert in supply management, co-author of the first book on Spend Analysis, and author of several publications in peer-reviewed international scientific journals, conferences, and technical periodicals. Babis holds a Ph.D. in Applied Mathematics from Brown University, and an MSc from the University of Illinois at Urbana-Champaign. He was the recipient of the Sigma Xi innovation award and an NSF graduate fellow at Brown.