There are deep concerns about the safety of very recent developments in Large Language Models like ChatGPT. These fall into three categories:
- Hallucinations: generation of output that is false is not uncommon.
- Deceptions: the training material could easily contain false or undesirable information.
- Evolution: the threat of an unpredictable emergence of dangerous capabilities.
LLMs are based on neural networks, algorithms whose name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.
In the case of language models, the meaning of words and the rules of grammar are defined by context, so an individual word is given meaning only by how it relates to other words. All the words give each other meaning.
Something like this happens in computer languages, as I discovered when, in the 60s, thinking that these new-fangled computers might be useful, I enrolled in a WEA class in Fortran. We were introduced to a small number of words and punctuation marks used with specific meanings, like Statement, DO, IF, GO TO, SUBROUTINE, END, ( , ), together with punched cards that formed a “deck” and the more familiar concepts of variables and arithmetic functions. These elements combined to form a program, and the role of each element was defined by how it interacted with all the other elements. It wasn’t until the tutor had introduced all of them and shown how they fitted together that it all came together and made sense.
Most children learn to read at the age of 6 to 7, so it looks like they have a usable language model which serves as a foundation for a process of developing greater abilities called “education”.
Education is all the bits and pieces one picks up in life, like rote learning and memories, but it also includes what seems to be the equivalent of neural net models. For me, two examples are looking at how geology affects the landscape and how technology and culture affect the built environment. In a process described as “getting your eye in”, one begins to see how the lie of the land is influenced by geological stratification and rock types, and how the style and age of housing are influenced by the movement of people and the growth of industry.
The inner workings of neural nets are complex and obscure, but the end result is a process that mimics what seems to happen in the brain, which is also complex and obscure.
The educated person ends up having many Models which can be invoked when appropriate.
In contrast to this, LLMs have just the one Large Model, and the many Models of the educated person get bundled into the one. No wonder it can get confused!
The proposed alternative is to do the opposite: take a relatively small training text and whittle down the size of the model by systematically reducing the amount of work that has to be done by the training process.
To start with, we would take a body of text which has a consistent, high-quality style, preferably out of copyright, for example the novels of Charles Dickens (17 books, about 3.8 million words), Walter Scott (27 books) or Jane Austen (6 books, about 0.7 million words).
Each word is tagged with data including part of speech and anything else useful that can be worked out by an algorithm.
The vocabulary would be analysed and less frequently used words replaced by placeholders that indicate the part of speech etc. The words that are left form the basic vocabulary.
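The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a working pipeline: the toy sentence, the threshold of five words, and the `pos_of` lookup (standing in for a real part-of-speech tagger) are all invented for the example.

```python
from collections import Counter

# Tiny illustrative "corpus"; a real run would use the chosen novels.
text = ("the dog chased the cat the cat chased the mouse "
        "the ocelot watched the dog").split()

# Hypothetical POS lookup standing in for a real tagger.
pos_of = {"the": "DET", "dog": "NOUN", "cat": "NOUN", "mouse": "NOUN",
          "ocelot": "NOUN", "chased": "VERB", "watched": "VERB"}

# Count word frequencies and keep only the most common words.
counts = Counter(text)
BASIC_SIZE = 5  # assumed cut-off for this toy example
basic_vocab = {w for w, _ in counts.most_common(BASIC_SIZE)}

# Rare words become placeholders such as <NOUN>, preserving the grammar.
reduced = [w if w in basic_vocab else f"<{pos_of[w]}>" for w in text]
print(" ".join(reduced))
```

The rare word "ocelot" comes out as `<NOUN>`, so the sentence structure survives even though the word itself has been removed from the training text.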
The resulting text is then used to train the language model and defines the meaning of the words in the basic vocabulary.
In use, the meaning of words not in the basic vocabulary would be introduced by means of “Cyclopaedias” (as in encyclopaedia but limited in range), which define each word using only the words in the basic vocabulary or words that have been defined by another Cyclopaedia.
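The Cyclopaedia rule can be checked mechanically: each entry may use only the basic vocabulary plus words defined by earlier entries. The sketch below assumes entries are supplied in dependency order; the words and definitions are invented examples.

```python
# Basic vocabulary and Cyclopaedia entries are illustrative assumptions.
basic_vocab = {"a", "small", "wild", "animal", "that", "hunts", "spotted"}

# Each entry: (new word, the words used in its definition).
cyclopaedia = [
    ("cat",    ["a", "small", "animal", "that", "hunts"]),
    ("ocelot", ["a", "spotted", "wild", "cat"]),  # uses "cat", defined above
]

def check_entries(basic, entries):
    """Return the list of words whose definitions break the rule."""
    known = set(basic)
    bad = []
    for word, definition in entries:
        if all(w in known for w in definition):
            known.add(word)  # the new word is now usable by later entries
        else:
            bad.append(word)
    return bad

print(check_entries(basic_vocab, cyclopaedia))  # → []
```

Reversing the two entries would flag "ocelot", because its definition uses "cat" before "cat" has been defined; the rule keeps every definition grounded in words the model already understands.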
The size of a language model is likely to be a strong function of vocabulary size (exponential or factorial?) so this alternative could be much smaller.
The exciting thing about ChatGPT is that you can talk to it, and you get replies like you would from a live person. Life is complicated, and most people struggle most of the time. Sometimes it can be a time-consuming battle to do something online and a relief to get help from a real person. A reliable person-like chat is the prize, but are Large Language Models, with questions about security, safety and reliability, the way to go?
We are looking at something like a personal assistant with game-changing abilities. What do we want to use it for? The key issue is trust.