What are language models?

Jayanti prasad Ph.D
3 min read · Mar 3, 2024


I assume that you are familiar with Large Language Models (LLMs). If you follow academics, researchers and content provided by academic institutions, you may not find much new in this article. However, there is a high probability that you became interested in LLMs through content from public/social media and big tech companies; in that case, this article may be useful to you. Look at the following articles:

The above three examples are good enough to make the case: these giants have no interest in teaching you the fundamentals, mainly because they do not want their content to have a technical-background barrier. If you really want to know and understand what language models, in particular large language models, are, you must know the basics of probability and statistics. If you are not familiar with conditional probability and Bayes' theorem, you may need to check them first here:

Language Models

In general, text in a language is made of symbols, which could be letters (for example, in English), but the letters themselves do not carry any meaning. The smallest meaning-carrying entities in a language are words, and text is a sequence of words, which can be organised into sentences, paragraphs, pages, chapters, etc.

In any language, not all possible sequences of words are allowed or form a sentence; for example, “bananas is card made paint red” is a meaningless sentence. Sentences must be formed according to the grammar of the language. Apart from this, we can notice the following two patterns:

  • In any sentence there is some correlation between the words used. For example, ‘sky’ is more likely than ‘cat’ to come after ‘blue’, and we can discover/learn all such patterns from historical data, or a ‘corpus’.
  • There may also be strong correlations between whole sequences of words. For example, “What is the capital of India” is more strongly correlated with “The capital of India is New Delhi” than with “The height of Mount Everest is 8,848 meters”. Sequences can be correlated with each other in many different ways: question/answer is one, and translation is another.
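To make the first pattern concrete, here is a minimal Python sketch, assuming a tiny made-up corpus (the sentences and the helper name `next_word_prob` are hypothetical). It counts which word directly follows which and turns those counts into an estimate of P(next word | previous word):

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (hypothetical); a real corpus would contain millions of sentences.
corpus = [
    "the sky is blue",
    "the blue sky is clear",
    "a blue sky at noon",
    "the cat is black",
    "a blue cat is rare",
]

# follow_counts[prev][nxt] = how many times word `nxt` directly follows word `prev`.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1

def next_word_prob(prev, nxt):
    """Estimate P(nxt | prev) as count(prev, nxt) / count(prev, any word)."""
    followers = follow_counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0

print(f"P(sky | blue) = {next_word_prob('blue', 'sky'):.2f}")  # P(sky | blue) = 0.67
print(f"P(cat | blue) = {next_word_prob('blue', 'cat'):.2f}")  # P(cat | blue) = 0.33
```

In this toy corpus ‘sky’ follows ‘blue’ twice and ‘cat’ follows it once, so the estimates come out at about 0.67 and 0.33. A real corpus would need far more text for such numbers to be meaningful.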

The above two points are very important for understanding what a language model is. If you do not know what a model is, you can use this simple definition: a model is a mathematical/statistical framework that describes how observational/experimental data is generated. A model can help us find out how much fuel we need to drive our car 100 kilometres, or guide us on how to divide our money for investment.

Now let us come to language models. A language model must give us the probability of a sequence of words. For example, it should give the sentence “the sky is blue” a probability of something like 78% and “a sharp banana flying” 3.2% (do not take these numbers seriously; they are just indicative).

If we can write programs that scan millions of documents, then it is possible to build models that tell us the probabilities of sequences.

A language model ‘M’ is a statistical/probabilistic/machine learning model that gives us the joint probability P(w1, w2, w3, …, wn) of a sequence of words [w1, w2, w3, …, wn].
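One common way to obtain this joint probability (a standard textbook construction, not necessarily the author's own method) is the chain rule of probability, P(w1, w2, …, wn) = P(w1) P(w2 | w1) P(w3 | w1, w2) … P(wn | w1, …, wn-1), combined with the simplifying assumption that each word depends only on the word right before it (a bigram model). The sketch below assumes a tiny hypothetical corpus, adds the start/end markers `<s>` and `</s>`, and uses add-one (Laplace) smoothing so that unseen word pairs do not get probability zero; `cond_prob` and `sentence_prob` are illustrative names.

```python
from collections import Counter

# Tiny hypothetical training corpus.
corpus = [
    "the sky is blue",
    "the sea is blue",
    "the sky is clear",
    "a banana is yellow",
]

BOS, EOS = "<s>", "</s>"  # sentence start and end markers (an assumption of this sketch)

def tokenize(sentence):
    return [BOS] + sentence.lower().split() + [EOS]

prev_counts = Counter()  # how often each token appears as the "previous" token
pair_counts = Counter()  # how often each (previous, next) pair appears
vocab = set()            # tokens that can appear as a "next" token
for s in corpus:
    tokens = tokenize(s)
    vocab.update(tokens[1:])
    for prev, nxt in zip(tokens, tokens[1:]):
        prev_counts[prev] += 1
        pair_counts[(prev, nxt)] += 1

V = len(vocab)

def cond_prob(nxt, prev):
    """P(nxt | prev) with add-one (Laplace) smoothing over the vocabulary."""
    return (pair_counts[(prev, nxt)] + 1) / (prev_counts[prev] + V)

def sentence_prob(sentence):
    """Joint probability P(w1, ..., wn) via the chain rule, keeping one word of context."""
    tokens = tokenize(sentence)
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        p *= cond_prob(nxt, prev)
    return p

print(f"{sentence_prob('the sky is blue'):.2e}")  # 8.83e-04
print(f"{sentence_prob('blue is sky the'):.2e}")  # 2.73e-06
```

The well-formed sentence scores 324 times higher than the same words in scrambled order. Autoregressive LLMs such as the GPT family keep the same chain-rule factorisation but replace the bigram counts with a neural network that conditions on the whole preceding context.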

Note that there are many ways to build such models, and they can be generalised in many different ways.

For example, if we have two sequences [x1, x2, x3, …, xn] and [y1, y2, y3, …, ym], then a language model can be used to find the conditional probability P(y1, y2, y3, …, ym | x1, x2, x3, …, xn). This is exactly the concept used in machine translation.
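By the definition of conditional probability, P(y1, …, ym | x1, …, xn) = P(x1, …, xn, y1, …, ym) / P(x1, …, xn). The minimal sketch below estimates both terms by plain counting over a tiny, hypothetical parallel corpus of English/French pairs, treating each whole sentence as one outcome; the data and the names `as_seq` and `conditional_prob` are illustrative assumptions. It only shows the definition at work: practical translation systems do not memorise whole sentence pairs, they typically factor P(y | x) word by word with a learned model.

```python
from collections import Counter

# Hypothetical toy "parallel corpus" of (source, target) pairs.
# Repeated pairs stand in for how often each translation was observed.
pairs = [
    ("the sky is blue", "le ciel est bleu"),
    ("the sky is blue", "le ciel est bleu"),
    ("the sky is blue", "le ciel est bleu"),
    ("the sky is blue", "ciel bleu"),
    ("the cat is black", "le chat est noir"),
]

def as_seq(text):
    """Represent a sentence as a tuple of words, i.e. one whole sequence."""
    return tuple(text.split())

joint_counts = Counter((as_seq(x), as_seq(y)) for x, y in pairs)
source_counts = Counter(as_seq(x) for x, _ in pairs)
total = len(pairs)

def conditional_prob(y, x):
    """P(y1..ym | x1..xn) = P(x1..xn, y1..ym) / P(x1..xn), estimated by relative frequency."""
    p_x = source_counts[as_seq(x)] / total
    if p_x == 0:
        return 0.0  # source sequence never observed
    p_xy = joint_counts[(as_seq(x), as_seq(y))] / total
    return p_xy / p_x

print(f"{conditional_prob('le ciel est bleu', 'the sky is blue'):.2f}")  # 0.75
print(f"{conditional_prob('ciel bleu', 'the sky is blue'):.2f}")         # 0.25
print(f"{conditional_prob('le chat est noir', 'the sky is blue'):.2f}")  # 0.00
```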

Note that the sequences discussed above need not always be sentences; they can be time series, DNA sequences or anything else.
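Nothing in the counting idea depends on the symbols being words. As a minimal sketch (the data and the function name `fit_transitions` are hypothetical), the same adjacent-pair counting can be applied to DNA bases, or to a time series that has first been turned into symbols such as ‘up’ and ‘down’:

```python
from collections import Counter, defaultdict

def fit_transitions(seq):
    """Estimate P(next | current) for any sequence of hashable symbols by counting adjacent pairs."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        counts[cur][nxt] += 1
    # Normalise each row of counts into a probability distribution.
    return {
        cur: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for cur, followers in counts.items()
    }

# Hypothetical DNA fragment: the symbols are bases instead of words.
dna_model = fit_transitions("ATGATGATGC")
print(dna_model["A"])  # {'T': 1.0}

# Hypothetical daily price moves turned into symbols: a discretised time series.
moves = ["up", "up", "down", "up", "up", "down", "down", "up"]
move_model = fit_transitions(moves)
print(move_model["up"])  # {'up': 0.5, 'down': 0.5}
```

This is a first-order Markov chain, the simplest possible sequence model: it looks only one step back.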

Large Language Models

Large Language Models, or LLMs, are just language models that are trained on huge volumes of data, so they learn patterns that are highly robust and generalisable. If you want to know more, you can follow the links given below:

Please like, comment and share if you find the article useful, and follow me if you want more updates on this topic.


Written by Jayanti prasad Ph.D

Physicist, Data Scientist and Blogger.
