A Statistical Estimation of the No. of Headwords in a Dictionary

1. Statistical estimation of the No. of headwords in a dictionary
2. Statistical estimation of the size of your vocabulary
3. The growth of the English language vocabulary over the years


## Dictionary Vocabulary Statistics

 My collection of the Oxford Advanced Learner's Dictionary

To estimate the number of entries (headwords) in a dictionary, we first have to sample it. The simplest suitable method is systematic sampling: pick a random starting point, then select elements at regular intervals from there.

In our case, we randomly choose a dictionary page (not the first or the last, since those pages are usually not typical) and from there sample the dictionary at equal intervals.

Say the randomly chosen start page is 79. Now we have to choose an interval, say 100. This means we count the headwords on pages 79, 179, 279, 379, 479, and so on to the end of the dictionary. Smaller intervals give more precise results.
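The page selection can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample_pages` and its parameters are our own, and the sketch assumes pages are numbered 1 to `total_pages` and that the last page is never sampled.

```python
import random

def systematic_sample_pages(total_pages, interval, start=None, seed=None):
    """Return the pages to sample: start, start + interval, ... (hypothetical helper)."""
    if start is None:
        rng = random.Random(seed)
        # Random start within the first interval, skipping page 1 (often atypical).
        start = rng.randint(2, min(interval, total_pages - 1))
    # Stop before total_pages so the (often atypical) last page is never sampled.
    return list(range(start, total_pages, interval))

pages = systematic_sample_pages(total_pages=1000, interval=100, start=79)
print(pages)  # [79, 179, 279, 379, 479, 579, 679, 779, 879, 979]
```

Calling the function without `start` picks the start page at random, which is how the method is meant to be used in practice.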

If a medium-sized dictionary contains 1000 pages or more, an interval of 100 can be enough, since the dictionary is then sampled at least 10 times. For better results the interval can be reduced to 50. Going lower is impractical: it entails too much work and gives up the advantage of sampling.

After tabulating the sampling results, we calculate the average number of headwords per page and multiply it by the total number of pages in the dictionary. This gives a good estimate of the number of headwords the dictionary contains.

To improve the result, it is recommended to omit the extreme values, the highest and the lowest, when calculating the average number of headwords per page.

For example, if the average number of headwords per page is 19 and the dictionary contains 1000 pages, then the estimate is that the dictionary contains 19,000 headwords.
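As a rough sketch of this calculation, the following Python snippet averages the sampled counts, optionally dropping the single highest and lowest value, and scales up by the page count. The function name `estimate_headwords` and the ten sample counts are hypothetical, chosen only so that the trimmed average comes out at 19 as in the example above.

```python
def estimate_headwords(counts_per_page, total_pages, trim_extremes=True):
    """Estimate total headwords from per-page counts (hypothetical helper)."""
    counts = sorted(counts_per_page)
    if trim_extremes and len(counts) > 2:
        # Drop the single lowest and single highest count.
        counts = counts[1:-1]
    average = sum(counts) / len(counts)
    return average * total_pages

# Hypothetical counts from 10 sampled pages, not real measurements.
sample_counts = [17, 21, 19, 12, 20, 18, 27, 19, 20, 18]

print(round(estimate_headwords(sample_counts, total_pages=1000)))  # 19000
```

Setting `trim_extremes=False` uses all sampled pages, which makes it easy to see how much the two extreme pages shift the estimate.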

You can estimate your own vocabulary size in the same way. After calculating the number of headwords in the dictionary, use systematic sampling again: choose a random start page and a suitable interval. Then check, for example, whether you know the meaning of the first headword on each sampled page.

For example, if you have sampled 36 pages and you know the meaning of 12 of the words located first on those pages, then your vocabulary size is 12/36 (one third) of the dictionary's vocabulary. If the dictionary contains 30,000 headwords, your vocabulary size is about 10,000.
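The same arithmetic can be written as a small Python sketch. The function name `estimate_vocabulary` and the list of answers are illustrative assumptions; each entry records whether the first headword on one sampled page was known.

```python
def estimate_vocabulary(known_flags, dictionary_headwords):
    """Scale the share of known sampled words up to the whole dictionary (hypothetical helper)."""
    known = sum(1 for flag in known_flags if flag)
    # Multiply before dividing to keep the arithmetic exact for whole numbers.
    return known * dictionary_headwords / len(known_flags)

# Hypothetical answers: 12 known words out of 36 sampled pages.
answers = [True] * 12 + [False] * 24

print(round(estimate_vocabulary(answers, dictionary_headwords=30_000)))  # 10000
```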

You can compare vocabulary size estimates obtained from different dictionaries, but take care not to choose a dictionary so small that it does not contain all or most of your vocabulary.

For our experiments we chose the Oxford Advanced Learner's Dictionary, since it is large enough for statistical sampling and relevant for estimating the size of the English vocabulary that students possess.

We tried our statistical method on all editions of the Oxford Advanced Learner's Dictionary. The results are as follows:

| Edition | Headwords per Page | Dictionary Pages | Dictionary No. of Headwords |
|---|---|---|---|
| 0th (1941)* | 16.3 | 1283 | 20,849 |
| 1st (1948) | 14.9 | 1517 | 22,664 |
| 2nd (1963) | 19.8 | 1170 | 23,189 |
| 3rd (1974) | 21.7 | 1021 | 22,155 |
| 4th (1989) | 15.6 | 1492 | 23,339 |
| 5th (1995) | 17.7 | 1392 | 24,624 |
| 6th (2000) | 22.2 | 1508 | 33,477 |
| 7th (2005) | 24.2 | 1780 | 43,033 |
| 8th (2010) | 25.6 | 1796 | 45,956 |

The complete calculations can be seen here (xls)

The immediate conclusion is that up to the 5th edition (1995) the number of entries / headwords grew only moderately. After that the count rises sharply, from about 24,600 in the 5th edition to about 46,000 in the 8th edition (2010). Does this mean that the English language developed faster after 1995 than before? Maybe, but before drawing this conclusion it is advisable to check whether sub-headwords in the editions before 1995 were upgraded to full headwords in the editions after 1995.
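The change from one edition to the next can be checked directly from the estimated totals in the table above. The short Python sketch below does that; the variable names are our own, and the figures are the estimated headword counts from the table, not official counts.

```python
# (edition, year, estimated headwords) taken from the results table.
editions = [
    ("0th", 1941, 20849), ("1st", 1948, 22664), ("2nd", 1963, 23189),
    ("3rd", 1974, 22155), ("4th", 1989, 23339), ("5th", 1995, 24624),
    ("6th", 2000, 33477), ("7th", 2005, 43033), ("8th", 2010, 45956),
]

changes = {}
for (prev_label, _, prev_n), (label, year, n) in zip(editions, editions[1:]):
    # Percentage change in estimated headwords relative to the previous edition.
    changes[label] = (n - prev_n) / prev_n * 100
    print(f"{prev_label} -> {label} ({year}): {changes[label]:+.1f}%")

largest = max(changes, key=changes.get)
print("Largest jump into edition:", largest)
```

Keep in mind that each total is itself a sampling estimate, so small changes between editions may be within the sampling error.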

*The Oxford Advanced Learner's Dictionary of Current English started life as the Idiomatic and Syntactic Dictionary, published in Japan in 1942 by Albert Sydney Hornby, who was a teacher of English at a small college in Japan. It then came under the wing of OUP (Oxford University Press), which decided it would be the perfect counterpart to the prestigious OED (Oxford English Dictionary), because it explained spelling, grammar, phonetics, and usage much more extensively than a dictionary for native English speakers would.