Difference between revisions of "Number of words in English"

From Teflpedia
(Solutions: link)
(See also)
 
(45 intermediate revisions by 4 users not shown)
Line 1: Line 1:
Although the claim is often made that "English has more words than any other language" it is not that easy to count the '''number of words in English.'''
+
Although the claim is often made that "[[English]] has more [[word]]s than any other [[language]]" it is not that easy to count the '''number of words in English''',<ref>[http://oxforddictionaries.com/words/the-oec-facts-about-the-language "The OEC: Facts about the language"] [[Oxford Dictionaries]]. Retrieved 30th September 2012.</ref> or, for that matter in any language.
  
== Counting "words" ==
+
Many of the problems identified below are about how we wish to define the "words" we are counting.  We can count based on semantics (meaning), based on orthography ([[spelling]]) or both.
  
Different studies use differing criteria when counting the number of "[[words]]", [[lexeme]]s or vocabulary items in a language. Estimates for English vary between 500,000 and 2 million words. A medium-sized [[dictionary]] may contain some 100,000 entries. ''The New Oxford Dictionary of English'', published in 1998, is the biggest single-volume dictionary and contains 350,000 words, of which 52,000 are scientific and technical words, although it avoids over-technical terminology. On the other hand, the 20-volume ''OED'', the definitive dictionary of the English language, contains over half a million lexemes.
+
Consequently, different studies use differing criteria when counting the number of "[[words]]", [[lexeme]]s or vocabulary items in a language. Depending on the criteria used, estimates for English may vary between 500,000 and 2 million words - or many more. We identify below some of the many criteria which one would need to consider.  
  
Would one count [[conjugation]]s or [[past participle]]s used as [[adjective]]s? Species names for flowers and insects which are common to all languages? Chemical names? With these you can dwarf the number of "normal" words in any language. And the 500,000 different names for fungi...
+
=== Grammatical changes===
  
Equally difficult is the question of whether a word is actually used - it may exist but be so obsolete that it isn't used any more. Do we count it or not? Do we count [[slang]]? Do we count regional words? Do we count a word if it is used in the UK but not in the US or in all international varieties of English (including Indian English, which has a large selection of words from native languages.)
+
Would [[conjugation]]s or [[past participle]]s used as [[adjective]]s be counted as separate words? In other words, the word "(to) close" is obviously a word.  Should the past tense "closed" be counted as a separate word? In some verbs the past participle is formed differently to the past tense.  Are past participles which differ from past tenses different words?  That is to say, does the set "speak, spoke, spoken" represent three words and the set "close, closed, closed" only represent two?
  
== Defining words ==
+
Does the word "closed" when it is used as adjective in "a closed door" count as a separate word?
  
A constant debate is whether concepts such as facts – the names of people or places and other proper names be considered as forming part of one’s personal [[lexicon]] when calculating its size. Undoubtedly, the name, or fact, ''Shakespeare'', is as much a part of the English language as the word ''literature'' or ''drama''. And the fact/word ''London'' is probably used more often than the word ''village'' or ''town''. Thus, given the overlapping of criteria, calculating the size of one’s own vocabulary is complicated and must vary according to many different factors.
+
===Scientific words===
  
Likewise, terms such as UNESCO and NATO, both well-known [[acronym]]s on an international level, must undeniably count as being part of an educated person’s vocabulary.
+
Should we count species names for flowers and insects and the 500,000 different names for fungi which are common to all languages? What about names for chemicals? How about medical names for diseases? With these you can dwarf the number of "normal" words in any language.
 +
 
 +
===Status of a word===
 +
 
 +
Equally difficult is the question of whether a word is actually used - it may exist but be so obsolete that it isn't used any more. Do we count it or not? Do we count [[slang]]? Do we count regional words? Do we count a word if it is used in the UK but not in the US or in all international varieties of English - including Indian English, which has a large selection of words from native languages?
 +
 
 +
===Acronyms===
 +
 
 +
There are a vast number of [[acronyms]] in the language some of which, such as UNESCO and NATO, are known internationally.  Others, such as [[TEFL]]  or [[CELTA]], are only used by small communities. How would one decide whether to include them or not?
 +
 
 +
===Various spellings ===
  
 
If a word has two spellings, does that count as one word or two? Or two past participles like "lighted" and "lit" or "dived" and "dove"? Does "dove" as a bird count as a separate word?
 
If a word has two spellings, does that count as one word or two? Or two past participles like "lighted" and "lit" or "dived" and "dove"? Does "dove" as a bird count as a separate word?
 +
 +
===Multiple meanings===
  
 
Furthermore, given that over eighty per cent of all words in English have more than one meaning – ''water'' as a verb and noun; ''lock'' as a verb and noun related to keys, or as a construction on a canal or river to regulate the ascent or descent of boats, or as a hold in wrestling or judo, or as in a lock of hair – should one count each meaning of the same word – the same combination of letters – as a different item? Surely if a person knows five meanings of the same word, he or she has a more extensive vocabulary than another person who knows only one meaning?
 
Furthermore, given that over eighty per cent of all words in English have more than one meaning – ''water'' as a verb and noun; ''lock'' as a verb and noun related to keys, or as a construction on a canal or river to regulate the ascent or descent of boats, or as a hold in wrestling or judo, or as in a lock of hair – should one count each meaning of the same word – the same combination of letters – as a different item? Surely if a person knows five meanings of the same word, he or she has a more extensive vocabulary than another person who knows only one meaning?
Line 21: Line 33:
 
=== Get and phrasal verbs ===
 
=== Get and phrasal verbs ===
  
Take one of the most frequently used verbs in English – ''get''. Should we consider the [[phrasal verb]]s ''get at, get away, get back, get by, get in, get off, get on, get over, get through, get up'' and a dozen other uses of ''get'' plus one other word, such as ''get home'' or ''get fat'' or ''get fatter'' or even more additions, such as ''get away with'', ''get rid of'', ''get over something'', ''get your own back on somebody'', as one [[lexeme]] -– ''get'' -– or an expression, a [[set phrase]], an [[idiom]]? In a dictionary, these, and many others, might all be included under the entry ''get''. And what about the inflections: ''gets, got/gotten, getting''? Unlike other European languages, Modern English has very few inflections and contrary to what many people think, is surprisingly regular, despite its many exceptions.
+
[[Phrasal verb]]s are verbs formed by two (or more) parts. They express a single concept such as "run away" or "wake up". So should they be counted as a single word?
 +
 
 +
Take one of the most frequently used verbs in English – ''get''. Should we consider the phrasal verbs ''get at, get away, get back, get by, get in, get off, get on, get over, get through, get up'' and many more as a word?
  
=== Prefixes ===
+
Then there are get forms where "get" means "become" such as ''get fat'' or ''get fatter''. Should these be one [[lexeme]] -– ''get'' -– or an expression, a [[set phrase]], an [[idiom]]? In a dictionary, these, and many others, might all be included under the entry ''get''. And what about the inflections: ''gets, got/gotten, getting''?
  
Or words beginning with prefixes such as ''un''-, as in ''unhappy, untidy, unlikely'', many of which are not included in dictionaries because of their apparent obviousness. The same occurs with adverbs ending in -''ly'', or inflections of nouns (singular and plural), adjectives (comparison) and, as we saw above, the [[past tense]]s of most verbs unless they are so irregular as to cause possible confusion. Thus, ''bad'', ''worse'' and ''[the] worst'' would probably be included as three separate entries, whereas in the case of more regular adjectives such as ''cold,'' its regular comparative and superlative – ''colder'', ''[the] coldest'' – would probably be included under one single entry: ''cold''.  
+
To further complicate the issue, phrasal verbs usually have multiple meanings.  As the previous paragraph asks - should each different idea represented by these meanings be counted as a distinct word?
 +
 
 +
=== Prefixes, suffixes and inflections ===
 +
 
 +
How should we count words beginning with prefixes such as ''un''-, as in ''unhappy, untidy, unlikely''? Many of which are not included in dictionaries because of their apparent obviousness. The same occurs with adverbs ending in -''ly'', or inflections of nouns (singular and plural), adjectives (comparison) and, as we saw above, the [[past tense]]s of most verbs unless they are so irregular as to cause possible confusion. Thus, ''bad'', ''worse'' and ''[the] worst'' would probably be included as three separate entries, whereas in the case of more regular adjectives such as ''cold,'' its regular comparative and superlative – ''colder'', ''[the] coldest'' – would probably be included under one single entry: ''cold''.
  
 
=== Binomials ===
 
=== Binomials ===
  
And whilst on the subject of [[antonym]]s, what’s the difference between learning single words – ''big'' and ''small'' – and [[binomial]] expressions like ''black and white'', ''thick and thin'', ''boys and girls'', ''ladies and gentlemen'', ''eggs and bacon'', ''fish and chips'', ''socks and shoes''?
+
One might also ask if there is a difference between learning single words – ''big'' and ''small'' – and [[binomial]] expressions like ''black and white'', ''thick and thin'', ''boys and girls'', ''ladies and gentlemen'', ''eggs and bacon'', ''fish and chips'', ''socks and shoes''?
 +
 
 +
=== Numbers ===
 +
 
 +
Should we count individual numbers as words? "One" is pretty obviously a word and so is "two".  "First" and "second" are clearly words as well.  "Twenty" is a word but twenty-one in hyphenated - when spelt. So at what point, if any, do we stop calling numbers words?
 +
 
 +
One could argue that they stop becoming words when the spelling convention makes them separate - but that would mean that a hypothetical non-English language which spelt them as a single word would have an infinite number of words, as is the case of Italian (''ottantotto'') Dutch (''achtentachtig'') or German (''achtundachtzig'').
 +
 
 +
===A sample case===
 +
 
 +
To illustrate some of the above let us consider the very simple word "record" and its various derivations.
 +
 
 +
=====Verbs=====
 +
 
 +
The verb "to record" has an initial meaning of "make a note of" or something of that nature and a newer meaning of "make an audio or video copy".  It is conjugated as record, records, recorded, recorded, recording.
 +
 
 +
There is another verb which is "to break a record", meaning "to do something better than anybody else".  For historic reasons this is a series of words but could easily have been one word.  The verb "to break" is conjugated as usual to form the various tenses.
 +
 
 +
=====Nouns=====
 +
 
 +
As a noun "a record" can be a plastic disc; an unequalled feat; or an note which has been taken.  A "recording" may be an audio or video record of an event.  A recorder may be a person which makes a record, a device which makes recordings or - with a totally different meaning - a wind instrument typically used by kids in primary school as an introductory instrument.
 +
 
 +
=====Adjectives=====
 +
 
 +
One can also use "record" as an adjective as in "a record time".  "Recording" functions as an adjective in "a recording contract". "Recorded" functions as an adjective in "a recorded conversation".
 +
 
 +
=====Pronunciation=====
 +
 
 +
In common with many such pairs, "re'''<u>cord</u>'''" as a verb and "'''<u>re</u>'''cord" as a noun are pronounced differently.
 +
 
 +
=====Question=====
 +
 
 +
So how many words should we count?
  
 
== Solutions ==
 
== Solutions ==
 +
Given the above, is there any way that we can usefully talk about numbers?
 +
 +
===Counting the number of words we use===
 +
 
''See [[Word#How_many_words_do_I_know.3F | How many words do I know?]]''
 
''See [[Word#How_many_words_do_I_know.3F | How many words do I know?]]''
  
 
One solution might to try to estimate the [[vocabulary]] of the average [[native speaker]], but even this presents difficulties. Partly because we all have an active and a passive vocabulary and partly because we can often "know" words we have never seen before, either because of their [[context]] or because they are made up of other parts of words we already know.
 
One solution might to try to estimate the [[vocabulary]] of the average [[native speaker]], but even this presents difficulties. Partly because we all have an active and a passive vocabulary and partly because we can often "know" words we have never seen before, either because of their [[context]] or because they are made up of other parts of words we already know.
  
== Spelling ==
+
===Counting the words in a dictionary===
One of the consequences of this long and varied history is that English [[spelling]] no longer corresponds particularly well with English [[pronunciation]], giving rise to calls for [[spelling reform]].
+
 
 +
One might imagine that simply counting the words in a dictionary would provide the answer. But dictionary compilers have to consider all the issues outlined above.
 +
 
 +
And then one might ask "Which dictionary"? A medium-sized [[dictionary]] may contain some 100,000 entries. ''The New Oxford Dictionary of English'', published in 1998, is the biggest single-volume dictionary and contains 350,000 words, of which 52,000 are scientific and technical words, although it avoids over-technical terminology. On the other hand, the 20-volume ''OED'', the definitive dictionary of the English language, contains over half a million lexemes - many of which are obsolete.
 +
 
 +
Counting the words in other dictionaries would give other results depending on the objectives of the editorial staff. Counting the words in half a dozen dictionaries and dividing the total by the number of dictionaries would certainly give an average number - but, given the issues identified above, would this number be meaningful in any way?  And would it be possible to the same with other languages to make any meaningful comparison?
  
==See also==
+
==References==
*[[American English v. British English]]
+
<references/>[[category:index]]
*[[English]]
 
*[[Learning English conversation questions]]
 
*[[Standard English]]
 
  
{{stub}}
 
  
  
[[Category:vocabulary]]
+
[[category:vocabulary]]

Latest revision as of 02:20, 27 June 2020

Although the claim is often made that "English has more words than any other language", it is not that easy to count the number of words in English,[1] or, for that matter, in any language.

Many of the problems identified below are about how we wish to define the "words" we are counting. We can count based on semantics (meaning), based on orthography (spelling) or both.

Consequently, different studies use differing criteria when counting the number of "words", lexemes or vocabulary items in a language. Depending on the criteria used, estimates for English may vary between 500,000 and 2 million words - or many more. We identify below some of the many criteria which one would need to consider.

Grammatical changes

Would conjugations or past participles used as adjectives be counted as separate words? For example, "(to) close" is obviously a word. Should the past tense "closed" be counted as a separate word? In some verbs the past participle is formed differently to the past tense. Are past participles which differ from past tenses different words? That is to say, does the set "speak, spoke, spoken" represent three words and the set "close, closed, closed" only represent two?

Does the word "closed", when it is used as an adjective in "a closed door", count as a separate word?
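
The difference between the two sets can be made concrete with a short sketch. It is only an illustration under stated assumptions: the dictionary `verbs` and the helper `count_forms` are hypothetical names, and the only data are the two verbs mentioned above.

```python
# Principal parts (base form, past tense, past participle) of the two
# verbs discussed above. Names here are illustrative, not a standard API.
verbs = {
    "speak": ("speak", "spoke", "spoken"),
    "close": ("close", "closed", "closed"),
}


def count_forms(parts):
    """Count the distinct spellings among a verb's principal parts."""
    return len(set(parts))


# Counting by spelling: every distinct written form is a "word".
form_counts = {base: count_forms(parts) for base, parts in verbs.items()}
spelling_count = sum(form_counts.values())

# Counting by lexeme: each verb is one "word", whatever its forms.
lexeme_count = len(verbs)

print(form_counts)
print("by spelling:", spelling_count, "by lexeme:", lexeme_count)
```

Depending on the criterion chosen, the same two verbs yield either two words or five, which is the ambiguity the questions above point to.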

Scientific words

Should we count species names for flowers and insects and the 500,000 different names for fungi which are common to all languages? What about names for chemicals? How about medical names for diseases? With these you can dwarf the number of "normal" words in any language.

Status of a word

Equally difficult is the question of whether a word is actually used - it may exist but be so obsolete that it isn't used any more. Do we count it or not? Do we count slang? Do we count regional words? Do we count a word if it is used in the UK but not in the US or in all international varieties of English - including Indian English, which has a large selection of words from native languages?

Acronyms

There are a vast number of acronyms in the language, some of which, such as UNESCO and NATO, are known internationally. Others, such as TEFL or CELTA, are only used by small communities. How would one decide whether to include them or not?

Various spellings

If a word has two spellings, does that count as one word or two? What about two past forms of the same verb, like "lighted" and "lit" or "dived" and "dove"? Does "dove" as a bird count as a separate word?

Multiple meanings

Furthermore, given that over eighty per cent of all words in English have more than one meaning – water as a verb and noun; lock as a verb and noun related to keys, or as a construction on a canal or river to regulate the ascent or descent of boats, or as a hold in wrestling or judo, or as in a lock of hair – should one count each meaning of the same word – the same combination of letters – as a different item? Surely if a person knows five meanings of the same word, he or she has a more extensive vocabulary than another person who knows only one meaning?

Get and phrasal verbs

Phrasal verbs are verbs formed by two (or more) parts. They express a single concept such as "run away" or "wake up". So should they be counted as a single word?

Take one of the most frequently used verbs in English – get. Should we consider the phrasal verbs get at, get away, get back, get by, get in, get off, get on, get over, get through, get up and many more as a word?

Then there are get forms where "get" means "become", such as get fat or get fatter. Should these be one lexeme – get – or an expression, a set phrase, an idiom? In a dictionary, these, and many others, might all be included under the entry get. And what about the inflections: gets, got/gotten, getting?

To further complicate the issue, phrasal verbs usually have multiple meanings. As asked above for single words, should each different idea represented by these meanings be counted as a distinct word?

Prefixes, suffixes and inflections

How should we count words beginning with prefixes such as un-, as in unhappy, untidy, unlikely? Many of these are not included in dictionaries because of their apparent obviousness. The same occurs with adverbs ending in -ly, or inflections of nouns (singular and plural), adjectives (comparison) and, as we saw above, the past tenses of most verbs unless they are so irregular as to cause possible confusion. Thus, bad, worse and [the] worst would probably be included as three separate entries, whereas in the case of more regular adjectives such as cold, its regular comparative and superlative – colder, [the] coldest – would probably be included under one single entry: cold.

Binomials

One might also ask if there is a difference between learning single words – big and small – and binomial expressions like black and white, thick and thin, boys and girls, ladies and gentlemen, eggs and bacon, fish and chips, socks and shoes?

Numbers

Should we count individual numbers as words? "One" is pretty obviously a word and so is "two". "First" and "second" are clearly words as well. "Twenty" is a word but "twenty-one" is hyphenated when spelt out. So at what point, if any, do we stop calling numbers words?

One could argue that they stop being words when the spelling convention makes them separate - but that would mean that a language which spells them as a single word would have an infinite number of words, as is the case with Italian (ottantotto), Dutch (achtentachtig) or German (achtundachtzig).
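
A small sketch can show how quickly single-word numbers accumulate under such a spelling convention. It is a minimal illustration, assuming only the regular German pattern for 20 to 99 (units + "und" + tens); the names `UNITS`, `TENS` and `german_number_word` are hypothetical and nothing outside that range is handled.

```python
# German unit and tens words; index 0 of UNITS is unused.
UNITS = ["", "ein", "zwei", "drei", "vier", "fünf",
         "sechs", "sieben", "acht", "neun"]
TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
        6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}


def german_number_word(n):
    """Spell a number from 20 to 99 as one unbroken German word."""
    if not 20 <= n <= 99:
        raise ValueError("this sketch only covers 20 to 99")
    tens, units = divmod(n, 10)
    if units == 0:
        return TENS[tens]
    return UNITS[units] + "und" + TENS[tens]


# Every number in the range becomes its own distinct single word.
words = {german_number_word(n) for n in range(20, 100)}

print(german_number_word(88))
print(len(words), "distinct single-word numbers between 20 and 99")
```

Since the same pattern continues for larger numbers, a count based purely on spelling would never terminate for such a language.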

A sample case

To illustrate some of the above let us consider the very simple word "record" and its various derivations.

Verbs

The verb "to record" has an initial meaning of "make a note of" or something of that nature and a newer meaning of "make an audio or video copy". It is conjugated as record, records, recorded, recorded, recording.

There is another verb which is "to break a record", meaning "to do something better than anybody else". For historical reasons this is a series of words but could easily have been one word. The verb "to break" is conjugated as usual to form the various tenses.

Nouns

As a noun "a record" can be a plastic disc; an unequalled feat; or a note which has been taken. A "recording" may be an audio or video record of an event. A recorder may be a person who makes a record, a device which makes recordings or - with a totally different meaning - a wind instrument typically used by kids in primary school as an introductory instrument.

Adjectives

One can also use "record" as an adjective as in "a record time". "Recording" functions as an adjective in "a recording contract". "Recorded" functions as an adjective in "a recorded conversation".

Pronunciation

In common with many such pairs, "record" as a verb (stressed on the second syllable: reCORD) and "record" as a noun (stressed on the first: REcord) are pronounced differently.

Question

So how many words should we count?
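
One way to see why the question has no single answer is to tabulate the uses listed above and count them under different criteria. The sketch below is only illustrative: the list `entries` is a hand-made table of the nouns, adjectives and base verb meanings given in this section (inflected verb forms and "to break a record" are left out), and all names are hypothetical.

```python
# (spelling, part of speech, meaning) for the uses described above.
entries = [
    ("record", "verb", "make a note of"),
    ("record", "verb", "make an audio or video copy"),
    ("record", "noun", "plastic disc"),
    ("record", "noun", "unequalled feat"),
    ("record", "noun", "note which has been taken"),
    ("record", "adjective", "as in 'a record time'"),
    ("recording", "noun", "audio or video record of an event"),
    ("recording", "adjective", "as in 'a recording contract'"),
    ("recorded", "adjective", "as in 'a recorded conversation'"),
    ("recorder", "noun", "person who makes a record"),
    ("recorder", "noun", "device which makes recordings"),
    ("recorder", "noun", "wind instrument"),
]

# Criterion 1: orthography only, one word per distinct spelling.
by_spelling = len({spelling for spelling, _, _ in entries})

# Criterion 2: spelling plus part of speech.
by_spelling_and_pos = len({(spelling, pos) for spelling, pos, _ in entries})

# Criterion 3: semantics, one word per distinct meaning.
by_meaning = len(set(entries))

print(by_spelling, by_spelling_and_pos, by_meaning)
```

Even for this one simple word, the total ranges from four to twelve depending on the criterion, before pronunciation or inflected verb forms are taken into account.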

Solutions

Given the above, is there any way that we can usefully talk about numbers?

Counting the number of words we use

See How many words do I know?

One solution might be to try to estimate the vocabulary of the average native speaker, but even this presents difficulties, partly because we all have an active and a passive vocabulary and partly because we can often "know" words we have never seen before, either because of their context or because they are made up of other parts of words we already know.

Counting the words in a dictionary

One might imagine that simply counting the words in a dictionary would provide the answer. But dictionary compilers have to consider all the issues outlined above.

And then one might ask "Which dictionary?" A medium-sized dictionary may contain some 100,000 entries. The New Oxford Dictionary of English, published in 1998, is the biggest single-volume dictionary and contains 350,000 words, of which 52,000 are scientific and technical words, although it avoids over-technical terminology. On the other hand, the 20-volume OED, the definitive dictionary of the English language, contains over half a million lexemes - many of which are obsolete.

Counting the words in other dictionaries would give other results depending on the objectives of the editorial staff. Counting the words in half a dozen dictionaries and dividing the total by the number of dictionaries would certainly give an average number - but, given the issues identified above, would this number be meaningful in any way? And would it be possible to do the same with other languages to make any meaningful comparison?
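
The averaging described above is easy to carry out, which is part of why its result is tempting. As a rough sketch, using only the three approximate figures quoted in this section (the OED figure of "over half a million" is taken as 500,000 here, which is an assumption, and the variable names are hypothetical):

```python
from statistics import mean

# Approximate sizes quoted above; the OED value is a lower bound.
dictionary_sizes = {
    "medium-sized dictionary": 100_000,
    "New Oxford Dictionary of English (1998)": 350_000,
    "OED (20 volumes)": 500_000,
}

sizes = list(dictionary_sizes.values())
average_size = mean(sizes)          # total divided by number of dictionaries
spread = max(sizes) - min(sizes)    # gap between largest and smallest

print(f"average: {average_size:,.0f} entries")
print(f"spread:  {spread:,} entries")
```

The spread between the dictionaries is larger than the average itself, which suggests how much the figure depends on editorial criteria rather than on the language.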

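References

1. "The OEC: Facts about the language", Oxford Dictionaries. Retrieved 30th September 2012. http://oxforddictionaries.com/words/the-oec-facts-about-the-language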