Before I offer any definition or etymology for the word data, take a look at the two sentences below and decide which one sounds right.
(a) The data is good.
(b) The data are good.
I’m betting that even if you are not quite sure which is technically correct you’ll have a preference for one or the other.
According to the Oxford English Dictionary, the correct sentence should be (b) because the actual word data is the plural form of the singular datum. But curiously, this doesn’t make things feel better if you believed (a) was the more correct.
The good news is that it is not uncommon to find “The data is…” being used in the real world, i.e., outside a dictionary that includes Latin etymology.
Using Google, the phrase “the data is…” scored 7,660,000 ghits (Google hits) compared with “the data are…” racking up 9,250,000. This supports the idea that more people get it right than wrong – but it’s hardly conclusive. And it also suggests that there’s no shame in using “the data is…” because another 7,659,999 folks are on your side.
The word datum is of Latin origin and means “A thing given or granted; something known or assumed as fact, and made the basis of reasoning or calculation; an assumption or premiss from which inferences are drawn.” (OED, Vol. IV, 264). Ultimately it is the past participle of the verb dare, which means “to give” – hence the notion of something that is given.
The plural form is much more common and relates to “facts, esp. numerical facts, collected together for reference or information.” In 1899, William Wade Pullen published the exciting tome Engineering Tables and Data, a thrill-a-minute page turner containing pages upon pages of… well, tables of data!
However, since the 1940s and the rise of the Computer Age, the use of the word data as a mass noun has been evident. A mass noun is one that cannot be counted, such as food, music, or information. Check out the following examples and – as before – see which sound right:
(c.1) The food is good.
(c.2) The food are good.
(d.1) The music is good.
(d.2) The music are good.
(e.1) The information is good.
(e.2) The information are good.
If you opted for the examples that use “is” as being right, that’s because mass nouns don’t take the plural verb.
The clue to how data has slipped into being acceptable in the sentence “The data is good” can be seen with the mass noun information. What appears to have happened is that the word data has become synonymous with information and also taken on its mass-noun characteristics.
In the 1964 AFIPS Conference Proceedings XXVI, you’ll find the phrase “Data is transferred to main storage as soon as two bytes are accumulated,” whereas the 1969 Condensed Computer Encyclopedia offers “Data are recorded on the tape…” Just one year later in 1970, Chandor et al. use “Data is sometimes contrasted with…” in their Dictionary of Computers.
Clearly what we are seeing is the flip-flopping of the word data as being either the plural of a count noun (datum) or a non-countable mass noun synonymous with information.
In my humble opinion – and even with the numbers against me – I’m up for recommending that we accept the fact that data has become a mass noun and is used more often than not as an alternative to information. I’m predicting that in 10 years’ time you’ll see the ghits for “the data is…” and “the data are…” reversed.
To me, the data is mounting…