Cheat Sheet of Parameters That Can Be Set for the Ngram Viewer
Quoted with adaptation from the tips on the About Google Ngram Viewer page of the Ngram Viewer website. (See that page for fuller explanations with examples.)
- Wildcard search (“search phrase*”)
- When you put a * in place of a word, the Ngram Viewer will display the top ten substitutions. For instance, to find the most popular words following “University of”, search for “University of *”.
- Case-sensitive/insensitive search
- By default, the Ngram Viewer performs case-sensitive searches: capitalization matters. You can perform a case-insensitive search by selecting the “case-insensitive” checkbox to the right of the query box.
- Inflection search (“search phrase_INF”)
- An inflection is the modification of a word to represent various grammatical categories such as aspect, case, gender, mood, number, person, tense and voice. You can search for inflections by appending _INF to a word in an ngram. For instance, searching “book_INF a hotel” will display results for “book”, “booked”, “books”, and “booking”.
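As a rough illustration of the wildcard and inflection forms above, the following Python sketch assembles the query strings one would type into the query box. The helper names `wildcard_query` and `inflection_query` are hypothetical; they only build text and do not contact the Ngram Viewer.

```python
def wildcard_query(phrase: str) -> str:
    """Append a * placeholder so the viewer shows the top ten substitutions."""
    return f"{phrase.strip()} *"


def inflection_query(phrase: str, word: str) -> str:
    """Tag the first occurrence of `word` in `phrase` with _INF."""
    tokens = phrase.split()
    if word not in tokens:
        raise ValueError(f"{word!r} does not occur in {phrase!r}")
    index = tokens.index(word)
    tokens[index] = f"{word}_INF"
    return " ".join(tokens)


print(wildcard_query("University of"))           # University of *
print(inflection_query("book a hotel", "book"))  # book_INF a hotel
```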
- Part-of-speech Tags
- (“searchword_NOUN”) noun
- (“searchword_VERB”) verb
- (“searchword_ADJ”) adjective
- (“searchword_ADV”) adverb
- (“searchword_PRON”) pronoun
- (“searchword_DET”) determiner or article
- (“searchword_ADP”) an adposition: either a preposition or a postposition
- (“searchword_NUM”) numeral
- (“searchword_CONJ”) conjunction
- (“searchword_PRT”) particle
- (“_ROOT_”) root of the parse tree. Like _START_ and _END_, this tag must stand alone rather than being appended to a word.
- Example: Consider the word “tackle”, which can be a verb (“tackle the problem”) or a noun (“fishing tackle”). You can distinguish between these forms by appending _VERB or _NOUN: “tackle_VERB”, “tackle_NOUN”.
- The most frequent part-of-speech tags for a word can be retrieved with the wildcard functionality. For example, query “cook_*”.
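A minimal sketch of how the part-of-speech suffixes might be applied programmatically is shown below. The `POS_TAGS` table and the functions `tag_word` and `all_tags_query` are hypothetical names introduced for illustration; the tag set is the one listed above, plus the _NOUN and _VERB tags used in the “tackle” example.

```python
# Tags that attach to a word (the stand-alone tags _ROOT_, _START_ and
# _END_ are deliberately left out because they are not appended to words).
POS_TAGS = {
    "NOUN": "noun",
    "VERB": "verb",
    "ADJ": "adjective",
    "ADV": "adverb",
    "PRON": "pronoun",
    "DET": "determiner or article",
    "ADP": "adposition",
    "NUM": "numeral",
    "CONJ": "conjunction",
    "PRT": "particle",
}


def tag_word(word: str, tag: str) -> str:
    """Return word_TAG, accepting the tag with or without underscores."""
    normalized = tag.strip("_").upper()
    if normalized not in POS_TAGS:
        raise ValueError(f"unknown part-of-speech tag: {tag!r}")
    return f"{word}_{normalized}"


def all_tags_query(word: str) -> str:
    """Wildcard form that asks for the most frequent tags of a word."""
    return f"{word}_*"


print(tag_word("tackle", "VERB"))   # tackle_VERB
print(tag_word("tackle", "_noun"))  # tackle_NOUN
print(all_tags_query("cook"))       # cook_*
```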
- Start and End of Sentences (“_START_”) (“_END_”)
- The Ngram Viewer tags sentence boundaries, allowing you to identify ngrams at the starts and ends of sentences with the _START_ and _END_ tags, for example: “_START_ President Lincoln”.
- Dependency Relations (“mainword=>dependentword”)
- Sometimes it helps to think about words in terms of dependencies rather than patterns. Let’s say you want to know how often “tasty” modifies “dessert”. That is, you want to tally mentions of “tasty frozen dessert”, “crunchy, tasty dessert”, “tasty yet expensive dessert”, and all the other instances in which the word “tasty” is applied to “dessert”. For that, the Ngram Viewer provides dependency relations with the => operator: “dessert=>tasty”.
- Root Word in Sentence (“_ROOT_=>searchword”)
- Every parsed sentence has a _ROOT_. Unlike other tags, _ROOT_ doesn’t stand for a particular word or position in the sentence. It’s the root of the parse tree constructed by analyzing the syntax; you can think of it as a placeholder for what the main verb of the sentence is modifying. So here’s how to identify how often “will” was the main verb of a sentence: “_ROOT_=>will”. This matches “will” in the sentence “Larry will decide” but not in “Larry said that he will decide”, since “will” isn’t the main verb of the latter sentence.
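The sentence-boundary, dependency, and root forms can be sketched as simple string builders. This assumes the head word goes on the left of =>, as in the “mainword=>dependentword” pattern; all function names here are hypothetical and nothing is sent to the Ngram Viewer.

```python
def dependency_query(head: str, modifier: str) -> str:
    """Build head=>modifier, e.g. how often `modifier` modifies `head`."""
    return f"{head}=>{modifier}"


def root_query(word: str) -> str:
    """Build a query for `word` as the main verb (child of _ROOT_)."""
    return dependency_query("_ROOT_", word)


def sentence_start_query(phrase: str) -> str:
    """Restrict `phrase` to the beginning of a sentence."""
    return f"_START_ {phrase}"


def sentence_end_query(phrase: str) -> str:
    """Restrict `phrase` to the end of a sentence."""
    return f"{phrase} _END_"


print(dependency_query("dessert", "tasty"))       # dessert=>tasty
print(root_query("will"))                         # _ROOT_=>will
print(sentence_start_query("President Lincoln"))  # _START_ President Lincoln
```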
- Ngram Compositions
- The Ngram Viewer provides five operators that you can use to combine ngrams: +, -, /, *, and :.
|+
||sums the expressions on either side, letting you combine multiple ngram time series into one.
|-
||subtracts the expression on the right from the expression on the left, giving you a way to measure one ngram relative to another. Because users often want to search for hyphenated phrases, put spaces on either side of the - sign.
|/
||divides the expression on the left by the expression on the right, which is useful for isolating the behavior of an ngram with respect to another.
|*
||multiplies the expression on the left by the number on the right, making it easier to compare ngrams of very different frequencies. (Be sure to enclose the entire ngram in parentheses so that * isn’t interpreted as a wildcard.)
|:
||applies the ngram on the left to the corpus on the right, allowing you to compare ngrams across different corpora.
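To make the arithmetic operators concrete, here is a toy Python sketch that applies +, -, / and * to two short time series, assuming the operators act year by year. The frequency values are made up for illustration and are not Ngram Viewer data; the function names are hypothetical, and returning NaN for a zero denominator is this sketch's own choice.

```python
from typing import List

Series = List[float]  # one value per year, in the same year order


def add(left: Series, right: Series) -> Series:
    """a + b: combine two ngram series into one."""
    return [a + b for a, b in zip(left, right)]


def subtract(left: Series, right: Series) -> Series:
    """a - b: measure one ngram relative to another."""
    return [a - b for a, b in zip(left, right)]


def divide(left: Series, right: Series) -> Series:
    """a / b: ratio of one ngram to another (NaN where b is zero)."""
    return [a / b if b != 0 else float("nan") for a, b in zip(left, right)]


def scale(series: Series, factor: float) -> Series:
    """(a * n): rescale a rare ngram so it is visible next to a common one."""
    return [value * factor for value in series]


# Made-up values for three consecutive years.
common = [1.0, 2.0, 4.0]
rare = [1.0, 1.0, 2.0]

print(add(common, rare))       # [2.0, 3.0, 6.0]
print(subtract(common, rare))  # [0.0, 1.0, 2.0]
print(divide(common, rare))    # [1.0, 2.0, 2.0]
print(scale(rare, 1000))       # [1000.0, 1000.0, 2000.0]
```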
- Searching inside Google Books
- Below the graph, we show “interesting” year ranges for your query terms. Clicking on those will submit your query directly to Google Books. Note that the Ngram Viewer is case-sensitive, but Google Books search results are not.
- Corpus Selection [“(searchword:eng_2012)”, “(searchword:fre_2012)”, etc.]
- The : corpus selection operator lets you compare ngrams in different languages, or American versus British English (or fiction), or between the 2009 and 2012 versions of our book scans. Here’s “chat” in English versus the same unigram in French: (chat:eng_2012) versus (chat:fre_2012).
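A small sketch of the corpus selection form follows. The helpers `with_corpus` and `compare_across` are hypothetical, and joining the pieces with commas is an assumption about how several queries are entered side by side; only the shorthand identifiers eng_2012 and fre_2012 come from the text above.

```python
from typing import Iterable


def with_corpus(ngram: str, corpus: str) -> str:
    """Build the parenthesized ngram:corpus form, e.g. (chat:eng_2012)."""
    return f"({ngram}:{corpus})"


def compare_across(ngram: str, corpora: Iterable[str]) -> str:
    """Build one query string that looks up the same ngram in several corpora.

    Assumption: separate queries are comma-separated in the query box.
    """
    return ", ".join(with_corpus(ngram, corpus) for corpus in corpora)


print(with_corpus("chat", "eng_2012"))
# (chat:eng_2012)
print(compare_across("chat", ["eng_2012", "fre_2012"]))
# (chat:eng_2012), (chat:fre_2012)
```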
- Corpora: Below are descriptions of the corpora that can be searched with the Google Books Ngram Viewer. All corpora were generated in either July 2009 or July 2012; we will update these corpora as our book scanning continues, and the updated versions will have distinct persistent identifiers. Books with low OCR quality and serials were excluded.
|Informal corpus name
||Description
|American English 2012
|American English 2009
||Books predominantly in the English language that were published in the United States.
|British English 2012
|British English 2009
||Books predominantly in the English language that were published in Great Britain.
|Chinese (simplified)
||Books predominantly in simplified Chinese script.
|English
||Books predominantly in the English language published in any country.
|English Fiction 2012
|English Fiction 2009
||Books predominantly in the English language that a library or publisher identified as fiction.
|English One Million
||The “Google Million”. All are in English with dates ranging from 1500 to 2008. No more than about 6000 books were chosen from any one year, which means that all of the scanned books from early years are present, and books from later years are randomly sampled. The random samplings reflect the subject distributions for the year (so there are more computer books in 2000 than 1980).
|French
||Books predominantly in the French language.
|German
||Books predominantly in the German language.
|Hebrew
||Books predominantly in the Hebrew language.
|Spanish
||Books predominantly in the Spanish language.
|Russian
||Books predominantly in the Russian language.
|Italian
||Books predominantly in the Italian language.
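The per-year cap described for the English One Million corpus can be pictured with the sketch below: years with fewer books than the cap are kept whole, and larger years are randomly sampled down to the cap. This is only an illustration of the idea under simple assumptions (uniform random sampling, a hypothetical `cap_per_year` function, made-up book titles); it is not the procedure Google actually used.

```python
import random
from typing import Dict, List


def cap_per_year(
    books_by_year: Dict[int, List[str]], cap: int = 6000, seed: int = 0
) -> Dict[int, List[str]]:
    """Keep every book for small years; randomly sample `cap` books otherwise."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    sampled: Dict[int, List[str]] = {}
    for year, books in books_by_year.items():
        if len(books) <= cap:
            sampled[year] = list(books)  # early years: everything is present
        else:
            sampled[year] = rng.sample(books, cap)  # later years: random subset
    return sampled


# Made-up titles; a tiny cap of 3 stands in for the real cap of about 6000.
library = {
    1600: ["a", "b"],
    2000: [f"book{i}" for i in range(10)],
}
result = cap_per_year(library, cap=3)
print({year: len(books) for year, books in result.items()})
# {1600: 2, 2000: 3}
```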