The Distribution of English Isograms in Google Ngrams and the British National Corpus
Research output: Contribution to journal › Article › peer-review
In: Opticon1826, 2017.
RIS
TY - JOUR
T1 - The Distribution of English Isograms in Google Ngrams and the British National Corpus
AU - Breit, Florian
N1 - Opticon1826 ceased publishing in 2016; keep the entry as unpublished.
PY - 2017
Y1 - 2017
N2 - The study of isograms (words in which each letter occurs the same number of times) has thus far largely been limited to manual searches for examples in sources such as dictionaries, and accounts have principally limited themselves to listing the known isograms of various categories. This paper presents the results of a corpus study of English isograms from Google Ngrams (ca. 1 trillion words, ~13 million types) and the British National Corpus (ca. 100 million words, ~6 million types). The paper discusses methodological issues relating to the automated mining of isograms, explores the distribution of isograms in relation to word length and frequency, and presents several new isograms that have so far gone unnoticed in the literature. Moreover, the paper describes the resultant dataset of English isograms and the tools used to create it, which are made freely available and can be used to further study the distribution of isogramy in English and other languages.
KW - English
KW - glottometrics
KW - isograms
KW - logology
KW - orthography
M3 - Article
JO - Opticon1826
JF - Opticon1826
SN - 2049-8128
ER -
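
The abstract defines an isogram as a word in which each letter occurs the same number of times. As a rough illustration of that definition only, the check can be sketched in Python as below. This is not the tooling released with the paper: the function name `isogram_order` is hypothetical, and the decisions to ignore case and to skip non-letter characters such as hyphens are assumptions made for this sketch.

```python
from collections import Counter


def isogram_order(word: str) -> int:
    """Return n if every letter in `word` occurs exactly n times, else 0.

    Hypothetical helper for illustration. Case is ignored and non-letter
    characters are skipped; both choices are assumptions of this sketch.
    """
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return 0  # no letters at all: not treated as an isogram here
    # Collect the distinct per-letter frequencies; an isogram has exactly one.
    distinct_counts = set(Counter(letters).values())
    return distinct_counts.pop() if len(distinct_counts) == 1 else 0


if __name__ == "__main__":
    for candidate in ["dermatoglyphics", "deed", "intestines", "banana"]:
        print(candidate, isogram_order(candidate))
```

Under these assumptions, "dermatoglyphics" is a first-order isogram (each letter once), "deed" and "intestines" are second-order (each letter twice), and "banana" is not an isogram, so the function returns 1, 2, 2 and 0 respectively.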