Avesta: A Lexico-Statistical Analysis

The Avesta: A Lexico-Statistical Analysis (Direct and Reverse Indexes, Hapax Legomena and Frequency Counts). By RAIOMUND DOCTOR. Acta Iranica, vol. 41. Louvain: Peeters, 2004. Pp. 666. €105; $116.

The aim of this book, as stated in the introduction (p. 1), is to provide a tool that, like Bloomfield’s Vedic Concordance (1906), «would permit the user to exhaustively locate all and every occurrence of a given word within the major Avestan texts,» as well as a reverse index that «would allow the scholar to identify each and every word in the Avesta by its ending.» In practice, it is «an Index … of the Geldnerian version of the Avesta … which is still considered to be the normative version.» In addition, the book contains «computer-generated lexicostatistical data as to the frequencies of individual words, length-wise sorts of all Avestan words and minimal pairs.»

The corpus that has been indexed is Geldner’s text, and the alphabetical order is that of Bartholomae’s Altiranisches Wörterbuch. This means, for instance, that forms that Geldner preferred in his Prolegomena, and that were later used in the Altiranisches Wörterbuch, are not listed here. Nor is a century of corrections to Geldner’s text taken into account; for example, Geldner’s vispaiia.irina (Y 19.17, after the Pahlavi Yasna manuscripts) is listed under vispaiia and irina, although Benveniste (1964) emended it to vispaiieirina (after the Persian Videvdad Sade mss. vispaiie.irina and the Yasna Sade mss. vispiie.irina). It also means that a large part of the Avestan vocabulary, that of the texts not in Geldner, is not included (see p. 3). This part of the vocabulary is, of course, included in the Altiranisches Wörterbuch. The transcription is also that of the Altiranisches Wörterbuch. Thus the letter ŋᵛ (ŋᵛh) is not included, ŋ́ (ŋ́h) is used only rarely, and the old h is used for x.

There are three main indexes: an alphabetical index of all the words in Geldner’s Avesta, a reverse index of the same, and an index of hapax legomena. The main index contains the bare words in Geldner’s corpus with references. It is therefore nothing like Bloomfield’s Vedic Concordance, which gives the words in context; the reader still needs to refer to Geldner, Bartholomae’s Altiranisches Wörterbuch, and later literature. For no stated reason, Pazand words, i.e., Middle and New Persian words in Avestan script found in Geldner, have been included, but without any indication that they are not Avestan words. Four such words are listed on p. 5 as examples of words with the un-Avestan l: Persian bald and salar and Arabo-Persian xalk (for xaliq ‘creator’) and maxluk (for maxluq ‘created, creation’). Compounds that Geldner prints with a period (including words with prefixes) have been split into their parts, which are listed separately. Thus, a word like upairi.zəmaisca (instrumental plural of upairi.zəma- ‘who are on the earth’, a thematized compound from upairi + zəm- + -a-, + -ca ‘and’) is indexed under upairi and zəmaisca. Vacillation in Geldner gives, for instance, both xrvidrum and, separately, xrvi and drum. Words with unetymological periods in the manuscripts are also separated; for instance, aesəm.mahya (Y. 48.12), which is commonly assumed to stand for *aesəmahiia, appears under aesəm and mahiia; and dum (2nd plural middle ending), which is listed as occurring once, in Y. 45.1, goes with mazdaŋho, also listed as occurring once; that is, the form has been split as mazdaŋho.dum, although Geldner’s text has mazdaŋhodum (like gusodum, ibid.). The two parts of reduplicated forms, for instance γzara.γzarantis, are also listed separately.

The series of «frequency-wise distribution lists» contains «the frequencies of all the characters, along with their valid combinations» (p. 7). Deferring to «traditional convention,» the author has decided to treat what are traditionally classified as vowels and consonants as such, although «the terms consonant and vowel are misnomers as applied to the Avesta.» Since the corpus is Geldner’s text, the distribution of š ~ š́ ~ ṣ̌ reflects Geldner’s idiosyncrasies (as well as those of the manuscript writers he relied on), with š́ used consistently before y (ii), e.g., maš́ya- (today usually maṣ̌iia-, while š́ [š́ii] is commonly used only where it represents an etymological *ci, e.g., š́auua-, Old Indic cyav-), and ṣ̌ as his default š-value (used when Geldner did not cite a specific manuscript reading), also initially and in groups, e.g., ṣ̌avaite and xṣ̌ma (today ṣ̌ is commonly used only where it represents an etymological *rt, e.g., maṣ̌iia-, Old Indic martiya-).

The list of minimal pairs is the only index in this book that, in my opinion, has any usefulness, and even then only to some extent, as many obvious pairs are not listed, e.g., pairs with xᵛ such as xᵛaro ~ jaro, baro (according to p. 8, the list is complete). Unlike the other lists (see p. 5), this list counts st as one phoneme(?) and places it last in alphabetical order; as far as I can see, no «minimal pairs» involving st are listed, but they exist, e.g., tasta ~ taδa ~ tava. Again, since the corpus is Geldner’s text, several minimal pairs are simply manuscript variants. For instance, nimato ‘felt’ (V. 8.1), listed beside nəmato (V. 9.46), has the variant nəmato in K1, Mf2, Jp1, L2, and L1, which is clearly the correct reading. Vacillation in Geldner’s text has led to minimal pairs such as (p. 551) frayazaesa ~ frayazaesa, and the inclusion of Pazand words to minimal pairs such as fravas (Pazand) ~ fravah (in fravahe < frava-, proper name).
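Whether a list of minimal pairs can be called complete depends on how words are segmented: xᵛaro ~ jaro is a minimal pair only if xᵛ counts as one segment, and tasta ~ taδa only if st does. The following minimal Python sketch illustrates that dependency; it is not a reconstruction of the book's own procedure. It assumes a hypothetical ASCII-style transcription in which "xv" stands in for xᵛ, and the digraph inventory `DIGRAPHS` and all function names are illustrative.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical inventory of multi-letter units treated as single segments.
# "xv" is an ASCII stand-in for xᵛ; neither spelling is the book's own.
DIGRAPHS = ("xv", "st")


def segment(word: str, digraphs=DIGRAPHS) -> tuple[str, ...]:
    """Split a transcribed word into segments, trying digraphs before single letters."""
    segments = []
    i = 0
    while i < len(word):
        for d in digraphs:
            if word.startswith(d, i):
                segments.append(d)
                i += len(d)
                break
        else:  # no digraph matched at position i
            segments.append(word[i])
            i += 1
    return tuple(segments)


def is_minimal_pair(a: tuple[str, ...], b: tuple[str, ...]) -> bool:
    """True if two segment sequences have equal length and differ in exactly one slot."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def minimal_pairs(words, digraphs=DIGRAPHS):
    """Return all minimal pairs among `words` as alphabetically ordered tuples."""
    segmented = {w: segment(w, digraphs) for w in set(words)}
    return sorted(
        tuple(sorted((w1, w2)))
        for w1, w2 in combinations(segmented, 2)
        if is_minimal_pair(segmented[w1], segmented[w2])
    )


sample_words = ["xvaro", "jaro", "baro", "tasta", "taδa", "tava"]
print(minimal_pairs(sample_words))                # xv and st as single segments
print(minimal_pairs(sample_words, digraphs=()))   # every letter counted separately
```

With the digraph inventory, the sample yields xvaro ~ jaro, xvaro ~ baro, and tasta ~ taδa among its pairs. Passing `digraphs=()` makes xvaro and tasta five letters long, so those pairs disappear and only jaro ~ baro and taδa ~ tava remain.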

The last index is a «length-wise distribution list of canonical forms,» that is, a list of words arranged by number of letters (not syllables), «commencing with the shortest word in terms of length and ending with the longest,» which is supposed to be «especially useful for a study of the canonical profile of the Avestan word structure» (p. 8). The list is apparently intended to follow the order of the Latin alphabet, but does so only partly. For instance, it begins with a i o s a i u a [TEXT NOT REPRODUCIBLE IN ASCII] at ai (p. 569), xᵛ is in the alphabetical position of q, and so on. Since the transcription is that of the Altiranisches Wörterbuch, v and y count as one letter even when they are not initial and are written uu and ii (e.g., va ‘both’ is actually spelled uua, which Meillet [1920] proved is disyllabic uua). The shortest words listed are mostly Pazand words; for instance, i, found 351 times, is the Middle Persian ezafe; s, found twice, is not in the general index but is probably the Middle Persian 3rd singular enclitic pronoun; and so on.
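Letter counts in a length-wise list depend just as much on the transcription: uua ‘both’ has three letters as spelled but two (va) in the Altiranisches Wörterbuch transcription. A minimal Python sketch of that effect, assuming a hypothetical and deliberately naive normalization that rewrites every uu as v and every ii as y (the function names are illustrative, and ties are broken by Unicode code point, which is not the same as Latin alphabetical order once diacritics are involved):

```python
# Hypothetical mapping from the modern spellings uu/ii to the single letters v/y
# of the older transcription. Naive: it rewrites every "uu"/"ii" sequence,
# whether or not that sequence actually spells a glide.
OLD_STYLE = (("uu", "v"), ("ii", "y"))


def to_old_style(word: str) -> str:
    """Rewrite modern uu/ii spellings as v/y."""
    for modern, old in OLD_STYLE:
        word = word.replace(modern, old)
    return word


def length_sorted(words, normalize=None):
    """Sort words by letter count, breaking ties by Unicode code point.

    If `normalize` is given, each word is transformed before it is counted.
    """
    forms = [normalize(w) if normalize else w for w in words]
    return sorted(forms, key=lambda w: (len(w), w))


sample = ["uua", "masiia", "ai", "at"]
print(length_sorted(sample))                # counted as spelled
print(length_sorted(sample, to_old_style))  # uu/ii counted as one letter
```

Counted as spelled, uua (3 letters) and masiia (6) follow the two-letter words; after normalization they become va (2) and masya (5), and va moves into the two-letter group.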

The bibliography contains 6.5 pages of literature on «corpus linguistics» and 1.5 pages of literature on Avestan (and Old Persian) language and grammar that at first glance seems arbitrarily chosen but is actually identical with a bibliography found on the amateur website http://www.beepworld.de/members30/buggiffm2002/zartosht-.htm (which also contains literature on Middle Iranian). Note that «Bartholomaeo S., De antiquitate, etc. (1789)» refers to Paulinus a Sto. Bartholomaeo’s De antiquitate, and «Schroeder G. A., Commentatio, etc. (1831)» refers to Peter von Bohlen’s Commentatio, a dissertation defended on 12 March 1831 with Gustav Adolph Schroeder as respondent.

No information is provided about the author, who, according to the website of the Collège de France, is at Poona University and was chercheur and maître de conférences at the Collège in 2003–4, where he worked on the computerization of Sanskrit texts.

One wonders why it was decided to publish this book in the Acta Iranica series and why Peeters agreed to publish it.