HIML & AMD Sanskrit Sorting Issues

HIML and AMD are two of the most recently published Ayurvedic indexes. HIML is great; AMD is mediocre. But both share several problems, and the main one is that the sorting has gone wrong. And sorting is rather important for a dictionary or word index, is it not?

Directions for Use
The harmonization of the indexes requires attention to some points which may facilitate finding the lemma searched for. This is because of variations in spelling between volumes I/II and volume III.
Majuscules and minuscules may not always have been used consistently.
Compound nouns are sometimes written as one word, sometimes as two; the latter may be with or without a hyphen.
Slight variants in spelling (for instance: mythic/mythical) may be disregarded.
Spelling variants are retained when present in the sources.
Further, it is not clear in some instances whether a word is a proper name or a title (see, for example: Bindusāra, Viśva).
It is useful to compare, in the general index, lemmata such as: kinds of…/types of…/varieties of…, and: diseases/disorders.
Those acquainted with Sanskrit may compare fever/jvara, etc.
Peculiar features of the index program:
— letters with a diacritic mark precede those without such a mark;
— words with a bracketed part precede those without brackets;
— lemmata consisting of two or more words precede those written as one word;
— compounds with a hyphen come after those without it.
The titles/author featuring in the headings of the volumes 1A and 1B (Caraka-saṃhitā, Suśrutasaṃhitā, Aṣṭāṅgahṛdayasaṃhitā, Aṣṭāṅgasaṃgraha, Vāgbhaṭa) are not indexed as far as these parts are concerned. For these the reader is referred to the contents.

The sorting in both books is miserable. There is no sorting logic at all in AMD, and only some in HIML.


1) Nowhere is it stated that the index is sorted by the English alphabet. When I open a Sanskrit book, I expect Devanagari ordering. Why should I look for «bh» somewhere inside «b»? And why are «ś» and «ṣ» somewhere in the middle of «s»? Fascinating? No. In the age of automatic part-of-speech tagging for Sanskrit corpora, we can’t even sort Sanskrit as it should be sorted. Yes, it was 10 years ago, but for the last 20 years not much has changed for Sanskrit. If we don’t speak about it, nothing will change.
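
To make the expectation concrete, here is a minimal Python sketch of a sort key that follows the Devanagari letter order for words written in IAST transliteration. The names `IAST_ORDER`, `RANK` and `iast_key` are my own, not taken from either book or its software, and the handling is simplified: anusvāra and visarga are simply ranked after the vowels, digraphs such as «bh» are matched greedily, and anything outside the alphabet (hyphens, spaces) is skipped.

```python
import unicodedata

# Devanagari alphabetical order, written in IAST transliteration.
# Aspirates (kh, gh, ...) and diphthongs (ai, au) are single letters here.
IAST_ORDER = (
    "a ā i ī u ū ṛ ṝ ḷ ḹ e ai o au ṃ ḥ "
    "k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ "
    "t th d dh n p ph b bh m y r l v ś ṣ s h"
).split()

# Map each letter (in composed NFC form) to its position in the alphabet.
RANK = {unicodedata.normalize("NFC", s): i for i, s in enumerate(IAST_ORDER)}


def iast_key(word):
    """Return a list of letter ranks usable as a sort key (simplified sketch)."""
    text = unicodedata.normalize("NFC", word).lower()
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in RANK:            # two-character letter such as "bh" or "ai"
            key.append(RANK[pair])
            i += 2
        elif text[i] in RANK:       # single-character letter
            key.append(RANK[text[i]])
            i += 1
        else:                       # hyphens, spaces, brackets: ignored
            i += 1
    return key


words = ["sūtra", "bhāva", "śarīra", "bala", "ṣaṣṭi", "buddhi"]
print(sorted(words, key=iast_key))
```

With this key «bh» comes after every word in plain «b», and «ś» and «ṣ» come before «s», which is what a reader used to Sanskrit dictionaries would look for.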

2) The statement that «letters with a diacritic mark precede those without such a mark» does not prepare you for the mix you actually get. So it’s not a feature, it’s a failure of the indexing software. Total failure. All letters with diacritics are treated as if they were equal to their base characters. Finding a word beginning with ā or ū takes a miracle (good that there are not many starting with ī).
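
I have not seen the index program, so the following is only an illustration of the behaviour described above, not a reconstruction of the actual software. It is a small Python sketch in which diacritics are stripped before comparison; `fold_diacritics` and the sample headwords are my own choices.

```python
import unicodedata


def fold_diacritics(word):
    """Strip combining marks so that ā compares as a, ṣ as s, and so on."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


headwords = ["uṣṇa", "āma", "agni", "ūrdhva", "amṛta", "udara", "ājya"]

# Sorting on the folded form scatters ā among a, and ū among u.
folded_order = sorted(headwords, key=fold_diacritics)
print(folded_order)
```

Sorted this way, the words in ā land between «agni» and «amṛta», and «ūrdhva» sits between «udara» and «uṣṇa», instead of each long vowel forming its own section after the short one.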

If I could see the source text, I could get it right. But I guess I never will. And there will be a lot of errors in indexes related to Sanskrit matters.