Hellwig’s Devanagari OCR

Batch Sanskrit OCR (1.0.0.9 beta)

1) Every document is titled [New Document*]. The File menu offers only «Open» and «New» stack file. Where is «Save»?
2) As before, the zoom level is lost when switching between pages.
3) A RegEx cleanup pass after recognition would be a good idea.
।। ६ई४ए ।। If there is at least one digit between two double dandas, nothing else can stand there except one or more further digits, so ई and ए are dirt. There are very many similar, elementary-to-fix cases, such as ।। ६ई३ ।।, ।। ६ई७ ।। and ।। ६इ६इ ।1. If all shlokas have two-digit numbers, there is a good chance that a three-digit number in the middle is just junk.

4) Selected words cannot be copied from the «Recognized text» box in the lower part of the window. I have to press «Save recognized text», choose «Copy to clipboard» and press «Save!» instead of doing a simple copy and paste as in MS Office documents.

5) Do not try to recognize text above or below the text lines. सष००प्त०ण्ण००स०ष is better left as «Introduction».
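
The cleanup suggested in item 3 could be sketched roughly as follows. This is only a minimal Python sketch and not part of the OCR program: the names clean_verse_numbers and flag_odd_lengths are made up for illustration, and it assumes that a verse number sits between ।। and a closing ।। (or the misread ।1) with no whitespace or danda inside it.

```python
import re
from collections import Counter

# Verse marker: double danda, a short run that contains at least one
# Devanagari digit, then a closing double danda (or the misread "।1").
MARKER = re.compile(r"।।\s*([^\s।]*[०-९][^\s।]*)\s*।[।1]")


def clean_verse_numbers(text):
    """Drop every non-digit character inside a verse marker."""
    def fix(match):
        digits = re.sub(r"[^०-९]", "", match.group(1))
        return f"।। {digits} ।।"
    return MARKER.sub(fix, text)


def flag_odd_lengths(numbers):
    """Return the numbers whose digit count differs from the most common one."""
    if not numbers:
        return []
    usual, _ = Counter(len(n) for n in numbers).most_common(1)[0]
    return [n for n in numbers if len(n) != usual]


if __name__ == "__main__":
    print(clean_verse_numbers("वर्जयेत् ।। ६ई३ ।।"))
    print(flag_odd_lengths(["६३", "६४", "६५६", "६७"]))
```

Note that this only removes dirt. A digit that was itself misread as a letter (as in ।। ६इभ ।।) is lost, so such markers still need a manual check.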

Video: http://youtu.be/4adQMnLgUeE

Specimens:

https://www.dropbox.com/s/445y04pnvl1ff1o/HOSv14-noOCR.pdf (6 MB)
https://www.dropbox.com/s/olm1u5sejhub25j/HOSv14-OCR.pdf (22.7 MB)


???? ????? तन्मादृणनां किं नाम तदरं स्यात् ० यस्य स्यादीदृशः फलवि- पाकः ० यत्सततं देहीति वक्ति । तत्सर्वथा धनहीनस्य ममाधुना ० नेह श्रेयः । उक्तं च ।

वसेन्मानाधिकं स्थानं मानहीनं न संवसेत् ।

मानहीनं सुरैः सार्ध विमानमपि वर्जयेत् ।। ६ई३ ।।

6 एवमुत्वाप्यहं पुनरप्येवमचिन्तयम् । किमर्थितां कस्यचित्क- रोमि । तदेतत्कष्टतरम् । यत्कारणम् ०

 

कुच्चस्य कीटखातस्य दावनिष्कुषितत्वचः ।

9 तरोरप्यूषरस्यस्य वर जन्म न चार्थिनः ।। ६ई४ए ।।

कखे गद्गदता स्वेदो मुखे वैवर्ण्यवेपथ ।

म्रियमाणस्य ओचहानि यानि तान्येव याचतः ।। ६इभ ।। तदर्थित्वमपि जघन्यम् ।

वैराग्याहरणं धियो ऽपहरणं मिष्याविकल्पास्पदं

पर्यायो मरणस्य दैन्यवसतिः शङ्कानिधानं परम् ।

 

1६ ८र जाघवमास्पद च ??? माानना-

मर्थित्वं हि मनस्विनां न नरकात्पश्यामि वस्त्वन्तरम् ।। ६इ६इ ।1 अपि च ।

18 निर्द्रव्यो हियमेति दीपरिगतः प्रभ्रश्यते तेजसो

निस्तेजाः परिभूयते परिभवाविर्वेदमागच्छति ।

निर्विण्णः षुचमेति शोकमनसो बुद्धिः परिभ्रश्यति

21 निर्धीकः क्षयमेत्यहो निधनता सर्वापदामास्पदम् ।। ६ई७ ।। —

आप च ।

वरमहिमुखे क्रोधाविष्टे करौ विनिवेशितौ

91 विषमपि वर पीत्वा सुप्तं यमस्य निवेशने ।

 

If there is not enough memory:
1) Give the user a chance to go back, as in SEO Frog.
2) Cancel does not work in «Exporting pdf document» after OK is pressed in the Out of memory message. It cancels only 2-4 minutes after pressing. 1.2 RAM is in use while the software is working.
3) Rotate by 180: there is no batch function. The OCR turned all my scans by 180 and I cannot fix it. Why now and not before, I have no idea. The first page was fine. Why did it start to turn them around? Who knows.
4) If recognition fails and I create a new stack file (in order to save the current one), no save button is offered. So even the last and only way to save a file is lost.

Rigveda Parallel Text

Rigveda in Russian, German and English translation.

rv01.001.01
॥ ऋग्वेदः मण्डलं 1॥ अ॒ग्निमी॑ळे पु॒रोहि॑तं य॒ज्ञस्य॑ दे॒वमृ॒त्विज॑म्। होता॑रं रत्न॒धात॑मम्॥
agnim īḷe purohitaṃ yajñasya devam ṛtvijam |
hotāraṃ ratnadhātamam ||
Агни призываю я – во главе поставленного
Бога жертвы (и) жреца,
Хотара обильнейшесокровищного.
Agni berufe ich als Bevollmächtigten, als Gott-Priester des Opfers, als Hotr, der am meisten Lohn einbringt.
I Laud Agni, the chosen Priest, God, minister of sacrifice, The hotar, lavishest of wealth.
rv01.001.02
अ॒ग्निः पूर्वे॑भि॒रृषि॑भि॒रीड्यो॒ नूत॑नैरु॒त। स दे॒वाँ एह व॑क्षति॥
agniḥ pūrvebhir ṛṣibhir īḍyo nūtanair uta |
sa devāṃ eha vakṣati ||
Агни достоин призываний риши –
Как прежних, так и нынешних:
Да привезет он сюда богов!
Agni war von den früheren Rishis und ist von den jüngsten zu berufen; er möge die Götter hierher fahren.
Worthy is Agni to be praised by living as by ancient seers. He shall bring hitherward the Gods.
rv01.001.03
अ॒ग्निना॑ र॒यिम॑श्नव॒त्पोष॑मे॒व दि॒वेदि॑वे। य॒शसं॑ वी॒रव॑त्तमम्॥
agninā rayim aśnavat poṣam eva dive-dive |
yaśasaṃ vīravattamam ||
Агни, посредством (него) пусть достигает он богатства
И процветания – изо дня в день –
Сияющего, мужеобильнейшего!
Durch Agni möge er Reichtum und Zuwachs Tag für Tag erlangen, ansehnlichen, der die meisten Söhne zählt.
Through Agni man obtaineth wealth, yea, plenty waxing day by day, Most rich in heroes, glorious.

https://www.dropbox.com/s/x59tybhpgtauyf4/RV.pdf

https://www.dropbox.com/s/6144bkpsdkbrwkk/RV.doc

https://www.dropbox.com/s/f2d1d4f9rirwg0k/RV-sa-hn-ru-de-en.html

OCR Errors in IAST Books

Creative Powers of the OCR

  • gati >> gall, gaH, gaii, gaU (high) (OCR medium creative)
  • ex. >> rr, cx, er. es. e,x. esc. ea. e». e,r. e.r, esc.
  • H >> << lī, Ii, lI, ll, il, IL, īl, K, R, jj etc. (high) (OCR most creative)
  • all >> ai, aU, aū, au, aH, alt, ali, ail, alī, «11, ai(, oli, nll, aZI, aZZ, aZi (medium) (OCR most creative)
  • l >> ), ] (low)
  • m >> rn. (high)
  • y >> g (high)
  • b >> << h (high)
  • , >> << j, J (high)
  • r >> v (medium)
  • d >> << a, ā (medium)
  • t >> << i (high)
  • us >> mm, mr (low)
  • t >> f (medium)
  • ī >> t, f (medium)
  • c >> << e (high)
  • y >> jj, v iļ (low)
  • u >> it (low)
  • 2 >> z; 5, 8 >> S; 3 >> 8, S (numbers are often misread) (OCR most creative)
  • l >> << i (medium)
  • , >> << ! (low)
  • k >> << h (medium)
  • : >> r, ; (low)
  • ti >> H, U (low)
  • l >> 1 (low)
  • k >> lc (low)
  • ā >> << u, ū (low)
  • y >> ļ/ , ļļ, u, v (very low)
  • y >> g (almost 90%)
  • rī >> ñ, m, z (low)
  • u >> n, īr
  • il >> U
  • a >> o
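
A list like this can be turned into a simple correction aid. The Python sketch below is only an illustration under stated assumptions: it uses four of the one-directional confusions above (m >> rn, y >> g, k >> lc, l >> 1), undoes one misreading at a time, and checks the result against a word list. The names CONFUSIONS, candidates and correct, as well as the lexicon, are hypothetical and do not come from any existing tool.

```python
# A few confusions from the list above, written as
# (misread form produced by the OCR, intended form).
CONFUSIONS = [("rn", "m"), ("g", "y"), ("lc", "k"), ("1", "l")]


def candidates(token):
    """Return the token plus every variant with exactly one misread form undone."""
    found = {token}
    for wrong, right in CONFUSIONS:
        start = token.find(wrong)
        while start != -1:
            found.add(token[:start] + right + token[start + len(wrong):])
            start = token.find(wrong, start + 1)
    return found


def correct(token, lexicon):
    """Keep only the candidates that occur in a list of known words."""
    return sorted(c for c in candidates(token) if c in lexicon)


if __name__ == "__main__":
    lexicon = {"dharma", "yoga"}  # stand-in for a real IAST word list
    print(correct("dharrna", lexicon))
    print(correct("goga", lexicon))
```

Because only one replacement is tried per candidate, a word with two separate misreadings will not be recovered. The very frequent confusions (such as y >> g) also produce many false candidates, so a good word list matters more than the table itself.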