Guest, posted September 8, 2004

Subject: Release of Sanskrit platform
From: huet
Date: September 8, 2004 18:57:09 GMT+02:00
To: INDOLOGY
Cc: huet

This is to announce the Sept 8th release of my Sanskrit computational linguistics platform, available at http://pauillac.inria.fr/~huet/SKT/

The verbal morphology is now complete for the present system (present, imperfect, optative, imperative), passive, perfect, future and aorist. This may be tested at http://pauillac.inria.fr/~huet/SKT/DICO/grammar.html where you submit the root (in Velthuis transliteration) and the present class. E.g.: bhuu 1, as 2, m.rj 2, han 2, haa 3, hu 3, daa 4, su 5, p.r 6, yuj 7, k.r 8, j~naa 9, namas 10, etc.

Conversely, a lemmatiser (at http://pauillac.inria.fr/~huet/SKT/DICO/index.html#stemmer) attempts to tag inflected words. Try for instance apibat, akaar.siit, dudoha, vaahyate, etc. (clicking on Verb).

An experimental segmenter/tagger is also available. It will segment simple verb-final sentences, such as maarjaarodugdha.mpibati, analysing the required sandhi. In tagger mode, it gives a shallow parsing of the sentence or phrase, linking to the dictionary. It may be used for instance to tag verb forms with preverbs, such as utti.s.tha, and to decompose compounds. It is able to correctly analyse forms such as ihehi (iha-aa-ihi = come here), and adhiiye (I learn, with sandhi duplicating the i).
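For readers less familiar with the Velthuis transliteration used in the examples above, here is a minimal sketch of a converter to IAST diacritics. It is not part of the platform; the table covers only the common core of the scheme, and the names VELTHUIS_TO_IAST and velthuis_to_iast are hypothetical, introduced for illustration.

```python
# Velthuis -> IAST table for the common core of the scheme.
# Assumption: variants and rarer signs are deliberately left out.
VELTHUIS_TO_IAST = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".rr": "ṝ", ".r": "ṛ", ".l": "ḷ",
    ".m": "ṃ", ".h": "ḥ",
    '"n': "ṅ", "~n": "ñ",
    ".t": "ṭ", ".d": "ḍ", ".n": "ṇ",
    '"s': "ś", ".s": "ṣ",
}

# Try longer keys first so that ".rr" wins over ".r".
_KEYS = sorted(VELTHUIS_TO_IAST, key=len, reverse=True)


def velthuis_to_iast(text: str) -> str:
    """Convert a Velthuis-encoded string to IAST by greedy longest match."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(VELTHUIS_TO_IAST[key])
                i += len(key)
                break
        else:
            # No multi-character code starts here: copy the character as is.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for word in ["bhuu", "m.rj", "j~naa", "akaar.siit", "utti.s.tha"]:
        print(word, "->", velthuis_to_iast(word))
```

With this table, utti.s.tha becomes uttiṣṭha and akaar.siit becomes akārṣīt. The greedy match treats every "aa" as a long vowel, which is the usual reading.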
The current syntax of recognized phrases is N*.(1+V) with V=(P+1).R, that is, a list of noun forms optionally followed by a verb, where a verb is a root form optionally preceded by a preverb; the current set P of prefixes is: ati, adhi, adhyava, anu, anuparaa, anupra, anuvi, anta.h, apa, apaa, api, abhi, abhini, abhipra, abhivi, abhisam, abhyava, abhyaa, abhyut, abhyupa, ava, aa, ut, udaa, upa, upani, upasam, upaa, upaadhi, tira.h, ni, ni.h, nirava, niraa, paraa, pari, parini, parisam, paryupa, pi, pura.h, pra, prati, pratini, prativi, pratisam, pratyaa, pratyut, prani, pravi, pravyaa, praa, vi, vini, vini.h, viparaa, vipari, vipra, vyati, vyapa, vyava, vyaa, vyut, sa, sa.mni, sa.mpra, sa.mprati, sa.mpravi, sa.mvi, sam, samava, samaa, samut, samudaa, samudvi, samupa.

Full lists of declined forms are available as downloadable free linguistic resources. The first one comprises 135,000 declined nouns (it includes pronouns, numbers, participles, particles and undeclinable forms such as absolutive and infinitive forms). The second one comprises 74,000 conjugated root forms. These databases are available in PDF format and in XML (given with a DTD).

This work is still in a very preliminary form, since it has not yet been tested on a real corpus. I beg the indulgence of the reader, since many mistakes and omissions remain. Readers are kindly requested to report them to me. Secondary conjugations are not systematically generated; they are included only if explicitly reported in the dictionary. Aorist forms are likewise included only as needed; I did not attempt to generate all forms given in Whitney's Roots. Also, participles, infinitives, and the periphrastic future and perfect are not systematically generated. Conditional and precative are missing. On the other hand, I generally generate both active (parasmaipada) and middle (aatmanepada) forms, so please report unattested forms only when they do not make sense semantically.
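To make the phrase syntax N*.(1+V) with V=(P+1).R concrete, here is a small sketch that checks a sequence of word tags against it. In that notation "1" stands for the empty word, "+" for choice and "." for concatenation, so the whole expression corresponds to the ordinary regular expression N*(P?R)?. This only illustrates the grammar and is not how the segmenter itself is implemented; the tag letters and the function name is_recognized are assumptions made for the example.

```python
import re

# N*.(1+V) with V=(P+1).R, written as a standard regular expression
# over one-letter tags: N = noun form, P = preverb, R = root form.
PHRASE = re.compile(r"N*(?:P?R)?")

VALID_TAGS = {"N", "P", "R"}


def is_recognized(tags):
    """Return True if the tag sequence fits N*.(1+V) with V=(P+1).R."""
    for tag in tags:
        if tag not in VALID_TAGS:
            raise ValueError(f"unknown tag: {tag!r}")
    return PHRASE.fullmatch("".join(tags)) is not None


if __name__ == "__main__":
    # Two noun forms followed by a root form (a simple verb-final sentence).
    print(is_recognized(["N", "N", "R"]))  # True
    # A preverb followed by a root form, as in ut + ti.s.tha.
    print(is_recognized(["P", "R"]))       # True
    # A verb before a noun is outside the current syntax.
    print(is_recognized(["R", "N"]))       # False
```

Note that each compound preverb in the list P counts as a single P, so at most one P can precede the root form.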
All these tools are available one click away from the Sanskrit dictionary index, at http://pauillac.inria.fr/~huet/SKT/DICO/

Enjoy,
Gerard Huet