IndiaDivine.org

Release of Sanskrit platform


Subject: Release of Sanskrit platform
From: huet
Date: 8 September 2004 18:57:09 GMT+02:00
To: INDOLOGY
Cc: huet

 

This is to announce the Sept 8th release of my Sanskrit computational linguistics platform, available at http://pauillac.inria.fr/~huet/SKT/

The verbal morphology is now complete for the present system (present, imperfect, optative, imperative), passive, perfect, future and aorist.

 

This may be tested at http://pauillac.inria.fr/~huet/SKT/DICO/grammar.html where you submit the root (in Velthuis transliteration form) and the present class.

E.g.: bhuu 1, as 2, m.rj 2, han 2, haa 3, hu 3, daa 4, su 5, p.r 6, yuj 7, k.r 8, j~naa 9, namas 10, etc.
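For readers less familiar with the Velthuis scheme, the roots above can be read off with a small converter to IAST diacritics. This is only a minimal sketch and not part of the platform: the function name `velthuis_to_iast` and the mapping table are illustrative assumptions, covering just the common Velthuis digraphs.

```python
# Illustrative Velthuis -> IAST table (an assumption for this sketch;
# it covers only the common digraphs, not every variant of the scheme).
VELTHUIS_TO_IAST = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".rr": "ṝ", ".r": "ṛ", ".l": "ḷ",
    ".m": "ṃ", ".h": "ḥ",
    '"n': "ṅ", "~n": "ñ",
    ".t": "ṭ", ".d": "ḍ", ".n": "ṇ",
    '"s': "ś", ".s": "ṣ",
}

# Try longer keys first so that ".rr" wins over ".r".
_KEYS = sorted(VELTHUIS_TO_IAST, key=len, reverse=True)


def velthuis_to_iast(text):
    """Convert a Velthuis-encoded string to IAST, left to right."""
    out = []
    i = 0
    while i < len(text):
        for key in _KEYS:
            if text.startswith(key, i):
                out.append(VELTHUIS_TO_IAST[key])
                i += len(key)
                break
        else:
            # No digraph starts here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


for root in ["bhuu", "m.rj", "k.r", "j~naa"]:
    print(root, "->", velthuis_to_iast(root))
```

The greedy longest-match loop keeps the table declarative; anything not in the table passes through untouched.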

 

Conversely, a lemmatiser (at http://pauillac.inria.fr/~huet/SKT/DICO/index.html#stemmer) attempts to tag inflected words.

Try for instance apibat, akaar.siit, dudoha, vaahyate, etc. (clicking on Verb).

 

An experimental segmenter/tagger is also available. It will segment simple verb-final sentences, such as maarjaarodugdha.mpibati, analysing the required sandhi.

In tagger mode, it gives a shallow parsing of the sentence or phrase, linking to the dictionary.

It may be used, for instance, to tag verb forms with preverbs, such as utti.s.tha, and to decompose compounds. It is able to correctly analyse forms such as ihehi (iha-aa-ihi = come here) and adhiiye (I learn, with sandhi duplicating the i).
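The two forms just quoted can be reproduced in the generative direction with two standard vowel sandhi rules: like simple vowels merge into the long vowel, and a/aa followed by i or u gives e or o. The sketch below is an assumption-laden illustration in Velthuis notation, not the segmenter's actual method (the segmenter works in the opposite direction, undoing sandhi); the name `join` and the rule tables are hypothetical, and every other junction is left unchanged.

```python
# Minimal vowel-sandhi sketch in Velthuis notation (hypothetical helper names).
# Only two rules are covered: like vowels lengthen, and a/aa + i/u -> e/o.
LONG = {"a": "aa", "i": "ii", "u": "uu"}
GUNA = {"i": "e", "u": "o"}
SIMPLE = ("aa", "ii", "uu", "a", "i", "u")  # long forms are tested first


def _final_vowel(word):
    """Return (stem, vowel quality) for a final simple vowel, else (word, None)."""
    if word.endswith(("ai", "au")):  # diphthongs are outside this sketch
        return word, None
    for v in SIMPLE:
        if word.endswith(v):
            return word[: -len(v)], v[0]
    return word, None


def _initial_vowel(word):
    """Return (vowel quality, rest) for an initial simple vowel, else (None, word)."""
    if word.startswith(("ai", "au")):
        return None, word
    for v in SIMPLE:
        if word.startswith(v):
            return v[0], word[len(v):]
    return None, word


def join(left, right):
    """Join two items, applying the two vowel rules where they fit."""
    stem, lv = _final_vowel(left)
    rv, rest = _initial_vowel(right)
    if lv is None or rv is None:
        return left + right
    if lv == rv:                       # a+a, i+i, u+u -> long vowel
        return stem + LONG[lv] + rest
    if lv == "a" and rv in GUNA:       # a/aa + i/u -> e/o
        return stem + GUNA[rv] + rest
    return left + right                # other junctions: not handled here


print(join(join("iha", "aa"), "ihi"))
print(join("adhi", "iye"))
```

Applying `join` twice builds ihehi from iha, aa and ihi, and once builds adhiiye from adhi and iye, matching the analyses given above.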

 

The current syntax of recognized phrases is N*.(1+V) with V=(P+1).R, that is, a list of noun forms optionally followed by a verb, where a verb is a root form optionally preceded by a preverb; the current set P of prefixes is:

ati, adhi, adhyava, anu, anuparaa, anupra, anuvi, anta.h, apa, apaa, api, abhi, abhini, abhipra, abhivi, abhisam, abhyava, abhyaa, abhyut, abhyupa, ava, aa, ut, udaa, upa, upani, upasam, upaa, upaadhi, tira.h, ni, ni.h, nirava, niraa, paraa, pari, parini, parisam, paryupa, pi, pura.h, pra, prati, pratini, prativi, pratisam, pratyaa, pratyut, prani, pravi, pravyaa, praa, vi, vini, vini.h, viparaa, vipari, vipra, vyati, vyapa, vyava, vyaa, vyut, sa, sa.mni, sa.mpra, sa.mprati, sa.mpravi, sa"mvi, sam, samava, samaa, samut, samudaa, samudvi, samupa.
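The phrase syntax N*.(1+V) with V=(P+1).R is a regular expression over word categories, reading "." as concatenation, "+" as alternative and "1" as the empty word. As a rough sketch, assuming each segmented word has already been tagged with one of the single letters N, P or R (the tagging itself is not shown, and the name `is_recognized` is hypothetical), the shape can be checked with Python's `re` module:

```python
import re

# N*.(1+V) with V=(P+1).R, written as an ordinary regular expression:
# any number of noun forms, then optionally (optional preverb + root form).
PHRASE = re.compile(r"N*(P?R)?")


def is_recognized(tags):
    """Check a sequence of one-letter tags ('N', 'P', 'R') against the syntax."""
    return PHRASE.fullmatch("".join(tags)) is not None


print(is_recognized(["N", "N", "R"]))   # two noun forms, then a bare root form
print(is_recognized(["N", "P", "R"]))   # noun form, then preverb + root form
print(is_recognized(["R", "N"]))        # verb not in final position
```

Note that the empty sequence also matches, since both N* and the optional verb may be absent.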

 

Full lists of declined forms are available as downloadable free linguistic resources.

The first one comprises 135,000 declined nouns (it includes pronouns, numbers, participles, particles and indeclinable forms such as absolutives and infinitives). The second one comprises 74,000 conjugated root forms. These databases are available in PDF format and in XML (given with a DTD).

 

This work is still in a very preliminary form, since it has not yet been tested on a real corpus. I beg the indulgence of readers, since many mistakes and omissions remain; they are kindly requested to report them to me.

Secondary conjugations are not systematically generated; they are included only if explicitly reported in the dictionary. Aorist forms are likewise included only as needed; I did not attempt to generate all the forms given in Whitney's Roots. Participles, infinitives, and the periphrastic future and perfect are also not systematically generated. The conditional and precative are missing. On the other hand, I generally generate both active (parasmaipada) and middle (aatmanepada) forms, so please report unattested forms only when they do not make sense semantically.

 

All these tools are available one click away from the Sanskrit dictionary index, at http://pauillac.inria.fr/~huet/SKT/DICO/

Enjoy

 

Gerard Huet
