Proofing tools are an invaluable aid in creating high-quality documents. Tilde has developed all the main proofing components: Spelling Checker, Grammar Checker, Thesaurus, Hyphenator, and Morphological Analyzer. Our most recent development is a new version of the Grammar Checker.
Because Latvian is a highly inflected language with high morphological ambiguity, a sentence contains many long-distance agreements between words and phrases, and finding possible errors requires deep syntactic analysis of the phrases and of the sentence as a whole. The Latvian grammar checker is therefore based on a parser. The parser works with two sets of rules: rules describing the correct syntactic structures of Latvian grammar, and rules describing grammar errors.
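To illustrate why morphological ambiguity matters for agreement checking, here is a minimal sketch in Python. It is not Tilde's rule formalism: the `Analysis` class, the three-word toy lexicon, and the `check_adj_noun` function are all hypothetical names introduced for this example. The idea shown is that a word form may have several readings, so an adjective-noun pair should be flagged only when no combination of readings agrees in gender, number, and case.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Analysis:
    """One morphological reading of a word form (hypothetical tag set)."""
    pos: str     # part of speech: "adj" or "noun"
    gender: str  # "m" or "f"
    number: str  # "sg" or "pl"
    case: str    # "nom", "gen", ...


# Hand-written toy lexicon for illustration only. Note that "liela" is
# ambiguous: feminine nominative singular or masculine genitive singular.
LEXICON = {
    "liela": [Analysis("adj", "f", "sg", "nom"), Analysis("adj", "m", "sg", "gen")],
    "liels": [Analysis("adj", "m", "sg", "nom")],
    "māja": [Analysis("noun", "f", "sg", "nom")],
}


def agrees(adj: Analysis, noun: Analysis) -> bool:
    """True if an adjective reading and a noun reading agree."""
    return (
        adj.pos == "adj"
        and noun.pos == "noun"
        and (adj.gender, adj.number, adj.case)
        == (noun.gender, noun.number, noun.case)
    )


def check_adj_noun(first: str, second: str) -> str:
    """Flag the pair only if no combination of readings agrees."""
    pairs = product(LEXICON[first], LEXICON[second])
    if any(agrees(a, n) for a, n in pairs):
        return "ok"
    return "agreement error"


print(check_adj_noun("liela", "māja"))  # the feminine reading of "liela" agrees
print(check_adj_noun("liels", "māja"))  # no reading of "liels" agrees
```

A real parser would apply such checks over whole syntactic structures rather than adjacent word pairs, which is what makes long-distance agreements reachable.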
The grammar checking system consists of separate components, each with its own task. Most of them must be called in a fixed order, because each component relies on data structures prepared by the previous one. First, a tokenizer module splits the incoming text into separate token objects and detects sentence boundaries. Subsequent components work on one sentence at a time, not on the whole incoming text.
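The tokenizer step described above can be sketched roughly as follows. This is a simplified illustration, not the actual module: the `Token` class, the regular expression, and the rule that every `.`, `!`, or `?` ends a sentence are assumptions made for the example. A production tokenizer would also have to handle abbreviations, numbers with decimal separators, and similar cases.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    """A token object with its position in the incoming text."""
    text: str
    start: int  # character offset of the token in the original text


# A run of word characters, or a single non-space, non-word character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Naive assumption: these characters always end a sentence.
SENTENCE_END = {".", "!", "?"}


def tokenize(text: str) -> list[list[Token]]:
    """Split text into sentences, each a list of Token objects."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for match in TOKEN_RE.finditer(text):
        current.append(Token(match.group(), match.start()))
        if match.group() in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing tokens without closing punctuation
        sentences.append(current)
    return sentences


for sentence in tokenize("Es eju. Tu nāc!"):
    print([token.text for token in sentence])
```

Returning a list of sentences lets every later component receive one sentence at a time, matching the processing order described above.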
Every token object is assigned one of the following token types: word, abbreviation, punctuation, or numeric. The simple error location module finds simple formatting errors using regular expressions. The analyzer module adds morphological analysis to every token. The parser component parses the sentence using a given rule set. The parse walker component extracts the error trees from the parse result matrix and generates suggestions for fixing the errors. The results from this component and from the simple error locator are passed to the result preparation module, which merges them and returns them to the calling application.
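Three of the steps above (token typing, the regular-expression error locator, and result merging) can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the sample abbreviation list, the two formatting patterns, and the `(start, end, message)` shape of an error are not taken from the actual system.

```python
import re

# Hypothetical sample of Latvian abbreviations; a real list would be much longer.
ABBREVIATIONS = {"u.c.", "utt.", "piem."}

# Hypothetical formatting checks: (pattern, message).
SIMPLE_PATTERNS = [
    (re.compile(r" {2,}"), "multiple spaces"),
    (re.compile(r"\s+[,.;:!?]"), "space before punctuation"),
]


def classify(token: str) -> str:
    """Assign one of the four token types to a token string."""
    if token in ABBREVIATIONS:
        return "abbreviation"
    if re.fullmatch(r"\d+([.,]\d+)?", token):
        return "numeric"
    if re.fullmatch(r"[^\w\s]+", token):
        return "punctuation"
    return "word"


def locate_simple_errors(text: str) -> list[tuple[int, int, str]]:
    """Find formatting errors as (start, end, message) tuples."""
    errors = []
    for pattern, message in SIMPLE_PATTERNS:
        for match in pattern.finditer(text):
            errors.append((match.start(), match.end(), message))
    return errors


def merge_results(simple, parsed):
    """Merge both error lists, drop duplicates, and sort by position."""
    return sorted(set(simple) | set(parsed))


simple = locate_simple_errors("Es  eju .")
parsed = [(4, 7, "agreement error")]  # stand-in for parse walker output
for error in merge_results(simple, parsed):
    print(error)
```

Sorting the merged list by offset is one plausible way to present errors to the calling application in reading order; the source does not say how the result preparation module orders them.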