This is a speech lexicon exported from Crimsonwing’s text-to-speech (TTS) database into a .txt file. In its original form, together with the Maltese Speech Engine Diphone repository, it was used to build Crimsonwing’s text-to-speech system.
The file is in plain-text format, with one line per word form. Each line gives the part of speech, written form, phonetic form, syllables, stress position and language, separated by commas.
After unzipping, the folder contains the following files:
- Maltese TTS - Database Schema.pdf (the original documentation for the entire speech database)
Each entry occupies one line, with values separated by commas in the following order: PartOfSpeech,WrittenForm,PhoneticForm,Syllables,StressPosition,Language
For example, the verb form niktbu “we write” is represented as:
The attributes and values in the lexicon are:
• PartOfSpeech (Abbreviation, Acronym, Adjective, Adverb, Article, Conjunction, Interjection, Letter, Noun, Numeral, Participle, Preposition, Pronoun, Verb, Unknown)
• WrittenForm (string: orthographical representation of the entries)
• PhoneticForm (string: representation of the entries in IPA)
• Syllables (string: representation of the entries in IPA, with syllable boundaries indicated with a hyphen)
• StressPosition (number indicating the syllable carrying word stress; values are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
• Language (English, French, Italian, Maltese, Unknown)
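The field layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the original distribution: the names `LexiconEntry` and `parse_entry` are hypothetical, the sample line is invented for illustration and is not copied from the lexicon, and the code assumes that no field itself contains a comma.

```python
from dataclasses import dataclass


@dataclass
class LexiconEntry:
    """One line of the lexicon (hypothetical helper type)."""
    part_of_speech: str
    written_form: str
    phonetic_form: str
    syllables: str
    stress_position: int
    language: str


def parse_entry(line: str) -> LexiconEntry:
    """Split one comma-separated line into its six documented fields."""
    # Assumption: fields never contain a comma themselves.
    parts = line.rstrip("\r\n").split(",")
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    pos, written, phonetic, syllables, stress, language = parts
    return LexiconEntry(pos, written, phonetic, syllables, int(stress), language)


# Invented sample line, for illustration only (not taken from the lexicon).
sample = "Noun,kelma,kɛlma,kɛl-ma,1,Maltese"
entry = parse_entry(sample)

# Syllables are hyphen-separated, so they can be split into a list.
syllable_list = entry.syllables.split("-")
print(entry.written_form, syllable_list, entry.stress_position)
```

If the file turns out to quote fields or embed commas, Python's standard `csv` module would be a safer choice than a plain `split`.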
NB: The value “0” for stress position applies to only two entries, which have no values for PhoneticForm and Syllables (the reason is unknown). These word forms are the adjective iffullata “crowded, congested” and the participle immankat “mutilated, disabled”.
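Users who need pronunciation data may want to skip these two entries. The sketch below shows one possible filter. The function name `lacks_pronunciation` is hypothetical, the second sample line is invented, and the code assumes that missing values appear as empty fields between commas, which should be checked against the actual file.

```python
def lacks_pronunciation(line: str) -> bool:
    """Return True if an entry has stress position 0 or no phonetic data."""
    fields = line.rstrip("\r\n").split(",")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    _, _, phonetic, syllables, stress, _ = fields
    return stress.strip() == "0" or not phonetic.strip() or not syllables.strip()


# Assumed layouts for illustration; verify against the real file.
lines = [
    "Adjective,iffullata,,,0,Maltese",    # assumed form of an incomplete entry
    "Noun,kelma,kɛlma,kɛl-ma,1,Maltese",  # invented complete entry
]

# Keep only the entries that carry pronunciation data.
usable = [line for line in lines if not lacks_pronunciation(line)]
print(len(usable))
```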