The web service takes text as input and returns a list of tokens. A token can be an orthographic word, a numeral or a punctuation mark. The tokeniser was designed to work on Maltese texts. The download for this resource contains only the narrative description, in a Word file.
The WSDL link is http://metanet4u.research.um.edu.mt/services/MtTokeniser?wsdl.
The service has one method which can be invoked:
• String tokenise(String text, Boolean tokenTags, String separator)
The method takes three parameters:
• text: the text that will be tokenised.
• tokenTags: a boolean variable. If tokenTags is true, then each output token is wrapped in tags (e.g. <token> tagged_text </token>). If false, the tokens have no tags.
• separator: a string used to separate one token from another in the output string.
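The effect of tokenTags and separator on the output string can be illustrated locally. The sketch below is not the service's implementation and does not call it: format_tokens is a hypothetical helper, the token list is hand-written rather than produced by the tokeniser, and the spacing inside the tags simply copies the example given above.

```python
def format_tokens(tokens, token_tags, separator):
    """Join already-split tokens into one output string.

    Hypothetical helper: it mimics the described roles of the
    tokenTags and separator parameters, nothing more.
    """
    if token_tags:
        # Wrap each token in tags, following the "<token> ... </token>" example.
        tokens = [f"<token> {t} </token>" for t in tokens]
    # The separator is placed between consecutive tokens.
    return separator.join(tokens)


# Hand-written tokens (an assumption, not real tokeniser output).
sample_tokens = ["Bonġu", ",", "Malta", "!"]

plain = format_tokens(sample_tokens, False, " ")
tagged = format_tokens(sample_tokens, True, "\n")

print(plain)
print(tagged)
```

With tokenTags set to false and a single space as separator, the tokens come back as one space-separated line. With tokenTags set to true and a newline as separator, each tagged token appears on its own line.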
Input data format: text string with sentences
Output data format: a text string with the tagged sentences in the format <sentence> sentence_text </sentence>