System and method for automatically diacritizing Vietnamese text
3:45 PM, 24/12/2014

Investors: Mrs. Dang Thi Mai Huong, Mr. Nguyen Viet Hai

Description:

Systems and methods are provided for automatically diacritizing Vietnamese text entered on a physical or virtual computer keyboard. In accordance with some embodiments, a method for automatically diacritizing Vietnamese text is provided, the method comprising: detecting a phrase-ending character and automatically diacritizing the previously entered phrase while the user may continue entering other phrases; and detecting a special character or, on a virtual computer keyboard, a touch event on a previously entered word, to allow manual diacritization of the Vietnamese text.

Technical field of the invention

The present invention relates to the diacritization of Vietnamese text, and more particularly to an automatic diacritization system and method that support typing and editing Vietnamese on a physical or virtual keyboard.

It is therefore a primary objective of the present invention to provide a method and a system which (i) automatically diacritize non-diacritical text entered by the user, without any manual intervention from the user (e.g. to choose between different word variants); and (ii) allow the user to type and edit in the same text area, without the need to switch between edit and type modes.

This objective is achieved by designing a user interface component that keeps track of the movement of the user's typing cursor to predict the user's intention from the current and historical context: whether the user is in typing mode or has just finished typing a word, a phrase or a sentence; or whether the user is in editing mode or is about to correct a syllable. In addition, the user vocabulary, which the language model uses for automatic diacritization of text, can optionally be shared among users. As such, any improvement to the automatic diacritization benefits all users who share the vocabulary.

Because non-diacritical Vietnamese text is inherently ambiguous, even native speakers cannot always easily choose the correct diacritized word from several valid options. The automatically diacritized text may therefore still be incorrect. According to the invention, the system allows users to manually edit or correct an incorrectly diacritized syllable using a popular Vietnamese typing method such as Telex, VNI or VIQR. Optionally, especially on a mobile device, the system allows the user to tap on the incorrectly diacritized syllable and then choose the correct diacritized syllable from a pop-up list of word correction options.
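As an illustration of manual correction with a typing method, the following is a minimal Python sketch of how a Telex tone key might change the tone of an already diacritized syllable. The function name `apply_telex_tone` and the tone placement rule (last vowel carrying a modifier, otherwise the first vowel) are assumptions made for this example, not details taken from the invention; real input methods use more detailed placement rules and handle special cases such as "qu" and "gi".

```python
import unicodedata

# Telex tone keys mapped to Unicode combining marks:
# s = acute, f = grave, r = hook above, x = tilde, j = dot below, z = remove tone.
TELEX_TONES = {"s": "\u0301", "f": "\u0300", "r": "\u0309",
               "x": "\u0303", "j": "\u0323", "z": ""}
TONE_MARKS = set(TELEX_TONES.values()) - {""}
VOWELS = set("aeiouy")


def apply_telex_tone(syllable: str, key: str) -> str:
    """Replace the tone of `syllable` with the tone selected by a Telex key."""
    tone = TELEX_TONES[key]
    # Split the syllable into [base letter, modifier marks] units,
    # dropping any tone mark that is already present.
    units = []
    for ch in unicodedata.normalize("NFD", syllable):
        if unicodedata.combining(ch):
            if ch not in TONE_MARKS and units:
                units[-1][1] += ch  # keep circumflex, breve, horn
        else:
            units.append([ch, ""])
    vowel_positions = [i for i, (base, _) in enumerate(units)
                       if base.lower() in VOWELS]
    if tone and vowel_positions:
        # Simplified placement (assumption): prefer the last vowel that
        # carries a modifier, otherwise fall back to the first vowel.
        modified = [i for i in vowel_positions if units[i][1]]
        target = modified[-1] if modified else vowel_positions[0]
        units[target][1] += tone
    return unicodedata.normalize("NFC", "".join(b + m for b, m in units))


print(apply_telex_tone("việt", "s"))  # viết
print(apply_telex_tone("tối", "z"))   # tôi
```

Working on the decomposed (NFD) form keeps vowel modifiers and tone marks separate, so changing a tone does not disturb the circumflex, breve or horn.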

A system and method for diacritization of text according to this invention include: detecting phrase-ending characters entered by the user and diacritizing the most recently entered phrase by employing an optimization solver to search for the diacritized phrase with the highest score; detecting special characters entered by the user and adding, removing or changing the diacritics of a word previously entered or diacritized, by employing the Telex, VNI or VIQR typing method; and building, updating and maintaining a vocabulary of phrases with scores.
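The phrase-level search described above can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the toy `UNIGRAM` and `BIGRAM` score tables stand in for the scored vocabulary, the set of phrase-ending characters is a guess, and a Viterbi-style dynamic program stands in for the optimization solver, whose actual design the source does not specify.

```python
import unicodedata

PHRASE_END = set(".,;:!?\n")  # assumed set of phrase-ending characters

# Hypothetical toy vocabulary: scores for syllables and for adjacent pairs.
UNIGRAM = {"tôi": 3.0, "tối": 2.0, "việt": 3.0, "viết": 2.0, "tiếng": 2.0}
BIGRAM = {("tôi", "viết"): 4.0, ("tiếng", "việt"): 5.0}


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics, e.g. 'tiếng' -> 'tieng'."""
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def diacritize_phrase(phrase: str, unigram: dict, bigram: dict) -> str:
    """Return the highest-scoring diacritized form of a phrase (lowercased)."""
    index = {}
    for word in unigram:
        index.setdefault(strip_diacritics(word), []).append(word)
    best = {}  # candidate -> (score, path) of the best path ending there
    for i, syllable in enumerate(phrase.lower().split()):
        new_best = {}
        # Unknown or already diacritized syllables are kept unchanged.
        for cand in index.get(syllable, [syllable]):
            base = unigram.get(cand, 0.0)
            if i == 0:
                new_best[cand] = (base, [cand])
            else:
                score, path = max(
                    ((s + bigram.get((prev, cand), 0.0), p)
                     for prev, (s, p) in best.items()),
                    key=lambda t: t[0])
                new_best[cand] = (score + base, path + [cand])
        best = new_best
    if not best:
        return ""
    return " ".join(max(best.values(), key=lambda t: t[0])[1])


def on_character(text: str, ch: str, unigram: dict, bigram: dict) -> str:
    """Append `ch`; if it ends a phrase, diacritize the phrase just typed."""
    if ch not in PHRASE_END:
        return text + ch
    # The most recent phrase starts after the previous phrase-ending character.
    start = max((text.rfind(e) for e in PHRASE_END), default=-1) + 1
    head, phrase = text[:start], text[start:]
    leading = phrase[: len(phrase) - len(phrase.lstrip())]
    return head + leading + diacritize_phrase(phrase, unigram, bigram) + ch


text = ""
for ch in "toi viet. tieng viet,":
    text = on_character(text, ch, UNIGRAM, BIGRAM)
print(text)  # tôi viết. tiếng việt,
```

Because only the text after the previous phrase-ending character is rewritten, earlier phrases, including any manual corrections made to them, are left untouched while the user keeps typing.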

As an alternative, a system and method for diacritization of text on an electronic device with a touch-screen keyboard include: detecting phrase-ending characters entered by the user and diacritizing the previously entered phrase by employing a solver to search for the diacritized phrase with the highest score; and detecting a touch event on a previously entered word to show a list of word correction options, and replacing the previously entered word with a user-selected correction option.
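The touch-based correction flow might look roughly like the following Python sketch. The helper names `word_span`, `correction_options` and `replace_word`, the toy vocabulary, and the idea of ranking options by vocabulary score are all assumptions for illustration; the source only states that a list of correction options is shown and the selected option replaces the word.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove Vietnamese diacritics, e.g. 'tối' -> 'toi'."""
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def word_span(text: str, offset: int) -> tuple:
    """Return (start, end) of the whitespace-delimited word at a tap offset."""
    start = offset
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    end = offset
    while end < len(text) and not text[end].isspace():
        end += 1
    return start, end


def correction_options(word: str, vocabulary: dict) -> list:
    """List other vocabulary words with the same base letters, best first."""
    current = unicodedata.normalize("NFC", word.lower())
    base = strip_diacritics(current)
    options = [w for w in vocabulary
               if strip_diacritics(w) == base
               and unicodedata.normalize("NFC", w) != current]
    return sorted(options, key=lambda w: -vocabulary[w])


def replace_word(text: str, start: int, end: int, choice: str) -> str:
    """Replace the tapped word with the option the user selected."""
    return text[:start] + choice + text[end:]


# Hypothetical scored vocabulary and a tap on the first word.
vocabulary = {"tôi": 3.0, "tối": 2.0, "tội": 1.0, "viết": 2.0}
text = "tối viết"
start, end = word_span(text, 1)
options = correction_options(text[start:end], vocabulary)
print(options)                                      # ['tôi', 'tội']
print(replace_word(text, start, end, options[0]))   # tôi viết
```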

These and other objects, features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

Source: National Office of Intellectual Property of Vietnam

 
