Interlingua-Based Numeral Translation In Web-Application With Knowledge-Testing

Abstract

The purpose of this research is to develop an Interlingua-based technique for processing and translating natural language numerals. We propose a three-level generalized numeral model as the Interlingua representation. The formal grammar of the model describes the structure of natural language numerals. The first-level grammar rules define a numeral as consisting of a sign, an integer part, a separation symbol, and a fractional part. The second level describes the integer part of a numeral as a sequence of triads. The third level defines the triad structure. Based on the model, we developed number-into-numeral, numeral-into-number, and translation algorithms. The algorithms are implemented as Markov normal algorithms. We implemented the model and the algorithms in a web-application available on the Internet. The web-application has a knowledge-testing function that allows users to test their knowledge of numeral conversion and translation. Users from more than 100 countries visit the web-application and convert numerals. The largest numbers of users reside in the US and the Russian Federation. The web-application log contains more than 200,000 records. The largest number of user requests relates to the conversion of Spanish cardinal numerals. The web-application is also integrated into the toolbox of a complex linguistic web-portal for translators. We conclude that the Interlingua-based technique is effective for numeral processing and translation and for implementation in web-applications.

Keywords: Natural Language Processing, Numeral Translation, Markov Normal Algorithms, Linguistic Web-Application

Introduction

We use the following terms in this paper. A numeral is a cardinal numeral written in words in the text, e.g. «three hundred fifty two». A number is a cardinal numeral written in digits, e.g. «352».

During text translation, a translation program also converts numerals of the source language into numerals of the target language. However, the rules for translating numerals differ from the rules for translating text from one language into another. Text processing also involves number-into-numeral and numeral-into-number conversion tasks.

In this paper we describe how to process numbers and numerals in text using the Interlingua representation. We present a web-application for numeral conversion and translation together with its user request statistics. The web-application can also be used in natural language learning as a knowledge-testing system.

Problem Statement

Machine translation is used by many people, especially on the Internet. Popular web translators support many translation directions and deliver good-quality results.

There are two basic text translation techniques:

1) rule-based translation with a small bank of translation equivalents;

2) translation memory (Planas & Furuse, 1999; Dillon & Fraser, 2006), which relies only on a huge bank of translation equivalents.

However, neither technique captures the meaning of the text or translates numerals perfectly. Numeral translation differs from language to language and is not similar to text translation, so additional tools are necessary for it.

We address the problem of numeral translation using the Interlingua-based technique.

Interlingua-based translation

The Interlingua is an intermediate representation of the text being translated (Dorr, Hovy & Levin, 2004; Lampert, 2004; Lee & Seneff, 2005). Interlingua-based translation has two steps: 1) converting the source language text into the Interlingua; 2) converting the Interlingua into the target language text. Such translation is easy to extend: to add a new language to a multilingual translation system, one only needs to develop the algorithms that convert between that language and the Interlingua.
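The extensibility argument can be illustrated with a simple count. The sketch below is only an illustration under stated assumptions: each language needs one converter into the Interlingua and one out of it, while a direct approach needs one converter per ordered language pair. The function names are hypothetical and do not come from the web-application.

```python
def interlingua_converters(n_languages: int) -> int:
    """Converters needed with an Interlingua hub: one in and one out per language."""
    return 2 * n_languages


def direct_converters(n_languages: int) -> int:
    """Converters needed without a hub: one per ordered pair of distinct languages."""
    return n_languages * (n_languages - 1)


if __name__ == "__main__":
    # Compare both approaches for a few system sizes.
    for n in (2, 5, 10):
        print(n, interlingua_converters(n), direct_converters(n))
```

For five languages this gives 10 converters with the Interlingua against 20 for direct pairwise translation, and the gap grows quadratically with the number of languages.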

We use Interlingua-based translation to process numerals in text and describe the Interlingua representation with a formal grammar.

G. Hardegree grammars

Gary Hardegree proposed grammars (Hardegree, 1999) that represent the numeral structure and transform numbers into English numerals. The grammars are intended only for building numerals and apply only to English. They also cover only the integer parts of numerals and do not consider case inflection, since English has no grammatical category of case.

Research Questions

The following research questions guide the current study:

Question 1: Is the Interlingua-based technique effective for numeral processing and translation?

Question 2: What structure does the Interlingua representation for numeral processing and translation have?

Question 3: How can Interlingua-based numeral processing and translation be implemented in an application accessible to users around the world?

Purpose of the Study

The purpose of this study is to describe an Interlingua representation for numeral processing and translation using a formal grammar and to develop a web-application that implements this representation.

Research Methods

Our research has the following stages:

Findings

Our research produced the following findings.

Three-level generalized numeral model

To describe a generalized numeral structure for the Interlingua representation by the formal grammar, we use the following terms.

Number terms are:

The numeral terms are:

Example 1. In these terms, the number

$1\,000\,400\,973 = Z_1 Z_0 Z_0 Z_0 Z_4 Z_0 Z_0 Z_9 Z_7 Z_3$

is represented as:

$P_+ C_1 M_3 C_4 D_2 M_1 C_9 D_2 C_7 D_1 C_3 M_0 E B$. □

The terms are necessary for model generalization and language independence.

We analyzed the numeral-building rules of natural languages and developed a three-level generalized numeral model (hereafter, the model) as the Interlingua. It consists of the following levels.

Level 1 represents the numeral sign and the integer and fractional parts. The part delimiters are the patch words « comma » and « point ».

Level 2 represents three-digit ingredients (triads). Each part is divided into triads, starting from the character that separates the integer and fractional parts. The triad delimiters are the patch words « thousand », « million », etc.

Level 3 represents the items of a triad. The item delimiters are the patch words « tens » and « hundreds ».

Example 2. Figure 1 illustrates the structure of the three-level generalized numeral model using the number 34 567.89 as an example. □

Figure 1: Three levels of number 34 567.89 decomposition
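The decomposition in Example 2 can be sketched in a few lines of Python. This is a minimal illustration, not the web-application code: the `Decomposition` class and the `decompose` function are hypothetical names, the sign value "0" for zero is an assumption that mirrors the three sign terms of the model, and the fractional part is kept as a plain digit string.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Decomposition:
    sign: str                              # level 1: "+", "-" or "0" (assumed for zero)
    integer_triads: List[str]              # level 2: triads, most significant first
    fraction_digits: str                   # level 1: digits after the separator
    triad_items: List[Tuple[int, int, int]]  # level 3: (hundreds, tens, units)


def decompose(number: str) -> Decomposition:
    """Split a decimal number written with '.' as separator into the three levels."""
    text = number.replace(" ", "")
    sign = "+"
    if text.startswith("-"):
        sign, text = "-", text[1:]
    integer, _, fraction = text.partition(".")
    integer = integer.lstrip("0") or "0"
    if integer == "0" and not fraction.strip("0"):
        sign = "0"
    # Pad on the left so that the integer part splits evenly into triads.
    padded = integer.zfill((len(integer) + 2) // 3 * 3)
    triads = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    items = [(int(t[0]), int(t[1]), int(t[2])) for t in triads]
    return Decomposition(sign, triads, fraction, items)


if __name__ == "__main__":
    print(decompose("34 567.89"))
```

For 34 567.89 the sketch yields the sign "+", the triads 034 and 567, and the fractional digits 89; each triad is then split into its hundreds, tens, and units items.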

The model grammar rules are shown below.

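Level 1:

$K = P + N_1 + E + \{\,\mid N_2\} + B$

$P = P_- \mid P_+ \mid P_0$

Level 2:

$N_1 = C_0 \mid N_{10} \mid N_{11} \mid N_{12} \mid \dots \mid N_{1i} \mid \dots$

$N_2 = (\{T_1 \mid C_0\} + N_2) \mid T_1 \mid C_0$

$N_{10} = T + M_0$

$N_{11} = T + M_1 + \{\,\mid N_{10}\}$

$N_{12} = T + M_2 + \{\,\mid N_{10} \mid N_{11}\}$

...

$N_{1i} = T + M_i + \{\,\mid N_{11} \mid N_{12} \mid \dots \mid N_{1(i-1)}\}$

...

Level 3:

$T = T_1 \mid T_2 \mid T_3$

$T_1 = C_1 \mid C_2 \mid C_3 \mid C_4 \mid C_5 \mid C_6 \mid C_7 \mid C_8 \mid C_9$

$T_2 = T_1 + D_1 + \{\,\mid T_1\}$

$T_3 = T_1 + D_2 + \{\,\mid T_1 \mid T_2\}$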

The model is a generalized numeral structure used in programming the number-into-numeral, numeral-into-number, and translation algorithms.
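As an illustration of how the level 2 and level 3 rules constrain a term sequence, the following sketch checks the integer part of a model numeral. It is not the author's implementation. The textual spelling of terms ("C3", "D2", "M0", separated by spaces) and the function name are assumptions, and the sketch assumes that a triad with index i may be followed by triads of any lower index, including 0.

```python
import re

# Level 3: a triad is C_k, or C_k D1 [C_k], or C_k D2 [C_k | C_k D1 [C_k]].
UNIT = r"C[1-9]"
T2 = rf"{UNIT} D1(?: {UNIT})?"
T3 = rf"{UNIT} D2(?: (?:{T2}|{UNIT}))?"

# Level 2: every triad is followed by its delimiter term M_i.
TRIAD = rf"(?:{T3}|{T2}|{UNIT}) M\d+"
INTEGER_PART = re.compile(rf"{TRIAD}(?: {TRIAD})*")


def is_valid_integer_part(terms: str) -> bool:
    """Check a space-separated term sequence against the level 2 and 3 rules."""
    if terms == "C0":
        return True
    if not INTEGER_PART.fullmatch(terms):
        return False
    # Triad indices must strictly decrease from left to right.
    indices = [int(i) for i in re.findall(r"M(\d+)", terms)]
    return all(a > b for a, b in zip(indices, indices[1:]))


if __name__ == "__main__":
    print(is_valid_integer_part("C1 M3 C4 D2 M1 C9 D2 C7 D1 C3 M0"))
    print(is_valid_integer_part("C7 D1 C9 D2 M0"))
```

The first call checks the integer part of Example 1; the second contains tens before hundreds and is rejected.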

Converting algorithms

We use Markov normal algorithms (Markov, 1954) to implement converting algorithms.
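For readers unfamiliar with Markov normal algorithms, the sketch below shows a minimal interpreter for the classical formalism: an ordered list of substitution rules, where the first applicable rule rewrites the leftmost occurrence of its left-hand side and a terminal rule stops the run. It covers only the standard scheme. The `run_markov` name, the rule tuple layout, and the step limit are assumptions made for this illustration, and the unary addition example is a textbook one rather than part of the numeral algorithms.

```python
from typing import List, Tuple

# A rule is (left-hand side, right-hand side, is_terminal).
Rule = Tuple[str, str, bool]


def run_markov(rules: List[Rule], word: str, max_steps: int = 10_000) -> str:
    """Apply a Markov normal algorithm to a word and return the resulting word."""
    for _ in range(max_steps):
        for lhs, rhs, terminal in rules:
            position = word.find(lhs)  # leftmost occurrence of the left-hand side
            if position != -1:
                word = word[:position] + rhs + word[position + len(lhs):]
                if terminal:
                    return word  # a terminal rule ends the run
                break  # restart the scan from the first rule
        else:
            return word  # no rule is applicable: the algorithm halts
    raise RuntimeError("step limit exceeded")


# Unary addition: deleting the plus sign joins the two groups of strokes.
ADDITION_RULES: List[Rule] = [("+", "", False)]

if __name__ == "__main__":
    print(run_markov(ADDITION_RULES, "||+|||"))
```

The rule order matters: after every substitution the scan restarts from the first rule, which is why the schemes in this section list more specific rules before general ones.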

We add new operations and symbols to simplify the normal schemes of the algorithms:

→ + – adds a symbol at the end of the word;

+ – adds a symbol at the beginning of the word;

$[Z_0]^k$ – a sequence of k zeros;

_ (underscore) – the space symbol.

In this section we present some basic converting algorithms.

The number-into-model integer part converting algorithm replaces digits with numeral terms.

1) $Z_k \gamma_1^j \to \gamma_2^j C_k M_j$; $j = 0, 1, 2, \dots$; $k = 1, 2, \dots, 9$

2) $Z_k \gamma_2^j \to \gamma_3^j C_k D_1$; $j = 0, 1, 2, \dots$; $k = 1, 2, \dots, 9$

3) $Z_k \gamma_3^j \to \gamma_1^{j+1} C_k D_2$; $j = 0, 1, 2, \dots$; $k = 1, 2, \dots, 9$

4) $Z_0 Z_0 Z_0 \gamma_1^j \to \gamma_1^{j+1}$; $j = 0, 1, 2, \dots$

5) $Z_0 \gamma_1^j \to \gamma_2^j M_j$; $j = 0, 1, 2, \dots$

6) $Z_0 \gamma_2^j \to \gamma_3^j$; $j = 0, 1, 2, \dots$

7) $Z_0 \gamma_3^j \to \gamma_3^j$; $j = 0, 1, 2, \dots$

8) $\gamma_2^0 M_0 E \to P_0 C_0 E B$

9) $S_- \gamma_i^j C_k \to P_- C_k \,\bullet$; $i = 1, 2, 3$; $j = 0, 1, 2, \dots$; $k = 0, 2, \dots, 9$

10) $\gamma_i^j C_k \to P_+ C_k \,\bullet$; $i = 1, 2, 3$; $j = 0, 1, 2, \dots$; $k = 0, 2, \dots, 9$

11) $J \to \gamma_1^0 E$

12) $\to{+}\;\gamma_1^0 E B$

The index symbol $\gamma$ marks the symbol being processed in the number.
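The digit-to-term mapping produced for the integer part can also be written directly, without a Markov scheme. The sketch below is such a direct rendering under stated assumptions: it handles integers only, spells the terms as text tokens ("P+", "C4", "D2", "M1", "E", "B"), and uses the hypothetical function name `number_to_model`. It is not the normal scheme itself.

```python
def number_to_model(number: int) -> str:
    """Return the model terms of an integer as a space-separated string."""
    if number == 0:
        return "P0 C0 E B"
    terms = ["P-" if number < 0 else "P+"]
    digits = str(abs(number))
    # Pad on the left so that the digits split evenly into triads.
    digits = digits.zfill((len(digits) + 2) // 3 * 3)
    triad_count = len(digits) // 3
    for position in range(triad_count):
        triad = digits[3 * position:3 * position + 3]
        hundreds, tens, units = (int(d) for d in triad)
        if hundreds == tens == units == 0:
            continue  # an empty triad produces no terms
        if hundreds:
            terms += [f"C{hundreds}", "D2"]
        if tens:
            terms += [f"C{tens}", "D1"]
        if units:
            terms.append(f"C{units}")
        # The triad delimiter M_j, counted from the least significant triad.
        terms.append(f"M{triad_count - 1 - position}")
    terms += ["E", "B"]  # separator and end of an empty fractional part
    return " ".join(terms)


if __name__ == "__main__":
    print(number_to_model(1000400973))
```

For 1 000 400 973 the function reproduces the term sequence of Example 1.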

The model-into-number integer part converting algorithm transforms the integer part of a numeral into the integer part of a number.

1) $M_j \gamma_i^L \to \gamma_i^j [Z_0]^{4-i+3(j-L-1)}$; $i = 1, 2, 3$; $L = 0, 1, 2, \dots$; $j = L+1, L+2, \dots$

2) $M_j \gamma_i^j \to \gamma_i^j$; $j = 0, 1, 2, \dots$

3) $C_k \gamma_1^j \to \gamma_2^j Z_k$; $j = 0, 1, 2, \dots$; $k = 1, 2, \dots, 9$

4) $C_k D_1 \gamma_i^j \to \gamma_3^j Z_k [Z_0]^{2-i}$; $i = 1, 2$; $j = 0, 1, 2, \dots$; $k = 1, 2, \dots, 9$

5) $C_k D_2 \gamma_i^j \to \gamma_1^{j+1} Z_k [Z_0]^{3-i}$; $i = 1, 2, 3$; $j = 0, 1, 2, \dots$; $k = 1, 2, \dots, 9$

6) $P_+ \gamma_i^j Z_k \to Z_k \,\bullet$; $i = 1, 2, 3$; $j = 0, 1, 2, \dots$; $k = 1, 2, \dots, 9$

7) $P_- \gamma_i^j Z_k \to S_- Z_k \,\bullet$; $i = 1, 2, 3$; $j = 0, 1, 2, \dots$; $k = 1, 2, \dots, 9$

8) $P_+ \gamma_i^j Z_0 \to Z_1 Z_0 \,\bullet$; $i = 2, 3$; $j = 0, 1, 2, \dots$

9) $P_- \gamma_i^j Z_0 \to S_- Z_1 Z_0 \,\bullet$; $i = 2, 3$; $j = 0, 1, 2, \dots$

10) $P_0 C_0 E B \to Z_0$

11) $E B \to \gamma_1^0$

12) $E \to \gamma_1^0 J$

The algorithm also uses the index symbol $\gamma$.

Using the model, the algorithm also checks the numeral for errors. Termination by one of rules 6-9 is a correct ending of the algorithm. If none of the rules can be applied, the numeral contains an error.
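The reverse conversion with error checking can be sketched as a direct Python function. As before, this is an illustration and not the normal scheme: it covers integer numerals only, assumes the terms are spelled as text tokens ("P+", "C4", "D2", "M1", "E", "B"), and signals a malformed numeral with `ValueError`. The name `model_to_number` is hypothetical.

```python
def model_to_number(model: str) -> int:
    """Convert space-separated model terms of an integer numeral into an int."""
    tokens = model.split()
    if tokens == ["P0", "C0", "E", "B"]:
        return 0
    if len(tokens) < 4 or tokens[0] not in ("P+", "P-") or tokens[-2:] != ["E", "B"]:
        raise ValueError("malformed numeral frame")
    value = 0          # number assembled so far
    triad = 0          # value of the triad being assembled
    digit = None       # pending C_k waiting for D1, D2 or M_j
    last_index = None  # index of the previous triad delimiter
    for token in tokens[1:-2]:
        kind, arg = token[0], token[1:]
        if kind == "C" and len(arg) == 1 and arg in "123456789" and digit is None:
            digit = int(arg)
        elif token == "D2" and digit is not None and triad == 0:
            triad, digit = digit * 100, None
        elif token == "D1" and digit is not None and triad % 100 == 0:
            triad, digit = triad + digit * 10, None
        elif kind == "M" and arg.isdigit():
            index = int(arg)
            if digit is not None:
                triad, digit = triad + digit, None  # pending digit is the units item
            if triad == 0 or (last_index is not None and index >= last_index):
                raise ValueError(f"unexpected delimiter {token}")
            value += triad * 1000 ** index
            triad, last_index = 0, index
        else:
            raise ValueError(f"unexpected term {token}")
    if digit is not None or triad or last_index is None:
        raise ValueError("numeral ends inside a triad")
    return -value if tokens[0] == "P-" else value


if __name__ == "__main__":
    print(model_to_number("P+ C1 M3 C4 D2 M1 C9 D2 C7 D1 C3 M0 E B"))
```

A sequence such as "P+ C7 D1 C9 D2 M0 E B", where tens precede hundreds, raises `ValueError`, which corresponds to the case where no rule of the scheme can be applied.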

Number-into-model fractional part converting algorithm is shown below.

1) Z k C k ; k = 0, 1, …, 9

2) → B

3) E E

In this algorithm the index symbol is a Greek alphabet letter.

The model-into-number fractional part converting algorithm replaces numeral terms until all terms are processed.

1) C k Z k ; k = 0, 1, …, 9

2) C k B Z k •; k = 0, 1, …, 9

3) EB EB

4) E E

Numeral translation order

The model terms are general in nature; each language has its own words for them. For example, in Russian $C_0$ = « ноль » (zero), $C_1$ = « один » (one), etc.

In some languages, numerals are formed by rules that differ from the model rules. In this case, the numeral is converted by language-specific algorithms. Two algorithms are necessary:

1) a numeral-into-model converting algorithm, which transforms a numeral of the source language into a model numeral;

2) a model-into-numeral converting algorithm, which transforms a numeral in the model representation into a numeral of the target language.

These algorithms perform inverse actions.

Number-into-numeral converting includes four steps:

1) to execute the number-into-model integer part converting algorithm;

2) to execute the number-into-model fractional part converting algorithm;

3) to execute the model-into-numeral converting algorithm if it exists;

4) to replace the model terms with language symbols in the required grammatical case.

Numeral-into-number converting consists of the following steps:

1) to replace language symbols by model terms;

2) to execute the numeral-into-model converting algorithm if it exists;

3) to execute the model-into-number fractional part converting algorithm;

4) to execute the model-into-number integer part converting algorithm.

Using the model, we can translate numerals. Translation of a numeral from the L1-language into the L2-language includes four steps:

1) to replace the L1-language symbols by the model terms;

2) to execute numeral-into-model converting algorithm if it exists for the L1-language;

3) to execute model-into-numeral converting algorithm if it exists for the L2-language;

4) to replace the model terms with the L2-language symbols in the required grammatical case.
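The four translation steps above can be sketched as a small pipeline. This is a toy illustration under loudly stated assumptions. The two lexicons are hypothetical and deliberately tiny. The language-specific converting algorithms of steps 2 and 3 are replaced by identity stubs. The sign, separator, and end terms of the model are omitted, and grammatical case is ignored. The names `translate`, `ENGLISH`, and `SPANISH` are not taken from the web-application.

```python
from typing import Callable, Dict, List

# Hypothetical toy lexicons: language word -> model term.
ENGLISH: Dict[str, str] = {"two": "C2", "three": "C3", "thousand": "M1"}
SPANISH: Dict[str, str] = {"dos": "C2", "tres": "C3", "mil": "M1"}

Converter = Callable[[List[str]], List[str]]


def identity(terms: List[str]) -> List[str]:
    """Stub for a language whose numerals already follow the model rules."""
    return terms


def translate(numeral: str,
              source_lexicon: Dict[str, str],
              target_lexicon: Dict[str, str],
              numeral_into_model: Converter = identity,
              model_into_numeral: Converter = identity) -> str:
    """Translate a numeral through the model, following the four steps."""
    # Step 1: replace the L1-language symbols with model terms.
    terms = [source_lexicon[word] for word in numeral.lower().split()]
    # Step 2: run the numeral-into-model algorithm of L1, if it exists.
    terms = numeral_into_model(terms)
    # Step 3: run the model-into-numeral algorithm of L2, if it exists.
    terms = model_into_numeral(terms)
    # Step 4: replace the model terms with the L2-language symbols.
    words_by_term = {term: word for word, term in target_lexicon.items()}
    return " ".join(words_by_term[term] for term in terms)


if __name__ == "__main__":
    print(translate("three thousand", ENGLISH, SPANISH))
    print(translate("dos mil", SPANISH, ENGLISH))
```

Neither lexicon refers to the other: both languages meet only in the model terms, so adding a language means adding one lexicon and, where needed, its two converting algorithms.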

All algorithms in this paper are implemented in web-application.

Web-application with knowledge-testing function

In 2012 we developed (with Dmitriy Tsybulko) a web-application for processing natural language cardinal numerals. The web-application is available to Internet users at http://prutzkow.com/en-us/numbers/ (the English-language version) or http://prutzkow.com/ru-ru/numbers/ (the Russian-language version).

The web-application has the following functions:

1) translation of Russian, English, German, Spanish, and Finnish numerals in any direction (made possible by the Interlingua-based technique);

2) number-into-numeral and numeral-into-number conversion;

3) declension of Russian numerals;

4) testing of knowledge of numeral conversion and translation.

The web-application was integrated into the toolkit of the My-Polyglot.com expert network for translators.

User request statistics

Each request to the web-application is recorded in the log. The log is used for debugging the web-application and for analyzing request statistics. A log record includes the following request data:

There are currently more than 200,000 records in the log.

We analyzed the log and present the statistics of numeral translation directions in Table 1.

Table 1: Numeral translation direction statistics, 2014-2016

In Table 1, each cell contains three values: the upper value corresponds to 2014, the middle value to 2015, and the lower value to 2016.

Most users of the web-application reside in the US and Russia (Table 2). This trend has not changed over the last three years. Users of the web-application live in more than 100 countries on all permanently inhabited continents. The data for Table 2 include all requests, even erroneous ones.

Table 2: Web-application users by country

Conclusion

This research gives the following answers to the research questions:

1. The Interlingua-based technique is effective for numeral processing and translation. It allows numerals of a new natural language to be added easily by writing two converting algorithms: numeral into the Interlingua and the Interlingua into numeral.

2. The Interlingua representation for numeral processing and translation has a numeral-like structure. We have developed the three-level generalized numeral model, which describes the structure of the Interlingua representation.

3. To make the practical results of our research accessible to users around the world, we developed a web-application for numeral processing and translation with a knowledge-testing function. The web-application is based on the model as the Interlingua representation. We have received many positive comments from users of the web-application. The web-application was integrated into a complex linguistic web-portal for translators.

We have also published the results of this research in Russian scientific journals.

Acknowledgments

We are grateful to our co-author Dmitriy Tsybulko for collaboration in this research.

References

  1. Dillon, S., Fraser, J. (2006). Translators and TM: An Investigation of Translators’ Perceptions of Translation Memory Adoption. Machine Translation, 20 (2), pp. 67-79.
  2. Dorr, B. J., Hovy, E., Levin, L. (2004). Machine Translation: Interlingual Methods, Encyclopedia of Language and Linguistics. 2nd ed., Brown, Keith (ed.).
  3. Hardegree, G. (1999). Symbolic Logic, First Course, 3rd ed. McGraw-Hill.
  4. Lampert, A. (2004). Interlingua in Machine Translation, Technical Report.
  5. Lee, J., Seneff, S. (2005). Interlingua-Based Translation for Language Learning Systems. In Proc. of ASRU, Cancun, Mexico.
  6. Markov, A. A. (1954). Theory of algorithms (in Russian). Akad. Nauk SSSR (English trans. published by the Israel Program for Scientific Translation, Jerusalem, Vol. XLII, 1962).
  7. Planas, E., Furuse, O. (1999). Formalizing Translation Memories. In Proc. of MT Summit VII, Singapore, September 13-17, 1999, pp. 331-339.

Copyright information

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

About this article

Publication Date: 13 December 2017

eBook ISBN: 978-1-80296-032-7

Publisher: Future Academy

Volume: 33

Edition Number: 1st Edition

Pages: 1-481

Subjects: Cognitive theory, educational equipment, educational technology, computer-aided learning (CAL), psycholinguistics

Cite this article as:

Prutzkow, A. (2017). Interlingua-Based Numeral Translation In Web-Application With Knowledge-Testing. In S. B. Malykh, & E. V. Nikulchev (Eds.), Psychology and Education - ICPE 2017, vol 33. European Proceedings of Social and Behavioural Sciences (pp. 290-298). Future Academy. https://doi.org/10.15405/epsbs.2017.12.29