Thematic Roles-Based Translation in MT Systems

Abstract: The main purpose of this paper is to identify a thematic roles inventory that can improve the output of machine translation (MT) systems. Thematic role relations are among the semantic relations that can disambiguate the meaning of a considerable number of lexical items. Some verbs, for instance, are ambiguous for MT systems and hence are not accurately translated. The meaning of such verbs can be disambiguated by identifying the thematic roles they assign to their arguments. For example, ‘to express’ has different meanings depending on the thematic roles it assigns. When the verb assigns a Patient <Liquid> to its Object, it means ‘to squeeze out’, e.g., ‘Italians express coffee’. When it assigns a Theme <Letter> or <Package> to its Object, it means ‘to send by rapid transport’, e.g., ‘She expressed the letter to Florida’. The present research focuses only on a group of English verbs that convey a variety of meanings. It shows several problems in the translation of the sample verbs that arise from the lack of thematic roles in the core of the system. The sample verbs are tested on three MT systems: Al Wafi, Sakhr and Google. They all produce incorrect translations of the sample verbs in some contexts. A translation is proposed for each verb after analyzing its thematic roles and selectional restrictions.


I. INTRODUCTION
Thematic roles can be broadly defined as the semantic relations that hold between a verb and the arguments it takes. However, they cannot be described in semantic terms only; as Dowty puts it, they are "creatures of syntax-semantics interface, and thus require a sound semantic theoretical basis as well as a syntactic one …" (Dowty, 1991: 548). The concept has developed through several stages in linguistic theory. Gruber (1965) introduced it under the term thematic relations, and Government and Binding theory (GB) presented thematic relations in a purely syntactic form as Theta Roles. Jackendoff (1972) then developed the term into a different, semantic concept, calling these semantic relations 'Thematic Relations'. Thematic relations can also be described as corresponding to Fillmore's Deep Cases, which were introduced in his Case Grammar (Fillmore, 1968). The present research refers to the concept as Thematic Roles. It also draws on what Dowty calls Thematic Proto Roles (Dowty, 1991).

II. CRITERIA OF SAMPLE VERBS
The choice of successful samples is based mainly on the verb's ability to assign thematic roles and on whether the deep structure of its nominal arguments can be mapped to their surface structure. A successful sample has to meet three criteria. First, the sample should be a verb. The scope is limited to verbs because they are the predicates that assign thematic roles. Second, the study does not deal with all English verbs. The focus is on verbs that can convey a variety of meanings, since such verbs may be ambiguous in translation and need to be disambiguated. Third, a successful sample verb can assign different thematic roles, and this diversity of roles is expected to lead to different translations of the same verb. By contrast, an unsuccessful sample verb bears a single set of roles and hence has a single meaning or translation. Such verbs need no analysis of this kind, since their translation involves no ambiguity.
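The three criteria above can be sketched as a simple filter over a lexicon. This is only an illustrative sketch: the `Frame` class, the `LEXICON` dictionary and the function `is_successful_sample` are hypothetical names, not part of any of the MT systems discussed. The entries for 'break' follow the analysis in this paper, while 'sneeze' is an invented single-frame verb included only for contrast.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    """One role frame of a verb (hypothetical representation)."""
    subject: str            # role and restriction assigned to the Subject
    obj: Optional[str]      # role and restriction assigned to the Object (None if intransitive)
    sense: str              # English gloss of the resulting meaning


# Toy lexicon: word -> (part of speech, list of role frames).
LEXICON = {
    "break": ("verb", [
        Frame("Agent <Animate>", "Patient <Physical Entity>", "cause to separate into pieces"),
        Frame("Agent <Animate>", "Theme <Legal Agreement>", "breach or violate"),
    ]),
    # Invented example of an unsuccessful sample: one frame, one meaning.
    "sneeze": ("verb", [Frame("Experiencer <Animate>", None, "sneeze")]),
    "glass": ("noun", []),
}


def is_successful_sample(word: str) -> bool:
    """Apply the three criteria: a verb, with several meanings, tied to different role frames."""
    pos, frames = LEXICON.get(word, (None, []))
    if pos != "verb":                       # criterion 1: must be a verb
        return False
    senses = {f.sense for f in frames}      # criterion 2: a variety of meanings
    role_patterns = {(f.subject, f.obj) for f in frames}  # criterion 3: different roles
    return len(senses) > 1 and len(role_patterns) > 1
```

Under these assumptions, `is_successful_sample("break")` holds, while the single-frame verb and the noun are rejected.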

III. THEMATIC ROLE-BASED TRANSLATION
The significance of developing thematic roles for MT systems lies in the fact that thematic roles are not merely semantic but conceptual relations that hold between the predicate and its arguments (Wagner, 2005). MT systems generally fail to produce a proper translation in cases that need thematic-role-based disambiguation. In what follows, the successful sample verbs are presented in different sentences. Each sentence is submitted for translation into Arabic to three MT systems: Al Wafi, Google and Sakhr. The resulting translations contain mistranslated parts. A translation is then proposed for each verb in each sentence, showing how the verb would be translated correctly if thematic roles (along with selectional restrictions) were added to the system.
The following examples present the transitive verb 'break' with two meanings: 'to cause to separate or divide into pieces' and 'to breach or violate'. However, the translations produced by the three MT systems do not consistently differentiate between these two distinct meanings. The systems seem to recognize only one meaning of the transitive form of 'break': 'to cause to separate or divide into pieces'. In 1.1 'He broke the glass plate' and 1.2 'The ball broke the window', the three systems treat the verb as having this meaning. In this context, the verb assigns an Agent <Animate> to its Subject and a Patient <Physical Entity> to its Object. The correct translation of 'break' in such a context is 'كسر', and all three MT systems produce it. However, the same translation 'كسر' cannot be adopted in contexts where 'break' assigns an Agent <Animate> to its Subject but a Theme <Legal Agreement> to its Object. The change in thematic roles turns 'break' to mean 'breach or violate', so it should be translated as 'خرق', not 'كسر'. In 1.3 'She broke the law', both Al Wafi and Google keep the same reading of the verb and produce 'كسر', which is a mistranslation in this context. Sakhr, on the other hand, produces the proper translation 'خرق'. Even so, it cannot be assumed that Sakhr encodes the logic needed to translate 'break' and similar verbs correctly. In 1.4 'They broke the contract', where the verb again assigns an Agent <Animate> to its Subject and a Theme <Legal Agreement> to its Object, Sakhr fails to produce the proper translation. This means that the system lacks the logic needed to produce the right translation consistently.
It is proposed here that thematic roles and selectional restrictions are the fundamental basis for the successful translation of ambiguous verbs. The transitive verb 'clap' in 2.1 assigns an Agent to its Subject and requires that this Agent be <Bird>. To its Object, it assigns a Theme <Wings>. In this context, 'clap' means 'to flap'. None of the three systems recognizes this meaning: they all mistranslate the verb in 2.1 as 'صفق' ('applaud'), whereas the proper translation in this context is 'رفرف' ('flap'), as given in the proposed translation. In 2.2, the same verb assigns an Agent <Human> to its Subject and a Patient <Hands> to its Object, and means 'to applaud'. This seems to be the only context that the three MT systems recognize. In 3.1 and 3.2, the verb 'cure' means 'heal'. In both sentences it assigns a Theme to its Object and imposes the same selectional restriction, <Disease>. Although the Subject is an Agent <Animate> in 3.1 ('the doctor') and a Force <Inanimate> in 3.2 ('tablets'), this does not affect the meaning of the verb. In this context 'cure' is successfully translated as 'عالج' by Al Wafi and Sakhr. Google also produces a proper translation of the verb, 'يشف', despite its weak translation of the sentence as a whole. However, this does not mean that the three systems keep translating the verb successfully in contexts where its thematic roles and selectional restrictions change. In 3.3 and 3.4, 'cure' again assigns a Theme to its Object, but this Theme must be <Food>, not <Disease>. In such a context 'cure' means 'preserve' rather than 'heal', and it should be translated as 'حفظ'. Yet the MT systems fail to produce this translation because they cannot recognize the change in the selectional restriction imposed on the Object.
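The analysis of 'clap' and 'cure' above can be sketched as a lookup over role frames in which a restriction may be left open. This is a minimal sketch, not the logic of any of the three MT systems: the `FRAMES` table, the `ANY` wildcard and the function `translate_verb` are hypothetical, and the semantic classes of the Subject and Object are assumed to be supplied by some earlier analysis step. The Arabic translations are the ones discussed in this paper.

```python
ANY = None  # wildcard: the restriction on this argument does not affect the sense

# verb -> list of (Subject restriction, Object restriction, Arabic translation)
FRAMES = {
    "clap": [
        ("Bird", "Wings", "رفرف"),    # Agent <Bird> + Theme <Wings>: 'to flap'
        ("Human", "Hands", "صفق"),    # Agent <Human> + Patient <Hands>: 'to applaud'
    ],
    "cure": [
        (ANY, "Disease", "عالج"),     # Theme <Disease>: 'heal', whatever the Subject
        (ANY, "Food", "حفظ"),         # Theme <Food>: 'preserve'
    ],
}


def translate_verb(verb, subject_class, object_class):
    """Return the translation of the first frame whose restrictions are met, else None."""
    for subj_req, obj_req, arabic in FRAMES.get(verb, []):
        if subj_req not in (ANY, subject_class):
            continue  # Subject restriction violated
        if obj_req not in (ANY, object_class):
            continue  # Object restriction violated
        return arabic
    return None


# 'cure' with a <Disease> Object gets the same translation for an
# animate Subject (the doctor) and an inanimate one (tablets).
print(translate_verb("cure", "Animate", "Disease"))
print(translate_verb("cure", "Inanimate", "Disease"))
print(translate_verb("clap", "Bird", "Wings"))
```

Leaving the Subject of 'cure' as a wildcard mirrors the observation that only the restriction on the Object separates 'heal' from 'preserve'.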
In 4.1 and 4.2, the transitive verb 'cut' assigns an Agent <Animate> to its Subject and a Patient <Material> or <Cord> to its Object. In this context, 'cut' means 'to separate or divide with an instrument'. The three MT systems handle this context and produce the correct translation 'قطع'. Google also offers alternative successful translations such as 'قص', 'شق' and 'استأصل'. However, 'cut' changes its meaning when the thematic roles it assigns to its Subject and Object change. In 4.3 and 4.4, 'cut' assigns an Agent <Animate> to its Subject and a Theme <Cereal Grass> to its Object, and means 'to harvest or reap'. The three systems cannot disambiguate the verb in this context and keep the translation 'قطع' or 'قص', which is incorrect. The proper translation is 'حصد', as shown in the proposed translations for 4.3 and 4.4. Because the MT systems do not consider thematic roles in their logic, they mistranslate the verb in such contexts. In 5.1 and 5.2, the verb 'die' assigns an Experiencer <Animate> to its Subject. Since it is an intransitive verb, it has no Object, so the disambiguation of 'die' depends on the thematic role and restrictions it imposes on its Subject. Where it assigns an Experiencer <Animate> to its Subject, 'die' means 'perish or pass away'. Al Wafi and Sakhr successfully translate it as 'مات', and Google offers more than one successful translation, 'مات' or 'توفى'. However, in 5.3 and 5.4, where 'die' assigns a Patient <Inanimate> to its Subject, it means 'stop or break down'. In this context the verb is mistranslated by the three systems, which keep the meaning 'perish or pass away' from 5.1 and 5.2. The proposed translation is 'تعطل' ('break down') rather than 'مات'.

DOI: http://doi.org/10.24086/ICLANGEDU2023/paper.939

The intransitive form of the verb 'draw' may have two different meanings depending on the thematic role it assigns to its Subject. In 6.1 'She is drawing', the verb assigns an Agent and restricts it to <Human> only. In this context, 'draw' means 'to make drawings or create images'. Al Wafi provides the successful translation among several alternatives: 'ترسم', 'تجتذب', 'تسحب', 'تلفت' and 'تثير'. Google also produces the successful translation of 'draw' in this context, 'رسم'. Only Sakhr fails to provide the right translation, rendering 'draw' as 'تقترب'. In 6.2 'The patient's veins don't draw easily', the same intransitive form of 'draw' has a different meaning. The verb assigns a different thematic role to its Subject, and hence its meaning changes. In 6.2, 'draw' assigns a Theme to its Subject, restricted to <Vessel> only. Since vessels cannot make drawings or create images, 'draw' here means 'to cause a liquid to flow'. However, the three MT systems fail to produce a successful translation of 'draw' in this context. Al Wafi translates it as 'تسحب', and Google provides alternatives such as 'توجة', 'تجتذب' and 'تنسحب'. Sakhr keeps the same translation, 'تقترب', as in 6.1. It is proposed that the successful translation of 'draw' where it assigns a Theme <Vessel> to its Subject should be 'تسيل' or 'تنساب'. In 7.1, the transitive verb 'eat' is recognized by Al Wafi, Google and Sakhr as 'take in solid food' and hence properly translated as 'اكل'. In this context, 'eat' assigns an Agent <Animate> to its Subject and a Patient <Food> to its Object. However, when the thematic roles change, 'eat' changes its meaning. In 7.2 and 7.3, the verb assigns a Force <Chemical> or <Air> to its Subject and a Patient <Inanimate> to its Object.
Accordingly, the meaning changes to 'corrode', and the three MT systems cannot produce the successful translation. The systems still treat 'eat' as meaning 'take in solid food'. Al Wafi and Sakhr translate 'eat' as 'اكل', and Google suggests the same translation as well as 'تناول', which has the same meaning. However, the successful translation of 'eat' in this context should be 'يصديء' rather than 'ياكل'. In examples 8.1, 8.2, 8.3 and 8.4, the three MT systems treat the verb 'express' as meaning 'state or set forth in words'. Al Wafi translates 'express' as 'يبدي', Google as 'أعرب' and Sakhr as 'عبر'. However, the analysis shows two other meanings of 'express'. In 8.1, the verb assigns an Agent <Animate> to its Subject ('Italians'), and in 8.2 it assigns a Force <Device> ('machine'). In both examples, the verb assigns a Patient <Liquid> to its Object. This limits the meaning of the verb to 'press out or squeeze out'. It is worth mentioning that, although the three systems mistranslate the verb in this context, Google produces a near translation of 'express' when its Object is 'milk': it translates 'express milk' as 'شفط الحليب'. In addition, Al Wafi recognizes the meaning 'squeeze out', or 'عصر', with 'milk' only and with no other liquid. It translates 'express milk' as 'يعصر الحليب'. Sakhr fails to produce a successful translation of 'express milk' at all, translating it as 'عبر عن'. Moreover, 'express' has a further meaning besides 'state' and 'squeeze out'. In 8.3 and 8.4, the verb is translated by Al Wafi as 'ابدى' ('state or set forth in words'), by Google as 'أعرب' and by Sakhr as 'عبر'. However, it means 'send by rapid transport' here, since it assigns an Agent <Human> to its Subject and a Theme <Package> to its Object, as previously shown in the analysis.
The proper translation proposed in this context is 'ارسل' rather than 'ابدوا'. It is clear in 9.1 and 9.2 that the three MT systems produce the proper translation of the verb 'gain' where it assigns a Benefactive <Animate> to its Subject and a Theme <Abstract or Physical Entity> to its Object. The meaning of 'gain' in this context is 'acquire or win', and the proper translation is 'كسب' or 'اكتسب'. However, the systems keep the same translation in 9.3 and 9.4, although the verb occurs there in a different context. In that context, the verb assigns an Agent <Animate> to its Subject and a Goal <Destination> or <Location> to its Object. The meaning of 'gain' here is 'reach' rather than 'acquire', and the proper translation should be 'وصل', as shown in the proposed translations for 9.3 and 9.4. In 10.1 and 10.2, the thematic roles that the verb 'pan' assigns to its Subject are not significant, since they do not affect the meaning of the verb. The roles that help disambiguate 'pan' are those assigned to the Object. In 11.1 and 11.2, the three MT systems produce the correct translation of the verb 'read'. The systems treat the verb in its usual sense, 'to interpret something that is written'. As such, 'read' is translated as 'يقرأ' in the context where it assigns an Agent <Animate> to its Subject. However, in 11.3 and 11.4, the same verb is mistranslated by the three MT systems. Al Wafi and Sakhr translate 'read' as 'تقرأ', whereas Google provides two translations, 'تقرأ' and 'تنص'. The mistranslation can be attributed to the change in the thematic role and selectional restrictions that the verb assigns to its Subject in this context. In 11.3 and 11.4, 'read' assigns an Experiencer <Inanimate> to its Subject ('play' and 'watch'). In this context, 'read' means 'to indicate or to show'. However, the systems keep the translation 'يقرأ', which is a mistranslation.
The proper translation as proposed should be 'يبدو'. When the verb 'scrub' assigns an Agent <Animate> to its Subject and a Patient <Physical Entity> to its Object, it means 'to rub'. In 18.1, Al Wafi translates 'scrub' properly as the Arabic 'حك'. Google provides 'غسل' and 'نقى', whereas Sakhr translates it as 'نظف'. This seems to be the only sense of the verb that the systems know. Thus, in 18.2, the systems mistranslate 'scrub'. In this context, 'scrub' assigns an Agent to its Subject, and this Agent is limited to <Human> rather than <Animate>. To its Object, 'scrub' assigns a Theme <Plan> or <Event>. In such a context, with an Agent Subject and a Theme Object, 'scrub' cannot mean 'to rub' but rather 'to cancel'. The correct translation should be 'ألغى', not 'حك'. The mistranslation arises because the MT systems cannot recognize the change in thematic roles that changes the meaning of the verb.
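The 'scrub' example relies on one restriction being narrower than another: <Human> is a special case of <Animate>. One way to sketch this is a small class hierarchy with a subsumption check. The hierarchy in `PARENT`, the helper `satisfies` and the function `translate_scrub` are hypothetical illustrations, not part of the systems examined; in particular, placing <Bird> and <Human> under <Animate> is an assumption made for the sketch.

```python
# Hypothetical fragment of a semantic class hierarchy: child -> parent.
PARENT = {
    "Human": "Animate",
    "Bird": "Animate",
}


def satisfies(semantic_class, restriction):
    """True if semantic_class equals the restriction or is one of its descendants."""
    while semantic_class is not None:
        if semantic_class == restriction:
            return True
        semantic_class = PARENT.get(semantic_class)  # climb one level; None at the top
    return False


# Frames for 'scrub', most specific first:
# (Subject restriction, allowed Object restrictions, Arabic translation)
SCRUB_FRAMES = [
    ("Human", ("Plan", "Event"), "ألغى"),        # Agent <Human> + Theme <Plan>/<Event>: 'cancel'
    ("Animate", ("Physical Entity",), "حك"),     # Agent <Animate> + Patient <Physical Entity>: 'rub'
]


def translate_scrub(subject_class, object_class):
    """Return the translation of the first frame whose restrictions are satisfied."""
    for subj_req, obj_reqs, arabic in SCRUB_FRAMES:
        if satisfies(subject_class, subj_req) and any(
            satisfies(object_class, req) for req in obj_reqs
        ):
            return arabic
    return None
```

With this ordering, a <Human> Subject with a <Plan> Object selects 'ألغى', while a <Human> Subject with a <Physical Entity> Object still falls through to 'حك', because <Human> also satisfies <Animate>.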

IV. CONCLUSION
The main results show that most of the sample verbs change their meanings when their thematic roles and selectional restrictions change. This means that most of the candidate thematic roles proved to affect the meaning of the verb. As such, they should be considered for verb sense disambiguation in MT systems.
The logical steps that the MT system would follow for word sense disambiguation can be represented as a simple flowchart. First, the sentence is entered for translation, e.g. 'She died from cancer'. The first process is to detect the verb of the sentence: 'died'. The knowledge base is then consulted to decide whether the verb is transitive or intransitive. Some verbs have both forms. In such cases the machine has to detect whether there is an object (transitive) or not (intransitive). In the given example, 'died' is an intransitive verb. If the verb is intransitive, the machine has to identify the subject only ('she'). The next decision is to check which selectional restriction is imposed on this subject. If it is <Animate>, the thematic role is Experiencer. If it is <Inanimate>, the thematic role is Patient. In the example, 'she' is animate and hence an Experiencer. After that, the word sense disambiguation decision is made. Since 'die' assigns an <Animate> Experiencer to its Subject, it means 'perish'. If it assigned an <Inanimate> Patient to its Subject, it would mean 'break down'. The final process is translating the verb. The verb 'die' in the sense of 'perish' is translated into the Arabic verb 'مات'.
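The steps described above can be sketched in code for the verb 'die'. This is a minimal sketch under stated assumptions: the sentence is assumed to be already parsed into a subject, a verb lemma and an optional object; `KNOWLEDGE_BASE`, `ANIMACY` and `disambiguate` are hypothetical names; and the animacy entries (including 'engine') are invented toy data. The senses and Arabic translations are those given in this paper.

```python
# Toy animacy lookup standing in for a real semantic lexicon (invented entries).
ANIMACY = {"she": "Animate", "he": "Animate", "engine": "Inanimate"}

# Knowledge base: verb -> available forms and, per form, the mapping from the
# Subject's selectional restriction to (thematic role, sense, Arabic translation).
KNOWLEDGE_BASE = {
    "die": {
        "forms": {"intransitive"},
        "intransitive": {
            "Animate": ("Experiencer", "perish", "مات"),
            "Inanimate": ("Patient", "break down", "تعطل"),
        },
    },
}


def disambiguate(subject, verb, obj=None):
    """Follow the flowchart: form -> subject restriction -> role -> sense -> translation."""
    entry = KNOWLEDGE_BASE[verb]                                 # look the verb up
    form = "transitive" if obj is not None else "intransitive"   # is there an object?
    if form not in entry["forms"]:
        raise ValueError(f"'{verb}' has no {form} form in the knowledge base")
    restriction = ANIMACY[subject.lower()]                       # restriction met by the subject
    role, sense, arabic = entry[form][restriction]               # role and sense decision
    return {"form": form, "role": role, "sense": sense, "translation": arabic}


# 'She died from cancer': intransitive, <Animate> subject.
print(disambiguate("She", "die"))
```

A fuller version would add transitive frames keyed on the Object as well, but the intransitive case is enough to show how the subject's restriction drives the choice of sense.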

V. RECOMMENDATION
The focus of this paper is the verb. It examines how the meaning of the verb is affected by changes in the thematic roles and selectional restrictions that the verb assigns. It would be useful to examine whether the thematic roles assigned by a given verb also affect the meaning of the nominal arguments, not only that of the predicate of the sentence.