Direct systems, transfer systems and interlinguas

Multilingual versus bilingual systems

Bilingual systems may be uni-directional or bi-directional; that is to say, they may be designed to translate from one language to another in one direction only, or they may be capable of translating from both members of a language pair.

A system involving more than two languages is a multilingual system. At one extreme a multilingual system might be designed for a large number of languages in every combination, as in the case of Eurotra, an ambitious machine translation project established and funded by the European Commission from 1978 until 1992. A more modest multilingual system might translate from English into three other languages in one direction only (i.e. three language pairs).

An obvious question at this point is whether a truly multilingual system is in practice, as opposed to theory, preferable to a bilingual system designed for a specific language pair. There are arguments on both sides; two of the most successful MT systems illustrate the pros and cons very well: GETA's multilingual Ariane system and TAUM's English-French Météo system.

The Météo system is an example of a bilingual system which, although separating analysis and generation, exploits similarities and regular equivalences of English and French lexicon and syntax at every stage of the translation process.

Direct systems, transfer systems and interlinguas

There are broadly three basic MT strategies. The earliest historically is the 'direct approach', adopted by most MT systems of what has come to be known as the first generation of MT systems. In response to the apparent failure of this strategy, two types of 'indirect approach' were developed: the 'transfer method', and the use of an 'interlingua'. Systems of this nature are sometimes referred to as second generation systems.

The direct approach is an MT strategy which lacks any kind of intermediate stage in the translation process: the processing of the source language input text leads 'directly' to the desired target language output text. In certain circumstances the approach is still valid today (traces of the direct approach are found even in indirect systems such as Météo), but the archetypal direct MT system has a more primitive software design.

The severe limitations of this approach should be obvious. It can be characterized as 'word-for-word' translation with some local word-order adjustment. It gave the kind of translation quality that might be expected from someone with a very cheap bilingual dictionary and only the most rudimentary knowledge of the grammar of the target language: frequent mistranslations at the lexical level and largely inappropriate syntactic structures which mirrored too closely those of the source language.
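As a rough illustration of 'word-for-word translation with some local word-order adjustment', the following minimal sketch translates a few English words into French. The lexicon, the part-of-speech tags and the single reordering rule are all invented for this example; they are not taken from any real first-generation system.

```python
# Toy illustration of the 'direct' strategy: dictionary lookup per word,
# followed by one local word-order adjustment. The lexicon and the
# reordering rule are invented for this example.

# Hypothetical English -> French lexicon: word -> (translation, part of speech)
LEXICON = {
    "the": ("le", "DET"),
    "black": ("noir", "ADJ"),
    "cat": ("chat", "NOUN"),
    "sleeps": ("dort", "VERB"),
}


def translate_direct(sentence: str) -> str:
    # Step 1: word-for-word replacement; unknown words pass through unchanged.
    tagged = [LEXICON.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Step 2: local reordering, swapping each ADJ NOUN pair to NOUN ADJ.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in tagged)


print(translate_direct("the black cat sleeps"))  # le chat noir dort
```

The sketch has no analysis of the sentence beyond adjacent word pairs. If a feminine noun such as 'maison' were added to the lexicon, nothing in this design could make the article or the adjective agree with it, which is the kind of error described above.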

The failure of the first generation systems led to the development of more sophisticated linguistic models for translation. In particular, there was increasing support for the analysis of source language texts into some kind of intermediate representation — a representation of its 'meaning' in some respect — which could form the basis of generation of the target text. This is in essence the indirect method, which has two principal variants.

The first is the interlingua method (also the first historically), in which the source text is analysed into a representation from which the target text is directly generated. The intermediate representation includes all information necessary for the generation of the target text without 'looking back' to the original text. The representation is thus a projection from the source text and at the same time acts as the basis for the generation of the target text; it is an abstract representation of the target text as well as a representation of the source text. The method is interlingual in the sense that the representation is neutral between two or more languages. In the past, the intention or hope was to develop an interlingual representation which was truly 'universal' and could thus be intermediary between any natural languages. At present, interlingual systems are less ambitious.

The interlingua approach is clearly most attractive for multilingual systems.

Target languages have no effect on any processes of analysis; the aim of analysis is the derivation of an 'interlingual' representation.

Interlingual machine translation is one of the classic approaches to machine translation. In this approach, the source language text, i.e. the text to be translated, is transformed into an interlingua, i.e. an abstract language-independent representation. The target language text is then generated from the interlingua. Within the rule-based machine translation paradigm, the interlingual approach is an alternative to the direct approach and the transfer approach.

In the direct approach, words are translated directly without passing through an additional representation. In the transfer approach the source language is transformed into an abstract, less language-specific representation. Linguistic rules which are specific to the language pair then transform the source language representation into an abstract target language representation and from this the target sentence is generated.

The second variant of the indirect approach is called the transfer method. Strictly speaking all translation systems involve 'transfer' of some kind, the conversion of a source text or representation into a target text or representation. The term 'transfer method' has been applied to systems which interpose bilingual modules between intermediate representations. Unlike those in interlingual systems, these representations are language-dependent: the result of analysis is an abstract representation of the source text, and the input to generation is an abstract representation of the target text. The function of the bilingual transfer modules is to convert source language (intermediate) representations into target language (intermediate) representations. Since these representations link separate modules (analysis, transfer, generation), they are also frequently referred to as interface representations.

In the transfer approach there are therefore no language-independent representations: the source language intermediate representation is specific to a particular language, as is the target language intermediate representation. Indeed there is no necessary equivalence between the source and target intermediate (interface) representations for the same language.
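The three-stage organisation of a transfer system (analysis, bilingual transfer, generation) can be sketched as follows. This is only a toy under stated assumptions: the dictionary-shaped 'interface representations', the function names and the three-word vocabulary are hypothetical and handle a single determiner-adjective-noun phrase.

```python
# Toy three-stage transfer pipeline for one language pair (English -> French).
# All structures, rules, and vocabulary are invented for illustration.

def analyse_en(sentence: str) -> dict:
    """Analysis: English text -> English-specific interface representation."""
    det, adj, noun = sentence.lower().split()
    # English surface order (modifier before head) is abstracted into roles.
    return {"lang": "en", "head": noun, "det": det, "mod": adj}


# Bilingual transfer module: lexical correspondences for this pair only.
EN_FR_LEXICON = {"the": "le", "black": "noir", "cat": "chat"}


def transfer_en_fr(src: dict) -> dict:
    """Transfer: English representation -> French representation."""
    return {
        "lang": "fr",
        "head": EN_FR_LEXICON[src["head"]],
        "det": EN_FR_LEXICON[src["det"]],
        "mod": EN_FR_LEXICON[src["mod"]],
    }


def generate_fr(tgt: dict) -> str:
    """Generation: French representation -> French text (modifier after head)."""
    return f'{tgt["det"]} {tgt["head"]} {tgt["mod"]}'


def translate(sentence: str) -> str:
    return generate_fr(transfer_en_fr(analyse_en(sentence)))


print(translate("the black cat"))  # le chat noir
```

Only transfer_en_fr knows about both languages. The analysis and generation functions are monolingual and could in principle be reused with other transfer modules, which is the modularity the transfer method relies on.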

In comparison with the interlingua type of multilingual system there are clear disadvantages in the transfer approach. The addition of a new language involves not only the two modules for analysis and generation, but also the addition of new transfer modules, the number of which may vary according to the number of languages in the existing system: in the case of a two-language system, a third language would require four new transfer modules (one in each direction for each of the two existing languages).
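The module arithmetic can be checked with a short calculation. The sketch below assumes a system that translates between every pair of languages in both directions, so that n languages need n(n - 1) transfer modules; the function names are made up for this example.

```python
def transfer_modules(n_languages: int) -> dict:
    """Module counts for a transfer system covering every ordered language pair."""
    return {
        "analysis": n_languages,
        "generation": n_languages,
        "transfer": n_languages * (n_languages - 1),
    }


def interlingua_modules(n_languages: int) -> dict:
    """Module counts for an interlingua system: no bilingual modules at all."""
    return {"analysis": n_languages, "generation": n_languages, "transfer": 0}


def new_transfer_modules(existing: int) -> int:
    """Transfer modules added when one language joins `existing` languages."""
    before = transfer_modules(existing)["transfer"]
    after = transfer_modules(existing + 1)["transfer"]
    return after - before


print(new_transfer_modules(2))  # 4
```

Under the same assumption, adding a language to a system of n languages costs 2n new transfer modules, whereas an interlingua system needs only one new analysis module and one new generation module whatever the value of n.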

Why then is the transfer approach so often preferred to the interlingua method? The first reason has already been mentioned: the difficulty of devising language-independent representations. The second is the complexity of analysis and generation grammars when the representations are inevitably far removed from the characteristic features of the source and target texts. By comparison, the relative complexity of the analysis and generation modules in a transfer system is much reduced, because the intermediate representations involved are still language-dependent abstractions. At the same time, if the design is optimal, the work of transfer modules can be greatly simplified and the creation of new ones can be less onerous than might be imagined.

Although the so-called 'transfer' systems dominated the scene (e.g. Ariane, Metal, SUSY, and Eurotra), there were also various 'interlingua' systems. Some were still essentially linguistics-oriented (DLT and Rosetta), but others adopted knowledge-based approaches, making use of non-linguistic information related to the domains of texts to be translated.

MT systems can be classified according to the approach on which they are based. The main approaches are as follows:

Rule-based approach: this is the original method, used by most MT systems. Rule-based machine translation (RBMT) is a general term for machine translation systems based on linguistic information about the source and target languages, drawn mainly from (bilingual) dictionaries and grammars covering the main semantic, morphological, and syntactic regularities of each language. During the 1980s the dominant framework of MT research was based essentially on linguistic rules of various kinds: rules for syntactic analysis, lexical rules, rules for lexical transfer, rules for syntactic generation, rules for morphology, etc. Given input sentences in some source language, an RBMT system converts them into output sentences in some target language on the basis of morphological, syntactic, and semantic analysis of both the source and the target languages involved in the translation task. Rule-based machine translation relies on a very large number of built-in linguistic rules and on extensive bilingual dictionaries for each language pair. The software applies these rule sets to transfer the grammatical structure of the source language into the target language.

Statistical machine translation (SMT) is an approach in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts (now usually electronically stored and processed). Such bilingual or multilingual corpora can be, for example, sets of official documents issued by international organizations or national governments. The quality of the translation therefore depends on the size and quality of the bilingual text databases. The statistical approach is now increasingly widely used. The first ideas of statistical machine translation were introduced by Warren Weaver in 1949. SMT as a research area started in the late 1980s with the Candide project at IBM; it was re-introduced in 1991 by researchers at IBM's Thomas J. Watson Research Center and has contributed to the significant resurgence of interest in machine translation in recent years.
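The idea of 'statistical models whose parameters are derived from corpora' is commonly written in the noisy-channel form associated with the IBM work. The symbols below are not used elsewhere in this text and are introduced only for illustration: f is the source sentence, e a candidate target sentence, P(f | e) the translation model estimated from a bilingual corpus, and P(e) the language model estimated from target-language text.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        \;=\; \arg\max_{e} P(f \mid e)\, P(e)
```

The denominator P(f) can be dropped because it does not depend on e. The two remaining factors explain the dependence on corpus size and quality noted above: both are estimated purely from the available texts.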

Rule-based MT

+ consistent and predictable quality
+ good out-of-domain translation quality
+ knows grammatical rules
+ high performance
+ consistency between versions
- lack of fluency
- hard to handle exceptions to rules
- high development and customization costs

Statistical MT

- unpredictable translation quality
- poor out-of-domain quality
- does not know grammar
- high CPU and disk space requirements
- inconsistency between versions
+ good fluency
+ good for catching exceptions in grammar
+ cheaper development costs

Given these requirements, there is a clear need for a third approach that gives users better translation quality and high performance (as with rule-based MT) for less investment (as with statistical MT). This approach, hybrid machine translation, combines the advantages of rule-based and statistical MT systems. Hybrid systems differ in how they combine the two:

Rules post-processed by statistics: Translations are performed using a rule-based engine. Statistics are then used in an attempt to adjust or correct the output from the rules engine.
Statistics guided by rules: Rules are used to pre-process data in an attempt to better guide the statistical engine. Rules are also used to post-process the statistical output to perform functions such as normalization. This approach offers considerably more power, flexibility and control in translation.
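The first variant, rules post-processed by statistics, can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the rule engine is represented only by its output (a list of word slots, some with unresolved alternatives), and the 'statistics' are a handful of invented bigram counts rather than figures from a real corpus.

```python
from itertools import product

# Hypothetical output of a rule-based engine: one slot per word, with the
# alternatives that the rules could not choose between.
rule_output = [["he"], ["drinks"], ["strong", "powerful"], ["tea"]]

# Toy bigram counts standing in for statistics from a target-language corpus.
BIGRAM_COUNTS = {
    ("he", "drinks"): 3,
    ("drinks", "strong"): 2,
    ("drinks", "powerful"): 1,
    ("strong", "tea"): 8,
}


def score(words):
    """Sum of bigram counts; unseen bigrams contribute nothing."""
    return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))


def statistical_postprocess(slots):
    """Pick the candidate sentence that the bigram statistics favour."""
    candidates = product(*slots)
    return " ".join(max(candidates, key=score))


print(statistical_postprocess(rule_output))  # he drinks strong tea
```

The rules guarantee that every candidate is grammatical, while the counts supply the fluency judgement ('strong tea' rather than 'powerful tea') that is hard to capture in hand-written rules.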

Several MT systems started with the rule-based approach and later combined it with the statistical one. Among them are the Systran and PROMT translation programs, which switched to hybrid machine translation in 2010.
