Paper: A Statistical Approach To Machine Translation

Basic Info:

id: J90-2002
title: A Statistical Approach To Machine Translation
authors: Brown, Peter F. (IBM T.J. Watson Research Center, Yorktown Heights NY), Cocke, John (IBM T.J. Watson Research Center, Yorktown Heights NY), Della Pietra, Stephen A. (IBM T.J. Watson Research Center, Yorktown Heights NY), Della Pietra, Vincent J. (IBM T.J. Watson Research Center, Yorktown Heights NY), Jelinek, Frederick (IBM T.J. Watson Research Center, Yorktown Heights NY), Lafferty, John D. (IBM T.J. Watson Research Center, Yorktown Heights NY), Mercer, Robert L. (IBM T.J. Watson Research Center, Yorktown Heights NY), Roossin, Paul S. (IBM T.J. Watson Research Center, Yorktown Heights NY)
venue: CL
year: 1990


Incoming Citations
ID Title
H91-1026 Identifying Word Correspondences In Parallel Texts
P91-1017 Two Languages Are More Informative Than One
P91-1022 Aligning Sentences In Parallel Corpora
P91-1023 A Program For Aligning Sentences In Bilingual Corpora
P91-1034 Word-Sense Disambiguation Using Statistical Methods
A92-1014 Automatic Learning For Semantic Collocation
C92-2069 A Linear Least Squares Fit Mapping Method For Information Retrieval From Natural Language Texts
C92-2079 Aligning Sentences In Bilingual Texts French - English And French - Arabic
C92-2080 Translation Ambiguity Resolution Based On Text Corpora Of Source And Target Languages
C92-2117 Shake-And-Bake Translation
H92-1053 Dividing And Conquering Long Sentences In A Translation System
J92-4003 Class-Based N-Gram Models Of Natural Language
P92-1023 GPSM: A Generalized Probabilistic Semantic Model For Ambiguity Resolution
E93-1015 Automating The Acquisition Of Bilingual Terminology
H93-1040 Evaluation Of Machine Translation
J93-1001 Introduction To The Special Issue On Computational Linguistics Using Large Corpora
J93-1004 A Program For Aligning Sentences In Bilingual Corpora
J93-1006 Text-Translation Alignment
J93-2003 The Mathematics Of Statistical Machine Translation: Parameter Estimation
P93-1001 Char Align: A Program For Aligning Parallel Texts At The Character Level
P93-1002 Aligning Sentences In Bilingual Corpora Using Lexical Information
P93-1004 Structural Matching Of Parallel Texts
W93-0301 Robust Bilingual Word Alignment For Machine Aided Translation
A94-1006 Termight: Identifying And Translating Technical Terminology
A94-1016 Three Heads Are Better Than One
C94-1019 Two Types Of Adaptive MT Environments
C94-2175 Bilingual Text Matching Using Bilingual Dictionary And Statistics
C94-2178 K-Vec: A New Approach For Aligning Parallel Texts
J94-4003 Word Sense Disambiguation Using A Second Language Monolingual Corpus
P94-1048 Dual-Coding Theory And Connectionist Lexical Selection
W94-0101 Qualitative And Quantitative Designs For Speech Translation (Invited Talk)
J95-4004 Transformation-Based-Error-Driven Learning And Natural Language Processing: A Case Study In Part-Of-Speech Tagging
P95-1033 An Algorithm For Simultaneously Bracketing Parallel Texts By Aligning Words
P95-1035 An Efficient Generation Algorithm For Lexicalist MT
P95-1050 Identifying Word Translations In Non-Parallel Texts
W95-0106 Trainable Coarse Bilingual Grammars For Parallel Text Bracketing
C96-1033 FeasPar - A Feature Structure Parser Learning To Parse Spoken Language
C96-1037 Aligning More Words With High Precision For Small Bilingual Corpora
J96-1001 Translating Collocations For Bilingual Lexicons: A Statistical Approach
J96-1002 A Maximum Entropy Approach To Natural Language Processing
P96-1009 A Robust System For Natural Spoken Dialogue
P96-1021 A Polynomial-Time Algorithm For Statistical Machine Translation
P96-1023 Head Automata And Bilingual Tiling: Translation With Minimal Representations (Invited Talk)
P96-1041 An Empirical Study Of Smoothing Techniques For Language Modeling
W96-0210 The Measure Of A Model
J97-2004 A Class-Based Approach To Word Alignment
J97-3002 Stochastic Inversion Transduction Grammars And Bilingual Parsing Of Parallel Corpora
P97-1046 A Comparison Of Head Transducers And Transfer For A Limited Domain Translation Application
P97-1063 A Word-To-Word Model Of Translational Equivalence
W97-0407 Using Categories In The EUTRANS System
W97-0408 English-To-Mandarin Speech Translation With Head Transducers
C98-1004 A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts
C98-1006 Automatic Acquisition of Hierarchical Transduction Models for Machine Translation
C98-1106 Term-list Translation using Mono-lingual Word Co-occurrence Vectors
C98-1113 Methods and Practical Issues in Evaluating Alignment Techniques
C98-2225 Machine Translation with a Stochastic Grammatical Channel
P98-1004 A Simple Hybrid Aligner For Generating Lexical Correspondences In Parallel Texts
P98-1006 Automatic Acquisition Of Hierarchical Transduction Models For Machine Translation
P98-1110 Term-List Translation Using Mono-Lingual Word Co-Occurrence Vectors
P98-1117 Methods And Practical Issues In Evaluating Alignment Techniques
P98-2230 Machine Translation With A Stochastic Grammatical Channel
W98-1230 A Constructivist Approach To Machine Translation
W98-1307 Learning Finite-State Models For Language Understanding
P99-1022 Dynamic Nonlocal Language Modeling Via Hierarchical Topic-Based Adaptation
P99-1067 Automatic Identification Of Word Translations From Unrelated English And German Corpora
P99-1068 Mining The Web For Bilingual Text
W99-0602 Text-Translation Alignment: Three Languages Are Better Than Two
W99-0905 Resolving Translation Ambiguity Using Non-Parallel Bilingual Corpora
A00-1018 An Automatic Reviser: The TransCheck System
A00-2011 Word-For-Word Glossing With Contextually Similar Words
C00-1019 Automated Generalization Of Translation Examples
C00-1064 Structural Feature Selection For English-Korean Statistical Machine Translation
C00-2092 Data-Oriented Translation
C00-2145 A Model Of Competence For Corpus-Based Machine Translation
C00-2168 Acquisition Of A Language Computational Model For NLP
C00-2169 Processing Self Corrections In A Speech To Speech System
J00-1004 Learning Dependency Translation Models As Collections Of Finite-State Head Transducers
H01-1035 Inducing Multilingual Text Analysis Tools Via Robust Projection Across Aligned Corpora
N01-1015 Re-Engineering Letter-To-Sound Rules
N01-1026 Inducing Multilingual POS Taggers And NP Bracketers Via Robust Projection Across Aligned Corpora
W01-0504 Knowledge Sources For Word-Level Translation Models
W01-1401 Example-Based Machine Translation Using DP-Matching Between Work Sequences
W01-1404 Approximating Context-Free By Rational Transduction For Example-Based MT
C02-1046 Translation Selection Through Source Word Sense Disambiguation And Target Word Selection
C02-1064 Text Generation From Keywords
C02-1099 An English-Korean Transliteration Model Using Pronunciation And Contextual Rules
C02-1137 A New Probabilistic Model For Title Generation
C02-1158 Study Of Practical Effectiveness For Machine Translation Using Recursive Chain-Link-Type Learning
C02-1162 Identifying Concepts Across Languages: A First Step Towards A Corpus-Based Approach To Automatic Ontology Alignment
J02-2001 Near-Synonymy And Lexical Choice
P02-1024 Exploring Asymmetric Clustering For Statistical Language Modeling
P02-1050 Evaluating Translational Correspondence Using Annotation Projection
P02-1052 Using Similarity Scoring To Improve The Bilingual Dictionary For Sub-Sentential Alignment
W02-0701 Corpus-Centered Computation
W02-0704 Finding Translation Pairs From English-Japanese Untokenized Aligned Corpora
W02-0902 Learning A Translation Lexicon From Monolingual Corpora
W02-1603 Plaesarn: Machine-Aided Translation Tool For English-To-Thai
W02-1604 English-Japanese Example-Based Machine Translation Using Abstract Linguistic Representations
W02-1605 Improving Translation Quality Of Rule-Based Machine Translation
E03-1032 Efficient Search For Interactive Statistical Machine Translation
E03-1048 A Corpus-Centered Approach To Spoken Language Translation
E03-1076 Empirical Methods For Compound Splitting
J03-1005 Word Reordering And A Dynamic Programming Beam Search Algorithm For Statistical Machine Translation
J03-3002 The Web As A Parallel Corpus
J03-3003 Embedding Web-Based Statistical Translation Models In Cross-Language Information Retrieval
N03-1010 Greedy Decoding For Statistical Machine Translation In Almost Linear Time
N03-1018 A Generative Probabilistic OCR Model For NLP Applications
P03-1011 Loosely Tree-Based Alignment For Machine Translation
P03-1019 A Comparative Study On Reordering Constraints In Statistical Machine Translation
W03-0304 Statistical Translation Alignment With Compositionality Constraints
W03-0414 Using 'smart' Bilingual Projection To Feature-Tag A Monolingual Dictionary
W03-0611 Learning The Meaning And Usage Of Time Phrases From A Parallel Text-Data Corpus
C04-1030 Reordering Constraints For Phrase-Based Statistical Machine Translation
C04-1060 Syntax-Based Alignment: Supervised Or Unsupervised?
C04-1117 Cognate Mapping - A Heuristic Strategy For The Semi-Supervised Acquisition Of A Spanish Lexicon From A Portuguese Seed Lexicon
J04-2003 Statistical Machine Translation With Scarce Resources Using Morpho-Syntactic Information
J04-4002 The Alignment Template Approach To Statistical Machine Translation
N04-1022 Minimum Bayes-Risk Decoding For Statistical Machine Translation
N04-1023 Discriminative Reranking For Machine Translation
N04-1033 Improvements In Phrase-Based Statistical Machine Translation
N04-1034 Improved Machine Translation Performance Via Parallel Sentence Extraction From Comparable Corpora
P04-1062 Annealing Techniques For Unsupervised Statistical Language Learning
P04-1068 Creating Multilingual Translation Lexicons With Regional Variations Using Web Corpora
P04-3004 Subsentential Translation Memory For Computer Assisted Writing And Translation
W04-1118 Do We Need Chinese Word Segmentation For Statistical Machine Translation?
W04-3009 Using Higher-Level Linguistic Knowledge For Speech Recognition Error Correction In A Spoken Q/a Dialog
W04-3207 Bilingual Parsing With Factored Estimation: Using English To Parse Korean
W04-3228 Dependencies Vs. Constituents For Tree-Based Alignment
W04-3250 Statistical Significance Tests For Machine Translation Evaluation
H05-1010 A Discriminative Matching Approach To Word Alignment
H05-1050 Bootstrapping Without The Boot
H05-1086 A Translation Model For Sentence Retrieval
I05-1075 Automatically Inducing a Part-of-Speech Tagger by Projecting from Multiple Source Languages Across Aligned Corpora
J05-4003 Improving Machine Translation Performance By Exploiting Non-Parallel Corpora
P05-2016 Dependency-Based Statistical Machine Translation
W05-0104 A Core-Tools Statistical NLP Course
W05-0816 Comparison Selection And Use Of Sentence Alignment Algorithms For New Language Pairs
W05-0827 Improving Phrase-Based Statistical Translation By Modifying Phrase Extraction And Including Several Features
W05-0833 Hybrid Example-Based SMT: The Best Of Both Worlds?
J06-4004 N-gram-based Machine Translation
N06-1015 Word Alignment Via Quadratic Assignment
N06-2011 Spectral Clustering For Example Based Machine Translation
P06-1067 Distortion Models For Statistical Machine Translation
P06-1092 Phoneme-To-Text Transcription System With An Infinite Vocabulary
P06-1125 A Phonetic-Based Approach To Chinese Chat Text Normalization
W06-1008 A Fast And Accurate Method For Detecting English-Japanese Parallel Texts
W06-1905 Keyword Translation Accuracy And Cross-Lingual Question Answering In Chinese And Japanese
W06-3103 Morpho-Syntactic Arabic Preprocessing For Arabic To English Statistical Machine Translation
W06-3108 Discriminative Reordering Models For Statistical Machine Translation
W06-3110 N-Gram Posterior Probabilities For Statistical Machine Translation
D07-1003 What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA
D07-1055 A Systematic Comparison of Training Criteria for Statistical Machine Translation
P07-2026 Minimum Bayes Risk Decoding for BLEU
W07-0401 Chunk-Level Reordering of Source Language Sentences with Automatically Learned Rules for Statistical Machine Translation
W07-0412 Probabilistic Synchronous Tree-Adjoining Grammars for Machine Translation: The Argument from Bilingual Dictionaries
C08-1045 Non-Compositional Language Model and Pattern Dictionary Development for Japanese Compound and Complex Sentences
C08-1056 Normalizing SMS: are Two Metaphors Better than One ?
C08-2032 Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language
D08-1065 Lattice Minimum Bayes-Risk Decoding for Statistical Machine Translation
I08-2120 Mining Chinese-English Parallel Corpora from the Web
I08-6003 Finding parallel texts on the web using cross-language information retrieval
L08-1130 Building Bilingual Lexicons using Lexical Translation Probabilities via Pivot Languages
L08-1185 Building a Golden Collection of Parallel Multi-Language Word Alignment
L08-1564 Automatic Translation of Biomedical Terms by Supervised Machine Learning
P08-1115 Generalizing Word Lattice Translation
P08-2036 Smoothing a Tera-word Language Model
W08-0301 An Empirical Study in Source Word Deletion for Phrase-Based Statistical Machine Translation
W08-0302 Rich Source-Side Context for Statistical Machine Translation
W08-0315 The TALP-UPC Ngram-Based Statistical Machine Translation System for ACL-WMT 2008
W08-0510 Design of the Moses Decoder for Statistical Machine Translation




Top Similar Papers
By Title
ID Title
C88-1016 A Statistical Approach To Language Translation
N01-1018 A Finite-State Approach To Machine Translation
H91-1025 A Statistical Approach To Sense Disambiguation In Machine Translation
P05-2016 Dependency-Based Statistical Machine Translation
I08-2087 A Structured Prediction Approach for Statistical Machine Translation
P01-1020 A Machine Learning Approach To The Automatic Evaluation Of Machine Translation
W03-1809 A Statistical Approach To The Semantics Of Verb-Particles
P07-1091 A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation
J04-4002 The Alignment Template Approach To Statistical Machine Translation
J91-3005 Erratum To 'A Statistical Approach To Machine Translation'


By Full Text
ID Title
H94-1028 The Candide System For Machine Translation
W02-1405 Improving A General-Purpose Statistical Translation Engine By Terminological Lexicons
P04-1066 Improving IBM Word Alignment Model 1
W06-3106 Phrase-Based SMT With Shallow Tree-Phrases
C98-2157 Improving Statistical Natural Language Translation with Categories and Rules
P98-2162 Improving Statistical Natural Language Translation With Categories And Rules
P07-1020 Statistical Machine Translation through Global Lexical Selection and Sentence Reconstruction
W05-0824 RALI: SMT Shared Task System Description
H01-1062 The RWTH System For Statistical Translation Of Spoken Dialogues
J96-1002 A Maximum Entropy Approach To Natural Language Processing


By Co-citation
ID Title Num Co-citations
J93-2003 The Mathematics Of Statistical Machine Translation: Parameter Estimation 50
P91-1023 A Program For Aligning Sentences In Bilingual Corpora 25
P91-1022 Aligning Sentences In Parallel Corpora 23
P02-1040 Bleu: A Method For Automatic Evaluation Of Machine Translation 20
P02-1038 Discriminative Training And Maximum Entropy Models For Statistical Machine Translation 19
J03-1002 A Systematic Comparison Of Various Statistical Alignment Models 19
P03-1021 Minimum Error Rate Training In Statistical Machine Translation 18
H91-1026 Identifying Word Correspondences In Parallel Texts 17
J97-3002 Stochastic Inversion Transduction Grammars And Bilingual Parsing Of Parallel Corpora 17
N03-1017 Statistical Phrase-Based Translation 16


Citation Summary
Citing sentences
J94-4003 1 16:730 Substantial application of semantic or pragmatic knowledge about the word and its context requires compiling huge amounts of knowledge, the usefulness of which for practical applications in broad domains has not yet been proven (e.g., Lenat et al. 1990; Nirenburg et al. 1988; Chodorow, Byrd, and Heidorn 1985).
J94-4003 2 695:730 It seems, however, that Brown et al. expect that target word selection would be determined mainly by translation probabilities (the second factor in the above term), which should be derived from a bilingual corpus (Brown et al. 1990, p. 79).
W07-0401 3 52:352 Among all possible target language sentences, we will choose the sentence with the highest probability: $\hat{e}_1^I = \arg\max_{I, e_1^I} \{ \Pr(e_1^I \mid f_1^J) \}$ (1) $= \arg\max_{I, e_1^I} \{ \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I) \}$ (2) This decomposition into two knowledge sources is known as the source-channel approach to statistical machine translation (Brown et al., 1990).
P95-1050 4 4:64 1 Introduction In a number of recent studies it has been shown that word translations can be automatically derived from the statistical distribution of words in bilingual parallel texts (e.g. Catizone, Russell & Warwick, 1989; Brown et al., 1990; Dagan, Church & Gale, 1993; Kay & Röscheisen, 1993).
W99-0602 5 10:212 Bilingual alignments have so far shown that they can play multiple roles in a wide range of linguistic applications, such as computer assisted translation (Isabelle et al. , 1993; Brown et al. , 1990), terminology (Dagan and Church, 1994) lexicography (Langlois, 1996; Klavans and Tzoukermann, 1995; Melamed, 1996), and cross-language information retrieval (Nie et al. , * This research was funded by the Canadian Department of Foreign Affairs and International Trade (http://~.dfait-maeci.gc.ca/), via the Agence de la francophonie (http://~.
W08-0301 6 36:195 As to the pioneering IBM word-based SMT models (Brown et al., 1990), IBM models 3, 4 and 5 handle spurious source words by considering them as corresponding to a particular EMPTY word token on the English side, and by the fertility model which allows the English EMPTY to generate a certain number of foreign words.
P91-1034 7 5:89 INTRODUCTION An alluring aspect of the statistical approach to machine translation rejuvenated by Brown et al. [Brown et al., 1988, Brown et al., 1990] is the systematic framework it provides for attacking the problem of lexical disambiguation.
P91-1034 8 31:89 From the Viterbi alignments for 1,002,165 pairs of short French and English sentences from the Canadian Hansard data [Brown et al., 1990], we have extracted a set of 12,028,485 connections.
P91-1034 9 17:89 STATISTICAL TRANSLATION Following Brown et al. [Brown et al., 1990], we choose as the translation of a French sentence F that sentence E for which $\Pr(E \mid F)$ is greatest.
P91-1034 10 81:89 This system is an enhanced version of the one described by Brown et al. [Brown et al., 1990] in that it uses a trigram language model, and has a French vocabulary of 57,802 words, and an English vocabulary of 40,809 words.
P91-1034 11 26:89 Brown et al. [Brown et al., 1990], show an example of such an automatically derived alignment in their Figure 3.
P91-1034 12 23:89 The translation model used by Brown et al. [Brown et al., 1990] incorporates the concept of an alignment in which each word in E acts independently to produce some of the words in F. If we denote a typical alignment by A, then we can write the probability of F given E as a sum over all possible alignments: $\Pr(F \mid E) = \sum_A \Pr(F, A \mid E)$.
A94-1006 13 115:178 have been used in statistical machine translation (Brown et al. , 1990), terminology research and translation aids (Isabelle, 1992; Ogden and Gonzales, 1993; van der Eijk, 1993), bilingual lexicography (Klavans and Tzoukermann, 1990; Smadja, 1992), word-sense disambiguation (Brown et al. , 1991b; Gale et al. , 1992) and information retrieval in a multilingual environment (Landauer and Littman, 1990).
A94-1006 14 114:178 3 Bilingual Task: An Application for Word Alignment 3.1 Sentence and word alignment Bilingual alignment methods (Warwick et al. , 1990; Brown et al. , 1991a; Brown et al. , 1993; Gale and Church, 1991b; Gale and Church, 1991a; Kay and Roscheisen, 1993; Simard et al. , 1992; Church, 1993; Kupiec, 1993a; Matsumoto et al. , 1993; Dagan et al. , 1993).
P98-1117 15 99:146 RALI/Salign The second method proposed by RALI is based on a dynamic programming scheme which uses a score function derived from a translation model similar to that of (Brown et al., 1990).
P96-1021 16 6:177 1 Motivation The statistical translation model introduced by IBM (Brown et al. , 1990) views translation as a noisy channel process.
C02-1162 17 49:141 WordNet was constructed with what is commonly referred to as a differential theory of lexical semantics (Miller et al. , 1990), which aims to differentiate word senses by grouping words into synonym sets (synsets), which are constructed as to allow a user to easily distinguish between different senses of a word.
C02-1162 18 131:141 There has also been a lot of work involving bilingual corpora, including the IBM Candide project (Brown et al. , 1990), which used statistical data to align words in sentence pairs from parallel corpora in an unsupervised fashion through the EM algorithm; Church (1993) used character frequencies to align words in a parallel corpus; Smadja et al.
C02-1162 19 48:141 4 Experiment Details 4.1 Ontologies The ontologies selected for alignment in this work were the American English WordNet (Miller et al. , 1990) version 1.7, and the Mandarin Chinese HowNet (Dong, 1988).2 There are two main reasons why these particular two ontologies were chosen: they represent very different languages, and were constructed with very different approaches.
C96-1033 20 20:167 A small community have experimented with either purely statistical approaches (Brown et al., 1990; Schütze, 1993) or connectionist based approaches (Berg, 1991; Miikkulainen and Dyer, 1991; Jain, 1991; Wermter and Weber, 1994).
D07-1003 21 56:243 Similarly, Murdock and Croft (2005) adopted a simple translation model from IBM model 1 (Brown et al. , 1990; Brown et al. , 1993) and applied it to QA.
C00-2168 22 23:87 In some other approaches, parameters and parameter values are either not sought out or are expected to be obtained automatically (e.g., Brown et al. 1990; Goldstein 1998), and, while holding promise for the future as a potential component of an elicitation system, cannot, at this time, form the basis of an entire system of this kind.
W03-2804 23 62:186 The evaluator just needs to indicate whether each of the marked items is an actual error or whether it can rather be considered as an alternative translation. This metric resembles very much the one proposed in (Brown et al, 1990).
A92-1014 24 18:224 Though several studies with similar objectives have been reported [Church, 1988], [Zernik and Jacobs, 1990], [Calzolari and Bindi, 1990], [Garside and Leech, 1985], [Hindle and Rooth, 1991], [Brown et al., 1990], they require that sample corpora be correctly analyzed or tagged in advance.
W06-3110 25 34:125 As a decision rule, we obtain: $\hat{e}_1^I = \arg\max_{I, e_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J) \right\}$ (3) This approach is a generalization of the source-channel approach (Brown et al., 1990).
P95-1035 26 24:172 For example, (Beaven, 1992a) employs a chart to avoid recalculating the same combinations of signs more than once during testing, and (Popowich, 1994) proposes a more general technique for storing which rule applications have been attempted; (Brew, 1992) avoids certain pathological cases by employing global constraints on the solution space; researchers such as (Brown et al. , 1990) and (Chen and Lee, 1994) provide a system for bag generation that is heuristically guided by probabilities.
C02-1158 27 16:146 However, in Statistical MT(Brown et al. , 1990), large amounts of translation examples are required in order to obtain high-quality translation.
P04-1068 28 5:207 1 Introduction Compilation of translation lexicons is a crucial process for machine translation (MT) (Brown et al. , 1990) and cross-language information retrieval (CLIR) systems (Nie et al. , 1999).
W08-0302 29 8:197 Phrase-based MT systems are straightforward to train from parallel corpora (Koehn et al., 2003) and, like the original IBM models (Brown et al., 1990), benefit from standard language models built on large monolingual, target-language corpora (Brants et al., 2007).
J04-2003 30 45:412 The translation models they presented in various papers between 1988 and 1993 (Brown et al. 1988; Brown et al. 1990; Brown, Della Pietra, Della Pietra, and Mercer 1993) are commonly referred to as IBM models 1-5, based on the numbering in Brown, Della Pietra, Della Pietra, and Mercer (1993).
J93-1001 31 322:408 (Sinclair et al. 1987; p. xv) The experience of writing the COBUILD dictionary is documented in Sinclair (1987), a collection of articles from the COBUILD project; see Boguraev (1990) for a strong positive review of this collection.
J93-1001 32 39:408 (Waibel and Lee 1990; p. 4) A number of data collection efforts have helped to bring about this change in the speech community, especially the Texas Instruments' Digit Corpus (Leonard 1984), TIMIT and the DARPA Resource Management (RM) Database (Price et al. 1988).
H91-1026 33 3:200 Much of the current excitement surrounding parallel texts was initiated by Brown et al. (1990), who outline a self-organizing method for using these parallel texts to build a machine translation system.
A94-1016 34 115:136 Instead, we are planning to use an English language model on the output, in a manner similar to that done by speech and statistical translation systems (Brown et al. , 1990).
W02-0902 35 6:199 The seminal work by Brown et al. [1990] at IBM on the Candide system laid the foundation for much of the current work in Statistical Machine Translation (SMT).
N03-1018 36 130:188 We trained an IBM style translation model (Brown et al. , 1990) using GIZA++ (Och and Ney, 2000) on the 500 test lines used in our experiments paired with corresponding English lines from an online Bible.
C02-1064 37 10:207 In such translation, given a source language text, S, the translated text, T, in the target language that maximizes the probability P(T|S) is selected as the most appropriate translation, $T_{best}$, which is represented as (Brown et al., 1990) $T_{best} = \arg\max_T P(T \mid S) = \arg\max_T (P(S \mid T) \, P(T))$.
C98-1106 38 63:156 2The WORD SPACE method is closely related to Latent Semantic Indexing (LSI) (Deerwester et al., 1990), where document-by-word matrices are processed by SVD instead of word-by-word matrices.
C98-1106 39 15:156 1In fact, this is partly shown by the fact that many MT systems have substitutable domain-dependent (or "user" ) dictionaries . relies on translation probabilities estimated from large bilingual corpora (Brown et al., 1990)(Brown et al., 1991).
P91-1023 40 41:211 Table 3: An Entry in a Probabilistic Dictionary (from Brown et al., 1990) English French Prob(French | English): the le 0.610; the la 0.178; the l' 0.083; the les 0.023; the ce 0.013; the il 0.012; the de 0.009; the à 0.007; the que 0.007. Table 4: A Bilingual Concordance bank/banque ("money" sense) and the governor of the et le gouverneur de la 800 per cent in one week through % ca une semaine ~ cause d' ut~ bank/banc ("place" sense) bank of canada have fwxluanfly bcaque du canada ont fr&lnemm bank action.
P91-1023 41 40:211 Aligning sentences is just a first step toward constructing a probabilistic dictionary (Table 3) for use in aligning words in machine translation (Brown et al. , 1990), or for constructing a bilingual concordance (Table 4) for use in lexicography (Klavans and Tzoukermann, 1990).
P91-1023 42 6:211 Introduction Researchers in both machine translation (e.g., Brown et al, 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary debates) which are available in multiple languages (such as French and English).
P91-1023 43 1:211 A PROGRAM FOR ALIGNING SENTENCES IN BILINGUAL CORPORA William A. Gale Kenneth W. Church AT&T Bell Laboratories 600 Mountain Avenue Murray Hill, NJ, 07974 ABSTRACT Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts, texts such as the Canadian Hansards (parliamentary proceedings) which are available in multiple languages (French and English).
C08-1045 44 6:150 1 Introduction A wide variety of machine translation (MT) methods are being studied(Nagao, 1996; Brown et al., 1990; Vogel et al., 2003), but to obtain high-quality translations between languages belonging to different families that are alien each other is difficult.
W08-0315 45 20:95 2 Ngram-based SMT System Our translation system implements a log-linear model in which a foreign language sentence $f_1^J = f_1, f_2, \ldots, f_J$ is translated into another language $e_1^I = e_1, e_2, \ldots, e_I$ by searching for the translation hypothesis $\hat{e}_1^I$ maximizing a log-linear combination of several feature models (Brown et al., 1990): $\hat{e}_1^I = \arg\max_{e_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J) \right\}$ where the feature functions $h_m$ refer to the system models and the set of $\lambda_m$ refers to the weights corresponding to these models.
D07-1055 46 49:198 The posterior probability $\Pr(e_1^I \mid f_1^J)$ is modeled directly using a log-linear combination of several models (Och and Ney, 2002): $p_{\lambda_1^M}(e_1^I \mid f_1^J) = \frac{\exp\left( \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J) \right)}{\sum_{I', e'^{I'}_1} \exp\left( \sum_{m=1}^{M} \lambda_m h_m(e'^{I'}_1, f_1^J) \right)}$ (1) This approach is a generalization of the source-channel approach (Brown et al., 1990).
P96-1041 47 4:170 1 Introduction Smoothing is a technique essential in the construction of n-gram language models, a staple in speech recognition (Bahl, Jelinek, and Mercer, 1983) as well as many other domains (Church, 1988; Brown et al. , 1990; Kernighan, Church, and Gale, 1990).
C92-2079 48 32:273 For P. Brown et al. at IBM, the goal is to compute the parameters of the probabilistic machine translation model that they want to build [Brown et al., 1988; Brown et al., 1990].
P93-1001 49 5:194 Introduction Parallel texts have recently received considerable attention in machine translation (e.g. , Brown et al, 1990), bilingual lexicography (e.g. , Klavans and Tzoukermann, 1990), and terminology research for human translators (e.g. , Isabelle, 1992).
P93-1001 50 1:194 Char_align: A Program for Aligning Parallel Texts at the Character Level Kenneth Ward Church AT&T Bell Laboratories 600 Mountain Avenue Murray Hill NJ, 07974-0636 kwc@research.att.com Abstract There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al (1991), Gale and Church (to appear), Isabelle (1992), Kay and Röscheisen (to appear), Simard et al (1992), Warwick-Armstrong and Russell (1990).
D08-1065 51 39:259 Statistical MT (Brown et al., 1990; Och and Ney, 2004) can be described as a mapping of a word sequence F in the source language to a word sequence E in the target language; this mapping is produced by the MT decoder (F).
N04-1023 52 20:201 1.1 Generative Models for MT The seminal IBM models (Brown et al. , 1990) were the first to introduce generative models to the MT task.
N04-1023 53 10:201 1 Introduction The noisy-channel model (Brown et al. , 1990) has been the foundation for statistical machine translation (SMT) for over ten years.
P03-1019 54 14:237 2 is the so-called source-channel approach to statistical machine translation (Brown et al. , 1990).
W03-0304 55 3:87 1 Introduction Since the pioneering work of the IBM machine translation team almost 15 years ago (Brown et al. , 1990), statistical methods have proven to be valuable tools in approaching the automation of translation.
J92-4003 56 23:185 Language Models Figure 1 shows a model that has long been used in automatic speech recognition (Bahl, Jelinek, and Mercer 1983) and has recently been proposed for machine translation (Brown et al. 1990) and for automatic spelling correction (Mays, Damerau, and Mercer 1990).
W03-0611 57 7:143 This is broadly similar in concept to the use of parallel multilingual corpora in machine translation (Brown et al. , 1990), except that our parallel corpus consists of texts and underlying numeric data, not texts and their translations.
H93-1040 58 20:135 The 1992 Evaluation tested three research MT systems: CANDIDE (IBM, French English) uses a statistical language modeling technique based on speech recognition algorithms (see Brown et al. , 1990).
A00-2011 59 10:171 Many corpus-based MT systems require parallel corpora (Brown et al. , 1990; Brown et al. , 1991; Gale and Church, 1991; Resnik, 1999).
E93-1015 60 5:236 Some applications using information extracted from bilingual corpora are statistical MT ([Brown et al., 1990]), bilingual lexicography ([Catizone et al., 1989]), word sense disambiguation ([Gale et al., 1992]), and multilingual information retrieval ([Landauer and Littmann, 1990]).
W06-3108 61 38:203 As a decision rule, we obtain: $\hat{e}_1^I = \operatorname{argmax}_{I, e_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J) \right\}$ (3) This approach is a generalization of the source-channel approach (Brown et al., 1990).
H05-1010 62 11:145 For example, when considering whether to align two words in the IBM models (Brown et al. , 1990), one cannot easily include information about such features as orthographic similarity (for detecting cognates), presence of the pair in various dictionaries, similarity of the frequency of the two words, choices made by other alignment systems on this sentence pair, and so on.
H05-1010 63 16:145 Word alignment is cast as a maximum weighted matching problem (Cormen et al. , 1990) in which each pair of words (e j,f k ) in a sentence pair (e,f) is associated with a score s jk (e,f) reflecting the desirability of the alignment of that pair.
H05-1010 64 6:145 1 Introduction The standard approach to word alignment from sentence-aligned bitexts has been to construct models which generate sentences of one language from the other, then fitting those generative models with EM (Brown et al. , 1990; Och and Ney, 2003).
C98-2225 65 27:179 2 Review: Noisy Channel Model The statistical translation model introduced by IBM (Brown et al., 1990) views translation as a noisy channel process.
N04-1033 66 8:290 1 Introduction In statistical machine translation, we are given a source language (French) sentence $f_1^J = f_1 \ldots f_j \ldots f_J$, which is to be translated into a target language (English) sentence $e_1^I = e_1 \ldots e_i \ldots e_I$. Among all possible target language sentences, we will choose the sentence with the highest probability: $\hat{e}_1^I = \operatorname{argmax}_{e_1^I} \{ Pr(e_1^I \mid f_1^J) \}$ (1) $= \operatorname{argmax}_{e_1^I} \{ Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I) \}$ (2) The decomposition into two knowledge sources in Equation 2 is known as the source-channel approach to statistical machine translation (Brown et al., 1990).
P91-1017 67 10:208 Substantial application of semantic or pragmatic knowledge about the word and its context for broad domains requires compiling huge amounts of knowledge, whose usefulness for practical applications has not yet been proven (Lenat et al. , 1990; Nirenburg et al. , 1988; Chodorow et al. , 1985).
W06-1905 68 52:193 IBM applied the noisy channel model idea to translation of sentences from aligned parallel corpora, where the source language sentence is the distorted signal, and the target language sentence is the original signal (Brown et al., 1990).
P05-2016 69 8:31 The first work on SMT done at IBM (Brown et al. , 1990; Brown et al. , 1992; Brown et al. , 1993; Berger et al. , 1994), used a noisy-channel model, resulting in what Brown et al.
P06-1092 70 7:176 1 Introduction The noisy channel model approach is being successfully applied to various natural language processing (NLP) tasks, such as speech recognition (Jelinek, 1985), spelling correction (Kernighan et al., 1990), machine translation (Brown et al., 1990), etc. In this approach an NLP system is composed of two modules: one is a task-dependent part (an acoustic model for speech recognition) which describes a relationship between an input signal sequence and a word, the other is a language model (LM) which measures the likelihood of a sequence of words as a sentence in the language.
P95-1033 71 6:198 1 Introduction Parallel corpora have been shown to provide an extremely rich source of constraints for statistical analysis (e.g. , Brown et al. 1990; Gale & Church 1991; Gale et al. 1992; Church 1993; Brown et al. 1993; Dagan et al. 1993; Dagan & Church 1994; Fung & Church 1994; Wu & Xia 1994; Fung & McKeown 1994).
P95-1033 72 98:198 A simpler, related idea of penalizing distortion from some ideal matching pattern can be found in the statistical translation (Brown et al. 1990; Brown et al. 1993) and word alignment (Dagan et al. 1993; Dagan & Church 1994) models.
J06-4004 73 9:388 The first SMT systems were developed in the early nineties (Brown et al. 1990, 1993).
P06-1067 74 10:241 N-gram language models have also been used in Statistical Machine Translation (SMT) as proposed by (Brown et al. , 1990; Brown et al. , 1993).
W04-3009 75 31:174 They simplified a statistical machine translation (MT) model called an IBM model (Brown et al. , 1990), and tried to construct a general post-processor that can correct errors generated by any speech recognizer.
W04-3009 76 46:174 2 Noisy Channel Error Correction Model The noisy channel error correction framework has been applied to a wide range of problems, such as spelling correction, statistical machine translation, and ASR error correction (Brill and Moore, 2000; Brown et al. , 1990; Ringger and Allen, 1996).
W04-3009 77 55:174 Following (Brown et al. , 1990), we refer to the number of post-channel words oi produced by a pre-channel word wi as a fertility.
P98-2230 78 27:182 2 Review: Noisy Channel Model The statistical translation model introduced by IBM (Brown et al. , 1990) views translation as a noisy channel process.
W98-1307 79 40:307 This transducer always adopts an "onward" form, in which the output substrings are assigned to the edges in such a way that they are as "close" to the initial state as they can be (see Oncina et al., 1993 [15], Reutenauer, 1990 [22]; for a recent re-elaboration of these concepts see Mohri, 1997 [13]).
P04-1062 80 8:264 But for other tasks, such as machine translation (Brown et al. , 1990), the chief merit of unlabeled data is simply that nothing else is available; unsupervised parameter estimation is notorious for achieving mediocre results.
P04-1062 81 83:264 2.3 Prior work DA was originally described as an algorithm for clustering data in RN (Rose et al. , 1990).
P91-1022 82 7:182 INTRODUCTION Recent work by Brown et al. [Brown et al., 1988, Brown et al., 1990] has quickened anew the long dormant idea of using statistical techniques to carry out machine translation from one natural language to another.
P91-1022 83 20:182 THE HANSARD CORPORA Brown et al. [Brown et al., 1990] describe the process by which the proceedings of the Canadian Parliament are recorded.
W01-1401 84 11:167 Statistical Machine Translation (SMT): SMT learns models for translation from corpora and dictionaries and searches for the best translation according to the models in run-time (Brown et al. , 1990; Knight, 1997; Ney et al. , 2000).
W01-1401 85 135:167 Knowledge of EBMT Many EBMT studies (Sato and Nagao, 1990; Sato, 1991; Furuse et al. , 1994; Sadler, 1989) assume the existence of a bank of aligned bilingual trees or a set of translation patterns.
W01-1401 86 14:167 EBMT retrieves the translation examples that are best matched to an input expression and adjusts the examples to obtain the translation (Nagao, 1981; Sadler 1989; Sato and Nagao, 1990; Sumita and Iida, 1991; Kitano, 1993; Furuse et al. , 1994; Watanabe and Maruyama, 1994; Cranias et al. , 1994; Jones, 1996; Veale and Way, 1997; Carl, 1999, Andriamanankasina et al. , 1999; Brown, 2000).
H01-1035 87 27:245 All corpora were automatically word-aligned by the now publicly available EGYPT system (Al-Onaizan et al., 1999), based on IBM's Model 3 statistical MT formalism (Brown et al., 1990).
J95-4004 88 144:404 Part-of-speech tagging is an active area of research; a great deal of work has been done in this area over the past few years (e.g. , Jelinek 1985; Church 1988; Derose 1988; Hindle 1989; DeMarcken 1990; Merialdo 1994; Brill 1992; Black et al. 1992; Cutting et al. 1992; Kupiec 1992; Charniak et al. 1993; Weischedel et al. 1993; Schutze and Singer 1994).
J95-4004 89 154:404 Almost all recent work in developing automatically trained part-of-speech taggers has been on further exploring Markov-model based tagging (Jelinek 1985; Church 1988; Derose 1988; DeMarcken 1990; Merialdo 1994; Cutting et al. 1992; Kupiec 1992; Charniak et al. 1993; Weischedel et al. 1993; Schutze and Singer 1994).
J95-4004 90 13:404 An effort has recently been undertaken to create automated machine translation systems in which the linguistic information needed for translation is extracted automatically from aligned corpora (Brown et al. 1990).
J95-4004 91 11:404 Endemic structural ambiguity, which can lead to such difficulties as trying to cope with the many thousands of possible parses that a grammar can assign to a sentence, can be greatly reduced by adding empirically derived probabilities to grammar rules (Fujisaki et al. 1989; Sharman, Jelinek, and Mercer 1990; Black et al. 1993) and by computing statistical measures of lexical association (Hindle and Rooth 1993).
J97-3002 92 13:359 Parallel bilingual corpora have been shown to provide a rich source of constraints for statistical analysis (Brown et al. 1990; Gale and Church 1991; Gale, Church, and Yarowsky 1992; Church 1993; Brown et al. 1993; Dagan, Church, and Gale 1993;
N03-1010 93 3:227 1 Introduction Most of the current work in statistical machine translation builds on word replacement models developed at IBM in the early 1990s (Brown et al. , 1990, 1993; Berger et al. , 1994, 1996).
C08-2032 94 4:70 1 Introduction The bilingual lexicon is a crucial resource for multilingual applications in natural language processing including machine translation (Brown et al., 1990) and cross-lingual information retrieval (Nie et al., 1999).
N01-1015 95 146:156 The alignment problem in statistical machine translation (Brown et al., 1990) is too general: long-distance displacement of large chunks of material may occur frequently when translating whole sentences, but are unlikely to play any role for the letter-to-sound mapping, though local reorderings do occur (Sproat, 2000).
C00-2145 96 103:153 The length of translation segments as well as their most likely initial and final words are calculated based on proba [Figure: Expected Coverage of the System (high to low) against Expected Translation Quality (low to high). Systems plotted: 1: Sato and Nagao (1990); 2: Carl (1999); 3: Güvenir and Cicekli (1998); 4: Zer (1997); 5: Heyn (1996); 6: Collins (1998); 7: Brown (1997); 8: Brown et al.]
C00-2145 97 84:153 (Brown et al. , 1990) have a purely holistic view on languages.
C00-2145 98 72:153 [Figure: Atomicity of Representation (molecular, mixed, holistic). Systems plotted: 1: Sato and Nagao (1990); 2: EDGAR, Carl (1999); 4: ZERES, Zer (1997); 5: TRADOS, Heyn (1996); 7: Brown (1997); 8: Brown et al.]
P02-1052 99 26:169 (Brown et al., 1990; Brown et al., 1993), a number of other algorithms have been developed.
P08-1115 100 6:179 1 Introduction When Brown and colleagues introduced statistical machine translation in the early 1990s, their key insight harkening back to Weaver in the late 1940s was that translation could be viewed as an instance of noisy channel modeling (Brown et al., 1990).
J00-1004 101 228:231 At the same time, we believe our method has advantages over the approach developed initially at IBM (Brown et al. 1990; Brown et al. 1993) for training translation systems automatically.
J96-1001 102 162:576 This approach is quite different from those adopted for the translation of single words (Klavans and Tzoukermann 1990; Dorr 1992; Klavans and Tzoukermann 1996), since for single words polysemy cannot be ignored; indeed, the problem of sense disambiguation has been linked to the problem of translating ambiguous words (Brown et al. 1991; Dagan, Itai, and Schwall 1991; Dagan and Itai 1994).
N04-1022 103 74:155 3 Minimum Bayes-Risk Decoding Statistical Machine Translation (Brown et al., 1990) can be formulated as a mapping of a word sequence $F$ in a source language to word sequence $E'$ in the target language that has a word-to-word alignment $A'$ relative to $F$. Given the source sentence $F$, the MT decoder $\delta(F)$ produces a target word string $E'$ with word-to-word alignment $A'$. Relative to a reference translation $E$ with word alignment $A$, the decoder performance is measured as $L((E, A), \delta(F))$. Our goal is to find the decoder that has the best performance over all translations.
W07-0412 104 90:166 The basic idea of using synchronous TAG for machine translation dates from the original definition (Shieber and Schabes, 1990), and has been pursued by several researchers (Abeille et al. , 1990; Dras, 1999; Prigent, 1994; Palmer et al. , 1999), but only recently in its probabilistic form (Nesson et al. , 2006).
W07-0412 105 20:166 Systems based on word-to-word lexicons, such as the IBM systems (Brown et al. , 1990; Brown et al. , 1993), incorporate further devices that allow reordering of words (a distortion model) and ranking of alternatives (a monolingual language model).
P04-3004 106 22:27 TransSearch exploits sentence alignment techniques (Brown et al 1990; Gale and Church 1990) to facilitate bilingual search at the granularity level of sentences.
W01-1404 107 5:145 Some of these studies have concentrated on finite-state or extended finite-state machinery, such as (Vilar and others, 1999), others have chosen models closer to context-free grammars and context-free transduction, such as (Alshawi et al. , 2000; Watanabe et al. , 2000; Yamamoto and Matsumoto, 2000), and yet other studies cannot be comfortably assigned to either of these two frameworks, such as (Brown and others, 1990) and (Tillmann and Ney, 2000).
P02-1024 108 11:229 1 Introduction The n-gram model has been widely applied in many applications such as speech recognition, machine translation, and Asian language text input [Jelinek, 1990; Brown et al. , 1990; Gao et al. , 2002].
W02-1603 109 39:121 n. a financial institution that accepts deposits and channels the money into lending activities n. sloping land (especially the slope beside a body of water) In order to resolve structural ambiguity, we apply the concept of the statistical machine translation approach (Brown et al. , 1990).
P08-2036 110 6:92 1 Introduction Language models, i.e. models that assign probabilities to sequences of words, have been proven useful in a variety of applications including speech recognition and machine translation (Bahl et al., 1983; Brown et al., 1990).
W04-1118 111 30:191 2 Review of the Baseline System for Statistical Machine Translation 2.1 Principle In statistical machine translation, we are given a source language (French) sentence $f_1^J = f_1 \ldots f_j \ldots f_J$, which is to be translated into a target language (English) sentence $e_1^I = e_1 \ldots e_i \ldots e_I$. Among all possible target language sentences, we will choose the sentence with the highest probability: $\hat{e}_1^I = \operatorname{argmax}_{e_1^I} \{ Pr(e_1^I \mid f_1^J) \}$ (1) $= \operatorname{argmax}_{e_1^I} \{ Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I) \}$ (2) The decomposition into two knowledge sources in Equation 2 is known as the source-channel approach to statistical machine translation (Brown et al., 1990).
J93-1006 112 23:392 Another example is the completely automatic, statistical approach to translation taken by the research group at IBM (Brown et al. 1990), which takes a large corpus of text with aligned translations as its point of departure.
W95-0106 113 87:153 It is interesting to contrast this method with the "parse-parse-match" approaches that have been reported recently for producing parallel bracketed corpora (Sadler & Vendelmans 1990; Kaji et al. 1992; Matsumoto et al. 1993; Cranias et al. 1994; Grishman 1994).
W95-0106 114 13:153 Numerous experiments have shown parallel bilingual corpora to provide a rich source of constraints for statistical analysis (e.g. , Brown et al. 1990; Gale & Church 1991 ; Gale et al. 1992; Church 1993; Brown et al. 1993; Dagan et al. 1993; Fung & Church 1994; Wu & Xia 1994; Fung & McKeown 1994).
W05-0104 115 32:104 For example, it meant that simple word alignment models like IBM models 1 and 2 (Brown et al. , 1990) and the HMM model (Vogel et al. , 1996) came many weeks after HMMs were introduced in the context of part-of-speech tagging.
C04-1030 116 9:215 1 Introduction In statistical machine translation, we are given a source language (French) sentence $f_1^J = f_1 \ldots f_j \ldots f_J$, which is to be translated into a target language (English) sentence $e_1^I = e_1 \ldots e_i \ldots e_I$. Among all possible target language sentences, we will choose the sentence with the highest probability: $\hat{e}_1^I = \operatorname{argmax}_{e_1^I} \{ Pr(e_1^I \mid f_1^J) \} = \operatorname{argmax}_{e_1^I} \{ Pr(e_1^I) \cdot Pr(f_1^J \mid e_1^I) \}$ This decomposition into two knowledge sources is known as the source-channel approach to statistical machine translation (Brown et al., 1990).
P97-1063 117 56:218 It is analogous to the step in other translation model induction algorithms that sets all probabilities below a certain threshold to negligible values (Brown et al. , 1990; Dagan et al. , 1993; Chen, 1996).
P97-1063 118 7:218 1 Introduction Over the past decade, researchers at IBM have developed a series of increasingly sophisticated statistical models for machine translation (Brown et al. , 1988; Brown et al. , 1990; Brown et al. , 1993a).
C08-1056 119 75:155 Giza++ (Och and Ney, 2003) is used to induce, based on statistical principles (Brown et al., 1990), an automatic word alignment of SMS tokens with their normalized counterparts; Moses (Koehn et al., 2007) is used to learn the various parameters of the phrase-based model, to optimize the weight combination and to perform the translation using a multi-stack search algorithm; the SRI language model toolkit (Stolcke, 2002) is finally used to estimate statistical language models.
W05-0833 120 140:152 7 Final Remarks Finally, as (Way and Gough, 2005) observe, it is difficult to explain why to this day SMT practitioners have not made full use of the large body of existing work on EBMT, from (Nagao, 1984) to (Carl & Way, 2003) and beyond, which has contributed greatly to the field of corpus-based MT. From its very inception EBMT has made use of a range of sub-sentential data both phrasal and lexical to perform translations whereas, until quite recently, SMT models of translation were based on the relatively simple word alignment models of (Brown et al. , 1990).
W05-0833 121 45:152 Until quite recently, SMT models of translation were based on the simple word alignment models of (Brown et al. , 1990).
W08-0510 122 5:155 1 Motivation Phrase-based translation has been one of the major advances in statistical machine translation (Brown et al. 1990) in recent years and is currently one of the techniques which can claim to be state-of-the-art in machine translation.
W08-0510 123 6:155 Phrase-based models are a development of the word based models as exemplified by the (Brown et al. 1990).
H05-1050 124 143:287 While they begin with a small translation lexicon, they are sufficiently robust to the choice of this initial seed (lexicon) that it suffices to construct a single seed by crude automatic means (Brown et al. , 1990; Melamed, 1997).
I08-2120 125 8:174 1 Introduction Parallel corpora consisting of text in parallel translation plays an important role in data-driven natural language processing technologies such as statistical machine translation (Brown et al., 1990) and cross-lingual information retrieval (Landauer and Littman, 1990; Oard, 1997).
P96-1009 126 139:301 We employ a fertility model (Brown et al, 1990) that indicates how likely each word is to map to multiple words or to a partial word in the SR output.
P96-1009 127 106:301 To achieve this, we adapted some techniques from statistical machine translation (such as Brown et al. , 1990) in order to model the errors that Sphinx-II makes in our domain.
C92-2080 128 165:169 1992 References [Brown 90] P.F.Brown et al. "A Statistical Approach to Machine Translation", Computational Linguistics, Vol.16, No.2, 1990 [Doi 92] S.Doi and K.Muraki "Robust Translation and Meaning Interpretation Mechanism based on Examples in Dictionary", Proc.
H05-1086 129 59:178 They used IBM Model 1 (Brown et al. , 1990), to rank documents according to their translation probability, given the query.
H05-1086 130 36:178 In the estimation step, the probability that a term in the sentence translates to a term in the query is estimated using the implementation of IBM Model 1 (Brown et al., 1990) in GIZA++ (Al-Onaizan et al., 1999) out-of-the-box without alteration.
P98-1110 131 16:158 relies on translation probabilities estimated from large bilingual corpora (Brown et al. , 1990)(Brown et al. , 1991).
P98-1110 132 64:158 2The WORD SPACE method is closely related to Latent Semantic Indexing (LSI)(Deerwester et al. , 1990), where document-by-word matrices are processed by SVD instead of word-by-word matrices.
J03-1005 133 585:672 The resulting bilingual data have been sentence-aligned using statistical methods (Brown et al. 1990).
J03-1005 134 89:672 (1990) and Brown et al.
P99-1068 135 7:186 (Brown et al., 1990)) typically rely on large quantities of bilingual text aligned at the document or sentence level, and a number of approaches in the burgeoning field of cross-language information retrieval exploit parallel corpora either in place of or in addition to mappings between languages based on information from bilingual dictionaries (Davis and Dunning, 1995; Landauer and Littman, 1990; Hull and Oard, 1997; Oard, 1997).
P96-1023 136 138:283 The node mapping function f for the entire tree thus has a different role from the alignment function in the IBM statistical translation model (Brown et al. 1990, 1993); the role of the latter includes the linear ordering of words in the target string.
P96-1023 137 278:283 We are not advocating an approach in which linguistic structure is ignored (as it is in the IBM translator described by Brown et al. 1990), but rather one in which the syntactic and semantic structure of a string is implicit in the way it is processed by an interpreter.
J93-1004 138 16:365 Introduction Researchers in both machine translation (e.g. , Brown et al. 1990) and bilingual lexicography (e.g. , Klavans and Tzoukermann 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary debates), which are available in multiple languages (such as French and English).
J93-1004 139 56:365 (from Brown et al. 1990) Columns: English, French, Prob(French|English). Rows: the, le, 0.610; the, la, 0.178; the, l', 0.083; the, les, 0.023; the, ce, 0.013; the, il, 0.012; the, de, 0.009; the, à, 0.007; the, que, 0.007. very well documented in the published literature; consequently, there has been a lot of unnecessary subsequent work at ISSCO and elsewhere.
J93-1004 140 1:365 A Program for Aligning Sentences in Bilingual Corpora William A. Gale* AT&T Bell Laboratories Kenneth W. Church* AT&T Bell Laboratories Researchers in both machine translation (e.g. , Brown et al. 1990) and bilingual lexicography (e.g. , Klavans and Tzoukermann 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary proceedings), which are available in multiple languages (such as French and English).
J93-1004 141 39:365 Aligning sentences is just a first step toward constructing a probabilistic dictionary (Table 3) for use in aligning words in machine translation (Brown et al. 1990), or for constructing a bilingual concordance (Table 4) for use in lexicography (Klavans and Tzoukermann 1990).
N06-1015 142 13:205 The standard approach to word alignment is to construct directional generative models (Brown et al. , 1990; Och and Ney, 2003), which produce a sentence in one language given the sentence in another language.
N06-1015 143 94:205 Generative alignment models like the HMM model (Vogel et al. , 1996) and IBM models 4 and above (Brown et al. , 1990; Och and Ney, 2003) directly model correlations between alignments of consecutive words (at least on one side).
W97-0407 144 118:197 It is more realistic that the one in (Castellanos et al. , 1994), but, unlike other corpora such as the Hansards (Brown et al. , 1990), it is not unrestricted.
W03-0414 145 12:242 (Brown et al. , 1990; Brown et al. , 1993)) are best known and studied.
J04-4002 146 27:482 Yet the modeling, training, and search methods have also improved since the field of statistical machine translation was pioneered by IBM in the late 1980s and early 1990s (Brown et al. 1990; Brown et al. 1993; Berger et al. 1994).
C04-1117 147 130:161 4 Related Work The rise of the empirical paradigm in the field of machine translation is, to a large degree, due to the wide-spread availability of parallel corpora (Brown et al. , 1990).
C94-2178 148 7:102 Motivation There have been quite a number of recent papers on parallel text: Brown et al (1990, 1991, 1993), Chen (1993), Church (1993), Church et al (1993), Dagan et al (1993), Gale and Church (1991, 1993), Isabelle (1992), Kay and Röscheisen (1993), Klavans and Tzoukermann (1990), Kupiec (1993), Matsumoto (1991), Ogden and Gonzales (1993), Shemtov (1993), Simard et al (1992), Warwick-Armstrong and Russell (1990), Wu (to appear).
W93-0301 149 38:185 2.1.1 Brown et al.'s Model In the context of their statistical machine translation project (Brown et al., 1990), Brown et al. estimate $Pr(f \mid e)$, the probability that f, a sentence in one language (say French), is the translation of e, a sentence in the other language (say English).
W93-0301 150 6:185 1 Introduction Aligning parallel texts has recently received considerable attention (Warwick et al. , 1990; Brown et al. , 1991a; Gale and Church, 1991b; Gale and Church, 1991a; Kay and Rosenschein, 1993; Simard et al. , 1992; Church, 1993; Kupiec, 1993; Matsumoto et al. , 1993).
W93-0301 151 7:185 These methods have been used in machine translation (Brown et al. , 1990; Sadler, 1989), terminology research and translation aids (Isabelle, 1992; Ogden and Gonzales, 1993), bilingual lexicography (Klavans and Tzoukermann, 1990), collocation studies (Smadja, 1992), word-sense disambiguation (Brown et al. , 1991b; Gale et al. , 1992) and information retrieval in a multilingual environment (Landauer and Littman, 1990).
P07-2026 152 30:101 3.2 Baseline System The posterior probability $Pr(e_1^I \mid f_1^J)$ is modeled directly using a log-linear combination of several models (Och and Ney, 2002): $Pr(e_1^I \mid f_1^J) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)\right)}{\sum_{I', {e'}_1^{I'}} \exp\left(\sum_{m=1}^{M} \lambda_m h_m({e'}_1^{I'}, f_1^J)\right)}$ (1) This approach is a generalization of the source-channel approach (Brown et al., 1990).
W99-0905 153 9:135 Due to the recent availability of large text corpora, various statistical approaches have been tried including using 1) parallel corpora (Brown et al. , 1990), (Brown et al. , 1991), (Brown, 1997), 2) non-parallel bilingual corpora tagged with topic area (Yamabana et al. , 1998) and 3) un-tagged mono-language corpora in the target language (Dagan and Itai, 1994), (Tanaka and Iwasaki, 1996), (Kikui, 1998).
W99-0905 154 121:135 , 1990), (Brown et al., 1991), (Brown, 1997), (Yamabana et al.
J05-4003 155 9:416 They provide indispensable training data for statistical machine translation (Brown et al. 1990; Och and Ney 2002) and have been found useful in research on automatic lexical acquisition (Gale and Church 1991; Melamed 1997), cross-language information retrieval (Davis and Dunning 1995; Oard 1997), and annotation projection (Diab and Resnik 2002; Yarowsky and Ngai 2001; Yarowsky, Ngai, and Wicentowski 2001).
J05-4003 156 106:416 Word alignments were first introduced in the context of statistical MT, where they are used to estimate the parameters of a translation model (Brown et al. 1990).
C00-2092 157 81:175 As in any statistical MT system, we wish to choose the target sentence $w_t$ so as to maximize $P(w_t \mid w_s)$ (Brown et al., 1990, p. 79).
C00-2092 158 27:175 2Links between tree nodes were introduced for TAG trees, in (Schieber and Schabes, 1990), and put to use for Machine Translation by Abeillé et al.
C00-2092 159 112:175 4.1 Evaluation In a manner similar to (Brown et al., 1990, p. 83), we assigned each of the resulting sentences a category according to the following criteria.
C02-1137 160 82:136 Therefore, title-word-document-word translation probability P(tw|dw) can be learned from the training corpus using statistical translation model (Brown et al. , 1990).
W06-3103 161 38:183 As a decision rule, we obtain: $\hat{e}_1^I = \operatorname{argmax}_{I, e_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J) \right\}$ (3) This approach is a generalization of the source-channel approach (Brown et al., 1990).
J03-3002 162 7:541 They represent resources for automatic lexical acquisition (e.g. , Gale and Church 1991; Melamed 1997), they provide indispensable training data for statistical translation models (e.g. , Brown et al. 1990; Melamed 2000; Och and Ney 2002), and they can provide the connection between vocabularies in cross-language information retrieval (e.g. , Davis and Dunning 1995; Landauer and Littman 1990; see also Oard 1997).
C00-2169 163 69:160 For further processing steps we have to introduce the concept of alignment (Brown et al. , 1990).
W06-1008 164 12:225 A compilation of parallel texts offered in a serviceable form is called a parallel corpus. Parallel corpora are very valuable resources in various fields of multilingual natural language processing such as statistical machine translation (Brown et al., 1990), cross-lingual IR (Chen and Nie, 2000), and construction of dictionary (Nagao, 1996).
Copyright © Univ. of Mich. and the CLAIR Group at the Univ. of Mich.
All information provided herein should be considered tentative and still under construction. Further analysis and correction is still being performed. Please remember that all statistics contained herein are the results of independent research and should not be considered a statement of fact regarding any of the papers, authors, or other entities they refer to.