Paper: Statistical Phrase-Based Translation


Basic Info:

id: N03-1017
title: Statistical Phrase-Based Translation
authors: Koehn, Philipp (University of Southern California, Marina del Rey CA), Och, Franz Josef (University of Southern California, Marina del Rey CA), Marcu, Daniel (University of Southern California, Marina del Rey CA)
venue: HLT-NAACL
year: 2003


Abstract


We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to understand better and explain why phrase-based models outperform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models does not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.
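The "heuristic learning of phrase translations from word-based alignments" mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual consistency criterion (a phrase pair is kept only if no word inside it is aligned to a word outside it) and deliberately omits extension over unaligned boundary words. The function name `extract_phrase_pairs`, its parameters, and the toy sentence pair are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index) links


def extract_phrase_pairs(
    src: List[str], tgt: List[str], alignment: Alignment, max_len: int = 3
) -> Set[Tuple[str, str]]:
    """Collect phrase pairs that are consistent with a word alignment.

    Simplified sketch: only the minimal target span for each source span
    is considered, and unaligned boundary words are not added.
    """
    pairs: Set[Tuple[str, str]] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency check: a target word inside the span must not
            # be linked to a source word outside the source span.
            violates = any(
                t_start <= t <= t_end and not (s_start <= s <= s_end)
                for (s, t) in alignment
            )
            if violates:
                continue
            pairs.add(
                (" ".join(src[s_start:s_end + 1]), " ".join(tgt[t_start:t_end + 1]))
            )
    return pairs


if __name__ == "__main__":
    # Toy example with a reordered adjective-noun pair (assumed data).
    print(sorted(extract_phrase_pairs(["maison", "bleue"], ["blue", "house"], {(0, 1), (1, 0)})))
```

For the toy input, the sketch yields the two single-word pairs and the two-word pair "maison bleue" / "blue house". The `max_len` cutoff mirrors the abstract's observation that phrases longer than three words add little, but the value is only a default chosen for illustration.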




Incoming Citations
ID Title
P03-1040 Feature-Rich Statistical Translation Of Noun Phrases
W03-1001 A Projection Extension Algorithm For Statistical Machine Translation
C04-1030 Reordering Constraints For Phrase-Based Statistical Machine Translation
C04-1090 A Path-Based Transfer Model For Machine Translation
J04-4002 The Alignment Template Approach To Statistical Machine Translation
N04-1033 Improvements In Phrase-Based Statistical Machine Translation
N04-1035 What's In A Translation Rule?
N04-4026 A Unigram Orientation Model For Statistical Machine Translation
P04-1023 Statistical Machine Translation With Word- And Sentence-Aligned Parallel Corpora
P04-1060 Experiments In Parallel-Text Based Grammar Induction
P04-1064 Aligning Words Using Matrix Factorisation
P04-1066 Improving IBM Word Alignment Model 1
W04-3219 Monolingual Machine Translation For Paraphrase Generation
W04-3227 Phrase Pair Rescoring With Term Weighting For Statistical Machine Translation
W04-3248 A New Approach For English-Chinese Named Entity Alignment
W04-3250 Statistical Significance Tests For Machine Translation Evaluation
H05-1009 NeurAlign: Combining Word Alignments Using Neural Networks
H05-1021 Local Phrase Reordering Models For Statistical Machine Translation
H05-1022 HMM Word And Phrase Alignment For Statistical Machine Translation
H05-1023 Inner-Outer Bracket Models For Word Alignment Using Hidden Blocks
H05-1024 Alignment Link Projection Using Transformation-Based Learning
H05-1096 Word-Level Confidence Estimation For Machine Translation Using Phrase-Based Translation Models
H05-1097 Word-Sense Disambiguation For Machine Translation
H05-1098 The Hiero Machine Translation System: Extensions Evaluation And Analysis
I05-1042 Empirical Study of Utilizing Morph-Syntactic Information in SMT
I05-1051 Phrase-Based Statistical Machine Translation: A Level of Detail Approach
I05-1053 A Phrase-Based Context-Dependent Joint Probability Model for Named Entity Translation
I05-3011 Learning a Log-Linear Model with Bilingual Phrase-Pair Features for Statistical Machine Translation
P05-1032 Scaling Phrase-Based Statistical Machine Translation To Larger Corpora And Longer Phrases
P05-1033 A Hierarchical Phrase-Based Model For Statistical Machine Translation
P05-1034 Dependency Treelet Translation: Syntactically Informed Phrasal SMT
P05-1066 Clause Restructuring For Statistical Machine Translation
P05-1068 Context-Dependent SMT Model Using Bilingual Verb-Noun Collocation
P05-1069 A Localized Prediction Model For Statistical Machine Translation
P05-1074 Paraphrasing With Bilingual Parallel Corpora
P05-2016 Dependency-Based Statistical Machine Translation
W05-0801 Association-Based Bilingual Word Alignment
W05-0820 Shared Task: Statistical Machine Translation Between European Languages
W05-0823 Statistical Machine Translation Of Euparl Data By Using Bilingual N-Grams
W05-0824 RALI: SMT Shared Task System Description
W05-0826 Combining Linguistic Data Views For Phrase-Based SMT
W05-0827 Improving Phrase-Based Statistical Translation By Modifying Phrase Extraction And Including Several Features
W05-0829 Competitive Grouping In Integrated Phrase Segmentation And Alignment Model
W05-0830 Deploying Part-Of-Speech Patterns To Enhance Statistical Phrase-Based Machine Translation Resources
W05-0833 Hybrid Example-Based SMT: The Best Of Both Worlds?
W05-0836 Training And Evaluating Error Minimization Decision Rules For Statistical Machine Translation
W05-0908 On Some Pitfalls In Automatic Evaluation And Significance Testing For MT
E06-2002 A Web-Based Demonstrator Of A Multi-Lingual Phrase-Based Translation System
J06-3001 Orthographic Errors in Web Pages: Toward Cleaner Web Corpora
J06-4004 N-gram-based Machine Translation
N06-1002 Do We Need Phrases? Challenging The Conventional Wisdom In Statistical Machine Translation
N06-1003 Improved Statistical Machine Translation Using Paraphrases
N06-1004 Segment Choice Models: Feature-Rich Models For Global Distortion In Statistical Machine Translation
N06-1013 A Maximum Entropy Approach To Combining Word Alignments
N06-1014 Alignment By Agreement
N06-1015 Word Alignment Via Quadratic Assignment
N06-1031 Relabeling Syntax Trees To Improve Syntax-Based Machine Translation Quality
N06-1032 Grammatical Machine Translation
P06-1002 Going Beyond AER: An Extensive Analysis Of Word Alignments And Their Impact On MT
P06-1009 Discriminative Word Alignment With Conditional Random Fields
P06-1066 Maximum Entropy Based Phrase Reordering Model For Statistical Machine Translation
P06-1067 Distortion Models For Statistical Machine Translation
P06-1077 Tree-To-String Alignment Template For Statistical Machine Translation
P06-1090 A Clustered Global Phrase Reordering Model For Statistical Machine Translation
P06-1091 A Discriminative Global Training Algorithm For Statistical MT
P06-1096 An End-To-End Discriminative Approach To Machine Translation
P06-1098 Left-To-Right Target Generation For Hierarchical Phrase-Based Translation
P06-1122 Modelling Lexical Redundancy For Machine Translation
P06-1123 Empirical Lower Bounds On The Complexity Of Translational Equivalence
P06-1139 Stochastic Language Generation Using WIDL-Expressions And Its Application In Machine Translation And Summarization
P06-1146 Optimal Constituent Alignment With Edge Covers For Semantic Projection
P06-2005 A Phrase-Based Statistical Model For SMS Text Normalization
P06-2101 Minimum Risk Annealing For Training Log-Linear Models
P06-2107 Statistical Phrase-Based Models For Interactive Computer-Assisted Translation
W06-1606 SPMT: Statistical Machine Translation With Syntactified Target Language Phrases
W06-1607 Phrasetable Smoothing For Statistical Machine Translation
W06-1608 The Impact Of Parse Quality On Syntactically-Informed Statistical Machine Translation
W06-1609 Statistical Machine Reordering
W06-1628 A Discriminative Model For Tree-To-Tree Translation
W06-3102 Initial Explorations In English To Turkish Statistical Machine Translation
W06-3105 Why Generative Phrase Models Underperform Surface Heuristics
W06-3106 Phrase-Based SMT With Shallow Tree-Phrases
W06-3109 Generalized Stack Decoding Algorithms For Statistical Machine Translation
W06-3112 Contextual Bitext-Derived Paraphrases In Automatic MT Evaluation
W06-3113 How Many Bits Are Needed To Store Probabilities For Phrase-Based Translation?
W06-3115 NTT System Description For The WMT2006 Shared Task
W06-3119 Syntax Augmented Machine Translation Via Chart Parsing
W06-3120 TALP Phrase-Based Statistical Translation System For European Language Pairs
W06-3121 Phramer - An Open Source Statistical Phrase-Based Translator
W06-3122 Language Models And Reranking For Machine Translation
W06-3123 Constraining The Phrase-Based Joint Probability Statistical Translation Model
W06-3125 N-Gram-Based SMT System Enhanced With Reordering Patterns
W06-3601 A Syntax-Directed Translator With Extended Domain Of Locality
D07-1006 Getting the Structure Right for Word Alignment: LEAF
D07-1007 Improving Statistical Machine Translation Using Word Sense Disambiguation
D07-1008 Large Margin Synchronous Generation and its Application to Sentence Compression
D07-1030 Using RBMT Systems to Produce Bilingual Corpus for SMT
D07-1036 Improving Statistical Machine Translation Performance by Training Data Selection and Optimization
D07-1056 Phrase Reordering Model Integrating Syntactic Knowledge for SMT
D07-1079 What Can Syntax-Based MT Learn from Phrase-Based MT?
D07-1080 Online Large-Margin Training for Statistical Machine Translation
D07-1103 Improving Translation Quality by Discarding Most of the Phrasetable
D07-1104 Hierarchical Phrase-Based Translation with Suffix Arrays
D07-1105 An Empirical Study on Computing Consensus Translations from Multiple Machine Translation Systems
N07-1007 Generating Case Markers in Machine Translation
N07-1022 Generation by Inverting a Semantic Parser that Uses Statistical Machine Translation
N07-1061 A Comparison of Pivot Methods for Phrase-Based Statistical Machine Translation
N07-1062 Efficient Phrase-Table Representation for Machine Translation with Applications to Online MT and Speech Translation
N07-1063 An Efficient Two-Pass Approach to Synchronous-CFG Driven Statistical MT
N07-1064 Statistical Phrase-Based Post-Editing
N07-2007 Combination of Statistical Word Alignments Based on Multiple Preprocessing Schemes
N07-2008 A Fast Method for Parallel Document Identification
N07-2009 Generalized Graphical Abstractions for Statistical Machine Translation
N07-2015 Are Very Large N-Best Lists Useful for SMT?
N07-2053 Selective Phrase Pair Extraction for Improved Statistical Machine Translation
P07-1001 Guiding Statistical Word Alignment Models With Prior Knowledge
P07-1005 Word Sense Disambiguation Improves Statistical Machine Translation
P07-1039 Bootstrapping Word Alignment via Word Packing
P07-1059 Statistical Machine Translation for Query Expansion in Answer Retrieval
P07-1083 Alignment-Based Discriminative String Similarity
P07-1089 Forest-to-String Statistical Translation Rules
P07-1090 Ordering Phrases with Function Words
P07-1091 A Probabilistic Approach to Syntax-based Reordering for Statistical Machine Translation
P07-1092 Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora
P07-1108 Pivot Language Approach for Phrase-Based Statistical Machine Translation
P07-1119 Substring-Based Transliteration
P07-2045 Moses: Open Source Toolkit for Statistical Machine Translation
P07-2046 Boosting Statistical Machine Translation by Lemmatization and Linear Interpolation
W07-0403 Inversion Transduction Grammar for Joint Phrasal Translation Modeling
W07-0406 Machine Translation as Tree Labeling
W07-0409 Combining Morphosyntactic Enriched Representation with n-best Reranking in Statistical Translation
W07-0412 Probabilistic Synchronous Tree-Adjoining Grammars for Machine Translation: The Argument from Bilingual Dictionaries
W07-0701 Using Dependency Order Templates to Improve Generality in Translation
W07-0702 CCG Supertags in Factored Statistical Machine Translation
W07-0703 Integration of an Arabic Transliteration Module into a Statistical Machine Translation System
W07-0704 Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation
W07-0709 Meta-Structure Transformation Model for Statistical Machine Translation
W07-0711 Using Word-Dependent Transition Models in HMM-Based Word Alignment for Statistical Machine Translation
W07-0715 An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
W07-0716 Using Paraphrases for Parameter Tuning in Statistical Machine Translation
W07-0717 Mixture-Model Adaptation for SMT
W07-0718 (Meta-) Evaluation of Machine Translation
W07-0719 Context-aware Discriminative Phrase Selection for Statistical Machine Translation
W07-0721 Analysis of Statistical and Morphological Classes to Generate Weigthed Reordering Hypotheses on a Statistical Machine Translation System
W07-0724 NRC's PORTAGE System for WMT 2007
W07-0725 Building a Statistical Machine Translation System for French Using the Europarl Corpus
W07-0731 The Syntax Augmented MT (SAMT) System at the Shared Task for the 2007 ACL Workshop on Statistical Machine Translation
W07-1512 Computing Translation Units and Quantifying Parallelism in Parallel Dependency Treebanks
C08-1005 Improving Alignments for Better Confusion Networks for Combining Machine Translation Systems
C08-1007 Enhancing Multilingual Latent Semantic Analysis with Term Alignment Information
C08-1014 Regenerating Hypotheses for Statistical Machine Translation
C08-1017 Latent Morpho-Semantic Analysis: Multilingual Information Retrieval with Character N-Grams and Mutual Information
C08-1027 Syntactic Reordering Integrated with Phrase-Based SMT
C08-1041 Improving Statistical Machine Translation using Lexicalized Rule Selection
C08-1064 Tera-Scale Translation Models via Pattern Matching
C08-1074 Random Restarts in Minimum Error Rate Training for Statistical Machine Translation
C08-1127 Linguistically Annotated BTG for Statistical Machine Translation
C08-1137 Sentence Type Based Reordering Model for Statistical Machine Translation
C08-1138 Grammar Comparison Study for Translational Equivalence Modeling and Statistical Machine Translation
C08-1144 A Systematic Comparison of Phrase-Based Hierarchical and Syntax-Augmented Statistical MT
C08-2005 Phrasal Segmentation Models for Statistical Machine Translation
C08-2032 Building a Bilingual Lexicon Using Phrase-based Statistical Machine Translation via a Pivot Language
D08-1010 Maximum Entropy based Rule Selection Model for Syntax-based Statistical Machine Translation
D08-1011 Indirect-HMM-based Hypothesis Alignment for Combining Outputs from Machine Translation Systems
D08-1021 Syntactic Constraints on Paraphrases Extracted from Parallel Corpora
D08-1024 Online Large-Margin Training of Syntactic and Structural Translation Features
D08-1051 Improving Interactive Machine Translation via Mouse Actions
D08-1059 A Tale of Two Parsers: Investigating and Combining Graph-based and Transition-based Dependency Parsing
D08-1066 Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective
D08-1078 Predicting Success in Machine Translation
D08-1089 A Simple and Effective Hierarchical Phrase Reordering Model
I08-1033 Improving Word Alignment by Adjusting Chinese Word Segmentation
I08-1064 Projection-based Acquisition of a Temporal Labeller
I08-1067 Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation
I08-2087 A Structured Prediction Approach for Statistical Machine Translation
I08-2088 Method of Selecting Training Data to Build a Compact and Efficient Translation Model
I08-8001 Transformation-based Sentence Splitting method for Statistical Machine Translation
L08-1131 Improving Statistical Machine Translation Efficiency by Triangulation
L08-1185 Building a Golden Collection of Parallel Multi-Language Word Alignment
L08-1485 MISTRAL: a Statistical Machine Translation Decoder for Speech Recognition Lattices
L08-1566 Evaluation of a Machine Translation System for Low Resource Languages: METIS-II
L08-1567 Using Reordering in Statistical Machine Translation based on Alignment Block Classification
P08-1009 Cohesive Phrase-Based Decoding for Statistical Machine Translation
P08-1010 Phrase Table Training for Precision and Recall: What Makes a Good Phrase and a Good Phrase Pair?
P08-1011 Measure Word Generation for English-Chinese SMT Systems
P08-1023 Forest-Based Translation
P08-1024 A Discriminative Latent Variable Model for Statistical Machine Translation
P08-1049 Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora
P08-1064 A Tree Sequence Alignment-based Tree-to-Tree Translation Model
P08-1089 Pivot Approach for Extracting Paraphrase Patterns from Bilingual Corpora
P08-1113 Mining Parenthetical Translations from the Web by Word Alignment
P08-1114 Soft Syntactic Constraints for Hierarchical Phrased-Based Translation
P08-1115 Generalizing Word Lattice Translation
P08-1116 Combining Multiple Resources to Improve SMT-based Paraphrasing Model
P08-2007 The Complexity of Phrase Alignment Problems
P08-2041 Partial Matching Strategy for Phrase-based Statistical Machine Translation
W08-0301 An Empirical Study in Source Word Deletion for Phrase-Based Statistical Machine Translation
W08-0302 Rich Source-Side Context for Statistical Machine Translation
W08-0303 Discriminative Word Alignment via Alignment Matrix Modeling
W08-0304 Regularization and Search for Minimum Error Rate Training
W08-0305 Learning Performance of a Machine Translation System: a Statistical and Computational Analysis
W08-0306 Using Syntax to Improve Word Alignment Precision for Syntax-Based Machine Translation
W08-0309 Further Meta-Evaluation of Machine Translation
W08-0310 Limsi’s Statistical Translation Systems for WMT’08
W08-0313 First Steps towards a General Purpose French/English Statistical Machine Translation System
W08-0314 The University of Washington Machine Translation System for ACL WMT 2008
W08-0316 European Language Translation with Weighted Finite State Transducers: The CUED MT System for the 2008 ACL Workshop on SMT
W08-0318 Towards better Machine Translation Quality for the German-English Language Pairs
W08-0322 Kernel Regression Framework for Machine Translation: UCL System Description for WMT 2008 Shared Translation Task
W08-0326 MaTrEx: The DCU MT System for WMT 2008
W08-0333 Fast Easy and Cheap: Construction of Statistical Machine Translation Models with MapReduce
W08-0334 Dynamic Model Interpolation for Statistical Machine Translation
W08-0335 Improved Statistical Machine Translation by Multiple Chinese Word Segmentation
W08-0336 Optimizing Chinese Word Segmentation for Machine Translation Performance
W08-0403 Prior Derivation Models For Formally Syntax-Based Translation Using Linguistically Syntactic Parsing and Tree Kernels
W08-0404 Generalizing Local Translation Models
W08-0405 A Rule-Driven Dynamic Programming Decoder for Statistical MT
W08-0406 Syntactic Reordering Integrated with Phrase-Based SMT
W08-0408 Multiple Reorderings in Phrase-Based Machine Translation
W08-0409 Improving Word Alignment Using Syntactic Dependencies
W08-0411 Syntax-Driven Learning of Sub-Sentential Translation Equivalents and Translation Rules from Parsed Parallel Corpora




Top Similar Papers
By Title
ID Title
P03-1039 Chunk-Based Statistical Translation
W07-0728 Rule-Based Translation with Statistical Phrase-Based Post-Editing
N07-1064 Statistical Phrase-Based Post-Editing
P05-2016 Dependency-Based Statistical Machine Translation
P01-1067 A Syntax-Based Statistical Translation Model
N04-1033 Improvements In Phrase-Based Statistical Machine Translation
P06-2107 Statistical Phrase-Based Models For Interactive Computer-Assisted Translation
P07-1089 Forest-to-String Statistical Translation Rules
C88-1016 A Statistical Approach To Language Translation
J90-2002 A Statistical Approach To Machine Translation


By Abstract
ID Title
W02-1018 A Phrase-Based Joint Probability Model For Statistical Machine Translation
H05-1022 HMM Word And Phrase Alignment For Statistical Machine Translation
D07-1079 What Can Syntax-Based MT Learn from Phrase-Based MT?
H05-1011 A Discriminative Framework For Bilingual Word Alignment
H90-1057 Representation Quality In Text Classification: An Introduction And Experiment
H05-1095 Translating With Non-Contiguous Phrases
W99-0604 Improved Alignment Models For Statistical Machine Translation
W07-0733 Experiments in Domain Adaptation for Statistical Machine Translation
N07-2022 Discriminative Alignment Training without Annotated Data for Machine Translation
P98-2221 Modeling With Structures In Statistical Machine Translation


By Full Text
ID Title
N03-1019 A Weighted Finite State Transducer Implementation Of The Alignment Template Model For Statistical Machine Translation
P01-1027 Refined Lexicon Models For Statistical Machine Translation Using A Maximum Entropy Approach
P06-1090 A Clustered Global Phrase Reordering Model For Statistical Machine Translation
P06-1096 An End-To-End Discriminative Approach To Machine Translation
P01-1030 Fast Decoding And Optimal Decoding For Machine Translation
P05-1069 A Localized Prediction Model For Statistical Machine Translation
W06-1008 A Fast And Accurate Method For Detecting English-Japanese Parallel Texts
P06-1091 A Discriminative Global Training Algorithm For Statistical MT
P06-1139 Stochastic Language Generation Using WIDL-Expressions And Its Application In Machine Translation And Summarization
W01-1404 Approximating Context-Free By Rational Transduction For Example-Based MT


By Co-citation
ID Title Num Co-citations
P03-1021 Minimum Error Rate Training In Statistical Machine Translation 111
J93-2003 The Mathematics Of Statistical Machine Translation: Parameter Estimation 97
J03-1002 A Systematic Comparison Of Various Statistical Alignment Models 88
P02-1040 Bleu: A Method For Automatic Evaluation Of Machine Translation 87
P05-1033 A Hierarchical Phrase-Based Model For Statistical Machine Translation 59
P02-1038 Discriminative Training And Maximum Entropy Models For Statistical Machine Translation 51
J04-4002 The Alignment Template Approach To Statistical Machine Translation 51
P00-1056 Improved Statistical Alignment Models 48
J97-3002 Stochastic Inversion Transduction Grammars And Bilingual Parsing Of Parallel Corpora 41
W02-1018 A Phrase-Based Joint Probability Model For Statistical Machine Translation 41


Citation Summary
Citing sentences
I08-1033 1 23:155 However for remedy, many of the current word alignment methods combine the results of both alignment directions, via intersection or grow-diag-final heuristic, to improve the alignment reliability (Koehn et al., 2003; Liang et al., 2006; Ayan et al., 2006; DeNero et al., 2007).
I08-1033 2 134:155 The phrase-based machine translation (Koehn et al., 2003) uses the grow-diag-final heuristic to extend the word alignment to phrase alignment by using the intersection result.
W06-1607 3 16:162 Traditionally, maximum-likelihood estimation from relative frequencies is used to obtain conditional probabilities (Koehn et al., 2003), e.g., p(s|t) = c(s,t) / \sum_s c(s,t) (since the estimation problems for p(s|t) and p(t|s) are symmetrical, we will usually refer only to p(s|t) for brevity).
W06-1607 4 33:162 The features used in this study are: the length of t; a single-parameter distortion penalty on phrase reordering in a, as described in (Koehn et al., 2003); phrase translation model probabilities; and trigram language model probabilities log p(t), using Kneser-Ney smoothing as implemented in the SRILM toolkit (Stolcke, 2002).
W06-1607 5 36:162 To derive the joint counts c(s,t) from which p(s|t) and p(t|s) are estimated, we use the phrase induction algorithm described in (Koehn et al., 2003), with symmetrized word alignments generated using IBM model 2 (Brown et al., 1993).
W06-1607 6 59:162 This is the traditional approach for glass-box smoothing (Koehn et al., 2003; Zens and Ney, 2004).
P07-1001 7 121:185 Our decoder is a phrase-based multi-stack implementation of the log-linear model similar to Pharaoh (Koehn et al., 2003).
P06-1090 8 26:135 (Koehn et al., 2003) used the following distortion model, which simply penalizes nonmonotonic phrase alignments based on the word distance of successively translated source phrases with an appropriate value for the parameter $\alpha$, $d(a_i - b_{i-1}) = \alpha^{|a_i - b_{i-1} - 1|}$ (3) [Figure 1: Phrase alignment and reordering] [Figure 2: Four types of reordering patterns (d=MA, d=MG, d=RA, d=RG)] 3 The Global Phrase Reordering Model Figure 1 shows an example of Japanese-English phrase alignment that consists of four phrase pairs.
P06-1090 9 24:135 The translation model used in (Koehn et al., 2003) is the product of translation probability $\phi(\bar{f}_i \mid \bar{e}_i)$ and distortion probability $d(a_i - b_{i-1})$, $p(\bar{f}_1^I \mid \bar{e}_1^I) = \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\, d(a_i - b_{i-1})$ (1) where $a_i$ denotes the start position of the source phrase translated into the $i$-th target phrase, and $b_{i-1}$ denotes the end position of the source phrase translated into the $(i-1)$-th target phrase.
P06-1090 10 9:135 Standard phrase-based translation systems use a word distance-based reordering model in which non-monotonic phrase alignment is penalized based on the word distance between successively translated source phrases without considering the orientation of the phrase alignment or the identities of the source and target phrases (Koehn et al., 2003; Och and Ney, 2004).
P06-1090 11 69:135 For comparison, we also implemented a different N-best phrase alignment method, where phrase pairs are extracted using the standard phrase extraction method described in (Koehn et al., 2003). [Figure 4: N-best phrase alignments]
P06-1090 12 117:135 Here, ppicker shows the accuracy when phrases are extracted by using the N-best phrase alignment method described in Section 4.1, while grow-diag-final shows the accuracy when phrases are extracted using the standard phrase extraction algorithm described in (Koehn et al., 2003).
P07-1039 13 103:170 4.3 Baseline We use a standard log-linear phrase-based statistical machine translation system as a baseline: GIZA++ implementation of IBM word alignment model 4 (Brown et al., 1993; Och and Ney, 2003), the refinement and phrase-extraction heuristics described in (Koehn et al., 2003), minimum-error-rate training [Footnote 7: More specifically, we choose the first English reference from the 7 references and the Chinese sentence to construct new sentence pairs.]
P07-1039 14 26:170 To quickly (and approximately) evaluate this phenomenon, we trained the statistical IBM word-alignment model 4 (Brown et al., 1993), using the GIZA++ software (Och and Ney, 2003) for the following language pairs: Chinese-English, Italian-English, and Dutch-English, using the IWSLT-2006 corpus (Takezawa et al., 2002; Paul, 2006) for the first two language pairs, and the Europarl corpus (Koehn, 2005) for the last one.
P07-1039 15 109:170 [Table 2: Chinese-English corpus statistics. Running words: 1,864 / 14,437; vocabulary size: 569 / 1,081] (Och, 2003) using Phramer (Olteanu et al., 2006), a 3-gram language model with Kneser-Ney smoothing trained with SRILM (Stolcke, 2002) on the English side of the training data and Pharaoh (Koehn, 2004) with default settings to decode.
P07-1083 16 78:185 A similar use of the term phrase exists in machine translation, where phrases are often pairs of word sequences consistent with word-based alignments (Koehn et al., 2003).
N07-2008 17 12:110 They have been employed in word sense disambiguation (Diab and Resnik, 2002), automatic construction of bilingual dictionaries (McEwan et al., 2002), and inducing statistical machine translation models (Koehn et al., 2003).
W07-1512 18 44:121 However, many of these models are not applicable to parallel treebanks because they assume translation units where either the source text, the target text or both are represented as word sequences without any syntactic structure (Galley et al., 2004; Marcu et al., 2006; Koehn et al., 2003).
D07-1103 19 23:214 These joint counts are estimated using the phrase induction algorithm described in (Koehn et al., 2003), with symmetrized word alignments generated using IBM model 2 (Brown et al., 1993).
D07-1103 20 28:214 The features used are: the length of t; a single-parameter distortion penalty on phrase reordering in a, as described in (Koehn et al., 2003); phrase translation model probabilities; and 4-gram language model probabilities log p(t), using Kneser-Ney smoothing as implemented in the SRILM toolkit (Stolcke, 2002).
N04-4026 21 6:109 1 Introduction In recent years, phrase-based systems for statistical machine translation (Och et al., 1999; Koehn et al., 2003; Venugopal et al., 2003) have delivered state-of-the-art performance on standard translation tasks.
W07-0716 22 6:171 1 Introduction Viewed at a very high level, statistical machine translation involves four phases: language and translation model training, parameter tuning, decoding, and evaluation (Lopez, 2007; Koehn et al., 2003).
W07-0716 23 36:171 Initial phrase pairs are identified following the procedure typically employed in phrase based systems (Koehn et al., 2003; Och and Ney, 2004).
W07-0716 24 71:171 We use the following features in our induced English-to-English grammar: [Footnote 3: Hiero also uses lexical weights (Koehn et al., 2003) in both] The joint probability of the two English hierarchical paraphrases, conditioned on the nonterminal symbol, as defined by this formula: $p(e_1, e_2 \mid x) = \frac{c(X \rightarrow \langle e_1, e_2 \rangle)}{\sum_{e_1', e_2'} c(X \rightarrow \langle e_1', e_2' \rangle)}$
P07-1091 25 72:196 The implementation is similar to the idea of lexical weight in (Koehn et al., 2003): all points in the alignment matrices of the entire training corpus are collected to calculate the probabilistic distribution, P(t|s), of some TL word [Footnote 3: Some readers may prefer the expression "the subtree rooted at node N" to "node N". The latter term is used in this paper for simplicity.]
P07-1091 26 153:196 The translation table is obtained as described in (Koehn et al., 2003), i.e. the alignment tool GIZA++ is run over the training data in both translation directions, and the two align- [Table 2: Experiment Baseline (Test Setting: BLEU). B1 standard phrase-based SMT: 29.22; B2 (B1) + clause splitting: 29.13] [Table 3: Tests on Various Reordering Models (Test Setting: BLEU 2-ary / BLEU 2,3-ary). 1 rule: 29.77 / 30.31; 2 ME (phrase label): 29.93 / 30.49; 3 ME (left,right): 30.10 / 30.53; 4 ME ((3)+head): 30.24 / 30.71; 5 ME ((3)+phrase label): 30.12 / 30.30; 6 ME ((4)+context): 30.24 / 30.76] The 3rd column comprises the BLEU scores obtained by reordering binary nodes only, the 4th column the scores by reordering both binary and 3-ary nodes.
P07-1091 27 8:196 For example, the distancebased reordering model (Koehn et al. , 2003) allows a decoder to translate in non-monotonous order, under the constraint that the distance between two phrases translated consecutively does not exceed a limit known as distortion limit.
D07-1056 28 26:196 There have been considerable amount of efforts to improve the reordering model in SMT systems, ranging from the fundamental distance-based distortion model (Och and Ney, 2004; Koehn et al. , 2003), flat reordering model (Wu, 1996; Zens et al. , 2004; Kumar et al. , 2005), to lexicalized reordering model (Tillmann, 2004; Kumar et al. , 2005; Koehn et al. , 2005), hierarchical phrase-based model (Chiang, 2005), and maximum entropy-based phrase reordering model (Xiong et al. , 2006).
D07-1056 29 12:196 In phrase-based SMT systems (Koehn et al. , 2003; Koehn, 2004), foreign sentences are firstly segmented into phrases which consists of adjacent words.
W07-0704 30 71:182 We employ the phrase-based SMT framework (Koehn et al. , 2003), and use the Moses toolkit (Koehn et al. , 2007), and the SRILM language modelling toolkit (Stolcke, 2002), and evaluate our decoded translations using the BLEU measure (Papineni et al. , 2002), using a single reference translation.
W05-0820 31 61:91 (2004)), better language-specific preprocessing (Koehn and Knight, 2003) and restructuring (Collins et al. , 2005), additional feature functions such as word class language models, and minimum error rate training (Och, 2003) to optimize parameters.
W06-3123 32 107:111 For the future, the joint model would benefit from lexical weighting like that used in the standard model (Koehn et al. , 2003).
W06-3123 33 88:111 On smaller data sets (Koehn et al., 2003) the joint model shows performance comparable to the standard model, however the joint model does not reach the level of performance of the stan [table residue, columns EN-ES and ES-EN: Joint 3-gram, dl4: 20.51, 26.64; 5-gram, dl6: 26.34, 27.17; + lex.]
W06-3123 34 23:111 2 Translation Models 2.1 Standard Phrase-based Model Most phrase-based translation models (Och, 2003; Koehn et al., 2003; Vogel et al., 2003) rely on a pre-existing set of word-based alignments from which they induce their parameters.
N06-1003 35 19:146 2 The Problem of Coverage in SMT Statistical machine translation made considerable advances in translation quality with the introduction of phrase-based translation (Marcu and Wong, 2002; Koehn et al. , 2003; Och and Ney, 2004).
W06-3119 36 46:125 Baseline: Pharaoh with phrases extracted from IBM Model 4 training with maximum phrase length 7 and extraction method diag-growth-final (Koehn et al., 2003a). Lex: Phrase-decoder simulation: using only the initial lexical rules from the phrase table, all with LHS X, the Glue rule, and a binary reordering rule with its own reordering-feature. XCat: All nonterminals merged into a single X nonterminal: simulation of the system Hiero (Chiang, 2005).
W06-3119 37 9:125 The hierarchical translation operations introduced in these methods call for extensions to the traditional beam decoder (Koehn et al. , 2003a).
W06-3119 38 18:125 2 Rule Generation We start with phrase translations on the parallel training data using the techniques and implementation described in (Koehn et al., 2003a).
W06-3119 39 33:125 We use the following features for our rules: source- and target-conditioned neg-log lexical weights as described in (Koehn et al., 2003b); neg-log relative frequencies: left-hand-side-conditioned, target-phrase-conditioned, source-phrase-conditioned; Counters: n.o. rule applications, n.o. target words; Flags: IsPurelyLexical (i.e., contains only terminals), IsPurelyAbstract (i.e., contains only nonterminals), IsXRule (i.e., non-syntactical span), IsGlueRule; Penalties: rareness penalty $\exp(1 - \text{RuleFrequency})$; unbalancedness penalty $|\text{MeanTargetSourceRatio} \cdot \text{n.o. source words} - \text{n.o. target words}|$. 4 Parsing Our SynCFG rules are equivalent to a probabilistic context-free grammar and decoding is therefore an application of chart parsing.
W06-3119 40 44:125 5 Results We present results that compare our system against the baseline Pharaoh implementation (Koehn et al. , 2003a) and MER training scripts provided for this workshop.
W06-3119 41 7:125 1 Introduction Recent work in machine translation has evolved from the traditional word (Brown et al. , 1993) and phrase based (Koehn et al. , 2003a) models to include hierarchical phrase models (Chiang, 2005) and bilingual synchronous grammars (Melamed, 2004).
W06-3102 42 65:125 Table 2: The set of tags used to mark explicit morphemes in English
Tag | Meaning
JJR | Adjective, comparative
JJS | Adjective, superlative
NNS | Noun, plural
POS | Possessive ending
RBR | Adverb, comparative
RBS | Adverb, superlative
VB | Verb, base form
VBD | Verb, past tense
VBG | Verb, gerund or present participle
VBN | Verb, past participle
VBP | Verb, non-3rd person singular present
VBZ | Verb, 3rd person singular present
Figure 2: Morpheme alignment between a Turkish and an English sentence
4 Experiments We proceeded with the following sequence of experiments: (1) Baseline: As a baseline system, we used a pure word-based approach and used Pharaoh Training tool (2004), to train on the 22,500 sentences, and decoded using Pharaoh (Koehn et al., 2003) to obtain translations for a test set of 50 sentences.
P07-1089 43 137:179 We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions using its default setting, and then applied the refinement rule diag-and described in (Koehn et al., 2003) to obtain a single many-to-many word alignment for each sentence pair.
P07-1089 44 152:179 We compared our system Lynx against a freely available phrase-based decoder Pharaoh (Koehn et al. , 2003).
P07-1119 45 30:211 The phrase-based approach developed for statistical machine translation (Koehn et al., 2003) is designed to overcome the restrictions on many-to-many mappings in word-based translation models.
P07-1119 46 75:211 Starting from a word-based alignment for each pair of sentences, the training for the algorithm accepts all contiguous bilingual phrase pairs (up to a predetermined maximum length) whose words are only aligned with each other (Koehn et al. , 2003).
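The extraction criterion quoted above (contiguous phrase pairs, bounded in length, whose words are aligned only with each other) can be illustrated with a short sketch. This is a minimal brute-force version under stated assumptions, not the implementation used by any of the cited systems: the function name `extract_phrase_pairs`, the 0-based inclusive span representation, and the default `max_len=3` are all hypothetical choices made for the example.

```python
def extract_phrase_pairs(src_len, tgt_len, alignment, max_len=3):
    """Enumerate phrase pairs consistent with a word alignment.

    alignment is a set of (src_index, tgt_index) links (0-based).
    A pair of spans is kept when it contains at least one link and no
    link connects a word inside one span to a word outside the other.
    Spans are returned as inclusive (start, end) index pairs.
    """
    pairs = []
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            for t1 in range(tgt_len):
                for t2 in range(t1, min(t1 + max_len, tgt_len)):
                    # Links that fall entirely inside the candidate pair.
                    inside = [(i, j) for (i, j) in alignment
                              if s1 <= i <= s2 and t1 <= j <= t2]
                    if not inside:
                        continue
                    # A link "crosses" if exactly one endpoint is inside.
                    crossing = any((s1 <= i <= s2) != (t1 <= j <= t2)
                                   for (i, j) in alignment)
                    if not crossing:
                        pairs.append(((s1, s2), (t1, t2)))
    return pairs


pairs = extract_phrase_pairs(2, 2, {(0, 0), (1, 1)})
```

For a two-word sentence pair aligned monotonically, the sketch keeps the two single-word pairs and the full two-word pair, and rejects every span pair that would cut through a link. The quadruple loop is only meant to make the consistency test explicit; it is far too slow for real corpora.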
W07-0719 47 32:244 2.1 Baseline System The baseline system is a phrase-based SMT system (Koehn et al., 2003), built almost entirely using freely available components.
W07-0719 48 9:244 1 Introduction Translations tables in Phrase-based Statistical Machine Translation (SMT) are often built on the basis of Maximum-likelihood Estimation (MLE), being one of the major limitations of this approach that the source sentence context in which phrases occur is completely ignored (Koehn et al. , 2003).
P08-1089 49 120:250 LW was originally used to validate the quality of a phrase translation pair in MT (Koehn et al., 2003).
H05-1024 50 126:187 We computed precision, recall and error rate on the entire set for each data set. For an initial alignment, we used GIZA++ in both directions (E-to-F and F-to-E, where F is either Chinese (C) or Spanish (S)), and also two different combined alignments: intersection of E-to-F and F-to-E; and RA using a heuristic combination approach called grow-diag-final (Koehn et al., 2003).
H05-1024 51 30:187 The standard method to overcome this problem is to use the model in both directions (interchanging the source and target languages) and apply heuristic-based combination techniques to produce a refined alignment (Och and Ney, 2000; Koehn et al., 2003), henceforth referred to as RA. Several researchers have proposed algorithms for improving word alignment systems by injecting additional knowledge or combining different alignment models.
H05-1024 52 59:187 For our experiments, we chose GIZA++ (Och and Ney, 2000) and the RA approach (Koehn et al., 2003), the best known alignment combination technique, as our initial aligners. 4.2 TBL Templates Our templates consider consecutive words (of size 1, 2 or 3) in both languages.
N07-1063 53 121:163 Grammar rules were induced with the syntax-based SMT system SAMT described in (Zollmann and Venugopal, 2006), which requires initial phrase alignments that we generated with GIZA++ (Koehn et al., 2003), and syntactic parse trees of the target training sentences, generated by the Stanford Parser (D. Klein, 2003) pre-trained on the Penn Treebank.
P07-1059 54 15:239 We present two approaches to SMT-based query expansion, both of which are implemented in the framework of phrase-based SMT (Och and Ney, 2004; Koehn et al. , 2003).
P07-1059 55 61:239 4 SMT-Based Query Expansion Our SMT-based query expansion techniques are based on a recent implementation of the phrase-based SMT framework (Koehn et al., 2003; Och and Ney, 2004).
D08-1021 56 47:209 They give a probabilistic formation of paraphrasing which naturally falls out of the fact that they use techniques from phrase-based statistical machine translation:
$\hat{e}_2 = \arg\max_{e_2 : e_2 \neq e_1} p(e_2 \mid e_1)$ (1)
where
$p(e_2 \mid e_1) = \sum_{f} p(f \mid e_1)\, p(e_2 \mid f, e_1)$ (2)
$\approx \sum_{f} p(f \mid e_1)\, p(e_2 \mid f)$ (3)
Phrase translation probabilities $p(f \mid e_1)$ and $p(e_2 \mid f)$ are commonly calculated using maximum likelihood estimation (Koehn et al., 2003):
$p(f \mid e) = \frac{count(e, f)}{\sum_{f} count(e, f)}$ (4)
where the counts are collected by enumerating all bilingual phrase pairs that are consistent with the
Figure 1: The interaction of the phrase extraction heuristic with unaligned English words means that the Spanish phrase la igualdad aligns with equal, create equal, and to create equal.
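The maximum likelihood estimate in equation (4), p(f|e) = count(e, f) / Σ_f count(e, f), is a relative frequency over extracted phrase pairs. The sketch below shows that computation on a toy list of pairs; the function name `phrase_translation_probs` and the sample data are assumptions made for illustration and do not come from any cited system.

```python
from collections import Counter


def phrase_translation_probs(phrase_pairs):
    """Relative-frequency estimate of p(f|e).

    phrase_pairs is an iterable of (e, f) tuples, one per extracted
    occurrence. Returns a dict mapping (f, e) to count(e, f) divided
    by the total count of pairs whose English side is e.
    """
    pair_counts = Counter(phrase_pairs)
    e_totals = Counter()
    for (e, _f), c in pair_counts.items():
        e_totals[e] += c
    return {(f, e): c / e_totals[e] for (e, f), c in pair_counts.items()}


# Toy data (hypothetical): three occurrences with "equal", one with "create".
probs = phrase_translation_probs([
    ("equal", "igualdad"),
    ("equal", "igualdad"),
    ("equal", "la igualdad"),
    ("create", "crear"),
])
```

Here `probs[("igualdad", "equal")]` is 2/3 and `probs[("la igualdad", "equal")]` is 1/3, so the estimates for a given English phrase sum to one. No smoothing is applied, which is one reason the cited systems pair this score with lexical weighting.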
W05-0830 57 62:80 The inclusion of phrases longer than three words in translation resources has been avoided, as it has been shown not to have a strong impact on translation performance [Koehn et al. , 2003].
W05-0830 58 4:80 Whereas language generation has benefited from syntax [Wu, 1997; Alshawi et al. , 2000], the performance of statistical phrase-based machine translation when relying solely on syntactic phrases has been reported to be poor [Koehn et al. , 2003].
P05-1068 59 10:175 Recently, various works have improved the quality of statistical machine translation systems by using phrase translation (Koehn et al. , 2003; Marcu et al. , 2002; Och et al. , 1999; Och and Ney, 2000; Zens et al. , 2004).
N07-2009 60 53:91 For comparison, we use the MT training program, GIZA++ (Och and Ney, 2003), the phrase-based decoder, Pharaoh (Koehn et al., 2003), and the word-based decoder, Rewrite (Germann, 2003).
W06-1608 61 6:168 It has been shown that phrasal machine translation systems are not affected by the quality of the input word alignments (Koehn et al. , 2003).
W06-1608 62 33:168 This dependency graph is partitioned into treelets; like (Koehn et al. , 2003), we assume a uniform probability distribution over all partitions.
W06-1608 63 119:168 In English-to-German, this result produces results very comparable to a phrasal SMT system (Koehn et al., 2003) trained on the same data.
W08-0409 64 108:167 ment and phrase-extraction heuristics described in (Koehn et al., 2003), minimum-error-rate training (Och, 2003), a trigram language model with Kneser-Ney smoothing trained with SRILM (Stolcke, 2002) on the English side of the training data, and Moses (Koehn et al., 2007) to decode.
W08-2119 65 112:223 We use the same features as (Koehn et al., 2003).
W08-2119 66 25:223 This system uses all features of conventional phrase-based SMT as in (Koehn et al., 2003).
W08-2119 67 204:223 Our method does not suppose a uniform distribution over all possible phrase segmentations as (Koehn et al., 2003) since each phrase tree has a probability.
P05-1069 68 107:243 3.4 Lexical Weighting The lexical weight $p(S \mid T)$ of the block $b = (S, T)$ is computed similarly to (Koehn et al., 2003), but the lexical translation probability $p(s \mid t)$ is derived from the block set itself rather than from a word alignment, resulting in a simplified training.
P05-1069 69 30:243 2 Block Orientation Bigrams This section describes a phrase-based model for SMT similar to the models presented in (Koehn et al. , 2003; Och et al. , 1999; Tillmann and Xia, 2003).
P05-1069 70 168:243 Two block sets are derived for each of the training sets using a phrase-pair selection algorithm similar to (Koehn et al. , 2003; Tillmann and Xia, 2003).
P05-1069 71 65:243 Lexical Weighting: (e) the lexical weight $p(S \mid T)$ of the block $b = (S, T)$ is computed similarly to (Koehn et al., 2003), details are given in Section 3.4.
W07-0703 72 71:186 To generate phrase pairs from a parallel corpus, we use the "diag-and" phrase induction algorithm described in (Koehn et al, 2003), with symmetrized word alignments generated using IBM model 2 (Brown et al, 1993).
W07-0703 73 67:186 Portage is a statistical phrase-based SMT system similar to Pharaoh (Koehn et al, 2003).
W06-3112 74 46:135 Word alignment and phrase extraction We used the GIZA++ word alignment software to produce initial word alignments for our miniature bilingual corpus consisting of the source French file and the English reference file, and the refined word alignment strategy of (Och and Ney, 2003; Koehn et al., 2003; Tiedemann, 2004) to obtain improved word and phrase alignments.
W03-1001 75 164:171 A word link extension algorithm similar to the one presented in this paper is given in (Koehn et al. , 2003).
C08-1014 76 6:197 1 Introduction State-of-the-art Statistical Machine Translation (SMT) systems usually adopt a two-pass search strategy (Och, 2003; Koehn, et al., 2003) as shown in Figure 1.
C08-1014 77 37:197 Our MT baseline system is based on Moses decoder (Koehn et al., 2007) with word alignment obtained from GIZA++ (Och et al., 2003).
W05-0829 78 5:80 1 Introduction In recent years, various phrase translation approaches (Marcu and Wong, 2002; Och et al. , 1999; Koehn et al. , 2003) have been shown to outperform word-to-word translation models (Brown et al. , 1993).
W08-0309 79 105:288 The phrases in the translations were located using standard phrase extraction techniques (Koehn et al., 2003).
P06-2107 80 71:185 The second one is heuristic and tries to use a wordaligned corpus (Zens et al. , 2002; Koehn et al. , 2003).
P07-1108 81 59:179 Thus, equation (3) can be rewritten as $\phi(\bar{f}_i \mid \bar{e}_i) = \sum_{\bar{p}_i} \phi(\bar{f}_i \mid \bar{p}_i)\, \phi(\bar{p}_i \mid \bar{e}_i)$ (4) 4.2 Lexical Weight Given a phrase pair $(\bar{f}, \bar{e})$ and a word alignment $a$ between the source word positions $i = 1, \ldots, n$ and the target word positions $j = 1, \ldots, m$, the lexical weight can be estimated according to the following method (Koehn et al., 2003).
P07-1108 82 8:179 1 Introduction For statistical machine translation (SMT), phrase-based methods (Koehn et al., 2003; Och and Ney, 2004) and syntax-based methods (Wu, 1997; Alshawi et al. 2000; Yamada and Knignt, 2001; Melamed, 2004; Chiang, 2005; Quick et al., 2005; Mellebeek et al., 2006) outperform word-based methods (Brown et al., 1993).
P07-1108 83 51:179 3 Phrase-Based SMT According to the translation model presented in (Koehn et al., 2003), given a source sentence $f$, the best target translation $e_{best}$ can be obtained according to the following model
$e_{best} = \arg\max_{e} p(e \mid f) = \arg\max_{e} p(f \mid e)\, p_{LM}(e)\, \omega^{length(e)}$ (1)
Where the translation model $p(f \mid e)$ can be decomposed into
$p(\bar{f}_1^I \mid \bar{e}_1^I) = \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\, d(a_i - b_{i-1})\, p_w(\bar{f}_i \mid \bar{e}_i, a)^{\lambda}$ (2)
Where $\phi(\bar{f}_i \mid \bar{e}_i)$ and $d(a_i - b_{i-1})$ denote phrase translation probability and distortion probability, respectively.
P07-1090 84 18:208 The basic phrase reordering model is a simple unlexicalized, context-insensitive distortion penalty model (Koehn et al. , 2003).
P07-1090 85 196:208 However, the pb features yield no noticeable improvement, unlike in the perfect lexical choice scenario; this is similar to the findings in (Koehn et al., 2003).
W08-0333 86 29:168 In this paper we present MapReduce implementations of training algorithms for two kinds of models commonly used in statistical MT today: a phrasebased translation model (Koehn et al., 2003) and word alignment models based on pairwise lexical translation trained using expectation maximization (Dempster et al., 1977).
W08-0333 87 85:168 4 Phrase-Based Translation In phrase-based translation, the translation process is modeled by splitting the source sentence into phrases (a contiguous string of words) and translating the phrases as a unit (Och et al., 1999; Koehn et al., 2003).
W07-0717 88 28:158 2 Phrase-based Statistical MT Our baseline is a standard phrase-based SMT system (Koehn et al. , 2003).
W07-0717 89 35:158 To derive the joint counts $c(\tilde{s}, \tilde{t})$ from which $p(\tilde{s} \mid \tilde{t})$ and $p(\tilde{t} \mid \tilde{s})$ are estimated, we use the phrase induction algorithm described in (Koehn et al., 2003), with symmetrized word alignments generated using IBM model 2 (Brown et al., 1993).
W07-0717 90 31:158 The features used in this study are: the length of t; a single-parameter distortion penalty on phrase reordering in a, as described in (Koehn et al. , 2003); phrase translation model probabilities; and 4-gram language model probabilities logp(t), using Kneser-Ney smoothing as implemented in the SRILM toolkit.
P08-1049 91 66:184 However, since most of statistical translation models (Koehn et al., 2003; Chiang, 2007; Galley et al., 2006) are symmetrical, it is relatively easy to train a translation system to translate from English to Chinese, except that we need to train a Chinese language model from the Chinese monolingual data.
P08-1049 92 15:184 While the research in statistical machine translation (SMT) has made significant progress, most SMT systems (Koehn et al., 2003; Chiang, 2007; Galley et al., 2006) rely on parallel corpora to extract translation entries.
N06-1002 93 160:217 As an additional baseline, we compare against a phrasal SMT decoder, Pharaoh (Koehn et al. 2003).
N06-1002 94 174:217 We used the heuristic combination described in (Och and Ney 2003) and extracted phrasal translation pairs from this combined alignment as described in (Koehn et al. , 2003).
P06-1091 95 67:210 Word-based features are used as well, e.g. feature a75 a11a39a99a78a99a18a11 captures word-to-word translation dependencies similar to the use of Model a98 probabilities in (Koehn et al., 2003). [Footnote 4: On our test set, (Tillmann and Zhang, 2005) reports a BLEU score of a100a63a101a63a102a43a103 and (Ittycheriah and Roukos, 2005) reports a BLEU score of a104a89a103a63a102 a105.]
P06-1091 96 139:210 The block set is generated using a phrase-pair selection algorithm similar to (Koehn et al. , 2003; Al-Onaizan et al. , 2004), which includes some heuristic filtering to mal statement here.
P04-1064 97 7:189 It is important because a wordaligned corpus is typically used as a first step in order to identify phrases or templates in phrase-based Machine Translation (Och et al. , 1999), (Tillmann and Xia, 2003), (Koehn et al. , 2003, sec.
W06-3122 98 15:91 It generates a vector of 5 numeric values for each phrase pair:
phrase translation probability: $\phi(\bar{f} \mid \bar{e}) = \frac{count(\bar{f}, \bar{e})}{count(\bar{e})}$, $\phi(\bar{e} \mid \bar{f}) = \frac{count(\bar{f}, \bar{e})}{count(\bar{f})}$
lexical weighting (Koehn et al., 2003): $lex(\bar{f} \mid \bar{e}, a) = \prod_{i=1}^{n} \frac{1}{|\{j \mid (i, j) \in a\}|} \sum_{(i, j) \in a} w(f_i \mid e_j)$, $lex(\bar{e} \mid \bar{f}, a) = \prod_{j=1}^{m} \frac{1}{|\{i \mid (i, j) \in a\}|} \sum_{(i, j) \in a} w(e_j \mid f_i)$
phrase penalty: $\phi(\bar{f} \mid \bar{e}) = e$; $\log(\phi(\bar{f} \mid \bar{e})) = 1$
2.2 Decoding We used the Pharaoh decoder for both the Minimum Error Rate Training (Och, 2003) and test dataset decoding.
Footnotes: 2 http://www.phramer.org/ Java-based open-source phrase based SMT system; 3 http://www.isi.edu/licensed-sw/carmel/; 4 http://www.speech.sri.com/projects/srilm/; 5 http://www.iccs.inf.ed.ac.uk/pkoehn/training.tgz
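The lexical weight lex(f|e, a) quoted above is a product over the words of the foreign phrase, where each factor averages the word translation probabilities w(f_i|e_j) over the English words that f_i is linked to. The following sketch computes it for one phrase pair. The function name, the dictionary-based table `w`, and the use of `None` as the NULL word for unaligned foreign words are assumptions made for this example; the formula as quoted does not say how unaligned words are treated.

```python
def lexical_weight(f_words, e_words, alignment, w):
    """Compute lex(f|e, a) for a single phrase pair.

    alignment is a set of (i, j) links between f_words[i] and e_words[j].
    w maps (f_word, e_word) to a word translation probability; a missing
    entry counts as 0.0. Unaligned foreign words are scored against a
    NULL word, represented here as None (an assumption of this sketch).
    """
    score = 1.0
    for i, f in enumerate(f_words):
        links = [j for (i2, j) in alignment if i2 == i]
        if links:
            # Average w(f_i | e_j) over all English words linked to f_i.
            score *= sum(w.get((f, e_words[j]), 0.0) for j in links) / len(links)
        else:
            score *= w.get((f, None), 0.0)
    return score


# Hypothetical word translation table.
w = {("la", "the"): 0.5, ("casa", "house"): 0.8}
lw = lexical_weight(["la", "casa"], ["the", "house"], {(0, 0), (1, 1)}, w)
```

With one link per word the weight is simply the product of the two word probabilities. The reverse direction lex(e|f, a) can be obtained from the same function by swapping the two phrases, flipping each link, and passing a table of w(e|f) values.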
D07-1008 99 213:382 Our corpora were automatically aligned with Giza++ (Och et al. , 1999) in both directions between source and target and symmetrised using the intersection heuristic (Koehn et al. , 2003).
W06-3121 100 25:69 We generated for each phrase pair in the translation table 5 features: phrase translation probability (both directions), lexical weighting (Koehn et al. , 2003) (both directions) and phrase penalty (constant value).
H05-1098 101 30:140 The basic model uses the following features, analogous to Pharaoh's default feature set: $P(\gamma \mid \alpha)$ and $P(\alpha \mid \gamma)$; the lexical weights $P_w(\gamma \mid \alpha)$ and $P_w(\alpha \mid \gamma)$ (Koehn et al., 2003); a phrase penalty exp(1); a word penalty exp(l), where l is the number of terminals in $\alpha$.
H05-1098 102 8:140 The need for some way to model aspects of syntactic behavior, such as the tendency of constituents to move together as a unit, is widely recognized; the role of syntactic units is well attested in recent systematic studies of translation (Fox, 2002; Hwa et al., 2002; Koehn and Knight, 2003), and their absence in phrase-based models is quite evident when looking at MT system output.
H05-1098 103 42:140 The feature weights are learned by maximizing the BLEU score (Papineni et al., 2002) on held-out data, using minimum-error-rate training (Och, 2003) as implemented by Koehn.
N04-1033 104 264:290 In (Koehn et al. , 2003), various aspects of phrase-based systems are compared, e.g. the phrase extraction method, the underlying word alignment model, or the maximum phrase length.
D07-1105 105 16:270 For instance, word alignment models are often trained using the GIZA++ toolkit (Och and Ney, 2003); error minimizing training criteria such as the Minimum Error Rate Training (Och, 2003) are employed in order to learn feature function weights for log-linear models; and translation candidates are produced using phrase-based decoders (Koehn et al. , 2003) in combination with n-gram language models (Brants et al. , 2007).
P07-2046 106 32:108 It is an extension of Pharaoh (Koehn et al. , 2003), and supports factor training and decoding.
P08-1064 107 41:210 However, most of them fail to utilize non-syntactic phrases well that are proven useful in the phrase-based methods (Koehn et al., 2003).
P08-1064 108 8:210 1 Introduction Phrase-based modeling method (Koehn et al., 2003; Och and Ney, 2004a) is a simple, but powerful mechanism to machine translation since it can model local reorderings and translations of multiword expressions well.
P06-1067 109 75:241 Similarly, (Koehn et al. , 2003) propose a relative distortion model to be used with a phrase decoder.
P06-1067 110 62:241 However, their decoder is outperformed by phrase-based decoders such as (Koehn, 2004), (Och et al. , 1999), and (Tillmann and Ney, 2003).
W08-0310 111 15:96 translation systems (Och and Ney, 2004; Koehn et al., 2003) and use Moses (Koehn et al., 2007) to search for the best target sentence.
I08-1064 112 144:193 Although bi-alignments are known to exhibit high precision (Koehn et al., 2003), in the face of sparse annotations we use unidirectional alignments as a fallback, as has been proposed in the context of phrase-based machine translation (Koehn et al., 2003; Tillmann, 2003).
W08-1501 113 80:131 To generate the n-best lists, a phrase based SMT (Koehn et al., 2003) was used.
N06-1032 114 5:167 1 Introduction Recent approaches to statistical machine translation (SMT) piggyback on the central concepts of phrase-based SMT (Och et al., 1999; Koehn et al., 2003) and at the same time attempt to improve some of its shortcomings by incorporating syntactic knowledge in the translation process.
H05-1023 115 8:217 1 Introduction Today's statistical machine translation systems rely on high quality phrase translation pairs to acquire state-of-the-art performance, see (Koehn et al., 2003; Zens and Ney, 2004; Och and Ney, 2003).
W07-0403 116 199:234 Two are conditionalized phrasal models, each EM trained until performance degrades: C-JPTM3 as described in (Birch et al., 2006); Phrasal ITG as described in Section 4.1. Three provide alignments for the surface heuristic: GIZA++ with grow-diag-final (GDF); Viterbi Phrasal ITG with and without the noncompositional constraint. We use the Pharaoh decoder (Koehn et al., 2003) with the SMT Shared Task baseline system (Koehn and Monz, 2006).
W07-0403 117 21:234 It extracts all consistent phrase pairs from word-aligned bitext (Koehn et al. , 2003).
W07-0403 118 133:234 4.1 Translation Modeling We can test our model's utility for translation by transforming its parameters into a phrase table for the phrasal decoder Pharaoh (Koehn et al., 2003).
W07-0403 119 31:234 The grow-diag-final (GDF) combination heuristic (Koehn et al. , 2003) adds links so that each new link connects a previously unlinked token.
W07-0403 120 19:234 2 Background 2.1 Phrase Table Extraction Phrasal decoders require a phrase table (Koehn et al., 2003), which contains bilingual phrase pairs and scores indicating their utility.
W07-0403 121 136:234 Pharaoh also includes lexical weighting parameters that are derived from the alignments used to induce its phrase pairs (Koehn et al. , 2003).
W05-0823 122 4:86 1 Introduction During the last decade, statistical machine translation (SMT) systems have evolved from the original word-based approach (Brown et al. , 1993) into phrase-based translation systems (Koehn et al. , 2003).
D08-1089 123 7:184 1 Introduction Statistical phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) have consistently delivered state-of-the-art performance in recent machine translation evaluations, yet these systems remain weak at handling word order changes.
P06-1123 124 14:221 is relevant to finite-state phrase-based models that use no parse trees (Koehn et al., 2003), tree-to-string models that rely on one parse tree (Yamada and Knight, 2001), and tree-to-tree models that rely on two parse trees (Groves et al., 2004, e.g.).
W08-0406 125 8:250 1 Introduction The emergence of phrase-based statistical machine translation (PSMT) (Koehn et al., 2003) has been one of the major developments in statistical approaches to translation.
W05-0826 126 9:103 See (Och and Ney, 2000), (Yamada and Knight, 2001), (Koehn and Knight, 2002), (Koehn et al. , 2003), (Schafer and Yarowsky, 2003) and (Gildea, 2003).
N06-1013 127 148:176 Based on the observations in (Koehn et al. , 2003), we also limited the phrase length to 3 for computational reasons.
N06-1013 128 67:176 For comparison purposes, three additional heuristically-induced alignments are generated for each system: (1) Intersection of both directions (Aligner(int)); (2) Union of both directions (Aligner(union)); and (3) The previously best-known heuristic combination approach called grow-diag-final (Koehn et al., 2003) (Aligner(gdf)).
N06-1013 129 7:176 1 Introduction Word alignment (detection of corresponding words between two sentences that are translations of each other) is usually an intermediate step of statistical machine translation (MT) (Brown et al., 1993; Och and Ney, 2003; Koehn et al., 2003), but also has been shown useful for other applications such as construction of bilingual lexicons, word-sense disambiguation, projection of resources, and cross-language information retrieval.
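The first two combination heuristics listed above, intersection and union of the two directional alignments, reduce to set operations once both alignments are expressed as links in the same orientation. The sketch below shows only those two; grow-diag-final, which starts from the intersection and adds selected links from the union, is deliberately not implemented here. The function name `symmetrize` and the (e_index, f_index) link convention are assumptions for illustration.

```python
def symmetrize(e2f_links, f2e_links):
    """Combine two directional word alignments.

    Both arguments are sets of (e_index, f_index) links, already
    converted to the same orientation. Returns a pair
    (intersection, union): the high-precision and high-recall
    combinations respectively.
    """
    return e2f_links & f2e_links, e2f_links | f2e_links


# Toy directional alignments (hypothetical).
e2f = {(0, 0), (1, 1), (2, 1)}
f2e = {(0, 0), (1, 1), (1, 2)}
inter, union = symmetrize(e2f, f2e)
```

In this toy case the intersection keeps the two links both directions agree on, while the union keeps all four. Heuristics of the grow-diag family sit between these two extremes.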
W08-0316 130 40:76 After unioning the Viterbi alignments, the stems were replaced with their original words, and phrase-pairs of up to five foreign words in length were extracted in the usual fashion (Koehn et al., 2003).
W08-0318 131 7:88 For all language pairs, we used the Moses decoder (Koehn et al., 2007), which follows the phrase-based statistical machine translation approach (Koehn et al., 2003), with default settings as a starting point.
W06-3115 132 18:84 In a phrase-based statistical translation (Koehn et al., 2003), a bilingual text is decomposed as $K$ phrase translation pairs $(\bar{e}_1, \bar{f}_{a_1}), (\bar{e}_2, \bar{f}_{a_2}), \ldots$: The input foreign sentence is segmented into phrases $\bar{f}_1^K$, mapped into corresponding English $\bar{e}_1^K$, then, reordered to form the output English sentence according to a phrase alignment index mapping $a$. In a hierarchical phrase-based translation (Chiang, 2005), translation is modeled after a weighted synchronous-CFG consisting of production rules whose right-hand side is paired (Aho and Ullman, 1969): $X \rightarrow \langle \gamma, \alpha, \sim \rangle$, where $X$ is a non-terminal, $\gamma$ and $\alpha$ are strings of terminals and non-terminals.
W06-3115 133 24:84 Second, phrase translation pairs are extracted from the word aligned corpus (Koehn et al. , 2003).
W06-3115 134 28:84 The decoding process is very similar to those described in (Koehn et al. , 2003): It starts from an initial empty hypothesis.
W06-3115 135 7:84 One is a phrase-based translation in which a phrasal unit is employed for translation (Koehn et al. , 2003).
W06-3115 136 50:84 For each differently tokenized corpus, we computed word alignments by a HMM translation model (Och and Ney, 2003) and by a word alignment refinement heuristic of grow-diag-final (Koehn et al., 2003).
W06-3115 137 34:84 2.3 Feature Functions Our phrase-based model uses a standard pharaoh feature functions listed as follows (Koehn et al. , 2003): Relative-count based phrase translation probabilities in both directions.
P08-1115 138 73:179 Models that support non-monotonic decoding generally include a distortion cost, such as $|a_i - b_{i-1} - 1|$ where $a_i$ is the starting position of the foreign phrase $f_i$ and $b_{i-1}$ is the ending position of phrase $f_{i-1}$ (Koehn et al., 2003).
C08-1138 139 14:198 Based on these grammars, a great number of SMT models have been recently proposed, including string-to-string model (Synchronous FSG) (Brown et al., 1993; Koehn et al., 2003), tree-to-string model (TSG-string) (Huang et al., 2006; Liu et al., 2006; Liu et al., 2007), string-to-tree model (string-CFG/TSG) (Yamada and Knight, 2001; Galley et al., 2006; Marcu et al., 2006), tree-to-tree model (Synchronous CFG/TSG, Data-Oriented Translation) (Chiang, 2005; Cowan et al., 2006; Eisner, 2003; Ding and Palmer, 2005; Zhang et al., 2007; Bod, 2007; Quirk et al., 2005; Poutsma, 2000; Hearne and Way, 2003) and so on.
W08-0336 140 45:196 2.2 Phrase-based Chinese-to-English MT The MT system used in this paper is Moses, a state-of-the-art phrase-based system (Koehn et al., 2003).
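The distortion cost described above, |a_i - b_{i-1} - 1| with a_i the start of the current foreign phrase and b_{i-1} the end of the previously translated one, can be summed over a whole derivation as in the sketch below. The function name `distortion_cost`, the 1-based inclusive spans, and the convention b_0 = 0 for the first phrase are assumptions of this example.

```python
def distortion_cost(spans):
    """Total distortion cost of translating source spans in the given order.

    spans is a list of (start, end) source positions, 1-based and
    inclusive, listed in the order the phrases are translated.
    Each step costs |start - previous_end - 1|; the position before
    the first word is taken to be 0 (an assumption of this sketch).
    """
    cost = 0
    prev_end = 0
    for start, end in spans:
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


monotone = distortion_cost([(1, 2), (3, 3), (4, 6)])
reordered = distortion_cost([(3, 3), (1, 2), (4, 6)])
```

A monotone derivation costs nothing because every phrase starts right after the previous one ends. Translating the third word first costs 2 + 3 + 1 = 6 in the second call, which is the kind of jump a distortion limit would cap.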
D07-1030 141 34:173 SMT has evolved from the original word-based approach (Brown et al. , 1993) into phrase-based approaches (Koehn et al. , 2003; Och and Ney, 2004) and syntax-based approaches (Wu, 1997; Alshawi et al. , 2000; Yamada and Knignt, 2001; Chiang, 2005).
D07-1030 142 61:173 3.1 Phrase-Based Models According to the translation model presented in (Koehn et al., 2003), given a source sentence $f$, the best target translation $e_{best}$ can be obtained using the following model
$e_{best} = \arg\max_{e} p(e \mid f) = \arg\max_{e} p(f \mid e)\, p_{LM}(e)\, \omega^{length(e)}$ (1)
Where the translation model $p(f \mid e)$ can be decomposed into
$p(\bar{f}_1^I \mid \bar{e}_1^I) = \prod_{i=1}^{I} \phi(\bar{f}_i \mid \bar{e}_i)\, d(a_i - b_{i-1})\, p_w(\bar{f}_i \mid \bar{e}_i, a)^{\lambda}$ (2)
Where $\phi(\bar{f}_i \mid \bar{e}_i)$ is the phrase translation probability.
D08-1059 143 21:188 Beam-search has been successful in many NLP tasks (Koehn et al., 2003; Collins and Roark, 2004), and can achieve accuracy that is close to exact inference.
Figure 1: The perceptron learning algorithm
Inputs: training examples $(x_i, y_i)$
Initialization: set $\vec{w} = 0$
Algorithm: // R training iterations; N examples
for $t = 1 \ldots R$, $i = 1 \ldots N$: $z_i = \arg\max_{y \in GEN(x_i)} \Phi(y) \cdot \vec{w}$; if $z_i \neq y_i$: $\vec{w} = \vec{w} + \Phi(y_i) - \Phi(z_i)$
Outputs: $\vec{w}$
N07-1007 144 8:188 Most state-of-the-art SMT systems treat grammatical elements in exactly the same way as content words, and rely on general-purpose phrasal translations and target language models to generate these elements (e.g., Och and Ney, 2002; Koehn et al., 2003; Quirk et al., 2005; Chiang, 2005; Galley et al., 2006).
P06-1066 145 10:243 One is distortion model (Och and Ney, 2004; Koehn et al. , 2003) which penalizes translations according to their jump distance instead of their content.
P07-2045 146 11:103 1 Motivation Phrase-based statistical machine translation (Koehn et al. 2003) has emerged as the dominant paradigm in machine translation research.
W07-0731 147 66:149 As in phrase-based translation model estimation, ? also contains two lexical weights (Koehn et al., 2003), counters for number of target terminals generated.
W07-0731 148 85:149 Table 1 shows the impact of increasing reordering window length (Koehn et al., 2003) on translation quality for the "dev06" data. Increasing the reordering window past 2 has minimal impact on translation quality, implying that most of the reordering effects across Spanish and English are well modeled at the local or phrase level.
N06-1031 149 4:157 1 Introduction Recent work in statistical machine translation (MT) has sought to overcome the limitations of phrase-based models (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004) by making use of syntactic information.
P08-1114 150 33:172 Rules have the form $X \rightarrow \langle \bar{e}, \bar{f} \rangle$, where $\bar{e}$ and $\bar{f}$ are phrases containing terminal symbols (words) and possibly co-indexed instances of the nonterminal symbol $X$. Associated with each rule is a set of translation model features, $\phi_i(\bar{f}, \bar{e})$; for example, one intuitively natural feature of a rule is the phrase translation (log-)probability $\phi(\bar{f}, \bar{e}) = \log p(\bar{e} \mid \bar{f})$, directly analogous to the corresponding feature in non-hierarchical phrase-based models like Pharaoh (Koehn et al., 2003).
P08-1114 151 34:172 In addition to this phrase translation probability feature, Hiero's feature set includes the inverse phrase translation probability $\log p(\bar{f} \mid \bar{e})$, lexical weights $lexwt(\bar{f} \mid \bar{e})$ and $lexwt(\bar{e} \mid \bar{f})$, which are estimates of translation quality based on word-level correspondences (Koehn et al., 2003), and a rule penalty allowing the model to learn a preference for longer or shorter derivations; see (Chiang, 2007) for details.
P08-1011 152 97:191 We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM model 4, and then applied the refinement rule described in (Koehn et al., 2003) to obtain a many-to-many word alignment for each sentence pair.
P08-1011 153 18:191 In most statistical machine translation (SMT) models (Och et al., 2004; Koehn et al., 2003; Chiang, 2005), some of measure words can be generated without modification or additional processing.
W08-0306 154 9:125 GIZA++ refined alignments have been used in state-of-the-art phrase-based statistical MT systems such as (Och, 2004); variations on the refined heuristic have been used by (Koehn et al., 2003) (diag and diag-and) and by the phrase-based system Moses (grow-diag-final) (Koehn et al., 2007).
P06-1122 155 122:211 4 Experiments Phrase-based SMT systems have been shown to outperform word-based approaches (Koehn et al. , 2003).
P06-1122 156 124:211 4.1 Applications to phrase-based SMT A phrase-based translation model can be estimated in two stages: first a parallel corpus is aligned at the word-level and then phrase pairs are extracted (Koehn et al., 2003).
W06-1606 157 4:175 1 Introduction During the last four years, various implementations and extensions to phrase-based statistical models (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004) have led to significant increases in machine translation accuracy.
W06-3120 158 40:90 The huge increase in computational and storage cost of including longer phrases does not provide a significant improvement in quality (Koehn et al., 2003) as the probability of reappearance of larger phrases decreases.
W05-0833 159 77:152 Accordingly, in this section we describe a set of experiments which extends the work of (Way and Gough, 2005) by evaluating the Marker-based EBMT system of (Gough & Way, 2004b) against a phrase-based SMT system built using the following components: Giza++, to extract the word-level correspondences; The Giza++ word alignments are then refined and used to extract phrasal alignments ((Och & Ney, 2003); or (Koehn et al. , 2003) for a more recent implementation); Probabilities of the extracted phrases are calculated from relative frequencies; The resulting phrase translation table is passed to the Pharaoh phrase-based SMT decoder which along with SRI language modelling toolkit5 performs translation.
W05-0833 160 47:152 (Koehn et al. , 2003); (Och, 2003)).
W07-0711 161 7:235 1 Introduction Word alignment is an important step of most modern approaches to statistical machine translation (Koehn et al. , 2003).
W07-0709 162 117:139 5.1 The baseline System used for comparison was Pharaoh (Koehn et al. , 2003; Koehn, 2004), which uses a beam search algorithm for decoding.
P06-2101 163 139:219 and score the alignment template model's phrases (Koehn et al., 2003).
P07-1092 164 108:201 The translation models and lexical scores were estimated on the training corpus which was automatically aligned using Giza++ (Och et al., 1999) in both directions between source and target and symmetrised using the growing heuristic (Koehn et al., 2003).
P07-1092 165 111:201 This represents the translation probability of a phrase when it is decomposed into a series of independent word-for-word translation steps (Koehn et al. , 2003), and has proven a very effective feature (Zens and Ney, 2004; Foster et al. , 2006).
P07-1092 166 8:201 1 Introduction Statistical machine translation (Brown et al., 1993) has seen many improvements in recent years, most notably the transition from word- to phrase-based models (Koehn et al., 2003).
P07-1092 167 17:201 As with conventional smoothing methods (Koehn et al. , 2003; Foster et al. , 2006), triangulation increases the robustness of phrase translation estimates.
W08-0405 168 25:208 2 Baseline DP Decoder The translation model used in this paper is a phrase-based model (Koehn et al., 2003), where the translation units are so-called blocks: a block b is a pair consisting of a source phrase s and a target phrase t which are translations of each other.
P07-1005 169 23:177 Recently, Cabezas and Resnik (2005) experimented with incorporating WSD translations into Pharaoh, a state-of-the-art phrase-based MT system (Koehn et al. , 2003).
P07-1005 170 12:177 To perform translation, state-of-the-art MT systems use a statistical phrase-based approach (Marcu and Wong, 2002; Koehn et al. , 2003; Och and Ney, 2004) by treating phrases as the basic units of translation.
N06-1004 171 8:208 1 Introduction: Defining SCMs The work presented here was done in the context of phrase-based MT (Koehn et al. , 2003; Och and Ney, 2004).
N06-1004 172 163:208 Phrase tables were learned from the training corpus using the diag-and method (Koehn et al. , 2003), and using IBM model 2 to produce initial word alignments (these authors found this worked as well as IBM4).
C04-1090 173 16:186 However, (Koehn et al 2003) found that it is actually harmful to restrict phrases to constituents in parse trees, because the restriction would cause the system to miss many reliable translations, such as the correspondence between there is in English and es gibt (it gives) in German.
P06-2005 174 16:209 The normalization is visualized as a translation problem where messages in the SMS language are to be translated to normal English using a similar phrase-based statistical MT method (Koehn et al. , 2003).
P06-1077 175 111:252 5.1 Pharaoh The baseline system we used for comparison was Pharaoh (Koehn et al., 2003; Koehn, 2004), a freely available decoder for phrase-based translation models: $p(e|f) = p_\phi(f|e)^{\lambda_\phi} \times p_{LM}(e)^{\lambda_{LM}} \times p_D(e,f)^{\lambda_D} \times \omega^{\mathrm{length}(e)\,\lambda_{W(e)}}$ (10) We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions using its default setting, and then applied the refinement rule diag-and described in (Koehn et al., 2003) to obtain a single many-to-many word alignment for each sentence pair.
P06-1077 176 65:252 $h_1(e_1^I, f_1^J) = \log \prod_{k=1}^{K} \frac{N(z) \cdot \delta(T(z), T_k)}{N(T(z))}$; $h_2(e_1^I, f_1^J) = \log \prod_{k=1}^{K} \frac{N(z) \cdot \delta(T(z), T_k)}{N(S(z))}$; $h_3(e_1^I, f_1^J) = \log \prod_{k=1}^{K} \mathrm{lex}(T(z)|S(z)) \cdot \delta(T(z), T_k)$; $h_4(e_1^I, f_1^J) = \log \prod_{k=1}^{K} \mathrm{lex}(S(z)|T(z)) \cdot \delta(T(z), T_k)$; $h_5(e_1^I, f_1^J) = K$; $h_6(e_1^I, f_1^J) = \log \prod_{i=1}^{I} p(e_i | e_{i-2}, e_{i-1})$; $h_7(e_1^I, f_1^J) = I$. 4 When computing lexical weighting features (Koehn et al., 2003), we take only terminals into account.
P06-1077 177 7:252 1 Introduction Phrase-based translation models (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004), which go beyond the original IBM translation models (Brown et al., 1993) 1 by modeling translations of phrases rather than individual words, have been suggested to be the state-of-the-art in statistical machine translation by empirical evaluations.
N07-2007 178 89:120 The baseline we measure against in all of these experiments is the state-of-the-art grow-diag-final (gdf ) alignment refinement heuristic commonly used in phrase-based SMT (Koehn et al. , 2003).
N07-2053 179 7:103 They provide pairs of phrases that are used to construct a large set of potential translations for each input sentence, along with feature values associated with each phrase pair that are used to select the best translation from this set.1 The most widely used method for building phrase translation tables (Koehn et al. , 2003) selects, from a word alignment of a parallel bilingual training corpus, all pairs of phrases (up to a given length) that are consistent with the alignment.
N07-2053 180 36:103 (2006), modified from (Koehn et al. , 2003), which is an average of pairwise word translation probabilities.
W08-0301 181 109:195 The subsequent construction of translation table was done in exactly the same way as explained in (Koehn et al., 2003).
W08-0301 182 103:195 4 Experiments 4.1 Experiment Settings A series of experiments were run to compare the performance of the three SWD models against the baseline, which is the standard phrase-based approach to SMT as elaborated in (Koehn et al., 2003).
D07-1006 183 163:193 (Och and Ney, 2003) invented heuristic symmetrization of the output of a 1-to-N model and a M-to-1 model resulting in a M-to-N alignment, this was extended in (Koehn et al., 2003). [Table 3: Experimental Results. Columns: System; French/English F-measure (α = 0.4); French/English BLEU; Arabic/English F-measure (α = 0.1); Arabic/English BLEU. GIZA++: 73.5, 30.63, 75.8, 51.55. (Fraser and Marcu, 2006b): 74.1, 31.40, 79.1, 52.89. LEAF unsupervised: 74.5, -, 72.3, -. LEAF semi-supervised: 76.3, 31.86, 84.5, 54.34.]
D07-1006 184 104:193 This operation does not change the collection of phrases or rules extracted from a hypothesized alignment, see, for instance, (Koehn et al. , 2003).
D07-1006 185 128:193 For French/English translation we use a state of the art phrase-based MT system similar to (Och and Ney, 2004; Koehn et al. , 2003).
N04-1035 186 16:232 Along this line, (Koehn et al., 2003) present convincing evidence that restricting phrasal translation to syntactic constituents yields poor translation performance: the ability to translate nonconstituent phrases (such as there are, note that, and according to) turns out to be critical and pervasive.
P08-2041 187 6:104 1 Introduction Currently, most of the phrase-based statistical machine translation (PBSMT) models (Marcu and Wong, 2002; Koehn et al., 2003) adopt full matching strategy for phrase translation, which means that a phrase pair $(\tilde{f}, \tilde{e})$ can be used for translating a source phrase f, only if $\tilde{f} = f$. Due to lack of generalization ability, the full matching strategy has some limitations.
I08-2088 188 68:145 3.2.2 Features We used eight features (Och and Ney, 2003; Koehn et al., 2003) and their weights for the translations.
I08-2088 189 67:145 We used the preprocessed data to train the phrase-based translation model by using GIZA++ (Och and Ney, 2003) and the Pharaoh tool kit (Koehn et al., 2003).
W07-0409 190 7:156 1 Introduction Recent works in statistical machine translation (SMT) shows how phrase-based modeling (Och and Ney, 2000a; Koehn et al. , 2003) significantly outperform the historical word-based modeling (Brown et al. , 1993).
W08-0305 191 26:200 The de-facto answer came during the 1990s from the research community on Statistical Machine Translation, who made use of statistical tools based on a noisy channel model originally developed for speech recognition (Brown et al., 1994; Och and Weber, 1998; R.Zens et al., 2002; Och and Ney, 2001; Koehn et al., 2003).
W08-0326 192 20:80 For example, our system configuration for the shared task incorporates a wrapper around GIZA++ (Och and Ney, 2003) for word alignment and a wrapper around Moses (Koehn et al., 2007) for decoding.
W06-3106 193 7:161 It has the advantage of naturally capturing local reorderings and is shown to outperform word-based machine translation (Koehn et al. , 2003).
W06-3106 194 32:161 This includes the standard notion of phrase, popular with phrase-based SMT (Koehn et al., 2003; Vogel et al., 2003) as well as sequences of words that contain gaps (possibly of arbitrary size).
W06-3106 195 128:161 PP-model We collected the PP parameters by simply reading the alignment matrices resulting from the word alignment, in a way similar to the one described in (Koehn et al., 2003).
C08-1064 196 136:260 Sum of logarithms of source-to-target lexical weighting (Koehn et al., 2003).
C08-1064 197 35:260 Our baseline uses Giza++ alignments (Och and Ney, 2003) symmetrized with the grow-diag-final-and heuristic (Koehn et al., 2003).
C08-1064 198 29:260 It compares favorably with conventional phrase-based translation (Koehn et al., 2003) on Chinese-English news translation (Chiang, 2007).
C08-1064 199 173:260 Our results are similar to those for conventional phrase-based models (Koehn et al., 2003; Zens and Ney, 2007).
C08-1064 200 162:260 4.3 Relaxing Length Restrictions Increasing the maximum phrase length in standard phrase-based translation does not improve BLEU (Koehn et al., 2003; Zens and Ney, 2007).
C08-1064 201 116:260 Except where noted, each system was trained on 27 million words of newswire data, aligned with GIZA++ (Och and Ney, 2003) and symmetrized with the grow-diag-final-and heuristic (Koehn et al., 2003).
H05-1096 202 29:156 Nowadays, most of the state-of-the-art SMT systems are based on bilingual phrases (Bertoldi et al. , 2004; Koehn et al. , 2003; Och and Ney, 2004; Tillmann, 2003; Vogel et al. , 2004; Zens and Ney, 2004).
N07-1022 203 128:209 Like WASP⁻¹, the phrase extraction algorithm of PHARAOH is based on the output of a word alignment model such as GIZA++ (Koehn et al., 2003), which performs poorly when applied directly to MRLs (Section 3.2).
N07-1022 204 10:209 In this paper we present results on using a recent phrase-based SMT system, PHARAOH (Koehn et al., 2003), for NLG.1 Although moderately effec- 1 We also tried IBM Model 4/REWRITE (Germann, 2003), a word-based SMT system, but it gave much worse results.
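N07-1022 205 135:209 To remedy this situation, we can borrow the probabilistic model of PHARAOH, and define the parsing model as: $\Pr(\mathbf{d}|e(\mathbf{d})) = \prod_{d \in \mathbf{d}} w(r(d))$ (4) which is the product of the weights of the rules used in a derivation $\mathbf{d}$. The rule weight, $w(X \rightarrow \langle \alpha, \beta \rangle)$, is in turn defined as: $P(\beta|\alpha)^{\lambda_1} P(\alpha|\beta)^{\lambda_2} P_w(\beta|\alpha)^{\lambda_3} P_w(\alpha|\beta)^{\lambda_4} \exp(-|\alpha|)^{\lambda_5}$ where $P(\beta|\alpha)$ and $P(\alpha|\beta)$ are the relative frequencies of $\beta$ and $\alpha$, and $P_w(\beta|\alpha)$ and $P_w(\alpha|\beta)$ are the lexical weights (Koehn et al., 2003).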
N07-1022 206 58:209 These rules are learned using a word alignment model, which finds an optimal mapping from words to MR predicates given a set of training sentences and their correct MRs. Word alignment models have been widely used for lexical acquisition in SMT (Brown et al. , 1993; Koehn et al. , 2003).
N07-1022 207 140:209 Following the phrase extraction phase in PHARAOH, we eliminate word gaps by incorporating unaligned words as part of the extracted NL phrases (Koehn et al. , 2003).
N07-1022 208 37:209 3.1 Generation using PHARAOH PHARAOH (Koehn et al. , 2003) is an SMT system that uses phrases as basic translation units.
C08-2005 209 4:93 1 Introduction In phrase-based statistical machine translation (Koehn et al., 2003) phrases extracted from word-aligned parallel data are the fundamental unit of translation.
P04-1060 210 36:158 (Koehn et al. , 2003) show that exploiting all contiguous word blocks in phrase-based alignment is better than focusing on syntactic constituents only.
P04-1060 211 22:158 We use the Europarl corpus (Koehn, 2002), and the statistical word alignment was performed with the GIZA++ toolkit (Al-Onaizan et al. , 1999; Och and Ney, 2003).1 For the current experiments we assume no preexisting parser for any of the languages, contrary to the information projection scenario.
P04-1060 212 100:158 For each span in the chart, we get a weight factor that is multiplied with the parameter-based expectations.9 4 Experiments We applied GIZA++ (Al-Onaizan et al. , 1999; Och and Ney, 2003) to word-align parts of the Europarl corpus (Koehn, 2002) for English and all other 10 languages.
C08-1127 213 23:196 With these linguistic annotations, we expect the LABTG to address two traditional issues of standard phrase-based SMT (Koehn et al., 2003) in a more effective manner.
C08-1127 214 31:196 2 Related Work There have been various efforts to integrate linguistic knowledge into SMT systems, either from the target side (Marcu et al., 2006; Hassan et al., 2007; Zollmann and Venugopal, 2006), the source side (Quirk et al., 2005; Liu et al., 2006; Huang et al., 2006) or both sides (Eisner, 2003; Ding et al., 2005; Koehn and Hoang, 2007), just to name a few.
C08-1127 215 72:196 Firstly, we run GIZA++ (Och and Ney, 2000) on the training corpus in both directions and then apply the "grow-diag-final" refinement rule (Koehn et al., 2003) to obtain many-to-many word alignments.
I08-1067 216 28:124 Phrases extracted using these heuristics are also shown to perform better than syntactically motivated phrases, the joint model, and IBM model 4 (Koehn et al., 2003).
I08-1067 217 52:124 The phrase translation table is learnt in the following manner: The parallel corpus is word-aligned bidirectionally, and using various heuristics (see (Koehn et al., 2003) for details) phrase correspondences are established.
W07-0701 218 9:168 1 Introduction Modern phrasal SMT systems such as (Koehn et al. , 2003) derive much of their power from being able to memorize and use long phrases.
W07-0701 219 124:168 We compared our system to Pharaoh, a leading phrasal SMT decoder (Koehn et al. , 2003), and our treelet system.
D08-1010 220 133:200 Then the word alignment is refined by performing grow-diag-final method (Koehn et al., 2003).
W08-0302 221 8:197 Phrase-based MT systems are straightforward to train from parallel corpora (Koehn et al., 2003) and, like the original IBM models (Brown et al., 1990), benefit from standard language models built on large monolingual, target-language corpora (Brants et al., 2007).
W08-0302 222 114:197 Baseline We use the Moses MT system (Koehn et al., 2007) as a baseline and closely follow the example training procedure given for the WMT-07 and WMT-08 shared tasks.4 In particular, we perform word alignment in each direction using GIZA++ (Och and Ney, 2003), apply the grow-diag-final-and heuristic for symmetrization and use a maximum phrase length of 7.
W08-0302 223 12:197 (2003), in which we translate a source-language sentence f into the target-language sentence e that maximizes a linear combination of features and weights:1 $\langle \hat{e}, \hat{a} \rangle = \arg\max_{e,a} \mathrm{score}(e,a,f)$ (1) $= \arg\max_{e,a} \sum_{m=1}^{M} \lambda_m h_m(e,a,f)$ (2) where a represents the segmentation of e and f into phrases and a correspondence between phrases, and each $h_m$ is a real-valued feature with learned weight $\lambda_m$. The translation is typically found using beam search (Koehn et al., 2003).
H05-1022 224 124:196 5 Phrase Pair Induction A common approach to phrase-based translation is to extract an inventory of phrase pairs (PPI) from bitext (Koehn et al., 2003). For example, in the phrase-extract algorithm (Och, 2002), a word alignment $a_1^m$ is generated over the bitext, and all word subsequences $e_{i_1}^{i_2}$ and $f_{j_1}^{j_2}$ are found that satisfy: $\forall a_j \in a_1^m: a_j \in [i_1, i_2]$ iff $j \in [j_1, j_2]$.
P04-1023 225 115:168 The phrase-based decoder extracts phrases from the word alignments produced by GIZA++, and computes translation probabilities based on the frequency of one phrase being aligned with another (Koehn et al. , 2003).
E06-2002 226 13:77 By introducing the hidden word alignment variable a, the following approximate optimization criterion can be applied for that purpose: $e^* = \arg\max_e \Pr(e | f) = \arg\max_e \sum_a \Pr(e,a | f) \approx \arg\max_{e,a} \Pr(e,a | f)$. Exploiting the maximum entropy (Berger et al., 1996) framework, the conditional distribution $\Pr(e,a | f)$ can be determined through suitable real valued functions (called features) $h_r(e,f,a)$, $r = 1 \ldots R$, and takes the parametric form: $p(e,a | f) \propto \exp\{\sum_{r=1}^{R} \lambda_r h_r(e,f,a)\}$. The ITC-irst system (Chen et al., 2005) is based on a log-linear model which extends the original IBM Model 4 (Brown et al., 1993) to phrases (Koehn et al., 2003; Federico and Bertoldi, 2005).
P05-1074 227 21:147 Our method for identifying paraphrases is an extension of recent work in phrase-based statistical machine translation (Koehn et al. , 2003).
P05-1074 228 46:147 Koehn (2004), Tillmann (2003), and Vogel et al.
N07-1061 229 28:313 2 Phrase-based SMT We use a phrase-based SMT system, Pharaoh, (Koehn et al. , 2003; Koehn, 2004), which is based on a log-linear formulation (Och and Ney, 2002).
N07-1061 230 35:313 For details on these feature functions, please refer to (Koehn et al. , 2003; Koehn, 2004; Koehn et al. , 2005).
N07-1061 231 53:313 The definitions of the phrase and lexical translation probabilities are as follows (Koehn et al. , 2003).
N07-1061 232 43:313 That is, phrases are heuristically extracted from word-level alignments produced by doing GIZA++ training on the corresponding parallel corpora (Koehn et al. , 2003).
P08-1024 233 26:227 (Koehn et al., 2003).
P08-1024 234 114:227 The standard solution is to approximate the maximum probability translation using a single derivation (Koehn et al., 2003).
C08-1027 235 7:249 1 Introduction The emergence of phrase-based statistical machine translation (PSMT) (Koehn et al., 2003) has been one of the major developments in statistical approaches to translation.
W05-0836 236 36:153 Under a phrase based translation model (Koehn et al. , 2003; Marcu and Wong, 2002), this distinction is important and will be discussed in more detail.
W05-0836 237 92:153 For further information on these parameter settings, confer (Koehn et al. , 2003).
W05-0836 238 89:153 The first system is the Pharaoh decoder provided by (Koehn et al. , 2003) for the shared data task.
P06-1098 239 13:74 A phrase-based translation model is one of the modern approaches which exploits a phrase, a contiguous sequence of words, as a unit of translation (Koehn et al. , 2003; Zens and Ney, 2003; Tillman, 2004).
P06-1098 240 65:74 Many-to-many word alignments are induced by running a one-to-many word alignment model, such as GIZA++ (Och and Ney, 2003), in both directions and by combining the results based on a heuristic (Koehn et al. , 2003).
P06-1098 241 66:74 Second, phrase translation pairs are extracted from the word alignment corpus (Koehn et al. , 2003).
W06-3125 242 6:145 This translation model differs from the well known phrase-based translation approach (Koehn et al., 2003) in two basic issues: first, training data is monotonously segmented into bilingual units; and second, the model considers n-gram probabilities instead of relative frequencies.
P06-1139 243 109:231 Automatic Creation of WIDL-expressions for MT. We generate WIDL-expressions from Chinese strings by exploiting a phrase-based translation table (Koehn et al. , 2003).
P06-1139 244 126:231 When evaluated against the state-of-the-art, phrase-based decoder Pharaoh (Koehn, 2004), using the same experimental conditions translation table trained on the FBIS corpus (7.2M Chinese words and 9.2M English words of parallel text), trigram language model trained on 155M words of English newswire, interpolation weights a65 (Equation 2) trained using discriminative training (Och, 2003) (on the 2002 NIST MT evaluation set), probabilistic beam a90 set to 0.01, histogram beam a58 set to 10 and BLEU (Papineni et al. , 2002) as our metric, the WIDL-NGLM-Aa86 a129 algorithm produces translations that have a BLEU score of 0.2570, while Pharaoh translations have a BLEU score of 0.2635.
H05-1009 245 123:201 We computed precision, recall and error rate on the entire set of sentence pairs for each data set.5 To evaluate NeurAlign, we used GIZA++ in both directions (E-to-F and F-to-E, where F is either Chinese (C) or Spanish (S)) as input and a refined alignment approach (Och and Ney, 2000) that uses a heuristic combination method called grow-diag-final (Koehn et al., 2003) for comparison.
W08-0403 246 75:207 Our baseline model follows Chiang's hierarchical model (Chiang, 2007) in conjunction with additional features: conditional probabilities in both directions: P(|) and P(|); lexical weights (Koehn et al., 2003) in both directions: Pw(|) and Pw(|); word counts |e|; rule counts |D|; target n-gram language model PLM(e); glue rule penalty to learn preference of nonterminal rewriting over serial combination through Eq.
D07-1104 247 283:433 We symmetrized bidirectional alignments using the grow-diag-final heuristic (Koehn et al., 2003).
D07-1104 248 21:433 So far, these techniques have focused on phrase-based models using contiguous phrases (Koehn et al., 2003; Och and Ney, 2004).
P06-1009 249 11:213 Most current SMT systems (Och and Ney, 2004; Koehn et al. , 2003) use a generative model for word alignment such as the freely available GIZA++ (Och and Ney, 2003), an implementation of the IBM alignment models (Brown et al. , 1993).
D07-1080 250 41:227 Such a quasi-syntactic structure can naturally capture the reordering of phrases that is not directly modeled by a conventional phrase-based approach (Koehn et al. , 2003).
D07-1080 251 6:227 1 Introduction The recent advances in statistical machine translation have been achieved by discriminatively training a small number of real-valued features based either on (hierarchical) phrase-based translation (Och and Ney, 2004; Koehn et al. , 2003; Chiang, 2005) or syntax-based translation (Galley et al. , 2006).
D07-1080 252 150:227 Second, the word alignment is refined by a grow-diag-final heuristic (Koehn et al. , 2003).
P08-1009 253 19:223 Phrase-based decoding (Koehn et al., 2003) is a dominant formalism in statistical machine translation.
P08-1009 254 56:223 Restricting phrases to syntactic constituents has been shown to harm performance (Koehn et al., 2003), so we tighten our definition of a violation to disregard cases where the only point of overlap is obscured by our phrasal resolution.
P08-1009 255 27:223 Early experiments with syntactically-informed phrases (Koehn et al., 2003), and syntactic reranking of K-best lists (Och et al., 2004) produced mostly negative results.
W06-3113 256 9:151 The current state of the art is represented by the so-called phrase-based translation approach (Och and Ney, 2004; Koehn et al. , 2003).
C08-1041 257 127:197 Then the word alignment is refined by performing grow-diag-final method (Koehn et al., 2003).
P08-1116 258 129:268 Given phrase $p_1$ and its paraphrase $p_2$, we compute $\mathrm{Score}_3(p_1, p_2)$ by relative frequency (Koehn et al., 2003): $\mathrm{Score}_3(p_1, p_2) = p(p_2 | p_1) = \frac{\mathrm{count}(p_2, p_1)}{\sum_{p'} \mathrm{count}(p', p_1)}$ (7) People may wonder why we do not use the same method on the monolingual parallel and comparable corpora.
W08-0303 259 162:239 For the first two tasks, all heuristics of the Pharaoh-Toolkit (Koehn et al., 2003) as well as the refined heuristic (Och and Ney, 2003) to combine both IBM4-alignments were tested and the best ones are shown in the tables.
P05-2016 260 20:31 Statistical Phrase-based Translation (Koehn et al. , 2003): Here phrase-based means subsequence-based, as there is no guarantee that the phrases learned by the model will have any relation to what we would think of as syntactic phrases.
D07-1036 261 130:249 In training process, we use GIZA++ 4 toolkit for word alignment in both translation directions, and apply grow-diag-final method to refine it (Koehn et al. , 2003).
W07-0724 262 16:91 They are generated from the training corpus via the "diag-and" method (Koehn et al., 2003) and smoothed using Kneser-Ney smoothing (Foster et al., 2006); one or several n-gram language model(s) trained with the SRILM toolkit (Stolcke, 2002) (in the baseline experiments reported here, we used a trigram model); a distortion model which assigns a penalty based on the number of source words which are skipped when generating a new target phrase; a word penalty.
D08-1078 263 132:241 The automatic alignments were extracted by appending the manually aligned sentences on to the respective Europarl v3 corpora and aligning them using GIZA++ (Och and Ney, 2003) and the grow-final-diag algorithm (Koehn et al., 2003).
P06-1096 264 199:222 The discrepancy between DEV performance and TEST performance is due to temporal distance from TRAIN and high variance in BLEU score.11 We also compared our model with Pharaoh (Koehn et al. , 2003).
P06-1096 265 78:222 At the end we ran our models once on TEST to get final numbers.2 4 Models Our experiments used phrase-based models (Koehn et al. , 2003), which require a translation table and language model for decoding and feature computation.
P06-1096 266 185:222 The process of phrase extraction is difficult to optimize in a non-discriminative setting: many heuristics have been proposed (Koehn et al. , 2003), but it is not obvious which one should be chosen for a given language pair.
P06-1096 267 90:222 In the future, we plan to explore our discriminative framework on a full distortion model (Koehn et al. , 2003) or even a hierarchical model (Chiang, 2005).
D08-1066 268 54:243 These heuristics define a phrase pair to consist of a source and target ngrams of a word-aligned source-target sentence pair such that if one end of an alignment is in the one ngram, the other end is in the other ngram (and there is at least one such alignment) (Och and Ney, 2004; Koehn et al., 2003).
D08-1066 269 198:243 From this aligned training corpus, we extract the phrase pairs according to the heuristics in (Koehn et al., 2003).
D08-1066 270 8:243 The pervading method for estimating these probabilities is a simple heuristic based on the relative frequency of the phrase pair in the multi-set of the phrase pairs extracted from the word-aligned corpus (Koehn et al., 2003).
D08-1066 271 7:243 1 Motivation A major component in phrase-based statistical Machine translation (PBSMT) (Zens et al., 2002; Koehn et al., 2003) is the table of conditional probabilities of phrase translation pairs.
D08-1066 272 53:243 (Koehn et al., 2003; Och and Ney, 2004)).
P05-1066 273 29:229 Results using the method show an improvement from 25.2% Bleu score to 26.8% Bleu score (a statistically significant improvement), using a phrase-based system (Koehn et al. , 2003) which has been shown in the past to be a highly competitive SMT system.
P05-1066 274 46:229 Reranking methods have also been proposed as a method for using syntactic information (Koehn and Knight, 2003; Och et al. , 2004; Shen et al. , 2004).
P05-1066 275 8:229 1 Introduction Recent research on statistical machine translation (SMT) has led to the development of phrase-based systems (Och et al., 1999; Marcu and Wong, 2002; Koehn et al., 2003).
P05-1066 276 31:229 More recently, phrase-based models (Och et al. , 1999; Marcu and Wong, 2002; Koehn et al. , 2003) have been proposed as a highly successful alternative to the IBM models.
P05-1066 277 142:229 Our baseline is the phrase-based MT system of (Koehn et al. , 2003).
P05-1066 278 104:229 In experiments with the system of (Koehn et al., 2003) we have found that in practice a large number of complete translations are completely monotonic (i.e., have 0 skips), suggesting that the system has difficulty learning exactly what points in the translation should allow reordering.
P05-1066 279 34:229 In this paper we use the phrase-based system of (Koehn et al. , 2003) as our underlying model.
W08-0335 280 67:228 The training and decoding system of our SMT used the publicly available Pharaoh (Koehn et al., 2003)2.
C08-2032 281 23:70 Let us suppose that we have two bilingual lexicons L f L p and L p L e . We obtain word alignments of these lexicons by applying GIZA++ (Och and Ney, 2003), and grow-diag-final heuristics (Koehn et al., 2007).
C08-2032 282 15:70 This paper proposes a method for building a bilingual lexicon through a pivot language by using phrase-based statistical machine translation (SMT) (Koehn et al., 2003).
W08-0314 283 16:100 3 System Overview 3.1 Translation model The system developed for this year's shared task is a state-of-the-art, two-pass phrase-based statistical machine translation system based on a log-linear translation model (Koehn et al, 2003).
W05-0908 284 11:148 In the area of statistical machine translation (SMT), recently a combination of the BLEU evaluation metric (Papineni et al. , 2001) and the bootstrap method for statistical significance testing (Efron and Tibshirani, 1993) has become popular (Och, 2003; Kumar and Byrne, 2004; Koehn, 2004b; Zhang et al. , 2004).
W06-3109 285 35:141 On the other hand, models that deal with structures or phrases instead of single words have also been proposed: the syntax translation models are described in (Yamada and Knight, 2001), alignment templates are used in (Och, 2002), and the alignment template approach is re-framed into the so-called phrase based translation (PBT) in (Marcu and Wong, 2002; Zens et al. , 2002; Koehn et al. , 2003; Tomas and Casacuberta, 2003).
D08-1051 286 39:207 One of the most popular instantiations of log-linear models is that including phrase-based (PB) models (Zens et al., 2002; Koehn et al., 2003).
I08-8001 287 9:155 However, reordering models in traditional phrase-based systems are not sufficient to treat such complex cases when we translate long sentences (Koehn et al, 2003).
P05-1033 288 113:249 We compared a baseline system, the state-of-the-art phrase-based system Pharaoh (Koehn et al., 2003; Koehn, 2004a), against our system.
P05-1033 289 69:249 To do this, we first identify initial phrase pairs using the same criterion as previous systems (Och and Ney, 2004; Koehn et al., 2003): Definition 1.
P05-1033 290 26:249 When we run a phrase-based system, Pharaoh (Koehn et al., 2003; Koehn, 2004a), on this sentence (using the experimental setup described below), we get the following phrases with translations: (4) [Aozhou] [shi] [yu] [Bei Han] [you] [bangjiao]1 [de shaoshu guojia zhiyi] [Australia] [is] [dipl.
P05-1033 291 56:249 For our experiments we used the following features, analogous to Pharaoh's default feature set: P( | ) and P( | ), the latter of which is not found in the noisy-channel model, but has been previously found to be a helpful feature (Och and Ney, 2002); the lexical weights Pw( | ) and Pw( | ) (Koehn et al., 2003), which estimate how well the words in translate the words in ;2 a phrase penalty exp(1), which allows the model to learn a preference for longer or shorter derivations, analogous to Koehn's phrase penalty (Koehn, 2003).
P05-1033 292 118:249 5.1 Baseline The baseline system we used for comparison was Pharaoh (Koehn et al., 2003; Koehn, 2004a), as publicly distributed.
P05-1033 293 20:249 Above the phrase level, these models typically have a simple distortion model that reorders phrases independently of their content (Och and Ney, 2004; Koehn et al., 2003), or not at all (Zens and Ney, 2004; Kumar et al., 2005).
W06-1609 294 8:175 1 Introduction During the last few years, SMT systems have evolved from the original word-based approach (Brown et al., 1993) to phrase-based translation systems (Koehn et al., 2003).
W07-0412 295 26:166 And again, we see this insight informing statistical machine translation systems, for instance, in the phrase-based approaches of Och (2003) and Koehn et al.
N06-1014 296 5:166 1 Introduction Word alignment is an important component of a complete statistical machine translation pipeline (Koehn et al., 2003).
N06-1014 297 154:166 Using GIZA++ model 4 alignments and Pharaoh (Koehn et al., 2003), we achieved a BLEU score of 0.3035.
C08-1005 298 152:188 grow-diag-final (Koehn et al., 2003)).
W08-0322 299 10:68 Our system is actually designed as a hybrid of the classic phrase-based SMT model (Koehn et al., 2003) and the kernel regression model as follows: First, for each source sentence a small relevant set of sentence pairs are retrieved from the large-scale parallel corpus.
C08-1017 300 14:197 However, Moore's Law, the driving force of change in computing since then, has opened the way for recent progress in the field, such as Statistical Machine Translation (SMT) (Koehn et al. 2003).
C08-1144 301 33:207 Phrase pairs are extracted up to a fixed maximum length, since very long phrases rarely have a tangible impact during translation (Koehn et al., 2003).
C08-1144 302 14:207 Starting with bilingual phrase pairs extracted from automatically aligned parallel text (Och and Ney, 2004; Koehn et al., 2003), these PSCFG approaches augment each contiguous (in source and target words) phrase pair with a left-hand-side symbol (like the VP in the example above), and perform a generalization procedure to form rules that include nonterminal symbols.
W08-1911 303 46:160 (Och and Ney, 2003)), and the phrase-based approach to Statistical Machine Translation (Koehn et al., 2003) has led to the development of heuristics for obtaining alignments between phrases of any number of words.
W08-1911 304 104:160 4 Experiments and evaluation We carried out an evaluation on the local rephrasing of French sentences, using English as the pivot language. We extracted phrase alignments of up to 7 word forms using the Giza++ alignment tool (Och and Ney, 2003) and the grow-diag-final-and heuristics described in (Koehn et al., 2003) on 948,507 sentences of the French-English part of the Europarl corpus (Koehn, 2005) and obtained some 42 million phrase pairs for which probabilities were estimated using maximum likelihood estimation.
W08-0404 305 84:179 Consider the lexical model pw(ry|rx), defined following Koehn et al (2003), with a denoting the most frequent word alignment observed for the rule in the training set.
W08-0404 306 113:179 by diag-and symmetrization (Koehn et al., 2003).
W07-0721 307 10:163 1 Introduction Nowadays, statistical machine translation is mainly based on phrases (Koehn et al., 2003).
W07-0725 308 13:98 2 Architecture of the system The goal of statistical machine translation (SMT) is to produce a target sentence e from a source sentence f. It is today common practice to use phrases as translation units (Koehn et al., 2003; Och and Ney, 2003) and a log linear framework in order to introduce several models explaining the translation process: $e^{*} = \arg\max p(e|f) = \arg\max_{e} \{\exp(\sum_{i} \lambda_i h_i(e,f))\}$ (1) The feature functions $h_i$ are the system models and the $\lambda_i$ weights are typically optimized to maximize a scoring function on a development set (Och and Ney, 2002).
H05-1021 309 140:173 Phrase-pairs are then extracted from the word alignments (Koehn et al., 2003).
W08-0313 310 14:90 2 Architecture of the system The goal of statistical machine translation (SMT) is to produce a target sentence e from a source sentence f. It is today common practice to use phrases as translation units (Koehn et al., 2003; Och and Ney, 2003) and a log linear framework in order to introduce several models explaining the translation process: $e^{*} = \arg\max p(e|f) = \arg\max_{e} \{\exp(\sum_{i} \lambda_i h_i(e,f))\}$ (1) The feature functions $h_i$ are the system models and the $\lambda_i$ weights are typically optimized to maximize a scoring function on a development set (Och and Ney, 2002).
N07-2015 311 10:110 Many research groups use a decoder based on a log-linear approach incorporating phrases as main paradigm (Koehn et al., 2003).
D07-1079 312 9:289 Approaches include word substitution systems (Brown et al., 1993), phrase substitution systems (Koehn et al., 2003; Och and Ney, 2004), and synchronous context-free grammar systems (Wu and Wong, 1998; Chiang, 2005), all of which train on string pairs and seek to establish connections between source and target strings.
N06-1015 313 205:205 We view this as a particularly promising aspect of our work, given that phrase-based systems such as Pharaoh (Koehn et al., 2003) perform better with higher recall alignments.
P08-1010 314 134:204 4.1 Training and Translation Setup Our decoder is a phrase-based multi-stack implementation of the log-linear model similar to Pharaoh (Koehn et al., 2003).
P08-1010 315 15:204 Since most phrases appear only a few times in training data, a phrase pair translation is also evaluated by lexical weights (Koehn et al., 2003) or term weighting (Zhao et al., 2004) as additional features to avoid overestimation.
P08-1010 316 12:204 The most widely used approach derives phrase pairs from word alignment matrix (Och and Ney, 2003; Koehn et al., 2003).
P08-1010 317 54:204 The commonly used phrase extraction approach based on word alignment heuristics (referred as ViterbiExtract algorithm for comparison in this paper) as described in (Och, 2002; Koehn et al., 2003) is a special case of the algorithm, where candidate phrase pairs are restricted to those that respect word alignment boundaries.
N07-1062 318 42:255 We have investigated this and our results are in line with (Koehn et al., 2003) showing that the translation quality does not improve if we utilize phrases beyond a certain length.
N07-1062 319 168:255 Even a length limit of 3, as proposed by (Koehn et al., 2003), would result in almost optimal translation quality.
W07-0406 320 12:175 However, attempts to retrofit syntactic information into the phrase-based paradigm have not met with enormous success (Koehn et al., 2003; Och et al., 2003), and purely phrase-based machine translation systems continue to outperform these syntax/phrase-based hybrids.
D08-1024 321 140:186 5.1 Experimental setup The baseline model was Hiero with the following baseline features (Chiang, 2005; Chiang, 2007): two language models; phrase translation probabilities p(f | e) and p(e | f); lexical weighting in both directions (Koehn et al., 2003); word penalty; penalties for automatically extracted rules, identity rules (translating a word into itself), two classes of number/name translation rules, and glue rules. The probability features are base-100 log-probabilities.
W06-3601 322 55:298 2 Previous Work It is helpful to compare this approach with recent efforts in statistical MT. Phrase-based models (Koehn et al., 2003; Och and Ney, 2004) are good at learning local translations that are pairs of (consecutive) sub-strings, but often insufficient in modeling the reorderings of phrases themselves, especially between language pairs with very different word-order.
W07-0715 323 27:155 2 Previous Approaches Koehn, et al.'s (2003) method of estimating phrase-translation probabilities is very simple.
W08-0411 324 9:219 1 Introduction Phrase-based Statistical MT (PB-SMT) (Koehn et al., 2003) has become the predominant approach to Machine Translation in recent years.
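Several of the citing sentences above (for example W07-0725 and W08-0313) describe the log-linear decision rule, in which the decoder picks the target sentence e that maximizes exp(sum_i lambda_i h_i(e, f)). The following is a minimal Python sketch of that rule over a fixed candidate list. The feature names, feature values, weights, and example sentences are all hypothetical and chosen only for illustration; a real decoder searches a much larger hypothesis space and tunes the weights on a development set.

```python
def loglinear_score(features, weights):
    """Return sum_i lambda_i * h_i(e, f) for one candidate translation."""
    return sum(weights[name] * value for name, value in features.items())


def best_translation(candidates, weights):
    """Pick the candidate maximizing exp(sum_i lambda_i * h_i(e, f)).

    exp is monotonic, so maximizing the weighted sum gives the same argmax.
    """
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


# Hypothetical weights (lambda_i) for four illustrative feature functions.
weights = {"phrase_tm": 1.0, "lexical_weight": 0.5, "lm": 1.0, "word_penalty": -0.2}

# Hypothetical candidates with made-up log-probability features and a word count.
candidates = [
    {"text": "the house is small",
     "features": {"phrase_tm": -1.2, "lexical_weight": -2.0, "lm": -4.0, "word_penalty": 4}},
    {"text": "the house is little",
     "features": {"phrase_tm": -1.5, "lexical_weight": -2.4, "lm": -5.1, "word_penalty": 4}},
    {"text": "small is the house",
     "features": {"phrase_tm": -1.2, "lexical_weight": -2.0, "lm": -7.3, "word_penalty": 4}},
]

best = best_translation(candidates, weights)
print(best["text"])
```

Working in the log domain, as here, avoids computing the exponential at all; the weighted sum alone determines the ranking.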
Copyright © Univ. of Mich. and the CLAIR Group at the Univ. of Mich.
All information provided herein should be considered tentative and still under construction. Further analysis and correction is still being performed. Please remember that all statistics contained herein are the results of independent research and should not be considered a statement of fact regarding any of the papers, authors, or other entities they refer to.