Paper: The Mathematics Of Statistical Machine Translation: Parameter Estimation


Basic Info:

id: J93-2003
title: The Mathematics Of Statistical Machine Translation: Parameter Estimation
authors: Brown, Peter F. (IBM T.J. Watson Research Center, Yorktown Heights NY), Della Pietra, Vincent J. (IBM T.J. Watson Research Center, Yorktown Heights NY), Della Pietra, Stephen A. (IBM T.J. Watson Research Center, Yorktown Heights NY), Mercer, Robert L. (IBM T.J. Watson Research Center, Yorktown Heights NY)
venue: CL
year: 1993


Abstract






Incoming Citations
Id        Title
H92-1053  Dividing And Conquering Long Sentences In A Translation System
H93-1037  LINGSTAT: An Interactive Machine-Aided Translation System
H93-1039  But Dictionaries Are Data Too
J93-1001  Introduction To The Special Issue On Computational Linguistics Using Large Corpora
P93-1002  Aligning Sentences In Bilingual Corpora Using Lexical Information
W93-0301  Robust Bilingual Word Alignment For Machine Aided Translation
A94-1006  Termight: Identifying And Translating Technical Terminology
A94-1012  Combination Of Symbolic And Statistical Approaches For Grammatical Knowledge Acquisition
C94-1014  A Matching Technique In Example-Based Machine Translation
C94-2175  Bilingual Text Matching Using Bilingual Dictionary And Statistics
C94-2178  K-Vec: A New Approach For Aligning Parallel Texts
H94-1028  The Candide System For Machine Translation
W94-0115  Learning A Radically Lexical Grammar
P95-1033  An Algorithm For Simultaneously Bracketing Parallel Texts By Aligning Words
P95-1034  Two-Level Many-Paths Generation
W95-0106  Trainable Coarse Bilingual Grammars For Parallel Text Bracketing
W95-0115  Automatic Evaluation And Uniform Filter Cascades For Inducing N-Best Translation Lexicons
C96-1037  Aligning More Words With High Precision For Small Bilingual Corpora
C96-1040  Bilingual Knowledge Acquisition From Korean-English Parallel Corpus Using Alignment
C96-1067  Word Completion - A First Step Toward Target-Text Mediated IMT
C96-1078  Alignment Of Shared Forests For Bilingual Corpora
C96-2098  Extraction Of Lexical Translations From Non-Aligned Corpora
C96-2141  HMM-Based Word Alignment In Statistical Translation
C96-2211  Pattern-Based Machine Translation
J96-1001  Translating Collocations For Bilingual Lexicons: A Statistical Approach
J96-1002  A Maximum Entropy Approach To Natural Language Processing
P96-1020  Pattern-Based Context-Free Grammars For Machine Translation
P96-1021  A Polynomial-Time Algorithm For Statistical Machine Translation
P96-1023  Head Automata And Bilingual Tiling: Translation With Minimal Representations (Invited Talk)
A97-1050  Semi-Automatic Acquisition Of Domain-Specific Translation Lexicons
J97-2004  A Class-Based Approach To Word Alignment
J97-3002  Stochastic Inversion Transduction Grammars And Bilingual Parsing Of Parallel Corpora
P97-1022  Fertility Models For Statistical Natural Language Understanding
P97-1037  A DP-Based Search Using Monotone Alignments In Statistical Translation
P97-1038  An Alignment Method For Noisy Parallel Corpora Based On Image Processing Techniques
P97-1039  A Portable Algorithm For Mapping Bitext Correspondence
P97-1046  A Comparison Of Head Transducers And Transfer For A Limited Domain Translation Application
P97-1047  Decoding Algorithm In Statistical Machine Translation
P97-1063  A Word-To-Word Model Of Translational Equivalence
W97-0119  Finding Terminology Translations From Non-Parallel Corpora
W97-0207  Measuring Semantic Entropy
W97-0311  Automatic Discovery Of Non-Compositional Compounds In Parallel Data
W97-0405  A Formal Basis For Spoken Language Translation By Analogy
W97-0408  English-To-Mandarin Speech Translation With Head Transducers
W97-1014  Word Triggers And The EM Algorithm
C98-1006  Automatic Acquisition of Hierarchical Transduction Models for Machine Translation
C98-1066  An IR Approach for Translating New Words from Nonparallel Comparable Texts
C98-1071  Flow Network Models for Word Alignment and Terminology Extraction from Bilingual Corpora
C98-2129  Bitext Correspondences through Rich Mark-up
C98-2153  A DP based Search Algorithm for Statistical Machine Translation
C98-2157  Improving Statistical Natural Language Translation with Categories and Rules
C98-2225  Machine Translation with a Stochastic Grammatical Channel
J98-4003  Machine Transliteration
P98-1006  Automatic Acquisition Of Hierarchical Transduction Models For Machine Translation
P98-1069  An IR Approach For Translating New Words From Nonparallel Comparable Texts
P98-1074  Flow Network Models For Word Alignment And Terminology Extraction From Bilingual Corpora
P98-2134  Bitext Correspondences Through Rich Mark-Up
P98-2158  A DP Based Search Algorithm For Statistical Machine Translation
P98-2162  Improving Statistical Natural Language Translation With Categories And Rules
P98-2221  Modeling With Structures In Statistical Machine Translation
P98-2230  Machine Translation With A Stochastic Grammatical Channel
W98-1103  Using A Probabilistic Translation Model For Cross-Language Information Retrieval
E99-1010  An Efficient Method For Determining Bilingual Word Classes
J99-1003  Bitext Maps And Alignment Via Pattern Recognition
J99-4005  Decoding Complexity In Word-Replacement Translation Models
P99-1027  Should We Translate The Documents Or The Queries In Cross-Language Information Retrieval?
W99-0602  Text-Translation Alignment: Three Languages Are Better Than Two
W99-0604  Improved Alignment Models For Statistical Machine Translation
A00-1004  Automatic Construction Of Parallel English-Chinese Corpus For Cross-Language Information Retrieval
A00-1018  An Automatic Reviser: The TransCheck System
A00-1019  Unit Completion For A Computer-Aided Translation Typing System
C00-1056  An English To Korean Transliteration Model Of Extended Markov Window
C00-1064  Structural Feature Selection For English-Korean Statistical Machine Translation
C00-1078  Chart-Based Transfer Rule Application In Machine Translation
C00-2123  Word Re-Ordering And DP-Based Search In Statistical Machine Translation
C00-2162  Improving SMT Quality With Morpho-Syntactic Analysis
C00-2163  A Comparison Of Alignment Models For Statistical Machine Translation
J00-1004  Learning Dependency Translation Models As Collections Of Finite-State Head Transducers
J00-2004  Models Of Translational Equivalence Among Words
P00-1006  A Maximum Entropy/Minimum Divergence Translation Model
P00-1041  Headline Generation Based On Statistical Translation
P00-1050  Chinese-Korean Word Alignment Based On Linguistic Comparison
P00-1056  Improved Statistical Alignment Models
P00-1067  PENS: A Machine-Aided English Writing System For Chinese Users
W00-0507  TransType: A Computer-Aided Translation Typing System
W00-0508  Stochastic Finite-State Models For Spoken Language Machine Translation
W00-0707  Incorporating Position Information Into A Maximum Entropy/Minimum Divergence Translation Model
W00-0801  An Unsupervised Method For Multilingual Word Sense Tagging Using Parallel Corpora
W00-1314  Word Alignment Of English-Chinese Bilingual Corpus Based On Chucks
H01-1061  Robust Knowledge Discovery From Parallel Speech And Text Sources
H01-1062  The RWTH System For Statistical Translation Of Spoken Dialogues
N01-1018  A Finite-State Approach To Machine Translation
N01-1020  Multipath Translation Lexicon Induction Via Bridge Languages
P01-1008  Extracting Paraphrases From A Parallel Corpus
P01-1026  Organizing Encyclopedic Knowledge Based On The Web And Its Application To Question Answering
P01-1027  Refined Lexicon Models For Statistical Machine Translation Using A Maximum Entropy Approach
P01-1030  Fast Decoding And Optimal Decoding For Machine Translation
P01-1050  Towards A Unified Approach To Memory- And Statistical-Based Machine Translation
P01-1067  A Syntax-Based Statistical Translation Model
W01-0505  Improving Lexical Mapping Model Of English-Korean Bitext Using Structural Features
W01-1208  Question Answering Using Encyclopedic Knowledge Generated From The Web
W01-1405  Stochastic Modelling: From Pattern Classification To Language Translation
W01-1406  A Best-First Alignment Algorithm For Automatic Extraction Of Transfer Mappings From Bilingual Corpora
W01-1407  Toward Hierarchical Models For Statistical Machine Translation Of Inflected Languages
W01-1408  An Efficient A* Search Algorithm For Statistical Machine Translation
W01-1409  Building A Statistical Machine Translation System From Scratch: How Much Bang For The Buck Can We Expect?
W01-1410  Machine Translation With Grammar Association: Some Improvements And The Loco_C Model
W01-1411  Towards A Simple And Accurate Statistical Approach To Learning Translation Relationships Among Words
W01-1413  Using The Web As A Bilingual Dictionary
C02-1002  A Cheap And Fast Way To Build Useful Translation Lexicons
C02-1008  A Transitive Model For Extracting Translation Equivalents Of Web Queries Through Anchor Text Mining
C02-1009  A Robust Cross-Style Bilingual Sentences Alignment Model
C02-1011  Base Noun Phrase Translation Using Web Data And The EM Algorithm
C02-1032  Improving Alignment Quality In Statistical Machine Translation Using Context-Dependent Maximum Entropy Models
C02-1050  Bidirectional Decoding For Statistical Machine Translation
C02-1076  Using Language And Translation Models To Select The Best Among Outputs From Multiple MT Systems
C02-1102  Learning How To Answer Questions Using Trivia Games
C02-1134  Bootstrapping Bilingual Data Using Consensus Translation For A Multilingual Instant Messaging System
C02-1164  Language Model Adaptation With Additional Text Generated By Machine Translation
P02-1038  Discriminative Training And Maximum Entropy Models For Statistical Machine Translation
P02-1039  A Decoder For Syntax-Based Statistical MT
P02-1050  Evaluating Translational Correspondence Using Annotation Projection
P02-1051  Translating Named Entities Using Monolingual And Bilingual Resources
P02-1052  Using Similarity Scoring To Improve The Bilingual Dictionary For Sub-Sentential Alignment
W02-0705  Speech Translation Performance Of Statistical Dependency Transduction And Semantic Similarity Transduction
W02-0718  The VI Framework Program In Europe: Some Thoughts About Speech To Speech Translation Research
W02-1012  Extentions To HMM-Based Statistical Word Alignment Models
W02-1018  A Phrase-Based Joint Probability Model For Statistical Machine Translation
W02-1019  Minimum Bayes-Risk Word Alignments Of Bilingual Texts
W02-1020  User-Friendly Text Prediction For Translators
W02-1021  Generation Of Word Graphs In Statistical Machine Translation
W02-1022  Bootstrapping Lexical Choice Via Multiple-Sequence Alignment
W02-1037  Processing Comparable Corpora With Bilingual Suffix Trees
W02-1039  Phrasal Cohesion And Statistical Machine Translation
W02-1405  Improving A General-Purpose Statistical Translation Engine By Terminological Lexicons
W02-1607  Building A Training Corpus For Word Sense Disambiguation In English-To-Vietnamese Machine Translation
E03-1007  Using POS Information For SMT Into Morphologically Rich Languages
E03-1029  Automatic Construction Of Machine Translation Knowledge Using Translation Literalness
E03-1050  Using Noisy Biligual Data For Statistical Machine Translation
E03-1055  Comparison Of Alignment Templates And Maximum Entropy Models For NLP
J03-1002  A Systematic Comparison Of Various Statistical Alignment Models
J03-1005  Word Reordering And A Dynamic Programming Beam Search Algorithm For Statistical Machine Translation
N03-1010  Greedy Decoding For Statistical Machine Translation In Almost Linear Time
N03-1017  Statistical Phrase-Based Translation
N03-1019  A Weighted Finite State Transducer Implementation Of The Alignment Template Model For Statistical Machine Translation
N03-2006  Adaptation Using Out-Of-Domain Corpus Within EBMT
N03-2016  Cognates Can Improve Statistical Translation Models
N03-2036  A Phrase-Based Unigram Model For Statistical Machine Translation
N03-4001  TIPS: A Translingual Information Processing System
P03-1003  A Noisy-Channel Approach To Question Answering
P03-1011  Loosely Tree-Based Alignment For Machine Translation
P03-1012  A Probability Model To Improve Word Alignment
P03-1016  Synonymous Collocation Extraction Using Translation Information
P03-1039  Chunk-Based Statistical Translation
P03-1040  Feature-Rich Statistical Translation Of Noun Phrases
P03-1041  Effective Phrase Translation Extraction From Alignment Models
P03-1050  Unsupervised Learning Of Arabic Stemming Using A Parallel Corpus
P03-1051  Language Model Based Arabic Word Segmentation
P03-2017  Towards Interactive Text Understanding
W03-0301  An Evaluation Exercise For Word Alignment
W03-0302  ProAlign: Shared Task System Description
W03-0303  Word Alignment Based On Bilingual Bracketing
W03-0304  Statistical Translation Alignment With Compositionality Constraints
W03-0305  Reducing Parameter Space For Word Alignment
W03-0309  The Duluth Word Alignment System
W03-0310  Bootstrapping Parallel Corpora
W03-0313  Translation Spotting For Translation Memories
W03-0315  Efficient Optimization For Bilingual Sentence Alignment Based On Linear Regression
W03-0321  Comparing The Sentence Alignment Yield From Two News Corpora Using A Dictionary-Based Alignment System
W03-0413  Confidence Estimation For Translation Prediction
W03-0414  Using 'smart' Bilingual Projection To Feature-Tag A Monolingual Dictionary
W03-0602  Words And Pictures In The News
W03-0604  Towards A Framework For Learning Structured Shape Models From Text-Annotated Images
W03-0608  Why Can't Jose Read? The Problem Of Learning Semantic Associations In A Robot Environment
W03-1001  A Projection Extension Algorithm For Statistical Machine Translation
W03-1002  Statistical Machine Translation Using Coercive Two-Level Syntactic Transduction
W03-1003  Cross-Lingual Lexical Triggers In Statistical Language Modeling
W03-1502  Automatic Extraction Of Named Entity Translingual Equivalence Based On Multi-Feature Cost Minimization
W03-1506  Multi-Language Named-Entity Recognition System Based On HMM
W03-1508  Transliteration Of Proper Names In Cross-Lingual Information Retrieval
W03-1714  Semantic Maps For Word Alignment In Bilingual Parallel Corpora
C04-1005  Improving Statistical Word Alignment With A Rule-Based Machine Translation System
C04-1006  Improved Word Alignment Using A Symmetric Lexicon Model
C04-1015  Example-Based Machine Translation Based On Syntactic Transfer With Statistical Models
C04-1031  Word To Word Alignment Strategies
C04-1032  Symmetric Word Alignments For Statistical Machine Translation
C04-1045  Improving Word Alignment Quality Using Morpho-Syntactic Information
C04-1046  Confidence Estimation For Machine Translation
C04-1047  Using A Mixture Of N-Best Lists From Multiple MT Systems In Rank-Sum-Based Confidence Measure For MT Outputs
C04-1051  Unsupervised Construction Of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources
C04-1059  Language Model Adaptation For Statistical Machine Translation Via Structured Query Models
C04-1060  Syntax-Based Alignment: Supervised Or Unsupervised?
C04-1073  Improving A Statistical MT System With Automatically Learned Rewrite Patterns
C04-1090  A Path-Based Transfer Model For Machine Translation
C04-1091  An Algorithmic Framework For Solving The Decoding Problem In Statistical Machine Translation
C04-1168  A Unified Approach In Speech-To-Speech Translation: Integrating Features Of Speech Recognition And Machine Translation
J04-2003  Statistical Machine Translation With Scarce Resources Using Morpho-Syntactic Information
J04-2004  Machine Translation With Inferred Stochastic Finite-State Transducers
J04-4002  The Alignment Template Approach To Statistical Machine Translation
N04-1008  Automatic Question Answering: Beyond The Factoid
N04-1021  A Smorgasbord Of Features For Statistical Machine Translation
N04-1034  Improved Machine Translation Performance Via Parallel Sentence Extraction From Comparable Corpora
N04-1036  Improving Named Entity Translation Combining Phonetic And Semantic Similarities
N04-4003  Example-Based Rescoring Of Statistical Machine Translation Output
N04-4015  Morphological Analysis For Statistical Machine Translation
N04-4026  A Unigram Orientation Model For Statistical Machine Translation
P04-1022  Collocation Translation Acquisition Using Monolingual Corpora
P04-1023  Statistical Machine Translation With Word- And Sentence-Aligned Parallel Corpora
P04-1063  Multi-Engine Machine Translation With Voted Language Model
P04-1064  Aligning Words Using Matrix Factorisation
P04-1066  Improving IBM Word Alignment Model 1
P04-1083  Statistical Machine Translation By Parsing
P04-3002  Improving Domain-Specific Word Alignment For Computer Assisted Translation
P04-3005  Customizing Parallel Corpora At The Document Level
P04-3014  Improving Bitext Word Alignments Via Syntax-Based Reordering Of English
W04-0857  Generative Models For Semantic Role Labeling
W04-1118  Do We Need Chinese Word Segmentation For Statistical Machine Translation?
W04-1513  Synchronous Dependency Insertion Grammars: A Grammar Formalism For Syntax Based Statistical MT
W04-2207  Identifying Correspondences Between Words: An Approach Based On A Bilingual Syntactic Analysis Of French/English Parallel Corpora
W04-3207  Bilingual Parsing With Factored Estimation: Using English To Parse Korean
W04-3208  Mining Very-Non-Parallel Corpora: Parallel Sentence And Lexicon Extraction Via Bootstrapping And EM
W04-3216  A Phrase-Based HMM Approach To Document/Abstract Alignment
W04-3219  Monolingual Machine Translation For Paraphrase Generation
W04-3225  Adaptive Language And Translation Models For Interactive Machine Translation
W04-3226  Improving Word Alignment Models Using Structured Monolingual Corpora
W04-3227  Phrase Pair Rescoring With Term Weighting For Statistical Machine Translation
W04-3228  Dependencies Vs. Constituents For Tree-Based Alignment
W04-3245  From Machine Translation To Computer Assisted Translation Using Finite-State Models
W04-3248  A New Approach For English-Chinese Named Entity Alignment
W04-3250  Statistical Significance Tests For Machine Translation Evaluation
W04-3255  Efficient Decoding For Statistical Machine Translation With A Fully Expanded WFST Model
H05-1011  A Discriminative Framework For Bilingual Word Alignment
H05-1012  A Maximum Entropy Word Aligner For Arabic-English Machine Translation
H05-1021  Local Phrase Reordering Models For Statistical Machine Translation
H05-1022  HMM Word And Phrase Alignment For Statistical Machine Translation
H05-1023  Inner-Outer Bracket Models For Word Alignment Using Hidden Blocks
H05-1024  Alignment Link Projection Using Transformation-Based Learning
H05-1057  Matching Inconsistently Spelled Names In Automatic Speech Recognizer Output For Information Retrieval
H05-1061  Mining Key Phrase Translations From Web Corpora
H05-1095  Translating With Non-Contiguous Phrases
H05-1096  Word-Level Confidence Estimation For Machine Translation Using Phrase-Based Translation Models
H05-1097  Word-Sense Disambiguation For Machine Translation
H05-1110  Inducing A Multilingual Dictionary From A Parallel Multitext In Related Languages
I05-1041  Improving Statistical Word Alignment with Ensemble Methods
I05-1042  Empirical Study of Utilizing Morph-Syntactic Information in SMT
I05-1051  Phrase-Based Statistical Machine Translation: A Level of Detail Approach
I05-2003  A Hybrid Chinese Language Model based on a Combination of Ontology with Statistical Method
I05-2012  Automatic Extraction of English-Korean Translations for Constituents of Technical Terms
I05-2014  BLEU in Characters: Towards Automatic MT Evaluation in Languages without Word Delimiters
I05-3011  Learning a Log-Linear Model with Bilingual Phrase-Pair Features for Statistical Machine Translation
I05-4010  Harvesting the Bitexts of the Laws of Hong Kong From the Web
I05-5001  Support Vector Machines for Paraphrase Identification and Corpus Construction
I05-5002  Automatically Constructing a Corpus of Sentential Paraphrases
J05-3002  Sentence Fusion For Multidocument News Summarization
J05-4003  Improving Machine Translation Performance By Exploiting Non-Parallel Corpora
J05-4004  Induction Of Word And Phrase Alignments For Automatic Document Summarization
P05-1009  Towards Developing Generation Algorithms For Text-To-Text Applications
P05-1032  Scaling Phrase-Based Statistical Machine Translation To Larger Corpora And Longer Phrases
P05-1033  A Hierarchical Phrase-Based Model For Statistical Machine Translation
P05-1057  Log-Linear Models For Word Alignment
P05-1058  Alignment Model Adaptation For Domain-Specific Word Alignment
P05-1059  Stochastic Lexicalized Inversion Transduction Grammar For Alignment
P05-1066  Clause Restructuring For Statistical Machine Translation
P05-1067  Machine Translation Using Probabilistic Synchronous Dependency Insertion Grammars
P05-1068  Context-Dependent SMT Model Using Bilingual Verb-Noun Collocation
P05-1074  Paraphrasing With Bilingual Parallel Corpora
P05-2016  Dependency-Based Statistical Machine Translation
P05-2022  Using Bilingual Dependencies To Align Words In English/French Parallel Corpora
W05-0612  An Expectation Maximization Approach To Pronoun Resolution
W05-0614  Intentional Context In Situated Natural Language Learning
W05-0712  An Integrated Approach For Arabic-English Named Entity Translation
W05-0801  Association-Based Bilingual Word Alignment
W05-0804  Bilingual Word Spectral Clustering For Statistical Machine Translation
W05-0806  Augmenting A Small Parallel Text With Morpho-Syntactic Language
W05-0809  Word Alignment For Languages With Scarce Resources
W05-0810  NUKTI: English-Inuktitut Word Alignment System Description
W05-0812  Improved HMM Alignment Models For Languages With Scarce Resources
W05-0814  ISI's Participation In The Romanian-English Alignment Task
W05-0815  Experiments Using MAR For Aligning Corpora
W05-0816  Comparison Selection And Use Of Sentence Alignment Algorithms For New Language Pairs
W05-0817  Combined Word Alignments
W05-0823  Statistical Machine Translation Of Euparl Data By Using Bilingual N-Grams
W05-0825  A Generalized Alignment-Free Phrase Extraction
W05-0826  Combining Linguistic Data Views For Phrase-Based SMT
W05-0829  Competitive Grouping In Integrated Phrase Segmentation And Alignment Model
W05-0835  A Recursive Statistical Translation Model
W05-0836  Training And Evaluating Error Minimization Decision Rules For Statistical Machine Translation
W05-1010  Automatic Acquisition Of Bilingual Rules For Extraction Of Bilingual Word Pairs From Parallel Corpora
W05-1208  A Probabilistic Setting And Lexical Coocurrence Model For Textual Entailment
E06-1004  Computational Complexity Of Statistical Machine Translation
E06-1005  Computing Consensus Translation For Multiple Machine Translation Systems Using Enhanced Hypothesis Alignment
E06-1019  A Comparison Of Syntactically Motivated Word Alignment Spaces
E06-1020  Improved Lexical Alignment By Combining Multiple Reified Alignments
E06-1046  Edit Machines For Robust Multimodal Language Processing
E06-2002  A Web-Based Demonstrator Of A Multi-Lingual Phrase-Based Translation System
J06-4004  N-gram-based Machine Translation
N06-1002  Do We Need Phrases? Challenging The Conventional Wisdom In Statistical Machine Translation
N06-1003  Improved Statistical Machine Translation Using Paraphrases
N06-1004  Segment Choice Models: Feature-Rich Models For Global Distortion In Statistical Machine Translation
N06-1013  A Maximum Entropy Approach To Combining Word Alignments
N06-1014  Alignment By Agreement
N06-1056  Learning For Semantic Parsing With Statistical Machine Translation
N06-1057  ParaEval: Using Paraphrases To Evaluate Summaries Automatically
N06-2051  Bridging The Inflection Morphology Gap For Arabic Statistical Machine Translation
N06-3004  Efficient Algorithms For Richer Formalisms: Parsing And Machine Translation
N06-4004  MTTK: An Alignment Toolkit For Statistical Machine Translation
P06-1002  Going Beyond AER: An Extensive Analysis Of Word Alignments And Their Impact On MT
P06-1009  Discriminative Word Alignment With Conditional Random Fields
P06-1011  Extracting Parallel Sub-Sentential Fragments From Non-Parallel Corpora
P06-1032  Correcting ESL Errors Using Phrasal SMT Techniques
P06-1062  A DOM Tree Alignment Model For Mining Parallel Data From The Web
P06-1065  Improved Discriminative Bilingual Word Alignment
P06-1067  Distortion Models For Statistical Machine Translation
P06-1072  Annealing Structural Bias In Multilingual Weighted Grammar Induction
P06-1077  Tree-To-String Alignment Template For Statistical Machine Translation
P06-1082  Word Alignment In English-Hindi Parallel Corpus Using Recency-Vector Approach: Some Studies
P06-1091  A Discriminative Global Training Algorithm For Statistical MT
P06-1096  An End-To-End Discriminative Approach To Machine Translation
P06-1097  Semi-Supervised Training For Statistical Word Alignment
P06-1098  Left-To-Right Target Generation For Hierarchical Phrase-Based Translation
P06-1119  Leveraging Reusability: Cost-Effective Lexical Acquisition For Large-Scale Ontology Translation
P06-1122  Modelling Lexical Redundancy For Machine Translation
P06-2005  A Phrase-Based Statistical Model For SMS Text Normalization
P06-2014  Soft Syntactic Constraints For Word Alignment Through Discriminative Training
P06-2061  Integration Of Speech To Computer-Assisted Translation Using Finite-State Automata
P06-2065  Unsupervised Analysis For Decipherment Problems
P06-2070  Stochastic Iterative Alignment For Machine Translation Evaluation
P06-2092  ATLAS - A New Text Alignment Architecture
P06-2093  Continuous Space Language Models For Statistical Machine Translation
P06-2103  Discourse Generation Using Utility-Trained Coherence Models
P06-2107  Statistical Phrase-Based Models For Interactive Computer-Assisted Translation
P06-2111  Finding Synonyms Using Automatic Word Alignment And Measures Of Distributional Similarity
P06-2112  Word Alignment For Languages With Scarce Resources Using Bilingual Corpora Of Other Language Pairs
P06-2117  Boosting Statistical Word Alignment Using Labeled And Unlabeled Data
P06-2124  BiTAM: Bilingual Topic AdMixture Models For Word Alignment
W06-1204  Using Information About Multi-Word Expressions For The Word-Alignment Task
W06-1606  SPMT: Statistical Machine Translation With Syntactified Target Language Phrases
W06-1607  Phrasetable Smoothing For Statistical Machine Translation
W06-1609  Statistical Machine Reordering
W06-1626  Distributed Language Modeling For $N$-Best List Re-Ranking
W06-1627  Efficient Search For Inversion Transduction Grammar
W06-1628  A Discriminative Model For Tree-To-Tree Translation
W06-2008  Projecting POS Tags And Syntactic Dependencies From English And French To Polish In Aligned Corpora
W06-2402  Grouping Multi-Word Expressions According To Part-Of-Speech In Statistical Machine Translation
W06-3102  Initial Explorations In English To Turkish Statistical Machine Translation
W06-3104  Quasi-Synchronous Grammars: Alignment By Soft Projection Of Syntactic Dependencies
W06-3105  Why Generative Phrase Models Underperform Surface Heuristics
W06-3107  Searching For Alignments In SMT: A Novel Approach Based On An Estimation Of Distribution Algorithm
W06-3109  Generalized Stack Decoding Algorithms For Statistical Machine Translation
W06-3111  Partitioning Parallel Documents Using Binary Segmentation
W06-3116  Mood At Work: Ramses Versus Pharaoh
W06-3119  Syntax Augmented Machine Translation Via Chart Parsing
W06-3123  Constraining The Phrase-Based Joint Probability Statistical Translation Model
W06-3124  Microsoft Research Treelet Translation System: Meeting Of The North American Association For Computational Linguistics 2006 Europarl Evaluation
W06-3125  N-Gram-Based SMT System Enhanced With Reordering Patterns
W06-3126  The LDV-COMBO System For SMT
W06-3601  A Syntax-Directed Translator With Extended Domain Of Locality
W06-3602  Efficient Dynamic Programming Search Algorithms For Phrase-Based SMT
D07-1003  What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA
D07-1005  Improving Word Alignment with Bridge Languages
D07-1006  Getting the Structure Right for Word Alignment: LEAF
D07-1025  A Sequence Alignment Model Based on the Averaged Perceptron
D07-1030  Using RBMT Systems to Produce Bilingual Corpus for SMT
D07-1038  Syntactic Re-Alignment Models for Machine Translation
D07-1045  Smooth Bilingual $N$-Gram Translation
D07-1079  What Can Syntax-Based MT Learn from Phrase-Based MT?
D07-1090  Large Language Models in Machine Translation
D07-1103  Improving Translation Quality by Discarding Most of the Phrasetable
N07-1008  Direct Translation Model 2
N07-1022  Generation by Inverting a Semantic Parser that Uses Statistical Machine Translation
N07-1046  A Log-Linear Block Transliteration Model based on Bi-Stream HMMs
N07-1057  Multilingual Structural Projection across Interlinear Text
N07-1061  A Comparison of Pivot Methods for Phrase-Based Statistical Machine Translation
N07-1064  Statistical Phrase-Based Post-Editing
N07-2009  Generalized Graphical Abstractions for Statistical Machine Translation
N07-2010  Situated Models of Meaning for Sports Video Retrieval
N07-2022  Discriminative Alignment Training without Annotated Data for Machine Translation
N07-2034  An Integrated Architecture for Speech-Input Multi-Target Machine Translation
P07-1001  Guiding Statistical Word Alignment Models With Prior Knowledge
P07-1003  Tailoring Word Alignments to Syntactic Machine Translation
P07-1004  Transductive learning for statistical machine translation
P07-1011  Detecting Erroneous Sentences using Automatically Mined Sequential Patterns
P07-1016  Semantic Transliteration of Personal Names
P07-1020  Statistical Machine Translation through Global Lexical Selection and Sentence Reconstruction
P07-1039  Bootstrapping Word Alignment via Word Packing
P07-1047  Automated Vocabulary Acquisition and Interpretation in Multimodal Conversational Systems
P07-1059  Statistical Machine Translation for Query Expansion in Answer Retrieval
P07-1082  Collapsed Consonant and Vowel Models: New Approaches for English-Persian Transliteration and Back-Transliteration
P07-1090  Ordering Phrases with Function Words
P07-1092  Machine Translation by Triangulation: Making Effective Use of Multi-Parallel Corpora
P07-1108  Pivot Language Approach for Phrase-Based Statistical Machine Translation
P07-3010  Adaptive String Distance Measures for Bilingual Dialect Lexicon Induction
W07-0212  How Difficult is it to Develop a Perfect Spell-checker? A Cross-Linguistic Analysis through Complex Network Approach
W07-0403  Inversion Transduction Grammar for Joint Phrasal Translation Modeling
W07-0409  Combining Morphosyntactic Enriched Representation with n-best Reranking in Statistical Translation
W07-0412  Probabilistic Synchronous Tree-Adjoining Grammars for Machine Translation: The Argument from Bilingual Dictionaries
W07-0703  Integration of an Arabic Transliteration Module into a Statistical Machine Translation System
W07-0705  Can We Translate Letters?
W07-0708  Speech-Input Multi-Target Machine Translation
W07-0709  Meta-Structure Transformation Model for Statistical Machine Translation
W07-0715  An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation
W07-0717  Mixture-Model Adaptation for SMT
W07-0721  Analysis of Statistical and Morphological Classes to Generate Weigthed Reordering Hypotheses on a Statistical Machine Translation System
W07-0724  NRC's PORTAGE System for WMT 2007
W07-1205  Deep Grammars in a Tree Labeling Approach to Syntax-based Statistical Machine Translation
C08-1007  Enhancing Multilingual Latent Semantic Analysis with Term Alignment Information
C08-1014  Regenerating Hypotheses for Statistical Machine Translation
C08-1093  Translating Queries into Snippets for Improved Query Expansion
C08-1128  Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation
C08-1136  Extracting Synchronous Grammar Rules From Word-Level Alignments in Linear Time
C08-1138  Grammar Comparison Study for Translational Equivalence Modeling and Statistical Machine Translation
D08-1026  Incorporating Temporal and Semantic Information with Eye Gaze for Automatic Word Acquisition in Multimodal Conversational Systems
D08-1033  Sampling Alignment Structure under a Bayesian Translation Model
D08-1039  Triplet Lexicon Models for Statistical Machine Translation
D08-1043  Bridging Lexical Gaps between Queries and Questions on Large Online Q&A Collections with Compact Translation Models
D08-1053  Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model
D08-1078  Predicting Success in Machine Translation
D08-1082  A Generative Model for Parsing Natural Language to Meaning Representations
D08-1084  A Phrase-Based Alignment Model for Natural Language Inference
I08-1013  An Effective Compositional Model for Lexical Alignment
I08-1024  A Comparative Study for Query Translation using Linear Combination and Confidence Measure
I08-1033  Improving Word Alignment by Adjusting Chinese Word Segmentation
I08-1043  Paraphrasing Depending on Bilingual Context Toward Generalization of Translation Knowledge
I08-1068  Statistical Machine Translation Models for Personalized Search
I08-2104  Statistical Machine Translation based Passage Retrieval for Cross-Lingual Question Answering
I08-6006  Statistical Transliteration for Cross Language Information Retrieval using HMM alignment model and CRF
I08-8001  Transformation-based Sentence Splitting method for Statistical Machine Translation
L08-1057  Annotation Guidelines for Chinese-Korean Word Alignment
L08-1132  Phrase-Based Machine Translation based on Simulated Annealing
L08-1286  Bilingual Text Classification using the IBM 1 Translation Model
L08-1564  Automatic Translation of Biomedical Terms by Supervised Machine Learning
L08-1572  Sentence Alignment in DPC: Maximizing Precision Minimizing Human Effort
L08-1582  Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion
P08-1010  Phrase Table Training for Precision and Recall: What Makes a Good Phrase and a Good Phrase Pair?
P08-1012  Bayesian Learning of Non-Compositional Phrases with Synchronous Parsing
P08-1019  Searching Questions by Identifying Question Topic and Question Focus
P08-1058  Randomized Language Models via Perfect Hash Functions
P08-1082  Learning to Rank Answers on Large Online QA Collections
P08-1113  Mining Parenthetical Translations from the Web by Word Alignment
P08-1116  Combining Multiple Resources to Improve SMT-based Paraphrasing Model
W08-0306Using Syntax to Improve Word Alignment Precision for Syntax-Based Machine Translation
W08-0307Using Shallow Syntax Information to Improve Word Alignment and Reordering for SMT
W08-0321Improving Word Alignment with Language Model Based Confidence Scores
W08-0326MaTrEx: The DCU MT System for WMT 2008
W08-0333Fast Easy and Cheap: Construction of Statistical Machine Translation Models with MapReduce
W08-0408Multiple Reorderings in Phrase-Based Machine Translation
W08-0409Improving Word Alignment Using Syntactic Dependencies
W08-0509Parallel Implementations of Word Alignment Tool




Top Similar Papers
By Title
ID Title
C80-1091 A Mathematical Model Of The Vocabulary-Text Relation
P07-1104 A Comparative Study of Parameter Estimation Methods for Statistical Natural Language Processing
P02-1001 Parameter Estimation For Probabilistic Finite-State Transducers
H92-1028 Parameter Estimation For Constrained Context-Free Language Models
H91-1046 A Trellis-Based Algorithm For Estimating The Parameters Of Hidden Stochastic Context-Free Grammar
W04-1302 On Statistical Parameter Setting
W02-2018 A Comparison Of Algorithms For Maximum Entropy Parameter Estimation
W07-0716 Using Paraphrases for Parameter Tuning in Statistical Machine Translation
C92-1060 An Algorithm For Estimating The Parameters Of Unrestricted Hidden Stochastic Context-Free Grammars
J84-3001 On The Mathematical Properties Of Linguistic Theories


By Abstract
ID Title


By Full Text
ID Title
H94-1028 The Candide System For Machine Translation
J96-1002 A Maximum Entropy Approach To Natural Language Processing
W93-0301 Robust Bilingual Word Alignment For Machine Aided Translation
H93-1039 But Dictionaries Are Data Too
J04-4002 The Alignment Template Approach To Statistical Machine Translation
C88-1016 A Statistical Approach To Language Translation
C98-2153 A DP based Search Algorithm for Statistical Machine Translation
P98-2158 A DP Based Search Algorithm For Statistical Machine Translation
E06-1004 Computational Complexity Of Statistical Machine Translation
H92-1053 Dividing And Conquering Long Sentences In A Translation System


By Co-citation
ID Title Num Co-citations
J03-1002 A Systematic Comparison Of Various Statistical Alignment Models 120
N03-1017 Statistical Phrase-Based Translation 97
P02-1040 Bleu: A Method For Automatic Evaluation Of Machine Translation 85
P00-1056 Improved Statistical Alignment Models 82
C96-2141 HMM-Based Word Alignment In Statistical Translation 80
J97-3002 Stochastic Inversion Transduction Grammars And Bilingual Parsing Of Parallel Corpora 71
P03-1021 Minimum Error Rate Training In Statistical Machine Translation 69
P01-1067 A Syntax-Based Statistical Translation Model 62
W99-0604 Improved Alignment Models For Statistical Machine Translation 51
J90-2002 A Statistical Approach To Machine Translation 50


Citation Summary
Citing sentences
D08-1053 1 22:157 Compared with clean parallel corpora such as "Hansard" (Brown et al. 1993), which consists of 505 French-English translations of political debates in the Canadian parliament, texts from the web are far more diverse and noisy.
D08-1053 2 8:157 1 Introduction Sentence-aligned parallel bilingual corpora have been essential resources for statistical machine translation (Brown et al. 1993), and many other multi-lingual natural language processing applications.
P03-1041 3 11:152 Re-ordering effects across languages have been modeled in several ways, including word-based (Brown et al. , 1993), template-based (Och et al. , 1999) and syntax-based (Yamada, Knight, 2001).
P03-1041 4 6:152 The traditional framework presented in (Brown et al. , 1993) assumes a generative process where the source sentence is passed through a noisy stochastic process to produce the target sentence.
P03-1041 5 8:152 Within the generative model, the Bayes reformulation is used to estimate $P(e|f) \propto P(e)\,P(f|e)$, where $P(e)$ is considered the language model, and $P(f|e)$ is the translation model; the IBM (Brown et al. , 1993) models being the de facto standard.
I08-1033 6 9:155 Most existing methods treat word tokens as basic alignment units (Brown et al., 1993; Vogel et al., 1996; Deng and Byrne, 2005), however, many languages have no explicit word boundary markers, such as Chinese and Japanese.
W01-1407 7 27:148 If we assign a probability $Pr(e_1^I|f_1^J)$ to each pair of strings $(e_1^I, f_1^J)$, then according to Bayes decision rule, we have to choose the English string that maximizes the product of the English language model $Pr(e_1^I)$ and the string translation model $Pr(f_1^J|e_1^I)$. Many existing systems for statistical machine translation (Wang and Waibel, 1997; Nießen et al. , 1998; Och and Weber, 1998) make use of a special way of structuring the string translation model like proposed by (Brown et al. , 1993): The correspondence between the words in the source and the target string is described by alignments which assign one target word position to each source word position.
W01-1407 8 97:148 Table 2 summarizes the characteristics of the training corpus used for training the parameters of Model 4 proposed in (Brown et al. , 1993).
W06-1607 9 36:162 To derive the joint counts c(s,t) from which p(s|t) and p(t|s) are estimated, we use the phrase induction algorithm described in (Koehn et al. , 2003), with symmetrized word alignments generated using IBM model 2 (Brown et al. , 1993).
W03-0310 10 40:130 This cost can often be substantial, as with the Penn Treebank (Marcus et al. , 1993).
W99-0602 11 10:212 Bilingual alignments have so far shown that they can play multiple roles in a wide range of linguistic applications, such as computer assisted translation (Isabelle et al. , 1993; Brown et al. , 1990), terminology (Dagan and Church, 1994) lexicography (Langlois, 1996; Klavans and Tzoukermann, 1995; Melamed, 1996), and cross-language information retrieval (Nie et al. , * This research was funded by the Canadian Department of Foreign Affairs and International Trade (http://~.dfait-maeci.gc.ca/), via the Agence de la francophonie (http://~.
W99-0602 12 32:212 However, in the experiments described here, we focus on alignment at the level of sentences, this for a number of reasons: First, sentence alignments have so far proven their usefulness in a number of applications, e.g. bilingual lexicography (Langlois, 1996; Klavans and Tzoukermann, 1995; Dagan and Church, 1994), automatic translation verification (Macklovitch, 1995; Macklovitch, 1996) and the automatic acquisition of knowledge about translation (Brown et al. , 1993).
W03-0301 13 98:139 Four teams had approaches that relied (to varying degrees) on an IBM model of statistical machine translation (Brown et al. , 1993).
C04-1006 14 40:187 A detailed description of the popular translation models IBM-1 to IBM-5 (Brown et al. , 1993), as well as the Hidden-Markov alignment model (HMM) (Vogel et al. , 1996) can be found in (Och and Ney, 2003).
C04-1006 15 14:187 Word alignment models were first introduced in statistical machine translation (Brown et al. , 1993).
C04-1006 16 36:187 These alignment models stem from the source-channel approach to statistical machine translation (Brown et al. , 1993).
C04-1006 17 16:187 Using the IBM translation models IBM-1 to IBM-5 (Brown et al. , 1993), as well as the Hidden-Markov alignment model (Vogel et al. , 1996), we can produce alignments of good quality.
C04-1006 18 161:187 6 Related Work The popular IBM models for statistical machine translation are described in (Brown et al. , 1993).
P07-1001 19 133:185 For the simple bag-of-word bilingual LSA as described in Section 2.2.1, after SVD on the sparse matrix using the toolkit SVDPACK (Berry et al. , 1993), all source and target words are projected into a low-dimensional (R = 88) LSA-space.
P07-1001 20 22:185 It can be applied to complicated models such IBM Model-4 (Brown et al. , 1993).
P07-1001 21 87:185 Berry et al (1993)) to yield $W \approx \hat{W} = U S V^T$ as Figure 3 shows, where, for some order $R \ll \min(M,N)$ of the decomposition, U is an $M \times R$ left singular matrix with rows $u_i$, $i = 1,\dots,M$, S is an $R \times R$ diagonal matrix of singular values $s_1 \ge s_2 \ge \dots \ge s_R \gg 0$, and V is an $N \times R$ right singular matrix with rows $v_j$, $j = 1,\dots,N$. For each i, the scaled R-vector $u_i S$ may be viewed as representing $w_i$, the i-th word in the vocabulary, and similarly the scaled R-vector $v_j S$ as representing $d_j$, the j-th document in the corpus.
P07-1001 22 23:185 We shall take HMM-based word alignment model (Vogel et al. , 1996) as an example and follow the notation of (Brown et al. , 1993).
P07-1001 23 10:185 For instance, the most relaxed IBM Model-1, which assumes that any source word can be generated by any target word equally regardless of distance, can be improved by demanding a Markov process of alignments as in HMM-based models (Vogel et al. , 1996), or implementing a distribution of number of target words linked to a source word as in IBM fertility-based models (Brown et al. , 1993).
W02-1018 24 139:145 For example, in our previous work (Marcu, 2001), we have used a statistical translation memory of phrases in conjunction with a statistical translation model (Brown et al. , 1993).
W02-1018 25 21:145 In contrast with many previous approaches (Brown et al. , 1993; Och et al. , 1999; Yamada and Knight, 2001), our model does not try to capture how Source sentences can be mapped into Target sentences, but rather how Source and Target sentences can be generated simultaneously.
W02-1018 26 3:145 1 Motivation Most of the noisy-channel-based models used in statistical machine translation (MT) (Brown et al. , 1993) are conditional probability models.
W02-1018 27 13:145 Intuitively, if we allow any Source words to be aligned to any Target words, the best alignment that we can come up with is the one in Figure 1.c. Sentence pair (S2, T2) offers strong evidence that b c in language S means the same thing as x in language T. On the basis of this evidence, we expect the system to also learn from sentence pair (S1, T1) that a in language S means the same thing as y in language T. Unfortunately, if one works with translation models that do not allow Target words to be aligned to more than one Source word as it is the case in the IBM models (Brown et al. , 1993) it is impossible to learn that the phrase b c in language S means the same thing as word x in language T. The IBM Model 4 (Brown et al. , 1993), for example, converges to the word alignments shown in Figure 1.b and learns the translation probabilities shown in Figure 1.a.2 Since in the IBM model one cannot link a Target word to more than a Source word, the training procedure 2To train the IBM-4 model, we used Giza (Al-Onaizan et al. , 1999).
W02-1018 28 6:145 A variety of methods are used to account for the re-ordering stage: word-based (Brown et al. , 1993), template-based (Och et al. , 1999), and syntax-based (Yamada and Knight, 2001), to name just a few.
C00-2162 29 15:113 Many existing systems for SMT (Wang and Waibel, 1997; Nießen et al. , 1998; Och and Weber, 1998) make use of a special way of structuring the string translation model (Brown et al. , 1993): The correspondence between the words in the source and the target string is described by alignments that assign one target word position to each source word position.
D07-1006 30 80:193 (Brown et al. , 1993) defined two local search operations for their 1-to-N alignment models 3, 4 and 5.
D07-1006 31 69:193 2.2 Unsupervised Parameter Estimation We can perform maximum likelihood estimation of the parameters of this model in a similar fashion to that of Model 4 (Brown et al. , 1993), described thoroughly in (Och and Ney, 2003).
D07-1006 32 70:193 We use Viterbi training (Brown et al. , 1993) but neighborhood estimation (Al-Onaizan et al. , 1999; Och and Ney, 2003) or pegging (Brown et al. , 1993) could also be used.
D07-1006 33 151:193 5 Previous Work The LEAF model is inspired by the literature on generative modeling for statistical word alignment and particularly by Model 4 (Brown et al. , 1993).
W00-0801 34 81:219 It is an implementation of Models 1-4 of Brown et al. [1993], where each of these models produces a Viterbi alignment.
P02-1039 35 13:224 For the IBM models defined by a pioneering paper (Brown et al. , 1993), a decoding algorithm based on a left-to-right search was described in (Berger et al. , 1996).
W03-0608 36 45:216 Fortunately, there is a straightforward parallel between our object recognition formulation and the statistical machine translation problem of building a lexicon from an aligned bitext (Brown et al. , 1993; Al-Onaizan et al. , 1999).
J98-4003 37 258:280 (p. 18) Whether this is a useful perspective for machine translation is debatable (Brown et al. 1993; Knoblock 1996)--however, it is a dead-on description of transliteration.
D07-1038 38 50:170 However, searching the space of all possible alignments is intractable for EM, so in practice the procedure is bootstrapped by models with narrower search space such as IBM Model 1 (Brown et al. , 1993) or Aachen HMM (Vogel et al. , 1996).
D07-1038 39 36:170 3.1 The traditional IBM alignment model IBM Model 4 (Brown et al. , 1993) learns a set of 4 probability tables to compute p(f|e) given a foreign sentence f and its target translation e via the following (greatly simplified) generative story: [Figure 2: A (English tree, Chinese string) pair and three different sets of multilevel tree-to-string rules that can explain it; the first set is obtained from bootstrap alignments, the second from this paper's re-alignment procedure, and the third is a viable, if poor quality, alternative that is not learned.]
P05-1009 40 9:155 In Machine Translation, for example, sentences are produced using application-specific decoders, inspired by work on speech recognition (Brown et al. , 1993), whereas in Summarization, summaries are produced as either extracts or using task-specific strategies (Barzilay, 2003).
C02-1008 41 12:161 Another kind of popular approaches to dealing with query translation based on corpus-based techniques uses a parallel corpus containing aligned sentences whose translation pairs are corresponding to each other (Brown et al. , 1993; Dagan et al. , 1993; Smadja et al. , 1996).
A94-1006 42 126:178 We have been using the output of word_align, a robust alignment program that proved useful for bilingual concordancing of noisy texts (Dagan et al. , 1993).
A94-1006 43 15:178 Word alignment is newer, found only in a few places (Gale and Church, 1991a; Brown et al. , 1993; Dagan et al. , 1993).
A94-1006 44 115:178 have been used in statistical machine translation (Brown et al. , 1990), terminology research and translation aids (Isabelle, 1992; Ogden and Gonzales, 1993; van der Eijk, 1993), bilingual lexicography (Klavans and Tzoukermann, 1990; Smadja, 1992), word-sense disambiguation (Brown et al. , 1991b; Gale et al. , 1992) and information retrieval in a multilingual environment (Landauer and Littman, 1990).
A94-1006 45 114:178 3 Bilingual Task: An Application for Word Alignment 3.1 Sentence and word alignment Bilingual alignment methods (Warwick et al. , 1990; Brown et al. , 1991a; Brown et al. , 1993; Gale and Church, 1991b; Gale and Church, 1991a; Kay and Roscheisen, 1993; Simard et al. , 1992; Church, 1993; Kupiec, 1993a; Matsumoto et al. , 1993; Dagan et al. , 1993).
A94-1006 46 14:178 Part-of-speech taggers are used in a few applications, such as speech synthesis (Sproat et al. , 1992) and question answering (Kupiec, 1993b).
A94-1006 47 117:178 Algorithms for the more difficult task of word alignment were proposed in (Gale and Church, 1991a; Brown et al. , 1993; Dagan et al. , 1993) and were applied for parameter estimation in the IBM statistical machine translation system (Brown et al. , 1993).
P01-1067 48 84:254 To make this paper comparable to (Brown et al. , 1993), we use English-French notation in this section.
P01-1067 49 15:254 Mathematical details are fully described in (Brown et al. , 1993).
P01-1067 50 171:254 Let $f_k^l = f_k \ldots f_{k+(l-1)}$ be a substring of $f$ from the word $f_k$ with length $l$. Note this notation is different from (Brown et al. , 1993).
P01-1067 51 40:254 Following (Brown et al. , 1993) and the other literature in TM, this paper only focuses the details of TM.
P07-1039 52 103:170 4.3 Baseline We use a standard log-linear phrase-based statistical machine translation system as a baseline: GIZA++ implementation of IBM word alignment model 4 (Brown et al. , 1993; Och and Ney, 2003),8 the refinement and phrase-extraction heuristics described in (Koehn et al. , 2003), minimum-error-rate training 7More specifically, we choose the first English reference from the 7 references and the Chinese sentence to construct new sentence pairs.
P07-1039 53 26:170 To quickly (and approximately) evaluate this phenomenon, we trained the statistical IBM word-alignment model 4 (Brown et al. , 1993), using the GIZA++ software (Och and Ney, 2003) for the following language pairs: Chinese-English, Italian-English, and Dutch-English, using the IWSLT-2006 corpus (Takezawa et al. , 2002; Paul, 2006) for the first two language pairs, and the Europarl corpus (Koehn, 2005) for the last one.
P07-1039 54 35:170 They can be seen as extensions of the simpler IBM models 1 and 2 (Brown et al. , 1993).
P07-1039 55 7:170 Most current statistical models (Brown et al. , 1993; Vogel et al. , 1996; Deng and Byrne, 2005) treat the aligned sentences in the corpus as sequences of tokens that are meant to be words; the goal of the alignment process is to find links between source and target words.
P98-2221 56 76:144 The subset was the neighboring alignments (Brown et al. , 1993) of the Viterbi alignments discovered by Model 1 and Model 2.
P98-2221 57 6:144 1 Introduction Most (if not all) statistical machine translation systems employ a word-based alignment model (Brown et al. , 1993; Vogel, Ney, and Tillman, 1996; Wang and Waibel, 1997), which treats words in a sentence as independent entities and ignores the structural relationship among them.
C04-1168 58 33:197 According to the statistical machine translation formalism (Brown et al. , 1993), the translation process is to search for the best sentence $\hat{E}$ such that $\hat{E} = \arg\max_E P(E|J) = \arg\max_E P(J|E)P(E)$ where $P(J|E)$ is a translation model characterizing the correspondence between E and J; P(E), the English language model probability.
W06-1204 59 215:232 State-of-art systems for doing word alignment use generative models like GIZA++ (Och and Ney, 2003; Brown et al. , 1993).
W07-0409 60 7:156 1 Introduction Recent works in statistical machine translation (SMT) shows how phrase-based modeling (Och and Ney, 2000a; Koehn et al. , 2003) significantly outperform the historical word-based modeling (Brown et al. , 1993).
P96-1021 61 10:177 Estimation of the parameters has been described elsewhere (Brown et al. , 1993).
P96-1021 62 27:177 Such linguistic-preprocessing techniques could 1Various models have been constructed by the IBM team (Brown et al. , 1993).
P03-1016 63 116:213 2.3.4 Word Translation Probability Estimation Many methods are used to estimate word translation probabilities from unparallel or parallel bilingual corpora (Koehn and Knight, 2000; Brown et al. , 1993).
P03-1016 64 100:213 Equation (2) is rewritten as: $p(e_{col}|c_{col}) = p(e_1|c_{col})\,p(e_2|c_{col})\,p(r_e|c_{col}) = p(e_1|c_1)\,p(e_2|c_2)\,p(r_e|r_c)$ (3) It is equal to a word translation model if we take the relation type in the collocations as an element like a word, which is similar to Model 1 in (Brown et al. , 1993).
W03-0604 65 41:196 (1993) (as in Duygulu et al. , 2002), and extend it to structured shape descriptions of visual data.
W03-0604 66 194:196 We have developed a set of extensions to a probabilistic translation model (Brown et al. , 1993) that enable us to successfully merge oversegmented regions into coherent objects.
W03-0604 67 79:196 Probabilistic translation models generally seek to find the translation string e that maximizes the probability Pr(e|f), given the source string f (where f referred to French and e to English in the original work, Brown et al. , 1993).
C04-1015 68 13:201 On the other hand, statistical MT employing IBM models (Brown et al. , 1993) translates an input sentence by the combination of word transfer and word re-ordering.
D07-1103 69 23:214 These joint counts are estimated using the phrase induction algorithm described in (Koehn et al. , 2003), with symmetrized word alignments generated using IBM model 2 (Brown et al. , 1993).
N04-4026 70 37:109 The orientation model is related to the distortion model in (Brown et al. , 1993), but we do not compute a block alignment during training.
W08-0326 71 36:80 Assuming that the parameters P(etk|fsk) are known, the most likely alignment is computed by a simple dynamic-programming algorithm.1 Instead of using an Expectation-Maximization algorithm to estimate these parameters, as commonly done when performing word alignment (Brown et al., 1993; Och and Ney, 2003), we directly compute these parameters by relying on the information contained within the chunks.
J97-2004 72 109:485 Notice that most in-context and dictionary translations of source words are bounded within the same category in a typical thesaurus such as the LLOCE (McArthur 1992) and CILIN (Mei et al. 1993).
J97-2004 73 248:485 The above observations can be stated formally from the perspective of Brown et al.'s (1993) Model 2.
J97-2004 74 372:485 In terms of alignment, this word-number difference means that multiword connections must be considered, a task which is beyond the reach of methods proposed in recent alignment works based on Brown et al.'s (1993) Model 1 and 2.
P01-1050 75 33:154 The rest of the factors denote distortion probabilities (d), which capture the probability that words change their position when translated from one language into another; the probability of some French words being generated from an invisible English NULL element ($p_1$), etc. See (Brown et al. , 1993) or (Germann et al. , 2001) for a detailed discussion of this translation model and a description of its parameters.
P01-1050 76 11:154 In this framework, the source language, let's say English, is assumed to be generated by a noisy probabilistic source. Most of the current statistical MT systems treat this source as a sequence of words (Brown et al. , 1993).
P01-1050 77 24:154 First, we show how one can use an existing statistical translation model (Brown et al. , 1993) in order to automatically derive a statistical TMEM.
P01-1050 78 28:154 2 The IBM Model 4 For the work described in this paper we used a modified version of the statistical machine translation tool developed in the context of the 1999 Johns Hopkins Summer Workshop (Al-Onaizan et al. , 1999), which implements IBM translation model 4 (Brown et al. , 1993).
W06-2008 79 69:175 GIZA++ consists of a set of statistical translation models of different complexity, namely the IBM ones (Brown et al. , 1993).
P97-1037 80 10:219 Among all possible target strings, we will choose the one with the highest probability which is given by Bayes' decision rule (Brown et al 1993): $\hat{e}_1^I = \arg\max_{e_1^I} \{Pr(e_1^I|f_1^J)\} = \arg\max_{e_1^I} \{Pr(e_1^I) \cdot Pr(f_1^J|e_1^I)\}$.
P97-1037 81 44:219 Therefore the probability of alignment $a_j$ for position j should have a dependence on the previous alignment position $a_{j-1}$: $p(a_j|a_{j-1})$. A similar approach has been chosen by (Dagan et al. , 1993) and (Vogel et al 1996).
P97-1037 82 41:219 The concept of these alignments is similar to the ones introduced by (Brown et al. , 1993), but we will use another type of dependence in the probability distributions.
P97-1037 83 67:219 The IBM model 1 (Brown et al. , 1993) is used to find an initial estimate of the translation probabilities.
P97-1037 84 24:219 Models describing these types of dependencies are referred to as alignment models (Brown et al. , 1993), (Dagan et al. 1993).
D08-1043 85 43:186 Usually the IBM Model 1, developed in the statistical machine translation field (Brown et al., 1993), is used to construct translation models for retrieval purposes in practice.
C00-1078 86 62:146 These transfer rules are pairs of corresponding rooted substructures, where a substructure (Matsumoto et al. , 1993) is a connected set of arcs and nodes.
C00-1078 87 11:146 In our Machine Translation system, transfer rules are generated automatically from parsed parallel text along the lines of (Matsumoto et al. , 1993; Meyers et al. , 1996; Meyers et al. , 1998b).
W05-0815 88 26:82 The idea is that the translation of a sentence x into a sentence y can be performed in the following steps: (a) If x is small enough, IBM's model 1 (Brown et al. , 1993) is employed for the translation.
C04-1047 89 83:195 For scoring MT outputs, the proposed RSCM uses a score based on a translation model called IBM4 (Brown et al. , 1993) (TM-score) and a score based on a language model for the translation target language (LM-score).
P02-1051 90 92:229 The score for a given candidate a9 is given by a modified IBM Model 1 probability (Brown et al. , 1993) as follows: a2a4a3a6a9a21a10a13a12a15a7a14a2 a15 a24a26a17a16 a2a4a3a6a9a19a18 a14a15a10a12 a7 (4) a2 a15 a20 a24a16a22a21a24a23a26a25a1a27a28a27a28a27 a20 a24a16a30a29a1a23a26a25 a31 a32 a33 a23a35a34a37a36 a3a38a12 a33 a10a12a9 a16a8a39 a7 (5) where a40 is the length of a9, a41 is the length of a12, a15 is a scaling factor based on the number of matches of a9 found, and a14 a33 is the index of the English word aligned with a12 a33 according to alignment a14 . The probability a36 a3a6a9 a16a8a39 a10a12 a33 a7 is a linear combination of the transliteration and translation score, where the translation score is a uniform probability over all dictionary entries for a12 a33 . The scored matches form the list of translation candidates.
P06-1082 91 9:160 1 Introduction Several approaches including statistical techniques (Gale and Church, 1991; Brown et al. , 1993), lexical techniques (Huang and Choi, 2000; Tiedemann, 2003) and hybrid techniques (Ahrenberg et al. , 2000), have been pursued to design schemes for word alignment which aims at establishing links between words of a source language and a target language in a parallel corpus.
P06-1082 92 113:160 Use of sententially aligned corpora for word alignment has already been recommended in (Brown et al. , 1993).
N07-2034 93 36:83 A monotonous segmentation copes with monotonous alignments, that is, $j < k \Rightarrow a_j < a_k$ following the notation of (Brown et al. , 1993).
W02-1012 94 6:186 We refer to a3a16a5a7 as the source language string and a10 a11a7 as the target language string in accordance with the noisy channel terminology used in the IBM models of (Brown et al. , 1993).
W02-1012 95 20:186 (1993) and the HMM alignment model of (Vogel et al. , 1996).
E06-1046 96 15:194 First, a parsing-based approach attempts to recover partial parses from the parse chart when the input cannot be parsed in its entirety due to noise, in order to construct a (partial) semantic representation (Dowding et al. , 1993; Allen et al. , 2001; Ward, 1991).
E06-1046 97 61:194 We adopt an approach, similar to (Ciaramella, 1993; Boros et al. , 1996), in which the meaning representation, in our case XML, is transformed into a sorted flat list of attribute-value pairs indicating the core contentful concepts of each command.
E06-1046 98 28:194 To this end, we adopt techniques from statistical machine translation (Brown et al. , 1993; Och and Ney, 2003) and use statistical alignment to learn the edit patterns.
H05-1096 99 96:156 Therefore, we determine the maximal translation probability of the target word e over the source sentence words: $p_{IBM1}(e|f_1^J) = \max_{j=0,\dots,J} p(e|f_j)$, (9) where $f_0$ is the empty source word (Brown et al. , 1993).
W97-0408 100 63:97 3.3 Model Construction The head transducer model was trained and evaluated on English-to-Mandarin Chinese translation of transcribed utterances from the ATIS corpus (Hirschman et al. 1993).
C00-2163 101 11:122 In this paper we will describe extensions to the Hidden-Markov alignment model from (Vogel et al. , 1996) and compare these to Models 1-4 of (Brown et al. , 1993).
C00-2163 102 44:122 3 Model 1 and Model 2 Replacing the dependence on $a_{j-1}$ in the HMM alignment model by a dependence on j, we obtain a model which can be seen as a zero-order Hidden-Markov Model which is similar to Model 2 proposed by (Brown et al. , 1993).
C00-2163 103 53:122 The full description of Model 4 (Brown et al. , 1993) is rather complicated as there have to be considered the cases that English words have fertility larger than one and that English words have fertility zero.
C00-2163 104 50:122 Model 4 of (Brown et al. , 1993) is also a first-order alignment model (along the source positions) like the HMM, but includes also fertilities.
C00-2163 105 57:122 Therefore, the Viterbi alignment is computed only approximately using the method described in (Brown et al. , 1993).
C00-2163 106 7:122 Most SMT models (Brown et al. , 1993; Vogel et al. , 1996) try to model word-to-word correspondences between source and target words using an alignment mapping from source position j to target position i = aj.
C00-2163 107 63:122 As in tile HMM we easily can extend the dependencies in the alignment model of Model 4 easily using the word class of the previous English word E = G(ci,), or the word class of the French word F = G(Ij) (Brown et al. , 1993).
C00-2163 108 49:122 1087 Model 3 of (Brown et al. , 1993) is a zero-order alignment model like Model 2 including in addition fertility paranmters.
W97-0311 109 266:355 Several authors have used mutual information and similar statistics as an objective function for word clustering (Dagan et al. , 1993; Brown et al. , 1992; Pereira et al. , 1993; Wang et al. , 1996), for automatic determination of phonemic baseforms (Lucassen & Mercer, 1984), and for language modeling for speech recognition (Ries et al. , 1996).
W97-0311 110 258:355 (Brown et al. , 1993) The heuristics in Section 6 are designed specifically to find the interesting features in that featureless desert.
W97-0311 111 42:355 2 Translation Models A translation model can be constructed automatically from texts that exist in two languages (bitexts) (Brown et al. , 1993; Melamed, 1997).
N07-1022 112 58:209 These rules are learned using a word alignment model, which finds an optimal mapping from words to MR predicates given a set of training sentences and their correct MRs. Word alignment models have been widely used for lexical acquisition in SMT (Brown et al. , 1993; Koehn et al. , 2003).
N07-1022 113 40:209 Compared to earlier word-based methods such as IBM Models (Brown et al. , 1993), phrase-based methods such as PHARAOH are much more effective in producing idiomatic translations, and are currently the best performing methods in SMT (Koehn and Monz, 2006).
A00-1004 114 121:232 A number of alignment techniques have been proposed, varying from statistical methods (Brown et al. , 1991; Gale and Church, 1991) to lexical methods (Kay and Röscheisen, 1993; Chen, 1993).
W06-3123 115 9:111 The original IBM Models (Brown et al. , 1993) learn word-to-word alignment probabilities which makes it computationally feasible to estimate model parameters from large amounts of training data.
W94-0115 116 16:139 Brute-force methods (ie those that exploit the massive raw computing power currently available cheaply) may well produce some useful results (eg Brown et al 1993).
N06-1003 117 20:146 By increasing the size of the basic unit of translation, phrase-based machine translation does away with many of the problems associated with the original word-based formulation of statistical machine translation (Brown et al. , 1993). [Figure 1: Percent of unique unigrams, bigrams, trigrams, and 4-grams from the Europarl Spanish test sentences for which translations were learned in increasingly large training corpora. Axes: Training Corpus Size (num words) vs. Test Set Items with Translations (%).]
N06-1003 118 8:146 1 Introduction As with many other statistical natural language processing tasks, statistical machine translation (Brown et al. , 1993) produces high quality results when ample training data is available.
W06-3119 119 7:125 1 Introduction Recent work in machine translation has evolved from the traditional word (Brown et al. , 1993) and phrase based (Koehn et al. , 2003a) models to include hierarchical phrase models (Chiang, 2005) and bilingual synchronous grammars (Melamed, 2004).
W06-3102 120 8:125 1 Introduction The availability of large amounts of so-called parallel texts has motivated the application of statistical techniques to the problem of machine translation starting with the seminal work at IBM in the early 90s (Brown et al. , 1992; Brown et al. , 1993).
W06-3102 121 9:125 Statistical machine translation views the translation process as a noisy-channel signal recovery process in which one tries to recover the input signal e, from the observed output signal f.1 Early statistical machine translation systems used a purely word-based approach without taking into account any of the morphological or syntactic properties of the languages (Brown et al. , 1993).
W02-1020 122 59:160 The translation component is an analog of the IBM model 2 (Brown et al. , 1993), with parameters that are optimized for use with the trigram.
D07-1003 123 56:243 Similarly, Murdock and Croft (2005) adopted a simple translation model from IBM model 1 (Brown et al. , 1990; Brown et al. , 1993) and applied it to QA.
D07-1003 124 156:243 This sort of problem can be solved in principle by conditional variants of the Expectation-Maximization algorithm (Baum et al. , 1970; Dempster et al. , 1977; Meng and Rubin, 1993; Jebara and Pentland, 1999).
D07-1003 125 97:243 The tree is produced by a state-of-the-art dependency parser (McDonald et al. , 2005) trained on the Wall Street Journal Penn Treebank (Marcus et al. , 1993).
P01-1027 126 83:148 This is exactly the standard lexicon probability a27a28a18a26a4 a20a12 a22 employed in the translation model described in (Brown et al. , 1993) and in Section 2.
P01-1027 127 31:148 If we assign a probability a15a17a16a19a18 a12 a13a7a21a20a4a6a5a7a23a22 to each pair of strings a18 a12a14a13a7a25a24 a4 a5a7 a22, then according to Bayes decision rule, we have to choose the target string that maximizes the product of the target language model a15a17a16a19a18 a12a14a13a7 a22 and the string translation model a15a17a16a19a18a26a4a6a5 a7 a20 a12 a13 a7 a22 . Many existing systems for statistical machine translation (Berger et al. , 1994; Wang and Waibel, 1997; Tillmann et al. , 1997; Nieen et al. , 1998) make use of a special way of structuring the string translation model like proposed by (Brown et al. , 1993): The correspondence between the words in the source and the target string is described by alignments that assign one target word position to each source word position.
P01-1027 128 47:148 That is obtained using the Viterbi alignment provided by a translation model as described in (Brown et al. , 1993).
P01-1027 129 24:148 Similar techniques are used in (Papineni et al. , 1996; Papineni et al. , 1998) for so-called direct translation models instead of those proposed in (Brown et al. , 1993).
I05-5001 130 8:155 One promising approach extends standard Statistical Machine Translation (SMT) techniques (e.g. , Brown et al. , 1993; Och & Ney, 2000, 2003) to the problems of monolingual paraphrase identification and generation.
W03-0305 131 11:130 2 Word Alignment algorithm We use IBM Model 4 (Brown et al. , 1993) as a basis for our word alignment system.
N03-4001 132 8:36 By segmenting words into morphemes, we can improve the performance of natural language systems including machine translation (Brown et al. 1993) and information retrieval (Franz, M. and McCarley, S. 2002).
P03-1050 133 71:163 2.2 The Translation Model We adapted Model 1 (Brown et al. , 1993) to our purposes.
W03-0309 134 7:112 The Duluth Word Alignment System is a Perl implementation of IBM Model 2 (Brown et al. , 1993).
W03-0309 135 11:112 (Brown et al. , 1993) introduced five statistical translation models (IBM Models 1-5).
H05-1024 136 29:187 2 Related Work One of the major problems with the IBM models (Brown et al. , 1993) and the HMM models (Vogel et al. , 1996) is that they are restricted to the alignment of each source-language word to at most one target-language word.
I05-4010 137 8:133 Large volumes of training data of this kind are indispensable for constructing statistical translation models (Brown et al. , 1993; Melamed, 2000), acquiring bilingual lexicon (Gale and Church, 1991; Melamed, 1997), and building example-based machine translation (EBMT) systems (Nagao, 1984; Carl and Way, 2003; Way and Gough, 2003).
P03-1040 138 109:201 As a baseline, we use an IBM Model 4 (Brown et al. , 1993) system3 with a greedy decoder4 (Germann et al. , 2001).
P03-1012 139 131:202 These constraints tie words in such a way that the space of alignments cannot be enumerated as in IBM models 1 and 2 (Brown et al. , 1993).
P03-1012 140 6:202 1 Introduction Word alignments were first introduced as an intermediate result of statistical machine translation systems (Brown et al. , 1993).
A97-1050 141 49:291 On the other end of the spectrum, character-based bitext mapping algorithms (Church, 1993; Davis et al. , 1995) are limited to language pairs where cognates are common; in addition, they may easily be misled by superficial differences in formatting and page layout and must sacrifice precision to be computationally tractable.
P05-2022 142 5:149 There are basically two kinds of systems working at these segmentation levels: the most widespread rely on statistical models, in particular the IBM ones (Brown et al. , 1993); others combine simpler association measures with different kinds of linguistic information (Arhenberg et al. , 2000; Barbu, 2004).
J99-1003 143 13:489 Although the above statement was made about translation problems faced by human translators, recent research (Brown et al. 1993; Melamed 1996b) suggests that it also applies to problems in machine translation.
J99-1003 144 15:489 For example, bilingual lexicographers can use bitexts to discover new cross-language lexicalization patterns (Catizone, Russell, and Warwick 1993; Gale and Church 1991b); students of foreign languages can use one half of a bitext to practice their reading skills, referring to the other half for translation when they get stuck (Nerbonne et al. 1997).
J99-1003 145 107:489 A limitation of Church's method, and therefore also of Dagan, Church, and Gale's method, is that orthographic cognates exist only among languages with similar alphabets (Church et al. 1993).
J99-1003 146 104:489 Then they adapted Brown et al.'s (1993) statistical translation Model 2 to work with this model of cooccurrence.
P03-1051 147 9:146 By segmenting words into morphemes, we can improve the performance of natural language systems including machine translation (Brown et al. 1993) and information retrieval (Franz, M. and McCarley, S. 2002).
P03-1051 148 57:146 (Darwish 2002), is not very useful for applications like statistical machine translation, (Brown et al. 1993), for which an accurate word-to-word alignment between the source and the target languages is critical for high quality translations.
A94-1012 149 158:165 The computation mechanism of GP and LP bears a resemblance to the EM algorithm (Dempster et al. , 1977; Brown et al. , 1993), which iteratively computes maximum likelihood estimates from incomplete data.
A94-1012 150 12:165 Unlike probabilistic parsing, proposed by (Fujisaki et al. , 1989; Briscoe and Carroll, 1993),
A94-1012 151 14:165 It also differs from previous proposals on lexical acquisition using statistical measures such as (Church et al. , 1991; Brent, 1991; Brown et al. , 1993) which either deny the prior existence of linguistic knowledge or use linguistic knowledge in ad hoc ways.
N03-1017 152 68:203 As the first method, we learn phrase alignments from a corpus that has been word-aligned by a training toolkit for a word-based translation model: the Giza++ [Och and Ney, 2000] toolkit for the IBM models [Brown et al. , 1993].
N03-1017 153 183:203 For more information on these models, please refer to Brown et al. [1993].
P08-1082 154 93:236 This probability is computed using IBM's Model 1 (Brown et al., 1993): $P(Q \mid A) = \prod_{q \in Q} P(q \mid A)$ (3), $P(q \mid A) = (1 - \lambda) P_{ml}(q \mid A) + \lambda P_{ml}(q \mid C)$ (4), $P_{ml}(q \mid A) = \sum_{a \in A} T(q \mid a) P_{ml}(a \mid A)$ (5), where the probability that the question term q is generated from answer A, P(q|A), is smoothed using the prior probability that the term q is generated from the entire collection of answers C, Pml(q|C).
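The three quoted equations (3) to (5) can be sketched as follows. This is a hedged illustration, not the citing authors' implementation. The interpolation weight `lam` stands for the smoothing symbol that the extraction lost, and its default value of 0.5 is an assumption. The translation table `T`, keyed by `(question_term, answer_term)`, and the toy data are also made up for the example.

```python
import math
from collections import Counter


def p_ml(term, tokens):
    """Maximum-likelihood estimate: relative frequency of term in tokens."""
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


def p_q_given_a(q, answer, collection, T, lam=0.5):
    """Eq. (4): translation-based estimate (Eq. 5) smoothed with the collection prior."""
    # Eq. (5): sum over distinct answer terms a of T(q|a) * P_ml(a|A).
    translated = sum(T.get((q, a), 0.0) * p_ml(a, answer) for a in set(answer))
    return (1 - lam) * translated + lam * p_ml(q, collection)


def p_question_given_answer(question, answer, collection, T, lam=0.5):
    """Eq. (3): product over question terms of P(q|A)."""
    return math.prod(p_q_given_a(q, answer, collection, T, lam) for q in question)


# Toy data (hypothetical values).
answer = ["reboot", "router"]
collection = ["reboot", "router", "restart", "modem"]
T = {("restart", "reboot"): 0.5}
p = p_question_given_answer(["restart"], answer, collection, T)
```

In this toy case the question term "restart" never occurs in the answer, yet it receives probability mass through its translation link to "reboot" and through the collection prior.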
P08-1082 155 136:236 The text was split at the sentence level, tokenized and PoS tagged, in the style of the Wall Street Journal Penn TreeBank (Marcus et al., 1993).
P07-1011 156 8:216 Second, it can be applied to control the quality of parallel bilingual sentences mined from the Web, which are critical sources for a wide range of applications, such as statistical machine translation (Brown et al. , 1993) and cross-lingual information retrieval (Nie et al. , 1999).
P07-1016 157 58:212 By treating a letter/character as a word and a group of letters/characters as a phrase or token unit in SMT, one can easily apply the traditional SMT models, such as the IBM generative model (Brown et al. , 1993) or the phrase-based translation model (Crego et al. , 2005) to transliteration.
C96-2141 158 8:140 In the recent years, there have been a number of papers considering this or similar problems: (Brown et al. , 1990), (Dagan et al. , 1993), (Kay et al. , 1993), (Fung et al. , 1993).
C96-2141 159 59:140 A similar approach has been chosen by (Dagan et al. , 1993).
J04-2003 160 45:412 The translation models they presented in various papers between 1988 and 1993 (Brown et al. 1988; Brown et al. 1990; Brown, Della Pietra, Della Pietra, and Mercer 1993) are commonly referred to as IBM models 1-5, based on the numbering in Brown, Della Pietra, Della Pietra, and Mercer (1993).
J04-2003 161 34:412 Many existing systems for statistical machine translation (García-Varea and Casacuberta 2001; Germann et al. 2001; Nießen et al. 1998; Och, Tillmann, and Ney 1999) implement models presented by Brown, Della Pietra, Della Pietra, and Mercer (1993): The correspondence between the words in the source and the target strings is described by alignments that assign target word positions to each source word position.
P05-1058 162 6:212 1 Introduction Word alignment was first proposed as an intermediate result of statistical machine translation (Brown et al. , 1993).
P05-1058 163 35:212 2 Statistical Word Alignment According to the IBM models (Brown et al. , 1993), the statistical word alignment model can be generally represented as in Equation (1).
P05-1058 164 50:212 = == = = m aj j m j aj l i i l i ii m j j mlajdeft en pp m ap 0:1 11 1 2 0 0 0 ),( ),,|()|( ! )|( )|,Pr()|,( 00 eef (3) 1 A cept is defined as the set of target words connected to a source word (Brown et al. , 1993).
P05-1058 165 37:212 This simplified version does not take word classes into account as described in (Brown et al. , 1993).
P98-1074 166 6:126 (Brown et al. , 1993) then extended their method and established a sound probabilistic model series, relying on different parameters describing how words within parallel sentences are aligned to each other.
P98-1074 167 7:126 On the other hand, (Dagan et al. , 1993) proposed an algorithm, borrowed to the field of dynamic programming and based on the output of their previous work, to find the best alignment, subject to certain constraints, between words in parallel sentences.
P98-1074 168 4:126 1 Introduction Early works, (Gale and Church, 1993; Brown et al. , 1993), and to a certain extent (Kay and Röscheisen, 1993), presented methods to extract bilingual
I08-1024 169 132:236 These probabilities are estimated with IBM model 1 (Brown et al., 1993) on parallel corpora.
P06-2070 170 26:155 The corpus is aligned in the word level using IBM Model 4 (Brown et al. , 1993).
P05-1068 171 11:175 Most of the phrase-based translation models have adopted the noisy-channel based IBM style models (Brown et al. , 1993): $\hat{e}_1^I = \arg\max_{e_1^I} \Pr(f_1^J \mid e_1^I) \Pr(e_1^I)$ (1) In these models, we have two types of knowledge: translation model, $\Pr(f_1^J \mid e_1^I)$ and language model, $\Pr(e_1^I)$.
P05-1068 172 73:175 3.1 Learning Chunk-based Translation We learn chunk alignments from a corpus that has been word-aligned by a training toolkit for word-based translation models: the Giza++ (Och and Ney, 2000) toolkit for the IBM models (Brown et al. , 1993).
W08-0509 173 41:208 2.2 Implementation of GIZA++ GIZA++ is an implementation of ML estimators for several statistical alignment models, including IBM Model 1 through 5 (Brown et al., 1993), HMM (Vogel et al., 1996) and Model 6 (Och and Ney, 2003).
W08-0509 174 51:208 For example, (Brown et al., 1993) suggested two different methods: using only the alignment with the maximum probability, the so-called Viterbi alignment, or generating a set of alignments by starting from the Viterbi alignment and making changes, which keep the alignment probability high.
W08-0509 175 17:208 The word alignment models implemented in GIZA++, the so-called IBM (Brown et al., 1993) and HMM alignment models (Vogel et al., 1996) are typical implementation of the EM algorithm (Dempster et al., 1977).
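As an illustration of the EM training mentioned in this citation, here is a minimal sketch for the simplest case, the lexical translation table t(f|e) of IBM Model 1. It is not GIZA++'s implementation. The function name `train_model1`, the `<NULL>` token, the fixed iteration count, and the two-sentence toy corpus are all assumptions made for the example.

```python
from collections import defaultdict


def train_model1(bitext, iterations=10, null="<NULL>"):
    """Estimate t(f|e) with EM.

    bitext: list of (f_words, e_words) sentence pairs.
    Returns a dict-like table keyed by (f, e).
    """
    f_vocab = {f for fs, _ in bitext for f in fs}
    # Uniform initialisation over the f-side vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked at all
        for fs, es in bitext:
            es_null = [null] + list(es)
            for f in fs:
                # E-step: posterior of each e (or NULL) having generated f.
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts per e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


bitext = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
]
t = train_model1(bitext)
```

Because "la" co-occurs with "the" in both sentence pairs, its expected count grows faster than that of "maison" or "fleur", so t(la|the) ends up largest among the candidates for "the".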
N07-2009 176 30:91 3 GM Representation of IBM MT Models In this section we present a GM representation for IBM model 3 (Brown et al. , 1993) in fig.
N07-2009 177 83:91 We attribute the difference in M3/4 scores to the fact we use a Viterbi-like training procedure (i.e. , we consider a single configuration of the hidden variables in EM training) while GIZA uses pegging (Brown et al. , 1993) to sum over a set of likely hidden variable configurations in EM.
J93-1001 178 317:408 This is a particularly exciting area in computational linguistics as evidenced by the large number of contributions in these special issues: Biber (1993), Brent (1993), Hindle and Rooth (this issue), Pustejovsky et al.
J93-1001 179 241:408 Four alternatives are proposed in these special issues: (1) Brent (1993), (2) Briscoe and Carroll (this issue), (3) Hindle and Rooth (this issue), and (4) Weischedel et al.
W08-0409 180 33:167 The notation will assume Chinese-English word alignment and Chinese-English MT. Here we adopt a notation similar to (Brown et al., 1993).
W08-0409 181 103:167 4.3 Baselines 4.3.1 Word Alignment We used the GIZA++ implementation of IBM word alignment model 4 (Brown et al., 1993; Och and Ney, 2003) for word alignment, and the heuristics described in (Och and Ney, 2003) to derive the intersection and refined alignment.
W08-0409 182 11:167 Generative word alignment models, initially developed at IBM (Brown et al., 1993), and then augmented by an HMM-based model (Vogel et al., 1996), have provided powerful modeling capability for word alignment.
C96-2211 183 77:249 , 1993; Graham et al. , 1980) where K is the number of distinct nonterminal symbols in the grammar G. We can expect a very efficient parser for our patterns. The input string can also be scanned to reduce the number of relevant grammar rules before parsing. The combined process is also known as offline parsing in LTAG.
P07-1082 184 40:191 (2004) argue that precise alignment can improve transliteration effectiveness, experimenting on English-Chinese data and comparing IBM models (Brown et al. , 1993) with phoneme-based alignments using direct probabilities.
P04-3014 185 8:62 Syntax-light alignment models such as the five IBM models (Brown et al. , 1993) and their relatives have proved to be very successful and robust at producing word-level alignments, especially for closely related languages with similar word order and mostly local reorderings, which can be captured via simple models of relative word distortion.
P06-2092 186 11:152 Word alignment, however, is almost exclusively done using statistics (Brown et al. , 1993; Hiemstra, 1996; Vogel et al. , 1999; Toutanova et al. , 2002).
P06-2092 187 10:152 The alignment of sentences can be done sufficiently well using cues such as sentence length (Gale and Church, 1993) or cognates (Simard et al. , 1992).
P06-2092 188 58:152 While simple statistical alignment models like IBM-1 (Brown et al. , 1993) and the symmetric alignment approach by Hiemstra (1996) treat sentences as unstructured bags of words, the more sophisticated IBM-models by Brown et al.
P06-2092 189 57:152 3.2 Word Order Differences Another problem that has been noticed as early as 1993 with the first research on word alignment (Brown et al. , 1993) concerns the differences in word order between source and target language.
P06-2092 190 39:152 2.2 Word Alignment Aligning below the sentence level is usually done using statistical models for machine translation (Brown et al. , 1991; Brown et al. , 1993; Hiemstra, 1996; Vogel et al. , 1999) where any word of the target language is taken to be a possible translation for each source language word.
P06-2092 191 8:152 1 Introduction Aligning parallel text, i.e. automatically setting the sentences or words in one text into correspondence with their equivalents in a translation, is a very useful preprocessing step for a range of applications, including but not limited to machine translation (Brown et al. , 1993), cross-language information retrieval (Hiemstra, 1996), dictionary creation (Smadja et al. , 1996) and induction of NLP-tools (Kuhn, 2004).
P04-3002 192 5:97 1 Introduction Bilingual word alignment is first introduced as an intermediate result in statistical machine translation (SMT) (Brown et al. , 1993).
P04-3002 193 26:97 2 2.1 Word Alignment Adaptation Bi-directional Word Alignment In statistical translation models (Brown et al. , 1993), only one-to-one and more-to-one word alignment links can be found.
P06-1062 194 27:173 However, current sentence alignment models, (Brown et al 1991; Gale & Church 1991; Wu 1994; Chen 1993; Zhao and Vogel, 2002; etc).
P06-1062 195 106:173 is combined with [ ]E jiT,1+ to be aligned with [ ] F nmT,, then [ ]( ) [ ]( )ATTCNTATTr E K E i FEF jinmjinm,.Pr,P,1],[,],[ ],1[ += where K is the degree of .EiN Finally, the node translation probability is modeled as ( ) ( ) ( )tNtNlNlNNN EiFlEiFlEjFl PrPrPr . And the text translation probability ( )EF ttPr is model using IBM model I (Brown et al 1993).
W03-1002 196 35:186 POS tagging and phrase chunking in English were done using the trained systems provided with the fnTBL Toolkit (Ngai and Florian, 2001); both were trained from the annotated Penn Treebank corpus (Marcus et al. , 1993).
W03-1002 197 15:186 2 Prior Work Statistical machine translation, as pioneered by IBM (e.g. Brown et al. , 1993), is grounded in the noisy channel model.
C98-2153 198 52:197 The underlying translation model is Model 2 from (Brown et al., 1993).
C98-2153 199 38:197 It assumes that the distance of the positions relative to the diagonal of the (j, i) plane is the dominating factor: $p(i \mid j, J, I) = \frac{r(i - j \frac{I}{J})}{\sum_{i'=1}^{I} r(i' - j \frac{I}{J})}$ (7) As described in (Brown et al., 1993), the EM algorithm can be used to estimate the parameters of the model.
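A small sketch of a diagonal-oriented alignment distribution follows. It assumes that the quoted Equation (7) has the normalized form p(i | j, J, I) = r(i - jI/J) / sum over i' of r(i' - jI/J). The weighting function `r` used here (an exponential decay in the distance from the diagonal) is a hypothetical choice made only for illustration; in the cited work the values of r are model parameters estimated with EM.

```python
import math


def diagonal_alignment_probs(j, J, I, r):
    """Return [p(i | j, J, I) for i = 1..I], proportional to r(i - j*I/J)."""
    weights = [r(i - j * I / J) for i in range(1, I + 1)]
    z = sum(weights)
    return [w / z for w in weights]


def r(d, width=1.0):
    """Hypothetical weighting: decays with distance d from the diagonal."""
    return math.exp(-abs(d) / width)


# Source position j=2 of J=4 maps onto the diagonal at i = 2*6/4 = 3.
probs = diagonal_alignment_probs(j=2, J=4, I=6, r=r)
```

Here `probs[2]`, the entry for i = 3, is the largest, since that position lies exactly on the diagonal of the (j, i) plane.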
C98-2153 200 24:197 1.2 Alignment with Mixture Distribution Several papers have discussed the first issue, especially the problem of word alignments for bilingual corpora (Brown et al., 1993), (Dagan et al., 1993), (Kay and Röscheisen, 1993), (Fung and Church, 1994), (Vogel et al., 1996).
C98-2153 201 192:197 (Vogel et al., 1996) report better perplexity results on the Verbmobil Corpus with their HMM-based alignment model in comparison to Model 2 of (Brown et al., 1993).
C98-2153 202 25:197 In our search procedure, we use a mixture-based alignment model that slightly differs from the model introduced as Model 2 in (Brown et al., 1993).
C04-1045 203 30:76 Detailed description of those models can be found in (Brown et al. , 1993), (Vogel et al. , 1996) and (Och and Ney, 2003).
C04-1045 204 13:76 2 Related Work The popular IBM models for statistical machine translation are described in (Brown et al. , 1993) and the HMM-based alignment model was introduced in (Vogel et al. , 1996).
C04-1045 205 8:76 So far, most of the statistical machine translation systems are based on the single-word alignment models as described in (Brown et al. , 1993) as well as the Hidden Markov alignment model (Vogel et al. , 1996).
P04-1023 206 6:168 1 Introduction Machine translation systems based on probabilistic translation models (Brown et al. , 1993) are generally trained using sentence-aligned parallel corpora.
H05-1057 207 93:240 3.2 Details To learn alignments, translation probabilities, etc in the first method we used work that has been done in statistical machine translation (Brown et al. , 1993), where the translation process is considered to be equivalent to a corruption of the source language text to the target language text due to a noisy channel.
H05-1057 208 103:240 There are five different IBM translation models (Brown et al. , 1993).
H05-1057 209 108:240 The IBM models have shown good performance in machine translation, and especially so within certain families of languages, for example in translating between French and English or between Sinhalese and Tamil (Brown et al. , 1993; Weerasinghe, 2004).
H05-1057 210 61:240 This is a common technique in machine translation for which the IBM translation models are popular methods (Brown et al. , 1993).
H05-1057 211 69:240 In the first of our methods we align manual transcripts and ASR sentences using the IBM translation model (Brown et al. , 1993) to obtain a probabilistic dictionary.
H05-1057 212 107:240 Further details are in the original paper (Brown et al. , 1993).
W05-0809 213 99:120 Several teams had approaches that relied (to varying degrees) on an IBM model of statistical machine translation (Brown et al. , 1993), with different improvements brought by different teams, consisting of new submodels, improvements in the HMM model, model combination for optimal alignment, etc. Several teams used symmetrization metrics, as introduced in (Och and Ney, 2003) (union, intersection, refined), most of the times applied on the alignments produced for the two directions source-target and target-source, but also as a way to combine different word alignment systems.
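The union and intersection symmetrizations named in this citation can be sketched with plain set operations. The "refined" heuristic is deliberately left out, since its details are not given here. The function name `symmetrize` and the representation of an alignment as a set of index pairs are assumptions made for this example.

```python
def symmetrize(src2tgt, tgt2src, how="intersection"):
    """Combine two directional word alignments.

    src2tgt: set of (source_index, target_index) links.
    tgt2src: set of (target_index, source_index) links, flipped here so
             both sets use (source_index, target_index) order.
    """
    flipped = {(s, t) for (t, s) in tgt2src}
    if how == "intersection":
        return src2tgt & flipped
    if how == "union":
        return src2tgt | flipped
    raise ValueError(f"unsupported symmetrization: {how}")


# Toy alignments (hypothetical).
s2t = {(0, 0), (1, 2), (2, 1)}
t2s = {(0, 0), (2, 1), (3, 2)}
inter = symmetrize(s2t, t2s, "intersection")
union = symmetrize(s2t, t2s, "union")
```

The intersection keeps only links that both directions agree on, which typically favors precision, while the union keeps every link proposed by either direction, which typically favors recall.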
J05-3002 214 482:557 Knight and Marcu (2000) treat reduction as a translation process using a noisy-channel model (Brown et al. 1993).
E06-2002 215 13:77 By introducing the hidden word alignment variable a, the following approximate optimization criterion can be applied for that purpose: $e^{*} = \arg\max_e \Pr(e \mid f) = \arg\max_e \sum_a \Pr(e, a \mid f) \approx \arg\max_{e,a} \Pr(e, a \mid f)$. Exploiting the maximum entropy (Berger et al. , 1996) framework, the conditional distribution Pr(e,a | f) can be determined through suitable real valued functions (called features) $h_r(e, f, a)$, $r = 1, \ldots, R$, and takes the parametric form: $p(e, a \mid f) \propto \exp\{\sum_{r=1}^{R} \lambda_r h_r(e, f, a)\}$. The ITC-irst system (Chen et al. , 2005) is based on a log-linear model which extends the original IBM Model 4 (Brown et al. , 1993) to phrases (Koehn et al. , 2003; Federico and Bertoldi, 2005).
N07-1064 216 98:182 To improve raw output from decoding, Portage relies on a rescoring strategy: given a list of n-best translations from the decoder, the system reorders this list, this time using a more elaborate loglinear model, incorporating more feature functions, in addition to those of the decoding model: these typically include IBM-1 and IBM-2 model probabilities (Brown et al. , 1993) and an IBM-1-based feature function designed to detect whether any word in one language appears to have been left without satisfactory translation in the other language; all of these feature functions can be used in both language directions, i.e. source-to-target and target-to-source.
W05-0835 217 19:185 This concept of alignment has been also used for tasks like automatic vocabulary derivation and corpus alignment (Dagan et al. , 1993).
W05-0835 218 44:185 The elements of this set are pairs (x, y) where y is a possible translation for x. 4 IBM's model 1 IBM's model 1 is the simplest of a hierarchy of five statistical models introduced in (Brown et al. , 1993).
W05-0835 219 18:185 Most of them rely on the concept of alignment: a mapping from words or groups of words in a sentence into words or groups in the other (in the case of (Vidal et al. , 1993) the mapping goes from rules in a grammar for a language into rules of a grammar for the other language).
W05-0835 220 17:185 Different models have been presented in the literature, see for instance (Brown et al. , 1993; Och and Ney, 2004; Vidal et al. , 1993; Vogel et al. , 1996).
D07-1005 221 47:211 (2) We note that these posterior probabilities can be computed efficiently for some alignment models such as the HMM (Vogel et al. , 1996; Och and Ney, 2003), Models 1 and 2 (Brown et al. , 1993).
D07-1005 222 40:211 Given a sentence-pair (f,e), the most likely (Viterbi) word alignment is found as (Brown et al. , 1993): $\hat{a} = \arg\max_a P(f, a \mid e)$.
D07-1005 223 36:211 2 Word Alignment Framework A statistical translation model (Brown et al. , 1993; Och and Ney, 2003) describes the relationship between a pair of sentences in the source and target languages ($f = f_1^J$, $e = e_1^I$) using a translation probability P(f|e).
D07-1005 224 43:211 Given any word alignment model, posterior probabilities can be computed as (Brown et al. , 1993) $P(a_j = i \mid e, f) = \sum_a P(a \mid f, e)\, \delta(i, a_j)$, (1) where $i \in \{0, 1, \ldots, I\}$.
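For IBM Model 1, the sum over alignments in this posterior factorizes per source position, giving P(a_j = i | e, f) = t(f_j | e_i) / sum over i' of t(f_j | e_i'). The sketch below illustrates that special case only. The function name, the `<NULL>` token at index 0, the uniform fallback for words with no lexicon entry, and the toy lexicon are assumptions made for the example.

```python
def model1_posteriors(f_words, e_words, t, null="<NULL>"):
    """Return a J x (I+1) table with entry [j][i] = P(a_j = i | e, f) under Model 1.

    Column 0 corresponds to the NULL word. t maps (f, e) -> t(f|e);
    missing pairs count as 0.0.
    """
    es = [null] + list(e_words)
    table = []
    for f in f_words:
        scores = [t.get((f, e), 0.0) for e in es]
        z = sum(scores)
        if z > 0:
            table.append([s / z for s in scores])
        else:
            # Assumption: fall back to a uniform posterior for unknown words.
            table.append([1.0 / len(es)] * len(es))
    return table


# Toy lexicon (made-up numbers).
t = {("maison", "house"): 0.6, ("maison", "the"): 0.2, ("maison", "<NULL>"): 0.2}
post = model1_posteriors(["maison"], ["the", "house"], t)
```

Each row of `post` sums to one, and the most likely link for "maison" is column 2, the word "house".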
P05-1074 225 43:147 The original formulation of statistical machine translation (Brown et al. , 1993) was defined as a word-based operation.
E99-1010 226 36:136 To model p(fJle~;8,.T) we assume the existence of an alignment a J. We assume that every word fj is produced by the word e~j at position aj in the training corpus with the probability P(f~le,~i): J p(f lc ') = 1\] p(L Icon) j=l (7) The word alignment a J is trained automatically using statistical translation models as described in (Brown et al. , 1993; Vogel et al. , 1996).
E99-1010 227 7:136 Various clustering techniques have been proposed (Brown et al. , 1992; Jardino and Adda, 1993; Martin et al. , 1998) which perform automatic word clustering optimizing a maximum-likelihood criterion with iterative clustering algorithms.
N07-1061 228 10:313 1 Introduction The rapid and steady progress in corpus-based machine translation (Nagao, 1981; Brown et al. , 1993) has been supported by large parallel corpora such as the Arabic-English and Chinese-English parallel corpora distributed by the Linguistic Data Consortium and the Europarl corpus (Koehn, 2005), which consists of 11 European languages.
P97-1046 229 82:197 5 Effectiveness Comparison 5.1 English-Chinese ATIS Models Both the transfer and transducer systems were trained and evaluated on English-to-Mandarin Chinese translation of transcribed utterances from the ATIS corpus (Hirschman et al. 1993).
W07-0703 230 71:186 To generate phrase pairs from a parallel corpus, we use the "diag-and" phrase induction algorithm described in (Koehn et al, 2003), with symmetrized word alignments generated using IBM model 2 (Brown et al, 1993).
J03-1002 231 362:462 An analysis of the alignments shows that smoothing the fertility probabilities significantly reduces the frequently occurring problem of rare words forming garbage collectors in that they tend to align with too many words in the other language (Brown, Della Pietra, Della Pietra, Goldsmith, et al. 1993).
W03-1508 232 42:177 The IBM source-channel model for statistical machine translation (P. Brown et al. , 1993) plays a central role in our system.
P06-2112 233 10:214 1 Introduction Word alignment was first proposed as an intermediate result of statistical machine translation (Brown et al. , 1993).
P06-2112 234 55:214 3 Statistical Word Alignment According to the IBM models (Brown et al. , 1993), the statistical word alignment model can be generally represented as in equation (1).
P97-1039 235 107:216 correspondence points associated with frequent token types (Church, 1993) or by deleting frequent token types from the bitext altogether (Dagan et al. , 1993).
P97-1039 236 208:216 One important application of bitext maps is the construction of translation lexicons (Dagan et al. , 1993) and, as discussed, translation lexicons are an important information source for bitext mapping.
P97-1039 237 8:216 In addition to their use in machine translation (Sato & Nagao, 1990; Brown et al., 1993; Melamed, 1997), translation models can be applied to machine-assisted translation (Sato, 1992; Foster et al., 1996), cross-lingual information retrieval (SIGIR, 1996), and gisting of World Wide Web pages (Resnik, 1997).
P97-1039 238 9:216 Bitexts also play a role in less automated applications such as concordancing for bilingual lexicography (Catizone et al. , 1993; Gale & Church, 1991b), computer-assisted language learning, and tools for translators (e.g.
C04-1091 239 44:213 2 Decoding The decoding problem in SMT is one of finding the most probable translation e in the target language of a given source language sentence f in accordance with the Fundamental Equation of SMT (Brown et al., 1993): $\hat{e} = \arg\max_{e} \Pr(f \mid e)\,\Pr(e)$.
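The argmax in the Fundamental Equation quoted above can be illustrated with a minimal sketch. It assumes a small, explicitly enumerated candidate list rather than a real decoder search, and the names `noisy_channel_decode`, `tm_logprob`, `lm_logprob` and the toy probability tables are hypothetical, chosen only for illustration.

```python
import math

def noisy_channel_decode(f, candidates, tm_logprob, lm_logprob):
    """Return the candidate e that maximizes log Pr(f|e) + log Pr(e)."""
    return max(candidates, key=lambda e: tm_logprob(f, e) + lm_logprob(e))

# Toy, made-up probabilities (assumptions, not values from any paper).
TM = {("la maison", "the house"): 0.4, ("la maison", "house the"): 0.4}
LM = {"the house": 0.1, "house the": 0.001}

def tm_logprob(f, e):
    # Translation model Pr(f|e); unseen pairs get a tiny floor probability.
    return math.log(TM.get((f, e), 1e-12))

def lm_logprob(e):
    # Language model Pr(e); unseen strings get a tiny floor probability.
    return math.log(LM.get(e, 1e-12))

best = noisy_channel_decode(
    "la maison", ["house the", "the house"], tm_logprob, lm_logprob
)
```

Here the translation model scores both word orders equally, so the language model alone decides between them, which is the division of labour the source-channel formulation describes.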
C04-1091 240 78:213 In each iteration of local search, we look in the neighborhood of the current best alignment for a better alignment (Brown et al. , 1993).
C04-1091 241 10:213 1 Introduction Decoding is one of the three fundamental problems in classical SMT (translation model and language model being the other two) as proposed by IBM in the early 1990s (Brown et al. , 1993).
H05-1061 242 83:182 One widely used model is the IBM model (Brown et al. 1993).
W03-1001 243 8:171 1 Introduction Various papers use phrase-based translation systems (Och et al. , 1999; Marcu and Wong, 2002; Yamada and Knight, 2002) that have shown to improve translation quality over single-word based translation systems introduced in (Brown et al. , 1993).
W05-0804 244 13:210 Previous work from (Wang et al. , 1996) showed improvements in perplexity-oriented measures using mixture-based translation lexicon (Brown et al. , 1993).
W02-1019 245 84:149 5.4 IBM-3 Word Alignment Models Since the true distribution over alignments is not known, we used the IBM-3 statistical translation model (Brown et al. , 1993) to approximate . This model is specified through four components: Fertility probabilities for words; Fertility probabilities for NULL; Word Translation probabilities; and Distortion probabilities.
W02-1019 246 12:149 2 Word-to-Word Bitext Alignment We will study the problem of aligning an English sentence to a French sentence and we will use the word alignment of the IBM statistical translation models (Brown et al. , 1993).
P06-1002 247 24:186 2 Related Work Starting with the IBM models (Brown et al. , 1993), researchers have developed various statistical word alignment systems based on different models, such as hidden Markov models (HMM) (Vogel et al. , 1996), log-linear models (Och and Ney, 2003), and similarity-based heuristic methods (Melamed, 2000).
W05-0814 248 46:74 We solve this using the local search defined in (Brown et al. , 1993).
W05-0814 249 25:74 Turning off the extensions to GIZA++ and training p0 as in (Brown et al. , 1993) produces a substantial increase in AER.
W05-0814 250 11:74 For these experiments, we have implemented an alignment package for IBM Model 4 using a hillclimbing search and Viterbi training as described in (Brown et al. , 1993), and extended this to use new submodels.
W05-0814 251 7:74 The system used for baseline experiments is two runs of IBM Model 4 (Brown et al. , 1993) in the GIZA++ (Och and Ney, 2003) implementation, which includes smoothing extensions to Model 4.
C00-1064 252 9:273 Thus, a lot of alignment techniques have been suggested at the sentence (Gale et al., 1993), phrase (Shin et al., 1996), noun phrase (Kupiec, 1993), word (Brown et al., 1993; Berger et al., 1996; Melamed, 1997), collocation (Smadja et al., 1996) and terminology level.
C00-1064 253 15:273 In statistical machine translation, IBM 1~5 models (Brown et al., 1993) based on the source-channel model have been widely used and revised for many language domains and applications.
W03-0302 254 52:98 ProAlign models P(A|E,F) directly, using a different decomposition of terms than the model used by IBM (Brown et al. , 1993).
W03-0302 255 89:98 To avoid this problem, we sample from a space of probable alignments, as is done in IBM models 3 and above (Brown et al. , 1993), and weight counts based on the likelihood of each alignment sampled under the current probability model.
W97-0405 256 86:166 Pure statistical machine translation (Brown et al., 1993) must in principle recover the most probable alignment out of all possible alignments between the input and a translation.
D08-1026 257 67:195 The words with the highest association probabilities are chosen as acquired words for entity e. 4.1 Base Model I Using the translation model I (Brown et al., 1993), where each word is equally likely to be aligned with each entity, we have $p(w \mid e) = \frac{1}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} p(w_j \mid e_i)$ (1) where l and m are the lengths of entity and word sequences respectively.
D08-1026 258 70:195 4.2 Base Model II Using the translation model II (Brown et al., 1993), where alignments are dependent on word/entity positions and word/entity sequence lengths, we have $p(w \mid e) = \prod_{j=1}^{m} \sum_{i=0}^{l} p(a_j = i \mid j, m, l)\, p(w_j \mid e_i)$ (2) where $a_j = i$ means that $w_j$ is aligned with $e_i$.
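The two likelihoods in the D08-1026 contexts above can be evaluated directly. This is a minimal sketch, assuming the translation table is a plain dictionary keyed by (word, entity) pairs and that position 0 is a "NULL" entity; the function names and the dictionary layout are hypothetical conventions, not taken from the cited papers.

```python
from math import prod

def model1_likelihood(words, entities, t):
    """Equation (1): every alignment position 0..l is equally likely."""
    l, m = len(entities), len(words)
    positions = ["NULL"] + list(entities)  # e_0 is the empty (NULL) entity
    per_word = [sum(t.get((w, e), 0.0) for e in positions) for w in words]
    return prod(per_word) / (l + 1) ** m

def model2_likelihood(words, entities, t, a):
    """Equation (2): a[(i, j, m, l)] holds p(a_j = i | j, m, l)."""
    l, m = len(entities), len(words)
    positions = ["NULL"] + list(entities)
    total = 1.0
    for j, w in enumerate(words, start=1):
        total *= sum(
            a.get((i, j, m, l), 0.0) * t.get((w, e), 0.0)
            for i, e in enumerate(positions)
        )
    return total

# Toy translation table with made-up values.
t = {("la", "the"): 0.5, ("la", "NULL"): 0.1, ("maison", "house"): 0.8}
words, entities = ["la", "maison"], ["the", "house"]
p1 = model1_likelihood(words, entities, t)

# With uniform alignment probabilities, Model II reduces to Model I.
uniform = {(i, j, 2, 2): 1 / 3 for i in range(3) for j in (1, 2)}
p2 = model2_likelihood(words, entities, t, uniform)
```

Setting every alignment probability to 1/(l+1) makes equation (2) collapse to equation (1), which is a quick sanity check on both functions.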
C08-1014 259 36:197 By introducing the hidden word alignment variable a (Brown et al., 1993), the optimal translation can be searched for based on the following criterion: $e^{*} = \arg\max_{e, a} \sum_{m=1}^{M} \lambda_m h_m(e, f, a)$ (1) where e is a string of phrases in the target language, f is the source language string of phrases, $h_m(e, f, a)$ are feature functions, and the weights $\lambda_m$ are typically optimized to maximize the scoring function (Och, 2003).
W05-0829 260 5:80 1 Introduction In recent years, various phrase translation approaches (Marcu and Wong, 2002; Och et al. , 1999; Koehn et al. , 2003) have been shown to outperform word-to-word translation models (Brown et al. , 1993).
W05-0836 261 10:153 Using the log-linear form to model p(e|f) gives us the flexibility to introduce overlapping features that can represent global context while decoding (searching the space of candidate translations) and rescoring (ranking a set of candidate translations before performing the argmax operation), albeit at the cost of the traditional source-channel generative model of translation proposed in (Brown et al. , 1993).
W02-1405 262 91:147 Among them, (Brown et al., 1993a) have proposed a way to exploit bilingual dictionaries at training time.
W02-1405 263 28:147 2 Our statistical engine 2.1 The statistical models In this study, we built an SMT engine designed to translate from French to English, following the noisy-channel paradigm first described by (Brown et al., 1993b).
P06-1098 264 7:74 1 Introduction In a classical statistical machine translation, a foreign language sentence $f_1^J = f_1, f_2, \ldots, f_J$ is translated into another language, i.e. English, $e_1^I = e_1, e_2, \ldots, e_I$ by seeking a maximum likely solution of: $\hat{e}_1^I = \arg\max_{e_1^I} \Pr(e_1^I \mid f_1^J)$ (1) $= \arg\max_{e_1^I} \Pr(f_1^J \mid e_1^I)\,\Pr(e_1^I)$ (2) The source channel approach in Equation 2 independently decomposes translation knowledge into a translation model and a language model, respectively (Brown et al., 1993).
C98-2157 265 30:124 The application of this algorithm to the basic problem using a parallel bilingual corpus aligned on the sentence level is described in (Brown et al., 1993).
C98-2157 266 23:124 In the refined model 2 (Brown et al., 1993) alignment probabilities $a(i \mid j, l, m)$ are included to model the effect that the position of a word influences the position of its translation.
W06-3125 267 26:145 Based on IBM Model 1 lexical parameters (Brown et al., 1993), providing a complementary probability for each tuple in the translation table.
C02-1011 268 24:167 Related Work 2.1 Translation with Non-parallel Corpora A straightforward approach to word or phrase translation is to perform the task by using parallel bilingual corpora (e.g. , Brown et al, 1993).
P06-2107 269 30:185 The methodology used (Brown et al., 1993) is based on the definition of a function $\Pr(t_1^I \mid s_1^J)$ that returns the probability that $t_1^I$ is a [Figure 1: Example of CAT system interactions to translate the Spanish source sentence into English.]
P06-2107 270 52:185 Models of this kind assume that an input word is generated by only one output word (Brown et al. , 1993).
P06-2107 271 72:185 These alignments can be obtained from single-word models (Brown et al. , 1993) using the available public software GIZA++ (Och and Ney, 2003).
P07-1108 272 8:179 1 Introduction For statistical machine translation (SMT), phrase-based methods (Koehn et al., 2003; Och and Ney, 2004) and syntax-based methods (Wu, 1997; Alshawi et al. 2000; Yamada and Knight, 2001; Melamed, 2004; Chiang, 2005; Quirk et al., 2005; Mellebeek et al., 2006) outperform word-based methods (Brown et al., 1993).
P07-3010 273 14:179 Similarity measures can be based on any level of linguistic analysis: semantic similarity relies on context vectors (Rapp, 1999), while syntactic similarity is based on the alignment of parallel corpora (Brown et al., 1993).
P07-1090 274 15:208 In pursuit of better translation, phrase-based models (Och and Ney, 2004) have significantly improved the quality over classical word-based models (Brown et al., 1993).
W08-0333 275 106:168 Estimating the parameters for these models is more difficult (and more computationally expensive) than with the models considered in the previous section: rather than simply being able to count the word pairs and alignment relationships and estimate the models directly, we must use an existing model to compute the expected counts for all possible alignments, and then use these counts to update the new model. This training strategy is referred to as expectation-maximization (EM) and is guaranteed to always improve the quality of the prior model at each iteration (Brown et al., 1993; Dempster et al., 1977).
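The expected-count procedure described in the W08-0333 context above can be sketched for the simplest case, IBM Model 1, where the expectation over all alignments factorizes per word. This is an illustrative sketch under stated assumptions, not the cited authors' implementation: the function name `train_model1`, the "NULL" token, and the toy corpus are all hypothetical.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """EM for Model 1 lexical probabilities t[(f, e)] = p(f | e).

    pairs: list of (foreign_words, english_words) sentence pairs.
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (f, e) links
        total = defaultdict(float)  # expected number of links leaving e
        for fs, es in pairs:
            es = ["NULL"] + list(es)
            for f in fs:
                # E-step: posterior probability that f aligns to each e.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize the expected counts per English word.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return dict(t)

corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
]
t = train_model1(corpus)
```

On this toy corpus "la" co-occurs with "the" in both sentence pairs, so its expected counts accumulate and `t[("la", "the")]` ends up larger than `t[("maison", "the")]`. For the higher-numbered models the E-step cannot be computed exactly in this factorized way, which is why the cited contexts mention sampling or hill-climbing over alignments.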
W08-0333 276 100:168 The IBM models, together with a Hidden Markov Model (HMM), form a class of generative models that are based on a lexical translation model $P(f_j \mid e_i)$ where each word $f_j$ in the foreign sentence $f_1^m$ is generated by precisely one word $e_i$ in the sentence $e_1^l$, independently of the other translation decisions (Brown et al., 1993; Vogel et al., 1996; Och and Ney, 2000).
W07-0717 277 35:158 To derive the joint counts $c(\tilde{s}, \tilde{t})$ from which $p(\tilde{s} \mid \tilde{t})$ and $p(\tilde{t} \mid \tilde{s})$ are estimated, we use the phrase induction algorithm described in (Koehn et al., 2003), with symmetrized word alignments generated using IBM model 2 (Brown et al., 1993).
W03-0304 278 5:87 Yet, the very nature of these alignments, as defined in the IBM modeling approach (Brown et al., 1993), leads to descriptions of the correspondences between source-language (SL) and target-language (TL) words of a translation that are often unsatisfactory, at least from a human perspective.
P04-1083 279 162:203 This kind of synchronizer stands in contrast to more ad-hoc approaches (e.g. , Matsumoto, 1993; Meyers, 1996; Wu, 1998; Hwa et al. , 2002).
P04-1083 280 141:203 Bootstrapping a PMTG from a lower-dimensional PMTG and a word-to-word translation model is similar in spirit to the way that regular grammars can help to estimate CFGs (Lari & Young, 1990), and the way that simple translation models can help to bootstrap more sophisticated ones (Brown et al. , 1993).
W01-1413 281 116:118 One interesting approach to extending the current system is to introduce a statistical translation model (Brown et al. , 1993) to filter out irrelevant translation candidates and to extract the most appropriate subpart from a long English sequence as the translation by locally aligning the Japanese and English sequences.
W06-3105 282 13:193 Specifically, in the task of word alignment, heuristic approaches such as the Dice coefficient consistently underperform their re-estimated counterparts, such as the IBM word alignment models (Brown et al. , 1993).
P06-1011 283 42:172 The first one, GIZA-Lex, is obtained by running the GIZA++2 implementation of the IBM word alignment models (Brown et al. , 1993) on the initial parallel corpus.
W02-1039 284 43:151 When an S alignment exists, there will always also exist a P alignment such that $P \supseteq S$. The English sentences were parsed using a state-of-the-art statistical parser (Charniak, 2000) trained on the University of Pennsylvania Treebank (Marcus et al., 1993).
W02-1039 285 6:151 The first work in SMT, done at IBM (Brown et al. , 1993), developed a noisy-channel model, factoring the translation process into two portions: the translation model and the language model.
P06-1009 286 11:213 Most current SMT systems (Och and Ney, 2004; Koehn et al. , 2003) use a generative model for word alignment such as the freely available GIZA++ (Och and Ney, 2003), an implementation of the IBM alignment models (Brown et al. , 1993).
W06-2402 287 5:216 1 Introduction Statistical machine translation (SMT) was originally focused on word to word translation and was based on the noisy channel approach (Brown et al. , 1993).
C96-1037 288 15:214 The resolution of alignment can vary from low to high: section, paragraph, sentence, phrase, and word (Gale and Church 1993; Matsumoto et al. 1993).
C96-1037 289 66:214 (McArthur 1992; Mei et al. 1993) Classification allows a word to align with a target word using the collective translation tendency of words in the same class.
N04-4003 290 6:102 1 Introduction The statistical machine translation framework (SMT) formulates the problem of translating a sentence from a source language S into a target language T as the maximization problem of the conditional probability: $\hat{T} = \arg\max_{T} \underbrace{p(S \mid T)}_{\text{TM}}\, \underbrace{p(T)}_{\text{LM}}$, (1) where $p(S \mid T)$ is called a translation model (TM), representing the generation probability from T into S, $p(T)$ is called a language model (LM) and represents the likelihood of the target language (Brown et al., 1993).
I08-1013 291 59:143 4 Pattern switching The compositional translation presents problems which have been reported by (Baldwin and Tanaka, 2004; Brown et al., 1993): Fertility SWTs and MWTs are not translated by a term of a same length.
N06-1002 292 152:217 [Equation: definitions of the feature functions DirectMLE, InverseMLE, DirectM1 and InverseM1 over the treelets of the alignment A] We use word probability tables p(t | s) and p(s | t) estimated by IBM Model 1 (Brown et al. 1993).
D08-1082 293 176:235 9.1 Training Methodology Given a training set, we first run a variant of IBM alignment model 1 (Brown et al., 1993) for 100 iterations, and then initialize Model I with the learned parameter values.
D08-1082 294 30:235 It acquires a set of synchronous lexical entries by running the IBM alignment model (Brown et al., 1993) and learns a log-linear model to weight parses.
C96-1067 295 33:98 However, (Dagan et al., 1993) have shown that knowledge of target-text length is not crucial to the model's performance.
C96-1067 296 10:98 This conclusion is supported by the fact that true IMT is not, to our knowledge, used in most modern translator's support environments, eg (Eurolang, 1995; Frederking et al., 1993; IBM, 1995; Kugler et al., 1991; Nirenburg, 1992; Trados, 1995).
P04-1064 297 22:189 Although the first three are particular cases where N=1 and/or M=1, the distinction is relevant, because most word-based translation models (eg IBM models (Brown et al. , 1993)) can typically not accommodate general M-N alignments.
P04-1064 298 33:189 Note that our use of cepts differs slightly from that of (Brown et al. , 1993, sec.3), inasmuch cepts may not overlap, according to our definition.
P04-1064 299 9:189 Obtaining a word-aligned corpus usually involves training word-based translation models (Brown et al., 1993) in each direction and combining the resulting alignments.
P06-1091 300 191:210 5 Discussion and Future Work The work in this paper substantially differs from previous work in SMT based on the noisy channel approach presented in (Brown et al. , 1993).
P06-2065 301 231:288 Machine translation has code-like characteristics, and indeed, the initial models of (Brown et al. , 1993) took a word-substitution/transposition approach, trained on a parallel text.
P06-2065 302 211:288 Learned vowels include (in order of generation probability): e, a, o, u, i, y. Learned sonorous consonants include: n, s, r, l, m. Learned non-sonorous consonants include: d, c, t, l, b, m, p, q. The model bootstrapping is good for dealing with too many parameters; we see a similar approach in Brown et al.'s (1993) march from Model 1 to Model 5.
P06-2065 303 101:288 This is similar to Model 3 of (Brown et al. , 1993), but without null-generated elements or re-ordering.
P06-2065 304 7:288 Such methods have also been a key driver of progress in statistical machine translation, which depends heavily on unsupervised word alignments (Brown et al. , 1993).
I05-2014 305 10:145 2 Overview 2.1 The word segmentation problem As statistical machine translation systems basically rely on the notion of words through their lexicon models (BROWN et al. , 1993), they are usually capable of outputting sentences already segmented into words when they translate into languages like Chinese or Japanese.
C04-1032 306 13:193 Using the IBM translation models IBM-1 to IBM-5 (Brown et al. , 1993), as well as the Hidden-Markov alignment model (Vogel et al. , 1996), we can produce alignments of good quality.
C04-1032 307 27:193 A detailed description of the popular translation/alignment models IBM-1 to IBM-5 (Brown et al. , 1993), as well as the Hidden-Markov alignment model (HMM) (Vogel et al. , 1996) can be found in (Och and Ney, 2003).
C04-1032 308 11:193 Word alignment models were first introduced in statistical machine translation (Brown et al. , 1993).
C04-1032 309 164:193 6 Related Work A description of the IBM models for statistical machine translation can be found in (Brown et al. , 1993).
C04-1032 310 22:193 They are based on the source-channel approach to statistical machine translation (Brown et al., 1993).
P07-1004 311 36:233 These lists are rescored with the following models: (a) the different models used in the decoder which are described above, (b) two different features based on IBM Model 1 (Brown et al. , 1993), (c) posterior probabilities for words, phrases, n-grams, and sentence length (Zens and Ney, 2006; Ueffing and Ney, 2007), all calculated over the Nbest list and using the sentence probabilities which the baseline system assigns to the translation hypotheses.
W01-1405 312 26:83 In order to minimize the number of decision errors at the sentence level, we have to choose the sequence of target words $e_1^I$ according to the equation (Brown et al. 1993): $\hat{e}_1^I = \arg\max_{e_1^I} \{ \Pr(e_1^I \mid f_1^J) \} = \arg\max_{e_1^I} \{ \Pr(e_1^I)\, \Pr(f_1^J \mid e_1^I) \}$. Here, the posterior probability $\Pr(e_1^I \mid f_1^J)$ is decomposed into the language model probability $\Pr(e_1^I)$ and the string translation probability $\Pr(f_1^J \mid e_1^I)$.
W01-1405 313 60:83 3 Experimental Results Whereas stochastic modelling is widely used in speech recognition, there are so far only a few research groups that apply stochastic modelling to language translation (Berger et al. 1994; Brown et al. 1993; Knight 1999).
W01-1405 314 47:83 Models describing these types of dependencies are referred to as alignment mappings (Brown et al. 1993): alignment mapping: $j \to i = a_j$, which assigns a source word $f_j$ in position j to a target word $e_i$ in position $i = a_j$.
W01-1405 315 48:83 As a result, the string translation probability can be decomposed into a lexicon probability and an alignment probability (Brown et al. 1993).
P06-2103 316 29:271 (A similar intuition holds for the Machine Translation models generically known as the IBM models (Brown et al. , 1993), which assume that certain words in a source language sentence tend to trigger the usage of certain words in a target language translation of that sentence.)
N06-2051 317 8:59 Lexical relationships under the standard IBM models (Brown et al., 1993) do not account for many-to-many mappings, and phrase extraction relies heavily on the accuracy of the IBM word-to-word alignment.
P08-1012 318 44:198 The traditional estimation method for word alignment models is the EM algorithm (Brown et al., 1993) which iteratively updates parameters to maximize the likelihood of the data.
P95-1034 319 57:264 However, compositional approaches to lexical choice have been successful whenever detailed representations of lexical constraints can be collected and entered into the lexicon (e.g. , (Elhadad, 1993; Kukich et al. , 1994)).
P95-1034 320 259:264 This approach addresses the problematic aspects of both pure knowledge-based generation (where incomplete knowledge is inevitable) and pure statistical bag generation (Brown et al. , 1993) (where the statistical system has no linguistic guidance).
W06-1628 321 6:291 1 Introduction Phrase-based approaches (Och and Ney, 2004) to statistical machine translation (SMT) have recently achieved impressive results, leading to significant improvements in accuracy over the original IBM models (Brown et al. , 1993).
W05-0612 322 40:219 Our methods are most influenced by IBM's Model 1 (Brown et al., 1993).
W05-0810 323 20:106 First, we considered single sentences as documents, and tokens as sentences (we define a token as a sequence of characters delimited by 1In our case, the score we seek to globally maximize by dynamic programming is not only taking into account the length criteria described in (Gale and Church, 1993) but also a cognate-based one similar to (Simard et al. , 1992).
W05-0810 324 6:106 When efficient techniques have been proposed (Brown et al. , 1993; Och and Ney, 2003), they have been mostly evaluated on safe pairs of languages where the notion of word is rather clear.
W03-0313 325 176:187 These methods are based on IBM statistical translation Model 2 (Brown et al. , 1993), but take advantage of certain characteristics of the segments of text that can typically be extracted from translation memories.
W01-1409 326 94:186 We trained IBM Translation Model 4 (Brown et al. , 1993) both on our corpus alone and on the augmented corpus, using the EGYPT toolkit (Knight et al. , 1999; Al-Onaizan et al. , 1999), and then translated a number of texts using different translation models and different transfer methods, namely glossing (replacing each Tamil word by the most likely candidate from the translation tables created with the EGYPT toolkit) and Model 4 decoding (Brown et al. , 1995; Germann et al. , 2001).
W01-1409 327 112:186
      input    pegging (a)  transfer         correct  partially correct (b)  incorrect
1     raw      no           M4 decoding (c)  7        4                      4
2     stemmed  yes          M4 decoding      8        3                      4
3     stemmed  no           M4 decoding      13       2                      0
4     raw      no           gloss            13       1                      1
5a    stemmed  yes          gloss            8        3                      4
5b    stemmed  yes          gloss            12       2                      1
6     stemmed  no           gloss            11       2                      2
(a) pegging causes the training algorithm to consider a larger search space
(b) correct top level category but incorrect sub-category
(c) translation by maximizing the IBM Model 4 probability of the source/translation pair (Brown et al., 1993; Brown et al., 1995)
classification might be performed by automatic procedures rather than humans.
W01-1409 328 109:186 (1993); Brown et al.
C98-2225 329 31:179 Estimation of the parameters has been described elsewhere (Brown et al., 1993).
C98-2225 330 41:179 Various models have been constructed by the IBM team (Brown et al., 1993).
H05-1095 331 8:253 While in traditional word-based statistical models (Brown et al. , 1993) the atomic unit that translation operates on is the word, phrase-based methods acknowledge the significant role played in language by multiword expressions, thus incorporating in a statistical framework the insight behind Example-Based Machine Translation (Somers, 1999).
H05-1095 332 63:253 $h_{bp}$'s strong tendency to overestimate the probability of rare bi-phrases; it is computed as in equation (2), except that bi-phrase probabilities are computed based on individual word translation probabilities, somewhat as in IBM model 1 (Brown et al., 1993): $\Pr(\mathbf{t} \mid \mathbf{s}) = \frac{1}{|\mathbf{s}|^{|\mathbf{t}|}} \prod_{t \in \mathbf{t}} \sum_{s \in \mathbf{s}} \Pr(t \mid s)$ The target language feature function $h_{tl}$: this is based on a N-gram language model of the target language.
C04-1051 333 110:175 Giza++ is a freely available implementation of IBM Models 1-5 (Brown et al. 1993) and the HMM alignment (Vogel et al. 1996), along with various improvements and modifications motivated by experimentation by Och & Ney (2000).
C98-1066 334 107:121 This approach has also been used by (Dagan and Itai, 1994; Gale et al., 1992; Schütze, 1992; Gale et al., 1993; Yarowsky, 1995; Gale and Church, 1Lunar is not an unknown word in English, Yeltsin finds its translation in the 4-th candidate.
C98-1066 335 7:121 In the years since the appearance of the first papers on using statistical models for bilingual lexicon compilation and machine translation (Brown et al., 1993; Brown et al., 1991; Gale and Church, 1993; Church, 1993; Simard et al., 1992), large amount of human effort and time has been invested in collecting parallel corpora of translated texts.
C98-1066 336 109:121 Some of the early statistical terminology translation methods are (Brown et al., 1993; Wu and Xia, 1994; Dagan and Church, 1994; Gale and Church, 1991; Kupiec, 1993; Smadja et al., 1996; Kay and Röscheisen, 1993; Fung and Church, 1994; Fung, 1995b).
W06-3602 337 117:191 4), it constitutes a bijection between source and target sentence positions, since the intersecting alignments are functions according to their definition in (Brown et al. , 1993) 3.
P06-1097 338 7:187 We first recast the problem of estimating the IBM models (Brown et al., 1993) in a discriminative framework, which leads to an initial increase in word-alignment accuracy.
P06-1097 339 4:187 1 Introduction The most widely applied training procedure for statistical machine translation, IBM model 4 (Brown et al., 1993) unsupervised training followed by post-processing with symmetrization heuristics (Och and Ney, 2003), yields low quality word alignments.
P06-1097 340 96:187 4 Semi-Supervised Training for Word Alignments Intuitively, in approximate EM training for Model 4 (Brown et al. , 1993), the E-step corresponds to calculating the probability of all alignments according to the current model estimate, while the M-step is the creation of a new model estimate given a probability distribution over alignments (calculated in the E-step).
W06-1626 341 8:181 1 Introduction Statistical language modeling has been widely used in natural language processing applications such as Automatic Speech Recognition (ASR), Statistical Machine Translation (SMT) (Brown et al. , 1993) and Information Retrieval (IR) (Ponte and Croft, 1998).
W07-0705 342 34:123 In the original work (Brown et al., 1993) the posterior probability $p(e_1^I \mid f_1^J)$ is decomposed following a noisy-channel approach, but current state-of-the-art systems model the translation probability directly using a log-linear model (Och and Ney, 2002): $p(e_1^I \mid f_1^J) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(e_1^I, f_1^J)\right)}{\sum_{\tilde{e}_1^I} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(\tilde{e}_1^I, f_1^J)\right)}$ (2) with $h_m$ different models, $\lambda_m$ scaling factors and the denominator a normalization factor that can be ignored in the maximization process.
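The log-linear posterior in the W07-0705 context above can be sketched as a softmax over a finite candidate set. This is a minimal illustration, assuming the candidates are enumerated explicitly (a real system would search a much larger space); the two feature functions and their weights are made-up examples, not features from the cited paper.

```python
import math

def loglinear_posterior(candidates, f, features, weights):
    """Normalized p(e | f) over an explicit candidate list."""
    scores = [
        sum(w * h(e, f) for w, h in zip(weights, features))
        for e in candidates
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)      # the normalization factor (denominator)
    return {e: x / z for e, x in zip(candidates, exps)}

# Hypothetical feature functions h_m(e, f).
def length_match(e, f):
    return -abs(len(e.split()) - len(f.split()))

def has_house(e, f):
    return 1.0 if "house" in e.split() else 0.0

candidates = ["the house", "the home is"]
post = loglinear_posterior(
    candidates, "la maison", [length_match, has_house], [1.0, 2.0]
)
best = max(post, key=post.get)
```

Because the denominator is the same for every candidate, ranking by the unnormalized weighted feature sum gives the same `best`, which is why the normalization can be dropped during maximization.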
P05-2016 343 8:31 The first work on SMT done at IBM (Brown et al. , 1990; Brown et al. , 1992; Brown et al. , 1993; Berger et al. , 1994), used a noisy-channel model, resulting in what Brown et al.
P01-1026 344 94:205 P (d) P L (d) (4) Statistical approaches to language modeling have been used in much NLP research, such as machine translation (Brown et al. , 1993) and speech recognition (Bahl et al. , 1983).
E06-1020 345 17:194 The implementation of MEBA was strongly influenced by the notorious five IBM models described in (Brown et al. 1993).
N04-1008 346 97:175 The mapping of answer terms to question terms is modeled using Black et al.'s (1993) simplest model, called IBM Model 1.
N04-1008 347 77:175 For comparison purposes, we consider two different algorithms for our AnswerExtraction module: one that does not bridge the lexical chasm, based on N-gram cooccurrences between the question terms and the answer terms; and one that attempts to bridge the lexical chasm using Statistical Machine Translation inspired techniques (Brown et al. , 1993) in order to find the best answer for a given question.
W03-1003 348 18:162 Specifically, stochastic translation lexicons estimated using the IBM method (Brown et al., 1993) from a fairly large sentence-aligned Chinese-English parallel corpus are used in their approach, a considerable demand for a resource-deficient language.
P03-2017 349 78:98 He then goes on to adapt the conventional noisy channel MT model of [Brown et al 1993] to NLU, where extracting a semantic representation from an input text corresponds to finding: argmax(Sem) {p(Input|Sem) p(Sem)}, where p(Sem) is a model for generating semantic representations, and p(Input|Sem) is a model for the relation between semantic representations and corresponding texts.
P03-2017 350 68:98 4, we see strong parallels between TransType and ITU: language model enumerating word sequences vs 4 Initially statistical MT used a noisy-channel approach [Brown et al. 1993]; but recently [Och and Ney 2002] have introduced a more general framework based on the maximum-entropy principle, which shows nice prospects in terms of flexibility and learnability.
E06-1019 351 7:254 The task originally emerged as an intermediate result of training the IBM translation models (Brown et al. , 1993).
E06-1019 352 46:254 The IBM models (Brown et al. , 1993) search a version of permutation space with a one-to-many constraint.
E06-1019 353 32:254 Alignment spaces can emerge from generative stories (Brown et al. , 1993), from syntactic notions (Wu, 1997), or they can be imposed to create competition between links (Melamed, 2000).
P93-1002 354 210:211 The natural next step in sentence alignment is to account for word ordering in the translation model, e.g., the models described in (Brown et al. , 1993) could be used.
P93-1002 355 100:211 11 However, modeling word order under translation is notoriously difficult (Brown et al. , 1993), and it is unclear how much improvement in accuracy a good model of word order would provide.
P95-1033 356 8:198 Aside from purely linguistic interest, bracket structure has been empirically shown to be highly effective at constraining subsequent training of, for example, stochastic context-free grammars (Pereira & Schabes 1992; Black et al. 1993).
P95-1033 357 123:198 Since Chinese text is not orthographically separated into words, the standard methodology is to first preprocess input texts through a segmentation module (Chiang et al. 1992; Lin et al. 1992; Chang & Chen 1993; Lin et al. 1993; Wu & Tseng 1993; Sproat et al. 1994).
P95-1033 358 6:198 1 Introduction Parallel corpora have been shown to provide an extremely rich source of constraints for statistical analysis (e.g. , Brown et al. 1990; Gale & Church 1991; Gale et al. 1992; Church 1993; Brown et al. 1993; Dagan et al. 1993; Dagan & Church 1994; Fung & Church 1994; Wu & Xia 1994; Fung & McKeown 1994).
P95-1033 359 98:198 A simpler, related idea of penalizing distortion from some ideal matching pattern can be found in the statistical translation (Brown et al. 1990; Brown et al. 1993) and word alignment (Dagan et al. 1993; Dagan & Church 1994) models.
P06-1067 360 10:241 N-gram language models have also been used in Statistical Machine Translation (SMT) as proposed by (Brown et al. , 1990; Brown et al. , 1993).
P06-1067 361 66:241 Distortion models were first proposed by (Brown et al. , 1993) in the so-called IBM Models.
P04-3005 362 48:91 For the results in this paper, we have used Pointwise Mutual Information (PMI) instead of IBM Model 1 (Brown et al. , 1993), since (Rogati and Yang, 2004) found it to be as effective on Springer, but faster to compute.
J06-4004 363 43:388 According to our experience, the best performance is achieved when the union of the source-to-target and target-to-source alignment sets (IBM models; Brown et al. [1993]) is used for tuple extraction (some experimental results regarding this issue are presented in Section 4.2.2).
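The union of the two directional alignment sets mentioned in this record can be sketched in a few lines. This is a minimal illustration, not the cited system's code; the function name `symmetrize_union` and the representation of links as index pairs are assumptions made here.

```python
def symmetrize_union(src2tgt, tgt2src):
    """Union of two directional word alignments.

    src2tgt: set of (source_index, target_index) links from the
             source-to-target model.
    tgt2src: set of (target_index, source_index) links from the
             target-to-source model.
    Returns a set of (source_index, target_index) links.
    """
    # Flip the target-to-source links so both sets use the same orientation.
    flipped = {(s, t) for (t, s) in tgt2src}
    return src2tgt | flipped


# Hypothetical toy alignments, for illustration only.
links = symmetrize_union({(0, 0), (1, 2)}, {(0, 0), (1, 1), (3, 2)})
```

The union keeps every link proposed by either direction, so it favours recall over precision; an intersection (`&` instead of `|`) would make the opposite trade-off.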
J06-4004 364 149:388 This feature is implemented by using the IBM-1 lexical parameters (Brown et al. 1993; Och et al. 2004).
J06-4004 365 11:388 For these first SMT systems, translation-model probabilities at the sentence level were approximated from word-based translation models that were trained by using bilingual corpora (Brown et al. 1993).
J06-4004 366 306:388 More specifically, the latter system uses the IBM-1 lexical parameters (Brown et al. 1993) for computing the translation probabilities of two possible new tuples: the one resulting when the null-aligned-word is attached to Table 6 Evaluation results for experiments on n-gram size incidence.
J06-4004 367 9:388 The first SMT systems were developed in the early nineties (Brown et al. 1990, 1993).
C02-1050 368 6:171 According to the Bayes Rule, the problem is transformed into the noisy channel model paradigm, where the translation is the maximum a posteriori solution of a distribution for a channel target text given a channel source text and a prior distribution for the channel source text (Brown et al. , 1993).
P06-2093 369 44:213 2 Statistical Translation Engine A word-based translation engine is used based on the so-called IBM-4 model (Brown et al. , 1993).
P06-2093 370 11:213 The classical Bayes relation is used to introduce a target language model (Brown et al. , 1993): $\hat{e} = \arg\max_e \Pr(e \mid f) = \arg\max_e \Pr(f \mid e)\Pr(e)$, where Pr(f|e) is the translation model and Pr(e) is the target language model.
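The decision rule in this record can be sketched as a search over an explicit candidate list. This is only an illustration under strong assumptions: real decoders search a very large hypothesis space, and the function `decode`, the dictionary-based models, and all probabilities below are hypothetical.

```python
import math


def decode(f, candidates, translation_model, language_model):
    """Return the candidate e maximizing Pr(f|e) * Pr(e), scored in log space."""
    def score(e):
        return math.log(translation_model[(f, e)]) + math.log(language_model[e])
    return max(candidates, key=score)


# Hypothetical toy probabilities, for illustration only.
translation_model = {
    ("la maison", "the house"): 0.6,
    ("la maison", "house the"): 0.6,
}
language_model = {"the house": 0.05, "house the": 0.001}

best = decode("la maison", ["house the", "the house"],
              translation_model, language_model)
```

Both candidates receive the same translation-model score here, so the language model decides between them, which is the division of labour the noisy-channel formulation is meant to provide.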
P06-2061 371 18:217 In (Brown et al. , 1994), the authors proposed a method to integrate the IBM translation model 2 (Brown et al. , 1993) with an ASR system.
P06-2061 372 161:217 4.7 Fertility-Based Transducer In (Brown et al. , 1993), three alignment models are described that include fertility models, these are IBM Models 3, 4, and 5.
P06-2061 373 90:217 We rescore the ASR N-best lists with the standard HMM (Vogel et al. , 1996) and IBM (Brown et al. , 1993) MT models.
W07-0724 374 23:91 These lists are rescored with the different models described above, a character penalty, and three different features based on IBM Models 1 and 2 (Brown et al. , 1993) calculated in both translation directions.
C02-1002 375 23:152 The assumptions we made were the following: a lexical token in one half of the translation unit (TU) corresponds to at most one non-empty lexical unit in the other half of the TU; this is the 1:1 mapping assumption which underlines the work of many other researchers (Ahrenberg et al (2000), Brew and McKelvie (1996), Hiemstra (1996), Kay and Röscheisen (1993), Tiedmann (1998), Melamed (2001) etc); a polysemous lexical token, if used several times in the same TU, is used with the same meaning; this assumption is explicitly used by Gale and Church (1991), Melamed (2001) and implicitly by all the previously mentioned authors; a lexical token in one part of a TU can be aligned to a lexical token in the other part of the TU only if the two tokens have compatible types (part-of-speech); in most cases, compatibility reduces to the same POS, but it is also possible to define other compatibility mappings (e.g. participles or gerunds in English are quite often translated as adjectives or nouns in Romanian and vice-versa); although the word order is not an invariant of translation, it is not random either (Ahrenberg et al (2000)); when two or more candidate translation pairs are equally scored, the one containing tokens which are closer in relative position are preferred.
P98-2162 376 20:119 The simple model 1 (Brown et al. , 1993) for the translation of a SL sentence $d = d_1 \ldots d_l$ in a TL sentence $e = e_1 \ldots e_m$ assumes that every TL word is generated independently as a mixture of the SL words: $P(e \mid d) \sim \prod_{j=1}^{m} \sum_{i=0}^{l} t(e_j \mid d_i)$ (2) In the equation above $t(e_j \mid d_i)$ stands for the probability that $e_j$ is generated by $d_i$.
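The mixture form described in this record (each target word generated independently from the source words plus an empty word at position 0) can be evaluated directly. The sketch below is a minimal illustration; the function name `model1_score`, the `"NULL"` token for position 0, and the toy table `t` are assumptions, and the constant of proportionality is omitted.

```python
def model1_score(e_words, d_words, t):
    """Unnormalized Model 1 score: product over target words of the
    summed lexical probabilities t[(e_j, d_i)] over all source positions,
    with position 0 represented by the token "NULL"."""
    d_with_null = ["NULL"] + list(d_words)
    score = 1.0
    for e in e_words:
        # Missing table entries are treated as probability 0.
        score *= sum(t.get((e, d), 0.0) for d in d_with_null)
    return score


# Hypothetical lexical probabilities t(e|d), for illustration only.
t = {
    ("the", "das"): 0.7, ("the", "NULL"): 0.2,
    ("house", "Haus"): 0.8, ("house", "NULL"): 0.1,
}
score = model1_score(["the", "house"], ["das", "Haus"], t)
```

In this toy case each target word sums to 0.9 over the three source positions, so the product is 0.81.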
P98-2162 377 29:119 The application of this algorithm to the basic problem using a parallel bilingual corpus aligned on the sentence level is described in (Brown et al. , 1993).
P98-2162 378 22:119 In the refined model 2 (Brown et al. , 1993) alignment probabilities a(i|j, l, m) are included to model the effect that the position of a word influences the position of its translation.
N03-1019 379 7:163 The ATTM attempts to overcome the deficiencies of word-to-word translation models (Brown et al. , 1993) through the use of phrasal translations.
P98-2230 380 31:182 Estimation of the parameters has been described elsewhere (Brown et al. , 1993).
P98-2230 381 41:182 Various models have been constructed by the IBM team (Brown et al. , 1993).
W05-1208 382 68:134 This differs from typical generative settings for IR and MT (Ponte and Croft, 1998; Brown et al. , 1993), where all conditioned events are disjoint by construction.
W05-1208 383 81:134 Alternatively, one can view (2) as inducing an alignment between terms in the h to the terms in the t, somewhat similar to alignment models in statistical MT (Brown et al. , 1993).
H05-1023 384 10:217 Word alignments traditionally are based on IBM Models 1-5 (Brown et al. , 1993) or on HMMs (Vogel et al. , 1996).
W07-0403 385 27:234 This alignment system is powered by the IBM translation models (Brown et al. , 1993), in which one sentence generates the other.
W05-0817 386 9:109 A quite different approach from our hypotheses testing implemented in the TREQ-AL aligner is taken by the model-estimating aligners, most of them relying on the IBM models (1 to 5) described in the (Brown et al. 1993) seminal paper.
W05-0817 387 3:109 The first one is a hypotheses testing approach (Gale and Church, 1991; Melamed, 2001; Tufiş, 2002) while the second one is closer to a model estimating approach (Brown et al. , 1993; Och and Ney, 2000).
I08-2104 388 55:133 In this paper, we use IBM model 1 (Brown et al., 1993) in order to get the probability P(Q|DA) as follows.
I08-2104 389 13:133 In the proposed method, the statistical machine translation (SMT) (Brown et al., 1993) is deeply incorporated into the question answering process, instead of using the SMT as the preprocessing before the mono-lingual QA process as in the previous work.
W05-0823 390 50:86 Finally, the fourth and fifth feature functions corresponded to two lexicon models based on IBM Model 1 lexical parameters p(t|s) (Brown et al. , 1993).
W05-0823 391 4:86 1 Introduction During the last decade, statistical machine translation (SMT) systems have evolved from the original word-based approach (Brown et al. , 1993) into phrase-based translation systems (Koehn et al. , 2003).
P06-2111 392 78:233 For the word alignment, we apply standard techniques derived from statistical machine translation using the well-known IBM alignment models (Brown et al. , 1993) implemented in the open-source tool GIZA++ (Och, 2003).
P06-2111 393 150:233 For this we used two resources: CELEX, a linguistically annotated dictionary of English, Dutch and German (Baayen et al. , 1993), and the Dutch snowball stemmer implementing a suffix stripping algorithm based on the Porter stemmer.
W04-0857 394 106:120 To solve this problem, we will adapt the idea of null generated words from machine translation (Brown et al. , 1993).
P05-1066 395 30:229 2 Background 2.1 Previous Work 2.1.1 Research on Phrase-Based SMT The original work on statistical machine translation was carried out by researchers at IBM (Brown et al. , 1993).
P05-1066 396 9:229 These methods go beyond the original IBM machine translation models (Brown et al. , 1993), by allowing multi-word units (phrases) in one language to be translated directly into phrases in another language.
W01-1410 397 62:111 We based our design on the IBM models 1 and 2 (Brown et al. , 1993), but taking into account that our model must generate correct derivations in a given grammar, not any se [Figure 3: Using a category "animals" for "snakes", "rats" and "people" in the example of Figure 1.]
W01-1410 398 85:111 We carefully implemented the original Grammar Association system described in (Vidal et al. , 1993), tuned empirically a couple of smoothing parameters, trained the models and, finally, obtained an a119a21a120 a100 a104a122a121 of correct translations.9 Then, we studied the impact of: (1) sorting, as proposed in Section 3, the set of sentences presented to ECGI; (2) making language models deterministic and minimum; (3) constraining the best translation search to those sentences whose lengths have been seen, in the training set, related to the length of the input sentence.
W01-1410 399 100:111 6 Concluding remarks Our work presents a set of improvements on previous state of the art of Grammar Association: first, by providing better language models to the original system described in (Vidal et al. , 1993); second, by setting the technique into a rigorous statistical framework, clarifying which kind of probabilities have to be estimated by association models; third, by developing a novel and especially adequate association model: Loco C. On the other hand, though experimental results are quite good, we find them particularly relevant for pointing out directions to follow for further improvement of the Grammar Association technique.
W01-1410 400 16:111 However, in the Grammar Association context, when developing (using Bayes decomposition) the basic equations of the system presented in (Vidal et al. , 1993), it is said that the reverse model for a28 a13a37a3a38a5a39a32a21a0a35a7 does not seem to admit a simple factorization which is also correct and convenient, so crude heuristics were adopted in the mathematical development of the expression to be maximized.
W01-1410 401 30:111 Moreover, it was (without imposing determinism) the inference technique employed in (Vidal et al. , 1993).
W01-1208 402 114:148 P(d) P L (d) (4) Statistical approaches to language modeling have been used in much NLP research, such as machine translation (Brown et al. , 1993) and speech recognition (Bahl et al. , 1983).
C04-1031 403 29:147 Estimated clues are derived from the parallel data using, for example, measures of co-occurrence (e.g. the Dice coefficient (Smadja et al. , 1996)), statistical alignment models (e.g. IBM models from statistical machine translation (Brown et al. , 1993)), or string similarity measures (e.g. the longest common sub-sequence ratio (Melamed, 1995)).
C04-1031 404 8:147 (Brown et al. , 1993; Vogel et al. , 1996; García-Varea et al. , 2002; Ahrenberg et al. , 1998; Tiedemann, 1999; Tufis and Barbu, 2002; Melamed, 2000).
P01-1008 405 93:220 We also record for each token its derivational root, using the CELEX(Baayen et al. , 1993) database.
P01-1008 406 76:220 This characteristic of our corpus is similar to problems with noisy and comparable corpora (Veronis, 2000), and it prevents us from using methods developed in the MT community based on clean parallel corpora, such as (Brown et al. , 1993).
P01-1008 407 102:220 Examples of such contexts are verb-object relations and noun-modifier relations, which were traditionally used in word similarity tasks from non-parallel corpora (Pereira et al. , 1993; Hatzivassiloglou and McKeown, 1993).
C96-1040 408 90:171 the EM algorithm (Brown et al. 1993) (Dempster et al. 1977).
C96-1040 409 23:171 of the position information of words at matching pairs of sentences, which turned out useful (Brown et al. 1993).
C96-1040 410 12:171 machine translation (Brown et al. 1993) but also in other applications such as word sense disambiguation (Brown et al. 1991) and bilingual lexicography (Klavans and Tzoukermann 1990).
J97-3002 411 13:359 Parallel bilingual corpora have been shown to provide a rich source of constraints for statistical analysis (Brown et al. 1990; Gale and Church 1991; Gale, Church, and Yarowsky 1992; Church 1993; Brown et al. 1993; Dagan, Church, and Gale 1993; Department of Computer Science, University of Science and Technology, Clear Water Bay, Hong Kong.
J97-3002 412 178:359 The usual Chinese NLP architecture first preprocesses input text through a word segmentation module (Chiang et al. 1992; Lin, Chiang, and Su 1992, 1993; Chang and Chen 1993; Wu and Tseng 1993; Sproat et al. 1994; Wu and Fung 1994), but, clearly, bilingual parsing will be hampered by any errors arising from segmentation ambiguities that could not be resolved in the isolated monolingual context because even if the Chinese segmentation is acceptable monolingually, it may not agree with the words present in the English sentence.
J97-3002 413 301:359 The later IBM models are formulated to prefer collocations (Brown et al. 1993).
N03-1010 414 3:227 1 Introduction Most of the current work in statistical machine translation builds on word replacement models developed at IBM in the early 1990s (Brown et al. , 1990, 1993; Berger et al. , 1994, 1996).
W05-0826 415 10:103 Far from full syntactic complexity, we suggest to go back to the simpler alignment methods first described by (Brown et al. , 1993).
C08-1128 416 27:207 Given a manually compiled lexicon containing words and their relative frequencies $P_s(f'_j)$, the best segmentation $\hat{f}_1^J$ is the one that maximizes the joint probability of all words in the sentence, with the assumption that words are independent of each other: $\hat{f}_1^J = \arg\max_{{f'}_1^{J'}} \Pr({f'}_1^{J'} \mid c_1^K) \approx \arg\max_{{f'}_1^{J'}} \prod_{j=1}^{J'} P_s(f'_j)$, where the maximization is taken over Chinese word sequences whose character sequence is $c_1^K$. 2.2 Translation system Once we have segmented the Chinese sentences into words, we train standard alignment models in both directions with GIZA++ (Och and Ney, 2002) using models of IBM-1 (Brown et al., 1993), HMM (Vogel et al., 1996) and IBM-4 (Brown et al., 1993).
P03-1039 417 17:155 The next section briefly reviews the word alignment based statistical machine translation (Brown et al. , 1993).
P03-1039 418 8:155 The former term P(E) is called a language model, representing the likelihood of E. The latter term P(J|E) is called a translation model, representing the generation probability from E into J. As an implementation of P(J|E), the word alignment based statistical translation (Brown et al. , 1993) has been successfully applied to similar language pairs, such as French-English and German-English, but not to drastically different ones, such as Japanese-English.
N06-1056 419 91:173 In this work, we use the GIZA++ implementation (Och and Ney, 2003) of IBM Model 5 (Brown et al. , 1993).
N06-1056 420 22:173 More specifically, a statistical word alignment model (Brown et al. , 1993) is used to acquire a bilingual lexicon consisting of NL substrings coupled with their translations in the target MRL.
C04-1005 421 12:181 For example, the statistical word alignment in IBM translation models (Brown et al. 1993) can only handle word to word and multi-word to word alignments.
C04-1005 422 6:181 1 Introduction Bilingual word alignment is first introduced as an intermediate result in statistical machine translation (SMT) (Brown et al. 1993).
C04-1005 423 30:181 2 Statistical Word Alignment Statistical translation models (Brown, et al. 1993) only allow word to word and multi-word to word alignments.
C04-1005 424 7:181 Besides being used in SMT, it is also used in translation lexicon building (Melamed 1996), transfer rule learning (Menezes and Richardson 2001), example-based machine translation (Somers 1999), etc. In previous alignment methods, some researches modeled the alignments as hidden parameters in a statistical translation model (Brown et al. 1993; Och and Ney 2000) or directly modeled them given the sentence pairs (Cherry and Lin 2003).
N06-1013 425 7:176 1 Introduction Word alignment, the detection of corresponding words between two sentences that are translations of each other, is usually an intermediate step of statistical machine translation (MT) (Brown et al. , 1993; Och and Ney, 2003; Koehn et al. , 2003), but also has been shown useful for other applications such as construction of bilingual lexicons, word-sense disambiguation, projection of resources, and cross-language information retrieval.
N06-1013 426 154:176 stance, the IBM models (Brown et al. , 1993) can be improved by adding more context dependencies into the translation model using a ME framework rather than using only p(f j |e i ) (Garcia-Varea et al. , 2002).
W07-0708 427 38:131 A monotonic segmentation copes with monotonic alignments, that is, $j < k \Rightarrow a_j < a_k$ following the notation of (Brown et al. , 1993).
C94-2175 428 67:134 Dynamic programming is applied to bilingual sentence alignment in most of previous works (Brown et al. , 1991; Gale and Church, 1993; Chen, 1993).
C94-2175 429 10:134 For example, sentence alignment of bilingual texts are performed just by measuring sentence lengths in words or in characters (Brown et al. , 1991; Gale and Church, 1993), or by statistically estimating word level correspondences (Chen, 1993; Kay and Röscheisen, 1993).
C94-2175 430 7:134 The statistical approach involves the following: alignment of bilingual texts at the sentence level using statistical techniques (e.g. Brown, Lai and Mercer (1991), Gale and Church (1993), Chen (1993), and Kay and Röscheisen (1993)), statistical machine translation models (e.g. Brown, Cocke, Pietra, Pietra et al.
C94-2175 431 17:134 Then, those structurally matched parallel sentences are used as a source for acquiring lexical knowledge such as verbal case frames (Utsuro et al. , 1992; Utsuro et al. , 1993).
C94-2175 432 46:134 So far, we have implemented the following: sentence alignment based on word correspondence information, word correspondence estimation by cooccurrence-frequency-based methods in Gale and Church (1991) and Kay and Röscheisen (1993), structured matching of parallel sentences (Matsumoto et al. , 1993), and case frame acquisition of Japanese verbs (Utsuro et al. , 1993).
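The monotonicity condition in this record can be checked mechanically. The sketch below reads the condition as strict (j < k implies a_j < a_k), which is an assumption about the intended connective; the function name `is_monotonic` and the list representation of the alignment are also assumptions.

```python
def is_monotonic(a):
    """True if the alignment vector a (a[j] = position aligned to word j)
    is strictly increasing, i.e. j < k implies a[j] < a[k]."""
    # Checking adjacent pairs is enough: strict order is transitive.
    return all(a[j] < a[j + 1] for j in range(len(a) - 1))


monotone = is_monotonic([1, 2, 4])    # links never cross
crossing = is_monotonic([2, 1, 3])    # words 0 and 1 swap order
```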
W08-0307 433 136:224 The simple idea that words in a source chunk are typically aligned to words in a single possible target chunk is used to discard alignments which link words from 2We use IBM-1 to IBM-5 models (Brown et al., 1993) implemented with GIZA++ (Och and Ney, 2003).
P98-1069 434 7:117 In the years since the appearance of the first papers on using statistical models for bilingual lexicon compilation and machine translation(Brown et al. , 1993; Brown et al. , 1991; Gale and Church, 1993; Church, 1993; Simard et al. , 1992), large amount of human effort and time has been invested in collecting parallel corpora of translated texts.
P98-1069 435 103:117 This approach has also been used by (Dagan and Itai, 1994; Gale et al. , 1992; Schütze, 1992; Gale et al. , 1993; Yarowsky, 1995; Gale and Church, 1Lunar is not an unknown word in English, Yeltsin finds its translation in the 4-th candidate.
P98-1069 436 105:117 Some of the early statistical terminology translation methods are (Brown et al. , 1993; Wu and Xia, 1994; Dagan and Church, 1994; Gale and Church, 1991; Kupiec, 1993; Smadja et al. , 1996; Kay and Röscheisen, 1993; Fung and Church, 1994; Fung, 1995b).
W06-3109 437 34:141 The most widely used single-word-based statistical alignment models (SAMs) have been proposed in (Brown et al. , 1993; Ney et al. , 2000).
P00-1041 438 21:138 The work reported in this paper is most closely related to work on statistical machine translation, particularly the IBM-style work on CANDIDE (Brown et al. , 1993).
N03-2036 439 6:78 1 Phrase-based Unigram Model Various papers use phrase-based translation systems (Och et al. , 1999; Marcu and Wong, 2002; Yamada and Knight, 2002) that have shown to improve translation quality over single-word based translation systems introduced in (Brown et al. , 1993).
P04-1022 440 9:200 These range from two-word to multi-word, with or without syntactic structure (Smadja 1993; Lin, 1998; Pearce, 2001; Seretan et al. 2003).
P04-1022 441 43:200 Most previous research in translation knowledge acquisition is based on parallel corpora (Brown et al. , 1993).
P04-1022 442 87:200 We have: $p(c_{tri} \mid e_{tri}) = p(c_1, r_c, c_2 \mid e_{tri}) = p(c_1 \mid r_c, e_{tri})\, p(c_2 \mid r_c, e_{tri})\, p(r_c \mid e_{tri})$ (6) Assumption 2: For an English triple $e_{tri}$, assume that $c_i$ only depends on $e_i$ ($i \in \{1,2\}$), and $r_c$ only depends on $r_e$. Equation (6) is rewritten as: $p(c_{tri} \mid e_{tri}) = p(c_1 \mid r_c, e_{tri})\, p(c_2 \mid r_c, e_{tri})\, p(r_c \mid e_{tri}) = p(c_1 \mid e_1)\, p(c_2 \mid e_2)\, p(r_c \mid r_e)$ (7) Notice that $p(c_1 \mid e_1)$ and $p(c_2 \mid e_2)$ are translation probabilities within triples, they are different from the unrestricted probabilities such as the ones in IBM models (Brown et al. , 1993).
P04-1022 443 20:200 Some studies have been done for acquiring collocation translations using parallel corpora (Smadja et al, 1996; Kupiec, 1993; Echizen-ya et al. , 2003).
P02-1038 444 19:155 If the language model $\Pr(e_1^I) = p_\gamma(e_1^I)$ depends on parameters $\gamma$ and the translation model $\Pr(f_1^J \mid e_1^I) = p_\theta(f_1^J \mid e_1^I)$ depends on parameters $\theta$, then the optimal parameter values are obtained by maximizing the likelihood on a parallel training corpus $(f_1^S, e_1^S)$ (Brown et al. , 1993): $\hat{\theta} = \arg\max_\theta \prod_{s=1}^{S} p_\theta(f_s \mid e_s)$ (3) $\hat{\gamma} = \arg\max_\gamma \prod_{s=1}^{S} p_\gamma(e_s)$ (4)
P02-1038 445 10:155 1 perform the following maximization: $\hat{e}_1^I = \arg\max_{e_1^I} \{ \Pr(e_1^I) \Pr(f_1^J \mid e_1^I) \}$ (2) This approach is referred to as source-channel approach to statistical MT. Sometimes, it is also referred to as the fundamental equation of statistical MT (Brown et al. , 1993).
P02-1052 446 26:169 (Brown et al. , 1990; Brown et al. , 1993), a number of other algorithms have been developed.
P97-1022 447 28:129 In earlier IBM translation systems (Brown et al. , 1993) each English word would be generated by, or "aligned to", exactly one formal language word.
P97-1022 448 4:129 This paper extends the IBM Machine Translation Group's concept of fertility (Brown et al. , 1993) to the generation of clumps for natural language understanding.
P97-1022 449 119:129 Model 1 is the word-pair translation model used in simple machine translation and understanding models (Brown et al. , 1993; Epstein et al. , 1996).
P03-1003 450 50:185 (see Brown et al. , 1993 for a detailed mathematical description of the model and the formula for computing the probability of an alignment and target string given a source string).
P03-1003 451 87:185 To help our model learn that it is desirable to copy answer words into the question, we add to each corpus a list of identical dictionary word pairs w iw i . For each corpus, we use GIZA (Al-Onaizan et al. , 1999), a publicly available SMT package that implements the IBM models (Brown et al. , 1993), to train a QA noisy-channel model that maps flattened answer parse trees, obtained using the cut procedure described in Section 3.1, into questions.
P03-1003 452 20:185 Being inspired by the success of noisy-channel-based approaches in applications as diverse as speech recognition (Jelinek, 1997), part of speech tagging (Church, 1988), machine translation (Brown et al. , 1993), information retrieval (Berger and Lafferty, 1999), and text summarization (Knight and Marcu, 2002), we develop a noisy channel model for QA.
N07-2010 453 63:98 Similar to work in image retrieval (Barnard et al. , 2003), we cast the problem in terms of Machine Translation: given a paired corpus of words and a set of video event representations to which they refer, we make the IBM Model 1 assumption and use the expectation-maximization method to estimate the parameters (Brown et al. , 1993): $p(\mathit{word} \mid \mathit{video}) = \frac{C}{(l+1)^m} \prod_{j=1}^{m} p(\mathit{word}_j \mid \mathit{video}_{a_j})$ (1) This paired corpus is created from a corpus of raw video by first abstracting each video into the feature streams described above.
J00-1004 454 228:231 At the same time, we believe our method has advantages over the approach developed initially at IBM (Brown et al. 1990; Brown et al. 1993) for training translation systems automatically.
J00-1004 455 122:231 Brown et al. 1993).
I08-1068 456 114:193 IBM Model1 (Brown et al., 1993) is a simplistic model which takes no account of the subtler aspects of language translation including the way word order tends to differ across languages.
I08-1068 457 112:193 5.1.2 Learning Translation Model According to the standard statistical translation model (Brown et al., 1993), we can find the optimal model $M^{*}$ by maximizing the probability of generating queries from documents or $M^{*} = \arg\max_M \prod_{i=1}^{N} P(Q_i \mid D_i; M)$ To find the optimal word translation probabilities $P(q_w \mid d_w; M^{*})$, we can use the EM algorithm. [Table 1: Sample user profile; columns $q_w$, $d_w$, $P(q_w \mid d_w, u)$: journal, kdd, 0.0176; journal, conference, 0.0123; journal, journal, 0.0176; journal, sigkdd, 0.0088; journal, discovery, 0.0211; journal, mining, 0.0017; journal, acm, 0.0088; music, music, 0.0375; music, purchase, 0.0090; music, mp3, 0.0090; music, listen, 0.0180; music, mp3.com, 0.0450; music, free, 0.0008]
I08-1068 458 113:193 The details of the algorithm can be found in the literature for statistical translation models, such as (Brown et al., 1993).
P04-1063 459 29:208 ALM does this by using alignment models from the statistical machine translation literature (Brown et al. , 1993).
J96-1001 460 47:576 Related Work The recent availability of large amounts of bilingual data has attracted interest in several areas, including sentence alignment (Gale and Church 1991b; Brown, Lai, and Mercer 1991; Simard, Foster and Isabelle 1992; Gale and Church 1993; Chen 1993), word alignment (Gale and Church 1991a; Brown et al. 1993; Dagan, Church, and Gale 1993; Fung and McKeown 1994; Fung 1995b), alignment of groups of words (Smadja 1992; Kupiec 1993; van der Eijk 1993), and statistical translation (Brown et al. 1993).
P05-1033 461 9:249 The basic phrase-based model is an instance of the noisy-channel approach (Brown et al. , 1993),1 in which the translation of a French sentence f into an 1Throughout this paper, we follow the convention of Brown et al. of designating the source and target languages as French and English, respectively.
P07-1047 462 82:180 This situation is very similar to the training process of translation models in statistical machine translation (Brown et al. , 1993), where parallel corpus is used to find the mappings between words from different languages by exploiting their co-occurrence patterns.
P07-1047 463 94:180 Finally, the translation model can be formalized as the following optimization problem $\arg\max \log \Pr(D;)$ s.t. $\sum_{j=1}^{m_w} \Pr(w_j \mid o_k) = 1, \forall k$ This optimization problem can be solved by the EM algorithm (Brown et al. , 1993).
W06-1609 464 8:175 1 Introduction During the last few years, SMT systems have evolved from the original word-based approach (Brown et al. , 1993) to phrase-based translation systems (Koehn et al. , 2003).
W06-1609 465 59:175 This feature, which is based on the lexical parameters of the IBM Model 1 (Brown et al. , 1993), provides a complementary probability for each tuple in the translation table.
W07-0412 466 20:166 Systems based on word-to-word lexicons, such as the IBM systems (Brown et al. , 1990; Brown et al. , 1993), incorporate further devices that allow reordering of words (a distortion model) and ranking of alternatives (a monolingual language model).
C08-1138 467 14:198 Based on these grammars, a great number of SMT models have been recently proposed, including string-to-string model (Synchronous FSG) (Brown et al., 1993; Koehn et al., 2003), tree-to-string model (TSG-string) (Huang et al., 2006; Liu et al., 2006; Liu et al., 2007), string-to-tree model (string-CFG/TSG) (Yamada and Knight, 2001; Galley et al., 2006; Marcu et al., 2006), tree-to-tree model (Synchronous CFG/TSG, Data-Oriented Translation) (Chiang, 2005; Cowan et al., 2006; Eisner, 2003; Ding and Palmer, 2005; Zhang et al., 2007; Bod, 2007; Quirk et al., 2005; Poutsma, 2000; Hearne and Way, 2003) and so on.
W03-0413 468 42:157 The first model, referred to as Maxent1 below, is a loglinear combination of a trigram language model with a maximum entropy translation component that is an analog of the IBM translation model 2 (Brown et al. , 1993).
D08-1039 469 53:197 3 Model As an extension to commonly used lexical word pair probabilities p(f|e) as introduced in (Brown et al., 1993), we define our model to operate on word triplets.
D08-1039 470 69:197 The resulting training procedure is analogous to the one presented in (Brown et al., 1993) and (Tillmann and Ney, 1997).
D08-1039 471 28:197 One of the simplest models that can be seen in the context of lexical triggers is the IBM model 1 (Brown et al., 1993) which captures lexical dependencies between source and target words.
N07-2022 472 8:92 1 Introduction In the first SMT systems (Brown et al. , 1993), word alignment was introduced as a hidden variable of the translation model.
W03-0303 473 22:126 However, instead of estimating the probabilities for the production rules via EM as described in [Wu 1997], we assign the probabilities to the rules using the Model-1 statistical translation lexicon [Brown et al. 1993].
P07-1020 474 8:189 Most of the previous work on statistical machine translation, as exemplified in (Brown et al. , 1993), employs word-alignment algorithm (such as GIZA++ (Och and Ney, 2003)) that provides local associations between source and target words.
P08-1019 475 225:261 More specifically, by using translation probabilities, we can rewrite equation (11) and (12) as follows: [Equations (13) and (14) are unrecoverable in this extraction.] The translation probability used there denotes the probability that one topic term is the translation of another. In our experiments, to estimate this probability, we used the collections of question titles and question descriptions as the parallel corpus and the IBM model 1 (Brown et al., 1993) as the alignment model.
W00-0707 476 16:89 In previous work (Foster, 2000), I described a Maximum Entropy/Minimum Divergence (MEMD) model (Berger et al. , 1996) for p(w|h_i, s) which incorporates a trigram language model and a translation component which is an analog of the well-known IBM translation model 1 (Brown et al. , 1993).
W00-0707 477 22:89 The model consists of a set of word-pair parameters p(t|s) and position parameters p(j|i, l); in model 1 (IBM1) the latter are fixed at 1/(l + 1), as each position, including the empty position 0, is considered equally likely to contain a translation for w. Maximum likelihood estimates for these parameters can be obtained with the EM algorithm over a bilingual training corpus, as described in (Brown et al. , 1993).
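The EM estimation of the word-pair parameters mentioned in this record can be sketched for a toy corpus. This is a minimal illustration of the standard Model 1 update, not the cited implementation; the function name `train_model1`, the `"NULL"` token for the empty position 0, the uniform initialization, and the toy corpus are all assumptions. Because the position parameters are uniform in Model 1, they cancel in the posterior and do not appear in the code.

```python
from collections import defaultdict


def train_model1(corpus, iterations=5):
    """Estimate word-pair parameters t[(t_word, s_word)] = p(t_word | s_word).

    corpus: list of (source_words, target_words) sentence pairs.
    """
    tgt_vocab = {w for _, tgt in corpus for w in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)          # uniform start
    for _ in range(iterations):
        count = defaultdict(float)            # expected pair counts
        total = defaultdict(float)            # expected counts per source word
        for src, tgt in corpus:
            src_null = ["NULL"] + list(src)   # empty position 0
            for w in tgt:
                # E-step: posterior that w was generated by each source word.
                z = sum(t[(w, s)] for s in src_null)
                for s in src_null:
                    c = t[(w, s)] / z
                    count[(w, s)] += c
                    total[s] += c
        # M-step: renormalize the expected counts per source word.
        t = defaultdict(float,
                        {(w, s): count[(w, s)] / total[s] for (w, s) in count})
    return t


# Hypothetical toy corpus, for illustration only.
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_model1(corpus, iterations=5)
```

On this corpus "das" co-occurs with "the" twice and with "house" only once, so the estimate of p(the | das) ends up above p(house | das).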
W97-1014 478 6:104 1 Introduction In this paper, we study the use of so-called word trigger pairs (for short: word triggers) (Bahl et al. , 1984, Lau and Rosenfeld, 1993, Tillmann and Ney, 1996) to improve an existing language model, which is typically a trigram model in combination with a cache component (Ney and Essen, 1994).
W97-1014 479 14:104 In several papers (Bahl et al. , 1984, Lau and Rosenfeld, 1993, Tillmann and Ney, 1996), selection criteria for single word trigger pairs were studied.
J05-4004 480 437:457 Our system outperforms competing approaches, including the standard machine translation alignment models (Brown et al. 1993; Vogel, Ney, and Tillmann 1996) and the state-of-the-art Cut and Paste summary alignment technique (Jing 2002).
J05-4004 481 350:457 We compare against several competing systems, the first of which is based on the original IBM Model 4 for machine translation (Brown et al. 1993) and the HMM machine translation alignment model (Vogel, Ney, and Tillmann 1996) as implemented in the GIZA++ package (Och and Ney 2003).
J05-4004 482 446:457 One obvious first approach would be to run a simpler model for the first iteration (for example, Model 1 from machine translation (Brown et al. 1993), which tends to be very recall oriented) and use this to seed subsequent iterations of the more complex model.
J05-4004 483 51:457 In the context of headline generation, simple statistical models are used for aligning documents and headlines (Banko, Mittal, and Witbrock 2000; Berger and Mittal 2000; Schwartz, Zajic, and Dorr 2002), based on IBM Model 1 (Brown et al. 1993).
W07-0212 484 165:236 Appendix A: Derivation of the Probability of RWE We take a noisy channel approach, which is a common technique in NLP (for example (Brown et al. , 1993)), including spellchecking (Kernighan et al. , 1990).
N07-1057 485 55:167 We then train IBM models (Brown et al. , 1993) using the GIZA++ package (Och and Ney, 2000).
D07-1030 486 34:173 SMT has evolved from the original word-based approach (Brown et al., 1993) into phrase-based approaches (Koehn et al., 2003; Och and Ney, 2004) and syntax-based approaches (Wu, 1997; Alshawi et al., 2000; Yamada and Knight, 2001; Chiang, 2005).
W97-0119 487 7:165 1 Introduction Despite a surge in research using parallel corpora for various machine translation tasks (Brown et al. 1993),(Brown et al. 1991; Gale & Church 1993; Church 1993; Dagan & Church 1994; Simard et al. 1992; Chen 1993; Melamed 1995; Wu & Xia 1994; Wu 1994; Smadja et al.
W05-0712 488 16:175 A word based approach depends upon traditional statistical machine translation techniques such as IBM Model1 (Brown et al. , 1993) and may not always yield satisfactory results due to its inability to handle difficult many-to-many phrase translations.
N07-1046 489 110:233 4.1.3 Letter Lexical Transliteration Similar to IBM Model-1 (Brown et al., 1993), we use a bag-of-letter generative model within a block to approximate the lexical transliteration equivalence: $P(f_j^{j+l} \mid e_i^{i+k}) = \prod_{j'=j}^{j+l} \sum_{i'=i}^{i+k} P(f_{j'} \mid e_{i'})\, P(e_{i'} \mid e_i^{i+k})$, (10) where $P(e_{i'} \mid e_i^{i+k}) \simeq 1/(k+1)$ is approximated by a bag-of-word unigram.
N07-1046 490 61:233 3 Bi-Stream HMMs for Transliteration Standard IBM translation models (Brown et al. , 1993) can be used to obtain letter-to-letter translations.
N07-1046 491 25:233 Standard SMT alignment models (Brown et al. , 1993) are used to align letter-pairs within named entity pairs for transliteration.
P97-1047 492 28:191 Therefore, P(g|e) is the sum of the probabilities of generating g from e over all possible alignments A, in which the position i in the target sentence g is aligned to the position a_i in the source sentence e: $P(g \mid e) = \epsilon \sum_{a_1=0}^{l} \cdots \sum_{a_m=0}^{l} \prod_{j=1}^{m} t(g_j \mid e_{a_j})\, a(a_j \mid j, l, m) = \epsilon \prod_{j=1}^{m} \sum_{i=0}^{l} t(g_j \mid e_i)\, a(i \mid j, l, m)$ (3). (Brown et al., 1993) also described how to use the EM algorithm to estimate the parameters a(i|j, l, m) and t(g|e) in the aforementioned model.
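The step in the preceding citation from a sum over all alignments to a product of per-position sums can be checked numerically. The sketch below is only an illustration under assumed data structures: `t` and `a` are plain dictionaries keyed by `(g_word, e_word)` and `(i, j, l, m)`, `None` stands in for the null word at position 0, and both function names are hypothetical.

```python
from itertools import product
from math import prod, isclose

def likelihood_bruteforce(g, e, t, a, eps=1.0):
    """Sum the alignment probability over all (l+1)^m alignments explicitly."""
    l, m = len(e), len(g)
    e0 = [None] + list(e)  # position 0 is the null word (assumed representation)
    total = 0.0
    for align in product(range(l + 1), repeat=m):
        total += prod(t[(g[j], e0[i])] * a[(i, j + 1, l, m)]
                      for j, i in enumerate(align))
    return eps * total

def likelihood_factored(g, e, t, a, eps=1.0):
    """Same quantity, computed as a product over positions of a sum over sources."""
    l, m = len(e), len(g)
    e0 = [None] + list(e)
    return eps * prod(
        sum(t[(g[j], e0[i])] * a[(i, j + 1, l, m)] for i in range(l + 1))
        for j in range(m)
    )
```

The brute-force version enumerates (l+1)^m alignments, while the factored version needs only m(l+1) terms, which is what makes exact EM practical for this model.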
P97-1047 493 29:191 1.2 Decoding in Statistical Machine Translation (Brown et al. , 1993) and (Vogel, Ney, and Tillman, 1996) have discussed the first two of the three problems in statistical machine translation.
P97-1047 494 30:191 Although the authors of (Brown et al., 1993) stated that they would discuss the search problem in a follow-up article, so far there have been no publications devoted to the decoding issue for statistical machine translation.
P97-1047 495 59:191 We dealt with this by either limiting the translation probability from the null word (Brown et al., 1993) at the hypothetical 0-position (Brown et al., 1993) over a threshold during the EM training, or setting S_{H_0}(j) to a small probability π instead of 0 for the initial null hypothesis H_0.
N04-4015 496 6:115 Introduction Translation of two languages with highly different morphological structures as exemplified by Arabic and English poses a challenge to successful implementation of statistical machine translation models (Brown et al. 1993).
N04-1021 497 16:293 We are given a source (Chinese) sentence $f = f_1^J = f_1, \ldots, f_j, \ldots, f_J$, which is to be translated into a target (English) sentence $e = e_1^I = e_1, \ldots, e_i, \ldots, e_I$. Among all possible target sentences, we will choose the sentence with the highest probability: $\hat{e}_1^I = \arg\max_{e_1^I} \{\Pr(e_1^I \mid f_1^J)\}$ (1). As an alternative to the often used source-channel approach (Brown et al., 1993), we directly model the posterior probability $\Pr(e_1^I \mid f_1^J)$ (Och and Ney, 2002) using a log-linear combination of feature functions.
N04-1021 498 97:293 4.1 Model 1 Score We used IBM Model 1 (Brown et al. , 1993) as one of the feature functions.
W06-3104 499 141:270 Initial estimates of lexical translation probabilities came from the IBM Model 4 translation tables produced by GIZA++ (Brown et al. , 1993; Och and Ney, 2003).
W06-3104 500 33:270 1.2 From Synchronous to Quasi-Synchronous Grammars Because our approach will let anything align to anything, it is reminiscent of IBM Models 1–5 (Brown et al., 1993).
W05-0614 501 48:160 Further, we can learn the channel probabilities in an unsupervised manner using a variant of the EM algorithm similar to machine translation (Brown et al. , 1993), and statistical language understanding (Epstein, 1996).
W05-0614 502 85:160 We follow IBM Model 1 (Brown et al., 1993) and assume that each word in an utterance is generated by exactly one role in the parallel frame. Using standard EM to learn the role to word mapping is only sufficient if one knows to which level in the tree the utterance should be mapped.
W04-1118 503 41:191 The relationship between the translation model and the alignment model is given by: $\Pr(f_1^J \mid e_1^I) = \sum_{a_1^J} \Pr(f_1^J, a_1^J \mid e_1^I)$ (3) In this paper, we use the models IBM-1, IBM-4 from (Brown et al., 1993) and the Hidden Markov alignment model (HMM) from (Vogel et al., 1996).
P05-1067 504 10:217 1 Introduction Statistical approaches to machine translation, pioneered by (Brown et al. , 1993), achieved impressive performance by leveraging large amounts of parallel corpora.
P05-1067 505 188:217 In comparison, we deployed the GIZA++ MT modeling tool kit, which is an implementation of the IBM Models 1 to 4 (Brown et al., 1993; Al-Onaizan et al., 1999; Och and Ney, 2003).
P05-1067 506 165:217 As a unified approach, we augment the SDIG by adding all the possible word pairs (e_i, f_j) as a parallel ET pair and using the IBM Model 1 (Brown et al., 1993) word to word translation probability as the ET translation probability.
P05-1067 507 91:217 In our implementation, the IBM Model 1 (Brown et al. , 1993) is used.
W99-0604 508 15:232 Many statistical translation models (Vogel et al., 1996; Tillmann et al., 1997; Niessen et al., 1998; Brown et al., 1993) try to model word-to-word correspondences between source and target words.
W99-0604 509 89:232 This alignment representation is a generalization of the baseline alignments described in (Brown et al. , 1993) and allows for many-to-many alignments.
W02-1022 510 196:207 Using alignment for grammar and lexicon induction has been an active area of research, both in monolingual settings (van Zaanen, 2000) and in machine translation (MT) (Brown et al., 1993; Melamed, 2000; Och and Ney, 2000); interestingly, statistical MT techniques have been used to derive lexico-semantic mappings in the "reverse" direction of language understanding rather than generation (Papineni et al., 1997; Macherey et al., 2001).
W95-0106 511 87:153 It is interesting to contrast this method with the "parse-parse-match" approaches that have been reported recently for producing parallel bracketed corpora (Sadler & Vendelmans 1990; Kaji et al. 1992; Matsumoto et al. 1993; Cranias et al. 1994; Grishman 1994).
W95-0106 512 7:153 1 Introduction A number of empirical studies have found bracketing to be a useful type of corpus annotation (e.g. , Pereira & Schabes 1992; Black et al. 1993).
W95-0106 513 13:153 Numerous experiments have shown parallel bilingual corpora to provide a rich source of constraints for statistical analysis (e.g. , Brown et al. 1990; Gale & Church 1991 ; Gale et al. 1992; Church 1993; Brown et al. 1993; Dagan et al. 1993; Fung & Church 1994; Wu & Xia 1994; Fung & McKeown 1994).
P06-1032 514 12:159 In this paper, we show that a noisy channel model instantiated within the paradigm of Statistical Machine Translation (SMT) (Brown et al. , 1993) can successfully provide editorial assistance for non-native writers.
P06-1032 515 30:159 Rather than learning how strings in one language map to strings in another, however, translation now involves learning how systematic patterns of errors in ESL learners English map to corresponding patterns in native English 2.2 A Noisy Channel Model of ESL Errors If ESL error correction is seen as a translation task, the task can be treated as an SMT problem using the noisy channel model of (Brown et al. , 1993): here the L2 sentence produced by the learner can be regarded as having been corrupted by noise in the form of interference from his or her L1 model and incomplete language models internalized during language learning.
P05-1057 516 78:247 If e has length l and f has length m, there are 2^{lm} possible alignments between e and f (Brown et al., 1993).
P05-1057 517 7:247 1 Introduction Word alignment, which can be defined as an object for indicating the corresponding words in a parallel text, was first introduced as an intermediate result of statistical translation models (Brown et al. , 1993).
P05-1057 518 11:247 Statistical approaches, which depend on a set of unknown parameters that are learned from training data, try to describe the relationship between a bilingual sentence pair (Brown et al. , 1993; Vogel and Ney, 1996).
W07-1205 519 12:213 Training of the phrase translation model builds on top of a standard statistical word alignment over the training corpus of parallel text (Brown et al. , 1993) for identifying corresponding word blocks, assuming no further linguistic analysis of the source or target language.
H05-1012 520 18:201 The IBM models 1-5 (Brown et al. , 1993) produce word alignments with increasing algorithmic complexity and performance.
W08-0306 521 6:125 GIZA++ (Och and Ney, 2003), an implementation of the IBM (Brown et al., 1993) and HMM (?)
P06-1122 522 125:211 Aligning tokens in parallel sentences using the IBM Models (Brown et al. , 1993), (Och and Ney, 2003) may require less information than full-blown translation since the task is constrained by the source and target tokens present in each sentence pair.
E06-1004 523 21:203 For a detailed introduction to IBM translation models, please see (Brown et al. , 1993).
E06-1004 524 201:203 An open question in SMT is whether there exist closed form expressions (whose representation is polynomial in the size of the input) for P(f|e) and the counts in the EM iterations for models 3-5 (Brown et al., 1993).
E06-1004 525 11:203 Increasingly, parallel corpora are becoming available for many language pairs and SMT systems have been built for French-English, German-English, Arabic-English, Chinese-English, Hindi-English and other language pairs (Brown et al., 1993), (Al-Onaizan et al., 1999), (Udupa, 2004).
E06-1004 526 8:203 The parameters of the models are estimated by iterative maximum-likelihood training on a large parallel corpus of natural language texts using the EM algorithm (Brown et al. , 1993).
E06-1004 527 32:203 Exact Decoding is the original decoding problem as defined in (Brown et al. , 1993) and Relaxed Decoding is the relaxation of the decoding problem typically used in practice.
E06-1004 528 29:203 Expectation Evaluation is the soul of parameter estimation (Brown et al. , 1993), (Al-Onaizan et al. , 1999).
E06-1004 529 7:203 1 Introduction Statistical Machine Translation is a data driven machine translation technique which uses probabilistic models of natural language for automatic translation (Brown et al. , 1993), (Al-Onaizan et al. , 1999).
E06-1004 530 34:203 In their seminal paper on SMT, Brown and his colleagues highlighted the problems we face as we go from IBM Models 1-2 to 3-5 (Brown et al., 1993) 3: As we progress from Model 1 to Model 5, evaluating the expectations that give us counts becomes increasingly difficult.
E06-1004 531 15:203 In the classic work on SMT, Brown and his colleagues at IBM introduced the notion of alignment between a sentence f and its translation e and used it in the development of translation models (Brown et al., 1993).
N07-1008 532 55:189 3 A Categorization of Block Styles In (Brown et al., 1993), multi-word cepts (which are realized in our block concept) are discussed and the authors state that only when a target sequence is sufficiently different from a word by word translation should the target sequence be promoted to a cept.
N07-1008 533 26:189 The translation model is estimated via the EM algorithm or approximations that are bootstrapped from the previous model in the sequence as introduced in (Brown et al. , 1993).
N07-1008 534 57:189 Following the perspective of (Brown et al. , 1993), a minimal set of phrase blocks with lengths (m, n) where either m or n must be greater than zero results in the following types of blocks: 1.
N07-1008 535 23:189 1.2 Statistical modeling for translation Earlier work in statistical machine translation (Brown et al., 1993) is based on the noisy-channel formulation where $\hat{T} = \arg\max_T p(T \mid S) = \arg\max_T p(T)\, p(S \mid T)$ (1) where the target language model p(T) is further decomposed as $p(T) \propto \prod_i p(t_i \mid t_{i-1}, \ldots, t_{i-k+1})$ where k is the order of the language model and the translation model p(S|T) has been modeled by a sequence of five models with increasing complexity (Brown et al., 1993).
I08-6006 536 17:152 Most current transliteration systems use a generative model for transliteration such as the freely available GIZA++ (Och and Ney, 2000), an implementation of the IBM alignment models (Brown et al., 1993).
W06-1606 537 37:175 $\Pr(\pi, F, A) = \sum_{\theta_i \in \Theta,\, c(\theta_i) = (\pi, F, A)} \prod_{r_j \in \theta_i} p(r_j)$ (4) In order to acquire the rules specific to our model and to induce their probabilities, we parse the English side of our corpus with an in-house implementation (Soricut, 2005) of Collins parsing models (Collins, 2003) and we word-align the parallel corpus with the Giza++ implementation of the IBM models (Brown et al., 1993).
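The noisy-channel selection rule and the k-th order language model decomposition quoted above can be sketched as follows. This is a toy illustration over an explicit candidate list, not a decoder from any cited system; the function names and the callable-based interfaces (`cond_logprob`, `lm_logprob`, `tm_logprob`) are assumptions.

```python
def ngram_logprob(tokens, cond_logprob, k):
    """Sum of log p(t_i | t_{i-k+1} .. t_{i-1}) over the sentence.

    `cond_logprob(context, word)` is an assumed callable supplying the
    conditional log-probabilities; `context` holds at most k-1 previous tokens.
    """
    total = 0.0
    for i, w in enumerate(tokens):
        context = tuple(tokens[max(0, i - k + 1):i])
        total += cond_logprob(context, w)
    return total

def noisy_channel_best(source, candidates, lm_logprob, tm_logprob):
    """Pick the candidate T maximising log p(T) + log p(S|T)."""
    return max(candidates, key=lambda T: lm_logprob(T) + tm_logprob(source, T))
```

Working in log space turns the product p(T) p(S|T) into a sum, which avoids underflow on long sentences; a real system would search a much larger hypothesis space than a fixed candidate list.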
W00-0508 538 43:148 The statistical machine translation approach is based on the noisy channel paradigm and the Maximum-A-Posteriori decoding algorithm (Brown et al. , 1993).
W00-0508 539 44:148 The sequence W_S is thought of as a noisy version of W_T and the best guess $\hat{W}_T$ is then computed as $\hat{W}_T = \arg\max_{W_T} P(W_T \mid W_S) = \arg\max_{W_T} P(W_S \mid W_T)\, P(W_T)$ (1) In (Brown et al., 1993) they propose a method for maximizing P(W_T|W_S) by estimating P(W_T) and P(W_S|W_T) and solving the problem in equation 1.
W00-0508 540 45:148 Our approach to statistical machine translation differs from the model proposed in (Brown et al., 1993) in that: We compute the joint model P(W_S, W_T) from the bilanguage corpus to account for the direct mapping of the source sentence W_S into the target sentence W_T that is ordered according to the source language word order.
W00-0508 541 21:148 In (Knight and Al-Onaizan, 1998), finite-state machine translation is based on (Brown et al., 1993) and is used for decoding the target language string.
P98-2158 542 186:191 (Vogel et al., 1996) report better perplexity results on the Verbmobil Corpus with their HMM-based alignment model in comparison to Model 2 of (Brown et al., 1993).
P98-2158 543 26:191 In our search procedure, we use a mixture-based alignment model that slightly differs from the model introduced as Model 2 in (Brown et al. , 1993).
P98-2158 544 37:191 It assumes that the distance of the positions relative to the diagonal of the (j, i) plane is the dominating factor: $p(i \mid j, J, I) = \frac{r(i - j\frac{I}{J})}{\sum_{i'=1}^{I} r(i' - j\frac{I}{J})}$ (7). As described in (Brown et al., 1993), the EM algorithm can be used to estimate the parameters of the model.
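A small sketch may help make the diagonal-based alignment distribution in the preceding citation concrete: each target position i is weighted by a function r of its distance from the diagonal point j·I/J, and the weights are normalised over i = 1..I. The function name and the choice of passing `r` as a callable are assumptions; in the cited model r is a set of trainable parameters, whereas the example in the usage note simply plugs in a bell-shaped function.

```python
def diagonal_alignment_probs(j, J, I, r):
    """Return [p(i | j, J, I) for i = 1..I], weighting by distance to the diagonal.

    `r` is an assumed callable mapping a (possibly fractional) distance to a
    non-negative weight; it stands in for the model's trainable parameters.
    """
    weights = [r(i - j * I / J) for i in range(1, I + 1)]
    z = sum(weights)  # normaliser over all target positions
    return [w / z for w in weights]
```

For example, with `r = lambda d: math.exp(-d * d)`, `j = 2`, `J = 4`, and `I = 6`, the diagonal point is 3, so the distribution peaks at target position i = 3.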
P98-2158 545 51:191 The underlying translation model is Model 2 from (Brown et al. , 1993).
P98-2158 546 25:191 1.2 Alignment with Mixture Distribution Several papers have discussed the first issue, especially the problem of word alignments for bilingual corpora (Brown et al., 1993), (Dagan et al., 1993), (Kay and Röscheisen, 1993), (Fung and Church, 1994), (Vogel et al., 1996).
W08-0321 547 8:99 The empirical probability for each sentence pair is estimated by maximum likelihood estimation over the training data (Brown et al., 1993).
W08-0321 548 31:99 In the well-known so-called IBM word alignment models (Brown et al., 1993), re-estimating the model parameters depends on the empirical probability P(ek,fk) for each sentence pair (ek,fk).
W07-0721 549 53:163 This feature, which is based on the lexical parameters of the IBM Model 1 (Brown et al. , 1993), provides a complementary probability for each tuple in the translation table.
C96-1078 550 8:234 Some of this research has treated the sentences as unstructured word sequences to be aligned; this work has primarily involved the acquisition of bilingual lexical correspondences (Chen, 1993), although there has also been an attempt to create a full MT system based on such treatment (Brown et al., 1993).
P97-1063 551 24:218 The co-occurrence relation can also be based on distance in a bitext space, which is a more general representation of bitext correspondence (Dagan et al., 1993; Resnik & Melamed, 1997), or it can be restricted to word pairs that satisfy some matching predicate, which can be extrinsic to the model (Melamed, 1995; Melamed, 1997).
P97-1063 552 56:218 It is analogous to the step in other translation model induction algorithms that sets all probabilities below a certain threshold to negligible values (Brown et al. , 1990; Dagan et al. , 1993; Chen, 1996).
P97-1063 553 10:218 (Macklovitch, 1994; Melamed, 1996b)), concordancing for bilingual lexicography (Catizone et al., 1993; Gale & Church, 1991), computer-assisted language learning, corpus linguistics (Melby.
P97-1063 554 7:218 1 Introduction Over the past decade, researchers at IBM have developed a series of increasingly sophisticated statistical models for machine translation (Brown et al. , 1988; Brown et al. , 1990; Brown et al. , 1993a).
P97-1063 555 50:218 Models of translational equivalence that are ignorant of indirect associations have "a tendency to be confused by collocates" (Dagan et al. , 1993).
P97-1063 556 125:218 for their models (Brown et al. , 1993b).
W05-0825 557 28:75 3 Length Model: Dynamic Programming Given the word fertility definitions in IBM Models (Brown et al., 1993), we can compute a probability to predict phrase length: given the candidate target phrase (English) e_1^I, and a source phrase (French) of length J, the model gives the estimation of P(J|e_1^I) via a dynamic programming algorithm using the source word fertilities.
D08-1084 558 26:232 The MT community has developed not only an extensive literature on alignment (Brown et al., 1993; Vogel et al., 1996; Marcu and Wong, 2002; DeNero et al., 2006), but also standard, proven alignment tools such as GIZA++ (Och and Ney, 2003).
D08-1084 559 137:232 Although we have argued (section 2) that this is unlikely to succeed, to our knowledge, we are the first to investigate the matter empirically. The best-known MT aligner is undoubtedly GIZA++ (Och and Ney, 2003), which contains implementations of various IBM models (Brown et al., 1993), as well as the HMM model of Vogel et al.
H05-1021 560 17:173 The IBM translation models (Brown et al. , 1993) describe word reordering via a distortion model defined over word positions within sentence pairs.
H05-1021 561 28:173 2 The WFST Reordering Model The Translation Template Model (TTM) is a generative model of phrase-based translation (Brown et al. , 1993).
H05-1021 562 148:173 The 1000-best lists are augmented with IBM Model-1 (Brown et al. , 1993) scores and then rescored with a second set of MET parameters.
D07-1090 563 7:316 1 Introduction Given a source-language (e.g., French) sentence f, the problem of machine translation is to automatically produce a target-language (e.g., English) translation e. The mathematics of the problem were formalized by (Brown et al., 1993), and re-formulated by (Och and Ney, 2004) in terms of the optimization $\hat{e} = \arg\max_e \sum_{m=1}^{M} \lambda_m h_m(e, f)$ (1) where $\{h_m(e, f)\}$ is a set of M feature functions and $\{\lambda_m\}$ a set of weights.
H05-1097 564 40:225 For example, in the IBM Models (Brown et al., 1993), each word t_i independently generates 0, 1, or more words in the source language. [Footnote 2: Note that we refer to t as the target sentence, even though in the source-channel model, t is the source sentence which goes through the channel model P(s|t) to produce the observed sentence s.]
W00-0507 565 42:91 2.2.1 The evaluator The evaluator is a function p(t|t', s) which assigns to each target-text unit t an estimate of its probability given a source text s and the tokens t' which precede t in the current translation of s. Our approach to modeling this distribution is based to a large extent on that of the IBM group (Brown et al., 1993), but it differs in one significant aspect: whereas the IBM model involves a "noisy channel" decomposition, we use a linear combination of separate predictions from a language model p(t|t') and a translation model p(t|s).
W00-0507 566 57:91 Both models are based on IBM translation model 2 (Brown et al., 1993) which has the property that it generates tokens independently.
W00-0507 567 59:91 This formula follows the convention of (Brown et al., 1993) in letting s_0 designate the null state.
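The log-linear optimization described in the preceding citation reduces to scoring each candidate translation by a weighted sum of feature functions and taking the argmax. The sketch below assumes an explicit candidate list and hypothetical feature functions; real systems search a far larger space and tune the weights on held-out data.

```python
def loglinear_best(f, candidates, features, weights):
    """Return the candidate e maximising sum_m weights[m] * features[m](e, f).

    `features` is a list of callables h(e, f) -> float and `weights` the matching
    list of lambdas; both are illustrative assumptions, not a fixed feature set.
    """
    def score(e):
        return sum(lam * h(e, f) for lam, h in zip(weights, features))
    return max(candidates, key=score)
```

As a usage example, one feature could penalise length mismatch and another could reward a particular first word; the source-channel model is recovered as the special case with two features, log p(e) and log p(f|e), both weighted 1.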
W05-0816 568 33:198 Word correspondence was further developed in IBM Model-1 (Brown et al. , 1993) for statistical machine translation.
W05-0812 569 5:102 1 Introduction The most widely used alignment model is IBM Model 4 (Brown et al. , 1993).
W05-0812 570 11:102 IBM Model 4 parameters are then estimated over this partial search space as an approximation to EM (Brown et al. , 1993; Och and Ney, 2003).
D07-1079 571 9:289 Approaches include word substitution systems (Brown et al. , 1993), phrase substitution systems (Koehn et al. , 2003; Och and Ney, 2004), and synchronous context-free grammar systems (Wu and Wong, 1998; Chiang, 2005), all of which train on string pairs and seek to establish connections between source and target strings.
P05-1032 572 18:147 translation including the joint probability phrase-based model (Marcu and Wong, 2002) and a variant on the alignment template approach (Och and Ney, 2004), and contrast them to the performance of the word-based IBM Model 4 (Brown et al., 1993).
P05-1032 573 39:147 By increasing the size of the basic unit of translation, phrase-based machine translation does away with many of the problems associated with the original word-based formulation of statistical machine translation (Brown et al. , 1993), in particular: The Brown et al.
J03-1005 574 80:672 For placing the head the center function center(i) (Brown et al. [1993] use the notation ⊙_i) is used: the average position of the source words with which the target word e_{i-1} is aligned.
J03-1005 575 72:672 The fertility for the null word is treated specially (for details see Brown et al. [1993]).
J00-2004 576 202:517 Choosing the most advantageous, Hiemstra has published parts of the translational distributions of certain words, induced using both his method and Brown et al.'s (1993b) Model 1 from the same training bitext.
J00-2004 577 194:517 A word order correlation bias, as well as the phrase structure biases in Brown et al.'s (1993b) Models 4 and 5, would be less beneficial with noisier training bitexts or for language pairs with less similar word order.
J00-2004 578 364:517 Evaluation 6.1 Evaluation at the Token Level This section compares translation model estimation methods A, B, and C to each other and to Brown et al.'s (1993b) Model 1.
J00-2004 579 361:517 Brown et al. 1993).
J00-2004 580 374:517 Until now, translation models have been evaluated either subjectively (e.g. White and O'Connell 1993) or using relative metrics, such as perplexity with respect to other models (Brown et al. 1993b).
J00-2004 581 187:517 Dagan, Church, and Gale (1993) expanded on this idea by replacing Brown et al.'s (1988) word alignment parameters, which were based on absolute word positions in aligned segments, with a much smaller set of relative offset parameters.
J00-2004 582 233:517 Due to the parameter interdependencies introduced by the one-to-one assumption, we are unlikely to find a method for decomposing the assignments into parameters that can be estimated independently of each other as in Brown et al. [1993b, Equation 26]).
W06-3107 583 39:160 Nevertheless, in the problem described in this article, the source and the target sentences are given, and we are focusing on the optimization of the alignment a. The translation probability Pr(f,a|e) can be rewritten as follows: $\Pr(f, a \mid e) = \prod_{j=1}^{J} \Pr(f_j, a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I) = \prod_{j=1}^{J} \Pr(a_j \mid f_1^{j-1}, a_1^{j-1}, e_1^I)\, \Pr(f_j \mid f_1^{j-1}, a_1^{j}, e_1^I)$ (2) The probability Pr(f,a|e) can be estimated by using the word-based IBM statistical alignment models (Brown et al., 1993).
W06-3107 584 15:160 Statistical models for machine translation heavily depend on the concept of alignment, specifically, the well known IBM word based models (Brown et al. , 1993).
W06-3107 585 35:160 Alignment models to structure the translation model are introduced in (Brown et al. , 1993).
P07-1092 586 8:201 1 Introduction Statistical machine translation (Brown et al., 1993) has seen many improvements in recent years, most notably the transition from word- to phrase-based models (Koehn et al., 2003).
C00-2123 587 10:267 The model is often further restricted so that each source word is assigned to exactly one target word (Brown et al. , 1993; Ney et al. , 2000).
P96-1023 588 138:283 The node mapping function f for the entire tree thus has a different role from the alignment function in the IBM statistical translation model (Brown et al. 1990, 1993); the role of the latter includes the linear ordering of words in the target string.
P99-1027 589 49:192 2 Translation Model The algorithm for fast translation, which has been described previously in some detail (McCarley and Roukos, 1998) and used with considerable success in TREC (Franz et al. , 1999), is a descendent of IBM Model 1 (Brown et al. , 1993).
P99-1027 590 55:192 This model is trained on approximately 5 million sentence pairs of Hansard (Canadian parliamentary) and UN proceedings which have been aligned on a sentence-by-sentence basis by the methods of (Brown et al. , 1991), and then further aligned on a word-by-word basis by methods similar to (Brown et al. , 1993).
W05-0806 591 10:124 For detailed descriptions of SMT models see for example (Brown et al. , 1993; Och and Ney, 2003).
P04-1066 592 36:168 The first of these nonstructural problems with Model 1, as standardly trained, is that rare words in the source language tend to act as garbage collectors (Brown et al. , 1993b; Och and Ney, 2004), aligning to too many words in the target language.
P04-1066 593 3:168 1 Introduction IBM Model 1 (Brown et al. , 1993a) is a wordalignment model that is widely used in working with parallel bilingual corpora.
D07-1025 594 183:229 Therefore, to make the phrase-based SMT system robust against data sparseness for the ranking task, we also make use of the IBM Model 4 (Brown et al. , 1993) in both directions.
P08-1010 595 59:204 3.1 Model-based Phrase Pair Posterior In a statistical generative word alignment model (Brown et al., 1993), it is assumed that (i) a random variable a specifies how each target word f_j is generated by (therefore aligned to) a source word e_{a_j}; and (ii) the likelihood function f(f,a|e) specifies a generative procedure from the source sentence to the target sentence.
W03-0414 596 12:242 (Brown et al. , 1990; Brown et al. , 1993)) are best known and studied.
N06-4004 597 8:70 At the finest level, this involves the alignment of words and phrases within two sentences that are known to be translations (Brown et al., 1993; Och and Ney, 2003; Vogel et al., 1996; Deng and Byrne, 2005).
N06-4004 598 35:70 Alignment quality can be further improved when the chunking procedure is based on translation lexicons from IBM Model-1 alignment model (Brown et al. , 1993).
N06-4004 599 38:70 MTTK provides implementations of various alignment models, including IBM Model-1, Model-2 (Brown et al., 1993), HMM-based word-to-word alignment model (Vogel et al., 1996; Och and Ney, 2003) and HMM-based word-to-phrase alignment model (Deng and Byrne, 2005).
J04-4002 600 56:482 As an alternative to the often used source-channel approach (Brown et al. 1993), we directly model the posterior probability $\Pr(e_1^I \mid f_1^J)$ (Och and Ney 2002).
J04-4002 601 27:482 Yet the modeling, training, and search methods have also improved since the field of statistical machine translation was pioneered by IBM in the late 1980s and early 1990s (Brown et al. 1990; Brown et al. 1993; Berger et al. 1994).
A00-1019 602 25:153 2.1 The Evaluator The evaluator is a function p(t|t', s) which assigns to each target-text unit t an estimate of its probability given a source text s and the tokens t' which precede t in the current translation of s. Our approach to modeling this distribution is based to a large extent on that of the IBM group (Brown et al., 1993), but it differs in one significant aspect: whereas the IBM model involves a "noisy channel" decomposition, we use a linear combination of separate predictions from a language model p(t|t') and a translation model p(t|s).
A00-1019 603 44:153 Furthermore, the underlying decoding strategies are too time consuming for our application. We therefore use a translation model based on the simple linear interpolation given in equation 2 which combines predictions of two translation models -Ms and M~ -both based on IBM-like model 2 (Brown et al., 1993).
A00-1019 604 87:153 3.2 Mapping Mapping the identified units (tokens or sequences) to their equivalents in the other language was achieved by training a new translation model (IBM 2) using the EM algorithm as described in (Brown et al. , 1993).
A00-1019 605 42:153 Techniques for weakening the independence assumptions made by the IBM models 1 and 2 have been proposed in recent work (Brown et al. , 1993; Berger et al. , 1996; Och and Weber, 98; Wang and Waibel, 98; Wu and Wong, 98).
P06-2124 606 36:213 2.1 Baseline: IBM Model-1 The translation process can be viewed as operations of word substitutions, permutations, and insertions/deletions (Brown et al., 1993) in noisy-channel modeling scheme at parallel sentence-pair level.
P06-2124 607 10:213 Most current approaches emphasize within-sentence dependencies such as the distortion in (Brown et al. , 1993), the dependency of alignment in HMM (Vogel et al. , 1996), and syntax mappings in (Yamada and Knight, 2001).
P06-2124 608 39:213 (1) 1We follow the notations in (Brown et al. , 1993) for English-French, i.e., e f, although our models are tested, in this paper, for English-Chinese.
C04-1090 609 7:186 In word-based models, such as IBM Model 1-5 (Brown et al 1993), the probability P(T|S) is decomposed into statistical parameters involving words.
P06-2005 610 118:209 Assuming that one SMS word is mapped exactly to one English word in the channel model under an alignment, we need to consider only two types of probabilities: the alignment probabilities denoted by Pm and the lexicon mapping probabilities denoted by (Brown et al. 1993).
P06-2005 611 110:209 We thus propose to adapt the statistical machine translation model (Brown et al., 1993; Zens and Ney, 2004) for SMS text normalization.
W06-3111 612 36:194 [Figure 1: Two Types of Alignment (monotone and nonmonotone; axes: source positions, target positions)] The IBM model 1 (IBM-1) (Brown et al., 1993) assumes that all alignments have the same probability by using a uniform distribution: $p(f_1^J \mid e_1^I) = \frac{1}{I^J} \prod_{j=1}^{J} \sum_{i=1}^{I} p(f_j \mid e_i)$ (2). We use the IBM-1 to train the lexicon parameters p(f|e); the training software is GIZA++ (Och and Ney, 2003).
C94-2178 613 22:102 These tables were computed from a small fragment of the Canadian Hansards that has been used in a number of other studies: Church (1993) and Simard et al (1992).
C94-2178 614 10:102 In previous work (Church et al, 1993), we have reported some preliminary success in aligning the English and Japanese versions of the AWK manual (Aho, Kernighan, Weinberger (1980)), using char_align (Church, 1993), a method that looks for character sequences that are the same in both the source and target.
C94-2178 615 7:102 Motivation There have been quite a number of recent papers on parallel text: Brown et al (1990, 1991, 1993), Chen (1993), Church (1993), Church et al (1993), Dagan et al (1993), Gale and Church (1991, 1993), Isabelle (1992), Kay and Röscheisen (1993), Klavans and Tzoukermann (1990), Kupiec (1993), Matsumoto (1991), Ogden and Gonzales (1993), Shemtov (1993), Simard et al (1992), Warwick-Armstrong and Russell (1990), Wu (to appear).
C94-2178 616 100:102 This estimate could be used as a starting point for a more detailed alignment algorithm such as word_align (Dagan et al, 1993).
C94-2178 617 85:102 Results This algorithm was applied to a fragment of the Canadian Hansards that has been used in a number of other studies: Church (1993) and Simard et al (1992).
W93-0301 618 179:185 4 Conclusions Compared with other word alignment algorithms (Brown et al., 1993; Gale and Church, 1991a), word_align does not require sentence alignment as input, and was shown to produce useful alignments for small and noisy corpora.
W93-0301 619 32:185 The method was intended as a replacement for sentence-based methods (e.g., (Brown et al., 1991a; Gale and Church, 1991b; Kay and Röscheisen, 1993)), which are very sensitive to noise.
W93-0301 620 6:185 1 Introduction Aligning parallel texts has recently received considerable attention (Warwick et al., 1990; Brown et al., 1991a; Gale and Church, 1991b; Gale and Church, 1991a; Kay and Röscheisen, 1993; Simard et al., 1992; Church, 1993; Kupiec, 1993; Matsumoto et al., 1993).
W93-0301 621 2:185 The program takes the output of char_align (Church, 1993), a robust alternative to sentence-based alignment programs, and applies word-level constraints using a version of Brown et al.'s Model 2 (Brown et al., 1993), modified and extended to deal with robustness issues.
W93-0301 622 7:185 These methods have been used in machine translation (Brown et al., 1990; Sadler, 1989), terminology research and translation aids (Isabelle, 1992; Ogden and Gonzales, 1993), bilingual lexicography (Klavans and Tzoukermann, 1990), collocation studies (Smadja, 1992), word-sense disambiguation (Brown et al., 1991b; Gale et al., 1992) and information retrieval in a multilingual environment (Landauer and Littman, 1990).
W93-0301 623 36:185 2 The alignment Algorithm 2.1 Estimation of translation probabilities The translation probabilities are estimated using a method based on Brown et al.'s Model 2 (1993), which is summarized in the following subsection, 2.1.1.
W06-3116 624 42:97 A Viterbi alignment computed from an IBM model 4 (Brown et al., 1993) was computed for each translation direction.
C08-1136 625 27:191 In the context of statistical machine translation (Brown et al., 1993), we may interpret E as an English sentence, F its translation in French, and A a representation of how the words correspond to each other in the two sentences.
C04-1059 626 10:166 Statistical machine translation is based on the noisy channel model, where the translation hypothesis is searched over the space defined by a translation model and a target language (Brown et al, 1993).
W07-0715 627 16:155 (2006) tried a different generative phrase translation model analogous to IBM word-translation Model 3 (Brown et al., 1993), and again found that the standard model outperformed their generative model.
W07-0715 628 84:155 The lexical scores are computed as the (unnormalized) log probability of the Viterbi alignment for a phrase pair under IBM word-translation Model 1 (Brown et al., 1993).
P06-2014 629 7:233 Originally introduced as a byproduct of training statistical translation models in (Brown et al., 1993), word alignment has become the first step in training most statistical translation systems, and alignments are useful to a host of other tasks.
P06-2014 630 24:233 The IBM models (Brown et al., 1993) benefit from a one-to-many constraint, where each target word has ex- [Figure 1: A cohesion constraint violation. Example pair: "the tax causes unrest" / "l' impôt cause le malaise"]
E06-1005 631 44:186 We use the IBM Model 1 (Brown et al., 1993) (uniform distribution) and the Hidden Markov Model (HMM, first-order dependency, (Vogel et al., 1996)) to estimate the alignment model.
I05-2012 632 66:156 (Chen et al., 1993; Gale et al., 1993) proposed sentence alignment techniques based on dynamic programming, using sentence length and lexical mapping information.
I05-2012 633 57:156 For the given source text, S, it finds the most probable alignment set, A, and target text, T. $p(T \mid S) = \sum_{a \in A} p(T, a \mid S)$ (1) Brown (Brown et al., 1993) proposed five alignment models, called IBM Model, for an English-French alignment task based on equation (1).
I05-2012 634 67:156 (Haruno et al., 1996; Kay et al., 1993) applied iterative refinement algorithms to sentence level alignment tasks.
I05-2012 635 68:156 In this paper, we propose an alignment algorithm between English and Korean conceptual units (or between English and Korean term constituents) in English-Korean technical term pairs based on IBM Model (Brown et al., 1993).
I05-2012 636 54:156 It was initially proposed by (Brown et al., 1993) and, more recently, has been intensively studied by several research groups (Germann et al., 2001; Och et al., 2003).
J05-4003 637 134:416 One such model is the IBM Model 1 (Brown et al. 1993).
P96-1020 638 13:199 Corpus-based or example-based MT (Sato and Nagao, 1990; Sumita and Iida, 1991) and statistical MT (Brown et al., 1993) systems provide the easiest customizability, since users have only to supply a collection of source and target sentence pairs (a bilingual corpus).
P06-1077 639 7:252 1 Introduction Phrase-based translation models (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004), which go beyond the original IBM translation models (Brown et al., 1993) by modeling translations of phrases rather than individual words, have been suggested to be the state-of-the-art in statistical machine translation by empirical evaluations.
P06-2117 640 9:221 1 Introduction Word alignment was first proposed as an intermediate result of statistical machine translation (Brown et al., 1993).
P06-2117 641 49:221 2 Statistical Word Alignment Model According to the IBM models (Brown et al., 1993), the statistical word alignment model can be generally represented as in equation (1).
P06-2117 642 63:221 A cept is defined as the set of target words connected to a source word (Brown et al., 1993).
J04-2004 643 73:257 In the following section we show how this drawback can be overcome using statistical alignments (Brown et al. 1993).
J04-2004 644 170:257 These results were achieved using the statistical alignments provided by model 5 (Brown et al. 1993; Och and Ney 2000) and smoothed 11-grams and 6-grams, respectively.
W03-0315 645 57:171 Given training data consisting of parallel sentences: $\{(f^{(i)}, e^{(i)}),\ i = 1, \ldots, S\}$, our Model-1 training for t(f|e) is as follows: $t(f \mid e) = \lambda_e^{-1} \sum_{s=1}^{S} c(f \mid e; f^{(s)}, e^{(s)})$ Where $\lambda_e^{-1}$ is a normalization factor such that $\sum_j t(f_j \mid e) = 1.0$ and $c(f \mid e; f^{(s)}, e^{(s)})$ denotes the expected number of times that word e connects to word f: $c(f \mid e; f^{(s)}, e^{(s)}) = \frac{t(f \mid e)}{\sum_{k=1}^{l} t(f \mid e_k)} \sum_{j=1}^{m} \delta(f, f_j) \sum_{i=1}^{l} \delta(e, e_i)$ With the conditional probability t(f|e), the probability for an alignment of foreign string F given English string E is in (1): $P(F \mid E) = \frac{1}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{n} t(f_j \mid e_i)$ (1) The probability of alignment F given E, P(F|E), is shown to achieve the global maximum under this EM framework as stated in (Brown et al., 1993).
W03-0315 646 58:171 In our approach, equation (1) is further normalized so that the probability for different lengths of F is comparable at the word level: $P(F \mid E) = \left[ \frac{1}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{n} t(f_j \mid e_i) \right]^{1/m}$ (2) The alignment models described in (Brown et al., 1993) are all based on the notion that an alignment aligns each source word to exactly one target word.
W03-0315 647 54:171 2.2 Statistical Translation Lexicon We use a statistical translation lexicon known as IBM Model-1 in (Brown et al., 1993) for both efficiency and simplicity.
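The Model-1 training and scoring quoted in the W03-0315 contexts above can be illustrated with a minimal sketch. It is not the cited authors' code: the function names, the `NULL` token standing in for the empty word e_0, the uniform initialization, and the toy sentence pairs are all assumptions made for illustration. `train_model1` runs the EM updates for t(f|e), `sentence_prob` follows the form of equation (1), and `normalized_prob` applies the 1/m exponent of equation (2).

```python
from collections import defaultdict

NULL = "<null>"  # assumed placeholder for the empty source word e_0


def train_model1(pairs, iterations=10):
    """EM estimation of Model 1 lexical probabilities t(f|e).

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict-like table keyed by (f, e).
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f|e)
        total = defaultdict(float)  # per-e normalizer
        for fs, es in pairs:
            es = [NULL] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # denominator over e_0..e_l
                for e in es:
                    p = t[(f, e)] / z  # posterior that f connects to e
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize so that sum_f t(f|e) = 1 for every e
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


def sentence_prob(t, fs, es):
    """P(F|E) = 1/(l+1)^m * prod_j sum_i t(f_j|e_i), with i running over e_0..e_l."""
    es = [NULL] + list(es)
    prob = 1.0
    for f in fs:
        prob *= sum(t[(f, e)] for e in es) / len(es)
    return prob


def normalized_prob(t, fs, es):
    """Length-normalized score: P(F|E) raised to the power 1/m."""
    return sentence_prob(t, fs, es) ** (1.0 / len(fs))


# Toy corpus (invented for illustration only)
toy_pairs = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
]
table = train_model1(toy_pairs)
```

On the toy corpus, "la" co-occurs with "the" in both pairs, so t(la|the) ends up larger than t(maison|the); the length-normalized score is never smaller than the raw probability because the raw probability is at most 1.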
W01-1408 648 17:178 In this paper we use the so-called Model 4 from (Brown et al., 1993).
W01-1408 649 69:178 They developed a simple heuristic function for Model 2 from (Brown et al., 1993) which was non admissible.
W01-1408 650 16:178 2 IBM Model 4 Various statistical alignment models of the form $\Pr(f_1^J, a_1^J \mid e_1^I)$ have been introduced in (Brown et al., 1993; Vogel et al., 1996; Och and Ney, 2000a).
W01-1408 651 9:178 Many statistical translation models (Brown et al., 1993; Vogel et al., 1996; Och and Ney, 2000b) try to model word-to-word correspondences between source and target words.
W01-1408 652 20:178 For a detailed description for Model 4 the reader is referred to (Brown et al., 1993).
D07-1045 653 11:218 This approach is usually referred to as the noisy source-channel approach in statistical machine translation (Brown et al., 1993).
Copyright © Univ. of Mich. and the CLAIR Group at the Univ. of Mich.
All information provided herein should be considered tentative and still under construction. Further analysis and correction is still being performed. Please remember that all statistics contained herein are the results of independent research and should not be considered a statement of fact regarding any of the papers, authors, or other entities they refer to.