Citation Summary
| Citing sentences |
|---|
| C08-1093 1 52:155 Och and Ney (2004) reformulate equation 3 as a linear combination of feature functions $h_m(e,f)$ and weights $\lambda_m$, including feature functions for translation models $h_i(e,f) = P(f \mid e)$ and language models $h_j(e) = P(e)$: $\hat{e} = \arg\max_e \sum_{m=1}^{M} \lambda_m h_m(e,f)$ (4) The translation model used in our approach is based on the sequence of alignment models described in Och and Ney (2003). |
| C08-1093 2 54:155 The different alignment models described in Och and Ney (2003) each parameterize equation 5 differently so as to capture different properties of source and target mappings. |
| C08-1093 3 67:155 Furthermore, various additional smoothing techniques are employed in alignment to avoid overfitting and improved coping with rare words (see Och and Ney (2003)). |
| W07-0401 4 98:352 Here, we train word alignments in both directions with GIZA++ (Och and Ney, 2003). |
| W08-2118 5 64:194 2.2 Annotate Morphemes To extract the Arabic morphemes that align to English text, we use English as the source corpus and aligned to Arabic morpheme corpus using GIZA++ (Och and Ney, 2003) toolkit. |
| C04-1006 6 105:187 We use the same training schemes (model sequences) as presented in (Och and Ney, 2003). |
| C04-1006 7 40:187 A detailed description of the popular translation models IBM-1 to IBM-5 (Brown et al., 1993), as well as the Hidden-Markov alignment model (HMM) (Vogel et al., 1996) can be found in (Och and Ney, 2003). |
| C04-1006 8 147:187 For the Verbmobil task, the refined method of (Och and Ney, 2003) is used. |
| C04-1006 9 163:187 A good overview of these models is given in (Och and Ney, 2003). |
| C04-1006 10 109:187 5.3 Lexicon Symmetrization In Table 3 and Table 4, we present the following experiments performed for both the Verbmobil and the Canadian Hansards task: Base: the system taken from (Och and Ney, 2003) that we use as baseline system. |
| C04-1006 11 120:187 In Table 3, we compare both interpolation variants for the Verbmobil task to (Och and Ney, 2003). |
| C04-1006 12 145:187 5.4 Generalized Alignments In (Och and Ney, 2003) generalized alignments are used, thus the final Viterbi alignments of both translation directions are combined using some heuristic. |
| C04-1006 13 17:187 In (Och and Ney, 2003), it is shown that the statistical approach performs very well compared to alternative approaches, e.g. based on the Dice coefficient or the competitive linking algorithm (Melamed, 2000). |
| C04-1006 14 106:187 As we use the same training and testing conditions as (Och and Ney, 2003), we will refer to the results presented in that article as the baseline results. |
| C04-1006 15 33:187 We will show statistically significant improvements compared to state-of-the-art results in (Och and Ney, 2003). |
| C04-1006 16 103:187 [Corpus statistics table (French / English): Train: 128K sentences, 2.12M / 1.93M words, vocabulary 37542 / 29414, singletons 12986 / 9572; Test: 500 sentences, 8749 / 7946 words.] (Och and Ney, 2003), the first 100 sentences of the test corpus are used as a development corpus to optimize model parameters that are not trained via the EM algorithm, e.g. the discounting parameter for lexicon smoothing. |
| C04-1006 17 107:187 In (Och and Ney, 2003), the alignment quality of statistical models is compared to alternative approaches, e.g. using the Dice coefficient or the competitive linking algorithm. |
| C04-1006 18 180:187 We have evaluated these methods on the Verbmobil task and the Canadian Hansards task and compared our results to the state-of-the-art system of (Och and Ney, 2003). |
| P07-1001 19 130:185 We also compare with results of IBM Model-4 word alignments implemented in GIZA++ toolkit (Och and Ney, 2003). |
| P07-1001 20 117:185 By combining word alignments in two directions using heuristics (Och and Ney, 2003), a single set of static word alignments is then formed. |
| P07-1001 21 152:185 We simply modify the GIZA++ toolkit (Och and Ney, 2003) by always weighting lexicon probabilities with soft constraints during iterative model training, and obtain 0.7% TER reduction on both sets and 0.4% BLEU improvement on the test set. |
| D07-1006 22 163:193 (Och and Ney, 2003) invented heuristic symmetrization of the output of a 1-to-N model and a M-to-1 model resulting in a M-to-N alignment, this was extended in (Koehn et al., 2003). [Table 3: Experimental Results. Columns: French/English F-measure (α = 0.4), French/English BLEU, Arabic/English F-measure (α = 0.1), Arabic/English BLEU. GIZA++: 73.5, 30.63, 75.8, 51.55. (Fraser and Marcu, 2006b): 74.1, 31.40, 79.1, 52.89. LEAF unsupervised: 74.5, 72.3. LEAF semi-supervised: 76.3, 31.86, 84.5, 54.34.] |
| D07-1006 23 86:193 (Och and Ney, 2003) discussed efficient implementation. |
| D07-1006 24 112:193 For all non-LEAF systems, we take the best performing of the union, refined and intersection symmetrization heuristics (Och and Ney, 2003) to combine the 1-to-N and M-to-1 directions resulting in a M-to-N alignment. |
| D07-1006 25 69:193 2.2 Unsupervised Parameter Estimation We can perform maximum likelihood estimation of the parameters of this model in a similar fashion to that of Model 4 (Brown et al. , 1993), described thoroughly in (Och and Ney, 2003). |
| D07-1006 26 111:193 4.2 Experiments To build all alignment systems, we start with 5 iterations of Model 1 followed by 4 iterations of HMM (Vogel et al. , 1996), as implemented in GIZA++ (Och and Ney, 2003). |
| D07-1006 27 70:193 We use Viterbi training (Brown et al. , 1993) but neighborhood estimation (Al-Onaizan et al. , 1999; Och and Ney, 2003) or pegging (Brown et al. , 1993) could also be used. |
| D07-1006 28 61:193 (Och and Ney, 2003) presented results suggesting that the additional parameters required to ensure that a model is not deficient result in inferior performance, but we plan to study whether this is the case for our generative model in future work. |
| D07-1006 29 128:193 For French/English translation we use a state of the art phrase-based MT system similar to (Och and Ney, 2004; Koehn et al. , 2003). |
| D07-1006 30 181:193 Our work is most similar to work using discriminative log-linear models for alignment, which is similar to discriminative log-linear models used for the SMT decoding (translation) problem (Och and Ney, 2002; Och, 2003). |
| N04-1035 31 165:232 For the FBIS corpus (representing eight million English words), we automatically generated word-alignments using GIZA++ (Och and Ney, 2003), which we trained on a much larger data set (150 million words). |
| P06-1090 32 100:135 Word translation probabilities are obtained by using GIZA++ (Och and Ney, 2003). |
| P06-1090 33 45:135 Here, we added the second term [expression lost in extraction] to cope with the asymmetry between [expression lost in extraction] and [expression lost in extraction]. The word translation probabilities are estimated using the GIZA++ (Och and Ney, 2003). |
| P06-1090 34 9:135 Standard phrase-based translation systems use a word distance-based reordering model in which non-monotonic phrase alignment is penalized based on the word distance between successively translated source phrases without considering the orientation of the phrase alignment or the identities of the source and target phrases (Koehn et al., 2003; Och and Ney, 2004). |
| P07-1039 35 103:170 4.3 Baseline We use a standard log-linear phrase-based statistical machine translation system as a baseline: GIZA++ implementation of IBM word alignment model 4 (Brown et al., 1993; Och and Ney, 2003), the refinement and phrase-extraction heuristics described in (Koehn et al., 2003), minimum-error-rate training [Footnote 7: More specifically, we choose the first English reference from the 7 references and the Chinese sentence to construct new sentence pairs.] |
| P07-1039 36 26:170 To quickly (and approximately) evaluate this phenomenon, we trained the statistical IBM word-alignment model 4 (Brown et al., 1993), using the GIZA++ software (Och and Ney, 2003) for the following language pairs: Chinese-English, Italian-English, and Dutch-English, using the IWSLT-2006 corpus (Takezawa et al., 2002; Paul, 2006) for the first two language pairs, and the Europarl corpus (Koehn, 2005) for the last one. |
| P07-1039 37 89:170 [Figure 2: Examples of entries from the manually developed dictionary (English glosses: there is, want to, need not, in front of, as soon as, look at)] 4 Experimental Setting 4.1 Evaluation The intrinsic quality of word alignment can be assessed using the Alignment Error Rate (AER) metric (Och and Ney, 2003), that compares a system's alignment output to a set of gold-standard alignment. |
| P07-1039 38 109:170 [Table 2: Chinese-English corpus statistics. Running words: 1,864 / 14,437; vocabulary size: 569 / 1,081] (Och, 2003) using Phramer (Olteanu et al., 2006), a 3-gram language model with Kneser-Ney smoothing trained with SRILM (Stolcke, 2002) on the English side of the training data and Pharaoh (Koehn, 2004) with default settings to decode. |
| C04-1168 39 102:197 The training of IBM model 4 was implemented by the GIZA++ package (Och and Ney, 2003). |
| H05-1011 40 138:181 Automatic sentence alignment of the training data was provided by Ulrich Germann, and the hand alignments of the words in the test data were created by Franz Och and Hermann Ney (Och and Ney, 2003). |
| H05-1011 41 13:181 For example, Och and Ney (2003) suggest supervised optimization of a number of parameters, including the probability of jumping to the empty word in the HMM model, as well as smoothing parameters for the distortion probabilities and fertility probabilities of the more complex models. |
| H05-1011 42 154:181 For comparison, we aligned our parallel corpus with IBM Model 4 using Och's Giza++ software package (Och and Ney, 2003). |
| H05-1011 43 9:181 (1993), sometimes augmented by an HMM-based model or Och and Ney's Model 6 (Och and Ney, 2003). |
| H05-1011 44 156:181 We trained the models in both directions, English-to-French and French-to-English, and computed the union, intersection, and what Och and Ney (2003) call the refined combination of the two alignments. |
| H05-1011 45 141:181 We report the performance of our alignment models in terms of precision, recall, and alignment error rate (AER) as defined by Och and Ney (2003): $\text{recall} = \frac{\lvert A \cap S \rvert}{\lvert S \rvert}$, $\text{precision} = \frac{\lvert A \cap P \rvert}{\lvert A \rvert}$, $\text{AER} = 1 - \frac{\lvert A \cap P \rvert + \lvert A \cap S \rvert}{\lvert A \rvert + \lvert S \rvert}$. In these definitions, S denotes the set of alignments annotated as sure, P denotes the set of alignments annotated possible or sure, and A denotes the set of alignments produced by the method under test. |
| H05-1011 46 64:181 For example, Och and Ney (2003) found that the intersection of the alignments found training the IBM models in both directions always outperformed either direction alone in their experiments. |
| P08-2057 47 37:80 Typical phrasebased SMT approaches obtain word-level alignments from a bilingual corpus using tools such as GIZA++ (Och and Ney, 2003) and extract phrase translation pairs from the bilingual word alignment using heuristics. |
| P08-2057 48 6:80 1 Introduction Recent approaches to statistical speech translation have relied on improving translation quality with the use of phrase translation (Och and Ney, 2003; Koehn, 2004). |
| I08-2088 49 68:145 3.2.2 Features We used eight features (Och and Ney, 2003; Koehn et al., 2003) and their weights for the translations. |
| I08-2088 50 67:145 We used the preprocessed data to train the phrase-based translation model by using GIZA++ (Och and Ney, 2003) and the Pharaoh tool kit (Koehn et al., 2003). |
| W06-1204 51 215:232 State-of-art systems for doing word alignment use generative models like GIZA++ (Och and Ney, 2003; Brown et al. , 1993). |
| W06-1204 52 167:232 7.2 Experiments with Giza We evaluated our discriminative approach by comparing it with the state-of-art Giza++ alignments (Och and Ney, 2003). |
| W06-1204 53 8:232 The alignment error rate which we achieve (AER = 0.5040) is significantly better (about 10% decrease in AER) than the alignment error rates of the state-of-art models (Och and Ney, 2003) (Best AER = 0.5518) on the English-Hindi dataset. |
| W06-1204 54 229:232 Our overall results are better than those obtained using the GIZA++ models (Och and Ney, 2003). |
| W08-0305 55 26:200 The de-facto answer came during the 1990s from the research community on Statistical Machine Translation, who made use of statistical tools based on a noisy channel model originally developed for speech recognition (Brown et al., 1994; Och and Weber, 1998; R.Zens et al., 2002; Och and Ney, 2001; Koehn et al., 2003). |
| W08-0305 56 63:200 Moses uses standard external tools for some of these tasks, such as GIZA++ (Och and Ney, 2003) for word alignments and SRILM (Stolcke, 2002) for language modeling. |
| W08-1403 57 53:187 It is also updated continuously and incorporates other alignment models, such as GIZA++ (Och & Ney 2003). |
| C04-1015 58 87:201 Translation Model and Language Model We used a lexicon model of IBM Model 4 learned by GIZA++ (Och and Ney, 2003) and word bigram and trigram models learned by CMU-Cambridge Statistical Language Modeling Toolkit (Clarkson and Rosenfeld, 1997). |
| W08-0328 59 7:74 This setup provides an elegant solution to the fairly complex task of integrating multiple MT results that may differ in word order using only standard software modules, in particular GIZA++ (Och and Ney, 2003) for the identification of building blocks and Moses for the recombination, but the authors were not able to observe improvements in terms of BLEU score. [Footnote 1: see http://www.statmt.org/moses/] |
| W08-0326 60 36:80 Assuming that the parameters P(etk|fsk) are known, the most likely alignment is computed by a simple dynamic-programming algorithm. Instead of using an Expectation-Maximization algorithm to estimate these parameters, as commonly done when performing word alignment (Brown et al., 1993; Och and Ney, 2003), we directly compute these parameters by relying on the information contained within the chunks. |
| W08-0326 61 20:80 For example, our system configuration for the shared task incorporates a wrapper around GIZA++ (Och and Ney, 2003) for word alignment and a wrapper around Moses (Koehn et al., 2007) for decoding. |
| C08-1067 62 108:167 [Figure 2: Manual reference: regular links are indicated by x's, fuzzy links and null links by 0's] To evaluate the system's performance, we used the evaluation methodology of Och and Ney (2003). |
| W06-2705 63 93:182 For the purpose of the CroCo project word alignment is realised with GIZA++ (Och & Ney 2003), a statistical alignment tool. |
| W06-2008 64 82:175 Second, it allows to skip the cases for which it is difficult to decide whether they are correct or not: syntactic analysis may be ambiguous and translation often makes it difficult to determine which source unit corresponds to which target one (Och and Ney, 2003). |
| W06-2008 65 68:175 [Table 2: Syntactic dependencies identified with SYNTEX] 4.2.3 Word alignment The English/Polish parts of the corpus on the one hand, and the French/Polish parts on the other hand, have been aligned at the word level using the GIZA++ package (Och and Ney, 2003). |
| W06-3106 66 119:161 The training corpus has been aligned at the word level by two Viterbi word-alignments (French2English and English2French) that we combined in a heuristic way similar to the refined method described in (Och and Ney, 2003). |
| D07-1056 67 26:196 There have been considerable amount of efforts to improve the reordering model in SMT systems, ranging from the fundamental distance-based distortion model (Och and Ney, 2004; Koehn et al. , 2003), flat reordering model (Wu, 1996; Zens et al. , 2004; Kumar et al. , 2005), to lexicalized reordering model (Tillmann, 2004; Kumar et al. , 2005; Koehn et al. , 2005), hierarchical phrase-based model (Chiang, 2005), and maximum entropy-based phrase reordering model (Xiong et al. , 2006). |
| D07-1056 68 149:196 6 Training Similar to most state-of-the-art phrase-based SMT systems, we use the SRI toolkit (Stolcke, 2002) for language model training and Giza++ toolkit (Och and Ney, 2003) for word alignment. |
| W07-0737 69 65:228 Specifically, we generate word alignments using GIZA++ (Och 2001) in both directions and combine them using the refined methodology (Och and Ney 2003), and then we applied Koehn's toolkit (2004) to extract parallel phrases. |
| W05-0815 70 60:82 For this, an alignment was obtained by training an IBM model 4 using GIZA++ (Och and Ney, 2003). |
| C04-1053 71 110:176 See (Och and Ney, 2003) and (Arenberg et al., 2000) for different evaluation metrics. |
| E06-1046 72 143:194 [Equation (7) lost in extraction] We then use a Markov approximation (trigram for our purposes) to compute the joint probability [expression lost in extraction]. [Equation (8) lost in extraction, followed by definitions of its token sequences] In order to compute the joint probability, we need to construct an alignment between tokens [expression lost in extraction]. We use the viterbi alignment provided by GIZA++ toolkit (Och and Ney, 2003) for this purpose. |
| E06-1046 73 28:194 To this end, we adopt techniques from statistical machine translation (Brown et al. , 1993; Och and Ney, 2003) and use statistical alignment to learn the edit patterns. |
| C08-1064 74 35:260 Our baseline uses Giza++ alignments (Och and Ney, 2003) symmetrized with the grow-diag-final-and heuristic (Koehn et al., 2003). |
| C08-1064 75 116:260 Except where noted, each system was trained on 27 million words of newswire data, aligned with GIZA++ (Och and Ney, 2003) and symmetrized with the grow-diag-final-and heuristic (Koehn et al., 2003). |
| W08-0324 76 15:79 To obtain high-quality statistical word alignments, we run GIZA++ (Och and Ney, 2003) in both the source-to-target and target-to-source directions, then combine the resulting alignments with the Sym2 symmetric alignment heuristic of Ortiz-Martínez et al. |
| N07-1022 77 77:209 In WASP, GIZA++ (Och and Ney, 2003) is used to obtain the best alignments from the training examples. |
| W05-0820 78 31:91 The field of statistical machine translation has been blessed with a long tradition of freely available software tools such as GIZA++ (Och and Ney, 2003) and parallel corpora such as the Canadian Hansards2. |
| W05-0820 79 46:91 In addition, we also made a word alignment available, which was derived using a variant of the current default method for word alignment, Och and Ney (2003)'s refined method. |
| I08-4001 80 133:153 We use the default tool in the Moses to train the model and tune the weights, in which the word alignment tool is Giza++ (Och and Ney 2003) and the language model tool is SRILM(Stolcke, 2002). |
| N07-1029 81 96:215 If the alignments are not available, they can be automatically generated; e.g., using GIZA++ (Och and Ney, 2003). |
| I08-3011 82 139:161 Some tools have been reused for this purpose: GIZA++: for word/morpheme alignment we used the GIZA++ statistical word alignment toolkit, and following the refined method of (Och and Ney, 2003), extracted a set of high-quality word/morpheme alignments from the original unidirectional alignment sets. |
| P04-1060 83 22:158 We use the Europarl corpus (Koehn, 2002), and the statistical word alignment was performed with the GIZA++ toolkit (Al-Onaizan et al., 1999; Och and Ney, 2003). For the current experiments we assume no preexisting parser for any of the languages, contrary to the information projection scenario. |
| P04-1060 84 100:158 For each span in the chart, we get a weight factor that is multiplied with the parameter-based expectations. 4 Experiments We applied GIZA++ (Al-Onaizan et al., 1999; Och and Ney, 2003) to word-align parts of the Europarl corpus (Koehn, 2002) for English and all other 10 languages. |
| P06-1146 85 119:209 Word alignments were computed with the GIZA++ toolkit (Och and Ney, 2003), using the 3The corpus can be downloaded from http://www. |
| I05-5001 86 97:155 To compute Precision, Recall, and Alignment Error Rate (AER), we adhere to the formulae listed in Och & Ney (2003). |
| I05-5001 87 93:155 Following the practice of Och & Ney (2000, 2003), the annotators each created an initial annotation, categorizing alignments as either SURE (necessary) or POSSIBLE (allowed, but not required). |
| I05-5001 88 88:155 Since our purpose in the present work is non-application-specific corpus construction, we apply an automated technique that is widely used for reporting intermediate results in the SMT community, and is being extended in other fields such as summarization (Daumé and Marcu, forthcoming), namely word-level alignment using an off-the-shelf implementation of the SMT system GIZA++ (Och & Ney, 2003). |
| I05-5001 89 8:155 One promising approach extends standard Statistical Machine Translation (SMT) techniques (e.g. , Brown et al. , 1993; Och & Ney, 2000, 2003) to the problems of monolingual paraphrase identification and generation. |
| I05-5001 90 12:155 (2004) describe an end-to-end paraphrase identification and generation system using GIZA++ (Och & Ney, 2003) and a monotone decoder to generate information-preserving paraphrases. |
| W07-0719 91 37:244 We used the GIZA++ SMT Toolkit (Och and Ney, 2003) to generate word alignments. |
| P07-1018 92 15:178 Thus the present system is unlike SMT (Och and Ney, 2003), where lexical selection is effected by a translation model based on aligned, parallel corpora, but the novel techniques it has developed are exploitable in the SMT paradigm. |
| P07-1018 93 106:178 It has been aligned on the sentence level by JAPA (Langlais et al. , 1998), and further on the word level by GIZA++ (Och and Ney, 2003). |
| H05-1024 94 31:187 These approaches include an enhanced HMM alignment model that uses part-of-speech tags (Toutanova et al., 2002), a log-linear combination of IBM translation models and HMM models (Och and Ney, 2003), techniques that rely on dependency relations (Cherry and Lin, 2003), and a log-linear combination of IBM Model 3 alignment probabilities, POS tags, and bilingual dictionary coverage (Liu et al., 2005). |
| H05-1024 95 30:187 The standard method to overcome this problem is to use the model in both directions (interchanging the source and target languages) and apply heuristic-based combination techniques to produce a refined alignment (Och and Ney, 2000; Koehn et al., 2003), henceforth referred to as RA. Several researchers have proposed algorithms for improving word alignment systems by injecting additional knowledge or combining different alignment models. |
| H05-1024 96 59:187 For our experiments, we chose GIZA++ (Och and Ney, 2000) and the RA approach (Koehn et al., 2003), the best known alignment combination technique, as our initial aligners. 4.2 TBL Templates Our templates consider consecutive words (of size 1, 2 or 3) in both languages. |
| W04-2708 97 102:184 Furthermore, by training GIZA++ (Och and Ney, 2003) translation model on the training part of the PCEDT extended by the manual dictionaries, we obtained a probabilistic Czech-English dictionary, more sensitive to the domain of financial news specific for the Wall Street Journal. |
| W07-0701 98 139:168 The comparison phrasal system was constructed using the same GIZA++ alignments and the heuristic combination described in (Och & Ney, 2003). |
| P05-2022 99 144:149 5 Precision, recall and f-measure reported by Och and Ney (2003) for the intersection of IBM-4 Viterbi alignments from both translation directions. |
| P06-2057 100 48:194 We projected the bracketing of the target words and the frame elements onto the Swedish side of the corpus by using the Giza++ word aligner (Och and Ney, 2003). |
| W08-0302 101 114:197 Baseline We use the Moses MT system (Koehn et al., 2007) as a baseline and closely follow the example training procedure given for the WMT-07 and WMT-08 shared tasks. In particular, we perform word alignment in each direction using GIZA++ (Och and Ney, 2003), apply the grow-diag-final-and heuristic for symmetrization and use a maximum phrase length of 7. |
| P08-2020 102 40:80 [Table 3: LM interpolation weights per source: xin 0.30, cna 0.06, nyt 0.03, bil 0.26, un 0.07, ltw 0.01, afp 0.21, apw 0.05] 3.2 Speeding up Model Training To accelerate the training of word alignment models we implemented a distributed version of GIZA++ (Och and Ney, 2003), based on the latest version of GIZA++ and a parallel version developed at Peking University (Lin et al., 2006). |
| C04-1127 103 71:153 We used word alignment obtained by using Giza++ (Och and Ney, 2003) to get names in the English translation from names in the original Japanese sentences. |
| W07-0726 104 32:61 [Footnote 3: see http://www.statmt.org/moses/] 4 Implementation Details 4.1 Alignment of MT output The input text and the output text of the MT systems was aligned by means of GIZA++ (Och and Ney, 2003), a tool with which statistical models for alignment of parallel texts can be trained. |
| W08-0509 105 49:208 While (Och and Ney, 2003) presents algorithm to implement counting over all the alignments for Model 1,2 and HMM, it is prohibitive to do that for Models 3 through 6. |
| W08-0509 106 41:208 2.2 Implementation of GIZA++ GIZA++ is an implementation of ML estimators for several statistical alignment models, including IBM Model 1 through 5 (Brown et al., 1993), HMM (Vogel et al., 1996) and Model 6 (Och and Ney, 2003). |
| W08-0509 107 39:208 In order to express the probability in statistical way, several different parametric forms of $P(f_1^J, a_1^J \mid e_1^I) = p_\theta(f_1^J, a_1^J \mid e_1^I)$ have been proposed, and the parameters can be estimated using maximum likelihood estimation (MLE) on a training corpus (Och and Ney, 2003). |
| W08-0509 108 8:208 The most widely used tool to perform this training step is the well-known GIZA++(Och and Ney, 2003). |
| W08-0509 109 37:208 Given a source string $f_1^J = f_1, \ldots, f_j, \ldots, f_J$ and a target string $e_1^I = e_1, \ldots, e_i, \ldots, e_I$, an alignment A of the two strings is defined as (Och and Ney, 2003): $A \subseteq \{(j,i) : j = 1, \ldots, J;\ i = 0, \ldots, I\}$ (1) in case that $i = 0$ in some $(j,i) \in A$, it represents that the source word j aligns to an empty target word $e_0$. |
| W08-0509 110 44:208 (Och and Ney, 2003) So in this paper we focus on Model 1, HMM, Model 3 and 4. |
| P08-1066 111 141:243 Given sentence-aligned bi-lingual training data, we first use GIZA++ (Och and Ney, 2003) to generate word level alignment. |
| N07-2009 112 53:91 For comparison, we use the MT training program, GIZA++ (Och and Ney, 2003), the phrase-base decoder, Pharaoh (Koehn et al., 2003), and the word-based decoder, Rewrite (Germann, 2003). |
| D08-1033 113 202:234 5.1 Baseline System We trained Moses on all Spanish-English Europarl sentences up to length 20 (177k sentences) using GIZA++ Model 4 word alignments and the grow-diag-final-and combination heuristic (Koehn et al., 2007; Och and Ney, 2003; Koehn, 2002), which performed better than any alternative combination heuristic. The baseline estimates (Heuristic) come from extracting phrases up to length 7 from the word alignment. |
| W08-0409 114 103:167 4.3 Baselines 4.3.1 Word Alignment We used the GIZA++ implementation of IBM word alignment model 4 (Brown et al., 1993; Och and Ney, 2003) for word alignment, and the heuristics described in (Och and Ney, 2003) to derive the intersection and refined alignment. |
| W08-0409 115 108:167 refinement and phrase-extraction heuristics described in (Koehn et al., 2003), minimum-error-rate training (Och, 2003), a trigram language model with Kneser-Ney smoothing trained with SRILM (Stolcke, 2002) on the English side of the training data, and Moses (Koehn et al., 2007) to decode. |
| W08-0409 116 110:167 Slightly differently from (Och and Ney, 2003), we use possible alignments in computing recall. |
| W08-0409 117 91:167 Since manual word alignment is an ambiguous task, we also explicitly allow for ambiguous alignments, i.e. the links are marked as sure (S) or possible (P) (Och and Ney, 2003). |
| D07-1041 118 119:222 3.1.2 GIZA++ alignments Giza++ (Och and Ney, 2003) was also used to derive 1-to-n word alignments between the NET Bible and the Wycliffe Bible. |
| H05-1022 119 93:196 Alignment performance is measured by the Alignment Error Rate (AER) (Och and Ney, 2003) $\mathrm{AER}(B; B') = 1 - 2\,\lvert B \cap B' \rvert / (\lvert B \rvert + \lvert B' \rvert)$ where $B$ is a set of reference word links, and $B'$ are the word links generated automatically. |
| H05-1022 120 43:196 The hallucination process is motivated by the use of NULL alignments into Markov alignment models as done by (Och and Ney, 2003). |
| P07-1003 121 26:186 (2006) to initialize the model parameters, we achieve an AER superior to the GIZA++ implementation of IBM model 4 (Och and Ney, 2003) and a reduction of 56.3% in aligned interior nodes, a measure of agreement between alignments and parses. |
| P07-1003 122 147:186 For Chinese, we trained on the FBIS corpus and the LDC bilingual dictionary, then tested on 491 hand-aligned sentences from the 2002 [Footnote 2: The hand-aligned test data has been annotated with both sure alignments S and possible alignments P, with $S \subseteq P$, according to the specifications described in Och and Ney (2003).] |
| P07-1003 123 114:186 Och and Ney (2003) gives a detailed exposition of the technique. |
| P07-1082 124 17:191 From this perspective, past work is roughly divided into those methods which apply a word alignment tool such as GIZA++ (Och and Ney, 2003), and approaches that combine the alignment step into their main transliteration process. |
| W05-0801 125 171:178 9 Conclusions The conventional wisdom in the statistical MT community has been that heuristic alignment methods based on word association statistics could not be competitive with methods that have a well-founded mathematical theory that underlies their parameter estimation (Och and Ney, 2003, p. 37). |
| W05-0801 126 30:178 and the hand alignments of the words in the trial and test data were created by Franz Och and Hermann Ney (Och and Ney, 2003). |
| W05-0801 127 23:178 We report the performance of various alignment algorithms in terms of precision, recall, and alignment error rate (AER) as defined by Och and Ney (2003): $\text{recall} = \frac{\lvert A \cap S \rvert}{\lvert S \rvert}$, $\text{precision} = \frac{\lvert A \cap P \rvert}{\lvert A \rvert}$, $\text{AER} = 1 - \frac{\lvert A \cap P \rvert + \lvert A \cap S \rvert}{\lvert A \rvert + \lvert S \rvert}$. In these definitions, S denotes the set of alignments annotated as sure, P denotes the set of alignments annotated possible or sure, and A denotes the set of alignments produced by the method under test. |
| W05-0801 128 139:178 7 Evaluation We computed the recall, precision, and AER on the held-out subset of the English-French data both for our Method 4C (using parameter values optimized on the development subset) and for IBM Model 4, computed using Och's Giza++ software package (Och and Ney, 2003) trained on the same data as Method 4C. |
| W05-0801 129 9:178 (1993), sometimes augmented by an HMM-based model or Och and Ney's Model 6 (Och and Ney, 2003). |
| W05-0801 130 141:178 We trained and evaluated the models in both directions, English-to-French and French-to-English, as well as the union, intersection, and what Och and Ney (2003) call the refined combination of the two alignments. |
| C04-1045 131 30:76 Detailed description of those models can be found in (Brown et al. , 1993), (Vogel et al. , 1996) and (Och and Ney, 2003). |
| C04-1045 132 44:76 For the other models, the sum is approximated using an appropriately defined neighbourhood of the Viterbi alignment (see (Och and Ney, 2003) for details). |
| C04-1045 133 14:76 A good overview of all these models is given in (Och and Ney, 2003) where the model IBM-6 is also introduced as the log-linear interpolation of the other models. |
| P04-1023 134 68:168 In order to measure the accuracy of the predictions that the statistical translation models make under our various experimental settings, we choose the alignment error rate (AER) metric, which is defined in Och and Ney (2003). |
| P04-1023 135 148:168 6 Related Work Och and Ney (2003) is the most extensive analysis to date of how many different factors contribute towards improved alignments error rates, but the inclusion of word-alignments is not considered. |
| P04-1023 136 105:168 For this experiment we allowed a bilingual dictionary to constrain which words can act as translations of each other during the initial estimates of translation probabilities (as described in Och and Ney (2003)). |
| P04-1023 137 65:168 4 Experimental Design To perform our experiments with word-level alignments we modified GIZA++, an existing and freely available implementation of the IBM models and HMM variants (Och and Ney, 2003). |
| W08-0304 138 161:189 (2003) of running GIZA++ (Och & Ney, 2003) in both directions and then merging the alignments using the grow-diag-final heuristic. |
| W08-0304 139 6:189 1 Introduction Och (2003) introduced minimum error rate training (MERT) as an alternative training regime to the conditional likelihood objective previously used with log-linear translation models (Och & Ney, 2002). |
| H05-1057 140 157:240 4.4 Implementation Details We use GIZA++ (Och and Ney, 2003) to train the machine translation system and the ISI ReWrite Decoder (ISI, 2001) to do the actual translations. |
| W05-0809 141 99:120 Several teams had approaches that relied (to varying degrees) on an IBM model of statistical machine translation (Brown et al. , 1993), with different improvements brought by different teams, consisting of new submodels, improvements in the HMM model, model combination for optimal alignment, etc. Several teams used symmetrization metrics, as introduced in (Och and Ney, 2003) (union, intersection, refined), most of the times applied on the alignments produced for the two directions source-target and target-source, but also as a way to combine different word alignment systems. |
| P08-4006 142 12:76 Consequently, considerable effort has gone into devising and improving automatic word alignment algorithms, and into evaluating their performance (e.g., Och and Ney, 2003; Taskar et al., 2005; Moore et al., 2006; Fraser and Marcu, 2006, among many others). |
| E06-2002 143 31:77 Starting from the parallel training corpus, provided with direct and inverted alignments, the so-called union alignment (Och and Ney, 2003) is computed. |
| E06-2002 144 30:77 This preprocessing step can be accomplished by applying the GIZA++ toolkit (Och and Ney, 2003) that provides Viterbi alignments based on IBM Model-4. |
| W05-0835 145 146:185 Two different alignments were used: (a) the one provided in the definition of the task and (b) one obtained using GIZA++ (Och and Ney, 2003) to train an IBMs model 4. |
| P05-1074 146 102:147 Och and Ney (2003) show that the accuracy of alignments produced by Giza++ improve as the size of the training corpus increases. |
| P05-1074 147 58:147 We produced automatic alignments for it with the Giza++ toolkit (Och and Ney, 2003). |
| P05-1074 148 49:147 We use the heuristic for phrase alignment described in Och and Ney (2003) which aligns phrases by incrementally building longer phrases from words and phrases which have adjacent alignment points. [Figure 2: Using a bilingual parallel corpus to extract paraphrases] 2.2 Assigning probabilities We define a paraphrase probability p(e2|e1) in terms of the translation model probabilities p(f|e1), that the original English phrase e1 translates as a particular phrase f in the other language, and p(e2|f), that the candidate paraphrase e2 translates as the foreign language phrase. |
| D07-1005 149 47:211 (2) We note that these posterior probabilities can be computed efficiently for some alignment models such as the HMM (Vogel et al. , 1996; Och and Ney, 2003), Models 1 and 2 (Brown et al. , 1993). |
| D07-1005 150 8:211 High quality word alignments can yield more accurate phrase-pairs which improve quality of a phrase-based SMT system (Och and Ney, 2003; Fraser and Marcu, 2006b). |
| D07-1005 151 189:211 Such an approach contrasts with the log-linear HMM/Model-4 combination proposed by Och and Ney (2003). |
| D07-1005 152 9:211 Much of the recent work in word alignment has focussed on improving the word alignment quality through better modeling (Och and Ney, 2003; Deng and Byrne, 2005; Martin et al. , 2005) or alternative approaches to training (Fraser and Marcu, 2006b; Moore, 2005; Ittycheriah and Roukos, 2005). |
| D07-1005 153 36:211 2 Word Alignment Framework A statistical translation model (Brown et al. , 1993; Och and Ney, 2003) describes the relationship between a pair of sentences in the source and target languages ($f = f_1^J$, $e = e_1^I$) using a translation probability P(f|e). |
| D07-1005 154 109:211 Our human word alignments do not distinguish between Sure and Probable links (Och and Ney, 2003). |
| N07-1061 155 28:313 2 Phrase-based SMT We use a phrase-based SMT system, Pharaoh, (Koehn et al. , 2003; Koehn, 2004), which is based on a log-linear formulation (Och and Ney, 2002). |
| N07-1061 156 27:313 This is the shared task baseline system for the 2006 NAACL/HLT workshop on statistical machine translation (Koehn and Monz, 2006) and consists of the Pharaoh decoder (Koehn, 2004), SRILM (Stolcke, 2002), GIZA++ (Och and Ney, 2003), mkcls (Och, 1999), Carmel,1 and a phrase model training code. |
| W06-3112 157 46:135 Word alignment and phrase extraction We used the GIZA++ word alignment software 3 to produce initial word alignments for our miniature bilingual corpus consisting of the source French file and the English reference file, and the refined word alignment strategy of (Och and Ney, 2003; Koehn et al. , 2003; Tiedemann, 2004) to obtain improved word and phrase alignments. |
| P06-2112 158 11:214 Many researchers build alignment links with bilingual corpora (Wu, 1997; Och and Ney, 2003; Cherry and Lin, 2003; Zhang and Gildea, 2005). |
| I08-6003 159 40:175 In this study, the translation lexicon was created by aligning part of the German-English portion of the Europarl corpus (Koehn, 2005) using the Giza++ package (Och and Ney, 2003). |
| W07-0407 160 20:217 In the past, popular approaches for doing word alignment have largely been generative (Och and Ney, 2003; Vogel et al. , 1996). |
| W07-0407 161 174:217 As the recall of the alignment links of the intersection is very low for this dataset, further refinements of the alignments as suggested by (Och and Ney, 2003) were not performed. |
| W05-0804 162 38:210 The (Och and Ney, 2003) model includes other refinements such as special treatment of a jump to a Null word, and a uniform smoothing prior. |
| P06-1002 163 24:186 2 Related Work Starting with the IBM models (Brown et al. , 1993), researchers have developed various statistical word alignment systems based on different models, such as hidden Markov models (HMM) (Vogel et al. , 1996), log-linear models (Och and Ney, 2003), and similarity-based heuristic methods (Melamed, 2000). |
| W05-0814 164 20:74 We applied the union, intersection and refined symmetrization metrics (Och and Ney, 2003) to the final alignments output from training, as well as evaluating the two final alignments directly. |
| W05-0814 165 8:74 For symmetrization, we found that Och and Ney's refined technique described in (Och and Ney, 2003) produced the best AER for this data set under all experimental conditions. |
| W05-0814 166 7:74 The system used for baseline experiments is two runs of IBM Model 4 (Brown et al. , 1993) in the GIZA++ (Och and Ney, 2003) implementation, which includes smoothing extensions to Model 4. |
| P06-2095 167 173:177 The proposed model looks similar to some implementations of statistical machine translation (SMT), which typically uses a parallel corpus for its translation model, and then finds the best possible recombination that fits into the target language model (Och and Ney, 2003). |
| H05-1085 168 122:254 Our translation models were trained using GIZA++ (Och and Ney, 2003), which we modi[...] (Footnote 1: Although we did not use it for the experiments in this paper, the PCEDT corpus does contain lemma information for the English data.) |
| P06-1098 169 11:74 Feature function scaling factors $\lambda_m$ are optimized based on a maximum likely approach (Och and Ney, 2002) or on a direct error minimization approach (Och, 2003). |
| P06-1098 170 65:74 Many-to-many word alignments are induced by running a one-to-many word alignment model, such as GIZA++ (Och and Ney, 2003), in both directions and by combining the results based on a heuristic (Koehn et al. , 2003). |
| W08-0309 171 104:288 The word alignments were created with Giza++ (Och and Ney, 2003) applied to a parallel corpus containing the complete Europarl training data, plus sets of 4,051 sentence pairs created by pairing the test sentences with the reference translations, and the test sentences paired with each of the system translations. |
| P06-2107 172 121:185 Word Error Rate (WER): Edit distance in terms of words between the target sentence provided by the system and the reference translation (Och and Ney, 2003). |
| P06-2107 173 72:185 These alignments can be obtained from single-word models (Brown et al. , 1993) using the available public software GIZA++ (Och and Ney, 2003). |
| D08-1065 174 148:259 We then train word alignment models (Och and Ney, 2003) using 6 Model-1 iterations and 6 HMM iterations. |
| W08-0333 175 109:168 Figure 6 shows measurements of the average iteration run-time for Model 1 and the HMM alignment model as implemented in Giza++ (Och and Ney, 2003), a state-of-the-art C++ implementation of the IBM and HMM alignment models that is widely used. |
| P07-1120 176 9:209 Pipeline systems are ubiquitous in natural language processing, used not only in parsing (Ratnaparkhi, 1999; Charniak, 2000), but also machine translation (Och and Ney, 2003) and speech recognition (Fiscus, 1997; Goel et al. , 2000), among others. |
| P08-2015 177 88:126 Word alignment is done with GIZA++ (Och and Ney, 2003). |
| H05-1009 178 123:201 We computed precision, recall and error rate on the entire set of sentence pairs for each data set.5 To evaluate NeurAlign, we used GIZA++ in both directions (E-to-F and F-to-E, where F is either Chinese (C) or Spanish (S)) as input and a refined alignment approach (Och and Ney, 2000) that uses a heuristic combination method called grow-diag-final (Koehn et al. , 2003) for comparison. |
| H05-1009 179 25:201 Other approaches to improving alignment have combined alignment models, e.g., using a log-linear combination (Och and Ney, 2003) or mutually independent association clues (Tiedemann, 2003). |
| I08-2087 180 37:170 (2003), bilingual sentences are trained by GIZA++ (Och and Ney 2003) in two directions (from source to target and target to source). |
| D07-1104 181 282:433 To generate alignments, we used GIZA++ (Och and Ney, 2003). |
| D07-1104 182 21:433 So far, these techniques have focused on phrase-based models using contiguous phrases (Koehn et al. , 2003; Och and Ney, 2004). |
| P04-1083 183 137:203 We shall consider the common synchronization scenario where a lexicalized monolingual grammar is available for at least one component.5 Also, given a tokenized set of [symbol lost in extraction]-tuples of parallel sentences, it is always possible to estimate a word-to-word translation model [formula lost in extraction] (e.g., Och & Ney, 2003).6 A word-to-word translation model and a lexicalized monolingual grammar are sufficient to drive a synchronizer. |
| P06-1011 184 72:172 Following Och and Ney (2003), we run GIZA++ in both directions, and then symmetrize the alignments using the refined heuristic. |
| P06-1009 185 160:213 Interestingly, without any features derived from the sentence aligned corpus, our model achieves performance equivalent to Model 3 trained on the full corpus (Och and Ney, 2003). |
| P06-1009 186 11:213 Most current SMT systems (Och and Ney, 2004; Koehn et al. , 2003) use a generative model for word alignment such as the freely available GIZA++ (Och and Ney, 2003), an implementation of the IBM alignment models (Brown et al. , 1993). |
| P06-1009 187 123:213 The word aligned data are annotated with both sure (S) and possible (P) alignments (S ⊆ P; Och and Ney (2003)), where the possible alignments indicate ambiguous or idiomatic alignments. |
| P06-1009 188 27:213 We use a similar graphical structure to the directed hidden Markov model (HMM) from GIZA++ (Och and Ney, 2003). |
| P06-1009 189 103:213 This differs from the GIZA++ hidden Markov model which has individual parameters for each different jump width (Och and Ney, 2003; Vogel et al. , 1996): we found a single feature (and thus parameter) to be more effective. |
| P06-1009 190 114:213 We use the refined method from Och and Ney (2003) which starts from the intersection of the two models' predictions and grows the predicted alignments to neighbouring alignments which only appear in the output of one of the models. |
| W06-2402 191 37:216 More specifically, word-to-word alignments are performed in both directions, source-to-target and target-to-source, by using GIZA++ (Och and Ney, 2003), and tuples are extracted from the union set of alignments according to the following constraints (de Gispert and Marino, 2004): a monotonous segmentation of each bilingual sentence pairs is produced, no word inside the tuple is aligned to words outside the tuple, and no smaller tuples can be extracted without violating the previous constraints. |
| W06-2402 192 8:216 Despite the change from a word-based to a phrase-based translation approach, word to word approaches for inferring alignment models from bilingual data (Vogel et al. , 1996; Och and Ney, 2003) continue to be widely used. |
| W06-2402 193 6:216 Present SMT systems have evolved from the original ones in such a way that mainly differ from them in two issues: first, word-based translation models have been replaced by phrase-based translation models (Zens et al. , 2002) and (Koehn et al. , 2003); and second, the noisy channel approach has been expanded to a more general maximum entropy approach in which a log-linear combination of multiple feature functions is implemented (Och and Ney, 2002). |
| N06-1002 194 167:217 Word alignments were produced by GIZA++ (Och and Ney 2003) with a standard training regimen of five iterations of Model 1, five iterations of the HMM Model, and five iterations of Model 4, in both directions. |
| N06-1002 195 174:217 We used the heuristic combination described in (Och and Ney 2003) and extracted phrasal translation pairs from this combined alignment as described in (Koehn et al. , 2003). |
| D07-1080 196 6:227 1 Introduction The recent advances in statistical machine translation have been achieved by discriminatively training a small number of real-valued features based either on (hierarchical) phrase-based translation (Och and Ney, 2004; Koehn et al. , 2003; Chiang, 2005) or syntax-based translation (Galley et al. , 2006). |
| D07-1080 197 149:227 The hierarchical phrase translation pairs are extracted in a standard way (Chiang, 2005): First, the bilingual data are word alignment annotated by running GIZA++ (Och and Ney, 2003) in two directions. |
| P08-1009 198 146:223 Word alignments are provided by GIZA++ (Och and Ney, 2003) with grow-diag-final combination, with infrastructure for alignment combination and phrase extraction provided by the shared task. |
| W06-3108 199 91:203 Then the alignments are symmetrized using a refined heuristic as described in (Och and Ney, 2003). |
| W06-3108 200 90:203 We train IBM Model 4 with GIZA++ (Och and Ney, 2003) in both translation directions. |
| C04-1032 201 111:193 Because we use the same training and testing conditions as (Och and Ney, 2003), we will refer to the results presented in that article as the baseline results. |
| C04-1032 202 141:193 On the Verbmobil task, we obtain a further improvement of 19% relative over the baseline result reported in (Och and Ney, 2003), reaching an AER as low as 3.8%. |
| C04-1032 203 117:193 Tables 3 and 4 show the performance of the one-sided MWEC algorithm in comparison with the experiment reported by (Och and Ney, 2003). |
| C04-1032 204 107:193 We use the same training schemes (model sequences) as presented in (Och and Ney, 2003): $1^5 H^5 3^3 4^3 6^3$ for the Verbmobil Task, i.e. 5 iterations of IBM-1, 5 iterations of the HMM, 3 iterations of IBM-3, etc.; for the Canadian Hansards task, we use $1^5 H^{10} 3^3 4^3 6^3$. |
| C04-1032 205 27:193 A detailed description of the popular translation/alignment models IBM-1 to IBM-5 (Brown et al. , 1993), as well as the Hidden-Markov alignment model (HMM) (Vogel et al. , 1996) can be found in (Och and Ney, 2003). |
| C04-1032 206 166:193 An overview of these models is given in (Och and Ney, 2003). |
| C04-1032 207 105:193 As in (Och and Ney, 2003), the first 100 sentences of the test corpus are used as a development corpus to optimize model parameters that are not trained via the EM algorithm, e.g. the interpolation weights. |
| C04-1032 208 20:193 We will show statistically significant improvements compared to state-of-the-art results in (Och and Ney, 2003). |
| C04-1032 209 123:193 For the Verbmobil task, the refined method of (Och and Ney, 2003) is used. |
| P08-1012 210 167:198 We also trained a baseline model with GIZA++ (Och and Ney, 2003) following a regimen of 5 iterations of Model 1, 5 iterations of HMM, and 5 iterations of Model 4. |
| W06-1628 211 136:291 To extract AEPs, we perform the following steps: NP and PP Alignment To align NPs and PPs, first all German and English nouns, personal and possessive pronouns, numbers, and adjectives are identified in each sentence and aligned using GIZA++ (Och and Ney, 2003). |
| H05-1010 212 66:145 The validation and test sentences have been hand-aligned (see Och and Ney (2003)) and are marked with both sure and possible alignments. |
| H05-1010 213 89:145 2 With just a Dice feature, meaning no learning is needed yet, we achieve an AER of 29.8, between the Dice with competitive linking result of 34.0 and Model 1 of 25.9 given in Och and Ney (2003). |
| H05-1010 214 14:145 While tools like GIZA++ (Och and Ney, 2003) do make it easier to build on the long history of the generative IBM approach, they also underscore how complex high-performance generative models can, and have, become. |
| H05-1010 215 85:145 With Dice counts taken from the 1.1M sentences, this gives an AER of 38.7 with English as the target, and 36.0 with French as the target (in line with the numbers from Och and Ney (2003)). |
| H05-1010 216 6:145 1 Introduction The standard approach to word alignment from sentence-aligned bitexts has been to construct models which generate sentences of one language from the other, then fitting those generative models with EM (Brown et al. , 1990; Och and Ney, 2003). |
| H05-1010 217 102:145 With these features, we got an AER of 15.5 (compare to 19.5 for Model 2 in (Och and Ney, 2003)). |
| W05-0810 218 6:106 When efficient techniques have been proposed (Brown et al. , 1993; Och and Ney, 2003), they have been mostly evaluated on safe pairs of languages where the notion of word is rather clear. |
| I05-5002 219 120:264 Until reliable metrics can be established for end-to-end paraphrase tasks (these will probably need to be application specific), the Alignment Error Rate strategy that was successfully applied in early development of machine translation systems (Och & Ney, 2000, 2003) offers a useful intermediate representation of the coverage and precision of a corpus and extraction techniques. |
| C04-1051 220 97:175 We closely followed the evaluation standards established in Melamed (2001) and Och & Ney (2000, 2003). |
| C04-1051 221 116:175 These alignments can subsequently be recombined in a variety of ways, 5 The formula for AER given here and in Och & Ney (2003) is intended to compare an automatic alignment against a gold standard alignment. |
| C04-1051 222 17:175 Such data would be amenable to conventional statistical machine translation (SMT) techniques (e.g. , those discussed in Och & Ney 2003). |
| C04-1051 223 101:175 To compute Precision, Recall, and Alignment Error Rate (AER) for the twin datasets, we used exactly the formulae listed in Och & Ney (2003). |
| C04-1051 224 92:175 In order to address such questions, we used word Alignment Error Rate (AER), a metric borrowed from the field of statistical machine translation (Och & Ney 2003). |
| H05-1095 225 116:253 A first family of libraries was based on a word alignment A, produced using the Refined method described in (Och and Ney, 2003) (combination of two IBM-Viterbi alignments): we call these the A libraries. |
| H05-1095 226 42:253 The first is to align the words using a standard word alignment technique, such as the Refined Method described in (Och and Ney, 2003) (the intersection of two IBM Viterbi alignments, forward and reverse, enriched with alignments from the union) and then generate bi-phrases by combining together individual alignments that co-occur in the same pair of sentences. |
| H05-1095 227 43:253 This is the strategy that is usually adopted in other phrase-based MT approaches (Zens and Ney, 2003; Och and Ney, 2004). |
| W07-0735 228 52:302 In all experiments, word alignment was obtained using the grow-diag-final heuristic for symmetrizing GIZA++ (Och and Ney, 2003) alignments. |
| W06-3121 229 24:69 The software also required GIZA++ word alignment tool(Och and Ney, 2003). |
| P06-1097 230 38:187 We use the union, refined and intersection heuristics defined in (Och and Ney, 2003) which are used in conjunction with IBM Model 4 as the baseline in virtually all recent work on word alignment. |
| P06-1097 231 4:187 1 Introduction The most widely applied training procedure for statistical machine translation, IBM model 4 (Brown et al. , 1993) unsupervised training followed by post-processing with symmetrization heuristics (Och and Ney, 2003), yields low quality word alignments. |
| P06-1097 232 33:187 For each training direction, we run GIZA++ (Och and Ney, 2003), specifying 5 iterations of Model 1, 4 iterations of the HMM model (Vogel et al. , 1996), and 4 iterations of Model 4. |
| P06-1097 233 157:187 However, union and refined alignments, which are many-to-many, are what are used to build competitive phrasal SMT systems, because intersection performs poorly, despite having been shown to have the best AER scores for the French/English corpus we are using (Och and Ney, 2003). |
| D07-1105 234 16:270 For instance, word alignment models are often trained using the GIZA++ toolkit (Och and Ney, 2003); error minimizing training criteria such as the Minimum Error Rate Training (Och, 2003) are employed in order to learn feature function weights for log-linear models; and translation candidates are produced using phrase-based decoders (Koehn et al. , 2003) in combination with n-gram language models (Brants et al. , 2007). |
| D07-1105 235 170:270 For instance, changing the training procedure for word alignment models turned out to be most beneficial; for details see (Och and Ney, 2003). |
| W07-0705 236 56:123 In the first part, a word alignment was computed (using the GIZA++ toolkit (Och and Ney, 2003)). |
| W08-0303 237 151:239 This data consists of 1.1 million sentences, a validation set of 37 sentences and a test set of 447 sentences, which have been hand-aligned (Och and Ney, 2003). |
| W08-0303 238 162:239 For the first two tasks, all heuristics of the Pharaoh-Toolkit (Koehn et al., 2003) as well as the refined heuristic (Och and Ney, 2003) to combine both IBM4-alignments were tested and the best ones are shown in the tables. |
| P07-2046 239 8:108 1 Introduction Raw parallel data need to be preprocessed in the modern phrase-based SMT before they are aligned by alignment algorithms, one of which is the well-known tool, GIZA++ (Och and Ney, 2003), for training IBM models (1-4). |
| E06-1020 240 18:194 We used GIZA++ (Och and Ney 2000; Och and Ney, 2003) to estimate different parameters of the MEBA aligner. |
| E06-1019 241 9:254 They remain the dominant method for word alignment (Och and Ney, 2003). |
| E06-1019 242 223:254 Gold standard links are broken up into two categories in Och and Ney's evaluation framework (2003). |
| E06-1019 243 169:254 Our test set is the 500 manually aligned sentence pairs created by Franz Och and Hermann Ney (2003). |
| E06-1019 244 205:254 Our weakest method outperforms Model 2, which scores an AER of 22.0 on this test set when trained with roughly twice as many sentence pairs (Och and Ney, 2003). |
| E06-1019 245 168:254 In all cases, we report our results in terms of alignment quality, using the standard word alignment error metrics: precision, recall, F-measure and alignment error rate (Och and Ney, 2003). |
| J06-4004 246 207:388 Three different alignment sets are considered: source-to-target, the union of source-to-target and target-to-source, and the refined alignment method described by Och and Ney (2003). |
| J06-4004 247 14:388 Submission received: 9 August 2005; revised submission received: 26 April 2006; accepted for publication: 5 July 2006 2006 Association for Computational Linguistics Computational Linguistics Volume 32, Number 4 replaced by phrase-based translation models (Zens, Och, and Ney 2002; Koehn, Och, and Marcu 2003) which are directly estimated from aligned bilingual corpora by considering relative frequencies, and second, the noisy channel approach has been expanded to a more general maximum entropy approach in which a log-linear combination of multiple feature functions is implemented (Och and Ney 2002). |
| I08-1064 248 147:193 4 Experiments Our evaluation setup consists of experiments conducted on the English-German portion of the Europarl corpus (Koehn, 2005); specifically, we work with the preprocessed and word-aligned version used in Padó and Lapata (2006): the source-target and target-source word alignments were automatically established by GIZA++ (Och and Ney, 2003), and their intersection achieves a precision of 98.6% and a recall of 52.9% (Padó, 2007). |
| D08-1078 249 132:241 The automatic alignments were extracted by appending the manually aligned sentences on to the respective Europarl v3 corpora and aligning them using GIZA++ (Och and Ney, 2003) and the grow-final-diag algorithm (Koehn et al., 2003). |
| D08-1066 250 54:243 These heuristics define a phrase pair to consist of a source and target ngrams of a word-aligned source-target sentence pair such that if one end of an alignment is in the one ngram, the other end is in the other ngram (and there is at least one such alignment) (Och and Ney, 2004; Koehn et al., 2003). |
| D08-1066 251 13:243 The heuristic estimator employs word-alignment (Giza++) (Och and Ney, 2003) and a few thumb rules for defining phrase pairs, and then extracts a multi-set of phrase pairs and estimates their conditional probabilities based on the counts in the multi-set. |
| D08-1066 252 53:243 (Koehn et al., 2003; Och and Ney, 2004)). |
| H05-1023 253 8:217 1 Introduction Today's statistical machine translation systems rely on high quality phrase translation pairs to acquire state-of-the-art performance, see (Koehn et al. , 2003; Zens and Ney, 2004; Och and Ney, 2003). |
| H05-1023 254 155:217 IBM Model-4 is trained with GIZA++ using the best reported settings in (Och and Ney, 2003). |
| H05-1023 255 157:217 We collect bidirectional (bi) refined word alignment by growing the intersection of Chinese-to-English (CE) alignments and English-to-Chinese (EC) alignments with the neighboring unaligned word pairs which appear in the union similar to the final-and approaches (Koehn, 2003; Och and Ney, 2003; Tillmann, 2003). |
| W04-0821 256 42:73 For example, in the English-Spanish sentence pair Me gusta la ciudad/I like the city, one would find the translation pairs [four translation pairs lost in extraction].2 Those familiar with statistical machine translation (MT) models will note that a translation pair is equivalent to a link in a word-level alignment, and in fact we obtain translation pairs from sentence-aligned parallel text by training a statistical MT model (using GIZA++, (Och and Ney, 2003)) and using the word-level alignments that result. |
| W07-0403 257 26:234 The surface heuristic can define consistency according to any word alignment; but most often, the alignment is provided by GIZA++ (Och and Ney, 2003). |
| W07-0403 258 29:234 Many-to-many alignments can be created by combining two GIZA++ alignments, one where English generates Foreign and another with those roles reversed (Och and Ney, 2003). |
| W07-0403 259 178:234 We report precision, recall and balanced F-measure (Och and Ney, 2003). |
| W05-0817 260 11:109 GIZA has been superseded by its recent extension GIZA++ (Och and Ney, 2000, 2003) publicly available 2. |
| P07-1040 261 22:212 In (Matusov et al. , 2006), different word orderings are taken into account by training alignment models by considering all hypothesis pairs as a parallel corpus using GIZA++ (Och and Ney, 2003). |
| I08-2104 262 75:133 GIZA++ (Och and Ney, 2003) was used for training the IBM model 4 from the normalized parallel corpus. |
| W05-1204 263 112:174 Training of the IBM model 4 was implemented by the GIZA++ package (Och and Ney, 2003). |
| H05-1108 264 160:204 Words were aligned using the default setting of GIZA++ (Och and Ney, 2003), a publicly available implementation of the IBM models and HMM word alignment models. [Table 6: Results for word-based projection models. Model: Precision / Recall / F-score. w: 0.41 / 0.40 / 0.41; cw: 0.46 / 0.45 / 0.46; Upper bound: 0.85 / 0.84 / 0.84] |
| H05-1108 265 65:204 We first used the publicly available GIZA++ (Och and Ney, 2003) software to induce English-German word alignments. |
| C04-1031 266 25:147 Alignment clues represent probabilistic indications of associa[...] (Footnote 1: A similar study on statistical alignment models is included in (Och and Ney, 2003).) |
| C04-1031 267 99:147 Alternative measures for the evaluation of one-to-one word links have been proposed in (Och and Ney, 2000a; Och and Ney, 2003). |
| C04-1031 268 112:147 Three different clue types are used for the alignment: the Dice coefficient (dice), lexical translation probabilities derived from statistical translation models (giza) using the GIZA++ toolbox (Och and Ney, 2003), and, finally, POS/relative-wordposition-clues learned from previous alignments (pp). |
| W06-3126 269 38:85 Then, running GIZA++ (Och and Ney, 2003), we obtain token alignments for each of the data views. |
| E06-1021 270 57:191 In subsequent work, the same authors (Quirk et al. , 2004) used such matched sentence pairs to train Giza++ (Och and Ney, 2003) on word-level alignment. |
| P07-1121 271 65:199 We use the K = 5 most probable word alignments for the training set given by GIZA++ (Och and Ney, 2003), with variable names ignored to reduce sparsity. |
| N06-1056 272 71:173 Using word alignments to induce a lexicon is not a new idea (Och and Ney, 2003). |
| N06-1056 273 91:173 In this work, we use the GIZA++ implementation (Och and Ney, 2003) of IBM Model 5 (Brown et al. , 1993). |
| N06-1056 274 165:173 Our method is like many phrasebased translation models, which require a simpler, word-based alignment model for the acquisition of a phrasal lexicon (Och and Ney, 2003). |
| C08-2032 275 23:70 Let us suppose that we have two bilingual lexicons $L_f$-$L_p$ and $L_p$-$L_e$. We obtain word alignments of these lexicons by applying GIZA++ (Och and Ney, 2003), and grow-diag-final heuristics (Koehn et al., 2007). |
| P06-1065 276 142:192 Results for IBM Model 4 are reported for models trained in both directions, English-to-French and French-to-English, and for the union, intersection, and what Och and Ney (2003) call the refined combination of those two alignments. |
| P06-1065 277 7:192 (1993), sometimes augmented by an HMM-based model or Och and Ney's Model 6 (Och and Ney, 2003). |
| P06-1065 278 138:192 Automatic sentence alignment of the training data was provided by Ulrich Germann, and the hand alignments of the labeled data were created by Franz Och and Hermann Ney (Och and Ney, 2003). |
| P06-1065 279 140:192 package, using the default configuration file (Och and Ney, 2003). Prev LLR is our earlier stage 1 model, and CLP1 and CLP2 are two versions of our earlier stage 2 model. |
| P06-1065 280 168:192 The best previously reported result was by Och and Ney (2003), who obtained 5.2% AER for a combination including all the IBM models except Model 2, plus the HMM model and their Model 6, together with a bilingual dictionary, for the refined alignment combination, trained on three times as much data as we used. |
| N06-1013 281 8:176 Maximum entropy (ME) models have been used in bilingual sense disambiguation, word reordering, and sentence segmentation (Berger et al. , 1996), parsing, POS tagging and PP attachment (Ratnaparkhi, 1998), machine translation (Och and Ney, 2002), and FrameNet classification (Fleischman et al. , 2003). |
| N06-1013 282 155:176 In a later study, Och and Ney (2003) present a log-linear combination of the HMM and IBM Model 4 that produces better alignments than either of those. |
| N06-1013 283 7:176 1 Introduction Word alignment (detection of corresponding words between two sentences that are translations of each other) is usually an intermediate step of statistical machine translation (MT) (Brown et al., 1993; Och and Ney, 2003; Koehn et al., 2003), but also has been shown useful for other applications such as construction of bilingual lexicons, word-sense disambiguation, projection of resources, and cross-language information retrieval. |
| W08-0316 284 38:76 Word alignments were generated using GIZA++ (Och and Ney, 2003) over a stemmed version of the parallel text. |
| W05-0908 285 32:148 Our system is a re-implementation of the phrase-based system described in Koehn (2003), and uses publicly available components for word alignment (Och and Ney, 2003), decoding (Koehn, 2004a), language modeling (Stolcke, 2002) and finite-state processing (Knight and Al-Onaizan, 1999). |
| W08-0307 286 156:224 Word alignment is done with GIZA++ (Och and Ney, 2003). |
| W08-0307 287 136:224 The simple idea that words in a source chunk are typically aligned to words in a single possible target chunk is used to discard alignments which link words from [Footnote 2: We use IBM-1 to IBM-5 models (Brown et al., 1993) implemented with GIZA++ (Och and Ney, 2003).] |
| W06-3117 288 8:110 Several methods have been defined for dealing with this problem (Och and Ney, 2003). |
| W06-3117 289 5:110 1 Introduction Phrase-based statistical translation systems are currently providing excellent results in real machine translation tasks (Zens et al. , 2002; Och and Ney, 2003; Koehn, 2004). |
| C08-2014 290 7:97 Among them, the bilingual word aligner GIZA++ (Och and Ney, 2003) can perform high quality alignments based on words statistics and is considered the most efficient tool. |
| W06-3115 291 16:84 Feature function scaling factors $\lambda_m$ are optimized based on a maximum likelihood approach (Och and Ney, 2002) or on a direct error minimization approach (Och, 2003). |
| W06-3115 292 23:84 First, many-to-many word alignments are induced by running a one-to-many word alignment model, such as GIZA++ (Och and Ney, 2003), in both directions and by combining the results based on a heuristic (Och and Ney, 2004). |
| W06-3115 293 50:84 For each differently tokenized corpus, we computed word alignments by a HMM translation model (Och and Ney, 2003) and by a word alignment refinement heuristic of grow-diag-final (Koehn et al., 2003). |
| I08-1068 294 116:193 GIZA++ (Och and Ney, 2003), an open source tool which implements the IBM Models which we have used in our work for computing the translation probabilities. |
| P07-1037 295 96:160 The bidirectional word alignment is used to obtain lexical phrase translation pairs using heuristics presented in (Och & Ney, 2003) and (Koehn et al., 2003). |
| P07-1037 296 41:160 Firstly, rather than induce millions of xRS rules from parallel data, we extract phrase pairs in the standard way (Och & Ney, 2003) and associate with each phrase-pair a set of target language syntactic structures based on supertag sequences. |
| P07-1037 297 50:160 The bidirectional word alignment is used to obtain phrase translation pairs using heuristics presented in (Och & Ney, 2003) and (Koehn et al., 2003), and the Moses decoder was used for phrase extraction and decoding. Let t and s be the target and source language sentences respectively. [Footnote 2: http://www.fjoch.com/GIZA++.html] |
| C04-1156 298 149:189 These results compare badly with those reported by (Och and Ney, 2003) on the Hansard alignment task. |
| C04-1156 299 162:189 This is a small corpus which allows for only an approximate comparison with the experiment reported by Och and Ney (2003) on a set of 8,000 sentences from the Hansard corpus. |
| C04-1156 300 19:189 The statistics-based algorithm to be evaluated is described in (Och and Ney, 2003). |
| W07-0411 301 95:166 Using the standard GIZA++ software and the refined word alignment strategy of Och and Ney (2003) on our test set of 4,000 Spanish-English sentences, the method generated paraphrases for just over 1100 items. |
| W07-0411 302 87:166 4.1.1 Experimental design In order to check for the existence of a bias in the dependency-based metric, we created a set of 4,000 sentences drawn randomly from the Spanish-English subset of Europarl (Koehn, 2005), and we produced two translations: one by a rule-based system Logomedia, and the other by the standard phrase-based statistical decoder Pharaoh, using alignments produced by GIZA++ and the refined word alignment strategy of Och and Ney (2003). |
| N07-2022 303 17:92 Unsupervised systems (Och and Ney, 2003; Liang et al. , 2006) are based on generative models trained with the EM algorithm. |
| P07-1020 304 8:189 Most of the previous work on statistical machine translation, as exemplified in (Brown et al. , 1993), employs word-alignment algorithm (such as GIZA++ (Och and Ney, 2003)) that provides local associations between source and target words. |
| P07-1020 305 29:189 $\forall i\, \exists j\, (f(s_i) = t_j \lor f(s_i) = \epsilon)$ (1) For the work reported in this paper, we have used the GIZA++ tool (Och and Ney, 2003) which implements a string-alignment algorithm. |
| W08-0336 306 46:196 We build phrase translations by first acquiring bidirectional GIZA++ (Och and Ney, 2003) alignments, and using Moses grow-diag alignment symmetrization heuristic. We set the maximum phrase length to a large value (10), because some segmenters described later in this paper will result in shorter [Footnote 1: In our experiments, this heuristic consistently performed better than the default, grow-diag-final.] |
| J05-4004 307 60:457 Based on these efforts, one might be initially tempted to use readily available alignment models developed in the context of machine translation, such as GIZA++ (Och and Ney 2003), to obtain word-level alignments in document, abstract corpora. |
| J05-4004 308 350:457 We compare against several competing systems, the first of which is based on the original IBM Model 4 for machine translation (Brown et al. 1993) and the HMM machine translation alignment model (Vogel, Ney, and Tillmann 1996) as implemented in the GIZA++ package (Och and Ney 2003). |
| J05-4004 309 84:457 2.1 Annotation Guidelines Annotators were asked to perform word-to-word and phrase-to-phrase alignments between abstracts and documents, and to classify each alignment as either possible (P) or sure (S), where $S \subseteq P$, following the methodology used in the machine translation community (Och and Ney 2003). |
| N06-1014 310 101:166 The validation and test sentences have been hand-aligned (see Och and Ney (2003)) and are marked with both sure and possible alignments. |
| N06-1014 311 60:166 Each z can be thought of as an element in the set of generalized alignments, where any subset of word pairs may be aligned (Och and Ney, 2003). |
| N06-1014 312 6:166 The classic approaches to unsupervised word alignment are based on IBM models 1-5 (Brown et al., 1994) and the HMM model (Ney and Vogel, 1996) (see Och and Ney (2003) for a systematic comparison). |
| N06-1014 313 109:166 Each model was trained for 5 iterations, using the same training regimen as in Och and Ney (2003). |
| N06-1014 314 28:166 2 Alignment models: IBM 1, 2 and HMM We briefly review the sequence-based word alignment models (Brown et al. , 1994; Och and Ney, 2003) and describe some of the choices in our implementation. |
| N06-1014 315 9:166 As a result, many practitioners use the complex GIZA++ software package (Och and Ney, 2003) as a black box, selecting model 4 as a good compromise between alignment quality and efficiency. |
| N06-1014 316 31:166 The distortion parameters $p_d(a_j = i' \mid a_j = i)$ depend on the particular model (we write $a_j = 0$ to denote the event that the j-th French word [Footnote 2: The dependence on $a_j$ can in fact be implemented as a first-order HMM (see Och and Ney (2003)).] |
| N06-1014 317 13:166 For example, gains from the new model 6 of Och and Ney (2003) are modest. |
| W06-1610 318 80:198 We ran the alignment algorithm from (Och and Ney, 2003) on a Chinese-English parallel corpus of 218 million English words, available from the Linguistic Data Consortium (LDC). |
| P05-1048 319 76:160 The training scheme consists of IBM-1, HMM, IBM-3 and IBM-4, following (Och and Ney, 2003). |
| P05-1048 320 73:160 [Table 1: Example of the translation candidates before and after mapping for the target word (lu). Columns: HowNet sense ID; HowNet glosses; HowNet glosses + improved translations. 56520: distance; distance. 56521: sort; sort. 56524: Lu; Lu. 56525, 56526, 56527, 56528: path, road, route, way; path, road, route, way, circuit, roads. 56530, 56531, 56532: line, means, sequence; line, means, sequence, lines. 56533, 56534: district, region; district, region.] 4.1 Alignment model The alignment model was trained with GIZA++ (Och and Ney, 2003), which implements the most typical IBM and HMM alignment models. |
| I05-2021 321 48:135 The training scheme is IBM-1, HMM, IBM-3 and IBM-4, as specified in (Och and Ney, 2003). |
| I05-2021 322 45:135 3.1 Alignment model The alignment model was trained with GIZA++ (Och and Ney, 2003), which implements the most typical IBM and HMM alignment models. |
| W08-1911 323 46:160 (Och and Ney, 2003)), and the phrase-based approach to Statistical Machine Translation (Koehn et al., 2003) has led to the development of heuristics for obtaining alignments between phrases of any number of words. |
| W08-1911 324 104:160 4 Experiments and evaluation We carried out an evaluation on the local rephrasing of French sentences, using English as the pivot language. We extracted phrase alignments of up to 7 word forms using the Giza++ alignment tool (Och and Ney, 2003) and the grow-diag-final-and heuristics described in (Koehn et al., 2003) on 948,507 sentences of the French-English part of the Europarl corpus (Koehn, 2005) and obtained some 42 million phrase pairs for which probabilities were estimated using maximum likelihood estimation. |
| N07-1046 325 117:233 In a way, this approach is similar to those of refining the word-level alignment for SMT in (Och and Ney, 2003). |
| W06-3104 326 141:270 Initial estimates of lexical translation probabilities came from the IBM Model 4 translation tables produced by GIZA++ (Brown et al. , 1993; Och and Ney, 2003). |
| W04-1118 327 44:191 A Viterbi alignment $\hat{a}_1^J$ of a specific model is an alignment for which the following equation holds: $\hat{a}_1^J = \arg\max_{a_1^J} \Pr(f_1^J, a_1^J \mid e_1^I)$ (4) The alignment models are trained on a bilingual corpus using GIZA++ (Och et al., 1999; Och and Ney, 2003). |
| W04-1118 328 43:191 A detailed description of these models can be found in (Och and Ney, 2003). |
| P07-2045 329 29:103 Moses uses standard external tools for some of the tasks to avoid duplication, such as GIZA++ (Och and Ney 2003) for word alignments and SRILM for language modeling. |
| P05-1067 330 188:217 In comparison, we deployed the GIZA++ MT modeling tool kit, which is an implementation of the IBM Models 1 to 4 (Brown et al., 1993; Al-Onaizan et al., 1999; Och and Ney, 2003). |
| P05-1057 331 46:247 In order to incorporate a new dependency which contains extra information other than the bilingual sentence pair, we modify Eq. 2 by adding a new variable v: $\Pr(a \mid e, f, v) = \frac{\exp[\sum_{m=1}^{M} \lambda_m h_m(a, e, f, v)]}{\sum_{a'} \exp[\sum_{m=1}^{M} \lambda_m h_m(a', e, f, v)]}$ (4) Accordingly, we get a new decision rule: $\hat{a} = \arg\max_{a} \{ \sum_{m=1}^{M} \lambda_m h_m(a, e, f, v) \}$ (5) Note that our log-linear models are different from Model 6 proposed by Och and Ney (2003), which defines the alignment problem as finding the alignment a that maximizes $\Pr(f, a \mid e)$ given e. 3 Feature Functions In this paper, we use IBM translation Model 3 as the base feature of our log-linear models. |
| P05-1057 332 135:247 After that, we used three types of methods for performing a symmetrization of IBM models: intersection, union, and refined methods (Och and Ney, 2003). |
| P05-1057 333 22:247 Och and Ney (2003) proposed Model 6, a log-linear combination of IBM translation models and HMM model. |
| P05-1057 334 132:247 We used GIZA++ package (Och and Ney, 2003) to train IBM translation models. |
| P05-1057 335 14:247 Studies reveal that statistical alignment models outperform the simple Dice coefficient (Och and Ney, 2003). |
| W07-1205 336 9:213 The most successful statistical MT paradigm has been, for a while now, the so-called phrase-based MT approach (Och and Ney, 2003). |
| H05-1012 337 14:201 Although there is a modest cost associated with annotating data, we show that a reduction of 40% relative in alignment error (AER) is possible over the GIZA++ aligner (Och and Ney, 2003). |
| W08-0306 338 6:125 GIZA++ (Och and Ney, 2003), an implementation of the IBM (Brown et al., 1993) and HMM (?) |
| W08-0306 339 18:125 We show that link [Footnote 1: For a complete discussion of alignment symmetrization heuristics, including union, intersection, and refined, refer to (Och and Ney, 2003).] |
| W08-0306 340 86:125 3.2 Evaluation Metrics AER (Alignment Error Rate) (Och and Ney, 2003) is the most widely used metric of alignment quality, but requires gold-standard alignments labelled with sure/possible annotations to compute; lacking such annotations, we can compute alignment F-measure instead. |
| W08-0306 341 99:125 The feature weights are tuned using minimum error rate training (Och and Ney, 2003) to optimize BLEU score on a held-out development set. |
| P06-1122 342 125:211 Aligning tokens in parallel sentences using the IBM Models (Brown et al. , 1993), (Och and Ney, 2003) may require less information than full-blown translation since the task is constrained by the source and target tokens present in each sentence pair. |
| W07-0718 343 103:240 , 2003; Och and Ney, 2004). |
| W07-0718 344 101:240 The word alignments were created with Giza++ (Och and Ney, 2003) applied to a parallel corpus containing 200,000 sentence pairs of the training data, plus sets of 4,007 sentence pairs created by pairing the test sentences with the reference translations, and the test sentences paired with each of the system translations. |
| W08-0321 345 15:99 Following the guidelines of the workshop we built baseline systems, using the lower-cased Europarl parallel corpus (restricting sentence length to 40 words), GIZA++ (Och and Ney, 2003), Moses (Koehn et al., 2007), and the SRI LM toolkit (Stolcke, 2002) to build 5-gram LMs. |
| W08-0321 346 33:99 For example, in IBM Model 1 the lexicon probability of source word f given target word e is calculated as (Och and Ney, 2003): $p(f \mid e) = \frac{\sum_{k} c(f \mid e; e_k, f_k)}{\sum_{k,f} c(f \mid e; e_k, f_k)}$ (1) $c(f \mid e; e_k, f_k) = \sum_{e_k, f_k} P(e_k, f_k) \sum_{a} P(a \mid e_k, f_k) \sum_{j} \delta(f, f_{kj})\, \delta(e, e_{k a_j})$ (2) Therefore, the distribution of $P(e_k, f_k)$ will affect the alignment results. |
| D07-1093 347 125:216 891 The Europarl and the biblical data were processed and aligned in the standard way, using combined GIZA++ alignments (Och and Ney, 2003). |
| W07-0725 348 13:98 2 Architecture of the system The goal of statistical machine translation (SMT) is to produce a target sentence e from a source sentence f. It is today common practice to use phrases as translation units (Koehn et al., 2003; Och and Ney, 2003) and a log linear framework in order to introduce several models explaining the translation process: $e^* = \arg\max_e p(e \mid f) = \arg\max_e \{ \exp(\sum_i \lambda_i h_i(e, f)) \}$ (1) The feature functions $h_i$ are the system models and the $\lambda_i$ weights are typically optimized to maximize a scoring function on a development set (Och and Ney, 2002). |
| W05-0825 349 9:75 A widely practiced approach explained in detail in (Koehn, 2004), (Och and Ney, 2003) and (Tillmann, 2003) is to get word alignments from two directions: source to target and target to source; the intersection or union operation is applied to get refined word alignment with pre-designed heuristics fixing the unaligned words. |
| W05-0825 350 62:75 We train IBM Model 4 with a scheme of $1^7 2^0 h^7 3^0 4^3$ using GIZA++ (Och and Ney, 2003). |
| C08-1056 351 75:155 Giza++ (Och and Ney, 2003) is used to induce, based on statistical principles (Brown et al., 1990), an automatic word alignment of SMS tokens with their normalized counterparts; Moses (Koehn et al., 2007) is used to learn the various parameters of the phrase-based model, to optimize the weight combination and to perform the translation using a multi-stack search algorithm; the SRI language model toolkit (Stolcke, 2002) is finally used to estimate statistical language models. |
| D08-1084 352 26:232 The MT community has developed not only an extensive literature on alignment (Brown et al., 1993; Vogel et al., 1996; Marcu and Wong, 2002; DeNero et al., 2006), but also standard, proven alignment tools such as GIZA++ (Och and Ney, 2003). |
| D08-1084 353 137:232 Although we have argued (section 2) that this is unlikely to succeed, to our knowledge, we are the first to investigate the matter empirically.11 The best-known MT aligner is undoubtedly GIZA++ (Och and Ney, 2003), which contains implementations of various IBM models (Brown et al., 1993), as well as the HMM model of Vogel et al. |
| D08-1084 354 216:232 The standard approach to training a phrase-based MT system is to apply phrase extraction heuristics using wordaligned training sets (Och and Ney, 2003; Koehn et al., 2007). |
| D08-1084 355 125:232 (2005), and similar to the simple heuristic model described in (Och and Ney, 2003). |
| W05-0833 356 52:152 In order to create the necessary SMT language and translation models, they used: Giza++ (Och & Ney, 2003); the CMU-Cambridge statistical toolkit; the ISI ReWrite Decoder. Translation was performed from English-French and French-English, and the resulting translations were evaluated using a range of automatic metrics: BLEU (Papineni et al., 2002), Precision and Recall (Turian et al., 2003), and Word and Sentence Error Rates. [Footnotes: 2 http://www.isi.edu/och/Giza++.html; 3 http://mi.eng.cam.ac.uk/prc14/toolkit.html; 4 http://www.isi.edu/licensed-sw/rewrite-decoder/] |
| W05-0833 357 77:152 Accordingly, in this section we describe a set of experiments which extends the work of (Way and Gough, 2005) by evaluating the Marker-based EBMT system of (Gough & Way, 2004b) against a phrase-based SMT system built using the following components: Giza++, to extract the word-level correspondences; the Giza++ word alignments are then refined and used to extract phrasal alignments ((Och & Ney, 2003); or (Koehn et al., 2003) for a more recent implementation); probabilities of the extracted phrases are calculated from relative frequencies; the resulting phrase translation table is passed to the Pharaoh phrase-based SMT decoder which along with SRI language modelling toolkit performs translation. |
| W04-2207 358 8:171 Indeed, it is even often difficult for a human to determine which source unit corresponds to which target unit within aligned sentences (Och and Ney, 2003). |
| W08-0510 359 47:155 GIZA++ (Och and Ney 2003) is a very popular system within SMT for creating word alignment from parallel corpus, in fact, the Moses training scripts uses it. |
| W08-0313 360 14:90 2 Architecture of the system The goal of statistical machine translation (SMT) is to produce a target sentence e from a source sentence f. It is today common practice to use phrases as translation units (Koehn et al., 2003; Och and Ney, 2003) and a log linear framework in order to introduce several models explaining the translation process: $e^* = \arg\max_e p(e \mid f) = \arg\max_e \{ \exp(\sum_i \lambda_i h_i(e, f)) \}$ (1) The feature functions $h_i$ are the system models and the $\lambda_i$ weights are typically optimized to maximize a scoring function on a development set (Och and Ney, 2002). |
| N07-2037 361 109:135 By combining word alignments in two directions using heuristics (Och and Ney, 2003), a single set of static word alignments was then formed. |
| W05-0816 362 77:198 One of the algorithms tested (Melamed, 1996) gave worse performance when we used a notation called ITRANS for the Hindi text, instead of the WX-notation. 4 Evaluation in Previous Work There have been attempts to systematically evaluate and compare word alignment algorithms (Och and Ney, 2003) but, surprisingly, there has been a lack of such evaluation for sentence alignment algorithms. |
| W05-0812 363 8:102 Implementations are also freely available (Al-Onaizan et al. , 1999; Och and Ney, 2003). |
| W05-0812 364 89:102 This sequence performed well in an evaluation of the IBM Models (Och and Ney, 2003). |
| W05-0812 365 11:102 IBM Model 4 parameters are then estimated over this partial search space as an approximation to EM (Brown et al. , 1993; Och and Ney, 2003). |
| W05-0812 366 6:102 In empirical evaluations it has outperformed the other IBM Models and a Hidden Markov Model (HMM) (Och and Ney, 2003). |
| W05-0812 367 12:102 This approach yields good results, but it has been observed that the IBM Model 4 performance is only slightly better than that of the underlying HMM Model used in this bootstrapping process (Och and Ney, 2003). |
| D07-1079 368 9:289 Approaches include word substitution systems (Brown et al. , 1993), phrase substitution systems (Koehn et al. , 2003; Och and Ney, 2004), and synchronous context-free grammar systems (Wu and Wong, 1998; Chiang, 2005), all of which train on string pairs and seek to establish connections between source and target strings. |
| D07-1079 369 83:289 A superset of the parallel data was word aligned by GIZA union (Och and Ney, 2003) and EMD (Fraser and Marcu, 2006). |
| W06-3107 370 86:160 This model was trained using the GIZA++ toolkit (Och and Ney, 2003) on the material available for the different alignment tasks described in section 5.1. 4.3 Search In this section, some specific details about the search are given. |
| P06-1119 371 204:233 Data sparsity is also an issue for more general state-of-the-art bilingual alignment approaches (Brown, et al. , 2000; Och & Ney, 2003; Wantanabe & Sumita, 2003). |
| W05-0806 372 82:124 Word alignments are produced using GIZA++ toolkit without symmetrisation (Och and Ney, 2003). |
| W05-0806 373 10:124 For detailed descriptions of SMT models see for example (Brown et al. , 1993; Och and Ney, 2003). |
| W08-0325 374 63:103 [Footnote 3: The dictionary was created by merging the translation dictionary from PCEDT (Curn and others, 2004) and a translation dictionary extracted from a part of the parallel corpus Czeng (Bojar and Zabokrtsky, 2006) aligned at word-level by Giza++ (Och and Ney, 2003).] |
| P04-1066 375 165:168 However, it is not clear that AER as defined by Och and Ney (2003) is always the appropriate way to evaluate the quality of the model, since the Viterbi word alignment that AER is based on is seldom used in applications of Model 1. |
| P04-1066 376 127:168 8 Results We report the performance of our different versions of Model 1 in terms of precision, recall, and alignment error rate (AER) as defined by Och and Ney (2003). |
| P04-1066 377 160:168 It is interesting to contrast our heuristic model with the heuristic models used by Och and Ney (2003) as baselines in their comparative study of alignment models. |
| P04-1066 378 15:168 (1993a) stopped after only one iteration of EM in using Model 1 to initialize their Model 2, and Och and Ney (2003) stop after five iterations in using Model 1 to initialize the HMM word-alignment model. |
| P04-1066 379 112:168 The trial and test data had been manually aligned at the word level, noting particular pairs of words either as sure or possible alignments, as described by Och and Ney (2003). |
| N06-1015 380 143:205 The validation and test sentences have been hand-aligned (see Och and Ney (2003)) and are marked with both sure and possible alignments. |
| N06-1015 381 149:205 We also trained on only the first 100 to make our results more comparable with the experiments of Och and Ney (2003), in which IBM model 4 was tuned using 100 sentences. |
| N06-1015 382 13:205 The standard approach to word alignment is to construct directional generative models (Brown et al. , 1990; Och and Ney, 2003), which produce a sentence in one language given the sentence in another language. |
| N06-1015 383 94:205 Generative alignment models like the HMM model (Vogel et al. , 1996) and IBM models 4 and above (Brown et al. , 1990; Och and Ney, 2003) directly model correlations between alignments of consecutive words (at least on one side). |
| I08-1030 384 9:242 In the training phase, bilingual parallel sentences are preprocessed and aligned using alignment algorithms or tools such as GIZA++ (Och and Ney, 2003). |
| P06-2037 385 60:205 We used the GIZA++ SMT Toolkit (Och and Ney, 2003) to generate word alignments. We applied the phrase-extract algorithm, as described by Och (2002), on the Viterbi alignments output by GIZA++. |
| P08-1010 386 147:204 Two different word alignment models are trained as the baseline, one is symmetric HMM word alignment model, the other is IBM Model-4 as implemented in the GIZA++ toolkit (Och and Ney, 2003). |
| P08-1010 387 139:204 By combining word alignments in two directions using heuristics (Och and Ney, 2003), a single set of static word alignments is then formed. |
| P08-1010 388 12:204 The most widely used approach derives phrase pairs from word alignment matrix (Och and Ney, 2003; Koehn et al., 2003). |
| N06-4004 389 21:70 There are systems available for these purposes, notably the GIZA++ (Och and Ney, 2003) toolkit and [Figure 1: Chinese/English Parallel Corpus Aligned at the Sentence, Word, and Phrase Levels: horizontal lines denote the segmentations of a sentence alignment and arrows denote a word-level mapping.] |
| N06-4004 390 8:70 At the finest level, this involves the alignment of words and phrases within two sentences that are known to be translations (Brown et al., 1993; Och and Ney, 2003; Vogel et al., 1996; Deng and Byrne, 2005). |
| N06-4004 391 38:70 MTTK provides implementations of various alignment models, including IBM Model-1, Model-2 (Brown et al., 1993), HMM-based word-to-word alignment model (Vogel et al., 1996; Och and Ney, 2003) and HMM-based word-to-phrase alignment model (Deng and Byrne, 2005). |
| J04-4002 392 78:482 (1993) and Och and Ney (2003). |
| J04-4002 393 84:482 The alignment $\hat{a}_1^J$ that has the highest probability (under a certain model) is also called the Viterbi alignment (of that model): $\hat{a}_1^J = \arg\max_{a_1^J} p(f_1^J, a_1^J \mid e_1^I)$ (8) A detailed comparison of the quality of these Viterbi alignments for various statistical alignment models compared to human-made word alignments can be found in Och and Ney (2003). |
| W07-0406 394 6:175 1 Motivation Statistical machine translation has, for a while now, been dominated by the phrase-based translation paradigm (Och and Ney, 2003). |
| D07-1089 395 42:343 MCA is also related to the problem of word alignment in statistical machine translation (SMT) (Och and Ney, 2003). |
| D07-1089 396 178:343 The Viterbi many-to-many pairwise alignments are then generated by combining equivalent pairs of many-to-one alignments using three different standard symmetrization methods for word alignment: union, intersection, and the refined method of Och & Ney (2003). |
| D07-1089 397 56:343 Och & Ney (2003) present an overview and comparison of the most common models used for SMT word alignments. |
| D08-1011 398 79:215 Following Och and Ney (2003), we use a fixed value $p_0$ for the probability of jumping to a null state, which can be optimized on held-out data, and the overall distortion model becomes $\tilde{p}(i \mid i', I) = p_0$ if $i$ is a null state, and $(1 - p_0) \cdot p(i \mid i', I)$ otherwise. 3.4 Alignment normalization Given an HMM, the Viterbi alignment algorithm can be applied to find the best alignment between the backbone and the hypothesis, $\hat{a}_1^J = \arg\max_{a_1^J} \prod_{j=1}^{J} [p(a_j \mid a_{j-1}, I)\, p(e'_j \mid e_{a_j})]$ (7) However, the alignment produced by the algorithm cannot be used directly to build a confusion network. |
| D08-1011 399 40:215 One possible statistical model for word alignment is the HMM, which has been widely used for bilingual word alignment (Vogel et al., 1996, Och and Ney, 2003). |
| D08-1011 400 45:215 Treating the alignment as hidden variable, the conditional probability that the hypothesis is generated by the backbone is given by $p({e'}_1^J \mid e_1^I) = \sum_{a_1^J} \prod_{j=1}^{J} [p(a_j \mid a_{j-1}, I)\, p(e'_j \mid e_{a_j})]$ (1) As in HMM-based bilingual word alignment (Och and Ney, 2003), we also associate a null with each backbone word to allow generating hypothesis words that do not align to any backbone word. |
| W06-3111 401 36:194 [Figure 1: Two Types of Alignment (monotone and nonmonotone; axes: source positions, target positions).] The IBM model 1 (IBM-1) (Brown et al., 1993) assumes that all alignments have the same probability by using a uniform distribution: $p(f_1^J \mid e_1^I) = \frac{1}{I^J} \prod_{j=1}^{J} \sum_{i=1}^{I} p(f_j \mid e_i)$ (2) We use the IBM-1 to train the lexicon parameters $p(f \mid e)$; the training software is GIZA++ (Och and Ney, 2003). |
| N06-1057 402 71:205 We ran the alignment algorithm from (Och and Ney, 2003) on a Chinese-English parallel corpus of 218 million English words. |
| P06-2014 403 199:233 The second baseline, D-ITG is an ITG aligner with hard cohesion constraints, but which uses the weights [Footnote 3: Though it is arguably lacking one of its strongest features: the output of GIZA++ (Och and Ney, 2003).] [Table 2: The performance of SVM-trained aligners with various degrees of cohesion constraint.] |
| P06-2014 404 182:233 For evaluation we compare to the remaining 347 gold standard pairs using the alignment evaluation metrics: precision, recall and alignment error rate or AER (Och and Ney, 2003). |
| P06-2014 405 171:233 The gold standard is divided into sure and possible link sets S and P (Och and Ney, 2003). |
| P06-2014 406 8:233 The dominant IBM alignment models (Och and Ney, 2003) use minimal linguistic intuitions: sentences are treated as flat strings. |
| E06-1005 407 52:186 The model parameters are trained iteratively in an unsupervised manner with the EM algorithm using the GIZA++ toolkit (Och and Ney, 2003). |
| I05-2012 408 124:156 We evaluate results with the alignment error rate (AER) of Och and Ney (Och et al., 2003), which measures agreement at the level of pairs of term constituents. |
| J05-4003 409 146:416 Using this alignment strategy, we follow (Och and Ney 2003) and compute one alignment for each translation direction (f → e and e → f), and then combine them. |
| W07-0722 410 82:82 A possible solution is the implementation of interpolation techniques to smooth sharp distributions estimated on few events (Och and Ney, 2003; Zhao and Xing, 2006). |
| W07-0722 411 20:82 In this work, we present a mixture extension of the well-known HMM alignment model first proposed in (Vogel and others, 1996) and refined in (Och and Ney, 2003). |
| W07-0722 412 68:82 Regarding the components of the translation system, 5-gram language models were trained on the monolingual version of the corpora for English(En) and Spanish(Es), while phrase-based models with lexicalized reordering model were trained using the Moses toolkit (P. Koehn and others, 2007), but replacing the Viterbi alignments, usually provided by GIZA++ (Och and Ney, 2003), by those of the HMM mixture model with training scheme mix15H5. |
| W07-0722 413 28:82 j > 1 (3) $p(x_j \mid a_1^j, x_1^{j-1}, y) \approx p(x_j \mid y_{a_j})$ (4) Furthermore, the treatment of the NULL word is the same as that presented in (Och and Ney, 2003). |
| D07-1007 414 129:218 The phrase bilexicon is derived from the intersection of bidirectional IBM Model 4 alignments, obtained with GIZA++ (Och and Ney, 2003), augmented to improve recall using the grow-diag-final heuristic. |
| P06-2117 415 10:221 In recent years, many researchers build alignment links with bilingual corpora (Wu, 1997; Och and Ney, 2003; Cherry and Lin, 2003; Wu et al., 2005; Zhang and Gildea, 2005). |
| N07-2007 416 37:120 4 Preprocessing Schemes for Alignment Using a preprocessing scheme for word alignment breaks the process of applying Giza++ (Och and Ney, 2003) on some parallel text into three steps: preprocessing, alignment and remapping. |