Citation Summary
| Citing sentences |
|---|
| H89-2012 1 34:121 We have found that if we first tag every word in the corpus with a part of speech using a method such as Church (1988) or DeRose (1988), and then measure associations between tagged words, we can identify interesting contrasts between verbs associated with a following preposition to/in and verbs associated with a following infinitive marker to/to. |
| P99-1021 2 25:177 However, as Church (1988) rightly pointed out "Proper nouns and capitalized words are particularly problematic: some capitalized words are proper nouns and some are not. |
| P97-1029 3 7:190 There has been a large number of studies in tagging and morphological disambiguation using various techniques such as statistical techniques, e.g., (Church, 1988; Cutting et al., 1992; DeRose, 1988), constraint-based techniques (Karlsson et al., 1995; Voutilainen, 1995b; Voutilainen, Heikkilä, and Anttila, 1992; Voutilainen and Tapanainen, 1993; Oflazer and Kuruöz, 1994; Oflazer and Tür, 1996) and transformation-based techniques (Brill, 1992; Brill, 1994; Brill, 1995). |
| C98-2203 4 36:111 However, the observation that our constraints are localized to a window of a small number of tokens (say at most 5 tokens in a sequence), suggests a more efficient scheme originally used by Church (1988). |
| C98-2203 5 10:111 Tagging systems have used either a statistical approach where a large corpora is employed to train a probabilistic model which then is used to tag unseen text, (e.g., Church (1988), Cutting et al. |
| W99-0621 6 98:205 This approach has been studied in (Church, 1988; Argamon et al., 1998). |
| W99-0621 7 25:205 These problem formulations are similar to those studied in (Ramshaw and Marcus, 1995) and (Church, 1988; Argamon et al., 1998), respectively. |
| W99-0621 8 9:205 The observation that shallow syntactic information can be extracted using local information by examining the pattern itself, its nearby context and the local part-of-speech information has motivated the use of learning methods to recognize these patterns (Church, 1988; Ramshaw and Marcus, 1995; Argamon et al., 1998; Cardie and Pierce, 1998). |
| W02-0102 9 43:264 Their application to part-of-speech tagging (Church, 1988; DeRose, 1988) kicked off the era of statistical NLP, and they have found additional NLP applications to phrase chunking, text segmentation, word-sense disambiguation, and information extraction. |
| H93-1046 10 65:104 I used Ken Church's tagger (Church 1988) to assign part-of-speech probabilities to words. |
| P93-1024 11 27:150 More recently, we have constructed similar tables with the help of a statistical part-of-speech tagger (Church, 1988) and of tools for regular expression pattern matching on tagged corpora (Yarowsky, 1992). |
| H91-1037 12 51:155 In a related test, we explored the bracketings produced by Church's PARTS program (Church, 1988). |
| W94-0111 13 35:188 Brill's results demonstrate that this approach can outperform the Hidden Markov Model approaches that are frequently used for part-of-speech tagging (Jelinek, 1985; Church, 1988; DeRose, 1988; Cutting et al., 1992; Weischedel et al., 1993), as well as showing promise for other applications. |
| A94-1013 14 68:173 Our implementation uses a slightly modified version of the tokenizer from the PARTS part-of-speech tagger (Church, 1988) for this task. |
| A94-1013 15 9:173 1 Introduction Labeling of sentence boundaries is a necessary prerequisite for many natural language processing (NLP) tasks, including part-of-speech tagging (Church, 1988), (Cutting et al., 1991), and sentence alignment (Gale and Church, 1993), (Kay and Röscheisen, 1993). |
| A94-1006 16 60:178 The multi-word terms match a small set of syntactic patterns defined by regular expressions and are found by searching a version of the document tagged with parts of speech (Church, 1988). |
| A94-1006 17 52:178 Termight uses a part of speech tagger (Church, 1988) to identify a list of candidate terms which is then filtered by a manual pass. |
| P96-1030 18 19:171 (DeRose, 1988; Cutting et al. , 1992; Church, 1988). |
| P98-2208 19 11:111 Tagging systems have used either a statistical approach where a large corpora is employed to train a probabilistic model which then is used to tag unseen text, (e.g., Church (1988), Cutting et al. |
| P98-2208 20 36:111 However, the observation that our constraints are localized to a window of a small number of tokens (say at most 5 tokens in a sequence), suggests a more efficient scheme originally used by Church (1988). |
| J99-4003 21 227:812 These are the same distributions that are needed by previous POS-based language models (Equation 5) and POS taggers (Church 1988; Charniak et al. 1993). |
| P98-1034 22 13:227 Church's PARTS program (1988), on the other hand, uses a probabilistic model automatically trained on the Brown corpus to locate core noun phrases as well as to assign parts of speech. |
| J93-2004 23 135:383 During the early stages of the Penn Treebank project, the initial automatic POS assignment was provided by PARTS (Church 1988), a stochastic algorithm developed at AT&T Bell Labs. |
| W96-0305 24 69:175 For simplicity, we adapted the method proposed by Church (1988) to tag the definition sentence. |
| A00-1042 25 155:178 Church, Kenneth Ward (1988) "A stochastic parts program and noun phrase parser for unrestricted text", in Proceedings of the Second Conference on Applied Natural Language Processing, pp. |
| P97-1030 26 13:188 1 Introduction The last few years have seen the great success of stochastic part-of-speech (POS) taggers (Church, 1988; Kupiec, 1992; Charniak et al., 1993; Brill, 1992; Nagata, 1994). |
| P96-1008 27 6:165 Introduction A recent trend in natural language processing has been toward a greater emphasis on statistical approaches, beginning with the success of statistical part-of-speech tagging programs (Church 1988), and continuing with other work using statistical part-of-speech tagging programs, such as BBN PLUM (Weischedel et al. 1993) and NYU Proteus (Grishman and Sterling 1993). |
| C96-2114 28 39:184 Kallgren (1996) gives a more covering description of how XPOST is used on the Swedish material and also sketches the major differences between this algorithm and some others used for tagging, such as PARTS (Church 1988) and VOLSUNGA (DeRose 1988). |
| J02-3002 29 481:581 As Church (1988) rightly pointed out, however, Proper nouns and capitalized words are particularly problematic: some capitalized words are proper nouns and some are not. |
| C94-1025 30 104:199 4 TAGGING ALGORITHM Starting point for the implementation of a feature structure tagger was a second-order-HMM tagger (trigrams) based on a modified version of the Viterbi algorithm (Viterbi, 1967; Church, 1988) which we had earlier implemented in C (Kempe, 1994). |
| C98-1060 31 82:157 4 Morphological Disambiguation There are two kinds of methods for morphological disambiguation: on one hand, statistical methods need little effort and obtain very good results (Church, 1988; Cutting et al., 1992), at least when applied to English, but when we try to apply them to Basque we encounter additional problems; on the other hand, some rule-based systems (Brill, 1992; Voutilainen et al., 1992) are at least as good as statistical systems and are better adapted to free-order languages and agglutinative languages. |
| C94-1027 32 19:202 They report rates of correctly tagged words which are comparable to that presented by Church (1988) and Kempe (1993). |
| P97-1008 33 97:142 3.2 Data We used a statistical part-of-speech tagger (Church, 1988) and pattern matching and concordancing tools (due to David Yarowsky) to identify transitive main verbs and head nouns of the corresponding direct objects in 44 million words of 1988 Associated Press newswire. |
| J93-1007 34 229:754 We preprocessed the corpus with a stochastic part-of-speech tagger developed at Bell Laboratories by Ken Church (Church 1988). |
| J93-1007 35 157:754 Speech recognition (Bahl, Jelinek, and Mercer 1983) and text compression (e.g. , Bell, Witten, and Cleary 1989; Guazzo 1980) have been of long-standing interest, and some new applications are currently being investigated, such as machine translation (Brown et al. 1988), spelling correction (Mays, Damerau, and Mercer 1990; Church and Gale 1990), parsing (Debili 1982; Hindle and Rooth 1990). |
| J93-1007 36 224:754 Stochastic part-of-speech taggers such as those in Church (1988) and 6 This fact is being seriously challenged by current research (e.g. , Abney 1990; Hindle 1983), and might not be true in the near future. |
| H90-1055 37 1:182 Deducing Linguistic Structure from the Statistics of Large Corpora. Eric Brill, David Magerman, Mitchell Marcus, Beatrice Santorini. Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104. 1 Introduction Within the last two years, approaches using both stochastic and symbolic techniques have proved adequate to deduce lexical ambiguity resolution rules with less than 3-4% error rate, when trained on moderate sized (500K word) corpora of English text (e.g. Church, 1988; Hindle, 1989). |
| H90-1055 38 163:182 The material is first processed using Ken Church's tagger (Church 1988), which labels it as if it were Brown Corpus material, and then is mapped to our tagset by a SED script. |
| H90-1055 39 37:182 In the last decade, research in speech recognition (Jelinek 1985), noun classification (Hindle 1988), predicate argument relations (Church & Hanks 1989), and other areas have shown that mutual information statistics provide a wealth of information for solving these problems. |
| E99-1018 40 9:133 On the other hand, according to the data-driven approach, a frequency-based language model is acquired from corpora and has the forms of n-grams (Church, 1988; Cutting et al., 1992), rules (Hindle, 1989; Brill, 1995), decision trees (Cardie, 1994; Daelemans et al., 1996) or neural networks (Schmid, 1994). |
| C98-2118 41 68:121 This is described in more detail in the original publication (Church, 1988). |
| C98-2118 42 62:121 Although we considered a number of algorithms, we decided to use the trigram algorithm described by Church (1988) for tagging. |
| J99-4005 43 19:174 Part-of-Speech Tagging The prototype source-channel application in natural language is part-of-speech tagging (Church 1988). |
| P89-1015 44 23:224 The most successful and comprehensive of these are based on probabilistic modeling of category sequence and word category (Church 1987; Garside, Leech and Sampson 1987; DeRose 1988). |
| W97-0902 45 49:125 3 The preprocessing stage The noun phrase parser identifies simple non-recursive noun phrases such as Det+Adj+N or N+N. The method used for this process involves an algorithm of the type described in Church (1988) which was trained on a manually marked part of our corpus. |
| J93-1001 46 132:408 As a result, the empirical approach has been adopted by almost all contemporary part-of-speech programs: Bahl and Mercer (1976), Leech, Garside, and Atwell (1983), Jelinek (1985), Deroualt and Merialdo (1986), Garside, Leech, and Sampson (1987), Church (1988), DeRose (1988), Hindle (1989), Kupiec (1989, 1992), Ayuso et al. |
| H91-1026 47 6:200 There has been quite a bit of recent work on sentence alignment, e.g., (Brown, Lai and Mercer, 1990), (Kay and Röscheisen, 1988), (Catizone, Russell, and Warwick, to appear); we use a method described in (Gale and Church, 1991) which makes use of the fact that the length of a text (in characters) is highly correlated (0.991) with the length of its translation. |
| J90-3003 48 621:623 Hindle (1983); lexical lookup and category disambiguation are done by the stochastic parser described in Church (1988). |
| P00-1015 49 11:153 Figure 1: An example sentence with baseNP brackets A number of researchers have dealt with the problem of baseNP identification (Church 1988; Bourigault 1992; Voutilainen 1993; Justeson & Katz 1995). |
| P03-1065 50 90:210 The shallow parser constructs Verb Groups (VGs) and basic Noun Phrases (NPs), also called BaseNPs [Church 1988]. |
| W96-0205 51 88:300 Generalized Forward Backward Reestimation Generalization of the Forward and Viterbi Algorithm In English part of speech taggers, the maximization of Equation (1) to get the most likely tag sequence, is accomplished by the Viterbi algorithm (Church, 1988), and the maximum likelihood estimates of the parameters of Equation (2) are obtained from untagged corpus by the Forward-Backward algorithm (Cutting et al., 1992). |
| P07-2053 52 65:70 As we said at the outset, we don't necessarily believe HunPos to be in any way better than TnT, and certainly the main ideas have been pioneered by DeRose (1988), Church (1988), and others long before this generation of HMM work. |
| W98-0702 53 173:226 As shown in Table 7, an event recall of 67.7% was achieved by the classification rule, as compared to speech tagging (Church, 1988; Allen, 1995). |
| E91-1025 54 47:100 (Garside 1987, Marshall 1987, DeRose 1988, Church 1988, Ejerhed 1987, O'Shaughnessy 1989). |
| W98-1207 55 76:234 (Church, 1988) used a simple mechanism to mark the boundaries of NPs. |
| P99-1009 56 18:119 Much previous work has been done on this problem and many different methods have been used: Church's PARTS (1988) program uses a Markov model; Bourigault (1992) uses heuristics along with a grammar; Voutilainen's NPTool (1993) uses a lexicon combined with a constraint grammar; Justeson and Katz (1995) use repeated phrases; Veenstra (1998), Argamon, Dagan & Krymolowski (1998) and Daelemans, van den Bosch & Zavrel (1999) use memory-based systems; Ramshaw & Marcus (In Press) and Cardie & Pierce (1998) use rule-based systems. |
| W97-1505 57 19:157 POS Tagger Wall Street Journal-trained trigram tagger (Church, 1988) extended to output and Lex Prob N-best POS sequences (Soong and Huang, 1990). |
| P91-1023 58 68:211 We used the same procedure which is used in (Church, 1988). |
| P91-1030 59 30:189 Discovering Lexical Association in Text A 13 million word sample of Associated Press new stories from 1989 were automatically parsed by the Fidditch parser (Hindle 1983), using Church's part of speech analyzer as a preprocessor (Church 1988). |
| J95-2004 60 171:471 We empirically compared our tagger with Eric Brill's implementation of his tagger, and with our implementation of a trigram tagger adapted from the work of Church (1988) that we previously implemented for another purpose. |
| J95-2004 61 15:471 Unlike stochastic approaches to part-of-speech tagging (Church 1988; Kupiec 1992; Cutting et al. 1992; Merialdo 1990; DeRose 1988; Weischedel et al. 1993), up to now the knowledge found in finite-state taggers has been handcrafted and was not automatically acquired. |
| J93-3003 62 431:479 And, for text-to-speech applications, it 7 The part-of-speech tagger employed in this analysis (Church 1988) uses a subset of the part-of-speech tags used in Francis and Kučera (1982). |
| J93-3003 63 351:479 We also examined each item's part of speech, using Church's (1988) part-of-speech tagger. |
| W97-0213 64 50:184 (Bahl and Mercer, 1976; Jelinek, 1985; Church, 1988)), with transformational rule-based methods (Brill, 1993) and grammatico-statistical hybrids (e.g. |
| W97-0110 65 7:227 Recent work involves novel ways to employ annotated corpus in part of speech tagging (Church 1988) (Derose 1988) and the application of mutual information statistics on the corpora to uncover lexical information (Church 1989). |
| C96-1041 66 12:246 There are three main approaches in tagging problem: rule-based approach (Klein and Simmons 1963; Brodda 1982; Paulussen and Martin 1992; Brill et al. 1990), statistical approach (Church 1988; Merialdo 1994; Foster 1991; Weischedel et al. 1993; Kupiec 1992) and connectionist approach (Benello et al. 1989; Nakamura et al. 1989). |
| P96-1041 67 4:170 1 Introduction Smoothing is a technique essential in the construction of n-gram language models, a staple in speech recognition (Bahl, Jelinek, and Mercer, 1983) as well as many other domains (Church, 1988; Brown et al., 1990; Kernighan, Church, and Gale, 1990). |
| P96-1041 68 164:170 In addition, it would be interesting to see whether these results extend to fields other than language modeling where smoothing is used, such as prepositional phrase attachment (Collins and Brooks, 1995), part-of-speech tagging (Church, 1988), and stochastic parsing (Magerman, 1994). |
| P98-2123 69 70:123 This is described in more detail in the original publication (Church, 1988). |
| P98-2123 70 64:123 Although we considered a number of algorithms, we decided to use the trigram algorithm described by Church (1988) for tagging. |
| J92-1001 71 325:480 Although it may not be possible to completely disambiguate all words prior to parsing, approaches based on stochastic information have been quite successful (Church 1988; Garside, Leech, and Sampson 1987; de Marcken 1990). |
| W96-0206 72 22:234 This method is commonly used for part-of-speech tagging (Church, 1988). |
| P94-1032 73 239:252 In contrast, Church (1988) tests a text and extracts the simple noun phrases only. |
| P94-1032 74 34:252 Previous Works Church (1988) proposes a part of speech tagger and a simple noun phrase extractor. |
| W99-0608 75 54:185 2.2 STT: A Statistical Tree-based Tagger The aim of statistical or probabilistic tagging (Church, 1988; Cutting et al., 1992) is to assign the most likely sequence of tags given the observed sequence of words. |
| C00-2105 76 12:248 Church's noun phrase tagger (Church, 1988), one of the first noun chunkers, was based on a Hidden Markov Model (HMM) similar to those used |
| J99-2004 77 90:445 Supertags Part-of-speech disambiguation techniques (POS taggers) (Church 1988; Weischedel et al. 1993; Brill 1993) are often used prior to parsing to eliminate (or substantially reduce) the part-of-speech ambiguity. |
| J99-2004 78 207:445 The words are first assigned standard parts of speech using a conventional tagger (Church 1988) and then are assigned supertags according to the unigram model. |
| A94-1024 79 38:168 A completely different approach to tagging uses statistical methods, (e.g., (Church, 1988; Cutting et al., 1993)). |
| P97-1032 80 10:153 1987: Church 1988), Hidden Markov models (cf. |
| P97-1032 81 89:153 (Church 1988), (DeRose 1988) and numerous other articles. |
| W00-0721 82 10:97 Our goal is to come up with a mechanism that, given an input string, identifies the phrases in this string, this is a fundamental task with applications in natural language (Church, 1988; Ramshaw and Marcus, 1995; Muñoz et al., 1999; Cardie and Pierce, 1998). |
| J01-4004 83 32:460 Our part-of-speech tagger is a standard statistical tagger based on the Hidden Markov Model (HMM) (Church 1988). |
| W95-0101 84 6:188 Almost all of the work in the area of automatically trained taggers has explored Markov-model based part of speech tagging [Jelinek, 1985; Church, 1988; Derose, 1988; DeMarcken, 1990; Cutting et al., 1992; Kupiec, 1992; Charniak et al., 1993; Weischedel et al., 1993; Schutze and Singer, 1994; Lin et al., 1994; Elworthy, 1994; Merialdo, 1995]. |
| C92-1033 85 60:255 Perhaps more importantly, elimination of word-level lexical ambiguity allows the parser to make projection about the input which is yet to be parsed, using a simple lookahead; in particular, phrase boundaries can be determined with a degree of confidence (Church, 1988). |
| W00-1211 86 8:64 In English BNP (base noun phrase) is defined as simple and non-nesting noun phrases, i.e. noun phrases that do not contain other noun phrase descendants (Church, 1988). |
| C00-1046 87 7:140 Major works done to create English POS taggers (henceforth, "taggers"), for example, include (Church 1988), (Kupiec 1992), (Brill 1992) and (Voutilainen et al. 1992). |
| A94-1011 88 93:156 Typical examples of linguistically sophisticated annotation include tagging words with their syntactic category (although this has not been found to be effective for IR), lemma of the word (e.g. "corpus" for "corpora"), phrasal information (e.g. identifying noun groups and phrases (Lewis 1992c, Church 1988)), and subject-predicate identification (e.g. Hindle 1990). |
| A94-1011 89 95:156 This is achieved in a manner similar to Church's (1988) PARTS algorithm used by Lewis (1992bc), in the sense that its main properties are robustness and corpus sensitivity. |
| W96-0209 90 33:170 PART-OF-SPEECH TAG SEQUENCE GRAMMAR We utilised the ANLT metagrammatical formalism to develop a feature-based, declarative description of part-of-speech (PoS) label sequences (see e.g. Church, 1988) for English. |
| N03-1035 91 163:205 Simplex or complex NPs (e.g., Church 1988; Hindle and Rooth 1991; Wacholder 1998) identify simplex or base NPs - NPs which do not have any component NPs - at least in part because this bypasses the need to solve the quite difficult attachment problem, i.e., to determine which simpler NPs should be combined to output a more complex NP. |
| W93-0111 92 13:372 Excellent methods have been developed for part-of-speech (POS) tagging using stochastic models trained on partially tagged corpora (Church, 1988; Cutting, Kupiec, Pedersen & Sibun, 1992). |
| J00-4004 93 349:655 6 Similar baselines for comparison have been used for many classification problems (Duda and Hart 1973), e.g., part-of-speech tagging (Church 1988; Allen 1995). |
| J97-3003 94 263:319 When we removed from the lexicon all the hapax words and, following the recommendation of Church (1988), all the capitalized words with frequency less than 20, we obtained some 51,522 unknown word-tokens (25,359 word-types) out of more than a million word-tokens in the Brown Corpus. |
| J97-3003 95 17:319 As argued in Church (1988), who proposes a more elaborated heuristic, Dermatas and Kokkinakis (1995) proposed a simple probabilistic approach to unknown-word guessing: |
| P91-1027 96 114:189 (Church, 1988) to distinguish between to as an infinitive marker and to as a preposition. |
| J01-2002 97 311:621 The HMM approach to tagging is by far the most studied and applied (Church 1988; DeRose 1988; Charniak 1993). |
| N06-1042 98 55:255 In the statistical approach a hand tagged corpus is used to train a probabilistic model which is then used to select the best tags in unseen text (Church, 1988; Hakkani-Tür et al., 2002). |
| P97-1023 99 36:180 3 Data Collection For our experiments, we use the 21 million word 1987 Wall Street Journal corpus 4, automatically annotated with part-of-speech tags using the PARTS tagger (Church, 1988). |
| J98-3005 100 460:614 After tagging the corpus using the POS part-of-speech tagger (Church 1988), we used a CREP (Duford 1993) regular grammar to first extract all possible candidates for entities. |
| C00-1044 101 55:188 We use a shallow parser to retrieve from a large corpus tagged for part-of-speech with Church's PARTS tagger (Church, 1988) all adjectives and their modifiers. |
| J93-2002 102 334:359 Finally, the lexical ambiguity problem could probably be reduced substantially in the applied context by using a statistical tagging program (Brill 1992; Church 1988). |
| P92-1032 103 19:229 Much of this work offers the prospect that a disambiguation system might be able to input unrestricted text and tag each word with the most likely sense with fairly reasonable accuracy and efficiency, just as part of speech taggers (e.g., Church (1988)) can now input unrestricted text and assign each word with the most likely part of speech with fairly reasonable accuracy and efficiency. |
| J95-3004 104 97:411 Methods using the short context of a word in order to resolve ambiguity (usually categorical ambiguity) are very common in English and other languages (DeRose 1988; Church 1988; Karlsson 1990). |
| J95-3004 105 91:411 A much simpler problem occurs in English, where for some words the correct syntactic tag is necessary for pronunciation (Church 1988). |
| P96-1010 106 66:192 1 We calculate $P(W)$ using the tag sequence of $W$ as an intermediate quantity, and summing, over all possible tag sequences, the probability of the sentence with that tagging; that is: $P(W) = \sum_{T} P(W, T)$, where $T$ is a tag sequence for sentence $W$. The above probabilities are estimated as is traditionally done in trigram-based part-of-speech tagging (Church, 1988; DeRose, 1988): $P(W, T) = P(W \mid T)\,P(T)$ (1) $= \prod_{i} P(w_i \mid t_i) \prod_{i} P(t_i \mid t_{i-2}\, t_{i-1})$ (2), where $T = t_1 \ldots t_n$, and $P(t_i \mid t_{i-2}\, t_{i-1})$ is the probability of seeing a part-of-speech tag $t_i$ given the two preceding part-of-speech tags $t_{i-2}$ and $t_{i-1}$. |
| J96-2001 107 32:177 In working taggers, a common approach is simply to apply a uniform small probability to the various senses of unseen or low-frequency forms: this was done in the tagger discussed in Church (1988), for example. |
| J96-2001 108 31:177 So, in the UdB corpus, lopen occurs 92 times as an infinitive and 43 times as a finite plural, so the MLE 1 Even models of disambiguation that make use of context, such as statistical n-gram taggers, often presume some estimate of lexical priors, in addition to requiring estimates of the transition probabilities of sequences of lexical tags (Church 1988; DeRose 1988; Kupiec 1992), and this again brings up the question of what to do about unseen or low-frequency forms. |
| C90-3010 109 13:125 Stochastic approaches to linguistic analyses have been strongly reevaluated in the past few years, either for syntactic analysis (Garside et al. 1987, Church 1988), or for NLP applications (Brown et al. 1988), or for semantic analysis (Zernik 1989, Smadja 1989). |
| E95-1022 110 30:171 ena or the linguist's abstraction capabilities (e.g. knowledge about what is relevant in the context), they tend to reach a 95-97% accuracy in the analysis of several languages, in particular English (Marshall 1983; Black et al. 1992; Church 1988; Cutting et al. 1992; de Marcken 1990; DeRose 1988; Hindle 1989; Merialdo 1994; Weischedel et al. 1993; Brill 1992; Samuelsson 1994; Eineborg and Gambäck 1994, etc.). |
| P91-1037 111 75:158 We test these possibilities by examining part-of-speech in a window of four words surrounding each potential phrase break, using Church's part-of-speech tagger (1988). |
| P91-1037 112 154:158 In future, we plan to extend the set of variables for analysis to include counts of stressed syllables, automatic NP-detection (Church, 1988), MUTUAL INFORMATION, GENERALIZED MUTUAL INFORMATION scores can serve as indicators of intonational phrase boundaries (Magerman and Marcus, 1990). |
| C00-2089 113 57:178 The first probabilistic approach was proposed in (Church, 1988). |
| C00-2089 114 124:178 , 1993), considering only the NP chunks defined by (Church, 1988) and using the models that we have presented above. |
| C00-2089 115 22:178 In this case, the language model can be estimated from a labelled corpus (supervised methods) (Church, 1988) (Weischedel et al., 1993) or from a nonlabelled corpus (unsupervised methods) (Cutting et al. |
| C98-1034 116 13:225 Church's PARTS program (1988), on the other hand, uses a probabilistic model automatically trained on the Brown corpus to locate core noun phrases as well as to assign parts of speech. |
| J93-1005 117 40:297 VING part-of-speech analyzer as a preprocessor (Church 1988), a combination that we will call simply "the parser". |
| W96-0207 118 13:304 Part-of-speech tagging systems have used either a statistical approach where a large corpora has been used to train a probabilistic model which then has been used to tag new text, assigning the most likely tag for a given word in a given context (e.g., Church (1988), Cutting et al. |
| A00-1026 119 35:141 Several approaches provide similar output based on statistics (Church 1988, Zhai 1997, for example), a finite-state machine (Ait-Mokhtar and Chanod 1997), or a hybrid approach combining statistics and linguistic rules (Voutilainen and Padro 1997). |
| C90-3030 120 208:219 The most successful achievements so far in the domain of large-scale morphological disambiguation of running text have been those for English reported by Garside, Leech, and Sampson (1987), on tagging the LOB corpus, and Church (1988), on assigning part-of-speech labels and parsing noun phrases. |
| P96-1042 121 49:240 2 This version of the method uses Bayes' theorem (Church, 1988). |
| P96-1042 122 23:240 In statistical NLP, probabilistic classifiers are often used to select a preferred analysis of the linguistic structure of a text (for example, its syntactic structure (Black et al. , 1993), word categories (Church, 1988), or word senses (Gale, Church, and Yarowsky, 1993)). |
| W00-0737 123 6:107 1 Introduction The idea of using statistics for chunking goes back to Church (1988), who used corpus frequencies to determine the boundaries of simple nonrecursive noun phrases. |
| W00-1320 124 23:288 2.2 Motivation from previous work 2.2.1 Parsing In recent years, the success of statistical parsing techniques can be attributed to several factors, such as the increasing size of computing machinery to accommodate larger models, the availability of resources such as the Penn Treebank (Marcus et al., 1993) and the success of machine learning techniques for lower-level NLP problems, such as part-of-speech tagging (Church, 1988; Brill, 1995), and PP-attachment (Brill and Resnik, 1994; Collins and Brooks, 1995). |
| P95-1039 125 7:109 1 Motivation Statistical part-of-speech disambiguation can be efficiently done with n-gram models (Church, 1988; Cutting et al. , 1992). |
| W01-0712 126 28:210 The first work on this topic was done back in the eighties (Church, 1988). |
| H91-1065 127 54:189 We replicated the result (Church 1988) that this process is able to predict the parts of speech with only a 3-4% error rate when the possible parts of speech of each of the words in the corpus are known. |
| H91-1065 128 10:189 The effectiveness of such models is well known (Church 1988) and they are currently in use in parsers (e.g. de Marcken 1990). |
| W97-0314 129 65:235 In (Church, 1988) a well-known purely statistical method for POS tagging is applied to the derivation of simple noun phrases that are relevant in the underlying corpus. |
| C90-3086 130 59:85 This will partly be based on another important step in the process, namely the construction of constituents, in particular noun phrases and prepositional phrases (Church 1988, Källgren 1984c), and partly on a more general algorithm that for pairs or longer sequences of tags calculates the relative probability of alternative tag assignments. |
| W01-0706 131 9:129 Thus, over the past few years, along with advances in the use of learning and statistical methods for acquisition of full parsers (Collins, 1997; Charniak, 1997a; Charniak, 1997b; Ratnaparkhi, 1997), significant progress has been made on the use of statistical learning methods to recognize shallow parsing patterns - syntactic phrases or words that participate in a syntactic relationship (Church, 1988; Ramshaw and Marcus, 1995; Argamon et al., 1998; Cardie and Pierce, 1998; Munoz et al., 1999; Punyakanok and Roth, 2001; Buchholz et al., 1999; Tjong Kim Sang and Buchholz, 2000). |
| J00-3003 132 213:607 When applied to a discourse model with locally decomposable likelihoods and Markovian discourse grammar, it will therefore find precisely the DA sequence with the highest posterior probability: $U^* = \arg\max_{U} P(U \mid E)$ (4). The combination of likelihood and prior modeling, HMMs, and Viterbi decoding is fundamentally the same as the standard probabilistic approaches to speech recognition (Bahl, Jelinek, and Mercer 1983) and tagging (Church 1988). |
| C92-2085 133 219:221 (1991) \[4\] Kenneth Ward Church: A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text, 2nd Conference on A.N.L.P. (1988) \[5\] Donald Hindle and Mats Rooth: Structural Ambiguity and Lexical Relations, 29th Conference of the A.C.L. |
| P02-1055 134 126:174 task (Church, 1988; Brill, 1993; Ratnaparkhi, 1996; Daelemans et al. , 1996), and reported errors in the range of 2-6% are common. |
| P89-1010 135 102:176 There are concordancing programs (see Figure 1 at the end of this paper), which are basically KWIC (key word in context \[Aho, Kernighan, and Weinberger (1988), p. 122\]) indexes with additional features such as the ability to extend the context, sort leftwards as well as rightwards, and so on. |
| P89-1010 136 90:176 We have found that if we first tag every word in the corpus with a part of speech using a method such as \[Church (1988)\], and then measure associations between tagged words, we can identify interesting contrasts between verbs associated with a following preposition to/in and verbs associated with a following infinitive marker to/to. |
| J95-4004 137 144:404 Part-of-speech tagging is an active area of research; a great deal of work has been done in this area over the past few years (e.g. , Jelinek 1985; Church 1988; Derose 1988; Hindle 1989; DeMarcken 1990; Merialdo 1994; Brill 1992; Black et al. 1992; Cutting et al. 1992; Kupiec 1992; Charniak et al. 1993; Weischedel et al. 1993; Schutze and Singer 1994). |
| J95-4004 138 154:404 Almost all recent work in developing automatically trained part-of-speech taggers has been on further exploring Markov-model based tagging (Jelinek 1985; Church 1988; Derose 1988; DeMarcken 1990; Merialdo 1994; Cutting et al. 1992; Kupiec 1992; Charniak et al. 1993; Weischedel et al. 1993; Schutze and Singer 1994). |
| J95-4004 139 10:404 A number of part-of-speech taggers are readily available and widely used, all trained and retrainable on text corpora (Church 1988; Cutting et al. 1992; Brill 1992; Weischedel et al. 1993). |
| A97-1033 140 57:187 After tagging the corpus using the POS part-of-speech tagger (Church, 1988), we used a CREP (Duford, 1993) regular grammar to first extract all possible candidates for entities. |
| A97-1029 141 5:168 More recently, the natural language processing community has effectively employed these models for part-of-speech tagging, as in the seminal (Church, 1988) and other, more recent efforts (Weischedel et al. , 1993). |
| J98-1003 142 177:551 For simplicity, we adopted the method proposed by Church (1988) to tag definition sentences. |
| P94-1041 143 172:309 The algorithm runs in lockstep with a part-of-speech tagger (Church, 1988), which is used for deciding possible word replacements. |
| P94-1041 144 199:309 Part-of-speech tagging is the process of assigning to a word the category that is most probable given the sentential context (Church, 1988). |
| W97-0318 145 134:178 Similar baselines for comparison have been used for many classification problems (Duda and Hart, 1973), e.g., part-of-speech tagging (Church, 1988; Allen, 1995). |
| C90-3063 146 44:88 This problem is considered very important when dealing with a corpus: it was the reason for the substantial human intervention in the procedure of \[Grishman et al. 1986\], and it is the reason why other techniques use manually tagged corpora (e.g. \[Church 1988\]). |
| A92-1020 147 19:128 These include the availability of efficient and high precision word segmentation methods for Chinese text \[Chang et al. , 1991; Sproat and Shih, 1990; Wang et al. , 1990\], the availability of statistical analysis of a Chinese corpus \[Liu et al. , 1975\] and large-scale electronic Chinese dictionaries with part-of-speech information \[Chang et al. , 1988; BDC, 1992\], the corpus-based statistical part-of-speech tagger \[Church, 1988; DeRose, 1988; Beale, 1988\], as well as phrasal and clausal analyzers \[Church 1988; Ejerhed 1990\] 2. |
| J90-1003 148 100:232 We have found that if we first tag every word in the corpus with a part of speech using a method such as Church (1988), and then measure associations between tagged words, we can identify interesting contrasts between verbs associated with a following preposition to/in and verbs associated with a following infinitive marker to/to. |
| J90-1003 149 124:232 These are concordancing programs (see Figure 1), which are basically KWIC (key word in context; Aho et al. 1988) indexes with additional features such as the ability to extend the context, sort leftward as well as rightward, and so on. |
| W00-1309 150 32:149 1 HMM-based Chunk Tagger The idea of using statistics for chunking goes back to Church (1988), who used corpus frequencies to determine the boundaries of simple non-recursive noun phrases. |
| J97-2002 151 145:474 The Satz tokenizer is implemented using the UNIX tool LEX (Lesk and Schmidt 1975) and is modeled on the tokenizer used by the PARTS part-of-speech tagger (Church 1988). |
| J97-2002 152 266:474 In initial experiments we used the extensive lexicon from the PARTS part-of-speech tagger (Church 1988), which contains 30,000 words. |
| W95-0107 153 40:265 At about the same time, Ejerhed (1988), working with Church, performed comparisons between finite state methods and Church's stochastic models for identifying both non-recursive clauses and non-recursive NPs in English text. |
| W95-0107 154 37:265 Using statistical methods, Church's Parts program (1988), in addition to identifying parts of speech, also inserted brackets identifying core NPs. |
| J94-1002 155 91:484 POS labels were given by Church's tagger (Church 1988) and syntactic constituents by Hindle's parser (Hindle 1987). |
| W96-0303 156 34:172 The frequency with which a given word form is associated with a particular lexical entry (i.e. sense or grammatical realization) is often highly skewed; Church (1988) points out that a model of part-of-speech assignment in context will be 90% accurate (for English) if it simply chooses the lexically most frequent part-of-speech for a given word. |
| I05-2022 157 22:163 The earliest work on chunking based on machine learning goes to (Church K, 1988) for English. |
| P99-1004 158 51:249 That is, the data consisted of the verb-object cooccurrence pairs in the 1988 Associated Press newswire involving the 1000 most frequent nouns, extracted via Church's (1988) and Yarowsky's processing tools. |
| P03-1003 159 20:185 Being inspired by the success of noisy-channel-based approaches in applications as diverse as speech recognition (Jelinek, 1997), part of speech tagging (Church, 1988), machine translation (Brown et al. , 1993), information retrieval (Berger and Lafferty, 1999), and text summarization (Knight and Marcu, 2002), we develop a noisy channel model for QA. |
| J96-1001 160 572:576 For example, previous work has addressed low-level tasks such as tagging a free-style corpus with part-of-speech information (Church 1988), aligning a bilingual corpus (Gale and Church 1991b; Brown, Lai, and Mercer 1991), and producing a list of collocations (Smadja 1993). |
| W94-0107 161 16:116 This is what the parser is expected to do for disambiguating the standard POS, unless a separate POS disambiguation module is used (Church, 1988). |
| W97-0201 162 7:114 The high accuracy achieved by a corpus-based approach to part-of-speech tagging and noun phrase parsing, as demonstrated by (Church, 1988), has inspired similar approaches to other problems in natural language processing, including syntactic parsing and word sense disambiguation (WSD). |
| H90-1069 163 6:150 Research efforts at IBM \[Chodorow, et al. 1988; Neff, et al. 1989\], Bell Labs \[Church, et al. 1989\], New Mexico State University \[Wilks 1987\], and elsewhere have used mechanical processing of on-line dictionaries to infer at least minimal syntactic and semantic information from dictionary definitions. |
| J96-2003 164 39:395 Secondly, we can construct automatic part-of-speech taggers and process untagged corpora (Kupiec 1992; Black, Garside, and Leech 1993); this method boasts a high degree of accuracy, although often the construction of the automatic tagger involves a bootstrapping process based on a core corpus which has been manually tagged (Church 1988). |
| W06-1701 165 78:143 But even if the MA is tight, a considerable proportion of ambiguous tokens will come from legitimate but rare analyses of frequent types (Church, 1988). |
| N03-1033 166 4:202 Regardless of whether one is using HMMs, maximum entropy conditional sequence models, or other techniques like decision trees, most systems work in one direction through the sequence (normally left to right, but occasionally right to left, e.g., Church (1988)). |
| C96-2136 167 54:224 It is used as tagging model in English (Church, 1988; Cutting et al. , 1992) and morphological analysis model (word segmentation and tagging) in Japanese (Nagata, 1994). |
| C96-1058 168 53:239 , we follow standard practice (Church, 1988) in n-gram tagging by using (3) to approximate the first term in (2). |
| C96-1058 169 37:239 2.1 Model A: Bigram lexical affinities N-gram taggers like (Church, 1988; Jelinek 1985; Kupiec 1992; Merialdo 1990) take the following view of how a tagged sentence enters the world. |
| E95-1003 170 34:150 This is quite feasible using statistical taggers like those of Garside (1987), Church (1988) or Foster (1991) which achieve performance upwards of 97% on unrestricted text. |
| W05-0706 171 96:188 (Church, 1988; Charniak et al. , 1993). |
| J95-2001 172 14:410 Stochastic taggers use both contextual and morphological information, and the model parameters are usually defined or updated automatically from tagged texts (Cerf-Danon and El-Beze 1991; Church 1988; Cutting et al. 1992; Dermatas and Kokkinakis 1988, 1990, 1993, 1994; Garside, Leech, and Sampson 1987; Kupiec 1992; Maltese |
| C98-1063 173 64:250 A Church-style tagger (Church, 1988) In a few cases, the loss of prepositions presents a problem. |
| W07-0813 174 41:199 For example, (Church, 1988) uses the simple heuristic of predicting proper nouns from capitalization. |
| W96-0208 175 163:170 Similar comparisons of a range of algorithms should also be performed on other natural language problems such as part-of-speech tagging (Church, 1988), prepositional phrase attachment (Hindle & Rooth, 1993), anaphora resolution (Aone & Bennett, 1995), etc. Since the requirements of individual tasks vary, different algorithms may be suitable for different sub-problems in natural language processing. |
| W99-0634 176 23:181 Our part-of-speech tagger is a standard statistical bigram tagger based on the Hidden Markov Model (HMM) (Church, 1988). |
| H94-1034 177 74:235 Part-of-Speech Tagging Part-of-speech tagging is the process of assigning to a word the category that is most probable given the sentential context (Church, 1988). |
| W00-1201 178 4:100 1 Introduction Ever since the success of HMMs' application to part-of-speech tagging in (Church, 1988), machine learning approaches to natural language processing have steadily become more widespread. |
| H93-1045 179 152:170 Church, K. (1988), "A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text," Proceedings of the Second Conference on Applied Natural Language Processing, ACL, 1988, 136-143. |
| A00-2013 180 5:145 1 Full Morphological Tagging English Part of Speech (POS) tagging has been widely described in the recent past, starting with the (Church, 1988) paper, followed by numerous others using various methods: neural networks (Julian Benello and Anderson, 1989), HMM tagging (Merialdo, 1992), decision trees (Schmid, 1994), transformation-based error-driven learning (Brill, 1995), and maximum entropy (Ratnaparkhi, 1996), to select just a few. |
| H90-1052 181 34:195 Discovering Lexical Association in Text A 13 million word sample of Associated Press new stories from 1989 were automatically parsed by the Fidditch parser (Hindle 1983), using Church's part of speech analyzer as a preprocessor (Church 1988). |
| P97-1059 182 26:219 The training of the HMM can be done on either a tagged or untagged corpus, and is not a topic of this paper since it is exhaustively described in the literature (Bahl and Mercer, 1976; Church, 1988). |
| C98-1077 183 60:183 3 The Model Instead of employing the source-channel paradigm for tagging (more or less explicitly present e.g. in (Merialdo, 1992), (Church, 1988), (Hajič, Hladká, 1997)) used in the past (notwithstanding some exceptions, such as Maximum Entropy and rule-based taggers), we are using here a "direct" approach to modeling, for which we have chosen an exponential probabilistic model. |
| W00-0726 184 132:150 By then, Church (1988) had already reported on recognition of base noun phrases with statistical methods. |
| P98-1066 185 64:247 A Church-style tagger (Church, 1988) In a few cases, the loss of prepositions presents a problem. |
| J93-1004 186 124:365 We used the same procedure that is used in Church (1988). |
| W98-1205 187 22:205 The training of the HMM can be done on either a tagged or untagged corpus, and is not a topic of this paper since it is exhaustively described in the literature (Bahl and Mercer, 1976; Church, 1988). |
| A00-1024 188 67:236 We use an in-house statistical tagger (based on (Church, 1988)) to tag the text in which the unknown word occurs. |
| C00-2141 189 11:197 Church (1988) proposed a simple stochastic technique for recognizing the non-recursive base noun phrases in English. |
| W96-0102 190 18:379 Most work on statistical methods has used n-gram models or Hidden Markov Model-based taggers (e.g. Church, 1988; DeRose, 1988; Cutting et al. 1992; Merialdo, 1994, etc.). |
| H91-1077 191 28:173 For the purposes of the present paper, it will be assumed that only content words are at issue, and that the syntactic category of all content words in the text that is under study can be determined automatically (Church, 1988; DeRose, 1988). |
| W98-1117 192 147:161 As far as coverage is concerned, our parser can handle recursive structures, which is an advantage compared to simpler techniques such as that described by Church (1988). |
| W98-1117 193 21:161 An archetypal representative of this approach is the method described by Church (1988), who used corpus frequencies to determine the boundaries of simple non-recursive NPs. |
| J93-2006 194 40:573 The effectiveness of such models is well known (DeRose 1988; Church 1988; Kupiec 1989; Jelinek 1985), and they are currently in use in parsers (e.g. de Marcken 1990). |
| J93-2006 195 154:573 Some well-known previous efforts (Church 1988; de Marcken 1990) have dealt with unknown words using various heuristics. |
| J93-2006 196 349:573 In a related test, we explored the bracketings produced by Church's PARTS program (Church 1988). |
| J93-2006 197 28:573 Statistical models based on local information (e.g. , DeRose 1988; Church 1988) might operate effectively in spite of sentence length and unexpected input. |
| P98-1080 198 63:188 3 The Model Instead of employing the source-channel paradigm for tagging (more or less explicitly present e.g. in (Merialdo, 1992), (Church, 1988), (Hajič, Hladká, 1997)) used in the past (notwithstanding some exceptions, such as Maximum Entropy and rule-based taggers), we are using here a "direct" approach to modeling, for which we have chosen an exponential probabilistic model. |
| A94-1009 199 30:191 With the availability of large corpora and fast computers, there has been a recent resurgence of interest, and a number of variations on and alternatives to the FB, Viterbi and BW algorithms have been tried; see the work of, for example, Church (Church, 1988), Brill (Brill and Marcus, 1992; Brill, 1992), DeRose (DeRose, 1988) and Kupiec (Kupiec, 1992). |
| A94-1009 200 159:191 Kenneth Ward Church (1988). |
| W03-1706 201 89:142 Like baseNP chunking (Church, 1988; Ramshaw & Marcus 1995), content chunk parsing is also a kind of shallow parsing. |
| P94-1013 202 58:225 Decision trees (Brown, 1991) have been usefully applied to word-sense ambiguities, and HMM part-of-speech taggers (Jelinek 1985, Church 1988, Merialdo 1990) have addressed the syntactic ambiguities presented here. |