SI 760 / EECS 597 / LING 702 Language and Information
Mondays, 1:10-3:55 PM
412 West Hall
Dragomir R. Radev
3080 West Hall Connector
Office Hours: TBA
1. Manning and Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
2. Oakes. Statistics for Corpus Linguistics. Edinburgh University Press, 1998.
1. Jurafsky and Martin. Speech and Language Processing. Prentice-Hall, 2000.
2. Cover and Thomas. Elements of Information Theory. John Wiley and Sons, 1991.
Several research articles as well as some software documentation will be handed out.
1. The computational study of language. Linguistic fundamentals.
2. Mathematical and Probabilistic Fundamentals. Descriptive Statistics. Measures of central tendency. The z score. Hypothesis testing.
3. Information theory. Entropy, joint entropy, conditional entropy. Relative entropy and mutual information. Chain rules. The entropy of English.
4. Working with corpora. N-grams.
5. Language models. Hidden Markov Models. Noisy channel models. Applications to part-of-speech tagging and other problems.
6. Cluster analysis. Distributional clustering.
7. Collocations. Syntactic criteria for collocability.
8. Literary detective work. The statistical analysis of writing style.
9. Text summarization. Cross-document structure theory.
10. Lexical semantics. WordNet.
11. Information Extraction. Question Answering.
12. Word sense disambiguation.
13. Lexical acquisition.
14. Paraphrase acquisition.
15. Possible additional topics: Text alignment. Statistical machine translation. Discourse segmentation.
Survey paper (15%)