New NSF rule

… NSF will require that either the version of record or the final accepted manuscript in peer-reviewed scholarly journals and papers in juried conference proceedings or transactions must:

“be available for download, reading and *analysis* free of charge no later than 12 months after initial publication”


URLs (December 29, 2010)

Scientists map what factors influence the news agenda

Algorithms unplugged

Linguistics Society of America Annual Awards

IBM’s Watson

New n-gram datasets from Google


Qatar 2022

Proposed stadiums for Qatar 2022

Climbing Mount Publishable

“Untranslatable” words

Story telling (by Aaron Clauset)

Lunch line redesign

Tony Judt on New York

Houellebecq wins Goncourt Prize for his new book “The Map and the Territory”

Change or Perish

SNL’s TSA spoof

Handful of U.S. Schools Claim Larger Share of Output

Decoding the value of Computer Science

In 500 Billion Words, New Window on Culture

NACLO 2008 announced: the North American Computational Linguistics Olympiad

Registration is open for the Second Annual North American
Computational Linguistics Olympiad

Please inform high school students in your area of the second annual
North American Computational Linguistics Olympiad Open competition, which
will be held on February 5, 2008. Students may participate at one of the host
sites listed below or in the internet category. The contest targets high
school students, but middle school students may also participate.

Students can register at:

Top scorers in the Open competition will be eligible to compete in the NACLO
Invitational competition in March 2008. Top scorers in the Invitational
will be eligible to compete in the International Linguistics Olympiad in
Bulgaria in the summer of 2008. Two US teams competed in the International
Computational Linguistics Olympiad in St. Petersburg in 2007 with great
results, achieving the top score in the individual competition and tying for
first place in the team competition.

Brandeis University
Carnegie Mellon University/University of Pittsburgh
Columbia University
Cornell University
Middle Tennessee State University
San Jose State University
University of Michigan
University of Oregon
University of Pennsylvania
University of Toronto
University of Wisconsin/Edgewood College

If you are not listed here, and you would like to host the contest at
your university, contact Lori Levin, lsl at-symbol

In addition, any student may participate in the Internet category by
finding a local high school or university teacher to facilitate the

About Linguistics Olympiads:

The North American Computational Linguistics Olympiad (NACLO) is the
direct descendant of the Olympiad in Linguistics and Mathematics
founded in 1965 in Moscow, Russia. High school students compete by
solving linguistics and logic problems based on natural
languages. This program is credited with introducing thousands of
Russian students to the field of linguistics, many of whom have gone
on to become prominent professional linguists. NACLO includes
traditional Olympiad problems as well as some computational problems.
This is not a competition that deals with computer technology, but
with all aspects of natural language structure and function, including
computational thinking as it relates to natural language processing.

Thank you very much for your help in raising the profile of our
discipline among secondary school students. Please contact any of the
executive team members below if you have any questions or would like
to be involved in some way, including possibly hosting a competition
in your area and/or submitting a problem for future competitions.

Lori Levin – Co-chair
Thomas E. Payne – Co-chair
Dragomir R. Radev – Program chair and team coach

My favorite corpora

Here are my favorite corpora:

Enron email
CIA world factbook
DBLP: papers in CS
US congressional speeches
AOL queries
Netflix recommendations
PUBMED: biomedical paper abstracts
ACL Anthology
DOTGOV: download of .GOV
biocreative: biomedical papers
WT100G: 100GB download of the web
Google n-grams
SMS corpus
corpus of paraphrases
multilingual parallel parliamentary proceedings
textual entailment corpus
question answering corpus
summarization corpus
various text classification corpora (Reuters-21578, 20NG)