Computational Linguistics and Information Retrieval (CLAIR)

University of Michigan

CSTBank Phase I

Characteristics of families

Family      Source(s)                                No. of clusters  Clustering method  Publicly available?
duc01       DUC01 data                               60               automatic          No
duc01trial  DUC01 sample data                        4                automatic          No
duc02       DUC02 data                               60               automatic          No
duc03       DUC03 data                               60               automatic          No
hknews      HKNews corpus                            40               automatic          No
manual      various online news agencies             10               manual             manual.tar.gz
manual2     usenet groups                            2                semi-manual        manual2.tar.gz
mds         online news agencies                     6                manual             mds.tar.gz
nie         NewsInEssence                            50               automatic          nie.tar.gz
novelty02   TREC2002 Novelty Track                   53               automatic          No
other       misc.                                    1                automatic          No
tdt-pilot   Topic Detection and Tracking pilot data  25               automatic          No
tdt2        Topic Detection and Tracking 2           100              automatic          No