Relational Classification Dataset

This classification dataset contains 380 scientific publications from the AAN (ACL Anthology Network), manually classified into three research areas ("Machine Translation", "Dependency Parsing" and "Summarization"). It is a relational dataset because it includes metadata for the papers: citation information, authorship, venue and year of publication.
Here is a description of the files included:
	|-----metadata.txt    Contains the id, title, authorship, venue and class information for all the papers.
	|-----papers_text     This directory contains the full text of the 380 papers, obtained by converting
	|                     each paper's PDF to text using PDFBox.
	|-----citations.txt   Contains citations between ALL the papers in the AAN data set, not just the citations
	                      between the 380 papers in this dataset. This is because many link/citation similarity
	                      measures, such as co-citation or bibliographic coupling, compute the similarity between
	                      two papers using citations between other papers.
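
To illustrate why citations from papers outside the 380 matter, here is a minimal Python sketch of co-citation and bibliographic coupling counts. It assumes the citations have already been loaded as (citing, cited) id pairs. The ids and the in-memory edge list below are made up for illustration, and the actual layout of citations.txt is described in the README, not here.

```python
from collections import defaultdict

def build_indexes(edges):
    """Build lookup tables from an iterable of (citing_id, cited_id) pairs."""
    cites = defaultdict(set)     # paper -> set of papers it cites
    cited_by = defaultdict(set)  # paper -> set of papers that cite it
    for citing, cited in edges:
        cites[citing].add(cited)
        cited_by[cited].add(citing)
    return cites, cited_by

def cocitation(a, b, cited_by):
    """Number of papers that cite both a and b."""
    return len(cited_by.get(a, set()) & cited_by.get(b, set()))

def coupling(a, b, cites):
    """Number of papers cited by both a and b (bibliographic coupling)."""
    return len(cites.get(a, set()) & cites.get(b, set()))

# Hypothetical edge list: A and B stand for two labelled papers, while
# X1, X2, R1 and R2 stand for papers outside the labelled set.
edges = [
    ("X1", "A"), ("X1", "B"),
    ("X2", "A"), ("X2", "B"),
    ("A", "R1"), ("B", "R1"), ("B", "R2"),
]
cites, cited_by = build_indexes(edges)
print(cocitation("A", "B", cited_by))  # papers citing both A and B
print(coupling("A", "B", cites))       # references shared by A and B
```

In this toy example A and B never cite each other, so both scores come entirely from papers outside the labelled pair. If the edge list were restricted to citations among the labelled papers only, both counts would drop to zero.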

Here is a complete README which explains the selection process for the publications, the annotation process, and the format of the different files.

