Summary
The ACL Anthology Network (AAN) was originally created by Mark Thomas Joseph and is currently maintained by Pradeep Muthukrishnan under the supervision of Professor Dragomir R. Radev.
Acknowledgements
A number of students from the University of Michigan's CLAIR Group helped create the data, network, and web pages. We would like to thank:
- YoungJoo (Grace) Jeon
- Mark Schaller
- Mark Joseph
- Ben Nash
- Bryan Gibson
- John Umbaugh
- Tunay Gur
- Jahna Otterbacher
- Arzucan Ozgur
- Li Yang
- Anthony Fader
- Joshua Gerrish
- Stephen Hufnagel
- Dr. Igor Markov
- Nayeoung Kim
- Pradeep Muthukrishnan
- Vahed Qazvinian
- Paul Hartzog
- Chen Huang
- Samantha Boylan
- Richard Caneba
This work has been partially supported by the National Science Foundation grant "Collaborative Research: BlogoCenter - Infrastructure for Collecting, Mining and Accessing Blogs", jointly awarded to UCLA and UMich (IIS 0534323 to UMich and IIS 0534784 to UCLA), and by the National Science Foundation grant "iOPENER: A Flexible Framework to Support Rapid Learning in Unfamiliar Research Domains", jointly awarded to UMd and UMich as IIS 0705832.
About the Data
The ACL Anthology Network was built from the original PDF files available from the ACL Anthology. Using open source OCR technologies, in-house clean-up scripts, and often tedious manual labor, we developed a web interface that allowed the individual references in each paper to be annotated. A team of student research assistants manually matched references to existing ACL IDs returned by a keyword matching algorithm. Citations deemed to refer to ACL papers but not automatically matched were marked for post-processing.
Publications using the AAN data
- Dragomir R. Radev, Pradeep Muthukrishnan, and Vahed Qazvinian. The ACL anthology network corpus. In Proceedings, ACL Workshop on Natural Language Processing and Information Retrieval for Digital Libraries, Singapore, 2009.
- Saif Mohammad, Bonnie Dorr, Melissa Egan, Ahmed Hassan, Pradeep Muthukrishnan, Vahed Qazvinian, Dragomir R. Radev, and David Zajic. Generating surveys of scientific paradigms. In Proceedings of HLT-NAACL 2009, Boulder, CO, June 2009.
- Vahed Qazvinian and Dragomir R. Radev. The evolution of scientific title networks. In Proceedings of ICWSM 2009 poster session, San Jose, CA, 2009.
- Aaron Elkiss, Siwei Shen, Anthony Fader, Güneş Erkan, David States, and Dragomir Radev. Blind men and elephants: What do citation summaries tell us about a research article? Journal of the American Society for Information Science and Technology, 59(1):51-62, 2008.
- Vahed Qazvinian and Dragomir R. Radev. Scientific paper summarization using citation summary networks. In COLING 2008, Manchester, UK, 2008.
- Steven Bird, Robert Dale, Bonnie Dorr, Bryan Gibson, Mark T. Joseph, Min-Yen Kan, Dongwon Lee, Brett Powley, Dragomir R. Radev, and Yee Fan Tan. The ACL anthology reference corpus: a reference dataset for bibliographic research. In LREC, Marrakesh, Morocco, May 2008.
Using the Data
To use this data, please follow these guidelines:
- For research only.
- Do not re-distribute.
- If you publish using this work, you should acknowledge its creators. Please use the following BibTeX entries:

@inproceedings{Radev&al.09a,
  author = {Radev, Dragomir R. and Muthukrishnan, Pradeep and Qazvinian, Vahed},
  title = {The {ACL} Anthology Network Corpus},
  year = "2009",
  address = "Singapore",
  booktitle = "Proceedings, ACL Workshop on Natural Language Processing and Information Retrieval for Digital Libraries"
}

@article{Radev&al.09,
  author = {Radev, Dragomir R. and Joseph, Mark Thomas and Gibson, Bryan and Muthukrishnan, Pradeep},
  year = "2009",
  title = {{A} {B}ibliometric and {N}etwork {A}nalysis of the field of {C}omputational {L}inguistics},
  journal = {Journal of the American Society for Information Science and Technology},
  publisher = {John Wiley \& Sons}
}

- Please inform us if you publish, as we are interested in the output of this work.
A Note About the PageRank Centrality
Because PageRank values are typically very small fractions, we have adjusted the results to make them more human readable. Each value shown on this website is the computed PageRank multiplied by 1,000,000, with the fractional part truncated so that only the integer remains. The actual PageRank, up to the truncated digits, can therefore be found by dividing the number shown by 1,000,000. For example, if a paper has a computed PageRank of 0.003456789, the scaled value is 3456.789, and we print that PageRank as 3456 after dropping the .789.
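The scaling described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used to build the website; the function names `displayed_pagerank` and `approximate_raw_pagerank` are hypothetical, and the use of `Decimal` (to avoid floating-point rounding surprises near integer boundaries) is our own assumption.

```python
from decimal import Decimal

# Scale factor stated in the note above.
SCALE = 1_000_000


def displayed_pagerank(raw: str) -> int:
    """Multiply a raw PageRank (given as a decimal string) by SCALE
    and truncate the fractional part, as described in the note."""
    return int(Decimal(raw) * SCALE)  # int() truncates toward zero


def approximate_raw_pagerank(displayed: int) -> Decimal:
    """Recover the raw PageRank from a displayed value.
    The truncated digits are lost, so this is a lower bound."""
    return Decimal(displayed) / SCALE


if __name__ == "__main__":
    shown = displayed_pagerank("0.003456789")
    print(shown)                            # 3456
    print(approximate_raw_pagerank(shown))  # 0.003456
```

Because truncation discards the trailing digits, dividing a displayed value by 1,000,000 recovers the computed PageRank only to six decimal places.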
Contact
If you have questions or comments about this website or its contents, or want to report misspellings or other mistakes, please fill out the form on our Contact Us page.