Author Identification
http://www.uni-weimar.de/medien/webis/research/events/pan-13/pan13-web/autho...

Authorship attribution is an important problem in many areas, including information retrieval and computational linguistics, but also in applied areas such as law and journalism, where knowing the author of a document (such as a ransom note) may help save lives. The most common framework for testing candidate algorithms is a text classification problem: given known sample documents from a small, finite set of candidate authors, which of them, if any, wrote a questioned document of unknown authorship? It has been commented, however, that this may be an unreasonably easy task. A more demanding problem is author verification: given a set of documents by a single author and a questioned document, determine whether the questioned document was written by that particular author. This may more accurately reflect the experience of professional forensic linguists, who are often called upon to answer this kind of question.

Given a small set (no more than 10, possibly as few as one) of "known" documents by a single person and a "questioned" document, the task is to determine whether the questioned document was written by the same person who wrote the known document set.

One problem comprises a set of known documents by a single person and a questioned document. There will be several such problems covering English, Greek, and Spanish (about 20 cases per language), with a varying number of known documents (1-10). All documents within a single problem will be in the same language, and best efforts will be made to ensure that within-problem documents are matched for genre, register, theme, and date of writing. The documents may be fragmentary, with a minimum length of 1,000 words. The data are to be released by mid-December.

Participants are asked to provide a simple "yes/no" binary answer for each problem.
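To make the shape of one verification problem concrete, here is a minimal baseline sketch in Python. It is not a method prescribed by the task description: the character trigram profile, the cosine similarity, the function names, and the 0.5 threshold are all illustrative assumptions. It takes the known documents and the questioned document of a single problem and returns a "yes"/"no" answer together with a score in [0, 1].

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lower-cased text."""
    text = " ".join(text.lower().split())  # collapse runs of whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors, clamped to [0, 1]."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return min(1.0, dot / norm) if norm else 0.0


def verify(known_docs: list[str], questioned: str,
           threshold: float = 0.5) -> tuple[str, float]:
    """Answer one problem: did the author of known_docs write questioned?

    The threshold is a hypothetical, untuned value chosen for illustration.
    """
    profile: Counter = Counter()
    for doc in known_docs:
        profile.update(char_ngrams(doc))  # pool all known documents
    score = cosine(profile, char_ngrams(questioned))
    return ("yes" if score >= threshold else "no"), score
```

In practice the threshold would have to be chosen per language on held-out problems, since a single known document of about 1,000 words gives a much noisier profile than ten.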
Grading will be based on the percentage of correct answers. Beyond the accuracy on the entire corpus, separate rankings will be provided for the subsets of problems for each language. In addition, participants may also provide a score, a real number in the interval [0,1] inclusive, where 0 corresponds to NO and 1 to YES. In that case, ROC curves will be produced and the area under the curve will be used to grade participant systems.

We refer you to:
PAN @ CLEF'12 (overview paper)
PAN @ CLEF'11 (overview paper)
Patrick Juola. Authorship Attribution. In Foundations and Trends in Information Retrieval, Volume 1, Issue 3, December 2006.
Moshe Koppel, Jonathan Schler, and Shlomo Argamon. Computational Methods in Authorship Attribution. Journal of the American Society for Information Science and Technology, Volume 60, Issue 1, pages 9-26, January 2009.
Efstathios Stamatatos. A Survey of Modern Authorship Attribution Methods. Journal of the American Society for Information Science and Technology, Volume 60, Issue 3, pages 538-556, March 2009.

We ask you to prepare your software so that it can be executed via a command line. You can, however, choose freely among the available programming languages and between the operating systems Microsoft Windows 7 and Ubuntu 12.04. We will ask you to deploy your software onto a virtual machine that will be made accessible to you after registration. You will be able to reach the virtual machine via ssh and via remote desktop.
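The two grading measures described above, the percentage of correct answers and the area under the ROC curve, can be sketched in a few lines of Python. This is an illustration only: the function names and the toy data are assumptions, and the organizers' actual evaluation procedure is not specified here. The AUC is computed through its rank interpretation, that is, the probability that a randomly chosen same-author problem receives a higher score than a randomly chosen different-author problem, with ties counted as one half.

```python
def accuracy(answers: list[bool], truth: list[bool]) -> float:
    """Fraction of problems whose yes/no answer matches the ground truth."""
    correct = sum(a == t for a, t in zip(answers, truth))
    return correct / len(truth)


def roc_auc(scores: list[float], truth: list[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, t in zip(scores, truth) if t]      # same-author problems
    neg = [s for s, t in zip(scores, truth) if not t]  # different-author problems
    if not pos or not neg:
        raise ValueError("AUC needs at least one YES and one NO problem")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for four hypothetical problems (not real results).
truth = [True, True, False, False]
scores = [0.9, 0.4, 0.6, 0.1]
answers = [s >= 0.5 for s in scores]  # binarise at an assumed 0.5 cut-off

print(accuracy(answers, truth))  # 0.5
print(roc_auc(scores, truth))    # 0.75
```

The toy example shows why the two measures can disagree: the binary answers are right only half of the time, while the scores still rank three of the four YES/NO pairs correctly.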