Institute of Automation, Chinese Academy of Sciences

July 5, 2012: Lecture Series in Pattern Recognition

Lecture Series in Pattern Recognition

TITLE: Applying NLP Technologies to the Collection and Analysis of Language Data to Aid Linguistic Research
SPEAKER: Dr. Fei Xia, University of Washington (UW)
CHAIR: Prof. Chengqing Zong
TIME: 10:00 AM, Thursday, July 5, 2012
VENUE: Meeting Room 1115

 

ABSTRACT:

As a vast amount of language data has become available electronically, linguistics is gradually becoming a discipline in which research is often conducted on corpora. In this talk, we review the process of building ODIN, the Online Database of Interlinear Text, a multilingual repository of linguistically analyzed language data. ODIN is built from interlinear text harvested from scholarly linguistic documents posted to the Web. It currently holds more than 200,000 instances of interlinear text, representing annotated language data for more than 1,000 languages (more than 10% of the world's languages). ODIN's charter has been to make these data available to linguists and other language researchers through search, allowing users to find instances of language data and related resources (i.e., the documents from which the data were extracted) by language name, language family, and even linguistic construction. Further, we have sought to enrich the collected data and to extract "knowledge" from the enriched content. This work demonstrates the benefits of using natural language processing technology to create resources and tools for linguistic research, giving linguists easy access not only to language data embedded in existing linguistic papers but also to automatically generated language profiles for hundreds of languages.

 

BIOGRAPHY:

Fei Xia is an Associate Professor in the Department of Linguistics at the University of Washington (UW) and an adjunct faculty member in the Department of Biomedical Informatics and Medical Education at the UW Medical School. Her research covers a wide range of NLP tasks, including morphological analysis, part-of-speech tagging, grammar extraction and generation, treebank development, machine translation, information extraction, and bio-NLP. Her current research focuses on building NLP systems that combine linguistic knowledge with machine learning techniques. She is also interested in collecting data and building tools to assist linguistic study. Her work is supported by several grants from NSF, NIH, IARPA, Microsoft, and UW, including the prestigious NSF CAREER Award.

Fei Xia received her Bachelor's degree from Peking University and her Ph.D. from the University of Pennsylvania (UPenn). At UPenn, she led the effort to build the Chinese Penn Treebank, which currently contains 1.2 million words and is one of the most commonly used corpora for Chinese NLP. After graduating, she worked at the IBM T. J. Watson Research Center in Yorktown Heights, New York, before joining UW.

 

 

Hosted by: National Laboratory of Pattern Recognition (NLPR)

 

 

NLPR, INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES