Digital Language Resources
- Regularly collect and consolidate ILAS personnel's needs for language resources.
- Assist in handling text-related technology.
- Help with sustainable maintenance and development of ILAS's various language data and resources.
- Handle display issues with Academia Sinica's XXZT Tangut font.
- Organize data storage and backup.
History and Goals
When ILAS was established in 1997, three research groups, including the "Text Corpora Research Group", were formed to build a common tool platform and to facilitate resource sharing. The "Text Corpora Research Group" was originally a research group at Academia Sinica's Institute of History and Philology. In its early days, members of the group collaborated with research fellows of the Institute of Information Science and technical personnel of the Computing Center, Academia Sinica, to promote not only the construction of corpora and lexical databases but also various corpus-based linguistic studies. When ILAS reorganized its research groups at the end of 2009 in response to the increasing prevalence of interdisciplinary linguistic studies, the "Text Corpora Research Group" was renamed the "Research Group of Corpus and Computational Linguistic Research Group".
Collecting language data is an inseparable part of linguistic fieldwork, sociolinguistic surveys, cognitive neurolinguistic research, and the development of language technology. With rapid developments in software, hardware, and information technology, much linguistic research now involves the systematic compilation of corpora and studies in corpus linguistics, and the sharing of corpus resources is receiving growing attention. The simulations and calculations of computational linguistics provide tools for verification and for discovering rules, advancing the study of various linguistic issues.
In 2020, the "Research Group of Corpus and Computational Linguistic Research Group" was transformed into "Digital Language Resources", with the goal of sustainably preserving and maintaining ILAS's corpora and related linguistic data.