1. Introduction to Language Resources
(1) Properties of the Resources
* Definition of Language Resource
A language resource is natural language, the body of spoken and written utterances produced by human language activity, that has been processed and developed into a format a machine (computer) can read. Examples include the corpus, which stores and organizes diverse language data; CoreNet, which classifies each word by meaning and arranges words according to their relations; and electronic dictionaries used for internal computer tasks in natural language processing.
* High Usability of Language Resource
Once a language resource is digitized and stored as a computer file or in a database, it can easily be replayed, transmitted, processed, and reused. It can therefore be considered to have comparatively higher usability than other resources.
* Specialized Language Processing Ability Required
Language is a crucial resource that urgently needs to be exchanged and distributed as information technology develops further, and a variety of language information is now circulating on the web. Language specialists with knowledge of Korean or of linguistics should therefore be involved in every stage of development, from resource construction to multilingual processing, linguistic analysis, and natural language processing.
* The Area Requires Long-term Approaches
The construction of language resources progresses gradually, particularly compared with developments in other areas that require focused capital and manpower. Compiling a language dictionary takes approximately 10 years in practice, with additional time and effort for follow-up tasks such as collecting and adding neologisms, so constructing other language resources can be expected to take a similar amount of time and effort. However, the far-reaching effect produced when the resulting resources are used effectively makes them worth the effort.
* Constant Improvement Required to Meet the Demands of the Changing Times
Like an organism, language evolves as it is continuously created, changed, and lost. To retain its value as a resource, it must be supplemented so that it adapts to changing circumstances over time. The simple multilingual information-format language resources that are currently used separately in each specialized area and in each country need to evolve again to fit the coming ubiquitous computing environment.
(2) Purpose of the BORA
* Functions as the Basis for a Knowledge Information System
Language resources are widely used as the basis for knowledge information systems such as information search, machine translation, information extraction, automatic summarization, question answering (QA), knowledge search, and knowledge processing. Imagine a robot talking to a foreigner. The robot searches for information relevant to the person’s question, finds an answer through a conversation system, and generates sentences with machine-translated words. Language resources are the fundamental resources in all of these processes.
* Essential Component in Natural Language Processing
Take a parser, a core program in most natural language processing systems, as an example. To develop the parser, co-occurrence information is extracted from a raw corpus, a probability model for the morphological analyzer is trained, and co-occurrence information is extracted from a morphologically analyzed corpus and used. Thus, developing even a single piece of software requires a number of different language resources.
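The first step mentioned above, extracting co-occurrence information from a raw corpus, can be illustrated with a minimal sketch. The function name `cooccurrence_counts`, the two sample sentences, and the window size are hypothetical and chosen only for illustration. The sketch also assumes naive whitespace tokenization, whereas a real system (especially for Korean) would run a morphological analyzer first.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count unordered word pairs occurring within `window` tokens of each other."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: whitespace tokenization stands in for morphological analysis.
        tokens = sentence.lower().split()
        for i, left in enumerate(tokens):
            # Pair the current token with the next `window` tokens.
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical raw corpus, for illustration only.
raw_corpus = [
    "the robot answers the question",
    "the robot translates the answer",
]
counts = cooccurrence_counts(raw_corpus, window=2)

# Show the most frequent pairs.
for pair, n in counts.most_common(3):
    print(pair, n)
```

Counts of this kind are one possible input to a statistical parser. The same function could be applied to a morphologically analyzed corpus by passing morpheme sequences instead of raw sentences.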
* Fundamental Resource for the Internet Industry
The main concepts of the next-generation network environment include next generation networking (NGN) and the semantic web. The semantic web was introduced to assign semantics to data online and to create a logical web using the relations between meanings. Global research and standardization work on the Web Ontology Language (OWL), rules, logic, and inferencing is being actively carried out to enable automatic communication between machines. Although the concept of ontology has a long history, its application in computer science is relatively new. Ontology is therefore an important, high-quality language resource for semantic web-based technology, and one that demands vigorous development.
Natural language processing systems may be applied, as a foundation or in applied form, not only in computer, network, and information technologies but also in robotics, biotics, space science, and other future industries, so a language resource’s range of use could be immensely broad.
2. The Bank of Resource for Language and Annotation
(1) Operation Objectives and Direction
* Constant Maintenance of Language Resource and Resource Users
The goal of the BORA is to prepare diligently for the construction of high-quality language resources, a core resource in a highly information-oriented society, for applications in the language information industry. To achieve this goal, the bank will adopt mid- to long-term strategies for the development of language resources, maintain research results effectively and systematically and provide them to members and users, and actively promote favorable conditions for further research and development by hosting seminars and technology presentations.
* Develop Consumer-Oriented Service
With the publication of 17 volumes of language-related publications and the provision of research results, the bank has focused on educational and research services that motivate researchers to pursue further studies and enable the public to use educational materials. The bank aspires to further strengthen public awareness of language resources and to expand the services that users demand. Currently, the bank provides 10% of the resources it holds on CD as a free sample. It is adopting strategic goals of continuously reflecting user opinions, encouraging participation in user groups, and setting price policies for users.
* Domestic and International Channels
Connected with various foreign language research organizations, the BORA serves as a supply channel for the domestic and international exchange and circulation of language resources. It aspires to conduct diverse research activities through domestic and international channels, including maintaining a collaborative response system for language processing technology development and actively organizing international academic conferences.
* Responses to International Standardization
As a member country of ISO/TC37, Korea is in an advantageous position. With the long-term objective of standardizing language resources, the country will therefore participate in and lead international standardization.
* Provide Resources Suited to Each Industrial Objective
To put resources to use in industry, the BORA aspires to make them usable for different objectives by building a cooperative system between the industrial and academic sectors and by promoting the development and distribution of customized research materials.
As stated above, the bank operates as an information provider and a language maintenance and distribution center, with clear objectives and direction.
(2) Operation System