植基於本體論之文件摘要系統之研究－以中文股市新聞為例


Please use this identifier to cite or link to this item: http://140.128.103.80:8080/handle/310901/5035

Title:	植基於本體論之文件摘要系統之研究－以中文股市新聞為例
Other Titles:	A Study on Ontology-based Document Summarization System for Chinese Stock News
Authors:	徐銘忠 Hsu, Ming-Chung
Contributors:	呂芳懌 Leu, Fang-Yie; Department of Computer Science and Information Engineering, Tunghai University (東海大學資訊工程學系)
Keywords:	Document summarization; Extraction; Abstraction; Ontology
Date:	2004
Issue Date:	2011-05-19T07:32:41Z (UTC)
Abstract:	With the rapid growth of the Internet, information has become ever easier to obtain, but this has also led to information overload: users do not know how to deal with such massive amounts of data. Obtaining the correct, needed information quickly and efficiently has therefore become an important issue in the information field. Document summarization techniques filter out the less important information in a document and provide more concise content, so that readers can grasp the key points in a short time and decide whether to read the full text, rather than spending a lot of time reading it. For this reason, document summarization has become an important research direction in information retrieval in recent years. Document summarization techniques fall into two classes: extraction and abstraction. Most previous studies adopted only one of them. In this thesis, we build domain knowledge for stock-market news with an ontology and propose a two-stage summarization method for a specific domain, named Abstraction From Extraction (AFE). In the first stage, statistical methods compute a weight for each sentence in a document, the sentences are ranked by weight, and those with the highest weights are taken as the feature sentences. In the second stage, the phrases in the feature sentences and their parts of speech are compared with sentence patterns prepared in advance from the characteristics of the domain, and the sentences that match are recombined according to those patterns into a summary that captures the essence of the document. The summary helps users decide whether to read the full text, so that they can find and absorb the information they need quickly.
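The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not say which statistic is used to weight sentences, what the part-of-speech tag set is, or how sentence patterns are encoded. The sketch assumes a mean term-frequency weight, hypothetical tags such as COMPANY, VERB and NUMBER, and a pattern expressed as an ordered list of tags. Input is assumed to be already segmented into tokens (English tokens are used for readability), and the ontology lookup is omitted.

```python
from collections import Counter


def sentence_weights(sentences):
    """Weight each tokenized sentence by the mean document-wide frequency
    of its tokens (an assumed statistic, chosen only for illustration)."""
    tf = Counter(tok for sent in sentences for tok in sent)
    return [
        sum(tf[tok] for tok in sent) / len(sent) if sent else 0.0
        for sent in sentences
    ]


def feature_sentences(sentences, top_k=2):
    """Stage 1 (extraction): keep the top_k highest-weighted sentences,
    returned in their original document order."""
    weights = sentence_weights(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


def fill_pattern(tagged_sentence, pattern):
    """Stage 2 (abstraction): match a hypothetical sentence pattern, given as
    an ordered list of tags, against (word, tag) pairs. Returns the matched
    words joined into one string, or None if the pattern does not match."""
    words = []
    remaining = iter(tagged_sentence)
    for tag in pattern:
        for word, pos in remaining:
            if pos == tag:
                words.append(word)
                break
        else:
            # Ran out of words before this tag was found.
            return None
    return " ".join(words)


if __name__ == "__main__":
    doc = [
        ["TSMC", "shares", "rose", "3%"],
        ["analysts", "expect", "TSMC", "shares", "to", "gain"],
        ["weather", "was", "fine"],
    ]
    print(feature_sentences(doc, top_k=2))
    tagged = [("TSMC", "COMPANY"), ("shares", "NOUN"),
              ("rose", "VERB"), ("3%", "NUMBER")]
    print(fill_pattern(tagged, ["COMPANY", "VERB", "NUMBER"]))
```

Returning the selected sentences in document order rather than rank order is a design choice of this sketch, made so that any text assembled from them reads in the same sequence as the source article.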
Appears in Collections:	[Department of Computer Science and Information] Master's Theses

Files in This Item:

File	Size	Format
092THU00394012-001.pdf	2552Kb	Adobe PDF
