This study integrates a traditional database with a text database capable of processing large volumes of textual data to build a criminal investigation database system for theft. The system consists of a theft case database, an attribute-term matrix (attribute documents against effective terms), and a similarity comparison module. The theft case database is a relational database that records the criminal facts of each previous offender. The data stored in the system are important evidence for police officers in criminal investigation, analysis, field inquiries, interviews, case resolution, and court conviction. When a theft occurs, officers can use text database techniques to compare the crime-scene investigation records and the interview transcripts of the people involved against past cases, retrieve previous offenders whose modus operandi and stolen goods are similar to those of the current case, and treat them as priority suspects. On this premise, we segment the records and transcripts into terms by combining dictionary-based and statistical word segmentation, and construct the attribute-term matrix as the semantic representation of each criminal case. The similarity comparison module then measures the similarity between two theft cases by combining the vector space model, the weight of each effective term within an attribute document, and a weighted combination of the modus operandi and stolen-goods similarities. Because thieves often reuse the same modus operandi to steal similar goods, the system can retrieve the suspects whose modus operandi and stolen goods most closely match the current case. Police can then investigate the top N suspects suggested by the system first, which can substantially reduce the investigation resources and manpower required.
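The comparison pipeline described above can be illustrated with a minimal Python sketch under stated assumptions. This is not the authors' implementation. A forward-maximum-matching segmenter stands in for the dictionary-lookup half of the segmentation only (the statistical half is omitted). The smoothed TF-IDF formula, the two attribute names `method` (modus operandi) and `loot` (stolen goods), the 0.5/0.5 weights, the dictionary, and all records are hypothetical. Each attribute of a case is treated as its own document, and its weighted terms form one row of the attribute-term matrix. Cosine similarity serves as the vector space model comparison, and the per-attribute similarities are combined with fixed weights before the top N records are ranked.

```python
import math
from collections import Counter

# Hypothetical term dictionary (illustrative only, not from the study).
DICTIONARY = {"破壞", "門鎖", "侵入", "攀爬", "陽台", "筆記型電腦", "現金", "金飾", "機車"}


def fmm_segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary
    word; characters not covered by the dictionary become one-character terms."""
    max_len = max(map(len, dictionary), default=1)
    terms, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                terms.append(piece)
                i += size
                break
    return terms


def tfidf_vectors(docs):
    """Turn term lists into sparse {term: weight} rows using
    weight = tf * (log((1 + N) / (1 + df)) + 1), a smoothed IDF (assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_suspects(query, records, dictionary, w_method=0.5, w_loot=0.5, top_n=3):
    """Score each past record against the new case as
    w_method * sim(method) + w_loot * sim(loot) and return the top_n
    (suspect_id, score) pairs, highest score first."""
    totals = [0.0] * len(records)
    for field, weight in (("method", w_method), ("loot", w_loot)):
        docs = [fmm_segment(rec[field], dictionary) for _, rec in records]
        docs.append(fmm_segment(query[field], dictionary))
        vecs = tfidf_vectors(docs)  # IDF computed over past records plus the query
        for k, vec in enumerate(vecs[:-1]):
            totals[k] += weight * cosine(vecs[-1], vec)
    ranked = sorted(zip((sid for sid, _ in records), totals),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical past records (suspect_id, attributes) and a new case.
RECORDS = [
    ("A001", {"method": "破壞門鎖侵入", "loot": "筆記型電腦現金"}),
    ("B002", {"method": "攀爬陽台侵入", "loot": "金飾"}),
    ("C003", {"method": "破壞門鎖侵入", "loot": "機車"}),
]
QUERY = {"method": "破壞門鎖侵入", "loot": "現金筆記型電腦"}

if __name__ == "__main__":
    for suspect_id, score in rank_suspects(QUERY, RECORDS, DICTIONARY):
        print(suspect_id, round(score, 3))
```

With this toy data, A001 (same modus operandi and same stolen goods) ranks first, C003 (same modus operandi only) second, and B002 last. Shifting weight between `w_method` and `w_loot` trades off modus-operandi matches against stolen-goods matches. The weights actually used in the study are not stated in this abstract.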