Please use this identifier to cite or link to this item:
http://140.128.103.80:8080/handle/310901/3827
|
Title: | 利用群集分析與最適判定鄰近法結合切線距離在特徵辨識問題上的應用 |
Other Titles: | Using Cluster and DANN with Tangent Distance on Character Recognition Problems |
Authors: | 陳嘉惠 Chen, Jia-Huei |
Contributors: | 鄭順林 Jeng, Shuen-Lin; 東海大學統計學系 (Department of Statistics, Tunghai University) |
Keywords: | 群集分析;切線距離;最適判定鄰近法;特徵辨識 Cluster Analysis;Tangent distance;Discriminant adaptive nearest neighbor;Character recognition |
Date: | 2006 |
Issue Date: | 2011-05-06T03:47:30Z (UTC)
|
Abstract: | Recognition technology has advanced to the point where, on some tasks, its results are more accurate than those obtained by the human eye. From a statistical point of view, recognition is a classification problem. In this research, besides maintaining a certain level of classification accuracy, our main goal is to increase classification speed, that is, to reduce the time spent on classification. For character recognition problems, we draw on several methods in this thesis. The one we follow most closely is the tangent distance (TD) proposed by Simard et al. (1993). Its key idea is to estimate the minimum distance between two patterns by using tangent vectors in different directions, obtained from seven transformations: x-translation, y-translation, rotation, scaling, parallel hyperbolic transformation, diagonal hyperbolic transformation, and thickening. Another method we consider is the discriminant adaptive nearest neighbor (DANN) method proposed by Hastie and Tibshirani (1996), which considers the variation in each cluster of observations near the target point to find the direction of variation. In addition to combining the ideas of TD and DANN, we use different linkage clustering methods to reduce the number of prototypes while retaining representatives of the training data set. Combining these concepts, we propose several new classification methods that increase classification speed.
We demonstrate our methods on the ZIP data set, a typical handwritten digit recognition data set. The data come from handwritten ZIP codes on envelopes of U.S. mail passing through the Buffalo, NY post office. The digits were written by many different people with a great variety of writing styles and instruments. After some preprocessing, each digit is converted into a 16 by 16 pixel grayscale image. The data set contains 7291 training observations and 2007 test observations. Among our proposed methods, the best prediction error is 0.0259 (2.59%). The second-best prediction error is 0.0393 (3.93%), with a classification time of 14.25 minutes for the 2007 test digits, or about 0.43 seconds per digit. The fastest method we propose took only 32 seconds to classify the 2007 digits, or about 0.0159 seconds per digit, while the error rate rose only slightly, to 0.0418 (4.18%). |
Appears in Collections: | [統計學系所 (Department of Statistics)] 碩博士論文 (Master's and Doctoral Theses)
|
Files in This Item:
File | Description | Size | Format | |
094THU00337001-001.pdf | | 692Kb | Adobe PDF | 353 |
094THU00337001-002.pdf | | 69Kb | Adobe PDF | 137 |
|
All items in THUIR are protected by copyright, with all rights reserved.
|