Predicting Vertebrate Promoters with Heterogeneous Cluster Computing (使用異質叢集電腦預測脊椎動物基因啟動區)

Please use this identifier to cite or link to this item: http://140.128.103.80:8080/handle/310901/5480

Title:	使用異質叢集電腦預測脊椎動物基因啟動區
Other Titles:	Predicting Vertebrate Promoters with Heterogeneous Cluster Computing
Authors:	楊倫倪 Yang, Lun-Ni
Contributors:	呂芳懌 Leu, Fang-Yie; 東海大學資訊工程學系 (Department of Computer Science and Information Engineering, Tunghai University)
Keywords:	Promoter Prediction;Bioinformatics;Cluster Computing;K-gram
Date:	2005
Issue Date:	2011-05-19T08:18:56Z (UTC)
Abstract:	A promoter is a DNA sequence usually located upstream of a transcriptional start site (TSS), so predicting a promoter indirectly predicts the position of its TSS. To date, all confirmed promoters deposited in GenBank have been identified experimentally by molecular biologists, and such experiments are costly and time-consuming. Some researchers have developed high-throughput bioinformatics tools to analyze newly sequenced DNA of unknown function. These tools, however, often yield low true-positive rates, high false-positive rates, or both, so their output is difficult to rely on for gene identification. This work proposes the Vertebrate Promoter Prediction System (VPPS), which uses statistical techniques to predict vertebrate promoters. We analyze a putative promoter sequence by looking for short promoter-specific sequences and known transcription factor binding sites. A gene-based consensus sequence-extracting program (GCSEP) extracts promoter-specific and non-promoter-specific patterns 6-20 bp in length. A weighted matrix built from the extracted patterns is then used to predict whether, and where, an unknown DNA sequence contains a promoter. Cluster computing accelerates the weighted-matrix computation and the K-gram parsing. Our experimental results show that VPPS achieves a higher true-positive rate and a lower false-positive rate than other prediction tools, and that it substantially reduces the cost of running the program.
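The abstract names the ingredients (K-gram patterns taken from promoter and non-promoter sequences, and a weighted matrix used to score unknown DNA) but does not give the scoring formula. The Python sketch below is only an illustration of that general idea, under stated assumptions: the weight of each K-gram is taken to be a smoothed log-odds ratio between the two training sets, and a fixed-width window is slid along the query. The function names (`log_odds_weights`, `best_window`), the pseudocount, and the toy sequences are hypothetical and are not taken from the thesis or from VPPS.

```python
import math
from collections import Counter


def kgrams(seq, k):
    """Yield every overlapping substring of length k from seq."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kgram_counts(seqs, k):
    """Count K-gram occurrences over a collection of sequences."""
    counts = Counter()
    for s in seqs:
        counts.update(kgrams(s, k))
    return counts


def log_odds_weights(promoters, background, k, pseudocount=1.0):
    """Assumed weighting: smoothed log-odds of promoter vs. background frequency."""
    pos = kgram_counts(promoters, k)
    neg = kgram_counts(background, k)
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + pseudocount * len(vocab)
    neg_total = sum(neg.values()) + pseudocount * len(vocab)
    return {
        g: math.log((pos[g] + pseudocount) / pos_total)
        - math.log((neg[g] + pseudocount) / neg_total)
        for g in vocab
    }


def score_window(window, weights, k):
    """Sum the weights of all K-grams in a window; unseen K-grams score 0."""
    return sum(weights.get(g, 0.0) for g in kgrams(window, k))


def best_window(seq, weights, k, width):
    """Return (start, score) of the highest-scoring window of the given width."""
    best_start, best_score = None, -math.inf
    for start in range(len(seq) - width + 1):
        s = score_window(seq[start:start + width], weights, k)
        if s > best_score:
            best_start, best_score = start, s
    return best_start, best_score


# Toy data, invented for illustration only.
promoters = ["TATAAAGGCC", "GGTATAAACC"]
background = ["GCGCGCGCGC", "CCGGCCGGCC"]
weights = log_odds_weights(promoters, background, k=6)
start, score = best_window("GCGCGCTATAAAGCGCGC", weights, k=6, width=6)
```

Because each window is scored independently, the windows (or the K-gram counting over separate training sequences) could be divided among worker nodes, which is one plausible reading of how a cluster would speed up this kind of computation.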
Appears in Collections:	[資訊工程學系所 (Department of Computer Science and Information Engineering)] Master's Theses

