Chinese Abstract (translated)
Metagenomics is a rapidly developing research field that studies the microorganisms in an environment. Advances in high-throughput sequencing have accelerated its progress, and sequence-based metagenomics is now widely used in metagenomic research. In sequence-based metagenomics, the metagenome extracted directly from an environmental sample is sequenced. The resulting DNA fragments (reads) are either assembled into longer fragments such as contigs for further sequence analysis, or analyzed directly with gene-centric methods. blastx is widely used in gene-centric analysis of metagenomes. However, its sensitivity and specificity are low, and most researchers are not fully aware of this drawback. We therefore posed several questions to examine the main parameters that affect the accuracy of blastx. To simulate a sequence-based metagenomic study, we obtained the Escherichia coli BL21-Gold(DE3)pLysS AG genome, simulated high-throughput sequencing in silico, and ran blastx. We tested the effects of sequence length, expectation value (e-value), and prokaryotic gene density on blastx performance. Our current results show that sequence length, e-value, and prokaryotic gene density are all important factors affecting the sensitivity and specificity of blastx.

English Abstract
Metagenomics is a discipline that studies environmental microbes. Sequence-based methods are widely adopted in metagenomics. In sequence-based metagenomics, genomes are extracted from environmental samples and the DNA fragments are sequenced. The sequenced DNA fragments (reads) are either assembled into contigs for further analysis or analyzed directly by gene-centric methods, in which functional genes are annotated on the reads and species diversity is inferred. Blastx is a tool widely adopted in gene-centric analysis. However, its sensitivity and specificity are usually low, and most researchers are not fully aware of this drawback. We therefore address several questions to identify parameters that affect the performance of blastx. To mimic sequence-based metagenomic studies, we used the Escherichia coli BL21-Gold(DE3)pLysS AG genome and simulated high-throughput sequencing in silico, followed by blastx analysis. First, we tested whether sequence length affects blastx performance. Second, we tested whether the e-value affects blastx performance. Third, we checked whether gene density is a critical factor affecting blastx performance. Our current results show that sequence length, e-value, and gene density are important factors affecting the sensitivity and specificity of blastx.
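
The evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the procedure used in this study, whose details are not given here. It assumes that simulated reads are fixed-length fragments drawn uniformly at random from the genome, that a read counts as truly coding when it overlaps an annotated gene interval, and that a read counts as predicted coding when blastx reports at least one hit under the chosen e-value cutoff. The function names `simulate_reads`, `overlaps_gene`, and `sensitivity_specificity` are hypothetical, and the blastx run itself is not shown.

```python
import random


def simulate_reads(genome, read_length, n_reads, seed=0):
    """Draw fixed-length fragments uniformly at random from a genome string.

    Assumption: error-free reads from the forward strand only. Real
    sequencing simulators also model errors and strand.
    Returns a list of (start_position, read_sequence) tuples.
    """
    if read_length > len(genome):
        raise ValueError("read_length exceeds genome length")
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_length + 1)
        reads.append((start, genome[start:start + read_length]))
    return reads


def overlaps_gene(start, read_length, genes):
    """Return True if the read [start, start + read_length) overlaps any
    gene given as a half-open interval (gene_start, gene_end)."""
    end = start + read_length
    return any(start < g_end and g_start < end for g_start, g_end in genes)


def sensitivity_specificity(truth, predicted):
    """Compute (sensitivity, specificity) from paired boolean labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    A ratio with a zero denominator is returned as NaN.
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Toy genome and gene annotation, purely illustrative.
    toy_genome = "ACGT" * 250
    toy_genes = [(100, 400), (600, 900)]
    reads = simulate_reads(toy_genome, read_length=50, n_reads=20, seed=1)
    truth = [overlaps_gene(s, 50, toy_genes) for s, _ in reads]
    # Placeholder predictions; in practice these would come from blastx hits.
    predicted = truth[:]
    print(sensitivity_specificity(truth, predicted))
```

Under these assumptions, repeating the calculation for different values of `read_length`, different e-value cutoffs, and genomes with different gene densities would show how each factor shifts the two measures.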