Abstract: | Stroke is one of the ten leading causes of death in Taiwan and a major cause of disability; only 15% of patients recover completely. This study therefore analyzes Taiwan's National Health Insurance database to build disease models associated with stroke and to map the relationships among the related diseases. We use the GA-PLS method to improve predictive accuracy and add association rules and Bayesian networks. Association rules, commonly used in business to study relationships between products, are applied here to diseases instead. From the resulting rules we construct a directed acyclic graph and evaluate it as a Bayesian network to measure how accurate the graph is, which makes the findings more convincing. We study two targets: the transformation of ischemic stroke into hemorrhagic stroke, which has fewer samples, and the factors leading to stroke in general, which has more samples, to examine whether good results can be obtained on datasets of different sizes. The data are patient visit records in the National Health Insurance database from 2000 to 2013. Using ICD-9-CM codes, we select patients who have had a stroke and patients who have not.
For patients who have had a stroke, we keep only the visit records from before the stroke and use diseases as variables to find which diseases make a patient more likely to have a stroke. On the ischemic-to-hemorrhagic data, GA-PLS reaches a prediction rate of 0.855; the influential diseases include dizziness, constipation, chronic renal failure, hypertension, diabetes, hyperlipidemia, anxiety, muscle pain, and prostatic hypertrophy. On the stroke data, GA-PLS reaches a prediction rate of 0.828, and the Bayesian network attains an average area under the ROC curve of 0.8274; the influential diseases include gastrointestinal diseases, other upper respiratory diseases, skin diseases, diseases of the oral cavity, salivary glands, and jaws, hypertension, and heart disease. Compared with the stroke-related diseases currently reported in the medical literature, the diseases we identify are consistent with medical findings. We also find diseases that are not on that list yet are highly correlated: dizziness, constipation, and chronic renal failure for ischemic-to-hemorrhagic transformation, and diseases of the oral cavity, salivary glands, and jaws and diseases of the esophagus, stomach, and duodenum for stroke. Future work can study these diseases in more depth to explore why they are related. |
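
The cohort construction and the association-rule idea described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation. The function names, the toy visit records, and the use of ICD-9-CM codes 430-438 (cerebrovascular disease) as the stroke definition are all assumptions made for illustration; the abstract does not list the exact codes used. The sketch keeps only diagnoses recorded before a patient's first stroke, then scores a rule of the form "disease set → stroke" by support, confidence, and lift.

```python
from datetime import date

# Assumption: 3-digit ICD-9-CM categories 430-438 mark a stroke visit.
# The abstract does not state which codes the study actually used.
STROKE_PREFIXES = {str(c) for c in range(430, 439)}


def is_stroke(code):
    """True if the code's 3-digit category falls in the assumed stroke range."""
    return code[:3] in STROKE_PREFIXES


def pre_stroke_diseases(visits):
    """Return (disease categories seen before the first stroke, had_stroke).

    visits: list of (visit_date, icd9_code) tuples in any order.
    For patients without a stroke, all diagnoses are kept.
    """
    stroke_dates = [d for d, c in visits if is_stroke(c)]
    first = min(stroke_dates) if stroke_dates else None
    diseases = {
        c[:3]
        for d, c in visits
        if not is_stroke(c) and (first is None or d < first)
    }
    return diseases, first is not None


def rule_stats(patients, antecedent):
    """Support, confidence, and lift of the rule: antecedent -> stroke."""
    n = len(patients)
    outcomes = [stroke for ds, stroke in patients if antecedent <= ds]
    n_a = len(outcomes)          # patients having every antecedent disease
    n_both = sum(outcomes)       # ... who also had a stroke
    n_stroke = sum(stroke for _, stroke in patients)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_stroke / n) if n_stroke else 0.0
    return support, confidence, lift


# Toy records (hypothetical patients; codes stored without the decimal point).
toy = {
    "p1": [(date(2005, 1, 10), "4019"), (date(2006, 3, 2), "25000"),
           (date(2008, 7, 1), "43491")],
    "p2": [(date(2004, 5, 5), "4019"), (date(2009, 2, 1), "431"),
           (date(2010, 1, 1), "5640")],   # post-stroke visit is dropped
    "p3": [(date(2007, 8, 8), "4019")],
    "p4": [(date(2011, 4, 4), "25000")],
}
patients = [pre_stroke_diseases(v) for v in toy.values()]
support, confidence, lift = rule_stats(patients, {"401"})
print(support, confidence, lift)
```

On real data, rules with high confidence and lift would be the candidates for edges in the directed acyclic graph; how the study selected and oriented those edges is not specified in the abstract.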