

    Please use this identifier to cite or link to this item: http://140.128.103.80:8080/handle/310901/26510


    Title: 在Xeon Phi上使用平行迴圈自我排程改善工作負載平衡
    Other Titles: Improvement of Workload Balancing Using Parallel Loop Self-Scheduling on Xeon Phi
    Authors: 黃昭為
    HUANG CHAO-WEI
    Contributors: 楊朝棟
    YANG CHAO-TUNG
    Department of Computer Science (資訊工程學系)
    Keywords: Parallel Loop; MPI; OpenMP; Many-core; Xeon Phi; Self-Scheduling
    Date: 2015
    Issue Date: 2015-12-24T03:48:29Z (UTC)
    Abstract: This thesis examines how to improve workload balancing on a computing cluster using parallel loop self-scheduling. The programs are written in C with hybrid MPI and OpenMP parallel programming, and loop iterations are partitioned into blocks according to the performance weight of each compute node. We implement parallel loop self-scheduling on Xeon Phi and use its properties to improve workload balancing among heterogeneous nodes. The scheme consists of a static part and a dynamic part: the static part allocates work with a weight-based algorithm, while the dynamic part uses several well-known self-scheduling schemes. Intel's recently introduced Xeon Phi is a coprocessor with an x86-like architecture; it has about 60 cores, can be treated as a single compute node, and offers considerable computing power. Our experiments use multiple compute nodes and four applications: matrix multiplication, sparse matrix multiplication, Mandelbrot set computation, and the circuit satisfiability problem. The results show which weight allocation and which scheduling scheme achieve the best performance under parallel loop self-scheduling.
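The two-part scheme described in the abstract can be illustrated with a minimal C sketch. This is not the thesis code: the function names, the `alpha` parameter (the fraction of iterations handed out statically), and the choice of guided self-scheduling (GSS) as the dynamic scheme are assumptions made for illustration, since the abstract does not name the specific schemes it evaluates. MPI communication and OpenMP threading are omitted so that only the chunk arithmetic is shown.

```c
#include <assert.h>

/* Static part (hypothetical helper): hand out the first `alpha` fraction of
 * the loop in proportion to each node's performance weight.
 * chunk[i] receives the iteration count for node i; the return value is the
 * number of iterations left for the dynamic part. */
long static_partition(long total, double alpha, const double *weight,
                      int nodes, long *chunk)
{
    assert(nodes > 0 && alpha >= 0.0 && alpha <= 1.0);

    double weight_sum = 0.0;
    for (int i = 0; i < nodes; i++)
        weight_sum += weight[i];
    assert(weight_sum > 0.0);

    long assigned = 0;
    for (int i = 0; i < nodes; i++) {
        /* Truncate toward zero; any rounding leftovers stay in the dynamic pool. */
        chunk[i] = (long)(alpha * (double)total * weight[i] / weight_sum);
        assigned += chunk[i];
    }
    return total - assigned;
}

/* Dynamic part (assumed scheme: guided self-scheduling): each request from an
 * idle node receives ceil(remaining / nodes) iterations, so chunks shrink as
 * the loop nears completion. Returns 0 when no work is left. */
long gss_next_chunk(long remaining, int nodes)
{
    assert(nodes > 0);
    if (remaining <= 0)
        return 0;
    return (remaining + nodes - 1) / nodes;
}
```

In a hybrid setting, a master rank would typically call `static_partition` once, send each worker its chunk, and then answer later work requests with `gss_next_chunk` until the remaining count reaches zero, while each worker runs its chunk in an OpenMP parallel loop. That division of roles is likewise an assumption, not a detail taken from the abstract.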
    Appears in Collections: [Department of Computer Science (資訊工程學系所)] Master's Theses

    Files in This Item:

    File                      Size      Format
    103THU00394021-001.pdf    2376Kb    Adobe PDF


    All items in THUIR are protected by copyright, with all rights reserved.


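    The digital content of the Tunghai University Institutional Repository on this website is provided free of charge for public-interest uses such as academic research and public education. Please use the content of this website moderately and reasonably, out of respect for the rights of the copyright holders. For commercial use, please first obtain authorization from the copyright holder.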
