Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight

Abstract: With the advent of the big data era, the amounts of sampling data and the dimensions of data features are rapidly growing. It is highly desirable to enable fast and efficient clustering of unlabeled samples based on feature similarities. As a fundamental primitive for data clustering, the k-means operation is receiving increasingly more attention today. To achieve high-performance k-means computations on modern multi-core/many-core systems, we propose a matrix-based fused framework that can achieve high performance by conducting computations on a distance matrix and at the same time can improve memory reuse through the fusion of the distance-matrix computation and the nearest-centroids reduction. We implement and optimize the parallel k-means algorithm on the SW26010 many-core processor, which is the major horsepower of Sunway TaihuLight. In particular, we design a task mapping strategy for load-balanced task distribution, a data sharing scheme to reduce the memory footprint, and a register blocking strategy to increase data locality. Optimization techniques such as instruction reordering and double buffering are further applied to improve the sustained performance. Discussions on block-size tuning and performance modeling are also presented. We show by experiments on both randomly generated and real-world datasets that our parallel implementation of k-means on SW26010 can sustain a double-precision performance of over 348.1 Gflops, which is 46.9% of the peak performance and 84% of the theoretical performance upper bound on a single core group, and can achieve nearly ideal scalability to the whole SW26010 processor of four core groups. Performance comparisons with the previous state-of-the-art on both CPU and GPU are also provided to show the superiority of our optimized k-means kernel.
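The fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' SW26010 kernel: it is plain Python, the function name `fused_assign` and the tile sizes `row_block` and `col_block` are hypothetical, and it models none of the task mapping, data sharing, register blocking, or double buffering mentioned above. It only shows the general technique of computing the distance matrix tile by tile, using the expansion ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, and reducing each tile to a running nearest-centroid index immediately, so the full n-by-k distance matrix is never stored.

```python
def fused_assign(points, centroids, row_block=2, col_block=2):
    """Return the index of the nearest centroid for every point.

    Distances are evaluated tile by tile and reduced on the fly, so only
    one running minimum per point is kept instead of an n-by-k matrix.
    row_block and col_block are illustrative tile sizes (assumptions).
    """
    n, k = len(points), len(centroids)
    # ||c||^2 is computed once per centroid and reused for every point.
    c_norms = [sum(v * v for v in c) for c in centroids]
    best_dist = [float("inf")] * n
    best_idx = [-1] * n

    for r0 in range(0, n, row_block):          # tile over points
        for c0 in range(0, k, col_block):      # tile over centroids
            for i in range(r0, min(r0 + row_block, n)):
                x = points[i]
                for j in range(c0, min(c0 + col_block, k)):
                    dot = sum(a * b for a, b in zip(x, centroids[j]))
                    # ||x||^2 is the same for every centroid, so it can be
                    # dropped without changing which centroid is nearest.
                    d = c_norms[j] - 2.0 * dot
                    # Fused reduction: update the running minimum now
                    # instead of writing d into a distance matrix.
                    if d < best_dist[i]:
                        best_dist[i] = d
                        best_idx[i] = j
    return best_idx


if __name__ == "__main__":
    pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    cts = [[0.0, 0.0], [10.0, 10.0]]
    print(fused_assign(pts, cts))
```

In a matrix-based formulation the inner dot products form a matrix-matrix product between the sample tile and the centroid tile, which is where most of the floating-point work sits; the sketch keeps them as explicit loops only for readability.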
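Affiliation: not specified
Publication date: January 11, 2019 (date first posted online on the 中国期刊网 platform; does not represent the paper's publication date)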