Abstract
With the advent of the big data era, the amount of sampled data and the dimensionality of data features are growing rapidly. Fast and efficient clustering of unlabeled samples based on feature similarities is therefore highly desirable. As a fundamental primitive for data clustering, the k-means operation is receiving increasing attention. To achieve high-performance k-means computations on modern multi-core/many-core systems, we propose a matrix-based fused framework that achieves high performance by conducting computations on a distance matrix and, at the same time, improves memory reuse by fusing the distance-matrix computation with the nearest-centroid reduction. We implement and optimize the parallel k-means algorithm on the SW26010 many-core processor, which is the major horsepower of Sunway TaihuLight. In particular, we design a task mapping strategy for load-balanced task distribution, a data sharing scheme to reduce the memory footprint, and a register blocking strategy to increase data locality. Optimization techniques such as instruction reordering and double buffering are further applied to improve the sustained performance. Discussions on block-size tuning and performance modeling are also presented. We show by experiments on both randomly generated and real-world datasets that our parallel implementation of k-means on SW26010 can sustain a double-precision performance of over 348.1 Gflops, which is 46.9% of the peak performance and 84% of the theoretical performance upper bound on a single core group, and can achieve nearly ideal scalability to the whole SW26010 processor of four core groups. Performance comparisons with the previous state of the art on both CPU and GPU are also provided to show the superiority of our optimized k-means kernel.
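To make the idea of fusing the distance-matrix computation with the nearest-centroid reduction concrete, the following is a minimal illustrative sketch in plain Python. It is not the SW26010 implementation described above. It assumes the common expansion ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2 (the abstract does not state which formulation is used), and the function name `assign_fused` and the block sizes `block_n` and `block_k` are hypothetical.

```python
def assign_fused(samples, centroids, block_n=4, block_k=2):
    """Return, for every sample, the index of its nearest centroid.

    Illustrative only: the distance matrix is visited tile by tile and
    each tile is reduced immediately, so the full n-by-k matrix is never
    stored. Block sizes are hypothetical tuning parameters.
    """
    n, k = len(samples), len(centroids)
    # ||c||^2 is computed once and reused by every sample block.
    c_norm = [sum(v * v for v in c) for c in centroids]
    best_dist = [float("inf")] * n
    best_idx = [-1] * n
    for i0 in range(0, n, block_n):          # block of samples
        for j0 in range(0, k, block_k):      # block of centroids
            for i in range(i0, min(i0 + block_n, n)):
                x = samples[i]
                for j in range(j0, min(j0 + block_k, k)):
                    dot = sum(a * b for a, b in zip(x, centroids[j]))
                    # ||x||^2 is the same for every centroid, so it is
                    # dropped: it does not change which centroid wins.
                    d = c_norm[j] - 2.0 * dot
                    # Fused reduction: update the running minimum here
                    # instead of writing d into a distance matrix.
                    if d < best_dist[i]:
                        best_dist[i] = d
                        best_idx[i] = j
    return best_idx


if __name__ == "__main__":
    pts = [[0, 0], [1, 1], [9, 9], [10, 10], [0, 1], [5, 6]]
    cts = [[0, 0], [10, 10], [5, 5]]
    print(assign_fused(pts, cts))
```

In an optimized kernel the inner dot products of a tile would be computed as a small matrix multiplication held in registers or local memory; the sketch only shows why fusing the reduction avoids materializing the full distance matrix.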
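Publication date
January 11, 2019 (date first posted online on the China Journal Network (中国期刊网) platform; not the date the paper was published)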