The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. To address this problem, we tried to identify repeats and mask them prior to assembly, even at the stage of genome survey. It is known that repeats of different copy number have different probabilities of appearance in shotgun data, so based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, the speed of assembly was increased and the error probability was lowered. In addition, clone-insert size affects the accuracy of repeat assembly and scaffold construction. We also designed the length distribution of clone-inserts using our model. In our simulated genomes of human and rice, the length distributions of repeats are different, so their optimal length distributions of clone-inserts were not the same. Thus, with an optimal length distribution of clone-inserts, a given genome could be assembled better at lower coverage.
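The idea that copy number changes how often a sequence appears in shotgun data can be illustrated with a small sketch. This is not the MDRmasker model, whose details are not given here. It is a simplified stand-in that assumes k-mer counts from unique sequence follow a Poisson distribution with mean `coverage * (read_len - k + 1) / read_len`, and flags any k-mer seen more often than that model makes plausible. All function names, the significance level `alpha`, and the mask character are hypothetical choices made for illustration. Reverse-complement strands and sequencing errors are ignored.

```python
import math
from collections import Counter


def poisson_tail(lam, t):
    """Return P(X >= t) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(t))
    return max(0.0, 1.0 - cdf)


def repeat_threshold(coverage, read_len, k, alpha=0.001):
    """Smallest count t that a unique k-mer reaches with probability < alpha.

    Assumption: a single-copy k-mer is sampled with Poisson mean
    coverage * (read_len - k + 1) / read_len.
    """
    lam = coverage * (read_len - k + 1) / read_len
    t = 1
    while poisson_tail(lam, t) >= alpha:
        t += 1
    return t


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_reads(reads, k, threshold, mask_char="N"):
    """Replace bases covered by any k-mer with count >= threshold."""
    counts = count_kmers(reads, k)
    masked = []
    for read in reads:
        chars = list(read)
        for i in range(len(read) - k + 1):
            if counts[read[i:i + k]] >= threshold:
                chars[i:i + k] = mask_char * k
        masked.append("".join(chars))
    return masked


if __name__ == "__main__":
    # Toy reads; a real survey would use many reads and a longer k.
    reads = ["ACGTACGTAA", "TTTTGGGGCC"]
    print(mask_reads(reads, k=4, threshold=2))
    print(repeat_threshold(coverage=0.5, read_len=500, k=20))
```

Under this assumption the threshold rises with coverage, which mirrors the abstract's point that the criteria for calling a repeat depend on the shotgun coverage at hand.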