Abstract
This paper presents a novel framework for the image captioning task, the global-local feature attention network with reranking strategy (GLAN-RS). Rather than adopting only unitary visual information as classical models do, GLAN-RS uses an attention mechanism to capture local convolutional salient image maps. Furthermore, we adopt a reranking strategy to adjust the priority of the candidate captions and select the best one. The proposed model is verified on the Microsoft Common Objects in Context (MSCOCO) benchmark dataset across seven standard evaluation metrics. Experimental results show that GLAN-RS significantly outperforms state-of-the-art approaches such as the multimodal recurrent neural network (MRNN) and Google NIC, with an improvement of 20% in BLEU-4 score and 13 points in CIDEr score.
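Publication date
June 16, 2017 (date of first online release on the China Journal Net platform; this does not represent the paper's publication date)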