Speaker adapted dynamic lexicons containing phonetic deviations of words

(Whole-issue priority) Online publication date: 2009-10-20
Speaker variability is an important source of speech variations which makes continuous speech recognition a difficult task. Adapting automatic speech recognition (ASR) models to the speaker variations is a well-known strategy to cope with the challenge. Almost all such techniques focus on developing adaptation solutions within the acoustic models of the ASR systems. Although variations of the acoustic features constitute an important portion of the inter-speaker variations, they do not cover variations at the phonetic level. Phonetic variations are known to form an important part of variations which are influenced by both micro-segmental and suprasegmental factors. Inter-speaker phonetic variations are influenced by the structure and anatomy of a speaker's articulatory system and also his/her speaking style which is driven by many speaker background characteristics such as accent, gender, age, socioeconomic and educational class. The effect of inter-speaker variations in the feature space may cause explicit phone recognition errors. These errors can be compensated later by having appropriate pronunciation variants for the lexicon entries which consider likely phone misclassifications besides pronunciation. In this paper, we introduce speaker adaptive dynamic pronunciation models, which generate different lexicons for various speaker clusters and different ranges of speech rate. The models are hybrids of speaker adapted contextual rules and dynamic generalized decision trees, which take into account word phonological structures, rate of speech, unigram probabilities and stress to generate pronunciation variants of words. Employing the set of speaker adapted dynamic lexicons in a Farsi (Persian) continuous speech recognition task results in word error rate reductions of as much as 10.1% in a speaker-dependent scenario and 7.4% in a speaker-independent scenario.
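
To make the idea of a lexicon that depends on speaker cluster and speech rate more concrete, the following is a minimal Python sketch of the lookup step only. It does not reproduce the contextual rules or decision trees described above. The rate bin edges, the cluster names, the functions `rate_bin` and `variants`, and the toy entries for the word "ketab" (including its variant and probabilities) are all assumptions invented for illustration, not data from the paper.

```python
from bisect import bisect_right

# Hypothetical speech-rate bin edges in phones per second (assumed values).
RATE_EDGES = [10.0, 14.0]
RATE_LABELS = ["slow", "normal", "fast"]


def rate_bin(phones_per_second):
    """Map a measured speech rate to one of the assumed rate ranges."""
    return RATE_LABELS[bisect_right(RATE_EDGES, phones_per_second)]


# Canonical lexicon: one pronunciation per word (toy entry, invented).
CANONICAL = {"ketab": [("k e t A b", 1.0)]}

# Adapted lexicons keyed by (speaker cluster, rate range). Each entry lists
# (phone string, probability) pairs. All values here are invented.
ADAPTED = {
    ("cluster_a", "fast"): {
        "ketab": [("k e t A b", 0.6), ("k e t A p", 0.4)],
    },
}


def variants(word, cluster, phones_per_second):
    """Return pronunciation variants for a word, preferring the lexicon
    adapted to this speaker cluster and rate range, and falling back to
    the canonical lexicon when no adapted entry exists."""
    key = (cluster, rate_bin(phones_per_second))
    adapted = ADAPTED.get(key, {})
    return adapted.get(word, CANONICAL.get(word, []))
```

In this sketch, a decoder would call `variants` once per word after estimating the speaker cluster and the speech rate of the utterance, so the set of candidate pronunciations changes with both factors.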