Abstract: In this paper, a new method, named L-tree match, is presented for extracting data from complex data sources. Firstly, based on the data extraction logic presented in this work, a new data extraction model is constructed in which model components are structurally correlated via a generalized template. Secondly, a database-populating mechanism is built, along with some object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Thirdly, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from data sources with optional, unordered, nested, and/or noisy components. Lastly, this method is applied to extract accurate data from biological documents amounting to 100 GB for the first online integrated biological data warehouse of China.
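
To make the idea of a template that tolerates optional, unordered, nested, and noisy components more concrete, the following is a minimal illustrative sketch in Python. It is not the L-tree match algorithm itself, whose details are not given in this abstract. The `Node` class, the `extract` function, the line tags (`ID`, `NAME`, `REF`, `CC`), and the sample record are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One component of a hypothetical extraction template (a tree)."""
    name: str
    pattern: Optional[str] = None   # regex; capture group 1 is the payload
    optional: bool = False          # a missing optional component is tolerated
    children: List["Node"] = field(default_factory=list)


def extract(node, lines):
    """Extract the value of `node` from `lines`.

    Returns a string for a leaf, a dict for a composite node,
    or None when a required component cannot be found.
    """
    if node.pattern is not None:
        # Scan every line, so component order does not matter and
        # lines that match no pattern (noise) are simply ignored.
        hits = [m.group(1).strip()
                for m in (re.match(node.pattern, ln) for ln in lines) if m]
        if not hits:
            return None
        if not node.children:
            return hits[0]          # leaf: first matching payload
        lines = hits                # nested: children see only the payloads
    record = {}
    for child in node.children:     # the template drives the search
        value = extract(child, lines)
        if value is None:
            if not child.optional:
                return None         # a required component is missing
            continue
        record[child.name] = value
    return record


# Hypothetical template and record, for illustration only.
template = Node("entry", children=[
    Node("id", r"ID\s+(.*)"),
    Node("name", r"NAME\s+(.*)"),
    Node("ref", r"REF\s+(.*)", children=[
        Node("author", r"AUTHOR\s+(.*)"),
        Node("title", r"TITLE\s+(.*)", optional=True),
    ]),
    Node("comment", r"CC\s+(.*)", optional=True),
])

text = """NAME Hemoglobin
## noise line that matches nothing
REF  AUTHOR Smith J.
ID   P12345"""

result = extract(template, text.splitlines())
print(result)
```

In this sketch the record lists its fields out of template order, contains a noise line, and omits the optional `title` and `comment` components, yet extraction still succeeds. Removing the required `ID` line makes `extract` return `None` instead.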