The Semantic Extraction Service is part of the DLCS Roadmap. Proof-of-concept and ‘alpha’ services are in progress.
The Search Service can search free text and annotations across any object within the DLCS. However, it only helps users who already know which words to search for.
More powerful applications of search can be built if the information that the Search Service indexes can be combined with services that know richer information about the semantics of the content within the text.
The Semantic Extraction Service uses natural language processing tools to extract information about entities (people, places, dates, organisations, and so on) from the full text provided for images by the Text Service.
Custom matchers can also extract custom entities, or draw on the custom taxonomies and vocabularies required for particular use cases.
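As an illustration of what a custom matcher might look like, the following is a minimal sketch that finds terms from a small controlled vocabulary in a block of text. The names EntityMatch and match_vocabulary, and the sample vocabulary, are hypothetical and are not part of the DLCS API; a real matcher would likely sit alongside a full natural language processing pipeline.

```python
import re
from typing import Dict, List, NamedTuple


class EntityMatch(NamedTuple):
    """One vocabulary term found in the text (hypothetical structure)."""
    label: str   # vocabulary category, e.g. "occupation"
    term: str    # the text as it appeared in the source
    start: int   # character offset where the match begins
    end: int     # character offset just past the match


def match_vocabulary(text: str, vocabulary: Dict[str, List[str]]) -> List[EntityMatch]:
    """Find whole-word, case-insensitive occurrences of vocabulary terms."""
    matches = []
    for label, terms in vocabulary.items():
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                matches.append(EntityMatch(label, m.group(0), m.start(), m.end()))
    # Return matches in reading order.
    return sorted(matches, key=lambda match: match.start)


if __name__ == "__main__":
    sample_vocabulary = {
        "occupation": ["cooper", "wheelwright"],
        "place": ["Leith"],
    }
    for found in match_vocabulary(
        "The cooper and the wheelwright lived in Leith.", sample_vocabulary
    ):
        print(found)
```

The character offsets are kept so that a later step could look up where on the image each term appears.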
The Semantic Extraction Service can use the coordinate service of the Text Service to turn entities extracted from text into tagging annotations that comply with the W3C Web Annotation Model. These annotations can be stored in the Annotation Server and indexed by the Search Service.
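To make the shape of such an annotation concrete, here is a minimal sketch that builds a tagging annotation as JSON-LD following the W3C Web Annotation Data Model. The function name tagging_annotation and the example URI are hypothetical, the bounding box is assumed to have already been obtained from the Text Service, and the annotation id is omitted on the assumption that the server storing it assigns one.

```python
import json
from typing import Any, Dict


def tagging_annotation(
    target_uri: str, entity_label: str, x: int, y: int, w: int, h: int
) -> Dict[str, Any]:
    """Build a Web Annotation that tags a rectangular region with an entity label."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "tagging",
        "body": {
            "type": "TextualBody",
            "value": entity_label,
            "purpose": "tagging",
        },
        "target": {
            "type": "SpecificResource",
            "source": target_uri,
            "selector": {
                # Media fragment selecting the region where the text appears.
                "type": "FragmentSelector",
                "conformsTo": "http://www.w3.org/TR/media-frags/",
                "value": f"xywh={x},{y},{w},{h}",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical target URI and coordinates, for illustration only.
    annotation = tagging_annotation(
        "https://example.org/canvas/1", "Edinburgh", 120, 340, 210, 48
    )
    print(json.dumps(annotation, indent=2))
```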
Semantic information extracted in this way can drive rich discovery interfaces that would not be possible with raw text searching.
For example, geographic entities extracted from text can be used to place objects on maps, and dates extracted from text can be used to place items in timelines.
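The timeline case can be sketched in a few lines. This example assumes, hypothetically, that each extracted date has already been normalised to an ISO 8601 string and paired with the identifier of the object it came from; build_timeline is an illustrative name and not a DLCS function.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def build_timeline(extracted: Iterable[Tuple[str, str]]) -> Dict[int, List[str]]:
    """Group object identifiers by the year of their extracted date.

    `extracted` yields (object_id, iso_date) pairs, e.g. ("obj-1", "1851-05-01").
    The result is ordered by year, ready to be rendered as a timeline.
    """
    by_year: Dict[int, List[str]] = defaultdict(list)
    for object_id, iso_date in extracted:
        year = date.fromisoformat(iso_date).year
        by_year[year].append(object_id)
    return {year: by_year[year] for year in sorted(by_year)}


if __name__ == "__main__":
    print(build_timeline([
        ("obj-b", "1851-05-01"),
        ("obj-a", "1848-02-24"),
        ("obj-c", "1851-10-15"),
    ]))
```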
For more information about Digirati’s work in this area, see:
The ‘alpha’ phase of development on the Semantic Extraction Service is also exploring machine-generated links to linked data sources such as DBpedia, GeoNames, and VIAF. Work is also ongoing on basic taxonomy management and custom vocabularies for entity extraction.