Search Service

Hosting of content is important, but content is only useful if it can be found. At the core of the DLCS is the ability to search text (such as OCR or transcriptions) and search annotations on content.

Annotations are a powerful and flexible way of adding rich description, structure, and relationships to, and between, pieces of content. Commenting, tagging, transcribing, linking, and classifying are all operations that can be effectively modelled as annotations.

Search across annotations, together with faceted aggregation of annotations, is therefore a key part of building a discovery interface that supports both search and serendipitous browsing across content.

The Mathmos (GitHub), the DLCS Search Service, offers a IIIF Content Search API-compliant search service that can be used to build flexible discovery experiences.

Benefits of the DLCS Search Service

  1. Support for the IIIF Content Search API, which means support for a range of open-source client applications.
  2. Search of annotations (tags, comments, transcriptions, etc.), not just text.
  3. Integration with other DLCS services such as the Text Service and Annotation Server.

Technical Overview

The DLCS Annotation/Text Search Service comprises two core services: indexing and search.

Indexing:

Annotation Indexer (Pygar):

Reads messages from an SQS queue containing W3C/OA Annotations to be created, updated or deleted. These annotations are then indexed or deleted in Elasticsearch.
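The create/update/delete handling described above can be sketched as a small dispatch function. This is a minimal illustration, not the Pygar implementation: the message shape (an "action" field plus an "annotation" field), the index name, and the returned operation dictionary are all assumptions made for the example. A real indexer would pass the resulting operation to its queue and Elasticsearch clients.

```python
import json


def plan_index_operation(message_body: str, index: str = "annotations") -> dict:
    """Translate one queue message into a single index or delete operation.

    The message layout used here is hypothetical:
    {"action": "create" | "update" | "delete", "annotation": {...}}
    """
    message = json.loads(message_body)
    action = message["action"]          # hypothetical field name
    annotation = message["annotation"]  # hypothetical field name

    # W3C Web Annotations use "id"; Open Annotation (OA) uses "@id".
    annotation_id = annotation.get("id") or annotation.get("@id")
    if annotation_id is None:
        raise ValueError("annotation has no id")

    if action in ("create", "update"):
        # Creating and updating both (re)index the full annotation document.
        return {"op": "index", "index": index, "id": annotation_id,
                "document": annotation}
    if action == "delete":
        return {"op": "delete", "index": index, "id": annotation_id}
    raise ValueError(f"unknown action: {action!r}")


example = json.dumps({
    "action": "create",
    "annotation": {"@id": "https://example.org/anno/1", "@type": "oa:Annotation"},
})
print(plan_index_operation(example)["op"])
```

Keeping the decision logic separate from the queue and search clients makes it easy to test without any network access.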

Text Indexer (Barbarella):

Reads messages from an SQS queue containing text from images and indexes that text in Elasticsearch. Typically the Text Indexer will receive full text for images from pre-existing OCR text or from newly generated OCR text provided by the Text Service, but it can also index manually created or crowd-sourced transcriptions.

Search:

Accepts search requests conforming to the IIIF Content Search API, and can return search results formatted according to either the W3C Web Annotation model or the Open Annotation model. These results can be used to create annotations and highlights on DLCS resources.
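As an illustration of what such a request looks like, the sketch below builds a search URL using the "q" and "motivation" query parameters defined by the IIIF Content Search API. The service URL is a placeholder, not a real DLCS endpoint, and the helper name is made up for the example.

```python
from urllib.parse import urlencode


def build_search_url(service_url: str, query: str, motivations=None) -> str:
    """Build a IIIF Content Search style request URL.

    service_url is a hypothetical endpoint; q and motivation are the
    parameter names used by the IIIF Content Search API.
    """
    params = {"q": query}
    if motivations:
        # Multiple motivations are sent as one space-separated value.
        params["motivation"] = " ".join(motivations)
    return f"{service_url}?{urlencode(params)}"


url = build_search_url(
    "https://example.org/search/manifest-1",  # placeholder service URL
    "bird cage",
    motivations=["painting", "tagging"],
)
print(url)
```

Because the request is a plain GET with query parameters, any HTTP client or IIIF viewer that understands the Content Search API can issue it.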

The server can provide autocomplete on W3C and OA annotation searches. Results for annotation searches can be returned as paginated Annotation Lists when required; pagination for full-text search is on the development roadmap.
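A paginated Annotation List can be sketched roughly as follows. The field names ("within", "total", "first", "last", "next", "prev", "startIndex") loosely follow the paging structure of version 1 of the IIIF Content Search API; the "page" query parameter, the page size, and the function itself are assumptions for illustration, since paging URLs are chosen by the server.

```python
import math


def page_of_results(annotations, search_url, page, page_size=10):
    """Return one page of results shaped like a paged sc:AnnotationList.

    The page=N query parameter is a hypothetical paging scheme.
    """
    total = len(annotations)
    last_page = max(1, math.ceil(total / page_size))
    start = (page - 1) * page_size  # zero-based index of the first hit on this page
    separator = "&" if "?" in search_url else "?"

    def page_url(number):
        return f"{search_url}{separator}page={number}"

    result = {
        "@id": page_url(page),
        "@type": "sc:AnnotationList",
        "within": {
            "@type": "sc:Layer",
            "total": total,
            "first": page_url(1),
            "last": page_url(last_page),
        },
        "startIndex": start,
        "resources": annotations[start:start + page_size],
    }
    # Only link to neighbouring pages that actually exist.
    if page > 1:
        result["prev"] = page_url(page - 1)
    if page < last_page:
        result["next"] = page_url(page + 1)
    return result


hits = [{"@id": f"https://example.org/anno/{i}"} for i in range(25)]
second_page = page_of_results(hits, "https://example.org/search?q=bird", page=2)
print(second_page["startIndex"], len(second_page["resources"]))
```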

The search server can search full text (from the Text Server, Starsky) and annotations (from the Annotation Server, Elucidate). The search service makes use of the DLCS Text Server’s coordinate service to turn search hits into bounding boxes that can be used to create annotations on images (or other media).
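To make the last step concrete, the sketch below turns a matched piece of text and its bounding box into an Open Annotation that targets a region of a canvas with an "#xywh=" fragment. The function name, identifiers, and input shape are hypothetical; the actual response format of the Starsky coordinate service is not shown here.

```python
def hit_to_annotation(annotation_id, canvas_uri, text, box):
    """Wrap one search hit as an OA-style annotation on a canvas region.

    box is an (x, y, width, height) tuple in canvas pixel coordinates,
    assumed to have been supplied by a coordinate lookup.
    """
    x, y, width, height = box
    return {
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {"@type": "cnt:ContentAsText", "chars": text},
        # The media fragment selects the rectangle to highlight.
        "on": f"{canvas_uri}#xywh={x},{y},{width},{height}",
    }


annotation = hit_to_annotation(
    "https://example.org/anno/hit-1",  # placeholder identifier
    "https://example.org/canvas/1",    # placeholder canvas URI
    "bird",
    (100, 200, 50, 20),
)
print(annotation["on"])
```

A viewer can then draw the highlight directly from the "on" value without needing to know how the hit was located.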

Roadmap:

  • Autocomplete on non-annotation sources.
  • Integration with DLCS Structure Service.
  • Unified search across text and annotations.
  • Pagination of text search results.