The DLCS incorporates a number of ingest and orchestration services that manage the flow of images, audio, video, and text into the DLCS and integrate content across its component services.
Benefits of DLCS ingest architecture
The DLCS ingest and orchestration process takes care of:
- Upload of content: images, audio, video, and text
- Transformation of content
- Long-term storage of content
In this way the DLCS ensures that files are in the correct formats and available for:
- External delivery via the service layer
- Integration with other DLCS services
The orchestration process manages the pipeline connecting initial content ingest, transformation, and storage with other DLCS services such as image and media delivery, text indexing, semantic extraction, and content search.
Users of the DLCS don’t need to manage these complex processes; a simple API lets them add content while remaining confident that all of the associated tasks happen in a reliable, scalable way.
Ingest into the DLCS can operate in a “push and forget” way: push content now, and when you need it, it will be available in all of the formats and via all of the DLCS services that you require.
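The “push and forget” registration call above might look like the following sketch. The field names and values here are illustrative assumptions, not the documented DLCS API schema:

```python
import json

# Hypothetical payload for registering a single image with the DLCS ingest API.
# Field names ("space", "id", "origin") are illustrative assumptions.
def build_ingest_request(space, image_id, origin):
    """Build a minimal 'push and forget' registration message."""
    return {
        "@type": "Image",
        "space": space,      # customer-defined grouping of images
        "id": image_id,      # identifier the image will be served under
        "origin": origin,    # where the DLCS should fetch the master file
    }

request_body = build_ingest_request(
    space=5,
    image_id="book-1/page-001",
    origin="https://example.org/masters/book-1/page-001.tiff",
)
print(json.dumps(request_body, indent=2))
```

In a real integration this body would be POSTed to the DLCS API; everything downstream (transformation, storage, delivery) then happens without further involvement from the caller.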
Media & Text Ingest: Technical Overview
DLCS accepts images, audio, video, and metadata, and orchestrates the ingest of content via two pipelines:
- media pipeline
- text pipeline
The media transformation services can transform a variety of incoming media into consistent formats for delivery via the DLCS; most video, audio, and image formats are supported as input.
The image transformation service accepts incoming images in a wide range of formats. The media pipeline downloads these images, places them in a shared filesystem, and passes a message to the image transformation service with the location and other requirements. The images are encoded as JPEG2000 files for efficient transformation by the delivery service, and a set of fixed-resolution JPEG files in particular sizes is created for use as thumbnails and to enable very fast image rendering. On completion, the transformation service passes a message back to the pipeline orchestration with the results of the operation.
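The fixed-size derivatives described above can be sketched as a simple fitting calculation. The bounding-box sizes used here (1024, 400, 200, 100) are illustrative assumptions; the actual sizes are a deployment configuration choice:

```python
# Sketch of deriving fixed-size thumbnail dimensions for a source image.
# Bounding boxes (1024, 400, 200, 100) are assumed values, not the DLCS defaults.
def thumbnail_sizes(width, height, boxes=(1024, 400, 200, 100)):
    """Fit the source image into each square bounding box, preserving aspect ratio."""
    sizes = []
    for box in boxes:
        scale = min(box / width, box / height, 1.0)  # never upscale
        sizes.append((round(width * scale), round(height * scale)))
    return sizes

print(thumbnail_sizes(4000, 3000))
# a 4000x3000 master yields (1024, 768), (400, 300), (200, 150), (100, 75)
```

Pre-computing these sizes means thumbnail and fast-rendering requests never touch the JPEG2000 master at all.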
Images are then made available for fast, efficient delivery via the DLCS service delivery layer.
Audio and video transformation
The audio and video transformation service uses AWS Elastic Transcoder to perform the required transcoding. A message-queue-based system translates transcoding requests from the DLCS orchestration service into Elastic Transcoder jobs. Another component listens for responses from Elastic Transcoder and translates them back into DLCS media pipeline responses. Beyond this translation, the transformation service also supports re-encoding updated inputs to the same output location, which Elastic Transcoder does not usually support.
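The translation step might look like the following sketch, which maps a DLCS-side request onto the keyword arguments of Elastic Transcoder's `create_job` call. The DLCS-side message field names are illustrative assumptions:

```python
# Sketch: translate a DLCS media-pipeline request into Elastic Transcoder
# create_job keyword arguments. DLCS message fields here are assumptions.
def to_transcoder_job(message, pipeline_id, preset_map):
    """Map a DLCS transcode request onto Elastic Transcoder job parameters."""
    return {
        "PipelineId": pipeline_id,
        "Input": {"Key": message["source_key"]},
        "Outputs": [
            # Reusing the same output key for an updated input requires the
            # service to remove the previous output first, because Elastic
            # Transcoder will not overwrite an existing object.
            {"Key": message["output_key"] + "." + fmt, "PresetId": preset_map[fmt]}
            for fmt in message["formats"]
        ],
    }

job = to_transcoder_job(
    {"source_key": "in/lecture.mov", "output_key": "out/lecture", "formats": ["mp4"]},
    pipeline_id="pipeline-id",
    preset_map={"mp4": "1351620000001-000010"},  # Elastic Transcoder system preset
)
```

The resulting dict would be passed to `boto3`'s `elastictranscoder` client as `create_job(**job)`; the response listener performs the inverse mapping back into pipeline messages.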
Both the audio/video transformation and image transformation pipelines are highly parallel, and can scale to accept very large collections as input.
Text Server (Starsky) operates at the level of individual images. It accepts IIIF Image API endpoint URIs (which may be existing DLCS images) and, optionally, transcriptions, and exposes a set of services to transform and present the textual content of those images. Additional services at the whole-object, multi-image level are provided by Text Server (River).
Image-to-Text Ingest Architecture
Input to Text Server (Starsky) is via Amazon SQS messages containing the image URI, an optional text transcription, and any other flags or hints that may be useful to the text processing pipeline. These messages can come from a variety of sources, including other parts of the DLCS or an external integration with an existing customer system.
At ingest, if a transcription or OCR full text was not supplied, Text Server (Starsky) retrieves a version of the image and OCRs it to hOCR format. The supplied or generated transcription is then stored (by default in an Amazon S3 bucket). Where possible, an index is also created from the transcription, containing the position and text of each word; this is stored as a blob of JSON data, also in an S3 bucket. A normalised plaintext representation of the transcription is then optionally sent to an SQS queue for further processing by the rest of the text pipeline, the next stage of which is typically the indexer.
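The word-position index described above might be built like this. The exact JSON schema Starsky uses is not specified here, so this layout is an assumption:

```python
import json

# Sketch of a word-position index derived from hOCR word boxes.
# The schema (keys "words", "plaintext", per-word "x"/"y"/"w"/"h") is assumed.
def build_word_index(words):
    """words: list of (text, x, y, width, height) tuples from hOCR parsing."""
    return {
        "words": [
            {"text": t, "x": x, "y": y, "w": w, "h": h}
            for (t, x, y, w, h) in words
        ],
        # normalised plaintext, as sent on to the indexer
        "plaintext": " ".join(t for (t, *_) in words),
    }

index = build_word_index([("Domesday", 120, 80, 310, 42), ("Book", 450, 80, 150, 42)])
print(json.dumps(index))
```

Storing the positions alongside the text is what later allows search hits to be highlighted on the image itself.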
Core per-image functions:
- text ingest and storage (from existing OCR or transcription)
- image-to-text via OCR (if no existing OCR or transcription)
- text indexing and index storage
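An SQS input message for the per-image functions above might look like the following sketch; the field names are illustrative assumptions, not a documented Starsky schema:

```python
import json

# Hypothetical Starsky input message: image URI, optional transcription,
# and processing hints. All field names are assumptions.
message = {
    "imageUri": "https://dlcs.example/iiif-img/2/1/p1",  # IIIF Image API endpoint
    "transcription": None,          # optional existing OCR/transcription text
    "hints": {"language": "en"},    # optional flags for the text pipeline
}
body = json.dumps(message)
print(body)
```

With `transcription` set, Starsky would skip OCR and go straight to storage and indexing; with it empty, as here, the image-to-text path runs first.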
Images-to-Object Integration Architecture
River is implemented as a set of independent services that:
- accept a IIIF Presentation API manifest, split it into individual images, and process those images in parallel through Text Server (Starsky) to generate text transcriptions for entire objects
- use a plugin architecture to process content with Text Server (Starsky) and generate line-by-line transcriptions for an object, which can be returned as an annotation list suitable for viewing in a IIIF-compatible viewer and embedded as otherContent within a IIIF Presentation API manifest
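The manifest-splitting step can be sketched as a walk over a IIIF Presentation 2 manifest, collecting one Image API service URI per canvas; each URI would then be queued for Starsky in parallel. The manifest data here is illustrative:

```python
# Sketch: extract one IIIF Image API service URI per canvas from a
# IIIF Presentation 2 manifest (sequences -> canvases -> images -> resource).
def image_services(manifest):
    uris = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image["resource"].get("service", {})
                uris.append(service.get("@id"))
    return uris

# Illustrative two-page manifest fragment.
manifest = {
    "sequences": [{"canvases": [
        {"images": [{"resource": {"service": {"@id": "https://dlcs.example/iiif-img/2/1/p1"}}}]},
        {"images": [{"resource": {"service": {"@id": "https://dlcs.example/iiif-img/2/1/p2"}}}]},
    ]}]
}
print(image_services(manifest))
```

Because each canvas is independent, the per-image Starsky work fans out trivially, which is what makes whole-object OCR scale with object size.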
Core per-object functions:
- OCR images in parallel for multi-page objects (if no existing text)
- store text for multi-page objects
- index text in parallel for multi-page objects
- generate IIIF annotation lists from stored text and the text index
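The last step above, turning stored line-level text and positions into a IIIF Presentation 2 annotation list, might be sketched as follows. The line data is illustrative, and the stored-index schema it would come from is an assumption:

```python
# Sketch: build a IIIF Presentation 2 sc:AnnotationList from line-level
# transcription boxes, one sc:painting annotation per line.
def annotation_list(list_id, canvas_id, lines):
    """lines: list of (text, x, y, w, h) line boxes from the text index."""
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": list_id,
        "@type": "sc:AnnotationList",
        "resources": [
            {
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "resource": {"@type": "cnt:ContentAsText", "chars": text},
                # target the line's region of the canvas with a xywh fragment
                "on": "%s#xywh=%d,%d,%d,%d" % (canvas_id, x, y, w, h),
            }
            for (text, x, y, w, h) in lines
        ],
    }

anno = annotation_list(
    "https://dlcs.example/annos/p1",
    "https://dlcs.example/canvas/p1",
    [("In the beginning", 100, 120, 800, 40)],
)
print(anno["resources"][0]["on"])
```

A list in this shape can be referenced from a manifest's canvas as otherContent, so a IIIF-compatible viewer overlays each line of text on the correct region of the page image.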