Knora (Knowledge Organization, Representation, and Annotation) is a content management system for the long-term preservation and reuse of humanities data. It is designed to accommodate data with a complex internal structure, including data that could be stored in relational databases.
Knora aims to solve key problems in the long-term preservation and reuse of humanities data:
First, traditional archives preserve data, but do not facilitate reuse. Typically, only metadata can be searched, not the data itself. You have to first identify an information package that might be of interest, then download it, and only then can you find out what’s really in it. This is time-consuming, and makes it impractical to reuse data from many different sources.
Knora solves this problem by keeping the data alive. You can query all the data in a Knora repository, not just the metadata. You can import thousands of databases into Knora, and run queries that search through all of them at once.
Another problem is that researchers use a multitude of different data formats, many of which are proprietary and quickly become obsolete. It is not practical to maintain all the programs that were used to create and read old data files, or even all the operating systems that these programs ran on.
Instead of preserving all these data formats, Knora supports the conversion of all sorts of data to a small number of formats that are suitable for long-term preservation, and that maintain the data’s meaning and structure:
- Non-binary data is stored as RDF, in a dedicated database called a triplestore. RDF is an open, vendor-independent standard that can express any data structure.
- Binary media files (images, audio, and video) are converted to a few specialised archival file formats and stored by Sipi, with metadata stored in the triplestore.
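
To illustrate how RDF can express nested data structures, the following minimal sketch flattens a nested record into subject-predicate-object triples using plain Python tuples. The record, the `ex:` prefix, and the `to_triples` helper are hypothetical and chosen only for illustration; this is not how Knora or a triplestore is implemented.

```python
from itertools import count


def to_triples(subject, record, _ids=None):
    """Flatten a nested dict into (subject, predicate, object) triples.

    Nested dicts become new nodes with generated identifiers, so any
    tree-shaped record reduces to a flat list of triples.
    """
    ids = _ids if _ids is not None else count(1)
    triples = []
    for predicate, value in record.items():
        # Treat a single value and a list of values uniformly.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                node = f"ex:node{next(ids)}"  # hypothetical identifier scheme
                triples.append((subject, predicate, node))
                triples.extend(to_triples(node, item, ids))
            else:
                triples.append((subject, predicate, item))
    return triples


# Hypothetical example record.
letter = {
    "ex:title": "Letter to a colleague",
    "ex:author": {"ex:name": "Example Author", "ex:city": "Basel"},
    "ex:keyword": ["mathematics", "correspondence"],
}

triples = to_triples("ex:letter1", letter)
for triple in triples:
    print(triple)
```

Because every statement has the same three-part shape, records with different structures can sit side by side in one store and be queried together.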
Knora makes this data available for reuse via its generic, standards-based application programming interfaces (APIs). A virtual research environment (VRE) can use these APIs to search, link together, and add to data from different research projects in a unified way.
Each project creates its own data model (or ontology), describing the types of items it wishes to store, using basic data types defined in Knora’s base ontology. This gives projects the freedom to describe their data in a way that makes sense to them, while allowing Knora to support searching and linking across projects.
Knora has built-in support for data structures that are commonly needed in humanities data, and that present unique challenges for any type of database storage.
In the humanities, a date could be based on any sort of calendar (e.g. Gregorian, Julian, Islamic, or Hebrew). Knora stores dates using a calendar-independent, astronomical representation, and converts between calendars as needed. This makes it possible to search for a date in one calendar, and get search results in other calendars.
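
One widely used astronomical representation is the Julian Day Number (JDN), a continuous count of days that does not depend on any calendar. The sketch below assumes a JDN-style representation purely for illustration and uses standard integer conversion formulas; the function names are hypothetical, years use astronomical numbering (1 BCE is year 0), and only the Gregorian and Julian calendars are covered.

```python
def gregorian_to_jdn(year, month, day):
    """Convert a Gregorian calendar date to a Julian Day Number."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)


def julian_to_jdn(year, month, day):
    """Convert a Julian calendar date to a Julian Day Number."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn):
    """Convert a Julian Day Number back to a Gregorian (year, month, day)."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097       # 400-year cycles
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461         # years within the cycle
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day


# A date recorded in the Julian calendar ...
stored = julian_to_jdn(1582, 10, 5)
# ... matches a search phrased in the Gregorian calendar.
query = gregorian_to_jdn(1582, 10, 15)
print(stored == query)
print(jdn_to_gregorian(stored))
```

Since both dates reduce to the same day number, a comparison or range search works without knowing which calendar the original date was entered in.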
Commonly used text markup systems, such as TEI/XML, have to represent a text as a hierarchy, and therefore have trouble supporting overlapping markup. Knora supports Standoff/RDF markup: the markup is stored as RDF data, separately from the text, allowing for overlapping markup. Knora’s RDF-based standoff is designed to support the needs of complex digital critical editions. Knora can import any XML document (including TEI/XML) for storage as standoff/RDF, and can regenerate the original XML document at any time.
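
The idea behind standoff markup can be sketched in a few lines: the text is kept as a plain string, and each markup element is a separate record pointing at a character range. The `StandoffTag` class, the tag names, and the `overlaps` helper below are hypothetical and only illustrate the principle, not Knora's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffTag:
    name: str
    start: int  # index of the first character covered (inclusive)
    end: int    # index just past the last character covered (exclusive)


def overlaps(a, b):
    """Return True if two tags intersect without one containing the other.

    Such pairs cannot be expressed as properly nested XML elements.
    """
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


text = "The quick brown fox jumps"
tags = [
    StandoffTag("italic", 4, 15),      # covers "quick brown"
    StandoffTag("bold", 10, 19),       # covers "brown fox"
    StandoffTag("paragraph", 0, 25),   # covers the whole text
]

for tag in tags:
    print(tag.name, repr(text[tag.start:tag.end]))

print(overlaps(tags[0], tags[1]))  # italic and bold overlap
print(overlaps(tags[0], tags[2]))  # italic is nested inside paragraph
```

Because the tags live outside the text, the italic and bold ranges can coexist even though neither contains the other.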
Knora’s API provides a search language, Gravsearch, that is designed to meet the needs of humanities researchers. Gravsearch supports Knora’s humanities-focused data structures, including calendar-independent dates and standoff markup, as well as fast full-text searches. This allows searches to combine text-related criteria with any other criteria. For example, you could search for a text that contains a certain word and also mentions a person who lived in the same city as another person who is the author of a text that mentions an event that occurred during a certain time period.
The RDF standards do not include any concept of permissions. Knora’s permission system allows project administrators and users to determine who can see or modify each item of data. Knora filters search results according to each user’s permissions.
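
As a rough illustration of permission-aware search, the sketch below attaches permissions to each item and filters a result list by the groups a user belongs to. The group names, the `"view"` and `"modify"` operations, and the `visible_to` helper are all hypothetical; Knora's actual permission model is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    label: str
    # Maps a group name to the set of operations that group may perform.
    permissions: dict = field(default_factory=dict)


def visible_to(items, user_groups):
    """Keep only the items that at least one of the user's groups may view."""
    return [
        item for item in items
        if any("view" in item.permissions.get(group, set())
               for group in user_groups)
    ]


results = [
    Item("Published letter", {"everyone": {"view"}, "editors": {"view", "modify"}}),
    Item("Draft transcription", {"editors": {"view", "modify"}}),
]

print([item.label for item in visible_to(results, ["everyone"])])
print([item.label for item in visible_to(results, ["everyone", "editors"])])
```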
RDF does not have a concept of data history. Knora maintains all previous versions of each item of data. Ordinary searches return only the latest version, but you can obtain and cite an item as it was at any point in the past.
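
The behaviour described above can be sketched with an append-only history: updates never overwrite earlier versions, and a lookup by timestamp returns the version that was current at that moment. The `VersionedValue` class and its method names are hypothetical; this is an illustration of the idea under those assumptions, not Knora's storage design.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedValue:
    """Append-only history of a single value."""

    def __init__(self):
        self._timestamps = []  # kept in ascending order
        self._values = []

    def update(self, timestamp, value):
        """Record a new version; earlier versions are never modified."""
        if self._timestamps and timestamp <= self._timestamps[-1]:
            raise ValueError("versions must be added in chronological order")
        self._timestamps.append(timestamp)
        self._values.append(value)

    def latest(self):
        """Return the current version, as an ordinary search would."""
        return self._values[-1]

    def as_of(self, timestamp):
        """Return the version that was current at the given time."""
        index = bisect_right(self._timestamps, timestamp)
        if index == 0:
            raise LookupError("no version existed at that time")
        return self._values[index - 1]


title = VersionedValue()
title.update(datetime(2018, 1, 10), "Leter to Euler")
title.update(datetime(2019, 6, 1), "Letter to Euler")

print(title.latest())
print(title.as_of(datetime(2018, 12, 31)))
```

A citation that records the timestamp alongside the item identifier can then always be resolved to the exact version that was cited.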
RDF triplestores do not implement a standardised way of ensuring the consistency of data in a repository. Knora ensures that all data is consistent, conforms to the project-specific data models, and meets Knora’s minimum requirements for interoperability and reusability of data.
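
To give a sense of what checking data against a project-specific data model can involve, the sketch below validates a resource against a small model that lists each allowed property with a value type and a minimum and maximum number of values. The model format, the property names, and the `check_resource` helper are hypothetical and much simpler than a real ontology.

```python
def check_resource(resource, model):
    """Return a list of violations; an empty list means the resource conforms.

    `model` maps each property name to (value_type, min_count, max_count),
    where max_count may be None for "unlimited".
    """
    errors = []
    for prop, (value_type, min_count, max_count) in model.items():
        values = resource.get(prop, [])
        if len(values) < min_count:
            errors.append(f"{prop}: expected at least {min_count} value(s)")
        if max_count is not None and len(values) > max_count:
            errors.append(f"{prop}: expected at most {max_count} value(s)")
        for value in values:
            if not isinstance(value, value_type):
                errors.append(f"{prop}: {value!r} is not a {value_type.__name__}")
    # Properties that the model does not define are also rejected.
    for prop in resource:
        if prop not in model:
            errors.append(f"{prop}: not defined in the data model")
    return errors


# Hypothetical data model for a "book" resource.
book_model = {
    "title": (str, 1, 1),
    "author": (str, 1, None),
    "pages": (int, 0, 1),
}

good = {"title": ["Principia"], "author": ["Isaac Newton"]}
bad = {"title": [], "pages": ["many"], "colour": ["red"]}

print(check_resource(good, book_model))
for problem in check_resource(bad, book_model):
    print(problem)
```

Rejecting an update when such a check fails is one way to keep a repository consistent even though the underlying store does not enforce a schema.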
Knora supports publishing data online as Linked Open Data, using open standards to allow interoperability between different repositories on the web.
Knora can be used with a general-purpose, browser-based VRE called SALSAH. Using the Knora API and Knora-ui, a set of reusable user-interface components, you can also create your own VRE or project-specific web site.