#4 - Documentation

Documentation issues

Documentation is another key element when dealing with HSAs. These are the most frequent issues highlighted by the H-SeTIS survey:

  • Lack of consistent documentation. A primary issue is that documentation for HSAs is often poor or completely absent. If any documentation exists, it is typically published through academic papers. However, these publications are not a substitute for proper technical documentation, which should provide a level of detail that a scientific paper cannot accommodate. HSA documentation should always be openly available online (i.e. without paywalls) in an adequate format. The lack of documentation hinders or even prevents the reuse of these semantic artefacts.

  • Insufficient implementation of embedded documentation. HSAs rarely provide detailed embedded documentation (i.e. embedded metadata), which is crucial for making them FAIR (Findable, Accessible, Interoperable, and Reusable). Embedded metadata, which should be integrated within the artefact itself, is often missing. This includes licensing information, since an ambiguous or undeclared license is a significant barrier to reuse; contact and authorship details, which help users assess how responsive an artefact's maintainers are; and community engagement information, since links to code repositories or issue trackers are seldom included.

  • Local and development documentation is often missing as well. Local documentation is the metadata used to manage individual components within the artefact, such as records of changes or improvements to specific classes or properties. Annotation properties such as skos:editorialNote and skos:changeNote exist for this purpose but are not widely used. This type of documentation is crucial for maintaining the artefact and tracking its evolution over time. Development documentation comprises the resources that explain the methodologies used to develop an HSA, its update policies, and how author contributions are tracked; it should describe the processes behind the creation of the semantic artefact.

  • Failure to distinguish documentation from other outputs. There is a tendency to merge project deliverables with user documentation. User documentation provides guidelines for implementation and use cases, and it must be regularly updated to remain functional. In contrast, project deliverables are static research outputs and do not fulfill this role.

  • User negligence. This problem is twofold: not only do creators provide insufficient documentation, but users themselves often fail to consult the existing documentation regularly and carefully. This neglect undermines the principle of reusability and hinders the effective, long-term sustainability of a semantic artefact. The consequences may be more serious in the heritage field, where many users are not familiar with IT documentation or with the development of semantic artefacts (SAs), and where subjective interpretations are very frequent.
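The embedded and local documentation gaps described above lend themselves to automatic checks. The following is a minimal Python sketch, not a reference implementation. It assumes the artefact has already been parsed into plain (subject, predicate, object) string triples (for example with an RDF library), does not distinguish literals from IRIs, and relies on an assumed selection of properties: dcterms:license for licensing, dcterms:creator for authorship, dcat:contactPoint for contact details, doap:repository or doap:bug-database for community links, and skos:changeNote, skos:editorialNote or skos:historyNote as local documentation. The example ontology IRI, its terms and the creator name are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DCTERMS = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
DOAP = "http://usefulinc.com/ns/doap#"

# Embedded-metadata requirements; each is met by ANY one of the listed
# properties on the artefact itself. This selection is an assumption.
EMBEDDED_CHECKS = {
    "license": [DCTERMS + "license"],
    "authorship": [DCTERMS + "creator"],
    "contact": [DCAT + "contactPoint"],
    "community": [DOAP + "repository", DOAP + "bug-database"],
}
# Annotation properties counted as local (per-entity) documentation.
LOCAL_NOTES = {SKOS + "changeNote", SKOS + "editorialNote", SKOS + "historyNote"}
# Entity types expected to carry their own change history.
ENTITY_TYPES = {OWL + "Class", OWL + "ObjectProperty", OWL + "DatatypeProperty", SKOS + "Concept"}


def audit(triples, artefact):
    """Report missing embedded metadata and entities that lack local notes."""
    artefact_props = {p for s, p, _ in triples if s == artefact}
    missing = [name for name, props in EMBEDDED_CHECKS.items()
               if not artefact_props.intersection(props)]
    entities = {s for s, p, o in triples if p == RDF_TYPE and o in ENTITY_TYPES}
    noted = {s for s, p, _ in triples if p in LOCAL_NOTES}
    return {"missing_embedded": missing,
            "entities_without_notes": sorted(entities - noted)}


# Hypothetical artefact: license and creator declared, no contact or repository links.
ONT = "https://example.org/hsa"
triples = [
    (ONT, RDF_TYPE, OWL + "Ontology"),
    (ONT, DCTERMS + "license", "https://creativecommons.org/licenses/by/4.0/"),
    (ONT, DCTERMS + "creator", "Jane Doe"),
    (ONT + "#Artwork", RDF_TYPE, OWL + "Class"),
    (ONT + "#Artwork", SKOS + "changeNote", "2024-05-02: definition narrowed"),
    (ONT + "#Monument", RDF_TYPE, OWL + "Class"),
]
report = audit(triples, ONT)
print(report)
```

Here the report flags the missing contact and community information and the Monument class, which has no change history. Treating each requirement as satisfied by any one of several properties keeps the check tolerant of the different vocabularies that HSAs actually use.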

Proposed Solutions and Best Practices

To address these documentation issues, the ‹H/RADIOSA› initiative suggests several solutions and best practices, starting with the use of standardized frameworks. When possible, templates such as the MIRO (Minimum Information for the Reporting of an Ontology) guidelines (2) should be adopted to produce consistent and comprehensive documentation. The Metadata for Ontology Description and Publication Ontology (MOD) is a metadata vocabulary created to improve the FAIRness of ontologies and other semantic artefacts by making them easier to find, access, reuse, and interoperate with on the Web (1). It was designed to address the lack of precise ontology metadata, going beyond general-purpose vocabularies such as Dublin Core, and to harmonize ontology descriptions across platforms, thereby reducing costs and encouraging reuse.
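Part of the value of a reporting template such as MIRO is that its completeness can be checked mechanically. The Python sketch below assumes a draft report stored as a dictionary keyed by item name. MIRO_SUBSET is an abbreviated, illustrative selection of item names paraphrased from the MIRO guidelines; the full, authoritative list is given in the guidelines themselves (2). The example artefact is hypothetical.

```python
# Illustrative, abbreviated subset of MIRO reporting items (see the MIRO
# guidelines for the complete list and its grouping into sections).
MIRO_SUBSET = [
    "Ontology name",
    "Ontology owner",
    "Ontology license",
    "Ontology URL",
    "Ontology repository",
    "Need",
    "Target audience",
    "Scope and coverage",
    "Versioning policy",
]


def missing_items(report: dict[str, str]) -> list[str]:
    """List template items that are absent or left blank in a draft report."""
    return [item for item in MIRO_SUBSET if not report.get(item, "").strip()]


# Hypothetical draft: only a few items filled in, "Need" left as whitespace.
draft = {
    "Ontology name": "Example Heritage Ontology",
    "Ontology license": "CC BY 4.0",
    "Ontology URL": "https://example.org/heritage",
    "Need": "   ",
}
print(missing_items(draft))
```

Keeping the template as an ordered list means the missing items are reported in the same order as the template, which makes gaps easy to locate in the draft.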

  1. Dutta, B., Nandini, D., & Shahi, G. K. (2015). MOD: Metadata for Ontology Description and Publication. Proceedings of the 2015 International Conference on Dublin Core and Metadata Applications, DC-2015 in São Paulo, Brazil, 1-4 September 2015. https://doi.org/10.23106/DCMI.952136974
  2. Matentzoglu, N., Malone, J., Mungall, C., & Stevens, R. (2018). MIRO: Guidelines for Minimum Information for the Reporting of an Ontology. Journal of Biomedical Semantics, 9(1), 6. https://doi.org/10.1186/s13326-017-0172-7

Since its first release in 2015, MOD has evolved through several versions, most recently MOD 3.2.2, expanding its structure to include dozens of classes and properties for describing semantic artifacts such as taxonomies, terminologies, thesauri, and ontologies in various digital formats. Its development has been guided by a two-pronged methodology that combined high-level conceptual facets with user-driven needs for ontology search and evaluation, while adhering to principles of clarity, simplicity, authority, and interoperability. MOD is deeply aligned with Linked Data practices, ensuring interoperability through extensive reuse of elements from other metadata vocabularies, while addressing the shortcomings of earlier approaches such as OMV. As a DCAT-based extension, it plays a central role in implementing the FAIR principles, particularly by defining minimum metadata schemas and ensuring semantic artifacts and catalogues can be systematically described and shared. Widely adopted in projects like FAIRsFAIR, FAIR-IMPACT, AgroPortal, EcoPortal, and several OntoPortal-based catalogues, MOD has already enhanced interoperability across platforms and continues to serve as a reference standard, with future plans focused on automation, integration with major ontology libraries, and the release of MOD-aligned knowledge bases as Linked Open Data.

An alternative approach to embedded metadata documentation is the following minimal metadata set:

Property | Label | Description
--- | --- | ---
dcterms:title | Title | A human-readable name that clearly identifies the ontology or vocabulary.
dcterms:alternative | Alternative title | Any known abbreviations, acronyms, or alternative names for the ontology, to improve discoverability.
dcterms:creator | Creator | The main individual(s) or organization(s) responsible for creating the ontology.
dcterms:contributor | Contributor | Other individuals or organizations that contributed to the development, review, or maintenance of the ontology.
dcterms:description | Description | A concise summary of the ontology’s purpose, scope, and intended use cases, making it easier for others to understand and evaluate its relevance.
dcterms:created | Date created | The date on which the ontology was originally created, ideally in ISO 8601 format (YYYY-MM-DD).
dcterms:modified | Date modified | The most recent date on which the ontology was updated, useful for tracking changes and managing versions.
dcterms:license | License | A clear reference (usually a URI) to the license under which the ontology is distributed, stating reuse rights and restrictions.
dcterms:bibliographicCitation | Bibliographic citation | The recommended citation for referencing the ontology in scholarly or technical works.
vann:preferredNamespacePrefix | Preferred namespace prefix | The short prefix (e.g. foaf, skos) that should be used when referencing ontology terms in RDF or Linked Data.
vann:preferredNamespaceUri | Preferred namespace URI | The persistent URI that identifies the ontology namespace, ensuring unambiguous reference and linking.
owl:versionIRI | Version IRI | The IRI identifying a specific version of the ontology, enabling precise reference to a given release.
pav:version | Version | A human-readable version string (e.g. 1.0, 2.1-beta) that complements owl:versionIRI.
sw:status | Status | The current maturity or lifecycle status of the ontology (e.g. draft, stable, deprecated), helping users assess its reliability and suitability for production use.
foaf:homepage | Homepage | The URL of the project or ontology homepage, providing additional documentation and resources.
foaf:logo | Logo | An image representing the ontology or its project, useful for recognition in portals and catalogues.
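The metadata set above maps directly onto a short ontology header. The following is a minimal Python sketch, not a definitive implementation, that serializes such metadata as Turtle using only the standard library. Assumptions: the namespace IRIs in PREFIXES are the standard ones for DCMI Terms, OWL, VANN, PAV and FOAF; dcterms:creator carries authorship, since DCMI Terms defines no dcterms:author property; vann:preferredNamespaceUri is written as a plain string literal; sw:status is omitted because its namespace is not specified here; the ontology IRI, names and values in the example are hypothetical.

```python
from datetime import date

# Standard namespace IRIs for the prefixes used in the metadata set.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "vann": "http://purl.org/vocab/vann/",
    "pav": "http://purl.org/pav/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}
# How each property's value is written; anything not listed is a plain string literal.
IRI_VALUED = {"dcterms:license", "owl:versionIRI", "foaf:homepage", "foaf:logo"}
DATE_VALUED = {"dcterms:created", "dcterms:modified"}
LANG_TAGGED = {"dcterms:title", "dcterms:description"}


def literal(text: str) -> str:
    """Quote text as a Turtle string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def term(prop: str, value: str, lang: str) -> str:
    """Serialize one value according to the kind of property it belongs to."""
    if prop in IRI_VALUED:
        return f"<{value}>"
    if prop in DATE_VALUED:
        # Parse (raises ValueError on invalid dates) and normalize to YYYY-MM-DD.
        parsed = date.fromisoformat(value)
        return f'"{parsed.isoformat()}"^^xsd:date'
    if prop in LANG_TAGGED:
        return f"{literal(value)}@{lang}"
    return literal(value)


def ontology_header(iri: str, meta: dict[str, list[str]], lang: str = "en") -> str:
    """Build a Turtle description of the ontology IRI from prefixed-property metadata."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    statements = ["a owl:Ontology"]
    for prop, values in meta.items():
        if prop.split(":", 1)[0] not in PREFIXES:
            raise ValueError(f"no namespace declared for {prop}")
        objects = ", ".join(term(prop, v, lang) for v in values)
        statements.append(f"{prop} {objects}")
    lines.append(f"<{iri}> " + " ;\n    ".join(statements) + " .")
    return "\n".join(lines)


# Hypothetical ontology metadata.
meta = {
    "dcterms:title": ["Example Heritage Ontology"],
    "dcterms:creator": ["Jane Doe"],
    "dcterms:created": ["2024-03-15"],
    "dcterms:license": ["https://creativecommons.org/licenses/by/4.0/"],
    "vann:preferredNamespacePrefix": ["exh"],
    "vann:preferredNamespaceUri": ["https://example.org/exh#"],
    "owl:versionIRI": ["https://example.org/exh/1.0"],
    "pav:version": ["1.0"],
}
ttl = ontology_header("https://example.org/exh", meta)
print(ttl)
```

Validating dates at serialization time catches malformed dcterms:created or dcterms:modified values early. In practice, building the graph with an RDF library and using its serializer would be more robust than assembling strings by hand.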

Another solution is to adopt community-driven practices. HSAs should be community-driven and respond to evolving needs, which can only be achieved through constant interaction with users and experts. Good documentation is a cornerstone of this interaction. Separating documentation types also helps: following the OBO Foundry guidelines, documentation should be divided into four types (embedded, local, user, and development), each serving a distinct purpose.