Data pod for a decentralized Web network

Living Document

This version:
https://mellonscholarlycommunication.github.io/spec-datapod
Editors:
(IDLab - Ghent University)
(meemoo - Flemish Institute for Archives)
(IDLab - Ghent University)

Abstract

This specification describes the implementation requirements for a Solid-based Data Node component.

1. Set of documents

This document is one of the specifications produced by the Mellon and ErfgoedPod projects:

  1. Overview

  2. Orchestrator

  3. Data Pod (this document)

  4. Rule language

  5. Artefact Lifecycle Event Log

  6. Notifications

  7. Collector

2. Introduction

In a Solid decentralized network, data is stored in a distributed network of data pods. Data stored on these data pods is made available over the Web with unique identifiers, enabling other actors and applications on the network to interact with the available resources without the need for a centralized service. Keeping track of the lifecycle of the resources published on a data pod, and of the interactions with them, requires every data pod implementation to support a common set of functionality. In this document, we define the functionality required of a data pod implementation so that it can incorporate Event information for all resources published on the data pod.

3. Definitions

This document uses the following defined terms from [spec-overview]:

4. High-level overview

A Solid data pod is a personal data space on the Web. An actor can use this data space to store and share resources over the Web, and to receive notifications from other actors in an advertised inbox. To track the lifecycle of resources published on a data pod, and the interactions with them, in a decentralized network, a data pod implementation must provide functionality for the storage and discovery of these events for their respective resources. In § 6 Resource storage, we define the base requirements for the storage of resources on the network. In § 7 Resource Versioning, we define how resource versioning can be handled on the data pod. In § 8 Resource Event Information, we define the requirements for the storage and discovery of an Event Log for a given resource on the data pod. In § 9 Notifications, we define the requirement for the data pod to be a Linked Data Notifications Receiver.

5. Creating a data pod

A Solid data pod MUST be deployable as a local background process or as a remote web service. When a data pod is deployed as a local process, the data pod instance should be connected to the Web at all times if it is to remain permanently discoverable in the network. When a data pod is offered as a remote web service, an actor in the network MUST be able to create their own data pod using this web service.

6. Resource storage

6.1. Indexing

Any resource stored on the data pod should be included in an index on the data pod to enable discovery of the resources by applications and actors in the network. A first indexing method is to make use of a Type Index. By indexing the resources in a public type index, actors and applications in the network can discover resources of a specific type. A second method is to index the resources using a Shape Tree. The shape tree specification defines where resources matching a specific shape are stored on the data pod. By parsing the available shape tree, any actor or application on the network can discover resources for a specific shape.
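As an illustration of the first method, the sketch below models a public type index as a list of registrations and looks up where resources of a given class are stored. The `TypeRegistration` class, the `find_locations` helper and all example URLs are hypothetical. A real client would first fetch and parse the RDF of the type index document (which uses the `solid:forClass`, `solid:instance` and `solid:instanceContainer` predicates) instead of working on in-memory objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TypeRegistration:
    """One entry of a type index (hypothetical in-memory model)."""
    for_class: str                            # value of solid:forClass
    instance: Optional[str] = None            # value of solid:instance
    instance_container: Optional[str] = None  # value of solid:instanceContainer


def find_locations(index: List[TypeRegistration], rdf_class: str) -> List[str]:
    """Return every resource or container URL registered for rdf_class."""
    locations = []
    for registration in index:
        if registration.for_class != rdf_class:
            continue
        if registration.instance:
            locations.append(registration.instance)
        if registration.instance_container:
            locations.append(registration.instance_container)
    return locations


# Example data; all URLs are placeholders chosen for this sketch.
PUBLIC_TYPE_INDEX = [
    TypeRegistration(
        for_class="https://schema.org/ScholarlyArticle",
        instance_container="https://pod.example/articles/",
    ),
    TypeRegistration(
        for_class="https://schema.org/Dataset",
        instance="https://pod.example/data/survey.ttl",
    ),
]

if __name__ == "__main__":
    print(find_locations(PUBLIC_TYPE_INDEX, "https://schema.org/ScholarlyArticle"))
```

An actor looking for scholarly articles would then list the returned container to discover the individual resources.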

6.2. Metadata

Storing resource metadata for non-RDF resources is desirable in a Solid pod implementation. Resource metadata can be added to a non-RDF resource by creating a resource with the .meta extension, according to the Solid specification.
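The naming convention can be sketched as a small helper that derives the location of the metadata resource from the URL of a non-RDF resource. The `metadata_url` function and the example URL are hypothetical, and the sketch assumes the server stores the metadata at the resource path with `.meta` appended. When a server advertises the metadata location itself, a client should use that location instead of guessing.

```python
from urllib.parse import urlsplit, urlunsplit


def metadata_url(resource_url: str) -> str:
    """Append the .meta suffix to the path of resource_url.

    Query string and fragment are dropped, because in this sketch they
    do not identify a separately stored resource.
    """
    parts = urlsplit(resource_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path + ".meta", "", ""))


if __name__ == "__main__":
    print(metadata_url("https://pod.example/images/scan.png"))
```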

7. Resource Versioning

The data pod may provide functionality to support versioning of stored resources. Support for such versioning is not built into the Solid specification. Resource versioning in a Solid pod environment can be implemented as defined by the Fedora API specification, which is based on the Memento protocol. In cases where it is important to be able to reference specific versions of a resource, versioning may be handled by generating a new URI for each resource version. Version linking can be handled using the DCAT vocabulary, by adding the versioning information directly to the resource in the case of an RDF resource, or to the metadata resource in the case of a non-RDF resource.
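Version linking with a new URI per version might look as follows. This is a minimal sketch that assumes the DCAT 3 versioning properties `dcat:version` and `dcat:previousVersion` are the ones chosen; the `version_turtle` helper and the example URIs are hypothetical. The generated Turtle would be added to the RDF resource itself, or to its metadata resource for a non-RDF resource.

```python
from typing import Optional


def version_turtle(version_uri: str, label: str,
                   previous_uri: Optional[str] = None) -> str:
    """Build a Turtle snippet describing one version of a resource.

    Assumes label is a plain string without quotes or backslashes,
    so no literal escaping is performed in this sketch.
    """
    lines = ["@prefix dcat: <http://www.w3.org/ns/dcat#> .", ""]
    statement = f'<{version_uri}> dcat:version "{label}"'
    if previous_uri is not None:
        # Link this version back to the one it supersedes.
        statement += f" ;\n    dcat:previousVersion <{previous_uri}>"
    lines.append(statement + " .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(version_turtle(
        "https://pod.example/articles/paper-v2",
        "2",
        previous_uri="https://pod.example/articles/paper-v1",
    ))
```

Because each version has its own URI, other actors can reference a specific version and follow the `dcat:previousVersion` links to walk back through the history.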

8. Resource Event Information

The data pod may provide functionality to store Event data related to resources stored on the data pod. To provide this functionality, the data pod must implement the Event Log specification. This specification dictates how event-related data must be stored on the data pod, and how it can be discovered by external actors in the network.

9. Notifications

In the Solid ecosystem, notifications serve as the main communication mechanism in a network of Solid pods. These notifications follow the Linked Data Notifications (LDN) specification. Any Solid pod in the network serves as an LDN Receiver, and consequently has the ability to receive notifications from any actor in the network. By defining an inbox on a resource, notifications for different resources can be directed towards different inboxes defined on the data pod. The actor managing the Solid pod may choose to process the incoming notifications manually, or can automate this process through an external Orchestrator service. Sending a Linked Data Notification gives no guarantee that a response is returned or that any action is undertaken. Any action required on receiving specific notifications has to be defined and enforced separately.
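Per-resource inboxes imply that a sender first has to discover which inbox belongs to a resource. One way LDN allows this is an HTTP `Link` header with the relation `http://www.w3.org/ns/ldp#inbox` on the response for the resource. The sketch below extracts the inbox from such a header value; the `find_inbox` helper, the simplified header parsing and the example URLs are assumptions made for illustration, and discovery through the RDF body of the resource is not covered.

```python
import re
from typing import Optional
from urllib.parse import urljoin

LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"

# Simplified Link header parsing: a <target> followed by its parameters.
_LINK = re.compile(r'<([^>]*)>([^<]*)')
_REL = re.compile(r'rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def find_inbox(link_header: str, resource_url: str) -> Optional[str]:
    """Return the absolute inbox URL advertised in link_header, if any."""
    for target, params in _LINK.findall(link_header):
        match = _REL.search(params)
        if match is None:
            continue
        # A rel parameter may list several space-separated relation types.
        relations = (match.group(1) or match.group(2) or "").split()
        if LDP_INBOX in relations:
            # Relative targets are resolved against the resource URL.
            return urljoin(resource_url, target)
    return None


if __name__ == "__main__":
    header = ('<https://pod.example/inbox/>; '
              'rel="http://www.w3.org/ns/ldp#inbox"')
    print(find_inbox(header, "https://pod.example/articles/paper"))
```

A sender would then POST the notification as JSON-LD (content type `application/ld+json`) to the discovered inbox URL.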

10. Spec roadmap

10.1. September

  1. Work out requirements for resource versioning in Solid.

    • Is time-based versioning useful for our use cases?

    • What is the best approach to version tagging for specific resources and discovery?

    • How can we make sure events can be tagged for specific resource versions, even on remote actors?

  2. Work out data discovery in more detail.

    • Work out the requirements for shape trees according to the interop spec.

11. Acknowledgement

We thank Herbert Van de Sompel (DANS and Ghent University, hvdsomp@gmail.com) for his valuable input during this project.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119

Informative References

[SPEC-OVERVIEW]
Miel Vander Sande. Overview of the ResearcherPod specifications. Editor’s Draft. URL: http://mellonscholarlycommunication.github.io/spec-overview/