Overview of the ResearcherPod specifications

Living Standard,

This version:
https://mellonscholarlycommunication.github.io/spec-overview
Editors:
(meemoo - Flemish Institute for Archives)
(IDLab - Ghent University)
(IDLab - Ghent University)
(IDLab - Ghent University)

Abstract

This document introduces a set of technical reports that facilitate the implementation of a decentralized data exchange ecosystem using Solid.

1. Set of documents

This document is one of the specifications produced by the ResearcherPod and ErfgoedPod projects:

  1. Overview (this document)

  2. Orchestrator

  3. Data Pod

  4. Rule language

  5. Artifact Lifecycle Event Log

  6. Notifications

  7. Collector

2. Introduction

This document introduces a set of technical reports that facilitate the implementation of a decentralized data exchange ecosystem using [solid-protocol]. The intent is not to establish a network design for shipping data. Rather, it should enable (human or machine) agents to make artifacts available in the network, add value to the available artifacts, and exchange messages about these activities. Essentially, these reports focus on two generic problems common to decentralized Web networks:

This discovery of information that is stored on a decentralized Web can be quite generic. In this project there is a particular focus on the discovery of lifecycle events to which the artifacts are subject. Examples of such a lifecycle events in the realm of scholarly publication is information about when and where articles have been cited, or peer-reviewed, or published in a journal, archived in a web archive.

This work aims to complement the [solid-tr] with a concrete framework for building a semi-automated decentralized network for a specific use case or domain. It is joint output from two aligned projects:

3. Work items

The following table provides an overview of all technical reports and subject matter that is being worked on. The reports incorporated have been discussed among the project members and are constructed as project deliverables. During the course of these projects, the information in these documents may be subject to change, therefore please see each document’s publication status and versions for further details. Of course, you are invited to contribute any feedback, comments, or questions you might have.

Technical Reports
Work Item Repository Current Stage Next Stage Target Expected Completion
This document. spec-overview Technical Report Technical Report Technical Report Q1 2022
Specification of the Orchestrator component. spec-orchestrator Working Draft Draft Technical Report Technical Report Q4 2022
Implementation guidelines and additional requirements for Solid data pods. spec-datapod Working Draft Draft Technical Report Technical Report Q4 2022
Specification of the rule language to create executable business processes. spec-rulelanguage Working Draft Draft Technical Report Technical Report Q4 2022
Implementation requirements for the Artifact Lifecycle Event Log. spec-datapod Working Draft Draft Technical Report Technical Report Q2 2022
Specification of the possible notifications that can be used in the network. spec-notifications Draft Technical Report Technical Report Technical Report Q1 2022
Specification of the Collector component. spec-collector Working Draft Draft Technical Report Technical Report Q4 2022

4. Terminology

Terminology used in the specifications.

The following terms are used across all specifications:

Agent

An agent is a person, an organization, or software identified by a URI; e.g., a WebID denotes an agent [webid] (cited from [solid-protocol]).

Agents actively participate in the network: they perform actions, are identified by a [webid], and addressed through notification.

Artifact

An artifact is a Web resource identified by an URI that serves as the main focus of interaction between agents. Examples are a digitized image in an archive, the webpage of a scientific publication, a digital representation of a cultural heritage object, or a data set in a repository.

Artifacts can be atomic or arbitrarily complex. How they are organized is outside the scope of this document and is dependent on the implementing community.

Notification

A notification is a message delivered from one agent to another using the Linked Data Notifications (LDN) protocol.

Notifications are expressed in Activity Streams 2.0 (AS2.0) and their payload describes an activity.

Activity

An activity describes some form of past or present action that is directly or indirectly related to an artifact.

Activities are conform with the AS2.0 Activities.

Value-added service

A service that somehow increases the social valuation of an artifact without modifiying it.

Examples are getting promoted, getting certification, gettin a good review ...

Service Result

A service result is any outcome of providing a value-added service for an artifact.

Service results can be Web resources that are made available at a network address or mentioned inline in an activity addressed to the agent owning the data node where the artifact is hosted.

Node

A node is the main component in the network. It provides artifacts or artifact-related services to the network, and is able to communicate with other nodes.

A node is a HTTP Server that hosts two types of resources: an inbox and a Lifecycle Event Log. It is also an Linked Data Notifications Sender, Receiver and Consumer.

Data Node

A data node is a node that hosts artifacts.

Service Node

A service node is a node that provides value-added services for artifacts hosted by data nodes, but does not host artifacts itself. It can host service results.

A service node minimally meets the node requirements, but can also be a (partial) implementation of the [solid-protocol].

Note that when a service node creates a new artifact as a result of providing a value-added service for an artifact, then this new artifact can be made available in the network in its own right and can itself become the subject of activities. At that point, however, the service node must have an associated data node via which the new artifact is made available.

Inbox

An inbox is a web resource conform with the Linked Data Notifications specification. In case the node that hosts the inbox is a data pod, this is usually implemented as an LDP Container.

Agents can POST an activity in order to send information pertaining to an artifact to another agent, in order to inform the other party or request a service.

Lifecycle Event

A life cycle event is an activity pertaining to an artifact that impacts the artifact’s lifecycle and is made public. It is communicated via a notification and that the node sending or receiving the notification deemed relevant to disclose.

Lifecycle Event Log

A lifecycle event log is an append-only public web resource that exposes a series of lifecycle events related to artifacts stored by data nodes and services provided by service nodes.

It constructs a view over an inbox's contents that is determined by a selector, allowing a selected subset of activities (eg. grouped per artifact) to be consumed. A node is responsible for making the lifecycle event log publicly discoverable.

Selector

A selector is a boolean function that decides whether or not a activity belongs to the lifecycle event log or not.

Collector

An automated Web application that collects information from lifecycle event logs hosted by any node.

The following additional terms are specific to data nodes that, in addition to meeting the minimal requirements, also implement the [solid-protocol]:

Terminology specific to data nodes implemented with Solid.
Data Pod

A data pod is a place for storing documents, with mechanisms for controlling who can access what (cited from [solid-protocol]).

A data pod can be used to construct a data node that is conform to the [solid-protocol]. It stores artifacts that are made available to the network and is a Linked Data Notifications Receiver.

Data Pod owner

An owner is a person or a social entity that is considered to have the rights and responsibilities of a data storage. An owner is identified by a URI, and implicitly has control over all data in a storage. An owner is first set at storage provisioning time and can be changed (cited from [solid-protocol]).

A data pod owner is an agent that is responsible for maintaining the data pod and its artifacts. The owner can be identified by its [webid] and reached via an inbox.

For example: in case the data pod stores scholarly artifacts, the owner is typically an author or contributor to these artifacts. In case the data pod stores digital heritage objects, the owner is typically the institution or the institutional employee that curates and maintains these collections.

Service Node owner

A service node owner is the agent responsible for maintaining the service node and the service it provides.

The owner can be identified by its [webid] and reached via an inbox.

Dashboard

The dashboard is a user-facing Web application that enables an agent to manually interact with other agents or nodes in the network.

The dashboard is a Linked Data Notifications Sender and Consumer. In case it is used to manage a data pod, it will also be a compliant Solid App implementation.

Orchestrator

The orchestrator is an autonomous Web application that operates on behalf of an agent agent and has access to a data pod. It interprets and executes business rules described in a rulebook.

The orchestrator is also a Linked Data Notifications Sender and Consumer.

Rulebook

A set of machine-readable business rules that instruct the orchestrator on what actions to take in response to incoming notifications.

5. Agents, Artefacts and Service Results

Artifacts are the main focus of this decentralized network design. They are pieces of data (eg. a file, a document, ...) produced by the Agents that participate in the network and hosted on data nodes The goal is, however, not to enable the exchange the artifacts, but rather to facilitate discourse about them.

Artifacts hold potential value in the domain in which the Agents operate. Over the network, Agents can send messages to other Agents to request value-added services. Each executed service increases the value-chain of an Artifact, which is published in an event log. The contents of an artifact are considered out-of-scope, and therefore, they are viewed as black-boxes to the network’s design.

Services are performed by service nodes and can produce a Service Result after execution. Service results are hosted on service nodes and provide metadata about the service run. The difference with artifacts is that service results lack a relevant value-chain. If interesting, a service result can be promoted to artifact, eg. when a review of a document itself becomes subject to review. In that case, however, the service node takes up the role of data node as well.

The following example illustrates a basic interaction for which this network design can be applied.

  1. An Agent called Alice creates an artifact: a document. The document is stored in Alice’s Pod.

  2. Alice offers the artifact to a Reviewing Service (a Service Node and Agent) in order to review it.

  3. The Reviewing Service performs the service and produces a Service Result.

  4. The Reviewing Service informs Alice that the Service was performed.

  5. The value-chain of the document is enhanced with a reviewed event.

example workflow.

6. Dashboard and Orchestrator

In a Solid implementation of these specifications, two software applications interact directly with the Data Pod with different privileges: the Orchestrator and the Dashboard. While these applications could overlap or might not be required at all (in some use cases), in our specifications they are treated as separate entities to help the discussion. Some reasons why these applications could be treated as separate entities:

Typically, the Dashboard runs in a browser and responds to feedback from an agent. It can be in an online or offline modus, running on the computer of the agent in one of the browser tabs. The Orchestrator, however, runs as a always-on background-process or remote Web service. Both can read the Inbox of the Data Pod and append to the Lifecycle Event Log. Both communicate with the network using the Notifications. Only the Orchestrator is guided by a Rulebook and operates with little human intervention.

The network below demonstrates the CRUD privileges imagined for the different agents in this document. The first network demonstrates a typical Solid setup where the dashboard is a single page application that has direct access to the data pod. The Orchestrator could be a process that is part of the dashboard, but is runs here as a separate network service. It is always online and works on behalf of the data pod owner. Finally, there can be other applications in the network (indicated by Something) that can read data from the data pod or send notifications to the data pod, but without requiring direct privileges to change artifacts on the data pod.

CRUD operations in case the Dashboard is single-page application and Orchestrator a background task

The second network demonstrates a more classic setup with a browser Dashboard controlled by a server application which uses a data pod as backend storage.

CRUD operations in case the dashboard and a service node (which also runs to an Orchestrator component) is a classic client/server application

7. Communication between Data Pods and domain-specific networks

7.1. Communication between Data Pod and Scholarly Community network

Notifications can be sent from the researcher environment to Service Node environments. For instance, in case of a request to review an artifact that resides in the Data Pod, an appropriate notification] can be sent to a review Service Node. The Service Node can respond, for example, accepting or rejecting the review request, and, in the latter case, to relay the result of the review.

The Orchestrator sends notifications in response to triggers that result from the execution of the rulebook is associated with the Data Pod. The Orchestrator receives notifications in response to the ones it sent. The Orchestrator selectively records information contained in both outgoing and incoming notifications in the Lifecycle Event Log.

The notifications are regarded as a high-level approach to automatically coordinate the distributed execution of the crucial functions of scholarly communication. The notifications merely ensure that the respective functions are in effect executed as prescribed by the rulebook, but do not attempt to automate the actual fulfillment of the function itself. For instance, when an Offer is sent to a Review Service we don’t envision that message contains all the steps to fully automate the submission process. It could contain enough metadata to enable simple workflows. In general out of band communication could be needed to perform all required steps.

7.2. Communication between Data Pod and Cultural Heritage network

In the cultural heritage domain, institutions that manage collections, such as museums or libraries, and service providers, such as object registrars, user-facing portals or digital archives, participate in a joint network with the goal of sharing digital metadata and objects effectively. Institutions maintain a Data Pod as primary exchange hub for metadata about their collections. End-user portals aims to collect thematic subsets from this selection of pods. In between, there is a layer of facilitating services that

Institutions use notifications to request services from the designated Service Node, eg. they can Offer a dataset to a enrichment service. In addition, notifications are used by all parties to inform each other about relevant changes. A Service Node can respond with, for example, an accept or reject of the request for digital preservation, or simply take note. The result of a service, ie. the object is preserved digitally for the long term, is a new piece of metadata that augments the object’s lifecycle. This event can be communicated with a new notification to the institution that made the requested or other interested agents in the network.

Driven by a mix of policy derived from institutional and cooperation requirements, institutions follow processes that dictate when to request certain services, contact other agents and in what order. These processes can be declared in a rulebook and executed in an automated manner by an Orchestrator.

Services that help the chasm between the Data Pod and data consuming portals, can maintain a Data Pod of their own, for storing and exposing derivatives to upper layers in the network. For instance, a Service Node that collects datasets from institution’s Data Pods to generate dataset summaries and enrichments can store these results. In turn, an end-user portal can use the data of this Service Node to find Data Pods with relevant data.

8. Acknowledgement

We thank Herbert Van de Sompel, DANS + Ghent University, hvdsomp@gmail.com for the valuable input during this project.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Index

Terms defined by this specification

References

Normative References

[ACTIVITYSTREAMS-VOCABULARY]
James Snell; Evan Prodromou. Activity Vocabulary. URL: https://w3c.github.io/activitystreams/vocabulary/
[LDN]
Sarven Capadisli; Amy Guy. Linked Data Notifications. URL: https://linkedresearch.org/ldn/
[LDP]
Steve Speicher; John Arwe; Ashok Malhotra. Linked Data Platform 1.0. URL: https://www.w3.org/2012/ldp/hg/ldp.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[SOLID-PROTOCOL]
Sarven Capadisli; et al. The Solid Protocol. Published. URL: https://solidproject.org/TR/protocol

Informative References

[SOLID-TR]
Solid Technical Reports. URL: https://solidproject.org/TR/
[WEBID]
Andrei Sambra; Henry Story; Tim Berners-Lee. WebID 1.0. Editor’s Draft. URL: https://dvcs.w3.org/hg/WebID/raw-file/tip/spec/identity-respec.html