Orchestrator for a decentralized Web network

1. Set of documents

This document is one of the specifications produced by the ResearcherPod and ErfgoedPod project.

2. Introduction

In a decentralized network containing many Data Pods and Service Hubs, data and services do not reside in one place, but are intentionally distributed. As a result of this decentralization, the actors in the network (such as researchers, institutions or service providers) potentially need to involve and communicate with multiple services in order to execute a desired business process or workflow. Workflows can be tasks such as registering datasets, certifying research, publishing in a journal, indexing data in a search engine and archiving data. Each of these tasks can in principle be executed by a dedicated service component. To keep track of the interactions between Artefacts (the data) and the Service Hubs, provenance trails in the form of Artefact Lifecycle Event Logs are generated and the actors are notified about new events in the network. To avoid excessive manual work by Data Pod and Service Hub maintainers, this document introduces an Orchestrator component that automates part of the required interactions between the actors and all other components of the network.

Network actors maintain their data in a Data Pod, which is by design a "passive" component. The Data Pod can offer secure data access to others via the Linked Data Platform [LDP], but cannot perform actions such as invoking remote services on the network or reading and writing network content. These capabilities are delegated to Dashboard applications, where actors interact manually with the Data Pod and the network, and to Orchestrators, which work on behalf of actors in an automated way.

On behalf of the actor, an Orchestrator responds to Triggers by executing a number of Actions dictated by a machine-readable Policy. Possible triggers are incoming notifications, perceived changes in Artefacts, or manual invocation by an actor. Possible actions are:

Sending notifications to other actors;
Requesting access to resources in a Data Pod;
Reading data from a Data Pod;
Writing or appending data to a Data Pod.

The Orchestrator implements the Autonomous Agent model: an intelligent software instance that operates on an actor’s behalf but without any intervention by that actor.

In practice, an Orchestrator is dedicated to a single Data Pod for which it has access rights to all relevant resources, including the Linked Data Notification [LDN] inbox and the Artefact Lifecycle Event Log.

Its autonomy is supplied by the Policy, which dictates business rules in a declarative manner using a policy language.

The remainder of this document specifies the requirements for implementing an Orchestrator component.

3. Document Conventions

Within this document, the following namespace prefix bindings are used:

Prefix	Namespace
acl	http://www.w3.org/ns/auth/acl#
as	https://www.w3.org/ns/activitystreams#
ex	https://www.example.org/
fno	https://w3id.org/function/ontology#
foaf	http://xmlns.com/foaf/0.1/
ldp	http://www.w3.org/ns/ldp#
pol	https://www.example.org/ns/policy#
solid	http://www.w3.org/ns/solid/terms#

4. High-level overview

An Orchestrator instance is an Autonomous Agent dedicated to a single Data Pod, Service Hub, or any other actor hosting the Artefact Lifecycle Event Log and Inbox resources. It interprets and executes business rules described in one or more Policy documents. The Orchestrator watches the Inbox for possible triggers, while it records the actions it takes in the Artefact Lifecycle Event Log.

4.1. Perspectives

4.1.1. Data Pod

From a Data Pod perspective, a Maintainer operates the Data Pod with the help of one or more Dashboard applications. A Scholarly Dashboard has the capability to present the Inbox and Artefact Lifecycle Event Log in a human-friendly way to the Maintainer.

When a Maintainer wants to send a notification to the network, this message is first sent to the Orchestrator. Guided by its Policies, the Orchestrator will forward the notification to an external Data Pod or Service Hub and update the Artefact Lifecycle Event Log.

The Orchestrator watches the Data Pod Inbox for incoming notifications.

Based on incoming notifications, the Orchestrator executes the Policy rules that are contained in one or more Policy documents.

Notifications are one type of Trigger that will start the Orchestrator executing Policy rules. See the Triggers section for other types of Triggers.

There are two main sources of Triggers:

Outgoing notifications that are sent by the Maintainer to the LDN Inbox of the Orchestrator.
Incoming notifications that are sent from the network to the LDN Inbox of the Data Pod.

When a trigger arrives, the Data Pod Orchestrator consults the Policy documents for zero or more rules matching the trigger. Each matching rule will result in zero or more Actions.
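
The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` class, the `actions_for` function and the Bob inbox URL are hypothetical names that do not appear in this specification, and a real Orchestrator would evaluate rules written in a policy language rather than Python callables.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

Trigger = dict  # a parsed trigger (e.g. a notification) as a plain dict
Action = dict   # an action description as a plain dict


@dataclass
class Rule:
    """Hypothetical in-memory form of one Policy rule."""
    name: str
    matches: Callable[[Trigger], bool]              # the "when" part
    actions: Callable[[Trigger], Iterable[Action]]  # the "then" part


def actions_for(trigger: Trigger, rules: Iterable[Rule]) -> list[Action]:
    """Collect the actions of every rule whose condition matches the trigger.

    Zero rules may match, and a matching rule may yield zero actions.
    """
    result: list[Action] = []
    for rule in rules:
        if rule.matches(trigger):
            result.extend(rule.actions(trigger))
    return result


# Example rule: forward every Create notification to Bob.
forward_to_bob = Rule(
    name="Forward to Bob",
    matches=lambda t: t.get("type") == "Create",
    actions=lambda t: [{
        "type": "NotifyAction",
        "target": "https://bob.institution.org/inbox/",  # assumed inbox URL
        "payload": t,
    }],
)
```

Calling `actions_for(trigger, [forward_to_bob])` returns one action for a Create trigger and an empty list for any other trigger type.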

Basic Actions can involve reading resources from or sending notifications to Service Hubs or other Data Pods (including the one it’s connected to). In general, an Orchestrator is free to implement any kind of locally defined Action. This document provides the minimum set of Actions that all Orchestrator implementations share and support in the ResearcherPod and ErfgoedPod network.

All actions taken by the Orchestrator are recorded in the Data Pod Artefact Lifecycle Event Log. When the Data Pod Orchestrator requires manual input from the Maintainer, it can communicate this via the LDN Inbox of the Data Pod. The Dashboard presents this event to the Maintainer in an actionable way. These actions by the Maintainer could result in a new Trigger.

4.1.2. Service Hub

From a Service Hub perspective, a Service Hub Orchestrator can work on behalf of the Service Hub to establish automated responses to notifications from other network actors and Orchestrators in the context of the provided service.

As a possible side effect, it can also actively consult additional actors in order to complete the service.

The Service Hub Orchestrator responds by delivering a new notification to the Inbox of the actor that invoked the service. It is the Policy that dictates what response to construct and what consecutive Actions (reading resources from or sending notifications to Data Pods or other Service Hubs) need to be performed.

The Service Hub Orchestrator also maintains a Service Hub Artefact Lifecycle Event Log of all actions taken on behalf of the Service Hub.

In this way Orchestrators mimic the services provided to the Data Pod and Service Hub.

4.2. Common interaction pattern

A common activity starts when the Maintainer of a Data Pod requires a service provided by a Service Hub. Both actors can be assumed to operate an Orchestrator to automate their participation in the network. Hence, a common interaction pattern is as follows:

The maintainer at Data Pod A performs an action that affects or interests other actors in the network (e.g. adding a new artefact to the data pod).
The maintainer uses the Dashboard to reflect this action by sending trigger event A to the Inbox of Orchestrator A.
This trigger event A causes Orchestrator A to take consecutive actions.
Orchestrator A consults its Policy A for the received trigger, which, for instance, dictates that it needs to inform the Service Hub B about the event.
Orchestrator A sends a notification to Inbox B of Service Hub B.
The Orchestrator B of the Service Hub B monitors Inbox B and is triggered by the new notification in Inbox B to take consecutive actions.
Orchestrator B consults its Policy B for the received trigger, which, for instance, dictates that the event A needs to be appended to the Service Hub B Artefact Lifecycle Event Log.
Service Hub B processes the contents of the received notification using an internal process (e.g. manual evaluation of the contents, adding metadata, creating a new artefact in the repository). How this is done is not specified.
When Service Hub B has completed the process, it sends a trigger event B to the Orchestrator B to notify the maintainer of Data Pod A.
Orchestrator B consults its Policy B, which requires it not only to send the notification to the Inbox A of the Data Pod A, but also to append event B to the Service Hub B Artefact Lifecycle Event Log.
The Orchestrator A of the Data Pod A monitors Inbox A and is triggered by the new notification in Inbox A to take consecutive actions.
Orchestrator A consults its Policy A, which dictates that the new event should be added to the Data Pod A Artefact Lifecycle Event Log.
The Dashboard displays Artefact Lifecycle Event Log A to the maintainer of Data Pod A to show that the necessary actions have been taken.

5. Data Pod Initialization

To operate autonomously, an Orchestrator has to obtain access to some resources in the Data Pod or, when it works on behalf of a Service Hub, in the Service Hub. These resources are made available to the Orchestrator in a secure manner using the Web Access Control specification. Orchestrators that are compliant with the Solid Protocol § solid-app Client-Side Implementation requirements can gain access to these resources and manage them on behalf of the maintainer.

The following section will use the acl: prefix to specify Web Access Control settings.

At a minimum, the Data Pod (or Service Hub) MUST have:

An LDN Inbox with:
- acl:Read access for the Orchestrator.
- acl:Append access for the Orchestrator and the Network.
An Artefact Lifecycle Event Log with:
- acl:Read access for the Orchestrator and the Network.
- acl:Append access for the Orchestrator.

The maintainer of the Data Pod (or Service Hub) SHOULD have:

A WebID profile entry for:
- The location of the LDN Inbox
- The location of the Artefact Lifecycle Event Log
- The WebId of the Orchestrator

A Data Pod MAY have:

An LDN Inbox for the Orchestrator with:
- acl:Read access for the Orchestrator.
An LDP Container with one or more Policy rules.

The latter requirements are for use cases where the Data Pod (or Service Hub) shares resources with its Orchestrator.
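
As an illustration of the minimum setup, the following Python sketch renders Web Access Control documents for the LDN Inbox and the Artefact Lifecycle Event Log. It rests on assumptions that this document does not make: "the Network" is interpreted as every agent (`acl:agentClass foaf:Agent`), the authorizations for the maintainer are left out, and the function names `authorization` and `build_acl` are hypothetical.

```python
ACL_PREFIXES = (
    "@prefix acl: <http://www.w3.org/ns/auth/acl#>.\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/>.\n"
)


def authorization(name, agent_triple, resource_url, modes, inherit):
    """Render one acl:Authorization in Turtle."""
    lines = [
        f"<#{name}>",
        "    a acl:Authorization;",
        f"    {agent_triple};",
        f"    acl:accessTo <{resource_url}>;",
    ]
    if inherit:
        # acl:default extends the authorization to contained resources.
        lines.append(f"    acl:default <{resource_url}>;")
    lines.append("    acl:mode " + ", ".join(f"acl:{m}" for m in modes) + ".")
    return "\n".join(lines) + "\n"


def build_acl(resource_url, orchestrator_webid, orchestrator_modes, network_modes):
    """Render an ACL document for a container shared with an Orchestrator.

    Assumption: "the Network" is modelled as foaf:Agent, i.e. everyone.
    """
    return "\n".join([
        ACL_PREFIXES,
        authorization("orchestrator", f"acl:agent <{orchestrator_webid}>",
                      resource_url, orchestrator_modes, inherit=True),
        # Only readable containers pass their network access on to members.
        authorization("network", "acl:agentClass foaf:Agent",
                      resource_url, network_modes,
                      inherit="Read" in network_modes),
    ])


ORCHESTRATOR = "https://my.institution.org/orchestrator/profile/card.ttl#me"
inbox_acl = build_acl("https://alice.institution.org/inbox/",
                      ORCHESTRATOR, ["Read", "Append"], ["Append"])
event_log_acl = build_acl("https://alice.institution.org/lifecycle_events/",
                          ORCHESTRATOR, ["Read", "Append"], ["Read"])
```

The inbox grants the network append-only access, so anyone can deliver a notification but only the Orchestrator can read what was delivered. The event log is the mirror image: readable by everyone, appendable only by the Orchestrator.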

All actors in the network have a WebID with a profile document that documents the web locations of the resources they manage.

An example WebId profile of a Data Pod maintainer can be:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix as: <https://www.w3.org/ns/activitystreams#>.
@prefix ldp: <http://www.w3.org/ns/ldp#>.
@prefix ex: <https://www.example.org/>.

<>
    a foaf:PersonalProfileDocument;
    foaf:maker <https://alice.institution.org/profile/card#me>;
    foaf:primaryTopic <https://alice.institution.org/profile/card#me>.

<https://alice.institution.org/profile/card#me>
    foaf:name "Alice";
    ldp:inbox <https://alice.institution.org/inbox/>;
    as:outbox <https://alice.institution.org/lifecycle_events/>;
    ex:orchestrator <https://my.institution.org/orchestrator/profile/card.ttl#me>;
    ex:policies <https://alice.institution.org/policies/>;
    a foaf:Person.

The WebId profile of the Orchestrator for the Data Pod above can look like:

@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix solid: <http://www.w3.org/ns/solid/terms#>.
@prefix as: <https://www.w3.org/ns/activitystreams#>.
@prefix ldp: <http://www.w3.org/ns/ldp#>.
@prefix ex: <https://www.example.org/>.

<>
    a foaf:PersonalProfileDocument;
    foaf:maker <https://my.institution.org/orchestrator/profile/card.ttl#me>;
    foaf:primaryTopic <https://my.institution.org/orchestrator/profile/card.ttl#me>.

<https://my.institution.org/orchestrator/profile/card.ttl#me>
    foaf:name "Alice’s Orchestrator";
    ldp:inbox <https://alice.institution.org/orchestrator/inbox/>;
    a foaf:Service.

In the two examples above we have specified:

An LDN Inbox for the Data Pod maintainer Alice at: https://alice.institution.org/inbox/.
An Artefact Lifecycle Event Log for Alice at: https://alice.institution.org/lifecycle_events/.
An Orchestrator for Alice defined by: https://my.institution.org/orchestrator/profile/card.ttl#me.
Policies that Alice maintains at: https://alice.institution.org/policies/.

For the Orchestrator of Alice we specified:

An LDN Inbox on the Data Pod of Alice at: https://alice.institution.org/orchestrator/inbox/.

Note: In a decentralized network the inbox of the Orchestrator can be at any network location and doesn’t need to live in the Data Pod of Alice.

A similar WebID profile can be created for the Service Hub. The decentralized locations of all Inboxes, Artefact Lifecycle Event Logs, Orchestrators and Policy documents can be discovered from the WebIDs of the actors in the network.

An Orchestrator MAY expose an initialization interface to assist the Data Pod Maintainer in setting up the required resources, WebId profile and Web Access Control settings.

6. Rulebook

To execute business logic for handling notifications, the Orchestrator makes use of one or more policy documents that are written by the actors in the network. These policies should be written in a policy language that the Orchestrator understands. The Rule language specification provides examples of how policies can be written, with possible implementations using SHACL, SPARQL or Notation3.

Each policy dictates what should happen when a trigger occurs on behalf of the maintainer of a Data Pod or Service Hub. The Rule language provides an implementation-neutral way to express possible policy rules.

In the example below we send a notification to Bob when Alice creates a new artefact.

rule "Notify Bob about new created artefacts"

when

   ?notification a as:Create .

then

   ?notification as:target <http://bob.institution.org/profile/card#me> .

   [ a fno:Execution ;
     fno:executes ex:sendNotification ;
     ex:notification ?notification
   ] .

When Alice provides this policy document to her Orchestrator, the policy is activated each time the Orchestrator is triggered with an as:Create notification, and Bob receives a copy of the notification.

In practice, policy documents can originate from many sources and are a composition of procedures imposed by:

personal preferences (i.e. defined by the maintainer);
service preferences (i.e. the orchestrator works on behalf of a Service Hub);
institutional requirements (i.e. the employer of the maintainer and the owner of the artefacts);
domain rules (i.e. the broader collaboration context the institution is situated in);
legislation (i.e. the legal obligations).

6.1. Publication of Policies

Policy documents are published as resources in a Data Pod or Service Hub LDP Container. This container can be under the control of the Orchestrator or a Data Pod/Service Hub maintainer. The WebID profile document of the maintainer SHOULD contain the location of the policies that should be made available to the Orchestrator.

In the example WebID profile above, the policy documents of Alice were made available at the location https://alice.institution.org/policies/.

The Orchestrator MUST have acl:Read permissions in order to execute the business logic specified in these documents.

When a trigger occurs, the Orchestrator consults all the policies in the supplied locations and follows their instructions.

The Orchestrator MAY offer only a limited number of policy execution types to the maintainer.

When a policy can’t be executed due to errors, the Orchestrator SHOULD send a notification to the maintainer about this fact.

7. Triggers

A trigger is an event to which an orchestrator can respond by taking actions. An orchestrator MUST respond to the following four types of triggers:

a new incoming Linked Data Notification [LDN] in the inbox of the Data Pod
- This inbox should be appendable by the whole network
a new incoming Linked Data Notification [LDN] in the inbox of the Orchestrator
- This inbox can be private to the Maintainer and Orchestrator
an observed state change to a watched Data Pod resource
a scheduled trigger from the internal time-based event scheduler

A trigger MUST be identifiable by a [URI], such that the rules written in the [spec-rulelanguage] can refer to its occurrence.

An Orchestrator MUST be a compliant [LDN] Consumer. The Orchestrator MAY advertise multiple inboxes. In this case, the Orchestrator MUST retrieve incoming Linked Data Notifications from all advertised inboxes. Inbox security is discussed in the security considerations sections.

An Orchestrator SHOULD be able to differentiate between Linked Data Notifications sent by the Maintainer of the Data Pod, by a Service Hub and by the Orchestrator itself. This prevents the Orchestrator from entering a loop in which it responds to its own triggers.
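
One possible way to tell senders apart is to compare the `actor` of a notification with the known WebIDs, as in the Python sketch below. The function names and the three labels are hypothetical, and the check trusts the `actor` value as claimed by the sender; validating the origin of a notification is a separate concern.

```python
from typing import Optional


def sender_id(notification: dict) -> Optional[str]:
    """Return the actor identifier, whether given as a string or an object."""
    actor = notification.get("actor")
    if isinstance(actor, dict):
        return actor.get("id")
    return actor


def classify_sender(notification: dict, maintainer_webid: str,
                    orchestrator_webid: str) -> str:
    """Label a notification as sent by "self", "maintainer" or "network"."""
    actor = sender_id(notification)
    if actor == orchestrator_webid:
        return "self"
    if actor == maintainer_webid:
        return "maintainer"
    return "network"


def should_process(notification: dict, maintainer_webid: str,
                   orchestrator_webid: str) -> bool:
    """Skip notifications the Orchestrator sent itself, to avoid loops."""
    label = classify_sender(notification, maintainer_webid, orchestrator_webid)
    return label != "self"
```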

An Orchestrator SHOULD have some form of data validation for incoming triggers and only respond to triggers that conform to an RDF data shape. Technologies such as SHACL, ShEx (https://shex.io/), SPARQL CONSTRUCT or Notation3 can be used to validate triggers.
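
The Python sketch below shows the general idea of validation with a plain structural check on a JSON-LD notification. It is not SHACL or ShEx and is no substitute for them: the set of required properties and the function name `validate_trigger` are assumptions made for this example only.

```python
# Assumed minimal shape; a real deployment would use a SHACL or ShEx shape.
REQUIRED_PROPERTIES = ("@context", "id", "type", "actor", "object")


def validate_trigger(notification: dict, accepted_types: set) -> list:
    """Return a list of problems; an empty list means the trigger is accepted."""
    problems = []
    for key in REQUIRED_PROPERTIES:
        if key not in notification:
            problems.append(f"missing property: {key}")
    # "type" may be a single string or a list of strings.
    types = notification.get("type", [])
    if isinstance(types, str):
        types = [types]
    if not set(types) & set(accepted_types):
        problems.append("no accepted type")
    return problems
```

An Orchestrator following this approach would ignore, or report to the maintainer, any trigger for which the returned list is not empty.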

An Orchestrator MAY have some mechanism in place to validate the origin of the triggers. Technologies such as Linked Data Proofs can be used to sign all Linked Data Notifications that are sent over the network, and are validated by the Orchestrator.

An Orchestrator SHOULD be able to process any notification described in the List of Notifications.

An Orchestrator SHOULD be able to execute one or more Policy documents that are defined by the Maintainer. An example of how Policy documents can be written is available in the Rule language for decentralized business processes.

Example of a Create trigger using Linked Data Notifications:

POST /orchestrator/inbox HTTP/1.1
Host: alice.institution.org 
Content-Type: application/ld+json;profile="https://www.w3.org/ns/activitystreams"
Content-Language: en

{
    "@context": "https://www.w3.org/ns/activitystreams",
    "id": "urn:uuid:AD02A16E-2F5C-408E-8A4D-D596C6421969",
    "type": "Create",
    "summary": "Alice created an artefact",
    "actor": {
      "id": "https://alice.institution.org/profiles/card#me",
      "type": ["Person"],
      "inbox": "https://alice.institution.org/inbox",
      "name": "Alice"   
    },
    "origin": {
      "id": "https://acme.net/shinyapps/DashBoard123",
      "type": "Application",
      "name": "Dashboard of Alice" 
    },
    "object": "http://alice.institution.org/artefacts/1",
    "published": "2014-09-30T12:34:56Z"
}

Example of a Policy to forward all Create events from Alice to Bob using a Policy rule:

rule "Forward to Bob"

as:   <https://www.w3.org/ns/activitystreams#>
pol:  <https://www.example.org/ns/policy#>
fno:  <https://w3id.org/function/ontology#>
ex:   <https://www.example.org/>
alice:        <https://alice.institution.org/profiles/card#me>
bob:          <https://bob.institution.org/profiles/card#me>
orchestrator: <https://instutition.org/orchestrator/profile/card#me>

when

 ?notification a as:Create .
 ?notification as:actor alice: .

then

 ?notification as:target bob: .

 [ pol:policy  [
      a fno:Execution ;
      fno:executes ex:sendTarget [
        ex:notification ?notification
      ]
   ]
 ]

7.1. Observing resource state changes

Note: PHOCHSTE: This needs some more work to explain how this could work.

An Orchestrator MAY accept resource state changes of Linked Data Platform Resources as triggers for policy actions. In this case, the Orchestrator MUST be granted acl:Read access to all observed Linked Data Platform Resources. If an observed Linked Data Platform Resource is also a Linked Data Platform Container, the Orchestrator SHOULD observe state changes for all Linked Data Platform Resources that are contained by the observed Linked Data Platform Container. Issue: define something like a trigger description?

7.2. Receiving Linked Data Notifications

Sending a Linked Data Notification is the primary way to provoke action from the Orchestrator. Common senders of notifications are:

a maintainer using the Dashboard, who performed a manual operation on an artefact (e.g. creating a new artefact) and wants to trigger consecutive action (e.g. announcing that artefact);
a service hub or other actor who has performed an operation related to an artefact stored in the maintainer’s data pod (e.g. created a comment about that artefact), which is therefore of potential interest to the orchestrator.

To be able to read the notifications from an inbox, an orchestrator MUST be a compliant Linked Data Notifications § consumer. It MAY watch zero or more advertised inboxes, as mentioned in § 5 Data Pod Initialization, and MUST retrieve incoming Linked Data Notifications from all advertised inboxes. Inboxes MAY be authenticated according to Linked Data Notifications § authenticated-inboxes, which is discussed further in § 10 Security considerations.
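
Retrieving notifications from an inbox could look like the Python sketch below. It assumes that the inbox listing is served as compacted JSON-LD in which `ldp:contains` appears under the key `contains` (or `ldp:contains`); a complete consumer would parse the RDF properly. The `fetch` parameter is a hypothetical function that performs a GET request and returns the body as text, which keeps the sketch independent of any HTTP library and of authentication details.

```python
import json
from typing import Callable

Fetch = Callable[[str], str]  # hypothetical: GET a URL, return the body


def notification_urls(inbox_listing: dict) -> list:
    """Extract the URLs of the contained notifications from an inbox listing."""
    contains = inbox_listing.get("contains", inbox_listing.get("ldp:contains", []))
    if isinstance(contains, (str, dict)):
        contains = [contains]  # a single member is not wrapped in a list
    urls = []
    for item in contains:
        if isinstance(item, dict):
            urls.append(item.get("@id") or item.get("id"))
        else:
            urls.append(item)
    return urls


def poll_inbox(inbox_url: str, fetch: Fetch, seen: set) -> list:
    """Return the notifications that were not seen in an earlier poll."""
    listing = json.loads(fetch(inbox_url))
    new_notifications = []
    for url in notification_urls(listing):
        if url in seen:
            continue
        seen.add(url)
        new_notifications.append(json.loads(fetch(url)))
    return new_notifications
```

Calling `poll_inbox` once per advertised inbox, with a shared or per-inbox `seen` set, covers the requirement to retrieve notifications from all advertised inboxes.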

The definitions of all possible [LDN] notifications using the [ACTIVITYSTREAMS-VOCABULARY] are listed in the [spec-notifications]. An orchestrator MUST at least support the following subset:

What notifications do NOT trigger the orchestrator?

When receiving a notification, the orchestrator validates it against all notification-based triggers mentioned in its policy. Technologies such as [SHACL] and ShEx can be used to validate notifications.

7.3. Scheduled trigger

Note: PHOCHSTE: This needs some more work to explain how this could work.

An Orchestrator MAY accept time scheduled triggers for policy actions.

7.4. Observing resource state changes

An orchestrator can also watch LDP resources (Linked Data Platform 1.0 § ldpr), e.g. by means of polling, whose state changes issue a trigger. Hence, an orchestrator MUST accept resource state changes of Linked Data Platform 1.0 § ldpr as triggers for rulebook actions. In case the observed Linked Data Platform 1.0 § ldpr is also an LDP container (Linked Data Platform 1.0 § ldpc), the orchestrator MUST observe state changes for all Linked Data Platform 1.0 § ldpr that are contained by the observed Linked Data Platform 1.0 § ldpc. The Orchestrator MUST request acl:Read access to all observed Linked Data Platform 1.0 § ldpr, as noted in § 5 Data Pod Initialization.

At least the following state changes MUST issue a trigger:

Update of a resource, observed as a change in the Last-Modified or ETag headers.
Deletion of a resource, observed as a 4XX status code.

Make resource state changes more concrete: how exactly using HTTP, eg. last modified

In case of a Linked Data Platform 1.0 § ldpc, the creation or deletion of a container member MUST also issue a trigger. Thus, the orchestrator SHOULD observe a difference in the set of ?resource bindings by matching the triple pattern ?container ldp:contains ?resource on the container’s response.
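
When polling, these state changes can be derived by comparing two snapshots of a container, as in the Python sketch below. A snapshot is assumed here to be a dictionary that maps each member URL (the ?resource bindings) to a validator such as its ETag or Last-Modified value; how the snapshot is collected over HTTP is left out, and the names `Changes` and `diff_snapshots` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Changes:
    """State changes between two polls of the same container."""
    created: set = field(default_factory=set)
    deleted: set = field(default_factory=set)
    updated: set = field(default_factory=set)


def diff_snapshots(previous: dict, current: dict) -> Changes:
    """Compare two snapshots that map resource URL to ETag or Last-Modified."""
    previous_urls, current_urls = set(previous), set(current)
    return Changes(
        created=current_urls - previous_urls,
        deleted=previous_urls - current_urls,
        # A member present in both polls changed if its validator differs.
        updated={url for url in previous_urls & current_urls
                 if previous[url] != current[url]},
    )
```

Each URL in the three sets would then be turned into a trigger of the corresponding kind.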

7.5. Scheduled trigger

Finally, some triggers might be configured as recurrent and activate the orchestrator at scheduled intervals. Therefore, an Orchestrator MUST accept triggers from a time-based job scheduler such as [cron].

Scheduled triggers can be configured using crontab syntax; an orchestrator MUST be able to interpret the pattern syntax defined in [cron]. A trigger MUST invoke an action on every matching pattern during the time the trigger is active and the orchestrator is running.

# Issue trigger every weekday morning at 3:15 am

ex:trigger ex:pattern "15 3 * * 1-5"^^ex:crontab .
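
To illustrate how such a pattern can be interpreted, the Python sketch below checks whether a moment in time matches a five-field crontab pattern. It is a simplification under stated assumptions: it supports only numbers, `*`, ranges, lists and steps; it does not validate value ranges; and it requires all five fields to match, whereas traditional cron fires when either the day-of-month or the day-of-week field matches if both are restricted. The function names are hypothetical.

```python
from datetime import datetime

# (minimum, maximum) for minute, hour, day of month, month, day of week
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand_field(field: str, low: int, high: int) -> set:
    """Expand one crontab field such as "*/15", "1-5" or "0,30" to a set."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def cron_matches(pattern: str, moment: datetime) -> bool:
    """Return True when the moment matches the five-field crontab pattern."""
    fields = pattern.split()
    if len(fields) != 5:
        raise ValueError("expected five crontab fields")
    allowed = [expand_field(f, low, high)
               for f, (low, high) in zip(fields, FIELD_RANGES)]
    # Crontab accepts both 0 and 7 for Sunday; normalise to 0..6.
    allowed[4] = {value % 7 for value in allowed[4]}
    actual = [moment.minute, moment.hour, moment.day, moment.month,
              moment.isoweekday() % 7]
    return all(value in choices for value, choices in zip(actual, allowed))
```

A scheduler loop would evaluate `cron_matches(pattern, now)` once per minute for every active trigger and issue the trigger whenever the result is true.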

How can you communicate a scheduled trigger from a institutional perspective? for instance, researchers all apply the institutional

8. Actions

An action is a form of interaction with other actors or resources in the network. An Orchestrator performs such actions on behalf of a network actor.

There are three types of actions that an orchestrator MUST support:

sending Linked Data Notifications [LDN] to an inbox resource, likely belonging to a Service Hub or Data Pod.
manipulating [LDP] resources of a Data Pod.
reading arbitrary [HTTP11] resources.

8.1. Sending Linked Data notifications

Sending a Linked Data Notification is the primary way to provoke action from other actors in the network. Hence, an orchestrator MUST be a compliant Linked Data Notifications § sender.
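
A minimal Python sketch of the sending step is given below. It only builds the POST request for an inbox URL that is already known; the inbox discovery that a compliant sender performs first is left out. The function name `build_ldn_request` is hypothetical, and the media type mirrors the one used in the HTTP examples of this document.

```python
import json
import urllib.request

# Media type as used in the HTTP examples of this document.
LDN_CONTENT_TYPE = (
    'application/ld+json;profile="https://www.w3.org/ns/activitystreams"'
)


def build_ldn_request(inbox_url: str, notification: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that delivers a notification."""
    body = json.dumps(notification).encode("utf-8")
    return urllib.request.Request(
        inbox_url,
        data=body,
        method="POST",
        headers={"Content-Type": LDN_CONTENT_TYPE},
    )
```

The request can be sent with `urllib.request.urlopen(request)`. A receiver that accepts the notification is expected to answer with 201 Created, or with 202 Accepted when it processes the notification asynchronously.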

From the list of possible [LDN] notifications in [spec-notifications], an orchestrator MUST at least be able to send the following subset:

8.2. Reading HTTP resources

A second type of action is reading [HTTP11] resources. Thus, an orchestrator MUST be able to construct a GET request.

does this make sense?

8.3. Manipulating LDP resources

A final action is performing create, read, update and delete operations on a Linked Data Platform 1.0 § ldpr and Linked Data Platform 1.0 § ldpc. Therefore, an Orchestrator MUST be a Linked Data Platform 1.0 § dfn-ldp-client and implement at least the verbs PUT, PATCH and DELETE.

TODO

Read: GET

8.4. Action descriptions

When a rulebook rule is executed in response to a trigger, it produces zero or more actions. Each of these actions is captured in an action description, which uses a simple vocabulary:

Class: act:Action
Subclasses: act:NotifyAction | act:HTTPAction
Properties: act:payload | act:target | act:description

This vocabulary MUST be interpretable by the orchestrator and MUST result in an executed action. The specific requirements are discussed per action type below.

{
  "@context": "https://mellonscholarlycommunication.github.io/vocabulary/act/context",
  "type": "NotifyAction",
  "description": "Notify service hub of artefact creation.",
  "target": "https://servicehub.org/inbox",
  "payload": {
    "@type": "as:Create",
    ...
  }
}

{
  "@context": "https://mellonscholarlycommunication.github.io/context",
  "type": "HTTPAction",
  "description": "Notify service hub of artefact creation.",
  "target": "https://servicehub.org/resource",
  "payload": {
    "@type": "http:Request",
    "http:methodName": "POST",
    ...
  }
}
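
Interpreting an action description can be sketched as a dispatch on its type, as in the Python example below. The handler signature and the function name `execute_action` are assumptions made for illustration; the handlers are injected so that the sketch stays independent of how notifications and HTTP requests are actually sent.

```python
from typing import Callable, Dict

# Hypothetical handler signature: (target URL, payload) -> None
Handler = Callable[[str, dict], None]


def execute_action(action: dict, handlers: Dict[str, Handler]) -> None:
    """Execute one action description by dispatching on its "type"."""
    action_type = action.get("type")
    handler = handlers.get(action_type)
    if handler is None:
        raise ValueError(f"unsupported action type: {action_type!r}")
    handler(action["target"], action.get("payload", {}))


# Usage: record what would be sent instead of performing network calls.
sent = []
handlers = {
    "NotifyAction": lambda target, payload: sent.append(("notify", target)),
    "HTTPAction": lambda target, payload: sent.append(("http", target)),
}
execute_action(
    {"type": "NotifyAction", "target": "https://servicehub.org/inbox",
     "payload": {"@type": "as:Create"}},
    handlers,
)
```

Raising an error for unknown types gives the Orchestrator a natural point at which to notify the maintainer that a policy could not be executed.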

9. Deploying an orchestrator

An Orchestrator MUST be deployable as a local background process or as a remote web service. In case of the latter, an actor SHOULD be able to spawn, initialize and trigger the instance over [HTTP11], as defined in § 9 Deploying an orchestrator and § 7 Triggers. The Orchestrator MAY also serve an inbox for communicating with third parties using Linked Data Notifications [LDN].

Example of spawning an orchestrator using Linked Data Notifications:

POST /inbox HTTP/1.1
Host: example.org
Content-Type: application/ld+json;profile="https://www.w3.org/ns/activitystreams"
Content-Language: en

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "summary": "Spawn orchestrator",
  "type": "Create",
  "actor": "http://kb.nl#me",
  "object": "http://example.org/orchestrator/1"
}

If deployed as a local background process, a (custom) API MUST be present that is able to perform these actions.

10. Security considerations

10.1. Authenticated Inboxes

In case the Orchestrator supports triggers from incoming Linked Data Notifications, the Orchestrator SHOULD make use of authenticated inboxes as described by the Linked Data Notifications specification. Requiring authentication on the pod inbox can prevent unwanted parties from forging notifications to be processed by the Orchestrator.

10.2. Signed notifications

Instead of requiring authentication to post notifications to the pod inbox, the Orchestrator may require notifications to be signed by the sender before accepting notifications. There was an upcoming panel on signed notifications - TODO

Appendix A: Implementation details

Retrieving inbox notifications

Observing LDP resource state updates

Time based trigger implementations

11. Acknowledgement

We thank Herbert Van de Sompel, DANS + Ghent University, hvdsomp@gmail.com for the valuable input during this project.