This page contains some examples of typical usage patterns that can be implemented with the B2SAFE service rule sets.

#### Acronyms

**RoR**: Repository of Records, the repository where the data were first stored.

**PID**: Persistent identifier associated with a digital object or with a whole collection.

**DO**: Digital object.

**PPID**: Parent PID, the persistent identifier associated with the source object in a replication chain. If the chain has only two elements (the master copy and its first replica), then PPID = RoR.

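The relation between RoR and PPID can be illustrated by walking a replication chain of PIDs. The sketch below is illustrative only and assumes that the RoR is referenced through the PID of the master copy (the first element of the chain) and that the master copy itself has neither a RoR nor a PPID, as in the rule example further down where both are `"None"`. `ChainEntry`, `chain_references` and the PIDs are hypothetical.

```python
from typing import List, NamedTuple, Optional


class ChainEntry(NamedTuple):
    pid: str
    ror: Optional[str]   # PID of the master copy (assumption); None for the master itself
    ppid: Optional[str]  # PID of the direct source object; None for the master itself


def chain_references(chain: List[str]) -> List[ChainEntry]:
    """Derive RoR and PPID for every copy in a replication chain.

    chain[0] is the master copy; chain[i] is replicated from chain[i - 1].
    """
    if not chain:
        return []
    master = chain[0]
    entries = [ChainEntry(master, None, None)]
    for parent, child in zip(chain, chain[1:]):
        entries.append(ChainEntry(child, master, parent))
    return entries


# Two-element chain: the replica's PPID is the same as its RoR.
for entry in chain_references(["842/master", "842/replica1"]):
    print(entry)
```

With three or more elements the two references diverge: every replica keeps the master copy as RoR, while its PPID points to the copy it was made from.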
***

### Register a PID for a DO

Client-side orchestration of the process:

![create PID](https://raw.githubusercontent.com/wiki/EUDAT-B2SAFE/B2SAFE-core/b2safe_flows_create-PID1.png)

Server-side orchestration of the process:

![create PID 2](https://raw.githubusercontent.com/wiki/EUDAT-B2SAFE/B2SAFE-core/b2safe_flows_create-PID2_v2.png)

The boxes with red borders and labels represent new patterns that rely on B2SAFE metadata stored in JSON files.

**System metadata**

| Field | Value |
|---|---|
| data_name | data-object.dat |
| data_id | 20449 |
| coll_id | 19547 |
| data_repl_num | 0 |
| data_version | |
| data_type_name | generic |
| data_size | 2151618 |
| resc_group_name | |
| resc_name | cinecaRes1 |
| data_path | /physical/path/to/data-object.dat |
| data_owner_name | proirod1 |
| data_owner_zone | CINECA01 |
| data_repl_status | 1 |
| data_status | |
| data_checksum | 076aad9622fa3118f006927f05222817 |
| data_expiry_ts (expire time) | None |
| data_map_id | 0 |
| r_comment | |
| create_ts | 01430831283: 2015-05-05.15:08:03 |
| modify_ts | 01430831283: 2015-05-05.15:08:03 |
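
The `create_ts` and `modify_ts` values are zero-padded counts of seconds since the Unix epoch, and the `eudat_dpm_checksum_date` value below appears to use the same format. A minimal conversion sketch follows; `parse_irods_timestamp` is a hypothetical helper name. In UTC, `01430831283` is 2015-05-05 13:08:03, so the readable form shown above seems to be rendered in local time (UTC+2); that offset is inferred from the two values, not stated on this page.

```python
from datetime import datetime, timedelta, timezone


def parse_irods_timestamp(value: str) -> datetime:
    """Convert a zero-padded iRODS timestamp (seconds since the epoch) to UTC."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


created = parse_irods_timestamp("01430831283")
print(created.isoformat())  # 2015-05-05T13:08:03+00:00

# Rendering at UTC+2 reproduces the readable value shown above.
cest = timezone(timedelta(hours=2))
print(created.astimezone(cest).strftime("%Y-%m-%d.%H:%M:%S"))  # 2015-05-05.15:08:03
```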

**B2SAFE metadata**

| Attribute | Value | Units |
|---|---|---|
| eudat_dpm_checksum_date:cinecaRes1 | 01431422755 | |
| PID | 842/f5188714-f8b8-11e4-a506-fa163e62896a | |

**PID record metadata**

| Type | Value |
|---|---|
| URL | irods://130.186.13.14:1247/cinecaDMPZone/home/claudio/datum.txt |
| 10320/LOC | `<locations><location href="irods://hostname:1247/Zone/home/claudio/datum.txt" id="0"/></locations>` |
| CHECKSUM | sha2:nmDjK/7k1D5jjMUFoWHjX5qZmke9vpQbR6FaY9sk6eI= |
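
The `10320/LOC` value is a small XML document listing the locations of the object, and each `href` (like the `URL` field) is an iRODS URL. The sketch below extracts them with Python's standard library; `parse_10320_loc` and `split_irods_url` are illustrative helpers, not part of B2SAFE. The XML structure allows more than one `location` element, so the parser returns a list.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlsplit


def parse_10320_loc(value: str) -> List[Tuple[str, str]]:
    """Return (id, href) pairs from a 10320/LOC <locations> value."""
    root = ET.fromstring(value)
    if root.tag != "locations":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    return [(loc.get("id", ""), loc.get("href", "")) for loc in root.findall("location")]


def split_irods_url(url: str) -> Tuple[Optional[str], Optional[int], str]:
    """Split an irods:// URL into host, port and iRODS logical path."""
    parts = urlsplit(url)
    if parts.scheme != "irods":
        raise ValueError(f"not an irods URL: {url!r}")
    return parts.hostname, parts.port, parts.path


loc = ('<locations><location href="irods://hostname:1247/Zone/home/claudio/datum.txt" '
       'id="0"/></locations>')
for loc_id, href in parse_10320_loc(loc):
    host, port, path = split_irods_url(href)
    print(loc_id, host, port, path)
```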

**Client-side rule: PID creation**

```
PID_DO_reg {
    # Also store the new PID as iCAT metadata (cf. the "PID" attribute above)
    *iCATCache = bool("true");
    # *parent_pid and *ror are "None" (see INPUT): no parent PID and no RoR are set
    EUDATCreatePID(*parent_pid, *source, *ror, *iCATCache, *newPID);
    writeLine("stdout", "PID: *newPID");
}
INPUT *source="/My collection of registered data/datum.txt",*parent_pid="None",*ror="None"
OUTPUT ruleExecOut
```


Some basic metadata can be retrieved even with clients that can only browse the file system, such as those based on the WebDAV and GridFTP protocols: for each registered object, the following JSON file is stored in the special collection `.metadata`:

```json
{
    "checksum": "sha2:nmDjK/7k1D5jjMUFoWHjX5qZmke9vpQbR6FaY9sk6eI=",
    "ror": "None",
    "pid": "842/6cff9eb8-47ef-11e5-a889-fa163e62896a",
    "checksum_timestamp": "01440065311"
}
```

The JSON file is stored at the following path, where `/path/to/data set/` is the collection that contains the object:

`/path/to/data set/.metadata/<object-name>_metadata.json`
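
A client that only sees the file system can locate the JSON file next to an object, read its PID and recompute the checksum of its own copy. The sketch assumes that the file name is the full object name (extension included) followed by `_metadata.json`, and that the `checksum` field follows the iRODS convention for SHA-256 checksums: `sha2:` followed by the Base64-encoded digest. All helper names are illustrative.

```python
import base64
import hashlib
import json
import posixpath
from typing import BinaryIO, Optional


def metadata_json_path(object_path: str) -> str:
    """<collection>/.metadata/<object-name>_metadata.json (assumed naming)."""
    collection, name = posixpath.split(object_path)
    return posixpath.join(collection, ".metadata", f"{name}_metadata.json")


def load_b2safe_metadata(json_text: str) -> dict:
    """Parse the JSON file; the string "None" (see "ror" above) becomes None."""
    meta = json.loads(json_text)
    return {key: (None if value == "None" else value) for key, value in meta.items()}


def irods_sha2_checksum(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a binary stream, formatted as "sha2:" + Base64(digest)."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def checksum_matches(stream: BinaryIO, meta: dict) -> Optional[bool]:
    """True/False when the metadata carries a checksum, None when it does not."""
    expected = meta.get("checksum")
    if expected is None:
        return None
    return irods_sha2_checksum(stream) == expected
```

On a WebDAV mount, for example, the object and its JSON file can be opened with ordinary file I/O (`open(path, "rb")`) and passed to these helpers.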

***

### Register PIDs for a whole collection recursively

The flow is the same as the one depicted above for a single digital object; however, there is not yet a way to retrieve the list of created PIDs.


### Replication

Replication process triggered client-side without PID registration (**registered data boolean flag** = False):

<ac:image ac:width="700"><ri:attachment ri:filename="b2safe_flows_replication_v2.png"></ri:attachment></ac:image>

The red box labelled "Messaging system" is an experimental feature that returns the results of asynchronous (server-side triggered) processes. The messages are posted to a queue, provided by dweet.io, which can be accessed via an HTTP interface.

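Since the queue is exposed over HTTP, a client can poll it with a plain GET request. The sketch assumes dweet.io's `GET /get/latest/dweet/for/{thing}` endpoint and its JSON reply shape (`"this": "succeeded"` plus a `"with"` list whose items carry a `"content"` object). The thing name and the message fields are hypothetical, because the names B2SAFE actually posts are not given on this page.

```python
import json
import urllib.request
from urllib.parse import quote

DWEET_BASE = "https://dweet.io"


def latest_dweet_url(thing: str) -> str:
    """URL of dweet.io's "latest message for a thing" endpoint."""
    return f"{DWEET_BASE}/get/latest/dweet/for/{quote(thing, safe='')}"


def extract_content(response_text: str) -> dict:
    """Return the "content" object of the newest message in a dweet.io reply."""
    body = json.loads(response_text)
    messages = body.get("with")
    if body.get("this") != "succeeded" or not isinstance(messages, list) or not messages:
        raise ValueError("no message available")
    return messages[0]["content"]


def fetch_latest(thing: str, timeout: float = 10.0) -> dict:
    """Poll the queue once (performs a network request)."""
    with urllib.request.urlopen(latest_dweet_url(thing), timeout=timeout) as resp:
        return extract_content(resp.read().decode("utf-8"))
```

Polling is used here because the queue is only described as reachable over HTTP; `fetch_latest("b2safe-test")` would return the content of the most recent message for the hypothetical thing `b2safe-test`.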

Replication process triggered client-side with PID registration (**registered data boolean flag** = True):

<ac:image ac:width="700"><ri:attachment ri:filename="b2safe_flows_registered-replication_v2.png"></ri:attachment></ac:image>

The **registered data boolean flag** is a boolean input parameter. If it is True, the replication mechanism assumes that the data already have PIDs; for any data that do not, the procedure automatically creates and registers new PIDs.

The **recursive boolean flag** is taken into account only when the **registered data boolean flag** is True. If the recursive flag is True, the PID registration process is applied to every object and sub-collection under the root collection; if it is False, only the root collection is considered for PID registration, although all the objects and sub-collections are still replicated.

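The interplay of the two flags can be summarized as a small decision function. This is an illustrative model of the behaviour described above, not B2SAFE code: `ReplicationPlan` and `plan_replication` are hypothetical names, and "register" means "ensure the item has a PID, creating one if it is missing".

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ReplicationPlan:
    replicate: FrozenSet[str]  # everything under the root is always replicated
    register: FrozenSet[str]   # items that must end up with a PID (created if missing)


def plan_replication(root: str, members: Iterable[str],
                     registered: bool, recursive: bool) -> ReplicationPlan:
    everything = frozenset([root, *members])
    if not registered:
        register: FrozenSet[str] = frozenset()  # the recursive flag is ignored
    elif recursive:
        register = everything                   # root plus every object and sub-collection
    else:
        register = frozenset([root])            # only the root collection
    return ReplicationPlan(replicate=everything, register=register)


plan = plan_replication("/Zone/home/user/coll",
                        ["/Zone/home/user/coll/a.dat", "/Zone/home/user/coll/sub"],
                        registered=True, recursive=False)
print(sorted(plan.register))  # ['/Zone/home/user/coll']
```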

The **status** output parameter is True if the replication succeeded, False otherwise.

The **response** output parameter contains the error message in case of failure.

Error messages:

* no access permission (on the source)
* empty PID (of the source)
* empty source's PID
* empty destination's PID
* missing replicated object
* different size (between source and destination)
* different checksum (between source and destination)

### Replication with asynchronous PID registration

In this case the replication is performed with **registered data boolean flag** = False, and the PID registration is triggered in a second step.

<ac:image ac:width="700"><ri:attachment ri:filename="b2safe_flows_registration.png"></ri:attachment></ac:image>

### Integrity check between a DO and its replica

This procedure verifies the consistency between a digital object and its replica.

<ac:image ac:width="700"><ri:attachment ri:filename="b2safe_flows_integrity-check.png"></ri:attachment></ac:image>

The boolean input parameter **log enabled** enables the logging of failed checks. This makes it possible to handle inconsistencies either synchronously (**log enabled** = False) or asynchronously (**log enabled** = True). For the asynchronous option, see the pattern "Recover failed transfers from the logging system's queue"; the synchronous option requires coupling the integrity check with another procedure, such as "Replication".

Error messages:

* missing replicated object
* different size (between source and destination)
* different checksum (between source and destination)

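A minimal sketch of the checks behind these error messages follows. It assumes that the size is compared before the checksum and that, when **log enabled** is True, a failed check is appended to a queue for later recovery. `ObjectInfo`, `integrity_check` and the plain Python list used as the queue are hypothetical stand-ins for the B2SAFE rules and logging system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ObjectInfo:
    size: int
    checksum: str


def integrity_check(path: str, source: ObjectInfo, replica: Optional[ObjectInfo],
                    log_enabled: bool, failure_log: List[str]) -> Tuple[bool, str]:
    """Return (status, response); log the failure when log_enabled is True."""
    if replica is None:
        response = "missing replicated object"
    elif source.size != replica.size:
        response = "different size"      # cheaper test first (assumed order)
    elif source.checksum != replica.checksum:
        response = "different checksum"
    else:
        return True, ""
    if log_enabled:
        # Asynchronous handling: leave the failure for the recovery pattern.
        failure_log.append(f"{path}: {response}")
    return False, response
```

With **log enabled** = False nothing is queued, so the caller has to react to `(status, response)` immediately, for example by starting a new replication.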

### Recover failed transfers from the logging system's queue

<ac:image ac:width="700"><ri:attachment ri:filename="b2safe_flows_failrecover_v2.png"></ri:attachment></ac:image>

The input parameter **buffer length** defines how many failed operations the rule processes, namely the most recent ones logged in the queue. If the parameter is greater than the queue length, the process stops after the last operation logged in the queue.

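The selection implied by **buffer length** fits in a few lines. In this sketch the queue is a plain Python sequence ordered from the oldest to the newest entry, and `retry` is a hypothetical callback standing in for the actual recovery step; both, as well as the helper names, are assumptions.

```python
from typing import Callable, List, Sequence


def select_recent(queue: Sequence[str], buffer_length: int) -> List[str]:
    """Return the last `buffer_length` entries, or the whole queue if it is shorter."""
    if buffer_length <= 0:
        return []  # guard: items[-0:] would return the whole queue
    items = list(queue)  # also accepts a deque, which does not support slicing
    return items[-buffer_length:]


def recover_failed(queue: Sequence[str], buffer_length: int,
                   retry: Callable[[str], bool]) -> List[str]:
    """Retry the selected failures and return those that still fail."""
    return [entry for entry in select_recent(queue, buffer_length) if not retry(entry)]
```

Python slicing already stops at the start of the list, which matches the case where **buffer length** exceeds the queue length.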

### Update URL field in the PID record

<ac:image ac:width="700"><ri:attachment ri:filename="b2safe_flows_url-update.png"></ri:attachment></ac:image>