Commit 45c0c834 authored by Michele Carpene

Merge branch 'devel' into 'master'

B2SHARE-B2SAFE-integration.py

See merge request !12
parents 6923466d 44e8e80d
Pipeline #782 failed with stages
in 1 minute and 47 seconds
Installation/Configuration
===========================
The following describes the procedure to enable the B2SHARE publication option with B2SAFE release-4.x.y. The connection between B2SAFE and B2SHARE enables a B2SAFE user to trigger a set of iRODS rules that call a Python script (b2shareclientCLI) that connects to the HTTP API of B2SHARE and performs the necessary actions to publish a document or collection.
INSTALLATION
------------
The B2SHARE connection component is an extension of the B2SAFE core package (https://gitlab.eudat.eu/b2safe/B2SAFE-core), so the following software is expected to be in place before installation:
- iRODS
- B2HANDLE or msi-persistent
- B2SAFE
- Python 3
NOTE: B2HANDLE can be found at: https://github.eudat.eu/b2safe/B2HANDLE
NOTE: iRODS runs as a normal user process, NOT as root. The package can be built by any user. During installation, the package uses "/etc/irods/service_account.config" to set the ownership of the files.
To install the B2SHARE-connect component, perform the following steps:
- clone the b2safe-b2share-connect project as any user, NOT as root
```
$ git clone https://gitlab.eudat.eu/b2safe/b2safe-b2share-connect.git
```
- add the following scripts to \<your path to B2SAFE\>/B2SAFE-core/cmd:
* b2shareclient.py
* b2shareclientCLI.py
* configuration.py
* create_md_file.py
* irodsUtility.py
- add the rule “b2share.re” from the B2SAFE-core/rulebase folder to the "rulebase" folder of your iRODS instance \<your path to B2SAFE\>/B2SAFE-core/rulebase
- add the configuration file “b2share_client_example.json” to your configuration folder \<your path to B2SAFE\>/B2SAFE-core/conf, modify it according to your environment as described in the "Configuration" section of this wiki page, and rename it to “b2share_client.json”
- check for missing Python libraries by trying to run the major scripts with the -d (dry run) option:
* b2shareclientCLI.py
* create_md_file.py
These are mostly: python-irodsclient, jsonpatch, requests, configparser, mock and pytest; a list is included in requisites.txt
```
# pip3 install python-irodsclient numpy mock configparser pytest jsonpatch requests
```
Try to install missing packages with a standard package manager such as pip (sudo pip3.6 install), apt, yum or zypper.
As described in the “Example Workflow” section, the iRODS rules will trigger these scripts according to the flags the user specifies in the imeta of the collection.
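The check for missing libraries can also be done without running the scripts themselves. The following is a minimal sketch, not part of the B2SAFE package: the function name `find_missing_modules` is hypothetical, and the import names are assumptions (python-irodsclient is imported as `irods`; the other packages are assumed to be importable under their pip names).

```python
import importlib.util

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ["irods", "jsonpatch", "requests",
                    "configparser", "mock", "pytest"]


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules: " + ", ".join(missing))
    else:
        print("All required Python modules were found.")
```

Any module reported as missing can then be installed with the package manager of your choice.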
CONFIGURATION
-------------
There are two major scripts for the B2SHARE connection component, b2shareclientCLI.py and create_md_file.py, which use the configuration stored in the file “b2share_client.json”. The configuration consists of three parts.
The first two sections, `logging` and `b2share_http_api`, are prefilled with default values.
* `logging` - two values need to be specified: the log level, default `"loglevel": "DEBUG"`, and the file where the logging information is saved, default `"logfile": "/opt/eudat/b2safe/log/b2share_connection_client.log"`, i.e. a file named b2share_connection_client.log in the folder log under the installation path.
* `b2share_http_api` - one value needs to be specified: the host_name of the B2SHARE instance; the default is the address of the B2SHARE training instance, `"host_name": "https://trng-b2share.eudat.eu/"`. Do not change the attribute `"access_parameter": "?access_token"` or any other attribute of `b2share_http_api`. They are parts of the string needed to build the URL for the B2SHARE HTTP API and need to be changed only if the B2SHARE HTTP API changes.
* `irods` - connection information of the iRODS instance like the name of the iRODS zone `"zone_name": "YOUR_ZONE"` and `"irods_env": "/home/irods/.irods/irods_environment.json"`.
```
{
  "configurations": {
    "b2share_http_api": {
      "host_name": "https://trng-b2share.eudat.eu/",
      "access_parameter": "?access_token",
      "list_communities_endpoint": "api/communities/",
      "get_community_schema_endpoint": "/schemas/last",
      "records_endpoint": "api/records/"
    },
    "irods": {
      "zone_name": "YOUR_ZONE",
      "irods_env": "/home/irods/.irods/irods_environment.json",
      "resources": "",
      "irods_home_dir": "",
      "irods_debug": ""
    },
    "logging": {
      "loglevel": "DEBUG",
      "logfile": "/opt/eudat/b2safe/log/b2share_connection_client.log"
    }
  }
}
```
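To illustrate why the `b2share_http_api` attributes should stay unchanged, the sketch below shows how the URL for listing communities can be assembled from them. It follows the same concatenation order that `getCommunityIDByName` in b2shareclientCLI.py uses (host, endpoint, access parameter, `=`, token). The function name `build_list_communities_url` and the token value `TOKEN` are hypothetical and only serve the example.

```python
import json

# Excerpt of the example configuration shown above.
EXAMPLE_CONFIG = """
{
  "configurations": {
    "b2share_http_api": {
      "host_name": "https://trng-b2share.eudat.eu/",
      "access_parameter": "?access_token",
      "list_communities_endpoint": "api/communities/"
    }
  }
}
"""


def build_list_communities_url(config, access_token):
    """Concatenate host, endpoint and access token into one URL."""
    api = config["configurations"]["b2share_http_api"]
    return (api["host_name"] + api["list_communities_endpoint"]
            + api["access_parameter"] + "=" + access_token)


if __name__ == "__main__":
    config = json.loads(EXAMPLE_CONFIG)
    print(build_list_communities_url(config, "TOKEN"))
```

Because the pieces are joined as plain strings, a missing trailing slash in `host_name` or a changed `access_parameter` would produce an invalid URL.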
Example Workflow
----------------
The following workflow was considered during the component development [see Figure, use_cases_publish_to_b2share.png ](use_cases_publish_to_b2share.png) :
1. The user is registered at B2SHARE and has a B2SHARE access token. The user provides the B2SHARE token to the B2SAFE administrator, who adds it to the user metadata:
```
imeta add -u <irods_user_name> access_token <token_value>
```
2. The user adds specific metadata attributes to the collection that will be used to create a draft: `EUDAT_B2SHARE_PUBLISH`, with the name of the community to publish the collection under as its value, and `EUDAT_B2SHARE_TITLE`, with the title of the publication as its value:
```
imeta add -C <collection_X> EUDAT_B2SHARE_PUBLISH EUDAT
imeta add -C <collection_X> EUDAT_B2SHARE_TITLE Some_Title
```
The command
```
imeta ls -C <collection_X>
```
will then deliver:
```
AVUs defined for collection collection_X:
attribute: EUDAT_B2SHARE_PUBLISH
value: EUDAT
units:
----
attribute: EUDAT_B2SHARE_TITLE
value: Some_Title
units:
```
3. As preparation, the user can create a metadata file **b2share_metadata.json** in the collection.
The helper script **create_md_file.py** creates a skeleton file according to the B2SHARE schema of the community.
4. The B2SAFE administrator executes a rule to scan the B2SAFE repository for that metadata tag, or configures a cron job or a script to do so.
* The workflow the rule executes will then create a draft in B2SHARE with a list of the files the collection contains (see workflow diagram below).
* Then it will add metadata to the draft if the collection contains the file **b2share_metadata.json** with descriptive metadata. For this it will compare the metadata in B2SHARE with that in the file and override the values that the user has added online in B2SHARE.
* Finally it will publish the draft and "freeze" the collection in B2SAFE by taking away the user's access rights for the collection and giving the rights to the B2SAFE administrator.
5. The user then has the possibility to fill out the metadata online in B2SHARE in the published record. The record id will be added to the imeta (EUDAT_B2SHARE_PUBLISHED_ID) of the collection, and the record in B2SHARE will be accessible to the user as it will be published with the user's access token.
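A script that scans collections, as mentioned in step 4, needs the attribute/value pairs that `imeta ls -C` prints in step 2. The following minimal sketch parses that text output into a dictionary. The function name `parse_imeta_listing` is hypothetical, and the sketch assumes the output layout shown in step 2 (one `attribute:` line followed by a `value:` line per AVU).

```python
def parse_imeta_listing(text):
    """Map attribute names to values from 'imeta ls' text output."""
    avus = {}
    attribute = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("attribute:"):
            attribute = line[len("attribute:"):].strip()
        elif line.startswith("value:") and attribute is not None:
            avus[attribute] = line[len("value:"):].strip()
            attribute = None
    return avus


SAMPLE_OUTPUT = """AVUs defined for collection collection_X:
attribute: EUDAT_B2SHARE_PUBLISH
value: EUDAT
units:
----
attribute: EUDAT_B2SHARE_TITLE
value: Some_Title
units:
"""

if __name__ == "__main__":
    print(parse_imeta_listing(SAMPLE_OUTPUT))
```

Header, `units:` and separator lines are ignored, so only the attribute names and their values end up in the result.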
The following assumptions have been made:
* there is just one owner for each collection.
* the drafts are registered in B2SHARE just one time, even if the rule to scan for drafts is executed multiple times.
* the collections are copied and published just one time, even if the rule to publish them is executed multiple times.
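One way to honour the "just one time" assumptions is to decide the next step from the metadata already attached to the collection. The sketch below is an illustration only and is not taken from the b2share.re rule: the function name `next_action` and the returned labels are hypothetical, while the attribute names are the ones the CLI script sets (`EUDAT_B2SHARE_RECORD_ID` after drafting, `EUDAT_B2SHARE_PUBLISHED_ID` after publishing).

```python
def next_action(avus):
    """Decide what a repeated scan should do for one collection.

    avus is a dict of the collection's metadata attributes.
    """
    if "EUDAT_B2SHARE_PUBLISH" not in avus:
        return "skip"      # collection is not flagged for publication
    if "EUDAT_B2SHARE_PUBLISHED_ID" in avus:
        return "done"      # already published, never publish again
    if "EUDAT_B2SHARE_RECORD_ID" in avus:
        return "publish"   # draft exists, do not create a second one
    return "draft"         # flagged but no draft registered yet


if __name__ == "__main__":
    print(next_action({"EUDAT_B2SHARE_PUBLISH": "EUDAT"}))
```

With such a check, running the scan several times leads to at most one draft and one publication per collection.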
B2SAFE
===========
B2SAFE service code for EUDAT project.
# EUDAT B2SAFE
It is released under BSD license.
#### Table of Contents
1. [B2SAFE Description - Why B2SAFE](#module-description)
2. [PREREQUISITES - How to install IRODS and B2HANDLE](#prerequisites)
3. [Install/Deployment - How to install the B2SAFE](#install)
4. [B2SHARE - How to connect B2SHARE with B2SAFE](#b2share)
5. [Documentation - Documentation](#documentation)
6. [Testing - Instructions to test the code](#testing)
---------------
Deployment
---------------
## Module Description
This repository provides the **B2SAFE** service code from the EUDAT project. B2SAFE is released under BSD license.
The [EUDAT](http://www.eudat.eu) B2SAFE Service offers functionality to replicate datasets across different data centres in a safe and efficient way while maintaining all information required to easily find and query information about the replica locations. The information about the replica locations and other important information is stored in PID (Persistent IDentifier) records, each managed in separate administrative domains.
The **B2SAFE** Service is implemented as an [iRODS](http://www.irods.org) module providing a set of iRODS rules or policies to interface with the [EPIC](http://www.pidconsortium.eu) handle API and uses the iRODS middleware to replicate datasets from a source data (or community) centre to a destination data centre.
The documentation can be found in the [B2SAFE-wiki](https://gitlab.eudat.eu/b2safe/B2SAFE-core/-/wikis)
Known issues can be found in [https://github.com/EUDAT-B2SAFE/B2SAFE-core/wiki/Known-issues](https://github.com/EUDAT-B2SAFE/B2SAFE-core/wiki/Known-issues)
## Prerequisites
iRODS needs to be installed and configured before installing or upgrading B2SAFE
## Install
Installation instructions can be found in:
* [Deployment on Centos 7, see install_centos7.md](install_centos7.md)
* [Deployment on other systems, see install_other.txt](install_other.txt)
---------------
Documentation
---------------
1. on your **SAFE** server go to the irods user home directory
2. git clone this repository
```
git clone .git B2SAFE-core
```
## B2share
Information about b2share can be found in https://eudat.eu/services/userdoc/the-b2share-http-rest-api
* [Deployment see B2SHARE_install.md](B2SHARE_install.md)
* Install the python packages listed in /opt/eudat/b2safe/cmd/requirements.txt
Create or update /opt/eudat/b2safe/conf/b2share_client.json with the parameters of your b2safe/irods installation and with your access parameters for your b2share instance
## documentation
https://gitlab.eudat.eu/b2safe/B2SAFE-core/-/wikis/home
## Testing
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
###########################################################################
# B2SAFE B2SHARE client Command Line Interface #
###########################################################################
import argparse
import logging.handlers
import os
import pprint
import requests
import json
from b2shareclient import B2shareClient
from configuration import Configuration
# from irodsUtility import IRODSUtils
logger = logging.getLogger('B2shareClientCLI')
# the methods have return statement only because of unit tests
def draft(args):
    verboseprint = getVerbosePrintMethod(args)
    configuration = getConfigs(verboseprint)
    if not configuration:
        return None
    b2shcl = B2shareClient(configuration)
    filePIDsList = collectPIDsForCollection(configuration)
    if ']' == filePIDsList:
        if configuration.dryrun:
            filePIDsList = "[]"
        verboseprint("ERROR: no files collected for draft.")
        logger.error("ERROR: no files collected for draft.")
    commID = getCommunityIDByName(configuration, verboseprint)
    if not commID:
        verboseprint("NO community_id found for draft creation.")
        logger.error("NO community_id found for draft creation.")
    recordId = None
    if commID and (filePIDsList != ']'):
        recordId = b2shcl.createDraft(commID, filePIDsList)
    if recordId is not None:
        verboseprint("Drafting for record "+recordId+" END.")
        logger.info("Drafting for record "+recordId+" END.")
        if configuration.irodsu:
            configuration.irodsu.setMetadata(configuration.collection_path,
                                             "EUDAT_B2SHARE_RECORD_ID",
                                             recordId)
    else:
        verboseprint("Drafting FAILED.")
        logger.error("Drafting FAILED.")
    return recordId
def getCommunityIDByName(configuration, verboseprint):
    community_id = None
    if not configuration:
        return None
    community_name = configuration.community
    if not community_name:
        return None
    host = configuration.b2share_host_name
    endpoint = configuration.list_communities_endpoint
    acces_part = None
    list_communities_url = None
    if configuration.access_parameter and configuration.access_token:
        acces_part = configuration.access_parameter + "=" + \
            configuration.access_token
    if acces_part and host and endpoint:
        list_communities_url = host + endpoint + acces_part
    if list_communities_url:
        try:
            response = requests.get(url=list_communities_url)
            verboseprint("getCommunityIDByName status code: " +
                         str(response.status_code))
            logger.debug("getCommunityIDByName status code: " +
                         str(response.status_code))
            if (str(response.status_code) == "200"):
                communities_list = response.json()["hits"]["hits"]
                for community_object in communities_list:
                    name = community_object["name"]
                    if community_name == name:
                        community_id = community_object["id"]
            else:
                verboseprint("NO community for name " + community_name +
                             " found: " + str(response.json()))
                logger.error("NO community for name " + community_name +
                             " found: " + str(response.json()))
        except requests.exceptions.RequestException as e:
            logger.error(e)
    return community_id
def getAllCommunities(args):
    verboseprint = getVerbosePrintMethod(args)
    configuration = getConfigs(verboseprint)
    if not configuration:
        verboseprint("configuration missing, abort")
        return None
    b2shcl = B2shareClient(configuration)
    communities = b2shcl.getAllCommunities()
    if communities:
        logger.info("All available communities: \n "+str(communities)+" END.")
        verboseprint("List of communities and their id's: \n" +
                     pprint.pformat(communities))
    else:
        verboseprint("get all communities FAILED")
        logger.error("get all communities FAILED")
    return communities
def collectPIDsForCollection(configuration):
    PIDobjectsString = '['
    res = configuration.irodsu.deepListDir(configuration.collection_path)
    if not res:
        return None
    filePathsMap = None
    if res:
        filePathsMap = collectFilePathsFromTree(res)
    if not filePathsMap:
        return None
    for filePath in filePathsMap.keys():
        filePID = configuration.irodsu.getMetadata(filePath, "PID")
        if filePID:
            # filePath[1:] deletes leading / in a path
            # as requested in issue #112 on GitHub
            PIDobject = '{"key":"'+filePath[1:] + \
                '",'+' "ePIC_PID":"'+filePID[0] + '"}'
            PIDobjectsString = PIDobjectsString + PIDobject + ','
    forLastElemIndex = len(PIDobjectsString) - 1  # delete last comma
    PIDobjectsString = PIDobjectsString[:forLastElemIndex] + ']'
    return PIDobjectsString
def collectFilePathsFromTree(filesTree):
    filePaths = {}
    for coll in filesTree:
        for fp in filesTree[coll]['__files__']:
            # loop over the files of the collection
            filePaths[coll + os.sep + fp] = fp
        if len(filesTree[coll]) > 1:
            # there are also subdirs
            del filesTree[coll]['__files__']
            filemap = collectFilePathsFromTree(filesTree[coll])
            # merge the map dictionaries
            temp = filemap.copy()
            temp.update(filePaths)
            filePaths = temp
    return filePaths
def addMetadata(args):
    verboseprint = getVerbosePrintMethod(args)
    configuration = getConfigs(verboseprint)
    if not configuration:
        return None
    if args.metadata_file_name:
        metadata_file_path = configuration.collection_path + os.sep + \
            args.metadata_file_name
    else:
        metadata_file_path = configuration.collection_path + os.sep + \
            "b2share_metadata.json"
    # verboseprint(metadata_file_path)
    metadata_file = configuration.irodsu.getFile(metadata_file_path)
    b2shcl = B2shareClient(configuration)
    b2shcl.addB2shareMetadata(metadata_file)
    verboseprint("Added metadata to draft: " + configuration.record_id)
    logger.info("Added metadata to draft: " + configuration.record_id)
def compare(args):
    verboseprint = getVerbosePrintMethod(args)
    configuration = getConfigs(verboseprint)
    if not configuration:
        return None
    if args.coll_metadata_file_name:
        metadata_file_path = configuration.collection_path + os.sep + \
            args.coll_metadata_file_name
    else:
        metadata_file_path = configuration.collection_path + os.sep + \
            "b2share_metadata.json"
    metadata_file = configuration.irodsu.getFile(metadata_file_path)
    b2shcl = B2shareClient(configuration)
    b2shcl.compareMD(metadata_file)
    verboseprint("Success. Compare END.")
    logger.info("Compare END.")
def publish(args):
    verboseprint = getVerbosePrintMethod(args)
    configuration = getConfigs(verboseprint)
    if not configuration:
        return None
    b2shcl = B2shareClient(configuration)
    response = b2shcl.publishRecord()
    if response:
        if str(response.status_code) == "200":
            verboseprint("Publishing SUCCESSFUL.")
            logger.info("Publishing SUCCESSFUL. " +
                        str(response.text))
            if configuration.irodsu:
                configuration.irodsu.setMetadata(configuration.collection_path,
                                                 "EUDAT_B2SHARE_PUBLISHED_ID",
                                                 configuration.record_id)
    else:
        verboseprint("Publishing FAILED")
        logger.error("Publishing FAILED")
    logger.info("Publishing END.")
def getAccessTokenWithConfigs(configuration):
    # get access_token from users metadata in iRODS
    if configuration.irodsu:
        users_metadata = \
            configuration.irodsu.getUserMetadata(configuration.user,
                                                 "access_token")
        if users_metadata:
            return users_metadata[0]
        else:
            return None
    else:
        return None
def getCommunitySchema(args):
    verboseprint = getVerbosePrintMethod(args)
    configuration = getConfigs(verboseprint)
    if not configuration:
        return None
    commID = getCommunityIDByName(configuration, verboseprint)
    # commID is None when the community is not found, so convert it
    # before concatenating
    verboseprint("Get schema for community: " +
                 configuration.community + " with ID: " + str(commID))
    b2shcl = B2shareClient(configuration)
    schema = b2shcl.getCommunitySchema(commID)
    verboseprint(str(schema))
    verboseprint("Get Community Schema END.")
    logger.info("Get Community Schema END.")
    return schema
def getDraftByID(args):
    logger.info("Get draft by ID ...")
    verboseprint = getVerbosePrintMethod(args)
    configuration = getConfigs(verboseprint)
    if not configuration:
        return None
    b2shcl = B2shareClient(configuration)
    verboseprint("Get draft by ID ...")
    draft = b2shcl.getDraftByID(args.draft_id)
    if draft:
        verboseprint("Request for a draft with id " + draft + " SUCCESSFUL")
        logger.info("Request for a draft with id " + draft + " SUCCESSFUL.")
    logger.info("Get draft by ID END.")
    return draft
def deleteDraft(args):
    verboseprint = getVerbosePrintMethod(args)
    configuration = getConfigs(verboseprint)
    if not configuration:
        return None
    # str() keeps the log call from failing when no draft id was passed
    logger.info("DELETING DRAFT: " + str(args.draft_to_delete_id))
    b2shcl = B2shareClient(configuration)
    if not args.draft_to_delete_id:
        b2shcl.deleteDraft(configuration.record_id)
        verboseprint("DRAFT DELETED: " + configuration.record_id)
    else:
        b2shcl.deleteDraft(args.draft_to_delete_id)
        verboseprint("DRAFT DELETED: " + args.draft_to_delete_id)
    logger.info("DELETING DRAFT END.")
def getVerbosePrintMethod(args):
    if args.verbose:
        def verboseprint(*args):
            for arg in args:
                print(arg)
            return
    else:
        def verboseprint(*args):
            # do-nothing
            return
    return verboseprint
DEFAULT_CONFIG_FILENAME = "b2share_client.json"
def getConfigs(verboseprint):
    config_path = None
    if not args.confpath:
        default_config_path = os.path.dirname(os.getcwd()) + os.sep +\
            "conf" + os.sep
        config_path = default_config_path + DEFAULT_CONFIG_FILENAME
    else:
        config_path = args.confpath
    if not os.path.exists(config_path):
        print('missing configuration file %s:' % (config_path))
        return None
    configuration = Configuration(logger)
    # verboseprint("config path: " + str(config_path))
    configuration.config_path = config_path
    read_config_error = configuration.loadConfigurarionsFrom(config_path)
    if read_config_error != "":
        print(read_config_error)
        return None
    if args.dryrun:
        configuration.dryrun = True
    configuration.user = args.user
    configuration.collection_path = args.collection_path
    if 'title' in args:
        configuration.title = args.title
    if 'community' in args:
        configuration.community = args.community
    if 'community_name' in args:
        configuration.community = args.community_name
    if 'draft_id' in args:
        configuration.record_id = args.draft_id
    if 'publish_id' in args:
        configuration.record_id = args.publish_id
    if 'draft_to_delete_id' in args:
        configuration.record_id = args.draft_to_delete_id
    if 'draft_to_add_md' in args:
        configuration.record_id = args.draft_to_add_md
    if 'draft_id_mdcompare' in args:
        configuration.record_id = args.draft_id_mdcompare
    accessToken = getAccessTokenWithConfigs(configuration)
    if accessToken is None:
        print("No B2SHARE access token found in users meta data.")
        logger.error("No B2SHARE access token found in users meta data.")
        return None
    configuration.access_token = accessToken
    return configuration
if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='B2SAFE B2SHARE command line client')
    parser.add_argument("--confpath",
                        help="path to the configuration file if not in "
                             "default "
                             "/path_to_b2safe/conf/b2share_client.json")
    # parser.add_argument("--irodsenv",
    #                     help="path to irods configuration")
    parser.add_argument("-u", "--user", required=True,
                        help="irods user to get B2SHARE access token")
    parser.add_argument("-p", "--collection_path", required=True,
                        help="irods path to the collection")
    parser.add_argument("-d", "--dryrun", action="store_true",
                        help="run without performing any real change")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="enable printouts for debug")
    # Options for the normal publication workflow:
    # create draft,
    # add metadata to it,
    # publish the draft.
    # draft
    subparsers = parser.add_subparsers(help='sub-command help', dest='subcmd')
    parser_draft = subparsers.add_parser('draft',
                                         help='create a draft in B2Share')
    parser_draft.add_argument('-t', '--title', required=True,
                              help='title to publish the draft under')
    parser_draft.add_argument('-comm', '--community', required=True,
                              help='community to publish the draft under')
    parser_draft.set_defaults(func=draft)
    # compare meta
    parser_compare_meta = subparsers.add_parser(
        'compare_meta',
        help='compare metadata of the draft and collection')
    parser_compare_meta.add_argument('-mdn', '--coll_metadata_file_name',
                                     help='file name of the collection '
                                          'describing metadata')
    parser_compare_meta.add_argument('-idc', '--draft_id_mdcompare',
                                     required=True,
                                     help='the b2share id of the record')
    parser_compare_meta.set_defaults(func=compare)
    # extend meta data
    parser_meta = subparsers.add_parser('meta',
                                        help='add metadata to the draft')
    parser_meta.add_argument('-md', '--metadata_file_name',
                             help='file name of the collection describing '
                                  'metadata')
    parser_meta.add_argument('-rid', '--draft_to_add_md', required=True,
                             help='the b2share id of the record')
    parser_meta.set_defaults(func=addMetadata)
    # publish
    parser_pub = subparsers.add_parser('publish', help='publish the draft')
    parser_pub.add_argument('-pubid', '--publish_id', required=True,
                            help='the b2share id of the record')
    parser_pub.set_defaults(func=publish)
# extra options for the user to get more information
# check or delete the draft in case there are errors in it