12/18/2019

Coordinated Access to Data & Resources: the Linked Data Platform

The EJP RD Virtual Platform (VP) will use a specific ontology (EJPO) and metadata model. The ontology is designed to represent the core domain concepts used to describe data elements in rare disease patient registries and biobanks, as well as in catalogs of registries and biobanks. It provides standard vocabulary terms that individual registries can use to describe their metadata elements, or to map their existing elements to a shared model. These semantic annotations will be used by the EJP RD Virtual Platform to harmonise metadata from the various resources (catalogs, registries and biobanks) and to provide unified semantics for accessing and processing data.

Based on the metadata model, we set up a first “Linked Data Platform” that allows semantic queries according to this model.
A Linked Data Platform (LDP) is a structured data specification defining a set of integration patterns for building RESTful HTTP services that are capable of reading and writing RDF data.

The LDP allows the use of a RESTful HTTP API to consume, create, update and delete RDF (Resource Description Framework) resources.
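As a rough illustration of how these four operations map onto HTTP, the sketch below builds (but does not send) one request per operation using Python's standard library. The base URL, resource name and Turtle payload are hypothetical placeholders, not actual EJP RD endpoints or data. The mapping (GET to consume, POST to a container to create, PUT to update, DELETE to delete) follows the general W3C LDP convention and is an assumption about how this particular platform would be used.

```python
from urllib.request import Request

# Hypothetical container URL, used only for illustration.
BASE = "http://example.org/ldp/registries/"

# Made-up RDF payload in Turtle syntax.
TURTLE = b'<> <http://purl.org/dc/terms/title> "Example registry" .'


def ldp_requests(base=BASE):
    """Build one unsent HTTP request per LDP operation."""
    headers = {"Content-Type": "text/turtle"}
    resource = base + "example-registry"
    return {
        # Consume: read the RDF description of a resource.
        "consume": Request(resource, method="GET",
                           headers={"Accept": "text/turtle"}),
        # Create: POST a new RDF document to the container.
        "create": Request(base, data=TURTLE, method="POST", headers=headers),
        # Update: replace the RDF of an existing resource with PUT.
        "update": Request(resource, data=TURTLE, method="PUT", headers=headers),
        # Delete: remove the resource.
        "delete": Request(resource, method="DELETE"),
    }


for name, req in ldp_requests().items():
    print(name, req.get_method(), req.full_url)
```

Nothing is sent over the network here; passing one of these objects to `urllib.request.urlopen` would perform the actual call.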

The catalogs selected for this task were the Orphanet catalog of registries (https://www.orpha.net/consor/cgi-bin/index.php), the RD-connect Biobank and registry finder (https://rd-connect.eu/what-we-do/phenotypic-data/rb-finder-for-registries/), the RD-connect Sample catalogue (https://samples.rd-connect.eu) and the ERDRI directory of registries (https://eu-rd-platform.jrc.ec.europa.eu/erdridor/).


Fig 1: Catalogs processing

The “Linked Data Platform” is a “back-end” component of the EJP RD Virtual Platform, used as a “semantic” API (Application Programming Interface). It is “machine friendly”: end users are not expected to write queries themselves.

An end-user “front-end” web interface will be able to use this service.


Fig 2: Linked Data Platform

SPARQL (a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language used to retrieve and manipulate data stored in Resource Description Framework (RDF) format. A SPARQL query can consist of triple patterns, conjunctions, disjunctions, and optional patterns.
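To make these building blocks concrete, here is a toy, pure-Python sketch of how triple patterns are matched against a small in-memory graph, with a conjunction (patterns sharing a variable), a disjunction (UNION) and an optional pattern. The data and the helper names (`match`, `join`, `union`, `optional`) are invented for illustration. A real SPARQL engine is considerably more sophisticated.

```python
# Toy RDF graph: a set of (subject, predicate, object) triples (made-up data).
GRAPH = {
    ("reg1", "type", "PatientRegistry"),
    ("reg1", "title", "Registry One"),
    ("bio1", "type", "Biobank"),
    ("bio1", "title", "Biobank One"),
    ("bio1", "country", "Italy"),
    ("cat1", "type", "Catalog"),
}


def match(pattern, binding):
    """Yield bindings extending `binding` so that `pattern` matches a triple."""
    for triple in GRAPH:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):      # variable: bind it, or check it
                if new.setdefault(term, value) != value:
                    break
            elif term != value:           # constant: must be equal
                break
        else:
            yield new


def join(patterns, bindings):
    """Conjunction: every pattern must match, sharing variable bindings."""
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, old)]
    return bindings


def union(left, right, bindings):
    """Disjunction (UNION): solutions of either group of patterns."""
    return join(left, bindings) + join(right, bindings)


def optional(patterns, bindings):
    """OPTIONAL: extend a solution when possible, otherwise keep it as is."""
    return [b for old in bindings for b in (join(patterns, [old]) or [old])]


# Registries or biobanks, with their title and, if known, their country.
solutions = union([("?r", "type", "PatientRegistry")],
                  [("?r", "type", "Biobank")], [{}])
solutions = join([("?r", "title", "?t")], solutions)
solutions = optional([("?r", "country", "?c")], solutions)
for row in sorted(solutions, key=lambda b: b["?r"]):
    print(row["?r"], row["?t"], row.get("?c", "(no country)"))
```

The catalog entry is excluded by the UNION, and the registry without a country still appears because the country pattern is optional.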

This section provides examples of SPARQL queries over the information currently present in the Linked Data Platform (LDP)[1]. Additional queries (e.g. the number of patients in a registry diagnosed with a particular disease) will become possible once the planned extension of the Linked Data Platform’s dataset is integrated in a future update of the LDP.

[1] Linked Data Platform (LDP): http://ejprd.fair-dtls.surf-hosted.nl:8890/sparql

Results of SPARQL queries are “machine readable” and usable by a front-end web interface. To make the examples below easier for humans to read, we display the results as simple HTML tables.
Each query is sent to the Linked Data Platform. Clicking on the code opens the results in a new tab.
You can also copy and paste the query directly into the SPARQL endpoint interface:
http://ejprd.fair-dtls.surf-hosted.nl:8890/sparql/
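For programmatic access, a query can also be sent to the endpoint over HTTP. The sketch below, using only Python's standard library, builds a GET request following the SPARQL 1.1 Protocol: the query text goes in a `query` parameter and JSON results are requested through the `Accept` header. The function names are illustrative, and whether the endpoint is still reachable at this address and honours this content negotiation is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://ejprd.fair-dtls.surf-hosted.nl:8890/sparql"


def build_request(query, endpoint=ENDPOINT):
    """Build (without sending) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query and return the decoded JSON results (needs network)."""
    with urlopen(build_request(query, endpoint), timeout=timeout) as response:
        return json.load(response)


# Only the request is built here; call run_query(...) to actually send it.
request = build_request("select * where { ?s ?p ?o } limit 5")
print(request.full_url)
```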

The following SPARQL query returns a list of all patient registries and biobanks present in the LDP, together with their titles:

prefix ejp: <http://purl.org/ejp-rd/vocabulary/>
prefix dct: <http://purl.org/dc/terms/>
select ?ressources ?titles where {
  { ?ressources a ejp:PatientRegistryDataset }
  UNION
  { ?ressources a ejp:BiobankDataset }
  ?ressources dct:title ?titles .
}
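The step from machine-readable results to the HTML tables mentioned earlier can be sketched as follows, assuming the results arrive in the W3C SPARQL 1.1 Query Results JSON format (a `head.vars` list plus `results.bindings`). The sample response is made up for illustration and is not actual LDP output, and `results_to_html` is a hypothetical helper name.

```python
from html import escape

# Made-up response shaped like the SPARQL 1.1 JSON results format.
SAMPLE = {
    "head": {"vars": ["ressources", "titles"]},
    "results": {"bindings": [
        {"ressources": {"type": "uri", "value": "http://example.org/reg/1"},
         "titles": {"type": "literal", "value": "Example <CF> registry"}},
        {"ressources": {"type": "uri", "value": "http://example.org/bio/2"}},
    ]},
}


def results_to_html(results):
    """Render SPARQL JSON results as a simple HTML table."""
    columns = results["head"]["vars"]
    header = "".join(f"<th>{escape(c)}</th>" for c in columns)
    lines = ["<table>", "<tr>" + header + "</tr>"]
    for binding in results["results"]["bindings"]:
        # A variable may be unbound in a row; show an empty cell in that case.
        cells = (binding.get(c, {}).get("value", "") for c in columns)
        row = "".join(f"<td>{escape(v)}</td>" for v in cells)
        lines.append("<tr>" + row + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(results_to_html(SAMPLE))
```

Values are HTML-escaped before being placed in cells, since titles coming from external catalogs may contain characters such as `<` or `&`.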

The following SPARQL query returns a list of patient registries and biobanks located in a particular country (e.g. Italy):
prefix ejp: <http://purl.org/ejp-rd/vocabulary/>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix ordo: <http://www.orpha.net/ORDO/>
prefix dct: <http://purl.org/dc/terms/>
select ?ressources ?titles where {
  {
    ?ressources dct:publisher ?organization .
    ?organization dct:spatial ?location .
    ?location dcat:country-name "Italy"
  }
  UNION
  {
    ?ressources dct:publisher ?organization .
    ?organization dct:spatial ?location .
    ?location ejp:country "ITALY"
  }
  ?ressources dct:title ?titles .
}

The following SPARQL query returns a list of patient registries and biobanks located in a particular country and linked to a specific disease (e.g. Italian registries and biobanks linked to cystic fibrosis):
prefix ejp: <http://purl.org/ejp-rd/vocabulary/>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix ordo: <http://www.orpha.net/ORDO/>
prefix dct: <http://purl.org/dc/terms/>
select ?ressources ?titles where {
  ?ressources dcat:theme ordo:Orphanet_586 .
  {
    ?ressources dct:publisher ?organization .
    ?organization dct:spatial ?location .
    ?location dcat:country-name "Italy"
  }
  UNION
  {
    ?ressources dct:publisher ?organization .
    ?organization dct:spatial ?location .
    ?location ejp:country "ITALY"
  }
  ?ressources dct:title ?titles .
}
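Since the two country queries above differ only in the country name and the optional disease, they can be generated from parameters. The following is a minimal sketch: `build_query` and `literal` are hypothetical helpers. It assumes that other countries follow the same two spellings as the Italy example (capitalised for `dcat:country-name`, upper case for `ejp:country`) and that diseases are identified by a numeric ORPHAcode. Neither assumption has been checked against the full dataset.

```python
PREFIXES = """\
prefix ejp: <http://purl.org/ejp-rd/vocabulary/>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix ordo: <http://www.orpha.net/ORDO/>
prefix dct: <http://purl.org/dc/terms/>
"""


def literal(text):
    """Quote a Python string as a SPARQL string literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return '"' + escaped + '"'


def build_query(country, orpha_code=None):
    """Build the country query, optionally restricted to one ORPHAcode."""
    theme = ""
    if orpha_code is not None:
        # int() raises ValueError on non-numeric text, which keeps
        # arbitrary strings out of the generated query.
        theme = f"  ?ressources dcat:theme ordo:Orphanet_{int(orpha_code)} .\n"
    publisher = ("?ressources dct:publisher ?organization . "
                 "?organization dct:spatial ?location . ")
    return (
        PREFIXES
        + "select ?ressources ?titles where {\n"
        + theme
        + f"  {{ {publisher}?location dcat:country-name {literal(country.title())} }}\n"
        + "  UNION\n"
        + f"  {{ {publisher}?location ejp:country {literal(country.upper())} }}\n"
        + "  ?ressources dct:title ?titles .\n"
        + "}\n"
    )


# Italian registries and biobanks linked to cystic fibrosis (ORPHA:586).
print(build_query("Italy", 586))
```

Calling `build_query("Italy")` without an ORPHAcode yields the country-only form of the query.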

We also provide a “Demonstrator” that uses the LDP, both as a “sandbox” and as an example of a user-facing front end:
http://purl.org/ejp-rd/fair-metadata-demo
NB: This link is provided as a beta version and is hosted in a development environment.
It is therefore not the Virtual Platform entry point for end users.