FAIRopoly – FAIRification Guidance for ERN Patient Registries

Introduction

The FAIRification board game aims to give a visual and succinct representation of the FAIRification steps followed by the ERNs registries to make their data more findable, accessible, interoperable and reusable. The player starts the game on the top left corner of the board and moves around clockwise one tile at a time. Scroll down for more information on each game tile. A detailed FAIRification guidance document is beyond the scope of this simplified visual representation. Such a guidance is currently being developed and will be provided in a near future.

Disclaimer:

  • Human resources are the most important part of the FAIRification process. Having a team with the right skillset will play an important role in achieving your FAIRification goals.
  • Essential roles for interdisciplinary ‘three-party collaboration’:
    • ERN registry steward (managing the registry)
    • Registry software provider (implements the software behind the registry observing FAIR principles)
    • EJP-RD FAIRification stewards (guidance regarding FAIR standards, technologies, and use-cases guiding the FAIRification steps)
  • Recommended additional roles:
    • Project manager (highly recommended, organiser role)
    • Part-time advisors for expertise on (i) the domain (e.g., a medical doctor), (ii) international FAIR standards (e.g., an outsourced senior FAIR expert), (iii) implementation of FAIR software.
    • Data scientist dedicated to exploiting the added value of FAIR, machine readable data.
  • The most important step in FAIRification of your rare disease registry is to identify the goals of FAIRification. Well-defined FAIRification goals will help guide the process and make it more efficient.
  • Goals can be short term or long term and might include, but are not limited to, the following examples:
    • Making the registry interoperable – For this you will need standardised and machine-readable data
    • Allow the identification of patient cohorts for clinical trials – This requires you to collect standardised data and be able to reuse it
    • Create a RD research ecosystem – This requires the data to be completely FAIR so that information can be queried at the source by the EJP RD Virtual Platform
Determining your FAIRification goal is a very important step since that will guide the process and determine the ‘success’ of the FAIRification process
The guiding FAIR principles are available at https://www.go-fair.org/fair-principles/
  • Identify what metadata is already being collected by your registry (e.g., provenance, creation date, file type and size, timestamp). The result of this step will support defining the metadata model of your registry. Also, check if your metadata is already being collected using standardized vocabularies.
  • Usually, terms from upper ontologies can be used to describe metadata. For example, use dcat:Dataset from Data Catalog Vocabulary (DCAT) to describe the type of any rare disease dataset and dct:creator from DCMI Metadata Terms (DCT) to indicate the relationship between a dataset and its creator.
  • Tip: EJPRD Metadata Model and DCAT are good standards to start with.
The FAIRification process may seem daunting at first glance but we have tools, study materials and experts to help you. Keep in mind that FAIR is a spectrum and that achieving your goals are more important than achieving 100% FAIR!

ERDRI.dor is a catalogue of Rare Disease registries. It provides an overview of participating registries with their characteristics and descriptions. The portal registers metadata in 38 attributes, out of which 23 are compulsory. All information is imputed by the registry owners.
ERDRI.dor can be accessed through https://eu-rd-platform.jrc.ec.europa.eu/erdridor/

ERDRI.mdr is a database of the data elements used by Rare Disease registries. Its main goal is to ease integration between registries. ERDRI.mdr contains metadata that specifies the data elements that are used by registries (including their definition and accepted units of measurement). All the information on ERDRI.mdr is imputed by the registry owners.

The Joint Research Centre have defined Common Data Elements to be collected by all RD registries and collecting them will improve interoperability with other ERN registries that are obligated to collect these data items

The set of “Common Data Elements” contains 16 data elements to be collected by each rare disease registry across Europe. Among these 16 elements are, the patient personal data, diagnosis, disease history and care pathway, elements that are essential for further research. The list of the Common Data Elements can be found at https://eu-rd-platform.jrc.ec.europa.eu/set-of-common-data-elements_en
  • This step aims at a mutual decision within your registry on what information should be exchangeable. To understand “semantics”, data values (i.e., meaning of data elements), data representation (format), and structure information (i.e., hierarchy of that data in the underlying data model) should be analysed.
  • The deliverable should be a set of data elements whose semantics should be clear and unambiguous to reflect the information you want to be exchanged.
  • In the EJP RD, an initial set of such minimal information has been determined by the Joint Research Centre and is described as Common Data Elements (CDEs). A new registry is advised to collect these CDEs while an existing registry should identify local data elements that semantically align with these CDEs.

Reach out to the EJP RD helpdesk  if you are not already in contact with a FAIRification steward

  • EJP RD has defined a semantic model for the CDEs. The data consists of the Common data elements tagged with ontologies and with the relationships between each data element also defined by ontologies.
  • The model can be automatically implemented via the CDE in a box The expert level here may involve knowledge of GIT and Docker images. CDE in a box is a collection of software applications which enables creation, storing and publishing of “Common Data Elements” according to the CDE semantic model.
  • This step aims at assigning machine-readable terms from existing ontologies to metadata and CDEs. Uniform Resource Identifiers (URIs) should be used to identify and refer to each term without ambiguity.
  • Domain ontologies are used to describe data that is specific to a particular domain. For example, use ncit:sex from National Cancer Institute Thesaurus (NCIT) to describe the “sex” data element, and ncit:male to describe the “male” value of “sex” element.
  • The deliverable should be documented bindings of appropriate terms to metadata and data elements so that their semantics can be expressed in a machine-readable fashion.
  • In the EJP RD, there is a list of recommended ontologies built specifically for the CDEs but can cater to various purposes. A selection of terms binding to CDEs are determined in the semantic CDE model. So, a new registry or an existing one is advised to follow the choices made in that model, otherwise ontology mapping is needed.
  • Tip: By implementing the EJP RD CDE Semantic Model, the standard ontologies used in it are also implemented.
The generic ICF developed by EJP RD and available in 24 languages can be used for this purpose. It needs to be customised to fill your needs and validated by the local ethics committees. See ERN Registries Generic Informed Consent Forms
  • Pseudonymisation aims to facilitate the secondary use of healthcare data by avoiding creation of a transparent universal ID for each patient. The same patient will be attributed different pseudonyms in different contexts. The link between the different pseudonyms of the same patient is handled by the pseudonymisation software chosen, in a way that preserves the possibility to re-identify the patient for trusted third parties.
  • The JRC is developing a tool for this purpose.
  • Data collection is a crucial step for a patient registry, make sure to discuss with your team and define the clinical implementation workflow including roles assignment.
  • Aim for integrations between Electronic Patient Records and Electronic Data Capture Systems or the Registry System, to avoid manual data insertion.

  • This step aims at implementing the semantic model for data through an automatic tool, and the metadata model for metadata. The metadata and data that are structured with ontologies and follow standard schemas make it easier for other resources such as the EJP RD Virtual Platform to find your registry metadata and understand its data.
  • Tip: EJPRD developed a metadata model, it may require a developer to implement it in your registry source code.
FAIRification of data is a spectrum and achieving your FAIRification goal is more important that only making data ‘FAIR’.
A FAIR Data Point stores information about data sets, which is the definition of metadata. The system is called a FAIR data point because it takes care of a lot of the issues that need to be taken care of to make data FAIR; especially with the metadata needed for Findability and Reusability, and a uniform open way of Accessing the data. The FAIR data point also addresses the Interoperability of the metadata it stores, but it leaves the Interoperability aspects for the data itself to the data provider. https://www.fairdatapoint.org/
All in one deployment of the CDE Semantic model + FAIR data point? Check it out here!

  • Conditions to get access to the data are set up by the Data Access Committee (DAC) and are described in the data access policy. Templates prepared by the EWGs in ERICA can be used for this purpose (e.g., ERN Data Access Policy Template and ERN Data Sharing Agreement)
  • EJPRD Generic ICF for patient level decision on data access
  • Data Access Policy at registry level should be defined to serve as unbiased criteria for the ERN Data Access Committee. EJP RD experts are working on Common Consent Elements (CCEs) based on DUC (complies ICO and DUO ontologies) and a semantic model for the CCEs. Both resources should reflect the ERN’s Data Access Policy. Stay tuned!
  • The Virtual Platform (VP) aims to bring together data from Rare Diseases resources, which is by nature scattered, scarce, and usually isn’t connected to other data sources. The VP is a federated ecosystem, meaning that resources are made available through multiple query points that allow for the answering of questions that require information from various data sources. This way, it makes the data more FAIR while respecting privacy, consent, and access conditions, since the data stays at source level, but can be queried through the VP.
  • It is envisioned that each registry will constitute a node in the EJP RD VP network. It will be possible to send queries to all the nodes in the network via the VP. Each node stays in control to who can access the data and in which format (yes/no answers, aggregated, anonymised, pseudonymised).
  • Tip: The first version of the Virtual Platform Specifications (VIPS) is out! Please contact us via EJP RD helpdesk if you have any questions.
The first version of the Virtual Platform Specifications (VIPS) is out! Please contact us via EJP RD helpdesk if you have any questions.
  • As a task under the objectives of the EJP RD, we created a set of software packages – The FAIR Evaluator – that coded each Metric into an automatable software-based test, and created an engine that could automatically apply these tests to the metadata of any dataset, generating an objective, quantitative score for the ‘FAIRness’ of that resource, together with advice on what caused any failures (https://www.nature.com/articles/s41597-019-0184-5).  With this information, a data owner would be able to create a strategy to improve their FAIRness by focusing on “priority failures”.  The public version of The FAIR Evaluator (https://w3id.org/AmIFAIR) has been used to assess >5500 datasets.  Within the domain of rare disease registries, a recent publication about the VASCA registry shows how the Evaluator was used to track their progress towards FAIRness (https://www.medrxiv.org/content/10.1101/2021.03.04.21250752v1.full.pdf). To date, no resource – public or private – has ever passed all 22 tests, showing that FAIR assessment is able to provide guidance to even highly-FAIR resources.
  • The FAIR evaluation results can serve as a pointer to where your FAIRness can be improved.
FAIRification is a continuous process, do not get discouraged by achieving only some of your goals. If that is the case, it may be time to reassess and restart the process optimising it to your project priorities.

References

  1. Toward effective, interoperable rare disease registries, in collaboration with the EJP-RD:  https://www.ejprarediseases.org/wp-content/uploads/2019/09/EJPRD-Pillar2-proposal-for-ERN-registries.pdf
  2. A generic workflow for the data FAIRification process: https://doi.org/10.1162/dint_a_00028
  3. The de novo FAIRification process of a registry for vascular anomalies: https://doi.org/10.1186/s13023-021-02004-y
  4. The “A” of FAIR – as open as possible, as closed as necessary: https://doi.org/10.1162/dint_a_00027
  5. Making FAIR Easy with FAIR Tools: From Creolization to convergence: https://doi.org/10.1162/dint_a_00031
  6. De-novo FAIRification via an Electronic Data Capture system by automated transformation of filled electronic Case Report Forms into machine-readable data: https://doi.org/10.1016/j.jbi.2021.103897
  7. Semantic modelling of Common Data Elements for Rare Disease registries, and a prototype workflow for their deployment over registry data: https://doi.org/10.1101/2021.07.27.21261169
  8. Introducing Domain-specific Common Data Elements for Rare Disease Registration: A Joint Initiative towards improving Semantic Interoperability in Rare Disease Research: https://preprints.jmir.org/preprint/32158
  9. Vascular Anomaly Registries: https://www.youtube.com/watch?v=DhwMqM1cM-w 

Acknowledgements: FAIRification Stewards Team