Merge pull request #375 from jliermann/main
upgrade dependencies, imprint, correct broken links
jliermann committed Jul 16, 2024
2 parents 500bf66 + f91b472 commit 6c72574
Showing 12 changed files with 1,010 additions and 103 deletions.
44 changes: 22 additions & 22 deletions docs/00_intro/10_fair.mdx
@@ -2,6 +2,7 @@
title: "FAIR Data Principles"
slug: "/fair"
---

# FAIR Data Principles

![FAIR Data](/img/topics/FAIR_data_principles.png)
@@ -23,7 +24,6 @@ In chemistry, the deposition of crystallographic data in a standardized file for

In the following, we answer the questions: What makes data FAIR? What do researchers and those who provide data preservation services need to consider?


## Findable

Researchers — and the computers working on their behalf — must be able to find datasets to be able to reuse them. Therefore, the first guideline of the FAIR Data Principles outlines methods to ensure a dataset’s discovery.
@@ -38,11 +38,11 @@ A common example of a citable PID is the Digital Object Identifier, or [DOI](htt

Data need to be sufficiently described in order to make them both findable and reusable. Hence, the specific focus here lies on making the (meta)data findable by using rich discovery [metadata](/docs/metadata) in a standardized format and allowing computers and humans to quickly understand the dataset’s contents. This is an essential component in the plurality of metadata described by [R1](#r1-metadata-are-richly-described-with-a-plurality-of-accurate-and-relevant-attributes) below. This information may include, but is not limited to:

- the context on what the dataset is, how it was generated, and how it can be interpreted,
- the data quality,
- licensing and (re)use agreements,
- what other data may be related (linked via its PID), and
- associated journal publications and their DOI.

Repositories should provide researchers with a fillable [application profile](https://en.wikipedia.org/wiki/Application_profile) that allows researchers to give extensive and precise information on their deposited datasets. For example, the Chemotion Repository uses, among others, the [Datacite Metadata Schema](http://doi.org/10.5438/0012) to build its application profile, a schema specifically created for the publication and citation of research data. [RADAR](https://radar.products.fiz-karlsruhe.de/en), including the variant [RADAR4Chem](https://www.nfdi4chem.de/index.php/2650-2/), has also built [its metadata schema](https://radar.products.fiz-karlsruhe.de/en/radarfeatures/radar-metadatenschema) on Datacite. These include an assortment of mandatory, recommended, and optional metadata properties, allowing for a rich description of the deposited dataset. For those publishing data, always keep in mind: the more information provided, the better.
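
As an illustration, the sketch below assembles such a record in Python. The field names follow the spirit of the DataCite Metadata Schema, but the concrete values, the exact layout, and the helper script itself are invented here and do not represent a validated record:

```python
import json

# A minimal, DataCite-style metadata record for a hypothetical dataset.
# Field names loosely follow the DataCite Metadata Schema (identifier, creators,
# titles, publisher, publication year, resource type); values are illustrative.
record = {
    "identifier": {"identifierType": "DOI", "identifier": "10.1234/example-dataset"},
    "creators": [{"name": "Doe, Jane", "affiliation": "Example University"}],
    "titles": [{"title": "NMR spectra of compound X (raw and processed data)"}],
    "publisher": "Example Repository",
    "publicationYear": 2024,
    "resourceType": {"resourceTypeGeneral": "Dataset"},
    # Recommended and optional enrichment, as suggested in the list above:
    "descriptions": [
        {
            "descriptionType": "Abstract",
            "description": "1H and 13C NMR data of compound X recorded in CDCl3.",
        }
    ],
    "relatedIdentifiers": [
        {
            "relationType": "IsSupplementTo",
            "relatedIdentifier": "10.1234/example-article",
            "relatedIdentifierType": "DOI",
        }
    ],
    "rightsList": [{"rights": "CC BY 4.0"}],
}

print(json.dumps(record, indent=2))
```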

@@ -106,17 +106,17 @@ Many of the previous points lead to one key aspect of data sharing: data reusabi

Related to [F2](#f2-data-are-described-with-rich-metadata-defined-by-r1-below) above, the focus here lies on whether the data, once found, are usable to the person or computer searching. It also stresses giving the data as many attributes as possible. Researchers should not assume that the person—or that person’s computer—looking to (re)use their data is completely familiar with the discipline. Examples of information to assign here include (non-exhaustive list):

- What the dataset contains, including whether the data is raw and/or processed
- How the data was processed
- How the data can be reused
- Who created the data
- Date of creation
- Variable names
- Standard methods used
- Scope of the data and project
- Lab conditions
- Any limitations to the data
- Software and versions used for acquisition and processing.

[Machine-readable chemical structures](/docs/machine-readable_chemical_structures) are an important piece of information for chemical data. They should be included within the dataset and/or metadata and aid computers in finding the correct data in their queries.
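
A small sketch of how such machine-readable identifiers can be derived from a drawn or typed structure, assuming the open-source RDKit toolkit is installed (the caffeine SMILES string is only an example):

```python
from rdkit import Chem  # third-party cheminformatics toolkit, assumed installed

# Caffeine, given as a SMILES string purely for illustration.
mol = Chem.MolFromSmiles("Cn1cnc2c1c(=O)n(C)c(=O)n2C")

# Derive further machine-readable representations to ship with the (meta)data.
canonical_smiles = Chem.MolToSmiles(mol)  # canonical SMILES
inchi = Chem.MolToInchi(mol)              # IUPAC InChI
inchikey = Chem.MolToInchiKey(mol)        # hashed InChIKey, convenient for queries

print(canonical_smiles)
print(inchi)
print(inchikey)
```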

@@ -141,8 +141,8 @@ Where required, format converters should be linked in the dataset’s metadata.

## Sources and further information

- [FORCE 11: FAIR Data Principles](https://www.force11.org/group/fairgroup/fairprinciples)
- [Go-FAIR initiative: FAIR Principles](https://www.go-fair.org/fair-principles/)
- [TIB Blog: The FAIR Data Principles for Research Data](https://blogs.tib.eu/wp/tib/2017/09/12/the-fair-data-principles-for-research-data/)
- [FAIRsFAIR: How to be FAIR with your data. A teaching and training handbook for higher education institutions](https://doi.org/10.5281/zenodo.6674301) & [Engelhardt et al. (book version)](https://doi.org/10.17875/gup2022-1915) & [Gitbook version](https://fairsfair.gitbook.io/fair-teaching-handbook)
- [Checklist: How FAIR are your data?](https://doi.org/10.5281/zenodo.1065991) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1065991.svg)](https://doi.org/10.5281/zenodo.1065991)
56 changes: 33 additions & 23 deletions docs/20_role/50_core_facility_manager.mdx
@@ -3,15 +3,20 @@ title: "Core facility manager"
slug: "/core_facility_manager"
---

import useBaseUrl from "@docusaurus/useBaseUrl";

:::info Applies to:
This article applies to core facility managers and heads of analytical service units.
:::

## Motivation

<img
alt="Data LifeCycle"
src={useBaseUrl("/img/Intro/DataLifeCycle_KB.svg")}
width="500"
align="right"
/>

In the chemistry data lifecycle, core facilities play an important role as major producers of chemical data. For modern analytical techniques such as mass spectrometry or NMR spectroscopy, data are usually recorded digitally, so the challenges lie less in digitalisation than in data management.

@@ -23,7 +28,7 @@ When thinking about how to store data and make them available, an important star

### The Situation in Germany

The German Research Foundation ([DFG](https://www.dfg.de)) summarizes the consensus on the _fundamental principles and standards of good practice_ in science in its Code of Conduct _Guidelines for Safeguarding Good Research Practice_ [\[1\]](#dfg_code). Guideline 17 demands that all research data be stored for a period of ten years, starting from the date of publication. Data storage strategies should therefore include long-term storage for at least that time.

## How to start

@@ -41,24 +46,24 @@ In addition, [backup strategies](/docs/data_storage/) for all instrument worksta

While most of the scientific work still lies ahead, valuable metadata can already be harvested and digested at the early stage of sample submission. These can include, among many others (a minimal machine-readable sketch follows the list):

- Date
- Creator (person, group)
- Project
- Sample identifier
- Molecular structure(s), and derived properties:
  - Molecular formula
  - Molecular weight
  - Elemental composition
  - Physicochemical properties
- Solvent or solubility
- Purity
- Experiment information of interest, such as:
  - Retention time
  - Polarity
  - Ionisation method
  - NMR nuclei and experiments
  - Chiroptical data
  - Biological properties
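
A sample-submission step could, for instance, serialise such fields into a simple structured record along the lines of the sketch below; all field names and values are invented for illustration:

```python
import json
from datetime import date

# Hypothetical sample-submission record; every name and value is illustrative.
submission = {
    "date": date.today().isoformat(),
    "creator": {"person": "Jane Doe", "group": "Doe research group"},
    "project": "Natural product synthesis",
    "sample_id": "JD-2024-0815",
    "molecule": {
        "smiles": "CC(=O)Oc1ccccc1C(=O)O",  # aspirin, used here only as an example
        "molecular_formula": "C9H8O4",
        "molecular_weight": 180.16,
    },
    "solvent": "DMSO-d6",
    "purity": "crude",
    "requested_experiments": ["1H NMR", "13C NMR", "ESI-MS"],
}

# Storing the record as JSON keeps it both human- and machine-readable.
with open(f"submission_{submission['sample_id']}.json", "w") as fh:
    json.dump(submission, fh, indent=2)
```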

Digesting those metadata according to the [FAIR guiding principles](/docs/fair/) can be a challenge for core facilities and essentially comes down to two possible strategies:

@@ -67,5 +72,10 @@ The challenge of digesting those metadata according to [FAIR guiding principles]

## Sources

1. <span id="dfg_code" />
Deutsche Forschungsgemeinschaft (DFG), *Guidelines for Safeguarding Good Research
Practice. Code of Conduct*, September 2019, [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3923602.svg)](https://doi.org/10.5281/zenodo.3923602)

2. <span id="sync" />
See [Microsoft documentation](https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy)
for `robocopy` or [manpage](https://linux.die.net/man/1/rsync) for `rsync`.
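
As a minimal sketch of the synchronisation step referenced in source 2 (the paths below are placeholders, and on Windows `robocopy` plays the same role as `rsync`):

```python
import subprocess

# Placeholder paths: adapt to the actual instrument workstation and archive share.
source = "/data/nmr600/"
target = "backup-server:/archive/nmr600/"

# -a preserves timestamps and permissions, -v reports progress,
# --delete mirrors removals so the archive matches the source exactly.
subprocess.run(["rsync", "-av", "--delete", source, target], check=True)
```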
14 changes: 9 additions & 5 deletions docs/40_smartlab/00_smartlab.mdx
@@ -4,19 +4,23 @@
id: "smartlab"
---

import IntroButton from "@site/src/components/IntroButton.js";
import useBaseUrl from "@docusaurus/useBaseUrl";

# Smart Laboratory (Smart Lab)

![smartlab_flow](/img/smartlab/smartlab_flow2.png)

A smart lab represents a holistic approach to [data management](/docs/data_guide) in chemistry with seamless data flows. What does this mean? It means that all steps within a researcher's [workflow](/docs/domain_guide) across the [research data lifecycle](/docs/data_life_cycle) are interconnected in a digital way. The key difference to a [Laboratory Information Management System (LIMS)](https://en.wikipedia.org/wiki/Laboratory_information_management_system) is that the Smart Lab's main focus is the realisation of the [FAIR data principles](/docs/fair). For example, a researcher plans and [documents](/docs/data_documentation) their experiment in an [electronic lab notebook (ELN)](/docs/eln). Any experimental data from devices such as spectrometers are then directly ingested by the ELN via [Application Programming Interfaces (APIs)](https://en.wikipedia.org/wiki/API).
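
What such an ingestion step could look like is sketched below; the endpoint, token, and field names are hypothetical and do not correspond to the API of any particular ELN:

```python
import requests  # third-party HTTP client, assumed to be installed

# Hypothetical ELN REST endpoint and token; replace with your ELN's actual API.
ELN_API = "https://eln.example.org/api/v1/experiments/4711/attachments"
TOKEN = "replace-with-a-real-api-token"

# Upload a spectrum recorded by an instrument, together with minimal metadata.
with open("sample_JD-2024-0815_1H.jdx", "rb") as fh:
    response = requests.post(
        ELN_API,
        headers={"Authorization": f"Bearer {TOKEN}"},
        files={"file": fh},
        data={"instrument": "600 MHz NMR", "experiment": "1H"},
    )
response.raise_for_status()
```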

The ELN then ideally assigns all the necessary [metadata](/docs/metadata) automatically and appropriately for a corresponding workflow and converts proprietary [data formats](/docs/format_standards) to open data formats. The ELN structures the (meta)data and experimental descriptions in a meaningful and sustainable way which is both human- and machine-readable (e.g., via the use of [machine-readable chemical structures](/docs/machine-readable_chemical_structures)). When the researcher chooses to [publish](/docs/data_publishing) or [archive](/docs/data_storage) their data, it is then ingested seamlessly by a data [repository](/docs/repositories) or archive without much further work, as the ELN has already appropriately prepared the dataset to meet a repository’s or archive’s [requirements](/docs/choose_repository).

In this section, key components of the smart lab will be introduced to you.

## Get started:

<IntroButton url={"/docs/eln"} imgUrl={"/img/nfdi4chem_SmartLab.svg"} text={"Electronic Lab Notebooks"} />
<IntroButton
url={"/docs/eln"}
imgUrl={"/img/nfdi4chem_SmartLab.svg"}
text={"Electronic Lab Notebooks"}
/>
2 changes: 1 addition & 1 deletion docs/50_data_publication/10_repositories.mdx
@@ -25,7 +25,7 @@ Some, but not all, repositories curate and review the data before **ingestion**

In order to allow data reuse by other researchers, [metadata](/docs/metadata), including [provenance information](/docs/provenance/), are required besides the actual data. Metadata describe the research data and provide information about their creation, the methods or software used, as well as legal aspects. Metadata can either be added manually via a metadata editor or be provided through other applications. The process of manually adding metadata via a metadata editor can be compared to submitting a manuscript to a publisher via the publisher's submission system.

One main function of repositories is to provide a search function, with which users and machines can find, view, and download data. In order to ensure that data are permanently referenced and can be [linked and cited](/docs/best_practice/), repositories assign unique [persistent identifiers](/docs/pid) (PIDs). This also enhances the findability and accessibility of research data.
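
Because DOIs resolve through a standard web service, the metadata linked to a PID can also be retrieved programmatically. The sketch below uses DOI content negotiation on an example DOI and assumes the third-party `requests` package is available (DataCite and Crossref DOIs support this):

```python
import requests  # third-party HTTP client, assumed to be installed

doi = "10.5281/zenodo.1065991"  # example DOI; any DataCite or Crossref DOI works

# Content negotiation on doi.org returns machine-readable citation metadata.
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
)
response.raise_for_status()
metadata = response.json()
print(metadata["title"])
```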

Repositories can also be certified (e.g. CoreTrustSeal). Such certification ensures that the data is citable, preserved in the long run, and may also cover aspects of data curation and data quality.
