
Validation Function

Owen Petchey edited this page Mar 1, 2019 · 8 revisions

Content

Format Conventions

  • name: a column named "name" in a worksheet
  • Sheet$column: the column "column" in the worksheet named "Sheet"
  • "a certain value"
  • name(): a function in R
  • : This still needs to be done
  • : Implemented in validate()

Introduction

The validation report should cover the following aspects of the conformity of the metadata and its consistency with the data files.

Structural / Formal validity

  • Format correct
  • Names correct
  • All Sheets present

Metadata Consistency within and between worksheets

Experiment

  • type: conversion from character to specified type possible

    * 0: everything OK
    * 3: otherwise
    
  • suggestedValues check if values are in suggestedValues

    * 0: all in suggested values
    * 2: not in suggested values
    
  • ???
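
The type check above could be sketched in R roughly as follows. This is only an illustration, not the code used in validate(): the helper name check_type() and the handled set of types are assumptions.

```r
# Hypothetical helper (not part of the package): returns 0 if every
# non-missing value in the character vector `x` can be converted to
# `type`, and 3 otherwise.
check_type <- function(x, type) {
  # Treat real NA and the string "NA" as missing and skip them
  x <- x[!is.na(x) & x != "NA"]
  converted <- switch(
    type,
    character = x,
    numeric   = suppressWarnings(as.numeric(x)),
    logical   = as.logical(x),
    stop("unsupported type: ", type)
  )
  # A failed conversion shows up as NA
  if (anyNA(converted)) 3 else 0
}

check_type(c("1.5", "2", "NA"), "numeric")  # all values convertible
check_type(c("1.5", "two"), "numeric")      # "two" is not a number
```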

Species

  • type: conversion from character to specified type possible

    * 0: everything OK
    * 3: otherwise
    
  • suggestedValues check if values are in suggestedValues

    * 0: all in suggested values
    * 2: not in suggested values
    
  • name: look up in a species database and report the score (using taxize::gnr_resolve())

    * 0: species names are equal to the matched names
    * 2: one or more have a score of less than -.7 - likely typo
    * 2: one or more is not found
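
The scoring of the species name check could look roughly like the sketch below. It works on an already retrieved result table and does not call the name resolver itself. The helper check_species(), the column names submitted, matched and score, and the threshold of 0.7 are all assumptions made for illustration. Names that differ from the matched name for any other reason are also reported with 2 here, which is a choice of this sketch.

```r
# Hypothetical helper. `res` is assumed to be a data frame with one row per
# submitted name and the columns `submitted`, `matched` and `score`, e.g.
# prepared from the output of a name resolver. Names that were not found
# are assumed to have NA in `matched`.
check_species <- function(res, min_score = 0.7) {
  not_found <- is.na(res$matched)
  low_score <- !not_found & res$score < min_score
  exact     <- !not_found & res$submitted == res$matched
  if (all(exact)) {
    return(list(code = 0, message = "all names match"))
  }
  msg <- c(
    if (any(not_found)) paste("not found:", toString(res$submitted[not_found])),
    if (any(low_score)) paste("likely typo:", toString(res$submitted[low_score]))
  )
  list(code = 2, message = msg)
}

# Invented example result table
res <- data.frame(
  submitted = c("Paramecium caudatum", "Colpidium striatm", "Xyz abc"),
  matched   = c("Paramecium caudatum", "Colpidium striatum", NA),
  score     = c(0.99, 0.5, NA),
  stringsAsFactors = FALSE
)
check_species(res)
```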
    

Treatment

  • type: conversion from character to specified type possible

    * 0: everything OK
    * 3: otherwise
    
  • suggestedValues check if values are in suggestedValues

    * 0: all in suggested values
    * 2: not in suggested values
    
  • ???
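
A minimal sketch of the suggestedValues check, assuming the suggested values are available as a character vector. The helper name check_suggested() is hypothetical.

```r
# Hypothetical helper: returns 0 if all non-missing values are among the
# suggested values, and 2 otherwise.
check_suggested <- function(values, suggested) {
  values <- values[!is.na(values)]
  if (all(values %in% suggested)) 0 else 2
}

check_suggested(c("light", "dark"), c("light", "dark", "dim"))
check_suggested(c("light", "bright"), c("light", "dark", "dim"))
```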

<== DataFileMetaData

  • Treatment$parameter is in DataFileMetaData$mappingColumn

    * 0: everything OK
    * 2: otherwise
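
A cross-sheet check such as this one can be sketched with setdiff(), which also tells the user which entries are unmatched. The worksheets below are toy data frames with invented contents.

```r
# Toy worksheets; the contents are invented for illustration only
Treatment <- data.frame(
  parameter = c("temperature", "light"),
  stringsAsFactors = FALSE
)
DataFileMetaData <- data.frame(
  mappingColumn = c("temperature", "density"),
  stringsAsFactors = FALSE
)

# Entries of Treatment$parameter without a counterpart in mappingColumn
missing_refs <- setdiff(Treatment$parameter, DataFileMetaData$mappingColumn)

# 0 if every parameter is mapped, 2 otherwise
result <- if (length(missing_refs) == 0) 0 else 2
```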
    

Measurement

  • type: conversion from character to specified type possible

    * 0: everything OK
    * 3: otherwise
    
  • suggestedValues check if values are in suggestedValues

    * 0: all in suggested values
    * 2: not in suggested values
    
  • name is unique

    * 0: all unique
    * 3: not unique
    
  • measuredFrom is "raw", "NA" or in name

    * 0: is in set
    * 3: is not in set
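
The uniqueness and measuredFrom checks might be sketched as below. The Measurement data frame is a toy example, and treating "NA" as a literal string is an assumption.

```r
# Toy Measurement worksheet; contents invented for illustration
Measurement <- data.frame(
  name         = c("density", "biomass", "density"),
  measuredFrom = c("raw", "density", "abundance"),
  stringsAsFactors = FALSE
)

# 0 if all names are unique, 3 otherwise
name_unique <- if (anyDuplicated(Measurement$name) == 0) 0 else 3

# 0 if every measuredFrom is "raw", "NA" or an existing name, 3 otherwise
allowed <- c("raw", "NA", Measurement$name)
measured_from_ok <- if (all(Measurement$measuredFrom %in% allowed)) 0 else 3
```

In this toy example both checks fail: "density" appears twice in name, and "abundance" is not a known measurement.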
    

<== DataFileMetaData

  • Measurement$variable is in DataFileMetaData$mappingColumn

    * 0: everything OK
    * 2: otherwise
    

<== DataExtractionName

  • dataExtractionName is "none", "NA", or in DataExtraction$name

DataExtraction

  • type: conversion from character to specified type possible

    * 0: everything OK
    * 3: otherwise
    
  • suggestedValues check if values are in suggestedValues

    * 0: all in suggested values
    * 2: not in suggested values
    
  • name is unique

<== Measurement

  • name is in Measurement$dataExtractionName

DataFileMetaData

  • type: conversion from character to specified type possible

    * 0: everything OK
    * 3: otherwise
    
  • allowedValues check if values are in allowedValues

    * 0: all in allowed values
    * 3: not in allowed values
    
  • dataFileName exists

  • if type == "datetime", description has to contain format information. Its validity will be tested together with the data

    * 0: all specified
    * 3: at least one not given
    

<== Measurement & Treatment

  • if columnData == "Measurement", mappingColumn has to be in Measurement$name; if columnData == "Treatment", mappingColumn has to be in Treatment$name
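
The conditional mapping rule could be sketched as follows. The helper check_mapping() is hypothetical and takes the two sets of valid names as plain character vectors.

```r
# Hypothetical helper: returns the mappingColumn entries that violate the
# rule; an empty character vector means the check passes.
check_mapping <- function(dfmd, measurement_names, treatment_names) {
  # %in% (rather than ==) avoids NA when columnData is missing
  is_m <- dfmd$columnData %in% "Measurement"
  is_t <- dfmd$columnData %in% "Treatment"
  ok <- rep(TRUE, nrow(dfmd))
  ok[is_m] <- dfmd$mappingColumn[is_m] %in% measurement_names
  ok[is_t] <- dfmd$mappingColumn[is_t] %in% treatment_names
  dfmd$mappingColumn[!ok]
}

# Toy DataFileMetaData worksheet; contents invented for illustration
dfmd <- data.frame(
  columnData    = c("ID", "Measurement", "Treatment"),
  mappingColumn = c("NA", "density", "salinity"),
  stringsAsFactors = FALSE
)
check_mapping(dfmd, measurement_names = "density", treatment_names = "temperature")
```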

<== data file

  • columnName has to be in the data file dataFileName
  • date, time and datetime values can be converted using the format specifications in description
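
A minimal sketch of the conversion test, assuming that description holds a format string in the syntax understood by strptime() (for example "%Y-%m-%d"). The helper name can_parse_datetime() is hypothetical.

```r
# Hypothetical helper: TRUE if every non-missing value in `x` can be parsed
# with the given strptime()-style `format`, FALSE otherwise.
can_parse_datetime <- function(x, format) {
  x <- x[!is.na(x)]
  # Values that do not match the format are returned as NA
  parsed <- as.POSIXct(x, format = format, tz = "UTC")
  !anyNA(parsed)
}

can_parse_datetime(c("2019-03-01 12:00:00", "2019-03-02 08:30:00"),
                   "%Y-%m-%d %H:%M:%S")
can_parse_datetime(c("2019-03-01", "01.03.2019"), "%Y-%m-%d")
```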

data file

  • ranges of numeric columns
  • values of text columns
  • ???

<== DataFileMetaData

  • the column names of each data file have to be listed in DataFileMetaData$columnName, in the rows with the matching DataFileMetaData$dataFileName
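
The comparison between a data file and its metadata can be done in both directions at once. The sketch below assumes the data file has already been read into a data frame; check_columns() and the file names are hypothetical.

```r
# Hypothetical helper: compares the columns of one data file with the
# columns described for it in the DataFileMetaData worksheet.
check_columns <- function(data, file_name, dfmd) {
  described <- dfmd$columnName[dfmd$dataFileName %in% file_name]
  list(
    undescribed = setdiff(names(data), described),  # in file, not in metadata
    absent      = setdiff(described, names(data))   # in metadata, not in file
  )
}

# Toy metadata and data; contents invented for illustration
dfmd <- data.frame(
  dataFileName = c("counts.csv", "counts.csv", "other.csv"),
  columnName   = c("jar", "density", "jar"),
  stringsAsFactors = FALSE
)
counts <- data.frame(jar = 1:2, biomass = c(0.4, 0.7))
check_columns(counts, "counts.csv", dfmd)
```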

<== Treatment

  • the values in the column describing the treatment (i.e. treatment levels) have to be in the column Treatment$treatmentLevel of the corresponding Treatment$parameter

<== Measurement

  • type checks of columnName compared to type

Other Validity Checks

  • Check each dataset for duplicated rows and alert the user to their presence.
  • Check each dataset for duplicated variable names and alert the user to their presence.
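
Both duplicate checks map onto base R's duplicated(). The helper names below are hypothetical. Note that read.csv() makes column names unique by default (check.names = TRUE), so a check for duplicated names only sees them if the file is read with check.names = FALSE.

```r
# Hypothetical helpers for the two duplicate checks
duplicated_rows <- function(dat) {
  which(duplicated(dat))                      # indices of repeated rows
}
duplicated_names <- function(dat) {
  unique(names(dat)[duplicated(names(dat))])  # names that occur more than once
}

# Toy data; contents invented for illustration
dat_rows <- data.frame(id = c(1, 2, 2), value = c("a", "b", "b"))
duplicated_rows(dat_rows)

dat_names <- data.frame(id = 1:2, value = c("a", "b"), id = 3:4,
                        check.names = FALSE)
duplicated_names(dat_names)
```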

Validation result levels

  • error: an error in structure, content, or consistency between metadata and/or data file(s) which will result in incorrect metadata. An example is missing metadata for treatment levels. These need to be fixed before export!
  • warning: inconsistencies were detected, but the metadata may still be correct. These should be checked before export!
  • note: information about an issue which may or may not be an error. This is a softer form of a warning but should also be looked at.
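
One way to combine the numeric codes of the individual checks into an overall result is to report the worst one. The mapping of codes to levels used here (1 = note, 2 = warning, 3 = error) is an assumption; the text above only states that 0 means everything is OK.

```r
# Assumed mapping of numeric codes to result levels (not confirmed above)
result_levels <- c("0" = "OK", "1" = "note", "2" = "warning", "3" = "error")

# Hypothetical helper: the overall result is the most severe single result
overall_result <- function(codes) {
  worst <- max(codes)
  unname(result_levels[as.character(worst)])
}

overall_result(c(0, 2, 0, 3))
overall_result(c(0, 0))
```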