YES3 Exporter Version 1.0 Documentation Help

The YES3 Exporter Technical Guide

Intended Audience

This guide is intended for individuals who will be writing programs to process files generated by the YES3 Exporter. These include data managers and statistical programmers writing programs to import YES3 Exporter data into statistical systems such as SAS and R, as well as application developers seeking to integrate the YES3 Exporter into datamart generation and validation workflows.

The YES3 technical team

The following individuals participate in one aspect or another - usually multiple aspects - of YES3 software development.

Katy Araujo ninja, testing, documentation
Venugopal Bhatia, application developer
Peter Charpentier, lead EM developer
Kraig Eisenman, IT system engineer
Brian Funaro, lead IT system engineer
Mary Geda, program manager, documentation lead
Kaitlin Maciejewski, testing
Baylah Tessier-Sherman, ninja, testing
Janet Truebig, ninja, testing
Sui Tsang, lead ninja, testing lead, documentation

YES3 Exporter character encoding

The YES3 Exporter generates UTF-8 encoded text files. While not required by the UTF-8 specification, for compatibility the byte order mark (BOM) is included. We have found that the BOM is needed for Microsoft products like Excel. Rows are terminated by a new line (0A), but this might be implementation-specific (our hosts run Ubuntu 20.04 and PHP 7.4). It might be best to anticipate a possible CRLF (0D0A) row termination.

YES3 Exporter destinations

There are two export destinations: download and filesystem.

The download destination offers the options of downloading the exported data, the data dictionary, or a .zip package that includes both (see "The YES3 Exporter Payload," below).

To configure filesystem exports for a project, filesystem exports must be enabled by a REDCap administrator in the YES3 Exporter system EM configuration settings. Then a host filesystem target folder must be specified in the YES3 Exporter project EM configuration settings. The host filesyetem target folder should be a "bind mount" for a secure institutional fileshare.

Managing multiple datamarts

Since host filesystem mount points must be configured by ITS system administrators, and probably approved by an Information Security Officer, in most implementations filesystem exports will be reserved for large studies having the resources and a compelling case for automated datamart builds. However, filesystem export targets for large numbers of studies might be facilitated using the following approach:

  1. Create a single mount point as a folder on a secure, very restricted institutional fileshare.

  2. Create a subfolder for each study requiring a filesystem export. This will be that study's filesystem export target - but will not be accessible to that study.

  3. Build an external application that will sweep through the daily exports, validate them against the REDCap database and if validated, pass each to a folder that is accessible to that export's study staff.

The YES3 Exporter Payload

When exporting to a host filesystem folder, or downloading to a .zip export package, a "payload" of three files is delivered:

File

Naming convention

Description

export information file

[normalized export name]_info_[timestamp].json

information about the project and the export settings

export data dictionary

[normalized export name]_dd_[timestamp].csv

metadata and data distributions, tailored to the specific export

export data file

[normalized export name]_data_[timestamp].csv

the exported data

The export name is "normalized" by converting to lower case and replacing spaces and other non-alphanumeric characters with underscores.

How export specifications are identified

The REDCap EM logging feature provides an elegant way to maintain specification backups. In order to track specification backups across EM log entries, each export specification is identified by the EM parameter export_uuid. This value is an RFC-compliant universally unique identifier, generated by the Javascript function YES3.uuidv4 function in js/yes3.js.

EM Log records

The YES3 Exporter makes extensive use of External Module logs, both for logging actions such as downloads and filesystem exports and for storing the export specifications and event prefixes. An EM log parameter named log_entry_typeindicates the function of the log record. Below is a listing of each log_entry_type value and the purpose of the associated EM log record.

log_entry_type

EM Log record purpose

includes


export_uuid

yes3-export-specification

an export specification

yes

yes3-export-events

event prefixes

no

yes3-export-log

an event associated with an export or a download

yes

yes3-export-cron-log

a daily cron summary across all projects and exports

no

The YES3 Exporter Information file

The YES3 Export Information file is a JSON-encoded object that provides information about the project and the export specification.

Object properties:

Property

Example

Comments

host

http://redcap@trantor.gov/

timestamp

2022-06-25 09:20:22

project_id

16

project_recordid_field

screenid

project_title

YES3 Test Project

project_is_longitudinal

1

project_has_dags

1

export_name

DCARE outcomes

export_layout

h

v=vertical, h=horizontal, r=repeating form

export_uuid

8d99f980-fa35-4f3a-9f33-3cf302568a33

export ID

export_target_folder

/mnt/exports/dcare/

for filesystem exports

path

/mnt/exports/dcare/dcare_outcomes_data_22062592022.csv, /tmp/ys30W6gjh

either the full path of a filesystem export or the temporary file name used during download generation

bytes_written

66216

columns

273

rows

221

destination

download

download or filesystem

notification_email

joe_user@trantor.gov

email address for project notifications, defined in the project EM configuration

username

palooka

YES3 Exporter Variables

Variable Types

Type

format

comments/associated REDCap metadata

INTEGER

REDCap field type slider or REDCap field validation integer

FLOAT

w.d as determined by PHP

REDCap field type calc or REDCap field validation float. 14 significant digits (IEEE 64-bit).

DATE

yyyy-mm-dd

REDCap field validations date_mdy, date_ymd, date_dmy, date_dmy

DATETIME

yyyy-mm-dd hh:nn:ss

REDCap field validations datetime_mdy, datetime_ymd, datetime_dmy, datetime_seconds_mdy, datetime_seconds_ymd, datetime_seconds_dmy

TIME

hh:nn:ss

REDCap field validation time

NOMINAL

value set

REDCap field types radio, yesno, truefalse, select (dropdown)

CHECKBOX

value set

REDCap field type checkbox

TEXT

REDCap field type paragraph/notes, REDCap field type sql or REDCap field type text with no REDCap field validation

REDCap Attribute variables

Depending on the export layout and study design, an exported dataset may include one or more "REDCap Attribute" variables. These are the intrinsic REDCap attributes that may be associated with an output record: data access group, event and instance. Where possible we have used the same names that are used by the REDCap API. The full list is shown below.

variable name

type

example

redcap_data_access_group_id

INTEGER

31

redcap_data_access_group_name

TEXT

utmb

redcap_event_id

INTEGER

4159

redcap_event_name

TEXT

baseline_arm_1

redcap_repeat_instance

INTEGER

1

REDCap Data variables

All exported datasets include REDCap data values of course, and these are stored in REDCap Data variables. For vertical and repeated export layouts, the variable names for REDCap data will be the same as the underlying REDCap field_name. For horizontal layouts the variable names will be composites of the REDCap event prefix and the REDCap field name.

Export Data Layouts

The YES3 Exporter currently supports three export layouts: Vertical(v), Horizontal(h) and Repeating forms(r).

For all export layouts the first field or 'variable' in the exported record will be the recordid, and the variable name will be the recordid field name as defined for the specific project.

Depending on the layout and study design, the recordid variable may be followed by a collection of "REDCap Attribute" variables that indicate the REDCap data access group and event.

The remainder of the exported record is the collection of values for the selected fields, in the order specified. The variable name of each will be either the REDCap field name (vertical and repeating form layouts), or a composite of the REDCap event prefix and the REDCap field name (horizontal layout).

Following are more detailed descriptions of the export layouts.

The Vertical Export Layout

For classic (cross sectional) study designs, the vertical export layout has one row for each record. For longitudinal study designs, the vertical export layout has one row per observed record and event combination.

Export record specification for a Vertical layout

variable

description

comments

[recordId]

The record unique identifier

study-specific variable name

redcap_data_access_group_id

The numeric DAG id

only present if DAGs are assigned to records

redcap_data_access_group_name

The unique DAG name

only present if DAGs are assigned to records

redcap_event_id

The numeric event id

only present for longitudinal studies

redcap_event_name

The unique REDCap event name

only present for longitudinal studies

[fieldname]

the first selected field value

variable name is the underlying REDCap field name

...

[fieldname]

the last selected field value

variable name is the underlying REDCap field name

The Horizontal Export Layout

Horizontal layouts are allowed only for longitudinal study designs, and have one row for each record.

Export record specification for a Horizontal layout

variable

description

comments

[recordId]

The record unique identifier

study-specific variable name

redcap_data_access_group_id

The numeric DAG id

only present if DAGs are assigned to records

redcap_data_access_group_name

The unique DAG name

only present if DAGs are assigned to records

[composite fieldname]

the first selected field/event value

variable name is [event prefix]_[ REDCap field name]

...

[composite fieldname]

the last selected field/event value

variable name is [event prefix]_[ REDCap field name]

The Repeating Form Export Layout

For classic (cross sectional) study designs, the repeating form export layout has one row for each observed record and instance combination.

For longitudinal study designs, the repeating form export layout has one row per observed record, event and instance combination.

Export record specification for a Repeating Form layout

variable

description

comments

[recordId]

The record unique identifier

study-specific variable name

redcap_data_access_group_id

The numeric DAG id

only present if DAGs are assigned to records

redcap_data_access_group_name

The unique DAG name

only present if DAGs are assigned to records

redcap_event_id

The numeric event id

only present for longitudinal studies

redcap_event_name

The unique REDCap event name

only present for longitudinal studies

redcap_repeat_instance

the sequential instance value (1, 2, ...)

[fieldname]

the first selected field value

variable name is the underlying REDCap field name

...

[fieldname]

the last selected field value

variable name is the underlying REDCap field name

The YES3 Exporter Data Dictionary

The YES3 Exporter Data Dictionary includes detailed metadata for every exported variable, as well as data distribution summaries.

One intended use for the YES3 Exporter data dictionary is for external applications to generate and execute fine-tuned data import code. For example, the observed maximum string length can be used to tailor SAS variable attributes to exactly match the exported data. Once a datamart has been rebuilt, an external process can use the data distribution information in the data dictionary to validate the import - or to create and deploy a simple summary data report.

Data distributions summaries

The data distribution summaries stored in the YES3 Exporter data dictionary were originally conceived as an aid to fine-tuned code generation for datamart builds.

Data validations

The distribution summaries can be used to validate a data management process, for example an R or SAS program to import data into a datamart. Be sure to make appropriately "fuzzy" comparisons for distributional parameters like mean and variance. If I (PC) remember correctly, SAS has a bit-level truncation function that's useful in this regard.

Ad hoc reports

We have found that the data distribution summaries are also convenient for data completion reports, study monitoring reports and to check for extreme values. Always: make sure that your intended purpose wouldn't be achieved more simply and elegantly by the reporting and project portal tools provided by REDCap.

YES3 Exporter Data Dictionary specification

Each row of a YES3 Exporter Data Dictionary matches one column in the exported dataset, in the same order. Below is a table of column specifications.

column name

column description

var_name

The variable name. This will either be a raw REDCap field name, a REDCap Attribute field name or for horizontal layouts, a composite field name based on the event and REDCap field name

var_label

The label for the variable. It is based on the REDCap field label, but is subject to the truncation and 'sanitizing' options of the export specification.

var_type

The variable's data type (see "Variable Types")

valueset

NOMINAL and CHECKBOX variables only: A JSON-encoded array of value/label pairs.

origin

Indicates the data source for the variable: redcap = REDCap field, other = other REDCap attribute such as the event, DAG or instance.

redcap_field_name

the name of the underlying REDCap field, if relevant

redcap_form_name

the form associated with the underlying REDCap field, if relevant

redcap_event_id

HORIZONTAL LAYOUTS ONLY: the REDCap event id associated with the variable.

redcap_event_name

HORIZONTAL LAYOUTS ONLY: the REDCap event name associated with the variable.

non_missing_count

The count of non-blank values for this variable, as observed in the exported records.

min_length

For TEXT variables: the minimum string length - after any truncations or sanitations - observed in the exported records.

max_length

For TEXT variables: the maximum string length - after any truncations or sanitations - observed in the exported records.

min_value

For numeric and date/time variables: the minimum value observed in the exported records. For date/time variables this is a PHP time value, i.e. the count of seconds since 1/1/1970 00:00:00

max_value

For numeric and date/time variables: the maximum value observed in the exported records. For date/time variables this is a PHP time value, i.e. the count of seconds since 1/1/1970 00:00:00

sum_of_values

For numeric and date/time variables: the sum of the values observed in the exported records

sum_of_squared_values

For numeric and date/time variables: the sum of the squared values observed in the exported records

mean

For numeric and date/time variables: the average of the values observed in the exported records

standard_deviation

For numeric and date/time variables: the sample standard deviation of the values observed in the exported records

formatted_min_value

The minimum observed value, formatted as necessary for DATE, DATETIME and TIME variables.

formatted_max_value

The maximum observed value, formatted as necessary for DATE, DATETIME and TIME variables.

formatted_mean

The average observed value, formatted as necessary for DATE, DATETIME and TIME variables.

frequency_table

For NOMINAL variables: a JSON-encoded table of discrete values and their frequency counts.

Last modified: 01 July 2024