The YES3 Exporter Technical Guide
Intended Audience
This guide is intended for individuals who will be writing programs to process files generated by the YES3 Exporter. These include data managers and statistical programmers writing programs to import YES3 Exporter data into statistical systems such as SAS and R, as well as application developers seeking to integrate the YES3 Exporter into datamart generation and validation workflows.
The YES3 technical team
The following individuals participate in one aspect or another - usually multiple aspects - of YES3 software development.
Katy Araujo ninja, testing, documentation
Venugopal Bhatia, application developer
Peter Charpentier, lead EM developer
Kraig Eisenman, IT system engineer
Brian Funaro, lead IT system engineer
Mary Geda, program manager, documentation lead
Kaitlin Maciejewski, testing
Baylah Tessier-Sherman, ninja, testing
Janet Truebig, ninja, testing
Sui Tsang, lead ninja, testing lead, documentation
YES3 Exporter character encoding
The YES3 Exporter generates UTF-8 encoded text files. While not required by the UTF-8 specification, for compatibility the byte order mark (BOM) is included. We have found that the BOM is needed for Microsoft products like Excel. Rows are terminated by a new line (0A), but this might be implementation-specific (our hosts run Ubuntu 20.04 and PHP 7.4). It might be best to anticipate a possible CRLF (0D0A) row termination.
YES3 Exporter destinations
There are two export destinations: download and filesystem.
The download destination offers the options of downloading the exported data, the data dictionary, or a .zip package that includes both (see "The YES3 Exporter Payload," below).
To configure filesystem exports for a project, filesystem exports must be enabled by a REDCap administrator in the YES3 Exporter system EM configuration settings. Then a host filesystem target folder must be specified in the YES3 Exporter project EM configuration settings. The host filesyetem target folder should be a "bind mount" for a secure institutional fileshare.
Managing multiple datamarts
Since host filesystem mount points must be configured by ITS system administrators, and probably approved by an Information Security Officer, in most implementations filesystem exports will be reserved for large studies having the resources and a compelling case for automated datamart builds. However, filesystem export targets for large numbers of studies might be facilitated using the following approach:
Create a single mount point as a folder on a secure, very restricted institutional fileshare.
Create a subfolder for each study requiring a filesystem export. This will be that study's filesystem export target - but will not be accessible to that study.
Build an external application that will sweep through the daily exports, validate them against the REDCap database and if validated, pass each to a folder that is accessible to that export's study staff.
The YES3 Exporter Payload
When exporting to a host filesystem folder, or downloading to a .zip export package, a "payload" of three files is delivered:
File | Naming convention | Description |
|---|---|---|
export information file | [normalized export name]_info_[timestamp].json | information about the project and the export settings |
export data dictionary | [normalized export name]_dd_[timestamp].csv | metadata and data distributions, tailored to the specific export |
export data file | [normalized export name]_data_[timestamp].csv | the exported data |
The export name is "normalized" by converting to lower case and replacing spaces and other non-alphanumeric characters with underscores.
How export specifications are identified
The REDCap EM logging feature provides an elegant way to maintain specification backups. In order to track specification backups across EM log entries, each export specification is identified by the EM parameter export_uuid. This value is an RFC-compliant universally unique identifier, generated by the Javascript function YES3.uuidv4 function in js/yes3.js.
EM Log records
The YES3 Exporter makes extensive use of External Module logs, both for logging actions such as downloads and filesystem exports and for storing the export specifications and event prefixes. An EM log parameter named log_entry_typeindicates the function of the log record. Below is a listing of each log_entry_type value and the purpose of the associated EM log record.
log_entry_type | EM Log record purpose | includes export_uuid |
|---|---|---|
yes3-export-specification | an export specification | yes |
yes3-export-events | event prefixes | no |
yes3-export-log | an event associated with an export or a download | yes |
yes3-export-cron-log | a daily cron summary across all projects and exports | no |
The YES3 Exporter Information file
The YES3 Export Information file is a JSON-encoded object that provides information about the project and the export specification.
Object properties:
Property | Example | Comments |
|---|---|---|
host | http://redcap@trantor.gov/ | |
timestamp | 2022-06-25 09:20:22 | |
project_id | 16 | |
project_recordid_field | screenid | |
project_title | YES3 Test Project | |
project_is_longitudinal | 1 | |
project_has_dags | 1 | |
export_name | DCARE outcomes | |
export_layout | h |
|
export_uuid | 8d99f980-fa35-4f3a-9f33-3cf302568a33 | export ID |
export_target_folder | /mnt/exports/dcare/ | for filesystem exports |
path | /mnt/exports/dcare/dcare_outcomes_data_22062592022.csv, /tmp/ys30W6gjh | either the full path of a filesystem export or the temporary file name used during download generation |
bytes_written | 66216 | |
columns | 273 | |
rows | 221 | |
destination | download |
|
notification_email | joe_user@trantor.gov | email address for project notifications, defined in the project EM configuration |
username | palooka | |
YES3 Exporter Variables
Variable Types
Type | format | comments/associated REDCap metadata |
|---|---|---|
INTEGER | REDCap field type | |
FLOAT | w.d as determined by PHP | REDCap field type |
DATE | yyyy-mm-dd | REDCap field validations |
DATETIME | yyyy-mm-dd hh:nn:ss | REDCap field validations |
TIME | hh:nn:ss | REDCap field validation |
NOMINAL | value set | REDCap field types |
CHECKBOX | value set | REDCap field type |
TEXT | REDCap field type |
REDCap Attribute variables
Depending on the export layout and study design, an exported dataset may include one or more "REDCap Attribute" variables. These are the intrinsic REDCap attributes that may be associated with an output record: data access group, event and instance. Where possible we have used the same names that are used by the REDCap API. The full list is shown below.
variable name | type | example |
|---|---|---|
redcap_data_access_group_id | INTEGER | 31 |
redcap_data_access_group_name | TEXT | utmb |
redcap_event_id | INTEGER | 4159 |
redcap_event_name | TEXT | baseline_arm_1 |
redcap_repeat_instance | INTEGER | 1 |
REDCap Data variables
All exported datasets include REDCap data values of course, and these are stored in REDCap Data variables. For vertical and repeated export layouts, the variable names for REDCap data will be the same as the underlying REDCap field_name. For horizontal layouts the variable names will be composites of the REDCap event prefix and the REDCap field name.
Export Data Layouts
The YES3 Exporter currently supports three export layouts: Vertical(v), Horizontal(h) and Repeating forms(r).
For all export layouts the first field or 'variable' in the exported record will be the recordid, and the variable name will be the recordid field name as defined for the specific project.
Depending on the layout and study design, the recordid variable may be followed by a collection of "REDCap Attribute" variables that indicate the REDCap data access group and event.
The remainder of the exported record is the collection of values for the selected fields, in the order specified. The variable name of each will be either the REDCap field name (vertical and repeating form layouts), or a composite of the REDCap event prefix and the REDCap field name (horizontal layout).
Following are more detailed descriptions of the export layouts.
The Vertical Export Layout
For classic (cross sectional) study designs, the vertical export layout has one row for each record. For longitudinal study designs, the vertical export layout has one row per observed record and event combination.
Export record specification for a Vertical layout
variable | description | comments |
|---|---|---|
[recordId] | The record unique identifier | study-specific variable name |
redcap_data_access_group_id | The numeric DAG id | only present if DAGs are assigned to records |
redcap_data_access_group_name | The unique DAG name | only present if DAGs are assigned to records |
redcap_event_id | The numeric event id | only present for longitudinal studies |
redcap_event_name | The unique REDCap event name | only present for longitudinal studies |
[fieldname] | the first selected field value | variable name is the underlying REDCap field name |
... | ||
[fieldname] | the last selected field value | variable name is the underlying REDCap field name |
The Horizontal Export Layout
Horizontal layouts are allowed only for longitudinal study designs, and have one row for each record.
Export record specification for a Horizontal layout
variable | description | comments |
|---|---|---|
[recordId] | The record unique identifier | study-specific variable name |
redcap_data_access_group_id | The numeric DAG id | only present if DAGs are assigned to records |
redcap_data_access_group_name | The unique DAG name | only present if DAGs are assigned to records |
[composite fieldname] | the first selected field/event value | variable name is [event prefix]_[ REDCap field name] |
... | ||
[composite fieldname] | the last selected field/event value | variable name is [event prefix]_[ REDCap field name] |
The Repeating Form Export Layout
For classic (cross sectional) study designs, the repeating form export layout has one row for each observed record and instance combination.
For longitudinal study designs, the repeating form export layout has one row per observed record, event and instance combination.
Export record specification for a Repeating Form layout
variable | description | comments |
|---|---|---|
[recordId] | The record unique identifier | study-specific variable name |
redcap_data_access_group_id | The numeric DAG id | only present if DAGs are assigned to records |
redcap_data_access_group_name | The unique DAG name | only present if DAGs are assigned to records |
redcap_event_id | The numeric event id | only present for longitudinal studies |
redcap_event_name | The unique REDCap event name | only present for longitudinal studies |
redcap_repeat_instance | the sequential instance value (1, 2, ...) | |
[fieldname] | the first selected field value | variable name is the underlying REDCap field name |
... | ||
[fieldname] | the last selected field value | variable name is the underlying REDCap field name |
The YES3 Exporter Data Dictionary
The YES3 Exporter Data Dictionary includes detailed metadata for every exported variable, as well as data distribution summaries.
One intended use for the YES3 Exporter data dictionary is for external applications to generate and execute fine-tuned data import code. For example, the observed maximum string length can be used to tailor SAS variable attributes to exactly match the exported data. Once a datamart has been rebuilt, an external process can use the data distribution information in the data dictionary to validate the import - or to create and deploy a simple summary data report.
Data distributions summaries
The data distribution summaries stored in the YES3 Exporter data dictionary were originally conceived as an aid to fine-tuned code generation for datamart builds.
Data validations
The distribution summaries can be used to validate a data management process, for example an R or SAS program to import data into a datamart. Be sure to make appropriately "fuzzy" comparisons for distributional parameters like mean and variance. If I (PC) remember correctly, SAS has a bit-level truncation function that's useful in this regard.
Ad hoc reports
We have found that the data distribution summaries are also convenient for data completion reports, study monitoring reports and to check for extreme values. Always: make sure that your intended purpose wouldn't be achieved more simply and elegantly by the reporting and project portal tools provided by REDCap.
YES3 Exporter Data Dictionary specification
Each row of a YES3 Exporter Data Dictionary matches one column in the exported dataset, in the same order. Below is a table of column specifications.
column name | column description |
|---|---|
var_name | The variable name. This will either be a raw REDCap field name, a REDCap Attribute field name or for horizontal layouts, a composite field name based on the event and REDCap field name |
var_label | The label for the variable. It is based on the REDCap field label, but is subject to the truncation and 'sanitizing' options of the export specification. |
var_type | The variable's data type (see "Variable Types") |
valueset | NOMINAL and CHECKBOX variables only: A JSON-encoded array of value/label pairs. |
origin | Indicates the data source for the variable: |
redcap_field_name | the name of the underlying REDCap field, if relevant |
redcap_form_name | the form associated with the underlying REDCap field, if relevant |
redcap_event_id | HORIZONTAL LAYOUTS ONLY: the REDCap event id associated with the variable. |
redcap_event_name | HORIZONTAL LAYOUTS ONLY: the REDCap event name associated with the variable. |
non_missing_count | The count of non-blank values for this variable, as observed in the exported records. |
min_length | For TEXT variables: the minimum string length - after any truncations or sanitations - observed in the exported records. |
max_length | For TEXT variables: the maximum string length - after any truncations or sanitations - observed in the exported records. |
min_value | For numeric and date/time variables: the minimum value observed in the exported records. For date/time variables this is a PHP time value, i.e. the count of seconds since 1/1/1970 00:00:00 |
max_value | For numeric and date/time variables: the maximum value observed in the exported records. For date/time variables this is a PHP time value, i.e. the count of seconds since 1/1/1970 00:00:00 |
sum_of_values | For numeric and date/time variables: the sum of the values observed in the exported records |
sum_of_squared_values | For numeric and date/time variables: the sum of the squared values observed in the exported records |
mean | For numeric and date/time variables: the average of the values observed in the exported records |
standard_deviation | For numeric and date/time variables: the sample standard deviation of the values observed in the exported records |
formatted_min_value | The minimum observed value, formatted as necessary for DATE, DATETIME and TIME variables. |
formatted_max_value | The maximum observed value, formatted as necessary for DATE, DATETIME and TIME variables. |
formatted_mean | The average observed value, formatted as necessary for DATE, DATETIME and TIME variables. |
frequency_table | For NOMINAL variables: a JSON-encoded table of discrete values and their frequency counts. |