Data Dictionary¶

The YES3 Exporter II Data Dictionary includes detailed metadata for every exported variable, as well as data distribution summaries.

One intended use for the YES3 Exporter II data dictionary is for external applications to generate and execute fine-tuned data import code. For example, the observed maximum string length can be used to tailor SAS variable attributes to exactly match the exported data. Once a datamart has been rebuilt, an external process can use the data distribution information in the data dictionary to validate the import - or to create and deploy a simple summary data report.

Data distribution summaries¶

The data distribution summaries stored in the YES3 Exporter II data dictionary were originally conceived as an aid to fine-tuned code generation for datamart builds.

Data validations¶

The distribution summaries can be used to validate a data management process, for example an R or SAS program to import data into a datamart. Be sure to make appropriately "fuzzy" comparisons for distributional parameters like mean and variance. If I (PC) remember correctly, SAS has a bit-level truncation function that's useful in this regard.

Ad hoc reports¶

We have found that the data distribution summaries are also convenient for data completion reports, study monitoring reports and to check for extreme values. Always: make sure that your intended purpose wouldn't be achieved more simply and elegantly by the reporting and project portal tools provided by REDCap.

Data Dictionary Specification¶

Each row of a YES3 Exporter II Data Dictionary matches one column in the exported dataset, in the same order. Below is a table of column specifications.

column name	column description
var_name	The variable name. This will either be a raw REDCap field name, a REDCap Attribute field name or for horizontal layouts, a composite field name based on the event and REDCap field name
var_label	The label for the variable. It is based on the REDCap field label, but is subject to the truncation and 'sanitizing' options of the export specification.
var_type	The variable's data type (see "Variable Types")
valueset	NOMINAL and CHECKBOX variables only: A JSON-encoded array of value/label pairs.
origin	Indicates the data source for the variable: `redcap` = REDCap field, `other` = other REDCap attribute such as the event, DAG or instance.
redcap_field_name	the name of the underlying REDCap field, if relevant
redcap_form_name	the form associated with the underlying REDCap field, if relevant
redcap_event_id	HORIZONTAL LAYOUTS ONLY: the REDCap event id associated with the variable.
redcap_event_name	HORIZONTAL LAYOUTS ONLY: the REDCap event name associated with the variable.
non_missing_count	The count of non-blank values for this variable, as observed in the exported records.
min_length	For TEXT variables: the minimum string length - after any truncations or sanitations - observed in the exported records.
max_length	For TEXT variables: the maximum string length - after any truncations or sanitations - observed in the exported records.
min_value	For numeric and date/time variables: the minimum value observed in the exported records. For date/time variables this is a PHP time value, i.e. the count of seconds since 1/1/1970 00:00:00
max_value	For numeric and date/time variables: the maximum value observed in the exported records. For date/time variables this is a PHP time value, i.e. the count of seconds since 1/1/1970 00:00:00
sum_of_values	For numeric and date/time variables: the sum of the values observed in the exported records
sum_of_squared_values	For numeric and date/time variables: the sum of the squared values observed in the exported records
mean	For numeric and date/time variables: the average of the values observed in the exported records
standard_deviation	For numeric and date/time variables: the sample standard deviation of the values observed in the exported records
formatted_min_value	The minimum observed value, formatted as necessary for DATE, DATETIME and TIME variables.
formatted_max_value	The maximum observed value, formatted as necessary for DATE, DATETIME and TIME variables.
formatted_mean	The average observed value, formatted as necessary for DATE, DATETIME and TIME variables.
frequency_table	For NOMINAL variables: a JSON-encoded table of discrete values and their frequency counts.