Data Dictionary¶
The YES3 Exporter II Data Dictionary includes detailed metadata for every exported variable, as well as data distribution summaries.
One intended use for the YES3 Exporter II data dictionary is for external applications to generate and execute fine-tuned data import code. For example, the observed maximum string length can be used to tailor SAS variable attributes to exactly match the exported data. Once a datamart has been rebuilt, an external process can use the data distribution information in the data dictionary to validate the import - or to create and deploy a simple summary data report.
Data distribution summaries¶
The data distribution summaries stored in the YES3 Exporter II data dictionary were originally conceived as an aid to fine-tuned code generation for datamart builds.
Data validations¶
The distribution summaries can be used to validate a data management process, for example an R or SAS program to import data into a datamart. Be sure to make appropriately "fuzzy" comparisons for distributional parameters like mean and variance. If I (PC) remember correctly, SAS has a bit-level truncation function that's useful in this regard.
Ad hoc reports¶
We have found that the data distribution summaries are also convenient for data completion reports, study monitoring reports and to check for extreme values. Always: make sure that your intended purpose wouldn't be achieved more simply and elegantly by the reporting and project portal tools provided by REDCap.
Data Dictionary Specification¶
Each row of a YES3 Exporter II Data Dictionary matches one column in the exported dataset, in the same order. Below is a table of column specifications.
| column name | column description |
|---|---|
| var_name | The variable name. This will either be a raw REDCap field name, a REDCap Attribute field name or for horizontal layouts, a composite field name based on the event and REDCap field name |
| var_label | The label for the variable. It is based on the REDCap field label, but is subject to the truncation and 'sanitizing' options of the export specification. |
| var_type | The variable's data type (see "Variable Types") |
| valueset | NOMINAL and CHECKBOX variables only: A JSON-encoded array of value/label pairs. |
| origin | Indicates the data source for the variable: redcap = REDCap field, other = other REDCap attribute such as the event, DAG or instance. |
| redcap_field_name | the name of the underlying REDCap field, if relevant |
| redcap_form_name | the form associated with the underlying REDCap field, if relevant |
| redcap_event_id | HORIZONTAL LAYOUTS ONLY: the REDCap event id associated with the variable. |
| redcap_event_name | HORIZONTAL LAYOUTS ONLY: the REDCap event name associated with the variable. |
| non_missing_count | The count of non-blank values for this variable, as observed in the exported records. |
| min_length | For TEXT variables: the minimum string length - after any truncations or sanitations - observed in the exported records. |
| max_length | For TEXT variables: the maximum string length - after any truncations or sanitations - observed in the exported records. |
| min_value | For numeric and date/time variables: the minimum value observed in the exported records. For date/time variables this is a PHP time value, i.e. the count of seconds since 1/1/1970 00:00:00 |
| max_value | For numeric and date/time variables: the maximum value observed in the exported records. For date/time variables this is a PHP time value, i.e. the count of seconds since 1/1/1970 00:00:00 |
| sum_of_values | For numeric and date/time variables: the sum of the values observed in the exported records |
| sum_of_squared_values | For numeric and date/time variables: the sum of the squared values observed in the exported records |
| mean | For numeric and date/time variables: the average of the values observed in the exported records |
| standard_deviation | For numeric and date/time variables: the sample standard deviation of the values observed in the exported records |
| formatted_min_value | The minimum observed value, formatted as necessary for DATE, DATETIME and TIME variables. |
| formatted_max_value | The maximum observed value, formatted as necessary for DATE, DATETIME and TIME variables. |
| formatted_mean | The average observed value, formatted as necessary for DATE, DATETIME and TIME variables. |
| frequency_table | For NOMINAL variables: a JSON-encoded table of discrete values and their frequency counts. |