Data Dictionary Validation Functions

The following functions format and validate an external data dictionary (e.g. one exported from REDCap) so that it is ready for use with AutoParser’s mapping functionality.

adtl.autoparser.format_dict(data_dict: DataFrame | str, save=False) → DataFrame

Formats a pre-existing data dictionary to use with autoparser, or checks one that is already pre-formatted.

Parameters:
  • data_dict – Path to a CSV, XLSX or parquet file, or a DataFrame, containing the data dictionary.

  • config – Path to the configuration file to use if not using the default configuration.

  • save – If True, saves the formatted data dictionary to a parquet file in the same directory as the input data dictionary, as ‘formatted_data_dictionary.parquet’.

Returns:

Data dictionary containing field names, field types, and common values.

Return type:

pd.DataFrame
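As an illustration of the call described above, here is a minimal sketch. It writes a tiny, made-up data dictionary to a temporary CSV file and passes its path to format_dict. The column headers mimic a typical REDCap export and are an assumption, as are the helper names write_example_dictionary and format_if_available; whether the default configuration accepts these headers is not confirmed here, so you may need to supply your own configuration file.

```python
import csv
import tempfile
from pathlib import Path

# Assumption: headers in the style of a REDCap export. Your own dictionary
# (or configuration file) may use different column names.
HEADER = [
    "Variable / Field Name",
    "Field Type",
    "Field Label",
    "Choices, Calculations, OR Slider Labels",
]
# Made-up rows, for illustration only.
ROWS = [
    ["subject_id", "text", "Subject identifier", ""],
    ["sex", "radio", "Sex at birth", "1, Male | 2, Female"],
    ["age_years", "text", "Age in years", ""],
]


def write_example_dictionary(directory):
    """Write the example data dictionary to <directory>/data_dict.csv."""
    path = Path(directory) / "data_dict.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


def format_if_available(path):
    """Call format_dict on a file path, or return None if adtl is missing."""
    try:
        from adtl.autoparser import format_dict
    except ImportError:
        return None
    # save=False (the documented default) keeps the result in memory only.
    return format_dict(str(path), save=False)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = write_example_dictionary(tmp)
        formatted = format_if_available(csv_path)
        if formatted is None:
            print("adtl is not installed; skipping the format_dict call")
        else:
            print(formatted.head())
```

Passing save=True instead would, according to the parameter description above, also write a parquet file next to the input CSV.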

Class definitions

You can also interact directly with the base class, DictReader.

class adtl.autoparser.DictReader(data_dict: DataFrame | str)

Class for reading in and converting data dictionaries provided by users into a format usable by autoparser.

Validates the final data dictionary against the schema defined in adtl.autoparser.data_dict_schema.DataDictionaryProcessed.

Parameters:
  • data_dict – Path to a CSV, XLSX or parquet file, or a DataFrame, containing the data dictionary.

  • config – Path to the configuration file to use if not using the default configuration.

save_formatted_dictionary(name: str | Path | None = None) → None

Save the formatted data dictionary to a parquet file.

The file will be saved in the same directory as the input data dictionary, either with the name provided or as ‘formatted_data_dict.parquet’.
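The class-based route might look like the sketch below. It uses only the constructor and the save_formatted_dictionary method documented above; the wrapper save_with_reader is a hypothetical helper, and whether any intermediate step is required between constructing the reader and saving is not stated on this page, so treat the sequence as an assumption.

```python
import importlib.util


def save_with_reader(data_dict_path, name=None):
    """Build a DictReader for a file and save its formatted dictionary.

    Hypothetical wrapper: returns False without doing anything if adtl
    is not installed, and True once the save call has returned.
    """
    if importlib.util.find_spec("adtl") is None:
        return False
    from adtl.autoparser import DictReader

    reader = DictReader(str(data_dict_path))
    # name=None falls back to the default file name described above.
    reader.save_formatted_dictionary(name)
    return True
```

Calling save_with_reader with the path to your own data dictionary should then leave a parquet file in the same directory as that input.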