Data Dictionary Validation Functions
The following functions format and validate an external data dictionary (e.g. one exported from REDCap) so that it is ready for use with AutoParser’s mapping functionality.
- adtl.autoparser.format_dict(data_dict: DataFrame | str, save=False) → DataFrame
Formats an existing data dictionary for use with AutoParser, or checks one that has already been formatted.
- Parameters:
data_dict – Path to a CSV, XLSX or parquet file, or a DataFrame, containing the data dictionary.
config – Path to the configuration file to use if not using the default configuration.
save – If True, saves the formatted data dictionary to a parquet file in the same directory as the input data dictionary, as ‘formatted_data_dictionary.parquet’.
- Returns:
Data dictionary containing field names, field types, and common values.
- Return type:
pd.DataFrame
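As a minimal sketch of how format_dict might be called, the example below writes a small REDCap-style data dictionary to a CSV file and passes its path to the function. The column headers, the example rows, and the helper names write_example_dictionary and format_example_dictionary are illustrative assumptions, not part of adtl. Whether the default configuration accepts these exact columns is also an assumption; a custom configuration file may be needed.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical REDCap-style rows; the column headers are illustrative
# and may not match what the default AutoParser configuration expects.
EXAMPLE_ROWS = [
    {
        "Variable / Field Name": "subjid",
        "Field Type": "text",
        "Field Label": "Subject ID",
        "Choices, Calculations, OR Slider Labels": "",
    },
    {
        "Variable / Field Name": "sex",
        "Field Type": "radio",
        "Field Label": "Sex at birth",
        "Choices, Calculations, OR Slider Labels": "1, Male | 2, Female",
    },
    {
        "Variable / Field Name": "age",
        "Field Type": "text",
        "Field Label": "Age in years",
        "Choices, Calculations, OR Slider Labels": "",
    },
]


def write_example_dictionary(directory=None):
    """Write the example data dictionary to a CSV file and return its path."""
    # Fall back to a fresh temporary directory when none is given.
    directory = Path(directory) if directory else Path(tempfile.mkdtemp())
    path = directory / "data_dictionary.csv"
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(EXAMPLE_ROWS[0]))
        writer.writeheader()
        writer.writerows(EXAMPLE_ROWS)
    return path


def format_example_dictionary(path):
    """Format the CSV with AutoParser (requires adtl to be installed)."""
    # Imported here so the CSV helper above runs without adtl installed.
    from adtl.autoparser import format_dict

    # save=True asks for a parquet copy next to the input file.
    return format_dict(str(path), save=True)
```

Calling format_example_dictionary(write_example_dictionary()) should return the formatted dictionary as a DataFrame, provided adtl is installed and the configuration in use recognises the columns above.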
Class definitions
You can also work directly with the base class, DictReader.
- class adtl.autoparser.DictReader(data_dict: DataFrame | str)
Class for reading in and converting data dictionaries provided by users into a format usable by autoparser.
Validates the final data dictionary against the schema defined in adtl.autoparser.data_dict_schema.DataDictionaryProcessed.
- Parameters:
data_dict – Path to a CSV, XLSX or parquet file, or a DataFrame, containing the data dictionary.
config – Path to the configuration file to use if not using the default configuration.
- save_formatted_dictionary(name: str | Path | None = None) → None
Save the formatted data dictionary to a parquet file.
The file is saved in the same directory as the input data dictionary, either under the name provided or as ‘formatted_data_dict.parquet’.