Parser Functions#
The following function creates the final TOML parser file:
- adtl.autoparser.create_parser(mappings: DataFrame | str, schema_path: Path, parser_name: str, description: str | None = None, constant_fields: dict[str, dict[str, bool]] | None = None)
Takes the CSV mapping file created by create_mapping and writes out a TOML parser.
Generates a TOML parser for use with ADTL from the intermediate CSV file generated by create_mapping. This will generate a TOML file that can be used to parse raw data into the format expected by the schema.
- Parameters:
mappings – The mappings as a DataFrame, or the path to the CSV file containing them
schema_path – Path to the schema file
parser_name – Name of the parser to create
description – Description of the parser. Defaults to the parser name.
constant_fields – Constant fields are those which are single values, rather than taken from a field in the source data.
- Return type:
None
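As a minimal sketch of calling the function described above: the file names, schema path, and parser name below are hypothetical placeholders, and only the parameter names are taken from the signature. The import is done inside the helper so the arguments can be inspected without ADTL installed.

```python
from pathlib import Path

# Hypothetical inputs: replace these with the paths used in your own project.
parser_args = {
    "mappings": "mappings.csv",  # intermediate CSV from create_mapping
    "schema_path": Path("schemas/example.schema.json"),
    "parser_name": "example_parser",
    "description": "Parser for the example dataset",
}


def build_parser(args: dict) -> None:
    """Call create_parser with keyword arguments matching its signature."""
    # Imported lazily so this module loads even where ADTL is not installed.
    from adtl.autoparser import create_parser

    create_parser(**args)


# Uncomment once the placeholder paths point at real files:
# build_parser(parser_args)
```

Since description is optional, dropping that key should fall back to the parser name, as noted in the parameter list.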
Class definitions#
You can also interact with the base class ParserGenerator.
- class adtl.autoparser.ParserGenerator(mappings: DataFrame | str | Path | dict[str, DataFrame | str | Path], schema_path: Path | str, parser_name: str, description: str | None = None, constant_fields: dict[str, dict[str, bool]] | None = None)#
Class for creating a TOML parser from an intermediate CSV file.
Use create_parser() to write out the TOML parser file; it is the function equivalent of the command-line create-parser script.
- Parameters:
mappings (pd.DataFrame | str | Path) – The intermediate CSV file created by create_mapping.py
schema_path (Path | str) – The path to the folder containing all the schema files
parser_name (str) – The name of the parser
description (str, optional) – The description of the parser
constant_fields (dict[str, dict[str, bool]], optional) – Constant fields are those which are single values, rather than taken from a field in the source data. For example, if an entire dataset is from the DRC but the target schema has a country field, the dataset may not contain a field stating the country. The argument is a dictionary keyed by table name; each value is a dictionary mapping field names to boolean True/False values indicating whether the field should be pulled from the source data or not. All fields in wide tables default to False, while long tables default to True for all columns except the value column(s).
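The nested shape of constant_fields can be illustrated with a short sketch. The table and field names here are hypothetical, and the helper is_valid_constant_fields is not part of ADTL; it only checks that a value matches the dict[str, dict[str, bool]] type given in the signature.

```python
# Hypothetical table and field names, for illustration only.
# Outer keys are table names; inner keys are field names with a boolean flag.
# The meaning of True/False follows the parameter description above.
constant_fields: dict[str, dict[str, bool]] = {
    "patients": {"country": True, "age": False},
    "observations": {"value": False},
}


def is_valid_constant_fields(obj: object) -> bool:
    """Return True if obj has the shape dict[str, dict[str, bool]]."""
    if not isinstance(obj, dict):
        return False
    for table, fields in obj.items():
        if not isinstance(table, str) or not isinstance(fields, dict):
            return False
        for field, flag in fields.items():
            if not isinstance(field, str) or not isinstance(flag, bool):
                return False
    return True
```

A check like this can catch a flat dictionary such as {"country": True} before it is passed to the generator.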
- create_parser(file_name: str = None)#
Main function to create the TOML parser from the intermediate CSV file.
- header() dict[str, Any]#
The ADTL-specific header for the TOML file
- make_single_parser() dict[str, Any]#
Takes the CSV mapping file from create_mapping and writes out a TOML parser.
Generates a TOML parser for use with ADTL using the intermediate CSV file generated by create_mapping. This will generate a TOML file that can be used to parse raw data into the format expected by the schema.
- Returns:
Dictionary containing the TOML parser data, ready to be written out.
- Return type:
dict
- write_toml(data: dict[str, Any], output: str = None)#
Write a dictionary structure to a TOML file, using output as the filename if provided.
- Parameters:
data (dict) – Dictionary containing the TOML parser data
output (str, optional) – Filename to write the TOML data to. Defaults to the name of the parser.
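The methods above can be combined into a two-step workflow: build the parser dictionary, then write it out. The sketch below assumes a single mapping file and uses only the documented constructor arguments and method names; the wrapper write_parser and its generator_cls argument are hypothetical conveniences, not part of ADTL.

```python
from typing import Any


def write_parser(
    mappings: str,
    schema_path: str,
    parser_name: str,
    output: str,
    generator_cls: Any = None,
) -> dict:
    """Build the parser dictionary, write it to `output`, and return it."""
    if generator_cls is None:
        # Imported lazily so the sketch loads even where ADTL is not installed.
        from adtl.autoparser import ParserGenerator as generator_cls

    generator = generator_cls(mappings, schema_path, parser_name)
    data = generator.make_single_parser()  # dictionary of TOML parser data
    generator.write_toml(data, output)  # serialise the dictionary to a file
    return data


# Example call with hypothetical paths (requires ADTL and real input files):
# write_parser("mappings.csv", "schemas", "example_parser", "example_parser.toml")
```

Keeping the dictionary step separate from the write step leaves room to inspect or adjust the parser data before it reaches disk; when that is not needed, create_parser() covers both steps in one call.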