Module reference
- class adtl.Parser(spec: str | Path | dict[str, Any], include_defs: list[str] = [], include_transform: str | None = None, quiet: bool = False, verbose: bool = False, parallel: bool = False)
Main parser class that loads a specification
Typical use of this within Python code:
import adtl

parser = adtl.Parser(specification)
print(parser.tables)  # list of tables created

for row in parser.parse(file).read_table(table):
    print(row)
- check_spec_fields(file) → tuple[set, set]
Compares fields in a data file to a given specification, to check for unmapped (present in data but not in spec) and absent (present in spec but not in data) fields
- Parameters:
file – File to compare
- Returns:
A tuple (missing, absent), where ‘missing’ is the set of unmapped fields (present in the file but missing from the specification), and ‘absent’ is the set of fields present in the specification but not in the file.
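The comparison described above amounts to two set differences. The following is a minimal sketch in plain Python, not adtl's implementation: `compare_fields` is a hypothetical helper, and the sample header and specification field names are invented for illustration.

```python
import csv
import io


def compare_fields(data_fields, spec_fields):
    """Hypothetical helper: return (unmapped, absent) as two sets."""
    data_fields, spec_fields = set(data_fields), set(spec_fields)
    unmapped = data_fields - spec_fields  # in the data, not in the spec
    absent = spec_fields - data_fields    # in the spec, not in the data
    return unmapped, absent


# Invented sample data standing in for a real source file.
sample = io.StringIO("subjid,age,sex\n001,34,F\n")
header = csv.DictReader(sample).fieldnames

unmapped, absent = compare_fields(header, {"subjid", "age", "dob"})
print(unmapped)
print(absent)
```

Here `sex` appears only in the data and `dob` only in the assumed specification, so each lands in a different set.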
- clear()
Clears parser state
- get_spec_fields() → set
Returns all fields mapped in the specification (parser) file.
- Returns:
A set of fields present in the specification
- Return type:
set
- group_rows(table: str, group_field: str, aggregation: str, rows: Iterable[dict[str, Any]])
Applies the ‘groupBy’ rule and any ‘combinedType’ rules to the rows of data grouped by the group_field (e.g. an ID number).
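To illustrate what grouping rows by a field can look like, here is a rough sketch of one plausible aggregation: merging all rows that share an ID and keeping the last non-empty value seen for each field. This is an assumption made for illustration, not a description of adtl's own aggregation rules; `group_last_not_null` is a hypothetical helper and the sample rows are invented.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_last_not_null(
    rows: Iterable[dict[str, Any]], group_field: str
) -> list[dict[str, Any]]:
    """Hypothetical helper: merge rows sharing group_field, keeping the
    last non-empty value seen for every field."""
    grouped: dict[Any, dict[str, Any]] = defaultdict(dict)
    for row in rows:
        merged = grouped[row[group_field]]
        for key, value in row.items():
            # Empty values never overwrite an earlier, non-empty one.
            if value is not None and value != "":
                merged[key] = value
    return list(grouped.values())


# Invented sample rows: two partial records for subject 001.
rows = [
    {"subjid": "001", "age": 34, "sex": None},
    {"subjid": "001", "age": None, "sex": "F"},
    {"subjid": "002", "age": 51, "sex": "M"},
]
result = group_last_not_null(rows, "subjid")
print(result)
```

The two partial records for `001` collapse into a single row, while `002` passes through unchanged.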
- parse(file: str, encoding: str = 'utf-8-sig', skip_validation=False)
Transform file according to specification
- Parameters:
file – Source file to transform
encoding – Source file encoding
skip_validation – Whether to skip validation; defaults to False
- Returns:
Returns an instance of itself, updated with the parsed tables
- Return type:
Parser
- parse_rows(rows: Iterable[dict[str, Any]], file_name: str, row_count: float | None = None, skip_validation=False)
Transform rows from an iterable according to specification
- Parameters:
rows – Iterable of rows, specified as a dictionary of (field name, field value) pairs
skip_validation – Whether to skip validation; defaults to False
- Returns:
Returns an instance of itself, updated with the parsed tables
- Return type:
Parser
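As a sketch of the input shape parse_rows expects, the standard library's `csv.DictReader` already yields one dictionary of (field name, field value) pairs per row. The sample data below is invented, and the final comment only indicates where such rows might be handed to a parser.

```python
import csv
import io

# Invented sample data standing in for a real source file.
sample = io.StringIO("subjid,age\n001,34\n002,51\n")

# Each row is a dict keyed by the header fields; values are read as strings.
rows = list(csv.DictReader(sample))
print(rows[0])

# Rows of this shape could then be passed on, for example as
# parser.parse_rows(rows, file_name), given an existing adtl.Parser instance.
```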
- read_table(table: str) → Iterable[dict[str, Any]]
Returns parsed table
- Parameters:
table – Table to read
- Returns:
Iterable of transformed rows in table
- save(output: str | None = None, format: Literal['csv', 'parquet'] = 'csv')
Saves all tables in the chosen format (CSV by default)
- Parameters:
output – (optional) Filename prefix that is used for all tables
format – Output format, either 'csv' or 'parquet'; defaults to 'csv'
- show_report()
Shows report with validation errors
- validate_spec()
Raises exceptions if specification is invalid
- write_csv(table: str, output: str | None = None) → str | None
Writes a particular table to output as CSV
- Parameters:
table – Table that should be written to CSV
output – (optional) Output file name. If not specified, defaults to parser name + table name with a csv suffix.
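The default naming and the CSV writing step can be sketched with the standard library alone. Both helpers below are hypothetical and are not adtl's own code; in particular, the hyphen joining the parser name and table name is a guess, since the text above only says the two are combined with a csv suffix.

```python
import csv
import tempfile
from pathlib import Path


def default_output_name(parser_name, table, suffix="csv"):
    """Hypothetical: combine parser and table name (separator assumed)."""
    return f"{parser_name}-{table}.{suffix}"


def write_rows_csv(rows, path):
    """Hypothetical: write an iterable of dict rows to path as CSV."""
    rows = list(rows)
    fieldnames = list(rows[0]) if rows else []
    with Path(path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return Path(path)


# Invented rows, written into a temporary directory.
out_dir = Path(tempfile.mkdtemp())
written = write_rows_csv(
    [{"subjid": "001", "age": 34}, {"subjid": "002", "age": 51}],
    out_dir / default_output_name("demo", "subject"),
)
print(written.name)
print(written.read_text(encoding="utf-8"))
```

The header is taken from the first row's keys, so this sketch assumes every row carries the same fields.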
- write_parquet(table: str, output: str | None = None) → str | None
Writes a particular table to output as parquet
- Parameters:
table – Table that should be written to parquet
output – (optional) Output file name. If not specified, defaults to parser name + table name with a parquet suffix.