Module reference

class adtl.Parser(spec: str | Path | dict[str, Any], include_defs: list[str] = [], include_transform: str | None = None, quiet: bool = False, verbose: bool = False, parallel: bool = False)

Main parser class, which loads a specification.

Typical use within Python code:

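import adtl

# Placeholder file names: substitute your own specification and data file
parser = adtl.Parser("specification.toml")
print(parser.tables)  # tables defined by the specification

parser.parse("data.csv")  # returns the parser, updated with the parsed tables
for table in parser.tables:
    for row in parser.read_table(table):
        print(row)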

check_spec_fields(file) → tuple[set, set]

Compares the fields in a data file with those in the specification, to check for unmapped fields (present in the data but not in the specification) and absent fields (present in the specification but not in the data).

Parameters: file – File to compare
Returns: A tuple (missing, absent), where ‘missing’ is the set of unmapped fields (present in the file but missing from the specification), and ‘absent’ is the set of fields present in the specification but not in the file.
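In essence, the comparison is two set differences. The following is a minimal sketch under stated assumptions, not adtl's implementation: compare_fields is a hypothetical helper that treats the file as a CSV whose header row lists the field names, and takes the specification's source fields as an already-collected set.

```python
import csv
import io
from typing import TextIO


def compare_fields(file: TextIO, spec_fields: set[str]) -> tuple[set[str], set[str]]:
    """Hypothetical helper: compare a CSV header with the specification fields.

    Returns (missing, absent): 'missing' holds fields in the file but not in
    the specification; 'absent' holds fields in the specification but not in
    the file.
    """
    reader = csv.DictReader(file)
    # fieldnames is read from the header row on first access
    file_fields = set(reader.fieldnames or [])
    missing = file_fields - spec_fields
    absent = spec_fields - file_fields
    return missing, absent


data = io.StringIO("subjid,age,sex,notes\n1,34,F,\n")
missing, absent = compare_fields(data, {"subjid", "age", "sex", "country"})
print(sorted(missing))  # ['notes']
print(sorted(absent))   # ['country']
```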

clear()

Clears the parser state.

get_spec_fields() → set

Returns all fields mapped in the specification (parser) file.

Returns: A set of fields present in the specification
Return type: set
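One plausible way to collect the mapped fields is to walk the specification recursively and gather every value stored under a field key. This is a sketch, not adtl's code: collect_fields is hypothetical, and the nested specification layout below (including the combinedType and fields entries) is an assumption made for illustration.

```python
from typing import Any


def collect_fields(spec: Any) -> set[str]:
    """Hypothetical helper: gather every string stored under a "field" key."""
    fields: set[str] = set()
    if isinstance(spec, dict):
        for key, value in spec.items():
            if key == "field" and isinstance(value, str):
                fields.add(value)
            else:
                # Descend into nested rules (dicts and lists of rules)
                fields |= collect_fields(value)
    elif isinstance(spec, list):
        for item in spec:
            fields |= collect_fields(item)
    return fields


# Assumed, simplified specification layout for illustration only
spec = {
    "visit": {
        "subject_id": {"field": "subjid"},
        "age": {
            "combinedType": "firstNonNull",
            "fields": [{"field": "age_years"}, {"field": "age_est"}],
        },
    }
}
print(sorted(collect_fields(spec)))  # ['age_est', 'age_years', 'subjid']
```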

group_rows(table: str, group_field: str, aggregation: str, rows: Iterable[dict[str, Any]])

Applies the ‘groupBy’ rule and any ‘combinedType’ rules to rows of data grouped by group_field (e.g. an ID number).
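To illustrate the grouping step, here is a sketch that merges all rows sharing the same group_field value. It assumes an aggregation strategy in which the last non-null value seen for each field wins; group_last_not_null is a hypothetical helper, and combinedType rules are not modelled.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_last_not_null(
    rows: Iterable[dict[str, Any]], group_field: str
) -> list[dict[str, Any]]:
    """Hypothetical sketch: merge rows sharing group_field, last non-null value wins."""
    groups: dict[Any, dict[str, Any]] = defaultdict(dict)
    for row in rows:
        merged = groups[row[group_field]]
        for key, value in row.items():
            if value is not None:
                # A later non-null value overwrites an earlier one
                merged[key] = value
            else:
                # None never overwrites an existing value
                merged.setdefault(key, None)
    # Groups come out in the order their IDs were first seen
    return list(groups.values())


rows = [
    {"subjid": 1, "age": 34, "outcome": None},
    {"subjid": 2, "age": 51, "outcome": "recovered"},
    {"subjid": 1, "age": None, "outcome": "discharged"},
]
for row in group_last_not_null(rows, "subjid"):
    print(row)
# {'subjid': 1, 'age': 34, 'outcome': 'discharged'}
# {'subjid': 2, 'age': 51, 'outcome': 'recovered'}
```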

parse(file: str, encoding: str = 'utf-8-sig', skip_validation=False)

Transforms a file according to the specification.

Parameters:

file – Source file to transform
encoding – Source file encoding
skip_validation – Whether to skip validation, default off

Returns:

Returns an instance of itself, updated with the parsed tables

Return type:

adtl.Parser
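The sketch below shows one plausible shape for parse(): open the file with the requested encoding, read it as CSV, and hand the rows to parse_rows(), which returns the parser so that calls can be chained. MiniParser is a hypothetical stand-in, and reading the source as CSV is an assumption. It also shows why the default encoding is utf-8-sig: that codec drops a leading byte order mark (BOM), which would otherwise end up glued to the first column name.

```python
import csv
import os
import tempfile
from typing import Any, Iterable


class MiniParser:
    """Hypothetical stand-in showing the read-then-delegate shape of parse()."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []

    def parse(self, file: str, encoding: str = "utf-8-sig") -> "MiniParser":
        # utf-8-sig strips a leading BOM, which spreadsheet exports often add
        with open(file, encoding=encoding, newline="") as fp:
            return self.parse_rows(csv.DictReader(fp), file)

    def parse_rows(self, rows: Iterable[dict[str, Any]], file_name: str) -> "MiniParser":
        # file_name is accepted but unused in this sketch
        self.rows.extend(rows)
        return self  # returning self allows chaining


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    # Writing with utf-8-sig puts a BOM at the start of the file
    with open(path, "w", encoding="utf-8-sig", newline="") as fp:
        fp.write("subjid,age\n1,34\n")
    parser = MiniParser().parse(path)

print(parser.rows)  # [{'subjid': '1', 'age': '34'}]
```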

parse_rows(rows: Iterable[dict[str, Any]], file_name: str, row_count: float | None = None, skip_validation=False)

Transforms rows from an iterable according to the specification.

Parameters:

rows – Iterable of rows, each a dictionary mapping field names to field values
skip_validation – Whether to skip validation, default off

Returns:

Returns an instance of itself, updated with the parsed tables

Return type:

adtl.Parser
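Because parse_rows() accepts any iterable of dictionaries, rows can come from a generator as well as from a file reader. The following sketch shows a row-by-row transform of that kind; transform_rows is hypothetical, and the mapping format, where each target field names its source column under a field key, is a simplified assumption rather than the full specification language.

```python
from typing import Any, Iterable, Iterator


def transform_rows(
    rows: Iterable[dict[str, Any]], mapping: dict[str, dict[str, str]]
) -> Iterator[dict[str, Any]]:
    """Hypothetical sketch: build each output row from source fields, lazily."""
    for row in rows:
        # Each target field copies the source column named under "field"
        yield {target: row.get(rule["field"]) for target, rule in mapping.items()}


def source_rows() -> Iterator[dict[str, Any]]:
    # A generator works as input, so rows can be streamed rather than loaded
    yield {"subjid": "1", "age_years": "34"}
    yield {"subjid": "2", "age_years": "51"}


mapping = {"subject_id": {"field": "subjid"}, "age": {"field": "age_years"}}
print(list(transform_rows(source_rows(), mapping)))
# [{'subject_id': '1', 'age': '34'}, {'subject_id': '2', 'age': '51'}]
```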

read_table(table: str) → Iterable[dict[str, Any]]

Returns a parsed table.

Parameters: table – Table to read
Returns: Iterable of transformed rows in the table

save(output: str | None = None, format: Literal['csv', 'parquet'] = 'csv')

Saves all tables, as CSV or Parquet files depending on format.

Parameters:

output – (optional) Filename prefix that is used for all tables
format – Output format, either 'csv' (default) or 'parquet'

show_report()

Shows a report of validation errors.

validate_spec()

Raises an exception if the specification is invalid.
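As an illustration of the kind of checks involved, here is a hand-written sketch that raises on the first problem it finds. The header layout (an adtl section with name and tables), the set of table kinds, SpecError and check_spec are all assumptions for this example; the real validation may check more, or check it differently.

```python
from typing import Any

# Assumed set of table kinds, for illustration only
KNOWN_KINDS = {"oneToOne", "oneToMany", "groupBy"}


class SpecError(ValueError):
    """Hypothetical exception type for an invalid specification."""


def check_spec(spec: dict[str, Any]) -> None:
    """Hypothetical subset of checks; raises SpecError on the first problem."""
    header = spec.get("adtl")
    if not isinstance(header, dict):
        raise SpecError("missing adtl header section")
    for key in ("name", "tables"):
        if key not in header:
            raise SpecError(f"adtl header is missing required key {key!r}")
    for table, options in header["tables"].items():
        kind = options.get("kind")
        if kind not in KNOWN_KINDS:
            raise SpecError(f"table {table!r} has unknown kind {kind!r}")
        if kind == "groupBy" and "groupBy" not in options:
            raise SpecError(f"groupBy table {table!r} needs a 'groupBy' field")
        if table not in spec:
            raise SpecError(f"table {table!r} is declared but has no mappings")


valid = {
    "adtl": {"name": "example", "tables": {"subject": {"kind": "groupBy", "groupBy": "subjid"}}},
    "subject": {"subject_id": {"field": "subjid"}},
}
check_spec(valid)  # passes silently

try:
    check_spec({"adtl": {"name": "example", "tables": {"visit": {"kind": "manyToMany"}}}})
except SpecError as err:
    print(err)  # table 'visit' has unknown kind 'manyToMany'
```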

write_csv(table: str, output: str | None = None) → str | None

Writes a particular table to output as CSV.

Parameters:

table – Table that should be written to CSV
output – (optional) Output file name. If not specified, defaults to parser name + table name with a csv suffix.
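Writing a table of dictionaries as CSV needs a header that covers every key, since transformed rows need not all share the same fields. A minimal sketch with the standard csv module follows; write_table_csv is hypothetical and writes to any text file object rather than deriving a default file name.

```python
import csv
import io
from typing import Any, Iterable, TextIO


def write_table_csv(rows: Iterable[dict[str, Any]], fp: TextIO) -> None:
    """Hypothetical sketch: write dict rows as CSV with the union of keys as header."""
    materialized = list(rows)
    # Rows may not share the same keys; keep first-seen order for the header
    fieldnames = list(dict.fromkeys(key for row in materialized for key in row))
    writer = csv.DictWriter(fp, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(materialized)  # missing keys are written as empty cells


buf = io.StringIO()
write_table_csv([{"subjid": 1, "age": 34}, {"subjid": 2, "outcome": "recovered"}], buf)
print(buf.getvalue())
```

The csv module terminates rows with \r\n by default, so the buffer holds "subjid,age,outcome", "1,34," and "2,,recovered" on separate lines.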

write_parquet(table: str, output: str | None = None) → str | None

Writes a particular table to output as Parquet.

Parameters:

table – Table that should be written to Parquet
output – (optional) Output file name. If not specified, defaults to parser name + table name with a parquet suffix.
