Module reference

class adtl.Parser(spec: str | Path | dict[str, Any], include_defs: list[str] = [], include_transform: str | None = None, quiet: bool = False, verbose: bool = False, parallel: bool = False)

Main parser class that loads a specification

Typical use of this within Python code:

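import adtl

# specification: path to the specification file, or an already-loaded dict
parser = adtl.Parser(specification)
print(parser.tables)  # list of tables created

# file: source data file to transform; table: name of the table to read
for row in parser.parse(file).read_table(table):
    print(row)

check_spec_fields(file) → tuple[set, set]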

Compares fields in a data file to a given specification, to check for unmapped (present in data but not in spec) and absent (present in spec but not in data) fields

Parameters:

file – File to compare

Returns:

A tuple (missing, absent), where ‘missing’ is the set of fields present in file but missing from the specification, and ‘absent’ is the set of fields present in the specification but not in file.
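The two sets amount to a pair of set differences. The following is a minimal sketch in plain Python, not adtl's implementation; the helper `compare_fields` and the sample field names are hypothetical and exist only for illustration:

```python
def compare_fields(data_fields, spec_fields):
    """Return (missing, absent) for two collections of field names."""
    data_fields, spec_fields = set(data_fields), set(spec_fields)
    missing = data_fields - spec_fields  # in the data file, not mapped by the spec
    absent = spec_fields - data_fields   # mapped by the spec, not in the data file
    return missing, absent


# Made-up field names for illustration
missing, absent = compare_fields(
    data_fields=["subjid", "age", "sex", "notes"],
    spec_fields=["subjid", "age", "sex", "admission_date"],
)
print(missing)  # fields the specification does not map
print(absent)   # fields the data file does not contain
```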

clear()

Clears parser state

get_spec_fields() → set

Returns all fields mapped in the specification (parser) file.

Returns:

A set of fields present in the specification

Return type:

set

group_rows(table: str, group_field: str, aggregation: str, rows: Iterable[dict[str, Any]])

Applies the ‘groupBy’ rule and any ‘combinedType’ rules to the rows of data grouped by the group_field (e.g. an ID number).
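As a rough illustration of what grouping rows by a field can look like, here is a sketch in plain Python. It is not adtl's implementation: the helper `group_last_not_null` and its merge rule (keep the last non-null value seen for each field) are assumptions made for this example, and ‘combinedType’ handling is left out entirely.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_last_not_null(
    group_field: str, rows: Iterable[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Merge rows sharing a group_field value, keeping the last non-null value per field."""
    merged: dict[Any, dict[str, Any]] = defaultdict(dict)
    for row in rows:
        target = merged[row[group_field]]
        for key, value in row.items():
            if value is not None:  # null values never overwrite earlier ones
                target[key] = value
    return list(merged.values())


# Made-up rows: subject 1 appears twice with complementary fields
rows = [
    {"subject_id": 1, "age": 30, "sex": None},
    {"subject_id": 2, "age": None, "sex": "female"},
    {"subject_id": 1, "age": None, "sex": "male"},
]
grouped = group_last_not_null("subject_id", rows)
print(grouped)
```

Fields that are null in every row of a group are simply absent from the merged row in this sketch.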

parse(file: str, encoding: str = 'utf-8-sig', skip_validation=False)

Transform file according to specification

Parameters:
  • file – Source file to transform

  • encoding – Source file encoding

  • skip_validation – Whether to skip validation, default off

Returns:

The parser instance itself, updated with the parsed tables

Return type:

adtl.Parser
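The default encoding, 'utf-8-sig', is Python's UTF-8 codec variant that drops a leading byte order mark (BOM) if one is present. A small standard-library illustration, independent of adtl and using made-up sample data:

```python
import codecs

# Made-up CSV content, prefixed with a UTF-8 byte order mark
raw = codecs.BOM_UTF8 + "subjid,age\n1,30\n".encode("utf-8")

with_bom = raw.decode("utf-8")         # BOM survives as U+FEFF in the first field name
without_bom = raw.decode("utf-8-sig")  # BOM is stripped during decoding

print(repr(with_bom.splitlines()[0]))
print(repr(without_bom.splitlines()[0]))
```

With plain 'utf-8', the stray U+FEFF would become part of the first column name, so a field lookup by name would fail for that column.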

parse_rows(rows: Iterable[dict[str, Any]], file_name: str, row_count: float | None = None, skip_validation=False)

Transform rows from an iterable according to specification

Parameters:
  • rows – Iterable of rows, specified as a dictionary of (field name, field value) pairs

  • skip_validation – Whether to skip validation, default off

Returns:

The parser instance itself, updated with the parsed tables

Return type:

adtl.Parser
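One way to build a suitable iterable of rows is csv.DictReader from the standard library, which yields one dictionary per row keyed by the header line. The sample data below is made up, and the commented-out call assumes a hypothetical `parser` object and file name:

```python
import csv
import io

# In-memory stand-in for a source file
source = io.StringIO("subjid,age\n1,30\n2,41\n")
rows = list(csv.DictReader(source))  # one dict per data row; values are strings
print(rows[0])

# Hypothetical usage, assuming `parser` is an adtl.Parser instance:
# parser.parse_rows(rows, "example.csv")
```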

read_table(table: str) → Iterable[dict[str, Any]]

Returns parsed table

Parameters:

table – Table to read

Returns:

Iterable of transformed rows in table

save(output: str | None = None, format: Literal['csv', 'parquet'] = 'csv')

Saves all tables in the given format, either 'csv' (the default) or 'parquet'

Parameters:

output – (optional) Filename prefix that is used for all tables

show_report()

Shows report with validation errors

validate_spec()

Raises exceptions if specification is invalid

write_csv(table: str, output: str | None = None) → str | None

Writes a particular table to output as CSV

Parameters:
  • table – Table that should be written to CSV

  • output – (optional) Output file name. If not specified, defaults to parser name + table name with a csv suffix.
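For orientation, serialising an iterable of row dictionaries as CSV can be sketched with the standard library. This is not adtl's implementation: `write_rows_csv` and the sample rows are hypothetical, and the sketch writes to an in-memory buffer rather than to a named file.

```python
import csv
import io
from typing import Any, Iterable


def write_rows_csv(rows: Iterable[dict[str, Any]], fieldnames: list[str]) -> str:
    """Serialise row dictionaries as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up rows for illustration
csv_text = write_rows_csv(
    [{"subject_id": 1, "age": 30}, {"subject_id": 2, "age": 41}],
    fieldnames=["subject_id", "age"],
)
print(csv_text)
```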

write_parquet(table: str, output: str | None = None) → str | None

Writes a particular table to output as parquet

Parameters:
  • table – Table that should be written to parquet

  • output – (optional) Output file name. If not specified, defaults to parser name + table name with a parquet suffix.