adtl – another data transformation language

Python 3.9+

adtl is a data transformation language (DTL) used in some Global.health applications, notably the ISARIC clinical data pipeline (globaldothealth/isaric) and the InsightBoard project dashboard (globaldothealth/InsightBoard).

adtl is currently a prototype and is subject to major revisions.

Motivation

Most existing data transformation languages are XML dialects, though recent variations use other file formats. In addition, many DTLs use a custom domain-specific language. The primary aim of adtl is to provide an easy-to-use Python library for basic data transformations, which are specified in either a JSON or a TOML file. It is not meant to be comprehensive: adtl can be used as one step within a larger data processing pipeline.
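To make the idea of a file-specified transformation concrete, here is a minimal sketch in plain Python. It is not adtl's implementation and does not follow adtl's specification schema: the spec layout (a `fields` object whose entries name a source `field` and an optional `values` map) and the `apply_spec` function are hypothetical, chosen only to show how column renaming and value mapping can be described as data rather than code.

```python
import json

# Hypothetical spec layout for illustration only; this is NOT adtl's schema.
SPEC_JSON = """
{
  "fields": {
    "subject_id": {"field": "subjid"},
    "sex": {"field": "sex", "values": {"1": "male", "2": "female"}},
    "age_years": {"field": "age"}
  }
}
"""


def apply_spec(spec: dict, row: dict) -> dict:
    """Map one source row to a target row using a (hypothetical) field spec."""
    out = {}
    for target, rule in spec["fields"].items():
        # Missing source columns yield None.
        value = row.get(rule["field"])
        # If a value map is given, translate coded values; unknown codes become None.
        if "values" in rule and value is not None:
            value = rule["values"].get(value)
        out[target] = value
    return out


spec = json.loads(SPEC_JSON)
source_rows = [
    {"subjid": "P001", "sex": "1", "age": "34"},
    {"subjid": "P002", "sex": "2", "age": "61"},
]
transformed = [apply_spec(spec, r) for r in source_rows]
print(transformed)
```

Keeping the mapping in a data file means it can be reviewed and versioned separately from the code that applies it.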

AutoParser

AutoParser provides a semi-automated way to write the transformation files required by adtl, using LLMs for field and value mapping. This reduces the need for users to write JSON/TOML specification files by hand from scratch.
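As a rough illustration of the final step of that workflow, the sketch below turns a field mapping and a value mapping into a specification file. In AutoParser the mappings would be suggested by an LLM; here they are hard-coded stand-ins. The `mappings_to_spec` helper and the output layout are hypothetical and are neither AutoParser's API nor adtl's schema.

```python
import json


def mappings_to_spec(field_map: dict, value_maps: dict) -> dict:
    """Build a (hypothetical) spec dict from target->source field and value mappings."""
    fields = {}
    for target, source in field_map.items():
        if source is None:
            # No source column was suggested; skip it so a user can fill it in manually.
            continue
        rule = {"field": source}
        if target in value_maps:
            rule["values"] = value_maps[target]
        fields[target] = rule
    return {"fields": fields}


# Stand-ins for mappings an LLM might suggest (target field -> source column).
field_map = {"subject_id": "subjid", "sex": "sex", "country": None}
value_maps = {"sex": {"1": "male", "2": "female"}}

spec = mappings_to_spec(field_map, value_maps)
print(json.dumps(spec, indent=2))
```

Because the output is an ordinary file, a user would still review and correct it before use; the automated step only drafts it.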
