Digital biomarkers are quantitative, objective measures of physiological and behavioral data. They're collected and measured using digital devices that better represent free-living activity, in contrast to a highly structured in-clinic setting. This approach generates large amounts of data that require processing.
Digital biomarker data is typically stored in different formats, and various sources require different (and often multiple) processing steps. In 2020, the Digital Sciences and Translational Imaging group at Pfizer developed a specialized pipeline using AWS services for analyzing incoming sensor data from wrist devices for sleep and scratch activity.
Over time, the original pipeline needed updating to compute digital biomarkers at scale while maintaining reproducibility and data provenance. The key consideration in redesigning the pipeline is flexibility, so that the platform can:
- Handle various incoming data sources and different algorithms
- Handle distinct sets of algorithm parameters (illustrated with a hypothetical example after this list)
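For example, the distinct parameter sets might be captured as plain configuration objects that travel with each processing job. This is a minimal sketch; the keys and values below are hypothetical and only illustrate the idea, not the actual algorithm configuration.

```python
# Hypothetical parameter sets for two wrist-device algorithms.
# Keys and values are illustrative assumptions only.
sleep_params = {
    "algorithm": "sleep_detection",
    "epoch_length_s": 30,       # window size used to score each epoch
    "activity_threshold": 0.1,  # activity below this is treated as rest
}

scratch_params = {
    "algorithm": "scratch_detection",
    "sampling_rate_hz": 50,
    "min_event_duration_s": 2,  # ignore movements shorter than this
}
```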
These goals were accomplished with a framework for handling file analysis that uses a two-part approach:
- A Python package using a custom common architecture (sketched after this list)
- An AWS-based pipeline to handle processing mapping and compute distribution
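To make the "common architecture" idea concrete, here is a minimal sketch of what such a shared Python interface could look like. The class and method names (`SensorAlgorithm`, `AlgorithmResult`, `load`, `compute`, `run`) are assumptions for illustration, not the actual package API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional

import pandas as pd


@dataclass
class AlgorithmResult:
    """Container for computed digital biomarkers plus provenance metadata."""
    biomarkers: pd.DataFrame
    metadata: dict = field(default_factory=dict)


class SensorAlgorithm(ABC):
    """Hypothetical common interface that every algorithm implements.

    New data sources or algorithms plug in by subclassing, so the
    AWS-based pipeline can invoke any of them the same way.
    """

    def __init__(self, params: Optional[dict] = None):
        # Distinct parameter sets are passed in rather than hard-coded,
        # which keeps runs reproducible and traceable.
        self.params: dict[str, Any] = params or {}

    @abstractmethod
    def load(self, input_path: Path) -> pd.DataFrame:
        """Read raw sensor data from one of the supported file formats."""

    @abstractmethod
    def compute(self, data: pd.DataFrame) -> AlgorithmResult:
        """Compute digital biomarkers from the loaded data."""

    def run(self, input_path: Path) -> AlgorithmResult:
        """Single entry point the pipeline calls for every input file."""
        data = self.load(input_path)
        result = self.compute(data)
        # Record provenance: which file and which parameters produced the output.
        result.metadata.update({"source_file": str(input_path), "params": self.params})
        return result
```

With an interface like this, the AWS pipeline only needs to know which subclass and which parameter set to apply to each incoming file, which is what allows it to map heterogeneous inputs onto distributed compute.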
This blog post introduces this custom Python package data processing pipeline using AWS services. The AWS architecture maintains data provenance while enabling fast, efficient, and scalable data processing. These large data sets would otherwise…