prenatalppkt
A Python library for transforming raw prenatal sonography data into standardized GA4GH Phenopackets (v2) with clinically validated fetal growth references from NICHD and INTERGROWTH-21st.
Table of Contents
- Overview
- Motivation
- Architecture
- Data Flow
- Module Breakdown
- Configuration Guide
- Inputs and Outputs
- Installation
- Usage Examples
- Testing
- Future Roadmap
- Contributing
- License
- Acknowledgments
- Citation
- Support
- Visuals
Overview
prenatalppkt bridges the gap between clinical prenatal ultrasound measurements and machine-readable, ontology-aware phenotype representations. The library:
- Standardizes biometric data from multiple ultrasound reporting systems (Observer JSON, ViewPoint Excel)
- Evaluates measurements against authoritative growth references (NICHD and INTERGROWTH-21st)
- Maps percentile classifications to Human Phenotype Ontology (HPO) terms
- Generates GA4GH Phenopackets with complete provenance and metadata
This enables federated genomic repositories to integrate prenatal phenotype data with whole exome/genome sequencing (WES/WGS) results in a consistent, computationally tractable format.
Motivation
Clinical Context
Prenatal ultrasound biometry provides critical developmental markers for fetal health assessment. Key measurements include:
- Head Circumference (HC): Marker for brain development
- Biparietal Diameter (BPD): Skull width measurement
- Abdominal Circumference (AC): Indicates fetal nutrition status
- Femur Length (FL): Long bone growth indicator
- Occipito-Frontal Diameter (OFD): Alternative skull measurement
- Estimated Fetal Weight (EFW): Overall growth assessment
Technical Challenges
- Data heterogeneity: Ultrasound systems export data in proprietary formats (ViewPoint Excel, Observer JSON)
- Reference ambiguity: Multiple growth standards exist (NICHD, INTERGROWTH-21st) with different population bases
- Ontology mapping: Converting numeric percentiles to standardized phenotype terms requires domain expertise
- Genomic integration: Linking prenatal observations to genetic data demands structured, machine-readable formats
- Maintainability: Hard-coded mapping logic becomes brittle as clinical guidelines evolve
Solution
prenatalppkt provides a unified, configuration-driven pipeline from raw measurements to Phenopackets, enabling:
- Reproducible phenotype analysis across institutions
- Integration with genomic variant interpretation workflows
- Federated data sharing with privacy-preserving pseudonymization
- Flexible ontology mapping through declarative YAML configuration
- Longitudinal tracking of fetal development
Architecture
The system implements a data-driven, configuration-based architecture with clean separation between measurement evaluation, ontology mapping, and export logic.
Core Design Principles
- Configuration over Code: HPO term mappings are defined declaratively in YAML, not hard-coded in Python classes
- Dependency Injection: Measurement evaluators receive configuration at instantiation, enabling flexible testing and deployment
- Single Responsibility: Each component has one well-defined purpose
- Open/Closed Principle: New measurements and mapping rules are added via configuration files, not code changes
System Layers
flowchart LR
subgraph L1["Layer 1: Configuration (YAML)"]
YAML["data/mappings/biometry_hpo_mappings.yaml
• Percentile ranges (min/max)
• HPO term IDs and labels
• Normal/abnormal flags"]
end
subgraph L2["Layer 2: Data Models"]
PR["PercentileRange
• min_percentile
• max_percentile
• contains(percentile)"]
TB["TermBin
• range: PercentileRange
• hpo_id, hpo_label
• normal: bool
• category (auto-detected)"]
TO["TermObservation
• hpo_id, hpo_label
• observed: bool
• gestational_age
• percentile"]
end
subgraph L3["Layer 3: Loading & Validation"]
Loader["BiometryMappingLoader
• load(yaml_path)
• Parses YAML → TermBin objects
• Validates ranges
• Sorts by min_percentile"]
end
subgraph L4["Layer 4: Business Logic"]
SM["SonographicMeasurement
• measurement_type: str
• term_bins: List[TermBin]
• from_percentile() → TermObservation"]
Factory["MeasurementEvaluation
• Factory pattern
• Loads all mappings once
• get_measurement_mapper()"]
end
subgraph L5["Layer 5: Reference Data"]
Ref["FetalGrowthPercentiles
• NICHD / INTERGROWTH-21st tables
• Percentile calculation
• Z-score calculation"]
end
subgraph L6["Layer 6: Export"]
Export["PhenotypicExporter
• Phenopacket v2 assembly
• QC validation
• JSON serialization"]
end
YAML --> Loader
Loader --> PR
Loader --> TB
Loader --> Factory
Factory --> SM
SM --> TO
Ref --> SM
TO --> Export
classDef config fill:#fff4e6,stroke:#333,stroke-width:2px
classDef model fill:#e3f2fd,stroke:#333,stroke-width:2px
classDef logic fill:#f3e5f5,stroke:#333,stroke-width:2px
classDef export fill:#e8f5e9,stroke:#333,stroke-width:2px
class YAML config
class PR,TB,TO model
class Loader,SM,Factory logic
class Ref logic
class Export export
Key Architectural Choice: Configuration-Driven Mapping
OLD APPROACH (Hard-coded):
| # Each measurement had its own class with hard-coded logic
class HeadCircumferenceMeasurement(SonographicMeasurement):
def get_bin_to_term_mapping(self):
return {
"below_3p": MinimalTerm("HP:0000252", "Microcephaly"),
"between_3p_5p": MinimalTerm("HP:0040195", "Decreased HC"),
# ... 6 more hard-coded bins
}
|
NEW APPROACH (Data-driven):
| # data/mappings/biometry_hpo_mappings.yaml
head_circumference:
- min: 0
max: 3
id: "HP:0000252"
label: "Microcephaly"
normal: false
- min: 3
max: 5
id: "HP:0040195"
label: "Decreased head circumference"
normal: false
# ... all 8 ranges covering 0-100 percentile
|
| # Python code loads configuration, no hard-coding needed
factory = MeasurementEvaluation() # Loads YAML once
mapper = factory.get_measurement_mapper("head_circumference")
observation = mapper.from_percentile(2.1, gestational_age)
# Returns: TermObservation(hpo_id="HP:0000252", hpo_label="Microcephaly", ...)
|
Parsing Observer (JSON)
The prenatalppkt package parses and transforms ultrasound ("Observer") JSON data into structured Python data transfer objects (DTOs). Each element of the top-level fetuses array represents one fetus, with standardized subkeys:
| JSON subkey | Parser | DTO | Purpose |
|--------------|--------|------|----------|
| fetus | FetusFetusParser | FetusCoreData | Core fetal metadata (GA, sex, presentation) |
| anatomy_text | FetusAnatomyTextParser | hpo_term_list (List[SimpleTerm]) | Qualitative HPO terms from anatomy report |
| measurements | FetusMeasurementsParser | MeasurementsData (Measurement list) | Quantitative biometric data |
| ratios | FetusRatiosParser | FetusRatiosData | Computed biometric ratios (e.g., HC/AC) |
| efws | FetusEfwParser | FetusEfwData (EfwEntry list) | Estimated fetal weights |
A central FetusParser coordinates all sub-parsers and assembles their results into a unified FetusData object. This modular architecture ensures each JSON subkey is isolated, testable, and easily extendable for future Observer fields (e.g., placenta, bpp, etc.).
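The dispatch pattern described above can be sketched as follows. This is an illustrative simplification, not the real API: the class and field names (`FetusData.core`, `FetusData.measurements`) stand in for the actual DTOs, and each real sub-parser does far more than copy a dictionary.

```python
# Illustrative sketch of the sub-parser dispatch pattern described above.
# Names are simplified stand-ins for the real parsers and DTOs.
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FetusData:
    """Aggregated result for one fetus (simplified)."""
    core: Dict[str, Any] = field(default_factory=dict)
    measurements: List[Dict[str, Any]] = field(default_factory=list)


class FetusParser:
    """Coordinates sub-parsers; each handles exactly one JSON subkey."""

    def parse(self, fetus_json: Dict[str, Any]) -> FetusData:
        result = FetusData()
        # Each subkey is isolated: a missing section simply leaves
        # the corresponding DTO field at its default value.
        if "fetus" in fetus_json:
            result.core = dict(fetus_json["fetus"])
        if "measurements" in fetus_json:
            result.measurements = list(fetus_json["measurements"])
        return result


parsed = FetusParser().parse({
    "fetus": {"ga_weeks": 20, "sex": "unknown"},
    "measurements": [{"name": "hc", "value_mm": 175.3}],
})
```

Because each subkey is handled independently, adding support for a future Observer field (e.g. placenta) means adding one sub-parser and one DTO field, without touching the others.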
Package structure
graph TD
A[ExamDataParser] --> B[FetusParser]
B --> C1[FetusFetusParser]
B --> C2[FetusAnatomyTextParser]
B --> C3[FetusMeasurementsParser]
B --> C4[FetusRatiosParser]
B --> C5[FetusEfwParser]
C1 --> D1[FetusCoreData]
C2 --> D2[List of SimpleTerm]
C3 --> D3[MeasurementsData / Measurement]
C4 --> D4[FetusRatiosData / Ratio]
C5 --> D5[FetusEfwData / EfwEntry]
Each fetus_*_parser.py is responsible for interpreting a single JSON section and producing its corresponding DTO in prenatalppkt/dto/fetuses/. The FetusData class then aggregates all of them into a cohesive representation for one fetus.
Fetus Parsing Flow
graph TD
JSON[Observer JSON fetuses] --> FP[FetusParser]
FP --> |fetus| Core[FetusFetusParser -> FetusCoreData]
FP --> |anatomy_text| Anat[FetusAnatomyTextParser -> List of SimpleTerms]
FP --> |measurements| Meas[FetusMeasurementsParser -> MeasurementsData]
FP --> |ratios| Rat[FetusRatiosParser -> FetusRatiosData]
FP --> |efws| Efw[FetusEfwParser -> FetusEfwData]
Parsing ViewPoint (VPL)
System Class Diagram
classDiagram
%% Configuration Layer
class BiometryMappingsYAML {
<<Configuration>>
head_circumference[]
biparietal_diameter[]
femur_length[]
abdominal_circumference[]
occipitofrontal_diameter[]
}
%% Data Models
class PercentileRange {
+min_percentile: float
+max_percentile: float
+contains(percentile: float) bool
}
class TermBin {
+range: PercentileRange
+hpo_id: str
+hpo_label: str
+normal: bool
+fits(percentile: float) bool
+category: str
}
class TermObservation {
+hpo_id: str
+hpo_label: str
+category: str
+observed: bool
+gestational_age: GestationalAge
+percentile: float
+to_phenotypic_feature() dict
}
class GestationalAge {
+weeks: int
+days: int
+from_weeks(float) GestationalAge
+to_iso() dict
}
%% Loading Layer
class BiometryMappingLoader {
<<Service>>
+load(path: Path) Dict[str, List[TermBin]]
}
%% Business Logic Layer
class SonographicMeasurement {
+measurement_type: str
+term_bins: List[TermBin]
+from_percentile(percentile, ga) TermObservation
+name() str
}
class MeasurementEvaluation {
<<Factory>>
-_mappings: Dict[str, List[TermBin]]
+__init__(mappings_path?)
+get_measurement_mapper(type: str) SonographicMeasurement
}
%% Reference Data Layer
class FetalGrowthPercentiles {
+source: str
+tables: Dict[str, DataFrame]
+calculate_percentile(measurement, ga, value) float
+get_z_score(measurement, ga, value) float
+lookup_percentile(measurement, ga, value) float
}
%% Export Layer
class PhenotypicExporter {
+term_observations: List[TermObservation]
+build_phenopacket() dict
+to_json() str
+validate() QCReport
}
class QCValidator {
+validate_schema(json) List[Error]
+validate_ontology_terms() List[Error]
+check_completeness() List[Warning]
}
%% Relationships
BiometryMappingsYAML ..> BiometryMappingLoader : reads
BiometryMappingLoader --> PercentileRange : creates
BiometryMappingLoader --> TermBin : creates
TermBin *-- PercentileRange : contains
BiometryMappingLoader --> MeasurementEvaluation : provides mappings
MeasurementEvaluation --> SonographicMeasurement : creates
SonographicMeasurement *-- TermBin : configured with
SonographicMeasurement --> TermObservation : produces
TermObservation *-- GestationalAge : includes
FetalGrowthPercentiles ..> SonographicMeasurement : provides percentiles
PhenotypicExporter *-- TermObservation : collects
PhenotypicExporter --> QCValidator : uses
Data Flow
End-to-End Processing Pipeline
The new architecture streamlines the flow from raw measurement to Phenopacket:
sequenceDiagram
participant User
participant Parser as Input Parser
participant GA as GestationalAge
participant Ref as FetalGrowthPercentiles
participant Factory as MeasurementEvaluation
participant Mapper as SonographicMeasurement
participant Export as PhenotypicExporter
participant Output as Phenopacket JSON
User->>Parser: Load ultrasound report (JSON/XLSX)
Parser->>GA: Parse gestational age string
GA-->>Parser: GestationalAge(weeks=20, days=6)
Note over Factory: ONE-TIME INITIALIZATION
Factory->>Factory: Load biometry_hpo_mappings.yaml
Factory->>Factory: Create TermBins for all measurements
Parser->>Ref: Request percentile for HC at 20w6d
Ref->>Ref: Interpolate INTERGROWTH table
Ref-->>Parser: percentile = 2.1
Parser->>Factory: get_measurement_mapper("head_circumference")
Factory-->>Parser: SonographicMeasurement(term_bins=[...])
Parser->>Mapper: from_percentile(2.1, gestational_age)
Note over Mapper: DATA-DRIVEN LOOKUP
Mapper->>Mapper: Iterate through term_bins
Mapper->>Mapper: Find TermBin where range.contains(2.1)
Mapper->>Mapper: Found: [0, 3) -> HP:0000252 "Microcephaly"
Mapper-->>Parser: TermObservation(
Note right of Mapper: hpo_id="HP:0000252"
Note right of Mapper: hpo_label="Microcephaly"
Note right of Mapper: observed=True
Note right of Mapper: category="lower_extreme_term"
Note right of Mapper: percentile=2.1)
Parser->>Export: Add TermObservation to export batch
Export->>Export: Build Phenopacket structure
Export->>Export: QC validation
Export-->>Output: Write JSON file
Output-->>User: Phenopacket with HPO term + provenance
Detailed Step-by-Step Flow
1. Configuration Loading (Happens Once)
flowchart LR
A["Application Startup"] --> B["MeasurementEvaluation.__init__()"]
B --> C["BiometryMappingLoader.load('biometry_hpo_mappings.yaml')"]
C --> D["Parse YAML<br/>→ Create PercentileRange objects"]
D --> E["Create TermBin objects linking ranges to HPO terms"]
E --> F["Store in dictionary:<br/>{ 'head_circumference': [TermBin(...), ...],<br/>'biparietal_diameter': [...], ... }"]
2. Measurement Processing (Per Observation)
flowchart LR
A["Raw Input:<br/>HC = 175mm at 20w6d"] --> B["GestationalAge.from_weeks(20.86)<br/>→ GestationalAge(weeks=20, days=6)"]
B --> C["FetalGrowthPercentiles.calculate_percentile('head_circumference', 20.86, 175.0)<br/>→ Lookup INTERGROWTH table<br/>→ Interpolate between 20w and 21w<br/>→ Return percentile: 2.1"]
C --> D["factory.get_measurement_mapper('head_circumference')<br/>→ Returns SonographicMeasurement with 8 TermBins"]
D --> E["mapper.from_percentile(2.1, gestational_age)<br/>→ Finds TermBin [0,3): HP:0000252 'Microcephaly'"]
E --> F["Creates TermObservation:<br/>• hpo_id=HP:0000252<br/>• observed=True<br/>• category='lower_extreme_term'"]
F --> G["TermObservation.to_phenotypic_feature()<br/>→ Phenopacket JSON output"]
3. Multi-Measurement Workflow
flowchart LR
Input["Raw Ultrasound Report<br/>(JSON or Excel)"] --> Parse["Parse Measurements"]
Parse --> HC["HC<br/>175 mm"]
Parse --> BPD["BPD<br/>45 mm"]
Parse --> FL["FL<br/>30 mm"]
subgraph Processing["Parallel Processing"]
direction LR
HC --> HCMap["HC Mapper"] --> HCObs["TermObservation"]
BPD --> BPDMap["BPD Mapper"] --> BPDObs["TermObservation"]
FL --> FLMap["FL Mapper"] --> FLObs["TermObservation"]
end
HCObs --> Collect["Collect All Observations"]
BPDObs --> Collect
FLObs --> Collect
Collect --> PP["Assemble Phenopacket components"]
PP --> QC["Quality Control"]
QC --> Output["Build Phenopacket: JSON Output"]
Module Breakdown
Core Architecture Modules
src/prenatalppkt/measurements/term_bin.py
Data structures for configuration-driven ontology mapping:
| @dataclass
class PercentileRange:
"""Represents a percentile interval [min, max)."""
min_percentile: float
max_percentile: float
def contains(self, percentile: float) -> bool:
"""Check if percentile falls within this range."""
return self.min_percentile <= percentile < self.max_percentile
@dataclass
class TermBin:
"""Links a percentile range to an HPO term."""
range: PercentileRange
hpo_id: str
hpo_label: str
normal: bool # Explicit flag: is this range considered normal?
def fits(self, percentile: float) -> bool:
"""Check if percentile fits in this bin."""
return self.range.contains(percentile)
@property
def category(self) -> str:
"""Auto-categorize based on boundaries."""
if self.range.min_percentile == 0:
return "lower_extreme_term"
elif self.range.max_percentile == 100:
return "upper_extreme_term"
elif self.normal:
return "normal_term"
else:
return "abnormal_term"
|
Purpose: Pure data structures with no business logic. Can be easily serialized, tested, and validated.
src/prenatalppkt/mapping_loader.py
Handles all YAML parsing and TermBin construction:
| class BiometryMappingLoader:
"""
Loads HPO mappings from YAML configuration.
Separates file I/O from measurement evaluation logic.
"""
@staticmethod
def load(path: Path) -> Dict[str, List[TermBin]]:
"""
Load biometry-to-HPO mappings from YAML.
Returns:
Dictionary mapping measurement types to sorted lists of TermBins
Example:
{
"head_circumference": [
TermBin(range=[0,3), id="HP:0000252", ...),
TermBin(range=[3,5), id="HP:0040195", ...),
...
],
"biparietal_diameter": [...]
}
"""
|
Key Features:
- Validates YAML structure
- Creates PercentileRange and TermBin objects
- Sorts bins by min_percentile for efficient lookup
- Logs warnings for gaps or overlaps
- Centralizes configuration errors in one place, so they surface once at load time rather than during evaluation
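The gap/overlap check mentioned above can be sketched like this. It is a simplified stand-in: the real BiometryMappingLoader operates on TermBin objects and emits log warnings, whereas this sketch works on bare (min, max) tuples and returns the messages.

```python
# Sketch of the loader's coverage validation pass (assumed behavior:
# the real loader logs warnings; here we return them as strings).
from typing import List, Tuple


def check_bin_coverage(ranges: List[Tuple[float, float]]) -> List[str]:
    """Warn about gaps and overlaps in sorted (min, max) percentile ranges."""
    warnings: List[str] = []
    ranges = sorted(ranges)
    if ranges and ranges[0][0] != 0:
        warnings.append(f"coverage starts at {ranges[0][0]}, not 0")
    for (_, hi_prev), (lo_next, _) in zip(ranges, ranges[1:]):
        if lo_next > hi_prev:
            warnings.append(f"gap between {hi_prev} and {lo_next}")
        elif lo_next < hi_prev:
            warnings.append(f"overlap between {lo_next} and {hi_prev}")
    if ranges and ranges[-1][1] != 100:
        warnings.append(f"coverage ends at {ranges[-1][1]}, not 100")
    return warnings
```

For a well-formed mapping such as `[(0, 3), (3, 5), (5, 100)]` this returns no warnings; `[(0, 3), (5, 100)]` reports the gap between 3 and 5.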
src/prenatalppkt/measurement_eval.py
Factory pattern for creating measurement evaluators:
| class MeasurementEvaluation:
"""
Factory for measurement mappers.
Loads configuration once, creates mappers on demand.
"""
def __init__(self, mappings_path: Optional[Path] = None) -> None:
"""Initialize with YAML path (defaults to bundled config)."""
self._mappings = BiometryMappingLoader.load(
mappings_path or DEFAULT_MAPPINGS_FILE
)
def get_measurement_mapper(
self,
measurement_type: str
) -> Optional[SonographicMeasurement]:
"""
Get a configured mapper for the specified measurement.
Example:
factory = MeasurementEvaluation()
hc_mapper = factory.get_measurement_mapper("head_circumference")
observation = hc_mapper.from_percentile(2.1, gestational_age)
"""
|
Design Pattern: Factory + Singleton behavior (loads YAML once, reuses mappings)
src/prenatalppkt/sonographic_measurement.py
Generic measurement mapper (no longer abstract, no subclasses needed):
| class SonographicMeasurement:
"""
Generic measurement mapper using configured TermBins.
Replaces all measurement-specific subclasses.
"""
def __init__(self, measurement_type: str, term_bins: List[TermBin]) -> None:
"""Configuration is INJECTED at instantiation."""
self.measurement_type = measurement_type
self.term_bins = term_bins
def from_percentile(
self,
percentile: float,
gestational_age: GestationalAge
) -> TermObservation:
"""
Map a percentile to an HPO term observation.
DATA-DRIVEN - no hard-coded if/elif chains!
"""
for term_bin in self.term_bins:
if term_bin.fits(percentile):
return TermObservation(
hpo_id=term_bin.hpo_id,
hpo_label=term_bin.hpo_label,
category=term_bin.category,
observed=not term_bin.normal,
gestational_age=gestational_age,
percentile=percentile,
)
raise ValueError(
f"No HPO mapping found for {self.measurement_type} "
f"percentile {percentile:.1f}"
)
|
Key Change: No more inheritance hierarchy! One generic class works for all measurements.
src/prenatalppkt/term_observation.py
Lightweight data holder (no complex logic or external dependencies):
| @dataclass
class TermObservation:
"""HPO term observation with gestational age context."""
hpo_id: str
hpo_label: str
category: str
observed: bool
gestational_age: GestationalAge
percentile: Optional[float] = None
def to_phenotypic_feature(self) -> Dict[str, object]:
"""Convert to Phenopacket v2 format."""
ga_str = f"{self.gestational_age.weeks}w{self.gestational_age.days}d"
return {
"type": {"id": self.hpo_id, "label": self.hpo_label},
"excluded": not self.observed,
"onset": {"gestationalAge": self.gestational_age.to_iso()},
"description": f"Measurement at {ga_str}"
}
|
Removed Dependencies:
- No longer depends on MinimalTerm from hpo-toolkit
- No __post_init__ logic
- No build_standard_bin_mapping() method
Reference Data Modules
src/prenatalppkt/biometry_reference.py
Unified interface for loading and querying fetal growth reference data:
| class FetalGrowthPercentiles:
"""
Load and query NICHD or INTERGROWTH-21st fetal growth references.
Supports:
- Percentile lookup by gestational age
- Z-score calculation
- Linear interpolation for non-integer gestational ages
"""
def __init__(self, source: str = "intergrowth") -> None:
"""
Initialize with reference data source.
Args:
source: "nihcd" or "intergrowth"
"""
def calculate_percentile(
self,
measurement_type: str,
gestational_age_weeks: float,
value_mm: float
) -> float:
"""
Calculate which percentile a measurement falls into.
Returns:
Percentile value (0-100)
"""
|
Key features:
- Loads parsed TSV tables from data/parsed/
- Handles gestational age interpolation
- Supports both centile and z-score tables
- Validates measurement types and ranges
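The linear interpolation between gestational-age rows can be illustrated with a self-contained sketch. The table layout here (integer weeks mapped to centile columns) and the reference values are made up for illustration; the real class reads its rows from the parsed TSV tables in data/parsed/.

```python
# Sketch of GA interpolation, assuming integer-week reference rows.
from typing import Dict


def interpolate_reference(
    table: Dict[int, Dict[str, float]],
    ga_weeks: float,
    column: str,
) -> float:
    """Linearly interpolate a reference value between integer-week rows."""
    lo = int(ga_weeks)
    frac = ga_weeks - lo
    # Exact row hit, or no next row available: return the row value directly.
    if frac == 0 or (lo + 1) not in table:
        return table[lo][column]
    v_lo = table[lo][column]
    v_hi = table[lo + 1][column]
    return v_lo + frac * (v_hi - v_lo)


# Illustrative (made-up) 50th-centile rows:
ref = {20: {"50th": 175.0}, 21: {"50th": 185.0}}
mid = interpolate_reference(ref, 20.5, "50th")  # halfway between the rows
```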
src/prenatalppkt/gestational_age.py
Represents gestational age with weeks + days:
| @dataclass
class GestationalAge:
"""Gestational age representation."""
weeks: int
days: int
@classmethod
def from_weeks(cls, total_weeks: float) -> GestationalAge:
"""Convert decimal weeks to weeks+days."""
weeks = int(total_weeks)
days = int((total_weeks - weeks) * 7)
return cls(weeks=weeks, days=days)
def to_iso(self) -> dict:
"""Convert to Phenopacket ISO format."""
return {"weeks": self.weeks, "days": self.days}
|
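Note that from_weeks truncates fractional days rather than rounding, which matters near day boundaries. A self-contained copy of the logic shown above demonstrates this:

```python
# Self-contained copy of the conversion above, to illustrate truncation.
from dataclasses import dataclass


@dataclass
class GestationalAge:
    weeks: int
    days: int

    @classmethod
    def from_weeks(cls, total_weeks: float) -> "GestationalAge":
        weeks = int(total_weeks)
        days = int((total_weeks - weeks) * 7)  # truncates; never rounds up
        return cls(weeks=weeks, days=days)


ga = GestationalAge.from_weeks(20.86)    # 0.86 * 7 = 6.02 -> 6 days
edge = GestationalAge.from_weeks(20.99)  # 0.99 * 7 = 6.93 -> still 6 days
```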
Data Parsing Modules
scripts/parse_nichd_raw.py
Parses NICHD raw text data into standardized TSV format:
| def parse_nichd_raw(input_file: Path, output_dir: Path) -> None:
"""
Parse NIHCD fetal growth calculator text export.
Handles:
- Multi-word measurement names
- Race/ethnicity categories
- Multiple percentile columns
- Header/footer junk lines
"""
|
scripts/parse_intergrowth_txt_all.py
Parses INTERGROWTH-21st centile and z-score tables:
| def parse_intergrowth_tables(raw_dir: Path, out_dir: Path) -> None:
"""
Parse INTERGROWTH centile (_ct_) and z-score (_zs_) tables.
Handles:
- Text file parsing
- Gestational age range validation
- Measure name normalization
- Provenance metadata
"""
|
Export Modules
src/prenatalppkt/phenotypic_export.py
Assembles Phenopackets from TermObservations:
| class PhenotypicExporter:
"""
Build GA4GH Phenopackets v2 from term observations.
"""
def __init__(self) -> None:
self.term_observations: List[TermObservation] = []
def add_observation(self, obs: TermObservation) -> None:
"""Add an observation to the export batch."""
self.term_observations.append(obs)
def build_phenopacket(
self,
subject_id: str,
maternal_id: Optional[str] = None
) -> dict:
"""
Build complete Phenopacket structure.
Returns:
Phenopacket v2 compliant dictionary
"""
|
Configuration Guide
YAML Mapping Structure
The data/mappings/biometry_hpo_mappings.yaml file defines how percentile values map to HPO terms:
| # Template for each measurement
measurement_name:
- min: <float> # Minimum percentile (inclusive)
max: <float> # Maximum percentile (exclusive)
id: "<HPO:ID>" # HPO term identifier
label: "<string>" # Human-readable label
normal: <boolean> # Is this range considered normal?
|
Complete Example: Head Circumference
| head_circumference:
# Extreme low: <3rd percentile
- min: 0
max: 3
id: "HP:0000252"
label: "Microcephaly"
normal: false
# Borderline low: 3rd-5th percentile
- min: 3
max: 5
id: "HP:0040195"
label: "Decreased head circumference"
normal: false
# Mildly abnormal low: 5th-10th percentile
- min: 5
max: 10
id: "HP:0000240"
label: "Abnormality of skull size"
normal: false
# Normal range: 10th-50th percentile
- min: 10
max: 50
id: "HP:0000240"
label: "Abnormality of skull size"
normal: true # Marked as normal
# Normal range: 50th-90th percentile
- min: 50
max: 90
id: "HP:0000240"
label: "Abnormality of skull size"
normal: true
# Mildly abnormal high: 90th-95th percentile
- min: 90
max: 95
id: "HP:0000240"
label: "Abnormality of skull size"
normal: false
# Borderline high: 95th-97th percentile
- min: 95
max: 97
id: "HP:0040194"
label: "Increased head circumference"
normal: false
# Extreme high: >97th percentile
- min: 97
max: 100
id: "HP:0000256"
label: "Macrocephaly"
normal: false
|
Validation Rules
The system automatically validates:
- Complete Coverage: Ranges must span [0, 100) with no gaps
- No Overlaps: Each percentile value must map to exactly one bin
- Sorted Order: Ranges must be in ascending order by min
- Valid Percentiles: 0 <= min < max <= 100
- HPO Term Format: IDs must match pattern HP:\d{7}
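The validation rules above can be sketched as a single checking function. This is illustrative only: in the library these checks belong to BiometryMappingLoader, and the exact error messages here are assumptions.

```python
# Sketch of the five validation rules applied to one measurement's bins.
import re
from typing import Dict, List


def validate_mapping(bins: List[Dict]) -> List[str]:
    """Return a list of rule violations (empty list means valid)."""
    errors: List[str] = []
    hpo_id = re.compile(r"^HP:\d{7}$")
    for b in bins:
        # Valid percentiles: 0 <= min < max <= 100
        if not (0 <= b["min"] < b["max"] <= 100):
            errors.append(f"invalid range [{b['min']}, {b['max']})")
        # HPO term format: HP:\d{7}
        if not hpo_id.match(b["id"]):
            errors.append(f"malformed HPO id {b['id']!r}")
    # Sorted order by min
    if [b["min"] for b in bins] != sorted(b["min"] for b in bins):
        errors.append("bins not sorted by min")
    # No gaps/overlaps: each bin starts where the previous one ends
    for prev, cur in zip(bins, bins[1:]):
        if cur["min"] != prev["max"]:
            errors.append(f"gap or overlap at {prev['max']}/{cur['min']}")
    # Complete coverage of [0, 100)
    if bins and (bins[0]["min"] != 0 or bins[-1]["max"] != 100):
        errors.append("ranges do not span [0, 100)")
    return errors


ok = [
    {"min": 0, "max": 3, "id": "HP:0000252"},
    {"min": 3, "max": 100, "id": "HP:0000240"},
]
```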
Adding a New Measurement
| # 1. Add to biometry_hpo_mappings.yaml
estimated_fetal_weight:
- min: 0
max: 10
id: "HP:0001518"
label: "Small for gestational age"
normal: false
- min: 10
max: 90
id: "HP:0000118" # Generic placeholder
label: "Phenotypic abnormality"
normal: true
- min: 90
max: 100
id: "HP:0001520"
label: "Large for gestational age"
normal: false
# 2. Use immediately (no code changes needed!)
factory = MeasurementEvaluation()
efw_mapper = factory.get_measurement_mapper("estimated_fetal_weight")
|
Customizing Normal Ranges
Different clinical contexts may define "normal" differently:
| # Conservative definition (narrower normal range)
head_circumference_conservative:
- min: 0
max: 5
id: "HP:0000252"
label: "Microcephaly"
normal: false
- min: 5
max: 15 # More restrictive
id: "HP:0040195"
label: "Decreased head circumference"
normal: false
- min: 15
max: 85 # Narrower normal range
id: "HP:0000240"
label: "Abnormality of skull size"
normal: true
# ... continue pattern
|
Load with:
| factory = MeasurementEvaluation(
mappings_path=Path("config/conservative_mappings.yaml")
)
|
Inputs and Outputs
Input Formats
1. Observer JSON
| {
"exam": {
"patient_dob": "1990-01-15",
"lmp_date": "2024-03-10",
"exam_date": "2024-08-15",
"icd10_codes": ["Z34.00"]
},
"fetuses": [
{
"fetus_id": 1,
"measurements": {
"bpd_mm": 45.2,
"hc_mm": 175.3,
"ac_mm": 150.1,
"fl_mm": 32.5
},
"anatomy": {
"cranium": "normal",
"heart": "four_chamber_view_normal"
}
}
]
}
|
2. ViewPoint Excel (.xlsx)
| ExamDate | LMP | Fetus | BPD (mm) | HC (mm) | AC (mm) | FL (mm) |
|----------|-----|-------|----------|---------|---------|---------|
| 2024-08-15 | 2024-03-10 | 1 | 45.2 | 175.3 | 150.1 | 32.5 |
Note: ViewPoint uses proprietary dropdown lists (.vpl files) for anatomy findings. See docs/viewpoint_dropdown_options.md for conversion utilities.
Output: GA4GH Phenopacket (JSON)
| {
"id": "prenatal-exam-20240815-fetus1",
"subject": {
"id": "FETUS_001",
"timeAtLastEncounter": {
"gestationalAge": {
"weeks": 20,
"days": 6
}
}
},
"phenotypicFeatures": [
{
"type": {
"id": "HP:0000252",
"label": "Microcephaly"
},
"excluded": false,
"onset": {
"gestationalAge": {
"weeks": 20,
"days": 6
}
},
"description": "Measurement at 20w6d"
},
{
"type": {
"id": "HP:0000240",
"label": "Abnormality of skull size"
},
"excluded": true,
"onset": {
"gestationalAge": {
"weeks": 20,
"days": 6
}
},
"description": "Measurement within normal range for gestational age (20w6d)"
}
],
"measurements": [
{
"assay": {
"id": "LOINC:11820-8",
"label": "Head circumference"
},
"value": {
"quantity": {
"unit": {
"id": "UCUM:mm",
"label": "millimeter"
},
"value": 175.3
}
},
"timeObserved": {
"gestationalAge": {
"weeks": 20,
"days": 6
}
}
}
],
"metaData": {
"created": "2024-08-15T14:30:00Z",
"createdBy": "prenatalppkt-v0.1.0",
"resources": [
{
"id": "hp",
"name": "Human Phenotype Ontology",
"url": "http://purl.obolibrary.org/obo/hp.owl",
"version": "2024-04-26",
"namespacePrefix": "HP",
"iriPrefix": "http://purl.obolibrary.org/obo/HP_"
},
{
"id": "intergrowth",
"name": "INTERGROWTH-21st Standards",
"url": "https://intergrowth21.tghn.org/",
"version": "2014",
"namespacePrefix": "INTERGROWTH"
}
],
"phenopacketSchemaVersion": "2.0"
}
}
|
Installation
Prerequisites
- Python 3.10 or higher
- pip package manager
Install from Source
| # Clone the repository
git clone https://github.com/P2GX/prenatalppkt.git
cd prenatalppkt
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[test]"
# Verify installation
python -c "import prenatalppkt; print(prenatalppkt.__version__)"
|
Install Dependencies Only
| pip install -r requirements/requirements.txt
|
Optional: Documentation Build
| pip install -e ".[docs]"
mkdocs serve # View docs at http://localhost:8000
|
Usage Examples
Basic Workflow
| from prenatalppkt.measurement_eval import MeasurementEvaluation
from prenatalppkt.biometry_reference import FetalGrowthPercentiles
from prenatalppkt.gestational_age import GestationalAge
# 1. Initialize (loads YAML configuration once)
factory = MeasurementEvaluation()
ref_data = FetalGrowthPercentiles(source="intergrowth")
# 2. Parse gestational age
ga = GestationalAge.from_weeks(20.86) # 20 weeks, 6 days
# 3. Calculate percentile
percentile = ref_data.calculate_percentile(
measurement_type="head_circumference",
gestational_age_weeks=20.86,
value_mm=175.0
)
# Returns: 2.1 (well below 3rd percentile)
# 4. Get measurement mapper
hc_mapper = factory.get_measurement_mapper("head_circumference")
# 5. Map to HPO term
observation = hc_mapper.from_percentile(percentile, ga)
# Result:
# TermObservation(
# hpo_id="HP:0000252",
# hpo_label="Microcephaly",
# category="lower_extreme_term",
# observed=True,
# gestational_age=GestationalAge(weeks=20, days=6),
# percentile=2.1
# )
# 6. Convert to Phenopacket format
phenotypic_feature = observation.to_phenotypic_feature()
# {
# "type": {"id": "HP:0000252", "label": "Microcephaly"},
# "excluded": false,
# "onset": {"gestationalAge": {"weeks": 20, "days": 6}},
# "description": "Measurement at 20w6d"
# }
|
Batch Processing Multiple Measurements
| from prenatalppkt.measurement_eval import MeasurementEvaluation
from prenatalppkt.biometry_reference import FetalGrowthPercentiles
from prenatalppkt.gestational_age import GestationalAge
# Initialize once
factory = MeasurementEvaluation()
ref_data = FetalGrowthPercentiles(source="intergrowth")
ga = GestationalAge.from_weeks(22.5)
# Raw measurements from ultrasound
measurements = {
"head_circumference": 196.3,
"biparietal_diameter": 52.1,
"femur_length": 35.8,
"abdominal_circumference": 170.2
}
# Process all measurements
observations = []
for measurement_type, value_mm in measurements.items():
# Calculate percentile
percentile = ref_data.calculate_percentile(measurement_type, 22.5, value_mm)
# Get mapper and create observation
mapper = factory.get_measurement_mapper(measurement_type)
obs = mapper.from_percentile(percentile, ga)
observations.append(obs)
# Build Phenopacket
from prenatalppkt.phenotypic_export import PhenotypicExporter
exporter = PhenotypicExporter()
for obs in observations:
exporter.add_observation(obs)
phenopacket = exporter.build_phenopacket(
subject_id="FETUS_001",
maternal_id="MOTHER_001"
)
|
Custom Configuration
| from pathlib import Path
from prenatalppkt.measurement_eval import MeasurementEvaluation
# Use custom YAML configuration
custom_mappings = Path("config/custom_hpo_mappings.yaml")
factory = MeasurementEvaluation(mappings_path=custom_mappings)
# Rest of workflow is identical
mapper = factory.get_measurement_mapper("head_circumference")
observation = mapper.from_percentile(15.2, ga)
|
Testing with Mock Configuration
| from prenatalppkt.measurements.term_bin import TermBin, PercentileRange
from prenatalppkt.sonographic_measurement import SonographicMeasurement
from prenatalppkt.gestational_age import GestationalAge
# Create mock configuration for testing
test_bins = [
TermBin(
range=PercentileRange(0, 10),
hpo_id="HP:TEST001",
hpo_label="Low test value",
normal=False
),
TermBin(
range=PercentileRange(10, 90),
hpo_id="HP:TEST002",
hpo_label="Normal test value",
normal=True
),
TermBin(
range=PercentileRange(90, 100),
hpo_id="HP:TEST003",
hpo_label="High test value",
normal=False
),
]
# Create mapper with mock config
test_mapper = SonographicMeasurement("test_measurement", test_bins)
# Test with various percentiles
ga = GestationalAge(weeks=20, days=0)
obs_low = test_mapper.from_percentile(5.0, ga)
obs_normal = test_mapper.from_percentile(50.0, ga)
obs_high = test_mapper.from_percentile(95.0, ga)
assert obs_low.hpo_id == "HP:TEST001"
assert obs_low.observed == True
assert obs_normal.hpo_id == "HP:TEST002"
assert obs_normal.observed == False # Normal range
assert obs_high.hpo_id == "HP:TEST003"
assert obs_high.observed == True
|
Testing
Run All Tests
| pytest
|
Run Specific Test Module
| pytest tests/test_term_bin.py -v
|
Run with Coverage
| pytest --cov=prenatalppkt --cov-report=html
|
Linting and Formatting
| # Format code
ruff format .
# Check for issues
ruff check .
# Auto-fix issues
ruff check . --fix
|
Test Coverage
Current test suite covers:
Core Functionality Tests
tests/test_term_bin.py
- PercentileRange.contains() for various ranges
- TermBin.fits() boundary conditions
- Automatic category detection
- Edge cases (boundary values, overlaps)
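The half-open interval semantics these tests exercise can be shown with a minimal local copy of PercentileRange (the assertions below illustrate the boundary behavior; exact test names in the suite are not reproduced here):

```python
# Minimal local copy of PercentileRange to demonstrate the half-open
# interval [min, max) boundary behavior the tests check.
from dataclasses import dataclass


@dataclass
class PercentileRange:
    min_percentile: float
    max_percentile: float

    def contains(self, percentile: float) -> bool:
        return self.min_percentile <= percentile < self.max_percentile


low = PercentileRange(0, 3)
assert low.contains(0.0)        # lower bound is inclusive
assert low.contains(2.999)
assert not low.contains(3.0)    # upper bound is exclusive...
assert PercentileRange(3, 5).contains(3.0)  # ...3.0 belongs to the next bin
```

This convention is what guarantees that adjacent bins such as [0, 3) and [3, 5) never both claim the same percentile value.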
tests/test_mapping_loader.py
- YAML file loading
- TermBin object creation
- Validation of range coverage
- Error handling for malformed YAML
tests/test_measurement_eval.py
- Factory initialization
- Mapper creation
- Configuration caching
- Missing measurement handling
tests/test_sonographic_measurement.py
- Percentile-to-observation mapping
- Data-driven lookup logic
- Normal vs. abnormal classification
- Edge percentiles (0.0, 99.9, etc.)
Reference Data Tests
tests/test_biometry_reference.py
- NICHD table loading
- INTERGROWTH table loading
- Percentile interpolation accuracy
- Z-score calculation
- Cross-reference consistency
Export Tests
tests/test_phenotypic_export.py
- HPO term assignment correctness
- Phenopacket JSON serialization
- Batch export functionality
- QC validation integration
Parsing Tests
tests/test_parse_nichd_raw.py
- Header/junk line detection
- Multi-word measurement parsing
- Race/ethnicity field extraction
- Percentile value extraction
tests/test_parse_intergrowth_txt_all.py
- Data line identification
- GA range validation
- Measure name normalization
- Provenance metadata addition
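Measure name normalization of the kind these parsing tests cover might look like the following hypothetical helper (`normalize_measure_name` is illustrative; the real parser's rules are not shown in this README):

```python
import re


def normalize_measure_name(raw: str) -> str:
    """Collapse a free-text measure label into a snake_case config key."""
    return re.sub(r"\s+", "_", raw.strip().lower())


assert normalize_measure_name("  Head Circumference ") == "head_circumference"
assert normalize_measure_name("Femur Length") == "femur_length"
```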
Test Data
Test fixtures use validated reference values:
| # Example: NICHD BPD at 20.86 weeks (Non-Hispanic White)
NICHD_BPD_20_86_WEEKS = {
    "3rd": 145.25,
    "5th": 147.25,
    "10th": 150.37,
    "50th": 161.95,
    "90th": 174.41,
    "95th": 178.12,
    "97th": 180.56,
}
# Example: INTERGROWTH HC z-scores at 22 weeks
INTERGROWTH_HC_22_WEEKS_ZSCORES = {
    "-3 SD": 169.2,
    "-2 SD": 179.5,
    "-1 SD": 189.8,
    "0 SD": 200.1,
    "+1 SD": 210.4,
    "+2 SD": 220.7,
    "+3 SD": 231.0,
}
|
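From the fixture above, the 22-week HC table is evenly spaced at 10.3 mm per SD around a mean of 200.1 mm, so a z-score check reduces to simple arithmetic. A sketch (constants are derived from the fixture, not taken from the library's code):

```python
# Mean (0 SD) and SD step (+1 SD minus 0 SD) read off the fixture above
MEAN_HC_22W = 200.1
SD_HC_22W = 10.3


def z_score(value_mm: float, mean: float, sd: float) -> float:
    """Standard score: how many SDs the measurement sits from the mean."""
    return (value_mm - mean) / sd


# A measured HC equal to the "-1 SD" fixture value should score ~ -1.0
assert abs(z_score(189.8, MEAN_HC_22W, SD_HC_22W) - (-1.0)) < 1e-6
```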
Future Roadmap
Phase 1: Core Functionality (Current Release)
- [x] Reference data loading (NICHD, INTERGROWTH-21st)
- [x] Percentile-based evaluation
- [x] HPO term mapping via YAML configuration
- [x] Data-driven measurement architecture
- [x] TermBin and PercentileRange models
Phase 2: Input Parsing (Planned)
- [ ] Observer JSON parser
- [ ] ViewPoint Excel parser
- [ ] Gestational age calculation from LMP/exam dates
- [ ] Multi-fetus handling
- [ ] Anatomy finding extraction (using ViewPoint dropdown lists)
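For the planned gestational-age calculation, one plausible sketch of the LMP-based arithmetic (illustrative only; `gestational_age_from_lmp` is a hypothetical name, not the library's API):

```python
from datetime import date


def gestational_age_from_lmp(lmp: date, exam: date) -> tuple[int, int]:
    """Elapsed days since the last menstrual period, expressed as (weeks, days)."""
    delta_days = (exam - lmp).days
    return delta_days // 7, delta_days % 7


assert gestational_age_from_lmp(date(2025, 1, 1), date(2025, 5, 28)) == (21, 0)
```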
Phase 3: Quality Control (Planned)
- [ ] Schema validation (JSON Schema, Protobuf)
- [ ] Completeness checking (required fields, measurement coverage)
- [ ] Range validation (biologically plausible values)
- [ ] Anomaly detection (statistical outliers)
- [ ] Cross-measurement consistency (e.g., BPD/HC ratio)
Phase 4: Phenopacket Builder (Planned)
- [ ] Full Phenopacket v2 assembly
- [ ] Family/pedigree integration (twins, triplets)
- [ ] ICD-10 -> MONDO/OMIM mapping
- [ ] Provenance tracking (pipeline version, analyst ID)
- [ ] Batch export utilities
Phase 5: CLI and Web API (Not Planned Yet)
| # Command-line interface
prenatalppkt parse --input exam_data.json --output results/ --reference intergrowth
# Web API
POST /api/v1/evaluate
{
  "gestational_age_weeks": 22.5,
  "measurements": {"hc_mm": 196.3, "bpd_mm": 52.1}
}
-> Returns Phenopacket JSON
|
Phase 6: Advanced Features (Not Planned Yet)
- [ ] Longitudinal growth tracking (serial ultrasounds)
- [ ] Growth velocity calculations
- [ ] Multi-parameter risk scoring
- [ ] Predictive modeling integration (machine learning)
- [ ] DICOM integration (extract measurements from ultrasound images)
Contributing
We welcome contributions! Please see our Contributing Guidelines for details.
Development Workflow
| # 1. Fork and clone
git clone https://github.com/YOUR_USERNAME/prenatalppkt.git
cd prenatalppkt
git remote add upstream https://github.com/P2GX/prenatalppkt.git
# 2. Create feature branch
git checkout -b feature/add-efw-support
# 3. Install development dependencies
pip install -e ".[test]"
# 4. Make changes and test
pytest -vv
ruff format .
ruff check . --fix
# 5. Commit with descriptive messages
git add .
git commit -m "feat: Add estimated fetal weight (EFW) measurement support"
# 6. Push and create pull request
git push origin feature/add-efw-support
|
Code Style Guidelines
- Python: Follow PEP 8 (enforced by Ruff)
- Docstrings: Use NumPy style (Sphinx-compatible, as in the example below)
- Type hints: Required for all public functions
- Line length: 88 characters (Black-compatible)
Example:
| def evaluate(
    self,
    gestational_age: GestationalAge,
    measurement_value: float,
    reference_range: ReferenceRange,
) -> MeasurementResult:
    """
    Evaluate a raw measurement against the provided reference range.

    Parameters
    ----------
    gestational_age : GestationalAge
        The gestational age context for this measurement.
    measurement_value : float
        The observed measurement in millimeters.
    reference_range : ReferenceRange
        Percentile thresholds for this gestational age.

    Returns
    -------
    MeasurementResult
        Percentile bin classification for the measurement.
    """
|
License
This project is released under a **dual-license model**:
- **Academic / Non-Commercial License:** Free to use, modify, and distribute for research and educational purposes.
- **Commercial License:** Required for commercial or for-profit use. Please contact [varenyajj@gmail.com](mailto:varenyajj@gmail.com).
Attribution required: (C) 2025 Varenya Jain, Peter N. Robinson.
For complete terms, see the [LICENSE](./LICENSE) file.
Acknowledgments
Reference Standards
- NICHD Fetal Growth Studies: U.S. National Institute of Child Health and Human Development
- INTERGROWTH-21st Project: International consortium for fetal growth standards
Key Dependencies
- HPO Toolkit: Human Phenotype Ontology integration
- GA4GH Phenopackets: Standardized phenotype representation
- PyPhetools: Phenotype analysis utilities from Monarch Initiative
Contributors
Citation
If you use prenatalppkt in your research, please cite:
| @misc{prenatalppkt,
author = {Jain, Varenya and Robinson, Peter N.},
title = {prenatalppkt: Standardized Prenatal Phenotype Representation},
year = {2025},
url = {https://github.com/P2GX/prenatalppkt},
version = {0.1.dev}
}
|
And cite the relevant reference standards:
- NICHD Fetal Growth Studies: Buck Louis GM, Grewal J, Albert PS, Sciscione A, Wing DA, Grobman WA, Newman RB, Wapner R, D'Alton ME, Skupski D, Nageotte MP, Ranzini AC, Owen J, Chien EK, Craigo S, Hediger ML, Kim S, Zhang C, Grantz KL. Racial/ethnic standards for fetal growth: the NICHD Fetal Growth Studies. Am J Obstet Gynecol. 2015 Oct;213(4):449.e1-449.e41. doi: 10.1016/j.ajog.2015.08.032. PMID: 26410205; PMCID: PMC4584427.
- INTERGROWTH-21st: Papageorghiou AT, Kennedy SH, Salomon LJ, Altman DG, Ohuma EO, Stones W, Gravett MG, Barros FC, Victora C, Purwar M, Jaffer Y, Noble JA, Bertino E, Pang R, Cheikh Ismail L, Lambert A, Bhutta ZA, Villar J; International Fetal and Newborn Growth Consortium for the 21(st) Century (INTERGROWTH-21(st)). The INTERGROWTH-21st fetal growth standards: toward the global integration of pregnancy and pediatric care. Am J Obstet Gynecol. 2018 Feb;218(2S):S630-S640. doi: 10.1016/j.ajog.2018.01.011. PMID: 29422205.
| @article{intergrowth2014,
title={International standards for fetal growth based on serial ultrasound measurements: the INTERGROWTH-21st Project},
author={Papageorghiou, Aris T and Ohuma, Eric O and others},
journal={The Lancet},
volume={384},
number={9946},
pages={869--879},
year={2014},
publisher={Elsevier}
}
|
| @article{buck2015nichd,
title={The NICHD Fetal Growth Studies: design, methods, and cohort description},
author={Buck Louis, Germaine M and Grewal, Jagteshwar and others},
journal={American Journal of Obstetrics and Gynecology},
volume={213},
number={4},
pages={459--e1},
year={2015}
}
|
Support
- Documentation: https://github.com/P2GX/prenatalppkt/docs
- Issue Tracker: https://github.com/P2GX/prenatalppkt/issues
- Discussions: https://github.com/P2GX/prenatalppkt/discussions
- Email: [Contact @VarenyaJ or @pnrobinson]
Visuals
High-Level System Overview
flowchart LR
subgraph Input["Input Processing"]
JSON["JSON/XLSX Input"] --> Parser["Parser Layer"]
Parser --> GA["Gestational Age
Calculation"]
GA --> Measurements["Extract Measurements
o BPD
o HC
o AC
o FL
o OFD"]
end
subgraph Config["Configuration Layer"]
YAML["YAML Mappings
o Percentile ranges
o HPO term IDs
o Normal flags"]
end
subgraph Reference["Reference Data"]
NIH["NICHD Reference
o Percentiles by race/ethnicity
o Growth charts"]
IG21["INTERGROWTH-21st
o Z-scores
o Centiles"]
end
subgraph Processing["Measurement Processing"]
Measurements --> Factory["MeasurementEvaluation
(Factory)"]
Factory --> Mapper["SonographicMeasurement
(Generic Mapper)"]
Mapper --> Percentile["Calculate Percentile"]
Percentile --> Direct["Direct Mapping
percentile -> HPO term"]
end
subgraph Output["Output Generation"]
Direct --> TermObs["TermObservation
o HPO ID + Label
o observed flag
o percentile"]
TermObs --> Phenopacket["Phenopacket Builder"]
Phenopacket --> QC["QC Validation"]
QC --> Final["Final Phenopackets"]
end
YAML -.-> Factory
NIH --> Percentile
IG21 --> Percentile
classDef input fill:#a8d5ff,stroke:#333,stroke-width:2px,color:#000000
classDef config fill:#fff4e6,stroke:#333,stroke-width:2px,color:#000000
classDef reference fill:#ffe6cc,stroke:#333,stroke-width:2px,color:#000000
classDef process fill:#d5ffa8,stroke:#333,stroke-width:2px,color:#000000
classDef output fill:#ffafcc,stroke:#333,stroke-width:2px,color:#000000
class JSON,Parser,GA,Measurements input
class YAML config
class NIH,IG21 reference
class Factory,Mapper,Percentile,Direct process
class TermObs,Phenopacket,QC,Final output
Key Differences from Legacy Architecture:
- YAML Configuration Layer: Central source of truth for HPO mappings
- Factory Pattern: MeasurementEvaluation creates mappers on demand
- Generic Mapper: Single SonographicMeasurement class (no subclasses)
- Direct Mapping: No intermediate MeasurementResult - percentile maps directly to TermObservation
- Explicit Normal Flags: YAML defines what's normal, not hard-coded logic
Detailed Module-Level Architecture
flowchart LR
subgraph Input["1 Input Sources"]
JSON["data/EVMS_SAMPLE.json"]
XLSX["ViewPoint Excel (.xlsx)"]
end
subgraph Parsing["2 Parsing & Gestational Age"]
PARSER["biometry.py / parse_viewpoint.py
o Reads JSON/XLSX
o Extracts raw measurements"]
GA["gestational_age.py
o GestationalAge(weeks, days)
o from_weeks() converter"]
end
subgraph Config["3 Configuration Loading"]
YAML["data/mappings/
biometry_hpo_mappings.yaml
o Percentile ranges [min, max)
o HPO IDs and labels
o normal: true/false"]
LOADER["mapping_loader.py
BiometryMappingLoader
o Parses YAML
o Creates TermBin objects
o Validates coverage"]
end
subgraph Reference["4 Reference Standards"]
REF["biometry_reference.py
FetalGrowthPercentiles
o Loads NICHD/INTERGROWTH tables
o calculate_percentile()
o get_z_score()"]
end
subgraph Factory["5 Mapper Factory"]
FACTORY["measurement_eval.py
MeasurementEvaluation
o Loads YAML once at init
o get_measurement_mapper()
o Returns configured mappers"]
end
subgraph Measurement["6 Generic Measurement Mapper"]
MAPPER["sonographic_measurement.py
SonographicMeasurement
o measurement_type: str
o term_bins: List[TermBin]
o from_percentile() -> TermObservation"]
TERMBIN["measurements/term_bin.py
o PercentileRange
o TermBin
o category auto-detection"]
end
subgraph Observation["7 Ontology Observation"]
OBS["term_observation.py
TermObservation
o hpo_id, hpo_label
o observed: bool
o percentile: float
o to_phenotypic_feature()"]
end
subgraph Export["8 Phenopacket Export"]
EXPORT["phenotypic_export.py
PhenotypicExporter
o Collects TermObservations
o build_phenopacket()
o to_json()"]
end
subgraph QC["9 Quality Control"]
VALID["qc/validator.py (planned)
o Schema validation
o Ontology term checks
o Completeness reports"]
end
subgraph Output["Outputs"]
PP["Phenopackets v2 JSON
o Subject metadata
o phenotypicFeatures[]
o measurements[]
o metaData"]
LOGS["QC Reports
o Validation results
o Provenance tracking"]
end
%% Connections
JSON --> PARSER
XLSX --> PARSER
PARSER --> GA
PARSER --> REF
YAML --> LOADER
LOADER --> TERMBIN
LOADER --> FACTORY
FACTORY --> MAPPER
TERMBIN -.-> MAPPER
REF --> MAPPER
GA --> MAPPER
MAPPER --> OBS
OBS --> EXPORT
EXPORT --> VALID
VALID --> PP
VALID --> LOGS
classDef input fill:#a8d5ff,stroke:#333,stroke-width:2px,color:#000000
classDef config fill:#fff4e6,stroke:#333,stroke-width:2px,color:#000000
classDef reference fill:#ffe6cc,stroke:#333,stroke-width:2px,color:#000000
classDef process fill:#d5ffa8,stroke:#333,stroke-width:2px,color:#000000
classDef output fill:#ffafcc,stroke:#333,stroke-width:2px,color:#000000
class JSON,XLSX,PARSER,GA input
class YAML,LOADER,FACTORY config
class REF,TERMBIN reference
class MAPPER,OBS,EXPORT process
class VALID,PP,LOGS output
Architecture Highlights:
| Component | Old Approach | New Approach |
| --- | --- | --- |
| Ontology Mapping | Hard-coded in subclasses | Declarative YAML configuration |
| Measurement Classes | One subclass per measurement (HC, BPD, FL...) | Single generic SonographicMeasurement |
| Normal Range Logic | Hard-coded if/else chains | Explicit normal: true/false in YAML |
| Intermediate Results | MeasurementResult with string bin keys | Direct percentile -> TermObservation |
| Dependencies | MinimalTerm from hpo-toolkit | Simple strings (hpo_id, hpo_label) |
| Extensibility | Code changes for new measurements | YAML edits only |
Core Design Principles
- Configuration over Code: HPO term mappings are defined declaratively in YAML, not hard-coded in Python classes
Key Architectural Change: Configuration-Driven Mapping
TRANSFORMATION OVERVIEW:
flowchart LR
subgraph Old["Old Architecture (Hard-coded)"]
direction TB
O1["Raw Measurement
HC = 175mm @ 20w6d"]
O2["ReferenceRange.evaluate()"]
O3["MeasurementResult
bin_key = 'below_3p'"]
O4["Hard-coded if/elif
'below_3p' -> Microcephaly"]
O5["TermObservation
with MinimalTerm object"]
O1 --> O2 --> O3 --> O4 --> O5
end
subgraph New["New Architecture (Data-driven)"]
direction LR
N1["Raw Measurement
HC = 175mm @ 20w6d"]
N2["FetalGrowthPercentiles
calculate_percentile()"]
N3["Percentile = 2.1"]
N4["YAML Lookup
[0, 3) -> HP:0000252"]
N5["TermObservation
hpo_id, hpo_label"]
N1 --> N2 --> N3 --> N4 --> N5
end
style Old fill:#ffe6e6,stroke:#cc0000,stroke-width:2px
style New fill:#e6ffe6,stroke:#00cc00,stroke-width:2px
OLD APPROACH (Hard-coded):
| # Each measurement had its own class with hard-coded logic
class HeadCircumferenceMeasurement(SonographicMeasurement):
    def get_bin_to_term_mapping(self):
        return {
            "below_3p": MinimalTerm("HP:0000252", "Microcephaly"),
            "between_3p_5p": MinimalTerm("HP:0040195", "Decreased HC"),
            # ... 6 more hard-coded bins
        }
|
NEW APPROACH (Data-driven):
| # data/mappings/biometry_hpo_mappings.yaml
head_circumference:
  - min: 0
    max: 3
    id: "HP:0000252"
    label: "Microcephaly"
    normal: false
  - min: 3
    max: 5
    id: "HP:0040195"
    label: "Decreased head circumference"
    normal: false
  # ... all 8 ranges covering 0-100 percentile
|
| # Python code loads configuration, no hard-coding needed
factory = MeasurementEvaluation() # Loads YAML once
mapper = factory.get_measurement_mapper("head_circumference")
observation = mapper.from_percentile(2.1, gestational_age)
# Returns: TermObservation(hpo_id="HP:0000252", hpo_label="Microcephaly", ...)
|
Component Interaction Comparison
flowchart LR
subgraph Legacy["Legacy Architecture"]
direction TB
L1["HeadCircumferenceMeasurement
(Subclass)"]
L2["BiparietalDiameterMeasurement
(Subclass)"]
L3["FemurLengthMeasurement
(Subclass)"]
L4["Hard-coded mappings
in each class"]
L5["MeasurementResult
(bin_key strings)"]
L6["MinimalTerm objects
from hpo-toolkit"]
L1 --> L4
L2 --> L4
L3 --> L4
L4 --> L5
L5 --> L6
style L1 fill:#ffcccc
style L2 fill:#ffcccc
style L3 fill:#ffcccc
style L4 fill:#ffcccc
style L5 fill:#ffcccc
style L6 fill:#ffcccc
end
subgraph Modern["Modern Architecture"]
direction TB
M1["biometry_hpo_mappings.yaml
(Single source of truth)"]
M2["BiometryMappingLoader
(Parses YAML)"]
M3["MeasurementEvaluation
(Factory)"]
M4["SonographicMeasurement
(Generic - works for ALL)"]
M5["TermBin objects
(Percentile ranges)"]
M6["TermObservation
(Simple data holder)"]
M1 --> M2
M2 --> M3
M3 --> M4
M4 --> M5
M5 --> M6
style M1 fill:#ccffcc
style M2 fill:#ccffcc
style M3 fill:#ccffcc
style M4 fill:#ccffcc
style M5 fill:#ccffcc
style M6 fill:#ccffcc
end
Legacy -.->|Refactored to| Modern
Benefits of New Architecture:
| Aspect | Legacy | Modern | Improvement |
| --- | --- | --- | --- |
| Lines of Code | ~500 LOC across subclasses | ~200 LOC + YAML | 60% reduction |
| Adding Measurements | Write new Python class | Add YAML entry | No code changes |
| Changing Thresholds | Edit hard-coded values | Edit YAML values | Non-technical edits |
| Testing | Mock entire class hierarchy | Inject test config | Easier unit tests |
| Dependencies | Tight coupling to hpo-toolkit | Simple strings | Looser coupling |
| Maintainability | Changes require code review | Config can be validated | Faster iteration |
Decision Flow: Processing a Single Measurement
flowchart LR
Start([Ultrasound Measurement]) --> Parse{Parse Input}
Parse -->|Success| ExtractGA[Extract Gestational Age]
Parse -->|Fail| Error1[Error: Invalid Input]
ExtractGA --> ExtractMeas[Extract Measurement Value]
ExtractMeas --> ValidateRange{Value in
Valid Range?}
ValidateRange -->|No| Error2[Error: Out of Range]
ValidateRange -->|Yes| InitFactory[Initialize Factory]
InitFactory --> LoadYAML{YAML Already
Loaded?}
LoadYAML -->|Yes| GetMapper
LoadYAML -->|No| LoadConfig[Load biometry_hpo_mappings.yaml]
LoadConfig --> ValidateYAML{YAML Valid?}
ValidateYAML -->|No| Error3[Error: Invalid Config]
ValidateYAML -->|Yes| GetMapper[Get Measurement Mapper]
GetMapper --> MapperExists{Mapper Exists
for Type?}
MapperExists -->|No| Error4[Error: Unknown Measurement]
MapperExists -->|Yes| CalcPercentile[Calculate Percentile
from Reference Data]
CalcPercentile --> InterpolateGA{GA in
Table?}
InterpolateGA -->|No| Interpolate[Interpolate Between Rows]
InterpolateGA -->|Yes| LookupThresholds
Interpolate --> LookupThresholds[Lookup Percentile Thresholds]
LookupThresholds --> CompareValue[Compare Value to Thresholds]
CompareValue --> FindPercentile[Determine Percentile Value]
FindPercentile --> IterateBins[Iterate Through TermBins]
IterateBins --> CheckFit{Percentile Fits
in Bin?}
CheckFit -->|No| NextBin[Try Next Bin]
NextBin --> MoreBins{More Bins
to Check?}
MoreBins -->|Yes| CheckFit
MoreBins -->|No| Error5[Error: No Matching Bin]
CheckFit -->|Yes| CreateObs[Create TermObservation]
CreateObs --> SetHPO[Set HPO ID & Label]
SetHPO --> SetObserved{normal
flag?}
SetObserved -->|true| SetNormal[observed = False
excluded = True]
SetObserved -->|false| SetAbnormal[observed = True
excluded = False]
SetNormal --> AddMetadata[Add Gestational Age
& Percentile]
SetAbnormal --> AddMetadata
AddMetadata --> ToFeature[Convert to
Phenotypic Feature]
ToFeature --> Success([TermObservation Ready])
style Start fill:#a8d5ff
style Success fill:#ccffcc
style Error1 fill:#ffcccc
style Error2 fill:#ffcccc
style Error3 fill:#ffcccc
style Error4 fill:#ffcccc
style Error5 fill:#ffcccc
style LoadYAML fill:#fff4e6
style GetMapper fill:#d5ffa8
style CalcPercentile fill:#ffe6cc
style CreateObs fill:#e3f2fd
Key Decision Points:
- Input Validation: Ensures data format is correct
- Range Validation: Checks biological plausibility (e.g., HC not negative)
- Configuration Loading: One-time YAML load, then cached
- Mapper Resolution: Factory pattern creates appropriate mapper
- Percentile Calculation: Reference data lookup with interpolation
- Bin Matching: Data-driven iteration through TermBins
- Observation Creation: Sets HPO term and observed/excluded flags
- Feature Export: Converts to Phenopacket-compliant format
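The bin-matching and observed-flag steps above can be sketched as follows. Plain dicts stand in for `TermBin` and `TermObservation`, `HP:TEST002` is a placeholder ID borrowed from the test fixtures, and the real API differs:

```python
def match_bin(percentile: float, bins: list[dict]) -> dict:
    """First [min, max) bin containing the percentile wins; the bin's
    'normal' flag inverts the observed status of the HPO term."""
    for b in bins:
        if b["min"] <= percentile < b["max"]:
            return {"hpo_id": b["id"], "observed": not b["normal"]}
    raise ValueError(f"No matching bin for percentile {percentile}")


bins = [
    {"min": 0, "max": 3, "id": "HP:0000252", "normal": False},    # Microcephaly
    {"min": 3, "max": 97, "id": "HP:TEST002", "normal": True},    # placeholder normal bin
    {"min": 97, "max": 100, "id": "HP:0000256", "normal": False}, # Macrocephaly
]

assert match_bin(2.1, bins) == {"hpo_id": "HP:0000252", "observed": True}
assert match_bin(50.0, bins)["observed"] is False  # normal bin -> feature excluded
```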
Initialization vs. Runtime Phases
Understanding when things happen is crucial for performance and debugging:
flowchart LR
subgraph Init["Initialization Phase (Once)"]
direction TB
I1["Load YAML Configuration"]
I2["Parse into PercentileRange
& TermBin objects"]
I3["Create mapping dictionary
{measurement_type: [TermBins]}"]
I4["Store in Factory"]
I1 --> I2 --> I3 --> I4
style I1 fill:#fff4e6
style I2 fill:#fff4e6
style I3 fill:#fff4e6
style I4 fill:#fff4e6
end
subgraph Runtime["Runtime Phase (Per Measurement)"]
direction TB
R1["Get mapper from Factory
O(1) dictionary lookup"]
R2["Calculate percentile
from reference data"]
R3["Iterate through TermBins
(typically 8 bins)"]
R4["Create TermObservation
when bin matches"]
R5["Convert to Phenopacket
feature"]
R1 --> R2 --> R3 --> R4 --> R5
style R1 fill:#d5ffa8
style R2 fill:#d5ffa8
style R3 fill:#d5ffa8
style R4 fill:#d5ffa8
style R5 fill:#d5ffa8
end
Init ==>|One-time cost| Runtime
Note1["YAML parsing happens ONCE
at application startup, not per measurement"]
Note2["Runtime is pure in-memory
operations - very fast"]
Init -.-> Note1
Runtime -.-> Note2
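The one-time-load / cached-lookup split can be demonstrated with `functools.lru_cache` — a generic sketch of the pattern, not the factory's actual code (`load_mappings` and its return value are illustrative):

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def load_mappings() -> dict:
    """Stand-in for the one-time YAML parse done at factory initialization."""
    time.sleep(0.01)  # simulate file I/O + parsing cost
    return {"head_circumference": [(0, 3, "HP:0000252")]}


t0 = time.perf_counter(); load_mappings(); cold = time.perf_counter() - t0
t0 = time.perf_counter(); load_mappings(); warm = time.perf_counter() - t0
assert warm < cold  # the cached call skips the parse entirely
```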
Performance Characteristics:
| Phase | Operation | Complexity | Frequency |
| --- | --- | --- | --- |
| Init | Load & parse YAML | O(n) where n = total mappings | Once per application start |
| Init | Create TermBin objects | O(n) | Once per application start |
| Runtime | Get mapper | O(1) dictionary lookup | Per measurement |
| Runtime | Calculate percentile | O(log n) with interpolation | Per measurement |
| Runtime | Find matching bin | O(k) where k ≈ 8 bins | Per measurement |
| Runtime | Create observation | O(1) | Per measurement |
Memory Footprint:
- YAML File: ~11 KB on disk
- Loaded Mappings: ~50 KB in memory (all 5 measurements)
- Per Observation: ~1 KB (TermObservation object)