.. _measurement-stat: ==================== Compare measurements ==================== **************** Example analysis **************** We will analyze 69 individuals reported by `Xu et al. (2019) `_. The reported data includes the *CYP21A2* mutations, the lab measurement, and other clinical signs and symptoms encoded as HPO terms. TODO: wordsmith Load cohort =========== For the purpose of this analysis, we will load the :class:`~gpsea.model.Cohort` from a `JSON file `_. The cohort was prepared from phenopackets as described in :ref:`create-a-cohort` section, and then serialized as a JSON file following the instructions in :ref:`cohort-persistence` section. .. Prepare the JSON file by running the tests in `tests/tests/test_generate_doc_cohorts.py`. >>> import json >>> from gpsea.io import GpseaJSONDecoder >>> fpath_cohort_json = 'docs/cohort-data/CYP21A2.0.1.20.json' >>> with open(fpath_cohort_json) as fh: ... cohort = json.load(fh, cls=GpseaJSONDecoder) >>> len(cohort) 69 Configure analysis ================== *MANE* transcript of *CYP21A2*. >>> tx_id = 'NM_000500.9' Genotype predicate ------------------ For now, just Missense vs rest. Nonsense variants + big SVs, Missense should *NOT* be severe. TODO - create real predicate. >>> from gpsea.model import VariantEffect >>> from gpsea.analysis.predicate import variant_effect >>> is_missense = variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id=tx_id) >>> is_missense.description 'MISSENSE_VARIANT on NM_000500.9' Assuming AR inheritance, we compare missense vs. rest: >>> from gpsea.analysis.clf import biallelic_classifier >>> gt_clf = biallelic_classifier( ... a_predicate=is_missense, ... b_predicate=~is_missense, ... a_label="Missense", b_label="Other", ... partitions=({0,}, {1, 2}), ... ) >>> gt_clf.class_labels ('Missense/Missense', 'Missense/Other OR Other/Other') Phenotype score --------------- We investigate an association between a measurement value and a genotype group. We use the measurement of `Testosterone [Mass/volume] in Serum or Plasma `_ (`LOINC:2986-8`). >>> from gpsea.analysis.pscore import MeasurementPhenotypeScorer >>> testosterone = 'LOINC:2986-8' >>> pheno_scorer = MeasurementPhenotypeScorer.from_measurement_id( ... term_id=testosterone, ... label="Testosterone [Mass/volume] in Serum or Plasma", ... ) >>> pheno_scorer.description 'Value of Testosterone [Mass/volume] in Serum or Plasma [LOINC:2986-8]' Statistical test ---------------- We will use T test to test for differences between scores of the different genotype groups >>> from gpsea.analysis.pscore.stats import TTestStatistic >>> score_statistic = TTestStatistic() Final analysis -------------- We will put the final analysis together into :class:`~gpsea.analysis.pscore.PhenotypeScoreAnalysis`. >>> from gpsea.analysis.pscore import PhenotypeScoreAnalysis >>> score_analysis = PhenotypeScoreAnalysis( ... score_statistic=score_statistic, ... ) Analysis ======== We execute the analysis by running >>> result = score_analysis.compare_genotype_vs_phenotype_score( ... cohort=cohort, ... gt_clf=gt_clf, ... pheno_scorer=pheno_scorer, ... ) >>> result.pval 0.741216622359659 Show data frame with scores >>> scores = result.data.sort_index() >>> scores.head() # doctest: +NORMALIZE_WHITESPACE genotype phenotype patient_id individual 10[PMID_30968594_individual_10] 1 614.0 individual 11[PMID_30968594_individual_11] 1 630.0 individual 12[PMID_30968594_individual_12] 1 NaN individual 13[PMID_30968594_individual_13] 1 303.0 individual 14[PMID_30968594_individual_14] 1 664.0 Prepare genotype category legend: >>> gt_id_to_name = {c.category.cat_id: c.category.name for c in gt_clf.get_categorizations()} >>> gt_id_to_name {0: 'Missense/Missense', 1: 'Missense/Other OR Other/Other'} TODO: wordsmith & finish!