gpsea.analysis.pscore package
- class gpsea.analysis.pscore.PhenotypeScorer
Bases: ContinuousPartitioning
PhenotypeScorer assigns a phenotype score to a patient.
The score is math.nan if it cannot be computed for a patient.
The scorer can be created by wrapping a scoring function (see wrap_scoring_function()).
- static wrap_scoring_function(func: Callable[[Patient], float], name: str = 'Custom Scoring Function') → PhenotypeScorer
Create a PhenotypeScorer by wrapping the provided scoring function func.
The function must take exactly one argument of type Patient and return a float with the corresponding phenotype score.
Example
>>> from gpsea.analysis.pscore import PhenotypeScorer
>>> def f(p):
...     return 123.4
>>> phenotype_scorer = PhenotypeScorer.wrap_scoring_function(f)
phenotype_scorer will assign all patients a score of 123.4.
- Parameters:
func – the scoring function.
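A scoring function only needs to accept one patient and return a float, using math.nan when no score can be computed. The sketch below is a minimal illustration under stated assumptions: ToyPatient is a hypothetical stand-in for gpsea's Patient (it only carries a list of observed term IDs), and score_by_term_count is a hypothetical function, not part of gpsea.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyPatient:
    """Hypothetical stand-in for gpsea's Patient, for illustration only."""
    patient_id: str
    # None means "phenotype not assessed"; [] means "assessed, nothing observed".
    observed_terms: Optional[List[str]] = None


def score_by_term_count(patient: ToyPatient) -> float:
    """Hypothetical scoring function: the number of observed terms.

    Returns math.nan when the phenotype was not assessed, following the
    convention that a score is NaN when it cannot be computed.
    """
    if patient.observed_terms is None:
        return math.nan
    return float(len(patient.observed_terms))


# Same shape as Callable[[Patient], float], with ToyPatient in place of Patient.
scoring_function: Callable[[ToyPatient], float] = score_by_term_count

print(score_by_term_count(ToyPatient("patient_1", ["HP:0012443"])))  # 1.0
print(score_by_term_count(ToyPatient("patient_2")))  # nan
```

After adapting the body to read from a real Patient, such a function could be passed to PhenotypeScorer.wrap_scoring_function(), optionally with a descriptive name.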
- class gpsea.analysis.pscore.PhenotypeScoreAnalysis(score_statistic: PhenotypeScoreStatistic)
Bases: object
PhenotypeScoreAnalysis tests the association between two or more genotype classes and a phenotype score.
A genotype class is assigned by a GenotypeClassifier and the phenotype score is computed with a PhenotypeScorer. The association is tested with a PhenotypeScoreStatistic and the results are reported as a PhenotypeScoreAnalysisResult.
- compare_genotype_vs_phenotype_score(cohort: Iterable[Patient], gt_clf: GenotypeClassifier, pheno_scorer: PhenotypeScorer) → PhenotypeScoreAnalysisResult
Compute the association between the genotype groups and the phenotype score.
- Parameters:
cohort – the cohort to analyze.
gt_clf – a classifier for assigning an individual to a genotype class.
pheno_scorer – the scorer that computes the phenotype score.
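The method is documented here only through its inputs and output, so the following is a conceptual sketch, not gpsea's code. Plain callables stand in for GenotypeClassifier and PhenotypeScorer, and a hand-rolled Mann-Whitney U statistic (no p-value) stands in for a PhenotypeScoreStatistic; tabulate, group_scores and mann_whitney_u are hypothetical helpers. Skipping rows with a None genotype or a NaN score is an assumption.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# (patient_id, genotype category, phenotype score), like one row of `data`
Row = Tuple[str, Optional[int], float]


def tabulate(
    patient_ids: Iterable[str],
    classify: Callable[[str], Optional[int]],
    score: Callable[[str], float],
) -> List[Row]:
    """Hypothetical helper: one row per patient."""
    return [(pid, classify(pid), score(pid)) for pid in patient_ids]


def group_scores(rows: Iterable[Row]) -> Dict[int, List[float]]:
    """Collect usable scores per genotype category.

    Assumption: rows without a genotype category (None) or with a NaN
    score cannot inform the test, so they are skipped.
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for _pid, genotype, phenotype in rows:
        if genotype is None or math.isnan(phenotype):
            continue
        groups[genotype].append(phenotype)
    return dict(groups)


def mann_whitney_u(x: List[float], y: List[float]) -> float:
    """U statistic of x vs y: count pairs where x is larger, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical toy cohort.
genotypes = {"patient_1": 0, "patient_2": 0, "patient_3": None,
             "patient_4": 1, "patient_5": 1, "patient_6": 0}
scores = {"patient_1": 1.0, "patient_2": math.nan, "patient_3": 2.0,
          "patient_4": 2.0, "patient_5": 3.0, "patient_6": 3.0}

rows = tabulate(genotypes, lambda pid: genotypes[pid], lambda pid: scores[pid])
groups = group_scores(rows)
print(groups)  # {0: [1.0, 3.0], 1: [2.0, 3.0]}
print(mann_whitney_u(groups[0], groups[1]))  # 1.5
```

The two U values of a pair of groups always sum to the product of the group sizes, which is a quick sanity check for a hand-written statistic.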
- class gpsea.analysis.pscore.PhenotypeScoreAnalysisResult(gt_clf: GenotypeClassifier, phenotype: PhenotypeScorer, statistic: Statistic, data: DataFrame, statistic_result: StatisticResult)
Bases: MonoPhenotypeAnalysisResult
PhenotypeScoreAnalysisResult is a container for PhenotypeScoreAnalysis results.
The data property provides a data frame with the phenotype score of each tested individual:
patient_id   genotype   phenotype
patient_1    0          1
patient_2    0          nan
patient_3    None       2
patient_4    1          2
…            …          …
The DataFrame index holds the identifiers of the tested individuals, and the values are stored in the genotype and phenotype columns.
The genotype column contains the genotype category ID (cat_id), or None if the patient cannot be assigned to any genotype category.
The phenotype column contains a float with the phenotype score, or NaN if the phenotype score cannot be computed.
- phenotype_scorer() → PhenotypeScorer
Get the scorer that computed the phenotype score.
- plot_boxplots(ax, colors: Sequence[str] = ('#990F0F', '#A72929', '#B64343', '#C45D5D', '#D27676', '#E19090', '#EFAAAA'), median_color: str = '#00aaff', **boxplot_kwargs)
Draw a box plot with the distributions of phenotype scores for the genotype groups.
- Parameters:
ax – the Matplotlib Axes to draw on.
colors – a sequence with the color palette for the box plot patches.
median_color – a str with the color of the box plot median line.
boxplot_kwargs – arguments to pass into the matplotlib.axes.Axes.boxplot() function.
- plot_violins(ax, colors: Sequence[str] = ('#990F0F', '#A72929', '#B64343', '#C45D5D', '#D27676', '#E19090', '#EFAAAA'), **violinplot_kwargs)
Draw a violin plot with the distributions of phenotype scores for the genotype groups.
- Parameters:
ax – the Matplotlib Axes to draw on.
colors – a sequence with the color palette for the violin patches.
violinplot_kwargs – arguments to pass into the matplotlib.axes.Axes.violinplot() function.
- class gpsea.analysis.pscore.CountingPhenotypeScorer(hpo: MinimalOntology, query: Iterable[TermId])
Bases: PhenotypeScorer
CountingPhenotypeScorer assigns the patient a phenotype score equal to the number of query terms for which the patient has an observed phenotype that is either an exact match of the query term or one of its descendants.
For instance, we may want to count how many of four abnormality groups (brain, liver, kidney, and skin) an individual has. In this case, the query would include the corresponding terms (e.g., Abnormal brain morphology HP:0012443), and an individual can have between 0 and 4 phenotype group abnormalities. This scorer is intended to be used with the Mann-Whitney U test.
- static from_query_curies(hpo: MinimalOntology, query: Iterable[TermId | str])
Create a scorer to test for the number of phenotype terms that fall into the phenotype groups.
- Parameters:
hpo – HPO as represented by
MinimalOntologyof HPO toolkit.query – an iterable of the top-level terms, either represented as CURIEs (str) or as term IDs.
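The counting rule described for CountingPhenotypeScorer can be sketched with a toy ontology. This is a minimal illustration, not gpsea's implementation: PARENTS, ancestors() and counting_score() are hypothetical, the term IDs are placeholders rather than real HPO terms, and the real scorer walks a MinimalOntology from HPO toolkit instead. It assumes, in line with the 0 to 4 range in the example, that each query term adds at most 1 no matter how many matching observed terms a patient has.

```python
from typing import Dict, Iterable, Set

# Hypothetical toy ontology: term -> its direct parents (placeholder IDs, not HPO).
PARENTS: Dict[str, Set[str]] = {
    "X:brain": set(),
    "X:brain_a": {"X:brain"},
    "X:brain_a1": {"X:brain_a"},
    "X:liver": set(),
    "X:liver_a": {"X:liver"},
    "X:kidney": set(),
    "X:skin": set(),
    "X:other": set(),
}


def ancestors(term: str) -> Set[str]:
    """All ancestors of `term`, excluding the term itself."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, ()))
    return seen


def counting_score(observed: Iterable[str], query: Iterable[str]) -> int:
    """Count query terms matched by at least one observed term.

    A match is an exact hit or a descendant of the query term. Each query
    term contributes at most 1, so two brain findings still count once.
    """
    observed = list(observed)
    count = 0
    for q in query:
        if any(t == q or q in ancestors(t) for t in observed):
            count += 1
    return count


query = ["X:brain", "X:liver", "X:kidney", "X:skin"]
print(counting_score(["X:brain_a1", "X:brain_a", "X:liver_a", "X:other"], query))  # 2
print(counting_score([], query))  # 0
```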
- class gpsea.analysis.pscore.DeVriesPhenotypeScorer(hpo: MinimalOntology)
Bases: PhenotypeScorer
DeVriesPhenotypeScorer computes the “adapted De Vries Score” as described in Feenstra et al.
See the De Vries Score section for more details.
- class gpsea.analysis.pscore.MeasurementPhenotypeScorer(term_id: str | TermId, label: str)
Bases: PhenotypeScorer
MeasurementPhenotypeScorer uses the value of a measurement as a phenotype score, for instance, the amount of Testosterone [Mass/volume] in Serum or Plasma.
Example
Create a scorer that uses the level of testosterone represented by the Testosterone [Mass/volume] in Serum or Plasma LOINC code as a phenotype score.
>>> from gpsea.analysis.pscore import MeasurementPhenotypeScorer
>>> pheno_scorer = MeasurementPhenotypeScorer.from_measurement_id(
...     term_id="LOINC:2986-8",
...     label="Testosterone [Mass/volume] in Serum or Plasma",
... )
>>> # use the scorer in the analysis ...
- static from_measurement_id(term_id: str | TermId, label: str) → MeasurementPhenotypeScorer
Create MeasurementPhenotypeScorer from a measurement identifier.
- Parameters:
term_id – a str with a CURIE or a TermId representing the term ID of a measurement (e.g. LOINC:2986-8).
label – a str with the measurement label (e.g. Testosterone [Mass/volume] in Serum or Plasma).
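How a measurement becomes a score can be pictured with a small stand-in. The sketch assumes, without confirmation from this page, that the scorer returns the patient's value for the requested term ID and math.nan when the patient has no such measurement, following the NaN convention of PhenotypeScorer. ToyPatient and measurement_score() are hypothetical, and the measured value is made up.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyPatient:
    """Hypothetical stand-in: measurement values keyed by CURIE (e.g. a LOINC code)."""
    patient_id: str
    measurements: Dict[str, float] = field(default_factory=dict)


def measurement_score(patient: ToyPatient, term_id: str) -> float:
    """Use the measured value as the score, or NaN if it was not measured."""
    return patient.measurements.get(term_id, math.nan)


TESTOSTERONE = "LOINC:2986-8"  # Testosterone [Mass/volume] in Serum or Plasma
measured = ToyPatient("patient_1", {TESTOSTERONE: 4.2})  # made-up value
unmeasured = ToyPatient("patient_2")

print(measurement_score(measured, TESTOSTERONE))  # 4.2
print(measurement_score(unmeasured, TESTOSTERONE))  # nan
```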