gpsea.analysis package

exception gpsea.analysis.AnalysisException(data: Mapping[str, Any], *args)

Bases: Exception

Reports analysis issues that need the user’s attention.

To aid troubleshooting, the exception includes data, a mapping with any data that had been computed before the issue was encountered.

property data: Mapping[str, Any]: Get a mapping with (partial) data to aid troubleshooting.

class gpsea.analysis.AnalysisResult(gt_clf: GenotypeClassifier, statistic: Statistic)

Bases: object

AnalysisResult includes the parts common to the results of all analyses.

property gt_clf: GenotypeClassifier: Get the genotype classifier used in the analysis that produced this result.

property statistic: Statistic: Get the statistic which computed the (nominal) p values for this result.

class gpsea.analysis.MonoPhenotypeAnalysisResult(gt_clf: GenotypeClassifier, phenotype: Partitioning, statistic: Statistic, data: DataFrame, statistic_result: StatisticResult)

Bases: AnalysisResult

MonoPhenotypeAnalysisResult reports the outcome of an analysis that tested a single genotype-phenotype association.

DATA_COLUMNS = ('genotype', 'phenotype'): The required columns of the data data frame.

GT_COL = 'genotype': Name of column for storing genotype data.

PH_COL = 'phenotype': Name of column for storing phenotype data.

SAMPLE_ID = 'patient_id': Name of the data index.

complete_records() → DataFrame[source]: Get the data rows where both genotype and phenotype columns are available (i.e. not None or NaN).

property data: DataFrame

Get the data frame with genotype and phenotype values for each tested individual.

The index of the data frame contains the identifiers of the tested individuals, and the values are stored in genotype and phenotype columns.

The genotype column includes the genotype category ID (cat_id), or None if the individual could not be assigned to a genotype group. The phenotype column contains the phenotype values, and their data type depends on the analysis.

Here are some common phenotype data types:

a phenotype score computed in PhenotypeScoreAnalysis is a float
survival computed in SurvivalAnalysis is of type Survival
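The layout of data and the effect of complete_records() can be sketched with pandas. The column and index names come from the documented GT_COL, PH_COL, and SAMPLE_ID constants; the patient IDs and values are made up, and the free function complete_records is a hypothetical reimplementation that assumes the method simply drops rows with a missing value in either column.

```python
import math

import pandas as pd

# Column and index names as documented for MonoPhenotypeAnalysisResult.
GT_COL = "genotype"
PH_COL = "phenotype"
SAMPLE_ID = "patient_id"

# Hypothetical per-individual data: genotype category IDs (or None) and
# float phenotype scores (or NaN). dtype=object keeps None as-is instead
# of converting the integer category IDs to floats.
data = pd.DataFrame(
    {
        GT_COL: [0, 1, None, 1],
        PH_COL: [2.5, math.nan, 1.0, 4.0],
    },
    index=pd.Index(["P1", "P2", "P3", "P4"], name=SAMPLE_ID),
    dtype=object,
)


def complete_records(df: pd.DataFrame) -> pd.DataFrame:
    # Keep rows where both genotype and phenotype are present;
    # dropna treats both None and NaN as missing.
    return df.dropna(subset=[GT_COL, PH_COL])


print(complete_records(data))  # only P1 and P4 remain
```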

property phenotype: Partitioning: Get the Partitioning that produced the phenotype.

property pval: float: Get the p value of the test.

statistic_result() → StatisticResult[source]: Get statistic result with the nominal p value and the associated statistics.

class gpsea.analysis.MultiPhenotypeAnalysisResult(gt_clf: GenotypeClassifier, pheno_clfs: Iterable[PhenotypeClassifier[P]], statistic: Statistic, n_usable: Sequence[int], all_counts: Sequence[DataFrame], statistic_results: Sequence[StatisticResult | None], corrected_pvals: Sequence[float] | None, mtc_correction: str | None)[source]

Bases: Generic[P], AnalysisResult

MultiPhenotypeAnalysisResult reports the outcome of an analysis that tested the association of genotype with two or more phenotypes.

property all_counts: Sequence[DataFrame]

Get a sequence of data frames, each of which includes the counts of patients in the genotype and phenotype groups.

Here is an example for a genotype predicate that bins individuals into two categories (Yes and No) based on the presence of a missense variant in transcript NM_123456.7, and a phenotype predicate that checks the presence or absence of HP:0001166 (a phenotype term):

           Has MISSENSE_VARIANT in NM_123456.7
           No       Yes
Present
Yes        1        13
No         7        5

The rows correspond to the phenotype categories, and the columns represent the genotype categories.
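The layout of one such data frame can be reproduced with pandas.crosstab from per-patient labels. This is only an illustration of the table shape, not how gpsea computes all_counts; the per-patient labels are constructed to match the counts in the example above.

```python
import pandas as pd

GENOTYPE = "Has MISSENSE_VARIANT in NM_123456.7"
PHENOTYPE = "Present"

# (genotype category, phenotype category, number of patients),
# chosen to reproduce the counts in the example table.
groups = [("No", "Yes", 1), ("Yes", "Yes", 13), ("No", "No", 7), ("Yes", "No", 5)]
genotypes = [gt for gt, _, n in groups for _ in range(n)]
phenotypes = [ph for _, ph, n in groups for _ in range(n)]

counts = pd.crosstab(
    pd.Series(phenotypes, name=PHENOTYPE),  # rows: phenotype categories
    pd.Series(genotypes, name=GENOTYPE),    # columns: genotype categories
).reindex(index=["Yes", "No"], columns=["No", "Yes"])  # match the example order

print(counts)
```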

property corrected_pvals: Sequence[float] | None: Get a sequence with p values for each tested phenotype after multiple testing correction or None if the correction was not applied. The sequence includes a NaN value for each phenotype that was not tested.

property mtc_correction: str | None: Get name/code of the used multiple testing correction (e.g. fdr_bh for Benjamini-Hochberg) or None if no correction was applied.

n_significant_for_alpha(alpha: float = 0.05) → int | None

Get the count of corrected p values that are less than or equal to alpha.

Parameters: alpha – a float with the significance level.
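The counting rule can be sketched as a hypothetical free function over the corrected p values. Returning None when no correction was applied is an assumption based on the int | None return type, not confirmed by the text above.

```python
import math
from typing import Optional, Sequence


def n_significant_for_alpha(
    corrected_pvals: Optional[Sequence[float]], alpha: float = 0.05
) -> Optional[int]:
    # Assumed: no multiple testing correction was applied, so nothing to count.
    if corrected_pvals is None:
        return None
    # Any comparison with NaN is False, so untested phenotypes are never counted.
    return sum(1 for p in corrected_pvals if p <= alpha)


print(n_significant_for_alpha([0.01, 0.05, 0.2, math.nan]))  # 2
print(n_significant_for_alpha(None))  # None
```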

property n_usable: Sequence[int]: Get a sequence of numbers of patients where the phenotype was assessable, and are, thus, usable for genotype-phenotype correlation analysis.

property pheno_clfs: Sequence[PhenotypeClassifier[P]]: Get the phenotype classifiers used in the analysis.

property phenotypes: Sequence[P]: Get the phenotypes that were tested for association with genotype in the analysis.

property pvals: Sequence[float]: Get a sequence of nominal p values for each tested phenotype. The sequence includes a NaN value for each phenotype that was not tested.

significant_phenotype_indices(alpha: float = 0.05, pval_kind: Literal['corrected', 'nominal'] = 'corrected') → Sequence[int] | None: Get the indices of the phenotypes that attain significance for the provided alpha.
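How pval_kind selects between the nominal and corrected p values can be sketched as a hypothetical free function. The "less than or equal to alpha" threshold is borrowed from the documented n_significant_for_alpha, and returning None when corrected p values are requested but unavailable is an assumption based on the Sequence[int] | None return type.

```python
import math
from typing import Literal, Optional, Sequence


def significant_phenotype_indices(
    pvals: Sequence[float],
    corrected_pvals: Optional[Sequence[float]],
    alpha: float = 0.05,
    pval_kind: Literal["corrected", "nominal"] = "corrected",
) -> Optional[Sequence[int]]:
    if pval_kind == "corrected":
        chosen = corrected_pvals
    elif pval_kind == "nominal":
        chosen = pvals
    else:
        raise ValueError(f"Unsupported pval_kind: {pval_kind}")
    if chosen is None:
        # Corrected p values were requested, but no correction was applied.
        return None
    # NaN (untested phenotype) never satisfies the comparison.
    return [i for i, p in enumerate(chosen) if p <= alpha]


nominal = [0.01, 0.03, math.nan, 0.2]
corrected = [0.03, 0.06, math.nan, 0.2]
print(significant_phenotype_indices(nominal, corrected))                      # [0]
print(significant_phenotype_indices(nominal, corrected, pval_kind="nominal"))  # [0, 1]
```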

property statistic_results: Sequence[StatisticResult | None]: Get a sequence of StatisticResult items with nominal p values and the associated statistic values for each tested phenotype or None for the untested phenotypes.

property total_tests: int: Get total count of genotype-phenotype associations that were tested in this analysis.

class gpsea.analysis.Statistic(name: str)

Bases: object

Mixin for classes that are used to compute a nominal p value for a genotype-phenotype association.

property name: str: Get the name of the statistic (e.g. Fisher Exact Test, Logrank test).

class gpsea.analysis.StatisticResult(statistic: int | float | None, pval: float)

Bases: object

StatisticResult reports the result of a Statistic.

It includes a statistic (optional) and a corresponding p value. The p value can be NaN if it is impossible to compute for a given dataset.

Raises an AssertionError for an invalid input.

property pval: float: Get a p value (a value or a NaN).

property statistic: float | None: Get a float with the test statistic or None if not available.

class gpsea.analysis.Partitioning

Bases: Summarizable

Partitioning is a superclass of all classes that assign an individual to a group or compute a score or survival for an individual.

abstract property description: str: Get a description of the partitioning.

abstract property name: str: Get the name of the partitioning.

summarize(out: TextIO)[source]: Summarize the item while also considering other (default None).

abstract property variable_name: str

Get a str with the name of the variable investigated by the partitioning.

For instance, Sex, Allele groups, HP:0001250, or OMIM:256000.
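A concrete partitioning supplies the three abstract properties. The sketch below uses a hypothetical abstract base that mirrors the documented name, description, and variable_name properties, plus a hypothetical subclass for the Sex variable mentioned above. The summarize output format is invented; gpsea's real summaries may look different.

```python
import abc
import io
import typing


class PartitioningSketch(metaclass=abc.ABCMeta):
    """Hypothetical stand-in for the documented Partitioning interface."""

    @property
    @abc.abstractmethod
    def name(self) -> str: ...

    @property
    @abc.abstractmethod
    def description(self) -> str: ...

    @property
    @abc.abstractmethod
    def variable_name(self) -> str: ...

    def summarize(self, out: typing.TextIO):
        # Write the three documented properties, one per line.
        out.write(f"{self.name}\n{self.description}\nVariable: {self.variable_name}\n")


class SexPartitioningSketch(PartitioningSketch):
    """Hypothetical partitioning that groups individuals by sex."""

    @property
    def name(self) -> str:
        return "Sex partitioning"

    @property
    def description(self) -> str:
        return "Assigns an individual to a group based on their sex"

    @property
    def variable_name(self) -> str:
        return "Sex"


buf = io.StringIO()
SexPartitioningSketch().summarize(buf)
print(buf.getvalue())
```

Declaring the properties abstract means a subclass that forgets one of them cannot be instantiated, which catches incomplete partitionings early.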

class gpsea.analysis.ContinuousPartitioning

Bases: Partitioning

ContinuousPartitioning computes a score that is a real number.

The class is just a marker class at this time.

class gpsea.analysis.Summarizable

Bases: object

A mixin for entities that can summarize themselves into a provided IO handle.

abstractmethod summarize(out: TextIO)

Summarize the item into the provided IO handle.

Parameters: out – an IO handle to write into.

summary() → str[source]: Get the summary.

Subpackages