gpsea.analysis.predicate package
- class gpsea.analysis.predicate.VariantPredicate
Bases: Partitioning
VariantPredicate tests if a variant meets a certain criterion.
The subclasses MUST implement all abstract methods of this class plus __eq__ and __hash__, to support building the compound predicates.
We strongly recommend implementing __str__ and __repr__ as well.
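Why __eq__ and __hash__ matter can be seen in a small standalone sketch. The AffectsGene class below is hypothetical: it is not part of GPSEA, it does not subclass VariantPredicate, and it models a variant as a plain set of gene symbols. It only illustrates that value-based equality and hashing let equivalent predicates be deduplicated in sets or used as dictionary keys, which is the kind of bookkeeping compound predicates are assumed to need.

```python
from typing import FrozenSet


class AffectsGene:
    """Hypothetical stand-in for a predicate; NOT a GPSEA class."""

    def __init__(self, symbol: str):
        self._symbol = symbol

    def test(self, genes_hit: FrozenSet[str]) -> bool:
        # The toy "variant" is just the set of gene symbols it affects.
        return self._symbol in genes_hit

    def __eq__(self, other: object) -> bool:
        # Two predicates are equal when they test for the same gene.
        return isinstance(other, AffectsGene) and self._symbol == other._symbol

    def __hash__(self) -> int:
        # Must agree with __eq__: equal objects produce equal hashes.
        return hash((type(self).__name__, self._symbol))

    def __str__(self) -> str:
        return f"affects {self._symbol}"

    def __repr__(self) -> str:
        return f"AffectsGene(symbol={self._symbol!r})"


a = AffectsGene("SURF1")
b = AffectsGene("SURF1")
unique = {a, b, AffectsGene("SURF2")}

print(a == b)       # True
print(len(unique))  # 2
```

Without __eq__ and __hash__, Python falls back to identity-based comparison, and the set would keep both SURF1 predicates as distinct members.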
- gpsea.analysis.predicate.true() -> VariantPredicate
The most inclusive variant predicate - returns True for any variant whatsoever.
- gpsea.analysis.predicate.allof(predicates: Iterable[VariantPredicate]) -> VariantPredicate
Prepare a VariantPredicate that returns True if ALL predicates evaluate to True.
This is useful for building compound predicates programmatically.
Example
Build a predicate to test if the variant has a functional annotation to genes SURF1 and SURF2:
>>> from gpsea.analysis.predicate import allof, gene
>>> genes = ('SURF1', 'SURF2',)
>>> predicate = allof(gene(g) for g in genes)
>>> predicate.description
'(affects SURF1 AND affects SURF2)'
- Parameters:
predicates – an iterable of predicates to test
- gpsea.analysis.predicate.anyof(predicates: Iterable[VariantPredicate]) -> VariantPredicate
Prepare a VariantPredicate that returns True if ANY of the predicates evaluates to True.
This can be useful for building compound predicates programmatically.
Example
Build a predicate to test if variant leads to a missense or nonsense change on a fictional transcript NM_123456.7:
>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate import anyof, variant_effect
>>> tx_id = 'NM_123456.7'
>>> effects = (VariantEffect.MISSENSE_VARIANT, VariantEffect.STOP_GAINED,)
>>> predicate = anyof(variant_effect(e, tx_id) for e in effects)
>>> predicate.description
'(MISSENSE_VARIANT on NM_123456.7 OR STOP_GAINED on NM_123456.7)'
- Parameters:
predicates – an iterable of predicates to test
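Both allof and anyof accept any iterable, and the examples pass generator expressions. The sketch below shows the underlying idea with plain Python callables. All names in it (has_effect, any_of, all_of, the dict-based toy variant) are hypothetical and are not the GPSEA implementation. The one design point it illustrates is that a combinator should materialize its input once, because a generator can only be consumed a single time.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# A toy "variant" is a dict holding a set of effect names (an assumption of this sketch).
ToyVariant = Dict[str, Set[str]]
Check = Callable[[ToyVariant], bool]


def has_effect(effect: str) -> Check:
    """Hypothetical leaf check: does the toy variant carry the given effect?"""
    return lambda variant: effect in variant["effects"]


def any_of(checks: Iterable[Check]) -> Check:
    # Materialize the iterable once; a generator would be exhausted after the first test.
    frozen: Tuple[Check, ...] = tuple(checks)
    return lambda variant: any(check(variant) for check in frozen)


def all_of(checks: Iterable[Check]) -> Check:
    frozen: Tuple[Check, ...] = tuple(checks)
    return lambda variant: all(check(variant) for check in frozen)


effects = ("MISSENSE_VARIANT", "STOP_GAINED")
missense_or_nonsense = any_of(has_effect(e) for e in effects)
missense_and_nonsense = all_of(has_effect(e) for e in effects)

nonsense_only = {"effects": {"STOP_GAINED"}}
print(missense_or_nonsense(nonsense_only))   # True
print(missense_and_nonsense(nonsense_only))  # False
```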
- gpsea.analysis.predicate.variant_effect(effect: VariantEffect, tx_id: str) -> VariantPredicate
Prepare a VariantPredicate to test if the functional annotation predicts the variant to lead to a certain variant effect.
Example
Make a predicate for testing if the variant leads to a missense change on transcript NM_123.4:
>>> from gpsea.model import VariantEffect
>>> from gpsea.analysis.predicate import variant_effect
>>> predicate = variant_effect(VariantEffect.MISSENSE_VARIANT, tx_id='NM_123.4')
>>> predicate.description
'MISSENSE_VARIANT on NM_123.4'
- Parameters:
effect – the target VariantEffect
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
- gpsea.analysis.predicate.variant_key(key: str) -> VariantPredicate
Prepare a VariantPredicate that tests if the variant matches the provided key.
- Parameters:
key – a str with the variant key (e.g. X_12345_12345_C_G or 22_10001_20000_INV)
- gpsea.analysis.predicate.gene(symbol: str) -> VariantPredicate
Prepare a VariantPredicate that tests if the variant affects a given gene.
We recommend consulting the HUGO Gene Nomenclature Committee (HGNC) website to obtain the approved symbol for the gene of interest.
- Parameters:
symbol – a str with the approved gene symbol (e.g. FBN1).
- gpsea.analysis.predicate.transcript(tx_id: str) -> VariantPredicate
Prepare a VariantPredicate that tests if the variant affects a transcript.
- Parameters:
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
- gpsea.analysis.predicate.exon(exon: int, tx_id: str) -> VariantPredicate
Prepare a VariantPredicate that tests if the variant overlaps with an exon of a specific transcript.
Warning
We use 1-based numbering to number the exons, not the 0-based numbering usual in computer science. Therefore, the first exon of the transcript has exon==1, the second exon is 2, and so on …
Warning
We do not check whether exon exceeds the number of exons of the given tx_id! Therefore, exon==10,000 will effectively return False for all variants!!! 😱 Well, at least for the genome variants of the Homo sapiens sapiens taxon…
- Parameters:
exon – a positive int with the index of the target exon (e.g. 1 for the 1st exon, 2 for the 2nd, …)
tx_id – a str with the accession ID of the target transcript (e.g. NM_123.4)
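The two warnings can be made concrete with a toy model. The function below is a hypothetical sketch, not the GPSEA implementation: it assumes exons are given as a list of (start, end) pairs in 1-based inclusive coordinates and that the variant is a simple 1-based inclusive interval. It shows the 1-based exon index being translated to a 0-based list index, and an out-of-range exon index silently yielding False.

```python
from typing import Sequence, Tuple


def overlaps_exon(
    exons: Sequence[Tuple[int, int]],
    exon: int,
    start: int,
    end: int,
) -> bool:
    """Hypothetical check: does the interval [start, end] overlap the exon with 1-based index `exon`?"""
    if exon < 1:
        raise ValueError("exon must be a positive (1-based) index")
    if exon > len(exons):
        # No bounds error is raised: an index past the last exon just never matches.
        return False
    exon_start, exon_end = exons[exon - 1]  # 1-based index -> 0-based list position
    return start <= exon_end and end >= exon_start


# Three made-up exons in 1-based inclusive coordinates.
toy_exons = [(100, 200), (300, 400), (500, 650)]

print(overlaps_exon(toy_exons, 1, 150, 150))       # True: position 150 lies in the 1st exon
print(overlaps_exon(toy_exons, 2, 150, 150))       # False: the 2nd exon spans 300-400
print(overlaps_exon(toy_exons, 10_000, 150, 150))  # False: there is no such exon
```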
- gpsea.analysis.predicate.protein_region(region: Tuple[int, int] | Region, tx_id: str) -> VariantPredicate
Prepare a VariantPredicate that tests if the variant overlaps with a region on a protein of a specific transcript.
Example
Create a predicate to test if the variant overlaps with the 5th amino acid of the protein encoded by a fictional transcript NM_1234.5:
>>> from gpsea.analysis.predicate import protein_region
>>> overlaps_with_fifth_aa = protein_region(region=(5, 5), tx_id="NM_1234.5")
>>> overlaps_with_fifth_aa.description
'overlaps with [5,5] region of the protein encoded by NM_1234.5'
Create a predicate to test if the variant overlaps with the first 20 amino acid residues of the same transcript:
>>> overlaps_with_first_20 = protein_region(region=(1, 20), tx_id="NM_1234.5")
>>> overlaps_with_first_20.description
'overlaps with [1,20] region of the protein encoded by NM_1234.5'
- Parameters:
region – a Region that gives the start and end coordinate of the region of interest on a protein strand or a tuple with 1-based coordinates.
- gpsea.analysis.predicate.is_large_imprecise_sv() -> VariantPredicate
Prepare a VariantPredicate for testing if the variant is a large structural variant (SV) without exact breakpoint coordinates.
- gpsea.analysis.predicate.is_structural_variant(threshold: int = 50) -> VariantPredicate
Prepare a VariantPredicate for testing if the variant is a structural variant (SV).
SVs are usually defined as variants affecting more than a certain number of base pairs. The thresholds vary in the literature, but here we use 50 bp as a default.
Any variant that affects at least threshold base pairs is considered an SV. Large SVs with unknown breakpoint coordinates or translocations (TRANSLOCATION) are always considered SVs.
- Parameters:
threshold – a non-negative int with the number of base pairs that must be affected
- gpsea.analysis.predicate.structural_type(curie: str | TermId) -> VariantPredicate
Prepare a VariantPredicate for testing if the variant has a certain structural type.
We recommend using a descendant of structural_variant (SO:0001537) as the structural type.
Example
Make a predicate for testing if the variant is a chromosomal deletion (SO:1000029):
>>> from gpsea.analysis.predicate import structural_type
>>> predicate = structural_type('SO:1000029')
>>> predicate.description
'structural type is SO:1000029'
- Parameters:
curie – compact uniform resource identifier (CURIE) with the structural type to test.
- gpsea.analysis.predicate.variant_class(variant_class: VariantClass) -> VariantPredicate
Prepare a VariantPredicate for testing if the variant is of a certain VariantClass.
Example
Make a predicate to test if the variant is a deletion:
>>> from gpsea.model import VariantClass
>>> from gpsea.analysis.predicate import variant_class
>>> predicate = variant_class(VariantClass.DEL)
>>> predicate.description
'variant class is DEL'
- Parameters:
variant_class – the variant class to test.
- gpsea.analysis.predicate.ref_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], length: int) -> VariantPredicate
Prepare a VariantPredicate for testing if the reference (REF) allele of the variant is above, below, or (not) equal to a certain length.
See also
See Length of the reference allele for more info.
Example
Prepare a predicate that tests that the REF allele includes more than 5 base pairs:
>>> from gpsea.analysis.predicate import ref_length
>>> predicate = ref_length('>', 5)
>>> predicate.description
'reference allele length > 5'
- Parameters:
operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.
length – a non-negative int with the length threshold.
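One common way to turn an operator string into a comparison is a lookup table over the standard-library operator module. The sketch below is an illustration of that technique only. The make_length_check helper is hypothetical, it is not how GPSEA is known to implement ref_length, and it assumes the REF allele is a plain nucleotide string.

```python
import operator
from typing import Callable, Dict

# Map each supported operator string to the matching comparison function.
_COMPARATORS: Dict[str, Callable[[int, int], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def make_length_check(op: str, length: int) -> Callable[[str], bool]:
    """Hypothetical helper: build a check comparing len(ref) against `length`."""
    if op not in _COMPARATORS:
        raise ValueError(f"Unsupported operator {op!r}")
    if length < 0:
        raise ValueError("length must be non-negative")
    compare = _COMPARATORS[op]
    return lambda ref: compare(len(ref), length)


longer_than_5 = make_length_check(">", 5)
print(longer_than_5("ACGTAC"))  # True: 6 bases
print(longer_than_5("ACGTA"))   # False: exactly 5 bases
```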
- gpsea.analysis.predicate.change_length(operator: Literal['<', '<=', '==', '!=', '>=', '>'], threshold: int) -> VariantPredicate
Prepare a VariantPredicate for testing if the variant’s change length is above, below, or (not) equal to a certain threshold.
See also
See Change length of an allele for more info.
Example
Make a predicate for testing if the change length is less than or equal to -10, e.g. to test if a variant is a deletion leading to removal of at least 10 base pairs:
>>> from gpsea.analysis.predicate import change_length
>>> predicate = change_length('<=', -10)
>>> predicate.description
'change length <= -10'
- Parameters:
operator – a str with the desired test. Must be one of { '<', '<=', '==', '!=', '>=', '>' }.
threshold – an int with the threshold. Can be negative, zero, or positive.
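The sign convention is easiest to see with numbers. The helper below is a hypothetical sketch, not the GPSEA implementation. It assumes both alleles are plain nucleotide strings and takes the change length to be the alternate allele length minus the reference allele length, which is consistent with the example above, where a deletion of at least 10 base pairs has a change length of -10 or less. Symbolic alleles such as <DEL> are out of scope here.

```python
def toy_change_length(ref: str, alt: str) -> int:
    """Hypothetical helper: ALT length minus REF length for sequence alleles."""
    return len(alt) - len(ref)


# Deletion of 10 bases: the anchor base 'A' is kept, ten 'C' bases are removed.
deletion = toy_change_length(ref="A" + "C" * 10, alt="A")
# Single nucleotide variant: the length is unchanged.
snv = toy_change_length(ref="C", alt="G")
# Insertion of 2 bases after the anchor base.
insertion = toy_change_length(ref="A", alt="ACG")

print(deletion)         # -10
print(snv)              # 0
print(insertion)        # 2
print(deletion <= -10)  # True: removes at least 10 base pairs
```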
- gpsea.analysis.predicate.is_structural_deletion(threshold: int = -50) -> VariantPredicate
Prepare a VariantPredicate for testing if the variant is a chromosomal deletion or a structural variant deletion that leads to removal of at least n base pairs (50 bp by default).
Note
The predicate uses change_length() to determine if the length of the variant is above or below threshold.
IMPORTANT: the change lengths of deletions are negative, since the alternate allele is shorter than the reference allele. See Change length of an allele for more info.
Example
Prepare a predicate for testing if the variant is a chromosomal deletion that removes at least 20 base pairs:
>>> from gpsea.analysis.predicate import is_structural_deletion
>>> predicate = is_structural_deletion(-20)
>>> predicate.description
'(structural type is SO:1000029 OR (variant class is DEL AND change length <= -20))'
- Parameters:
threshold – an int with the change length threshold to determine if a variant is “structural” (-50 bp by default).
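The description printed in the example spells out the logic: structural type SO:1000029, or variant class DEL combined with a change length at or below the threshold. The sketch below mirrors that boolean structure on a toy data model. ToyVariant and looks_like_structural_deletion are hypothetical names introduced for illustration and are not part of GPSEA.

```python
from dataclasses import dataclass
from typing import Optional

CHROMOSOMAL_DELETION = "SO:1000029"


@dataclass(frozen=True)
class ToyVariant:
    """Hypothetical, minimal variant record used only in this sketch."""
    variant_class: str                     # e.g. "DEL", "SNV", "INS"
    change_length: int                     # negative for deletions
    structural_type: Optional[str] = None  # CURIE for imprecise SVs, else None


def looks_like_structural_deletion(variant: ToyVariant, threshold: int = -50) -> bool:
    # Branch 1: the variant is annotated as a chromosomal deletion.
    is_chromosomal_deletion = variant.structural_type == CHROMOSOMAL_DELETION
    # Branch 2: a sequence deletion removing enough bases (change length is negative).
    is_long_deletion = (
        variant.variant_class == "DEL" and variant.change_length <= threshold
    )
    return is_chromosomal_deletion or is_long_deletion


small_del = ToyVariant(variant_class="DEL", change_length=-25)
print(looks_like_structural_deletion(small_del))                 # False: -25 > -50
print(looks_like_structural_deletion(small_del, threshold=-20))  # True: -25 <= -20
```

Note that a stricter deletion requirement means a more negative threshold, so passing a positive number by mistake would make the length test match every variant of class DEL.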
- gpsea.analysis.predicate.protein_feature_type(feature_type: FeatureType | str, protein_metadata: ProteinMetadata) -> VariantPredicate
Prepare a VariantPredicate to test if the variant affects a feature_type of a protein.
- Parameters:
feature_type – the target protein FeatureType (e.g. DOMAIN).
protein_metadata – the information about the protein.
- gpsea.analysis.predicate.protein_feature(feature_id: str, protein_metadata: ProteinMetadata) -> VariantPredicate
Prepare a VariantPredicate to test if the variant affects a protein feature labeled with the provided feature_id.
- Parameters:
feature_id – the id of the target protein feature (e.g. ANK 1)
protein_metadata – the information about the protein.