VICI: in silico controls for PPI variants
As part of my project at SickKids Research Institute testing AlphaFold-Multimer’s capabilities in detecting the effects of missense variants at protein-protein interfaces (PPIs), I needed to find variants of known effect (pathogenic and benign), restricted to those that occur at PPI-encoding sequences.
This meant I needed three sets of data:
- Pathogenic/likely pathogenic variants. If AF-M can detect the impact of variants on PPIs, we’d expect these to have some effect on its predicted structures (positive control).
- Benign/likely benign variants. Similarly, we’d expect these variants to have no effect on PPIs/structures predicted by AF-M (negative control).
- Variants known to occur in PPI-encoding sequences. Since I was only interested in variants directly involved in PPIs, I needed to restrict the variants in the first two bullets to the ones that also appear in this set.
In my research, I was able to easily get the first two sets of data from ClinVar. PIONEER also offers a dataset of human variants occurring at PPI-encoding sequences. I packaged the code I built for this task into vici, which takes data from both of these sources (as specified by the user) and outputs a JSON file of positive and negative controls.
./vici.sh \
-B [path/to/benign/variant/table] \
-P [path/to/pathogenic/variant/table] \
-O [output_folder_name]
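The core idea behind the filtering step can be sketched in a few lines of Python. This is only an illustration of the set logic described above, not vici's actual implementation: the function name `build_controls`, the `(gene, protein change)` tuple representation, the JSON keys, and the toy records are all hypothetical, and the real ClinVar and PIONEER tables need their own parsing.

```python
import json


def build_controls(pathogenic, benign, ppi_variants):
    """Keep only the labeled variants that also occur in the PPI variant set."""
    ppi = set(ppi_variants)
    return {
        # Pathogenic variants at PPI sites: expected to perturb the interface.
        "positive_controls": sorted(v for v in set(pathogenic) if v in ppi),
        # Benign variants at PPI sites: expected to leave the interface intact.
        "negative_controls": sorted(v for v in set(benign) if v in ppi),
    }


# Hypothetical toy data: (gene, protein change) pairs,
# not real ClinVar or PIONEER records.
pathogenic = [("GENE1", "A10V"), ("GENE2", "R50W")]
benign = [("GENE1", "G12S"), ("GENE3", "L7P")]
ppi_variants = [("GENE1", "A10V"), ("GENE1", "G12S"), ("GENE4", "K3E")]

controls = build_controls(pathogenic, benign, ppi_variants)
print(json.dumps(controls, indent=2))
```

Using a set for the PPI variants keeps each membership check constant-time, which matters when the labeled variant tables are large.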
While I used this to test AlphaFold, it may find other uses in obtaining in silico controls for further study of PPI-perturbing variants. A full description of the pipeline can be found in the README documentation for the tool.
vici currently only accepts ClinVar search data as inputs, which may explain the 0 stars on this repo. However, that means there’s always opportunity for improvement!