Democratizing protein 3D structure data across machine learning disciplines. Batteries included.
ProteinShake provides one-liner imports of large-scale, preprocessed protein structure datasets for various model types and frameworks. You bring the model, we bring the data, together with annotations, splits and metrics.
pip install proteinshake
Uniting RCSB PDB, AlphaFold DB, and annotated databases
Protein- and residue-level labels for classification and regression
Automatic conversion to graphs, voxels and point clouds
Support for all major frameworks
Task API for benchmarking
Data splits based on sequence and structure similarity
More than 500,000 structures
Atom- and residue-level resolution
Preprocessed and hosted
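To illustrate what converting a structure into graphs, voxels and point clouds involves, here is a minimal sketch in plain Python. It is not ProteinShake's implementation or API. The function names `to_radius_graph` and `to_voxels`, the input of one point per residue (for example its C-alpha atom), the 8 Å distance cutoff and the 10 Å voxel size are all illustrative assumptions. A point cloud is simply the list of coordinates itself.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: one 3D point per residue (e.g. its C-alpha atom), in Angstrom.
Coord = Tuple[float, float, float]


def to_radius_graph(coords: Sequence[Coord], eps: float) -> List[Tuple[int, int]]:
    """Epsilon-neighborhood graph: connect residues i < j closer than eps."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < eps:
                edges.append((i, j))
    return edges


def to_voxels(coords: Sequence[Coord], voxel_size: float) -> Dict[Tuple[int, ...], int]:
    """Sparse voxel grid: count residues per occupied cube of side voxel_size."""
    if not coords:
        return {}
    # Anchor the grid at the minimum corner so all voxel indices are non-negative.
    origin = [min(c[k] for c in coords) for k in range(3)]
    grid: Dict[Tuple[int, ...], int] = {}
    for c in coords:
        idx = tuple(int((c[k] - origin[k]) // voxel_size) for k in range(3))
        grid[idx] = grid.get(idx, 0) + 1
    return grid


if __name__ == "__main__":
    # Toy chain: three consecutive residues ~3.8 A apart plus one distant residue.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(to_radius_graph(coords, eps=8.0))    # [(0, 1), (0, 2), (1, 2)]
    print(to_voxels(coords, voxel_size=10.0))  # {(0, 0, 0): 3, (2, 0, 0): 1}
```

The pairwise loop is quadratic in the number of residues, which is fine for a sketch; a production pipeline would typically use a spatial index such as a k-d tree instead.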
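Splitting by similarity, rather than at random, keeps near-duplicate proteins from landing on both sides of a train/test split. The sketch below shows the general idea under loudly stated assumptions: `difflib.SequenceMatcher.ratio()` stands in for real alignment-based sequence identity (dedicated tools such as MMseqs2 or CD-HIT are the usual choice), and the helper names, the 0.5 threshold and the smallest-cluster-first filling rule are hypothetical. It is not how ProteinShake computes its splits.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    # Stand-in for alignment-based sequence identity (an assumption, not ProteinShake's metric).
    return SequenceMatcher(None, a, b).ratio()


def cluster(seqs: List[str], threshold: float) -> List[int]:
    """Single-linkage clustering with union-find: pairs at or above threshold share a cluster."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def cluster_split(seqs: List[str], threshold: float, test_fraction: float) -> Tuple[List[int], List[int]]:
    """Assign whole clusters to train or test, so no test sequence reaches threshold with a train one."""
    groups: Dict[int, List[int]] = {}
    for idx, label in enumerate(cluster(seqs, threshold)):
        groups.setdefault(label, []).append(idx)
    train: List[int] = []
    test: List[int] = []
    target = test_fraction * len(seqs)
    # Fill the test set with whole clusters, smallest first, without exceeding the target size.
    for members in sorted(groups.values(), key=len):
        if len(test) + len(members) <= target:
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GSHMLEDPVA", "GSHMLEDPVG", "WWWWCCCCPP"]
    print(cluster_split(seqs, threshold=0.5, test_fraction=0.4))  # ([0, 1, 2, 3], [4])
```

Because any two sequences at or above the threshold are merged into the same cluster, the split guarantees that every train/test pair scores below the threshold under the chosen similarity measure. A structure-based split follows the same pattern with a structural similarity score in place of `similarity`.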