
Democratizing protein 3D structure data across machine learning disciplines. Batteries included.

ProteinShake provides one-liner imports of large-scale, preprocessed protein structure datasets for various model types and frameworks. You bring the model; we bring the data, together with annotations, splits, and metrics.

pip install proteinshake
Uniting RCSB PDB, AlphaFold DB, and annotated databases
Protein- and residue-level labels for classification and regression
Automatic conversion to graphs, voxels and point clouds
Support for all major frameworks
Task API for benchmarking
Data splits based on sequence and structure similarity
More than 500,000 structures
Atom- and residue-level resolution
Preprocessed and hosted
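
To make the three representations in the list above concrete, here is a minimal sketch in plain Python. It is not ProteinShake's implementation or API: the function names (`to_point_cloud`, `to_graph`, `to_voxels`), the toy coordinates, and the 5 angstrom cutoff and grid size are all assumptions chosen for illustration. It only shows what "graph", "voxel" and "point cloud" can mean for a set of residue coordinates.

```python
import math
from collections import defaultdict

# Hypothetical toy input: one (x, y, z) coordinate per residue, in angstroms.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
]

def to_point_cloud(coords):
    """A point cloud is simply the set of residue coordinates."""
    return [tuple(c) for c in coords]

def to_graph(coords, eps):
    """Connect every pair of residues closer than `eps` angstroms."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < eps:
                edges.append((i, j))
    return edges

def to_voxels(coords, size):
    """Bin residue indices into cubic grid cells with edge length `size`."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        cell = (math.floor(x / size), math.floor(y / size), math.floor(z / size))
        grid[cell].append(idx)
    return dict(grid)

point_cloud = to_point_cloud(coords)
edges = to_graph(coords, eps=5.0)      # assumed distance cutoff
voxels = to_voxels(coords, size=5.0)   # assumed voxel edge length

print(edges)   # [(0, 1), (1, 2), (2, 3)]
print(voxels)  # {(0, 0, 0): [0, 1], (1, 0, 0): [2, 3]}
```

In this toy chain only consecutive residues fall within the cutoff, so the graph is a simple path, while the voxel grid groups the four residues into two cells. The library performs these conversions for you; refer to the documentation for the actual calls and parameters.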



Beat the baselines?

Let people know! Create a pull request with your performance metrics in the GitHub repository. We will review your submission and publish it on the leaderboard. See also the submission guide.

I want to contribute!

We are very happy to integrate contributions from the community. Got a new dataset and want to share it? Need a new representation for your project that others could benefit from? You are more than welcome to add it to ProteinShake! Just create a pull request. The contribution guide has more information to help you understand the code structure.

Found a bug or want to suggest a feature?

Please let us know by opening an issue on GitHub. We are eager to improve ProteinShake and provide you with a great user experience. Feature requests are also welcome.