Certificate of Completion


THIS ACKNOWLEDGES THAT

Ayush Khaitan

HAS COMPLETED THE FALL 2022 DATA SCIENCE BOOT CAMP

Roman Holowinsky, PhD
DIRECTOR

DECEMBER 14, 2022
DATE


TEAM

Pine

Erika Ordog, Richard van Krieken, Ayush Khaitan


In this project, we develop a model to predict and rank the thermostability of enzyme variants from experimental melting temperature data. We use a dataset provided by Novozymes through their Kaggle competition, "Novozymes Enzyme Stability Prediction". The dataset provides experimentally measured thermostability (melting temperature) data, natural enzyme sequences, and engineered sequences with single or multiple mutations relative to the natural sequences.

We identify three predictive frameworks for the project and perform exploratory data analysis with each. After initial optimization, we summarize the performance of each framework and select the best-performing one using Normalized Root Mean Squared Error and the Spearman correlation coefficient, then optimize the selected framework through cross-validation. Finally, we prepare a project report with visualizations.
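As a rough illustration of the two selection metrics named above, the sketch below computes both in plain Python. It is not the team's code: the function names, the sample melting temperatures, and the choice to normalize RMSE by the range of the observed values are all assumptions made for the example (normalizing by the mean or standard deviation is also common).

```python
import math


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values.

    Assumption: range normalization; the project may have used another convention.
    """
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / (max(y_true) - min(y_true))


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical melting temperatures (degrees Celsius), not taken from the dataset.
tm_true = [48.0, 55.5, 62.1, 70.3]
tm_pred = [50.0, 54.0, 65.0, 68.0]

print("NRMSE:", nrmse(tm_true, tm_pred))
print("Spearman:", spearman(tm_true, tm_pred))
```

The two metrics answer different questions: NRMSE penalizes errors in the predicted temperature values, while the Spearman coefficient depends only on whether the variants are ranked in the right order, so a model can score well on one and poorly on the other.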

github URL