
Certificate of Completion
THIS ACKNOWLEDGES THAT
Ayman Hussein
HAS COMPLETED THE SPRING 2026 DATA SCIENCE BOOT CAMP
DIRECTOR: Roman Holowinsky, PhD
DATE: MARCH 25, 2026

TEAM
A global model of DNA melting temperature
Ayman Hussein

This project develops a global, physics-based, data-driven, interpretable model to predict DNA melting temperature (Tm) under mixed monovalent (Na⁺) and divalent (Mg²⁺) salt conditions.
Using a dataset of 516 measurements across 12 sequences, we compare Linear Regression, Generalized Additive Models (GAM), and Decision Trees with multiple feature sets capturing sequence composition, length, and salt interactions.
A leave-one-sequence-out cross-validation (CV) strategy reveals that a GAM with minimal “typical” features and salt interactions achieves the best balance between accuracy and generalization, with a mean absolute error (MAE) of roughly 1–2 °C, comparable to state-of-the-art models.
Unlike prior approaches requiring multiple formulas or extensive data, our model provides a single, global, and physically interpretable framework, demonstrating that nonlinear salt effects can be captured with a compact feature set and limited data.
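
As an illustration of the evaluation idea described above, the following is a minimal sketch of a compact feature vector and of holding out one sequence at a time, written with the Python standard library only. The function names, the specific features (GC fraction, inverse length, log salt concentrations and their product), the toy sequence labels, and the toy Tm values are all assumptions made for the example; they are not the project's actual features or data. A training-set mean stands in for the fitted model, so the sketch shows only the splitting and scoring logic, not a GAM.

```python
import math
from collections import defaultdict


def salt_features(sequence, na_molar, mg_molar):
    """Hypothetical compact feature vector for one measurement.

    Assumes both salt concentrations are strictly positive (in mol/L),
    since the logarithm is undefined at zero.
    """
    seq = sequence.upper()
    length = len(seq)
    ln_na = math.log(na_molar)
    ln_mg = math.log(mg_molar)
    return {
        "gc_fraction": (seq.count("G") + seq.count("C")) / length,
        "inv_length": 1.0 / length,
        "ln_na": ln_na,
        "ln_mg": ln_mg,
        "ln_na_x_ln_mg": ln_na * ln_mg,  # illustrative salt interaction term
    }


def leave_one_sequence_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out each sequence once."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out, test_idx in by_group.items():
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (invented): three sequences, two measurements each.
seq_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]
tm = [50.0, 52.0, 60.0, 62.0, 70.0, 72.0]

features = salt_features("ATGCGCAT", na_molar=0.05, mg_molar=0.0015)

fold_mae = {}
for held_out, train_idx, test_idx in leave_one_sequence_out(seq_ids):
    # Placeholder model: predict the mean Tm of the training sequences.
    train_mean = sum(tm[i] for i in train_idx) / len(train_idx)
    preds = [train_mean] * len(test_idx)
    fold_mae[held_out] = mean_absolute_error([tm[i] for i in test_idx], preds)
```

Grouping the split by sequence rather than by individual measurement keeps all salt conditions of the held-out sequence out of training, so each fold's MAE reflects performance on a sequence the model has not seen.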
