
Top ranked team projects from the May 2021 Data Science Boot Camp


Frank Hidalgo, Joseph Szabo, Christopher Zhang, Sean Perez, Kun Jin


Acronym/abbreviation (short form) disambiguation is one of the main challenges when using NLP methods to understand medical records. While this topic has long been studied, it is still a work in progress. Current strategies often involve manually curating datasets of abbreviations and training classifiers on them. The main problem with that approach is that curated datasets are sparse and don't include all the short forms. In December 2020, a paper was published whose authors created a large dataset of short forms as one step in their pipeline for pre-training models. The goal of our project is to build upon their short form disambiguation piece and create a tool that disambiguates a medical short form using its context.

Example of the usage of our tool:

original_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, AB 2. She has no history of adverse reaction to anesthesia."

AB could stand for "abortion", "ankle-brachial", "blood group in ABO system", or "A, B lines in Kerley lines".

disambiguated_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, abortion 2. She has no history of adverse reaction to anesthesia."
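The AB example can be mimicked with a small Python sketch. This is not the team's model: the `CUES` table and the keyword-overlap scoring are hypothetical stand-ins for a trained context classifier, and `disambiguate` is an illustrative name. It only shows the shape of the task, which is to pick one long form from a candidate list based on the surrounding words and substitute it into the sentence.

```python
import re

# Candidate long forms for "AB", taken from the example in the text.
CANDIDATES = {
    "AB": [
        "abortion",
        "ankle-brachial",
        "blood group in ABO system",
        "A, B lines in Kerley lines",
    ],
}

# Hypothetical context cue words for each long form (an assumption for
# illustration; a trained classifier would replace this lookup table).
CUES = {
    "abortion": {"gravida", "para"},
    "ankle-brachial": {"index", "pressure", "arterial"},
    "blood group in ABO system": {"blood", "transfusion", "rh"},
    "A, B lines in Kerley lines": {"kerley", "radiograph", "chest"},
}


def disambiguate(sentence, short_form):
    """Replace short_form with the candidate whose cue words best match the context."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Score each candidate by how many of its cue words appear in the sentence.
    best = max(CANDIDATES[short_form], key=lambda lf: len(CUES[lf] & tokens))
    pattern = r"\b" + re.escape(short_form) + r"\b"
    # A function replacement avoids any escape processing of the long form.
    return re.sub(pattern, lambda match: best, sentence)
```

Calling `disambiguate(original_sentence, "AB")` on the sentence above selects "abortion", because "gravida" and "para" appear in the context and no cue word for the other candidates does.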

TEAM 7 MaizeFinder

Tuguldur Sukhbold, Michael Darcy, Pol Arranz-Gibert, AJ Adejare


Our problem is to accurately predict maize field centers in Africa using very low-resolution satellite images. The dataset contains many disparate entries, including two sets of satellite imagery, one with higher spatial resolution (Planet) and one with higher temporal resolution and wavelength coverage (Sentinel-2), as well as partial metadata about the crop fields, including estimated yield, size, and subjective quality of the measurement. We are employing CNN-based image segmentation models to compute the displacement vector of the field center from the image center.
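One way a segmentation output could be turned into a displacement vector is sketched below in Python. It assumes, purely for illustration, that the model yields a binary field mask and that the field center is taken as the mask centroid; the function name `displacement_from_center` and this post-processing step are not from the project description.

```python
import numpy as np


def displacement_from_center(mask):
    """Return (dx, dy) in pixels from the image center to the mask centroid.

    mask is a 2D array in which nonzero pixels are predicted field pixels.
    Returns None when the mask is empty.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    # Geometric center of the image in pixel coordinates.
    center_y = (mask.shape[0] - 1) / 2.0
    center_x = (mask.shape[1] - 1) / 2.0
    # Centroid of the predicted field pixels minus the image center.
    return float(cols.mean() - center_x), float(rows.mean() - center_y)
```

Here positive `dx` points right and positive `dy` points down, following the usual image row and column convention.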

TEAM 35 Ruby

Rongqing Ye, Nakyung Lee, Rachel Domagalski, Hannah Pieper


ClassifyMyMeds: Predicting Prior Authorization Approval and Volume for CoverMyMeds

When a patient tries to get a prescription from a pharmacy, a claim is created against the patient's insurance (payer). Such a pharmacy claim might be rejected for various reasons and might require prior authorization (PA). A PA is a form that providers submit to the insurer on behalf of a patient, making a case for the prescribed therapy. In this project, we surveyed many classifiers for predicting how likely a given PA is to be approved, and we forecast the future volume of PAs with time series analysis techniques. Additionally, we identify the formulary for each payer and predict the number of times certain drugs can be refilled.

TEAM 38 Amethyst

Jimin Kim, Francisco Martinez, Noah Schoem, Ifeoma Ugwuanyi


We aim to extract from Qarik's PDFs of World Bank loans the following:

  • Loan amounts

  • Borrower country

  • Loan purpose and targeted category/industry

and cross-reference these with region, income level, and other public data to identify historical trends in the World Bank's lending program.

TEAM 8 Will code for food

Kung-Ching Lin, Ghanashyam Khanal, Shahnawaz Khalid, Dyas Utomo


We are predicting the popularity of financial social media posts with intrinsic features of the posts.


This is a SIG corporate project.

TEAM 6 Sawbones

Ray Sharma, Amir Kazemi-Moridani, Ami Choi, Andrey Tarasov


We identify the presence of dementia from brain magnetic resonance imaging (MRI) using a convolutional neural network.
