May 2021 Finalists

Top-ranked team projects from the May 2021 Data Science Boot Camp

TEAM 10 NLPs

Frank Hidalgo, Joseph Szabo, Christopher Zhang, Sean Perez, Kun Jin

Acronym and abbreviation (short form) disambiguation is one of the main challenges in using NLP methods to understand medical records. Although the topic has long been studied, it remains a work in progress. Current strategies often rely on manually curated datasets of abbreviations on which classifiers are trained. The main problem with that approach is that curated datasets are sparse and do not include all short forms. In December 2020, a paper was published whose authors created a large dataset of short forms as one step in their pipeline for pre-training models. The goal of our project is to build upon their short form disambiguation work and create a tool that disambiguates a medical short form using its context.

Example usage of our tool:

original_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, AB 2. She has no history of adverse reaction to anesthesia."

AB could stand for "abortion", "ankle-brachial", "blood group in ABO system", or "A, B lines in Kerley lines".

disambiguated_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, abortion 2. She has no history of adverse reaction to anesthesia."
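To make the idea of context-based disambiguation concrete, here is a minimal sketch in Python. It is not the team's model: it swaps in a simple word-overlap score in place of a trained classifier, and the `SENSES` table, its cue words, and the `disambiguate` function are all hypothetical names invented for illustration. Only the four candidate long forms for "AB" come from the example above.

```python
import re

# Hypothetical sense inventory. The long forms for "AB" come from the example
# above; the cue words attached to each are invented for this sketch.
SENSES = {
    "AB": {
        "abortion": {"gravida", "para", "pregnancy", "obstetric"},
        "ankle-brachial": {"index", "artery", "pressure", "vascular"},
        "blood group in ABO system": {"blood", "transfusion", "type", "donor"},
        "A, B lines in Kerley lines": {"kerley", "chest", "radiograph", "edema"},
    }
}


def tokenize(text):
    """Return the set of lowercase alphabetic words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(sentence, short_form, senses=SENSES):
    """Replace short_form with the long form whose cue words overlap most
    with the words of the sentence (ties go to the first candidate)."""
    context = tokenize(sentence)
    candidates = senses[short_form]
    best = max(candidates, key=lambda long_form: len(candidates[long_form] & context))
    # Match the short form only as a whole, case-sensitive word.
    pattern = r"\b" + re.escape(short_form) + r"\b"
    return re.sub(pattern, best, sentence)


original_sentence = (
    "The patient states that she has had dizziness, nausea, some heartburn, "
    "and some change in her vision. She is gravida 6, para 4, AB 2. "
    "She has no history of adverse reaction to anesthesia."
)
print(disambiguate(original_sentence, "AB"))
```

In this sketch the words "gravida" and "para" in the context push the choice toward "abortion". A real system would presumably learn such associations from a large dataset of short forms rather than from a hand-written table.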

TEAM 7 MaizeFinder

Tuguldur Sukhbold, Michael Darcy, Pol Arranz-Gibert, AJ Adejare

Our problem is to accurately predict maize field centers in Africa using very low-resolution satellite images. The dataset contains many disparate entries, including two sets of satellite imagery, one with higher spatial resolution (Planet) and one with higher temporal resolution and wavelength coverage (Sentinel-2), as well as partial metadata about the crop fields, including estimated yield, size, and subjective quality of the measurement. We employ CNN-based image segmentation models to compute the displacement vector from the image center.
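As an illustration of the final step, the sketch below computes a displacement vector from the image center, assuming a segmentation model has already produced a binary field mask. How the team derives the vector from the model output is not stated, so taking the centroid of the mask is an assumption, and `displacement_from_center` is a hypothetical helper name.

```python
def displacement_from_center(mask):
    """Return (d_row, d_col) from the image center to the centroid of the
    nonzero pixels in a 2D mask, or None if the mask is empty.

    Assumption for this sketch: the field center is taken to be the centroid
    of the predicted mask. The source does not specify this step.
    """
    rows, cols = len(mask), len(mask[0])
    pixels = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not pixels:
        return None
    centroid_row = sum(r for r, _ in pixels) / len(pixels)
    centroid_col = sum(c for _, c in pixels) / len(pixels)
    # Geometric center of the pixel grid, in pixel-index coordinates.
    center_row, center_col = (rows - 1) / 2, (cols - 1) / 2
    return (centroid_row - center_row, centroid_col - center_col)


# Toy 5x5 mask (invented data) with a "field" in the top-right corner.
toy_mask = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(displacement_from_center(toy_mask))  # (-1.5, 1.5)
```

A negative row component means the field center lies above the image center, and a positive column component means it lies to the right.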

TEAM 35 Ruby

Rongqing Ye, Nakyung Lee, Rachel Domagalski, Hannah Pieper

ClassifyMyMeds: Predicting Prior Authorization Approval and Volume for CoverMyMeds

When a patient tries to get a prescription from a pharmacy, a claim is created against the patient's insurance (payer). Such a pharmacy claim might be rejected for various reasons and might require prior authorization (PA). A PA is a form that providers submit to the insurer on behalf of a patient, making a case for the prescribed therapy. In this project, we surveyed many classifiers for predicting how likely a given PA is to be approved, and forecasted the future volume of PAs with time series analysis techniques. Additionally, we identified the formulary for each payer and predicted the number of times certain drugs can be refilled.

TEAM 38 Amethyst

Jimin Kim, Francisco Martinez, Noah Schoem, Ifeoma Ugwuanyi

We aim to extract from Qarik's PDFs of World Bank loans the following:

Loan amounts
Borrower country
Loan purpose and targeted category/industry

and cross-reference these with region, income level, and other public data to identify historical trends in the World Bank's lending program.
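The extraction step for one of the fields listed above, loan amounts, could be sketched roughly as follows. This assumes the PDF text has already been extracted to a plain string and that amounts are written in a form such as "US$ 150 million". The regular expression, the `extract_loan_amounts` helper, and the sample sentence are all hypothetical and do not come from the project or from any actual loan document.

```python
import re

# Assumed amount format: a "US$" or "USD" prefix, a number with optional
# thousands separators and decimals, and an optional "million"/"billion".
AMOUNT_RE = re.compile(
    r"(?:US\$|USD)\s*(\d[\d,]*(?:\.\d+)?)\s*(million|billion)?",
    re.IGNORECASE,
)
SCALE = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_loan_amounts(text):
    """Return every dollar amount found in the text, as floats in US dollars."""
    amounts = []
    for number, unit in AMOUNT_RE.findall(text):
        value = float(number.replace(",", ""))
        # An absent unit comes back as an empty string, which scales by 1.
        amounts.append(value * SCALE.get(unit.lower(), 1))
    return amounts


# Invented sample text, for illustration only.
sample = "The Bank agrees to lend to the Borrower an amount equal to US$ 150 million."
print(extract_loan_amounts(sample))  # [150000000.0]
```

The extracted values could then be joined with borrower country and public region or income-level data for the trend analysis described above.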

TEAM 8 Will code for food

Kung-Ching Lin, Ghanashyam Khanal, Shahnawaz Khalid, Dyas Utomo

We are predicting the popularity of financial social media posts using intrinsic features of the posts.

This is a SIG corporate project.

TEAM 6 Sawbones

Ray Sharma, Amir Kazemi-Moridani, Ami Choi, Andrey Tarasov

Identifying the presence of dementia from brain magnetic resonance imaging (MRI) using a convolutional neural network.

THE ERDŐS INSTITUTE