MAY 2021 FINALISTS
Top ranked team projects from the May 2021 Data Science Boot Camp
- TEAM 7 MaizeFinder - Predicting Maize Field Centers Using Very Low Res Images - 2nd Place
- TEAM 35 Ruby - Predicting Prior Authorization Approval and Volume - 3rd Place
- TEAM 38 Amethyst - Comparing World Bank Loan Impact Across Sectors - Finalist
- TEAM 8 Will code for food - Predicting Popularity of Financial Social Media Posts - Finalist
- TEAM 6 Sawbones - Identifying Dementia from Brain MRIs - Finalist
TEAM 10 NLPs
Frank Hidalgo, Joseph Szabo, Christopher Zhang, Sean Perez, Kun Jin
Disambiguating acronyms and abbreviations (short forms) is one of the main challenges in applying NLP methods to medical records. Although the topic has long been studied, it remains a work in progress. Current strategies often rely on manually curating datasets of abbreviations and training classifiers on them. The main problem with that approach is that curated datasets are sparse and do not cover all short forms. In December 2020, a paper was published whose authors created a large dataset of short forms as one step in their pipeline for pre-training models. The goal of our project is to build on their short-form disambiguation component and create a tool that disambiguates a medical short form using its context.
Example usage of our tool:
original_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, AB 2. She has no history of adverse reaction to anesthesia."
AB could stand for "abortion", "ankle-brachial", "blood group in ABO system", or "A, B lines in Kerley lines".
disambiguated_sentence = "The patient states that she has had dizziness, nausea, some heartburn, and some change in her vision. She is gravida 6, para 4, abortion 2. She has no history of adverse reaction to anesthesia."
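To make the idea of context-based disambiguation concrete, here is a minimal sketch in Python. It is not the team's model: it swaps in a simple keyword-overlap heuristic, and the `CONTEXT_KEYWORDS` table, `tokenize`, and `disambiguate` are hypothetical names with hand-written keywords chosen only for illustration. A real system would learn these associations from a large short-form dataset.

```python
import re

# Hypothetical, hand-written context keywords for some long forms of "AB".
# These are illustrative assumptions, not data from the project.
CONTEXT_KEYWORDS = {
    "abortion": {"gravida", "para", "pregnancy", "obstetric"},
    "ankle-brachial": {"index", "artery", "vascular", "pressure"},
    "blood group in ABO system": {"blood", "type", "transfusion", "rh"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(sentence, short_form, candidates=CONTEXT_KEYWORDS):
    """Pick the long form whose keywords overlap most with the sentence.

    Returns (chosen_long_form, rewritten_sentence). Ties are broken by
    the order in which the candidates were defined.
    """
    context = tokenize(sentence)
    best = max(candidates, key=lambda long_form: len(candidates[long_form] & context))
    # Replace only whole-word, case-sensitive occurrences of the short form.
    pattern = r"\b" + re.escape(short_form) + r"\b"
    return best, re.sub(pattern, lambda match: best, sentence)


long_form, rewritten = disambiguate("She is gravida 6, para 4, AB 2.", "AB")
print(long_form)
print(rewritten)
```

The overlap count stands in for the score a trained classifier would assign to each candidate long form given the surrounding words.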
TEAM 7 MaizeFinder
Tuguldur Sukhbold, Michael Darcy, Pol Arranz-Gibert, AJ Adejare
Our goal is to accurately predict maize field centers in Africa from very low-resolution satellite images. The dataset combines several disparate sources: two sets of satellite imagery, one with higher spatial resolution (Planet) and one with higher temporal resolution and wavelength coverage (Sentinel-2), plus partial metadata about the crop fields, including estimated yield, size, and a subjective quality rating of the measurement. We employ CNN-based image segmentation models to compute the displacement vector of the field center from the image center.
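As a rough illustration of the last step, the sketch below turns a binary segmentation mask into a displacement vector. It assumes the segmentation model has already produced the mask and that the field center is taken to be the centroid of the masked pixels; both the function name `displacement_from_mask` and the centroid choice are assumptions for this example, not details confirmed by the team.

```python
def displacement_from_mask(mask):
    """Return (dx, dy) from the image center to the mask centroid.

    `mask` is a list of rows, each a list of 0/1 values, where 1 marks a
    pixel predicted to belong to the field. Returns None when the mask
    is empty. Coordinates are in pixels; x grows rightward, y downward.
    """
    height = len(mask)
    width = len(mask[0])

    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None

    # Centroid of the predicted field pixels.
    centroid_x = sum(xs) / len(xs)
    centroid_y = sum(ys) / len(ys)

    # Geometric center of the image in pixel-index coordinates.
    center_x = (width - 1) / 2
    center_y = (height - 1) / 2

    return (centroid_x - center_x, centroid_y - center_y)


# Toy 5x5 mask with a 2x2 field in the upper right.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(displacement_from_mask(toy_mask))
```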
TEAM 35 Ruby
Rongqing Ye, Nakyung Lee, Rachel Domagalski, Hannah Pieper
ClassifyMyMeds: Predicting Prior Authorization Approval and Volume for CoverMyMeds
When a patient tries to fill a prescription at a pharmacy, a claim is created against the patient's insurance (payer). Such a pharmacy claim might be rejected for various reasons and might require prior authorization (PA). A PA is a form that providers submit to the insurer on behalf of a patient, making a case for the prescribed therapy. In this project, we surveyed many classifiers for predicting how likely a given PA is to be approved, and forecast the future volume of PAs using time series analysis techniques. Additionally, we identify the formulary for each payer and predict the number of times certain drugs can be refilled.
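For a sense of what a baseline approval predictor could look like, here is a minimal sketch. It is not one of the classifiers the team surveyed: it simply estimates a smoothed historical approval rate per (payer, drug) pair. The record layout, the function names `fit_approval_rates` and `predict_approval`, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_approval_rates(records, alpha=1.0):
    """Estimate a smoothed approval rate for each (payer, drug) pair.

    `records` is an iterable of (payer, drug, approved) tuples, where
    `approved` is a bool. `alpha` is an add-alpha (Laplace) smoothing
    constant that pulls rates for rarely seen pairs toward 0.5.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [approved, total]
    for payer, drug, approved in records:
        entry = counts[(payer, drug)]
        entry[0] += int(approved)
        entry[1] += 1
    return {
        key: (n_approved + alpha) / (n_total + 2 * alpha)
        for key, (n_approved, n_total) in counts.items()
    }


def predict_approval(rates, payer, drug, default=0.5):
    """Return the estimated approval probability, or `default` if unseen."""
    return rates.get((payer, drug), default)


# Toy records, invented for this example only.
history = [
    ("payer_1", "drug_A", True),
    ("payer_1", "drug_A", True),
    ("payer_1", "drug_A", False),
    ("payer_2", "drug_A", False),
]
rates = fit_approval_rates(history)
print(predict_approval(rates, "payer_1", "drug_A"))
print(predict_approval(rates, "payer_3", "drug_B"))
```

A baseline like this is mainly useful as a reference point: any trained classifier should beat it on held-out claims.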
TEAM 38 Amethyst
Jimin Kim, Francisco Martinez, Noah Schoem, Ifeoma Ugwuanyi
We aim to extract the following from Qarik's PDFs of World Bank loans:
- Loan amounts
- Borrower country
- Loan purpose and targeted category/industry
We then cross-reference these with region, income level, and other public data to identify historical trends in the World Bank's lending program.
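The loan-amount extraction step could be sketched as follows, assuming the PDF text has already been extracted to a plain string and that amounts are written in forms such as "US$150 million" or "USD 2,500,000". The pattern `AMOUNT_RE`, the function `extract_amounts`, and the sample sentence are hypothetical; the wording of real loan documents may differ and would need a broader pattern.

```python
import re

# Assumed amount formats: a dollar marker, a number with optional
# thousands separators and decimals, and an optional scale word.
AMOUNT_RE = re.compile(
    r"(?:US\$|USD\s?|\$)\s?(\d+(?:,\d{3})*(?:\.\d+)?)\s*(million|billion)?",
    re.IGNORECASE,
)
SCALE = {"million": 1_000_000, "billion": 1_000_000_000}


def extract_amounts(text):
    """Return all dollar amounts found in `text`, as floats in dollars."""
    amounts = []
    for number, unit in AMOUNT_RE.findall(text):
        value = float(number.replace(",", ""))
        if unit:
            value *= SCALE[unit.lower()]
        amounts.append(value)
    return amounts


# Invented sample sentence, not taken from a real loan document.
sample = "The Bank agrees to lend US$150 million, of which USD 2,500,000 is a fee."
print(extract_amounts(sample))
```

Borrower country and loan purpose would need their own extraction rules, after which each record could be joined to public region and income-level data by country.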