Team Erdio: Audio Classification for Urban Sounds

written by

Olivia Haimerl

Thursday, February 16, 2023

Congratulations to Team Erdio for being a Top 5 Project of The Erdős Institute’s Spring 2022 Data Science Bootcamp with their project Erdio: Audio Classification for Urban Sounds!

Composed of McGill University students Matthew Frick, Matthew Heffernan, and Paul Jreidini, Team Erdio created an open-source gunshot identification system trained on realistic audio that also classifies additional urban sounds and provides information on first response via siren detection. In creating this system, the team noted that “timely identification of safety-critical events, such as gunshots, is of great importance to public safety stakeholders. However, existing systems only deliver limited value by not classifying additional urban sounds.” To build it, Team Erdio used data from UrbanSound8K to classify urban field recordings such as air conditioners, children playing, drilling, jackhammers, street music, car horns, and others, while also classifying gunshots as a separate urban sound. Through data cleaning, feature engineering (including breaking down each audio file into human-audible frequency bins, decomposing it into harmonic and percussive components, and extracting relative power and other features), feature selection, and classifier training, Team Erdio created a system that “achieved F ~ 85% for top models [on identifying gunshots], balancing recall and precision” when tested on episodes of Futurama that contained multiple urban sounds, including gunshots.
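To make the "relative power in frequency bins" idea concrete, here is a minimal sketch in plain Python. It is not the team's code: the function name `relative_band_power`, the band edges, and the synthetic two-tone clip are all assumptions chosen for illustration, and a naive DFT stands in for the FFT routine a real audio pipeline would use. The harmonic and percussive decomposition step is not shown.

```python
import cmath
import math


def relative_band_power(samples, sample_rate, band_edges_hz):
    """Return the share of spectral power that falls in each frequency band.

    Hypothetical helper for illustration; bands are [low, high) in Hz.
    """
    n = len(samples)
    # Naive DFT over the non-negative frequencies (O(n^2)).
    # Fine for a short demo; real audio would call for an FFT library.
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        power.append(abs(coeff) ** 2)

    bin_hz = sample_rate / n  # frequency spacing between DFT bins
    band_power = [
        sum(p for k, p in enumerate(power) if low <= k * bin_hz < high)
        for low, high in zip(band_edges_hz[:-1], band_edges_hz[1:])
    ]
    total = sum(band_power)
    return [p / total if total else 0.0 for p in band_power]


# Synthetic clip (assumed data): a 100 Hz tone plus a quieter 1000 Hz tone.
SAMPLE_RATE = 4000
N_SAMPLES = 400
clip = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    for t in range(N_SAMPLES)
]

BAND_EDGES_HZ = [0, 250, 500, 2000]  # hypothetical band edges
shares = relative_band_power(clip, SAMPLE_RATE, BAND_EDGES_HZ)
```

Because the shares are normalized to sum to one, a feature like this describes where a clip's energy sits in the spectrum rather than how loud the clip is, which is one plausible reason to prefer relative over absolute power when recordings vary in volume.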

When determining what project to develop over the course of the Data Science Bootcamp, Matthew H. noted that the team wanted to find a good existing data set to work with, as they “wanted to focus on the machine learning and data science component” of the project rather than on data acquisition. Paul added that the team wanted to challenge themselves: “we wanted to work with something outside of our comfort zone, but not so much so that we didn’t know where to start.” Given the fast and intensive timeline of the Spring Bootcamp, Team Erdio highlighted the importance of splitting up the work according to each student’s strengths, with Matthew H. having experience in machine learning and Python, Paul having experience in coding, and Matthew F. having experience in data analysis. Even so, completing the project often required them to seek out information elsewhere. As Matthew F. described, “I had previous knowledge of analysis on classifier data, but never in the audio medium, so I went and found a textbook on audio analysis to find some directions and inspiration.” The team also had to overcome difficulties when constructing the system: slight code mismatches that persisted, substantial foreground noise in the audio clips, competing sounds that were often confused with one another (such as drills, engines, and jackhammers), and class imbalance for gunshot sounds in the training data.
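The article does not say how the team handled the class imbalance for gunshots. One common remedy, sketched below as an assumption rather than as their method, is to weight each class inversely to its frequency so that rare classes count for more during training. The helper name `balanced_class_weights` and the label counts are made up for illustration; the formula is the same heuristic scikit-learn applies for `class_weight="balanced"`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Hypothetical helper: rare classes receive larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Made-up label distribution in which gunshots are the rare class.
labels = ["gun_shot"] * 10 + ["siren"] * 40 + ["drilling"] * 50
weights = balanced_class_weights(labels)
```

With these weights every class contributes the same total weight to the training loss, so a classifier is no longer rewarded for ignoring the rare class.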

Beyond completing the gunshot identification system equipped with urban sounds, Team Erdio also engineered general features of the system for offline classification tasks and demonstrated how the system could provide additional value for stakeholders beyond the government and first responders, such as the film and television industry. Moreover, if given additional time to work on the audio classification system, Matthew H. noted that they “would like to expand the underlying data set, as it was very realistic but cleaner data would be better to train the system with.” Matthew F. noted that they “would like to streamline and speed up some of the data analysis so that we could more realistically make an app or something where an individual could input an audio recording and it could be used for a variety of applications.” The team suggested that it could even be used to help with Google traffic updates, as individuals could upload audio clips with live traffic and construction sounds.

The whole of Team Erdio noted that the most rewarding part of the project was seeing the system work and classify gunshots in the episodes of Futurama, which were complete with a variety of other urban sounds. Matthew F. highlighted that “seeing the classifiers actually pick up on the gunshots in completely new data that had nothing to do with our data training was a really great moment.” Although the team agreed that their success was mostly due to hard work and intense focus during the bootcamp, they noted that, to be successful, a team should start with a good data set that doesn’t require intensive data cleaning. Moreover, Team Erdio advised future bootcamp participants not to be afraid to jump ahead and pick any classifier while running test cases for feature engineering, as machine learning may not always correspond with human intuition.

Congratulations again to Team Erdio for being a Top 5 Project of The Erdős Institute’s Spring 2022 Data Science Bootcamp!

TEAM

Counting Crossings - Team 2

Jared Able

Image analysis: The initial goal is to count, as accurately as possible, the number of times an object crosses over itself (think knots or road systems). This count is a basic measure of the complexity of the object. Objects that don't cross themselves are simple, while objects that cross themselves many times are complicated.

Once we have this count in place, we'll try to reduce the complexity of our objects without changing their underlying structure. In the case of roads, this would result in a road system that accomplishes similar goals while being more easily navigated and less resource-intensive to build.

THE ERDŐS INSTITUTE

Helping PhDs get and create jobs they love at every stage of their career.
