Team Starry Night: Artist Classification and Stylization
Monday, February 13, 2023
Congratulations to Team Starry Night for being a Top 5 Project of The Erdős Institute’s Spring 2022 Data Science Boot Camp with their project Starry Night: Artist Classification and Image Style Transfer with Neural Networks!
Composed of Ohio State University Physics Ph.D. Candidates Xiaozhou Feng, Xiaoyu Liu, Estefany Nunez, Kai Wei, and recent OSU Physics Ph.D. Bryan Reynolds, Team Starry Night successfully compiled a dataset of paintings by six famous artists (now available on Kaggle) and used it to create an Artist Classifier app. They created their dataset by combining images from an existing Kaggle dataset with data scraped from WikiArt, then removing duplicates and grayscale or CMYK images, and finally sifting through the data manually to remove all works in other media (non-paintings). They combined five pre-trained Convolutional Neural Networks (CNNs) from PyTorch to achieve approximately 90% accuracy, an improvement over the 83-88% the networks achieved individually.
Unlike many projects, which reserve most of the data for training, Team Starry Night used only a small percentage for the training and validation sets, preserving the bulk of their dataset for testing. Kai explained that this was to satisfy their optimization goal of seeing how small a sample they could use to get the desired result: “We want to show it can learn with just a few samples using a very small fraction of the whole dataset.” Bryan took a business mindset: “typically if you’re putting something in production you’re expecting to use it on a bunch of new data that comes in that you can classify with your algorithm.” However, these artists will not be producing more paintings, so “our hope was that a user could come in and input a painting that wasn’t something that was already in the training set so it wouldn’t just be like an automatic checkmark.” Furthermore, the fewer paintings needed to train the model, the more broadly it may be applied as they add in newer artists. They were still careful to enhance the diversity of their dataset through a random chop, rotation, and random hue resampling approach.
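One common way to combine several classifiers so that the group outperforms each member is soft voting: average the class probabilities from every network and pick the most likely class. The article does not say how the team combined their five networks, so the following is only a minimal sketch of that general idea, not the team's method. The artist labels, the function names, and the example scores are all hypothetical.

```python
import math

# Hypothetical class labels; the team's dataset covers six artists.
ARTISTS = ["artist_a", "artist_b", "artist_c"]


def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits):
    """Average the per-class probabilities across all models."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    n_classes = len(probs[0])
    return [sum(p[c] for p in probs) / n_models for c in range(n_classes)]


def predict(per_model_logits, labels=ARTISTS):
    """Return the winning label and its averaged probability."""
    averaged = soft_vote(per_model_logits)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return labels[best], averaged[best]


# Made-up scores from three models for one painting. The second model
# disagrees with the other two; averaging lets the group decide.
example_scores = [
    [3.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
]
label, confidence = predict(example_scores)
print(f"{label} ({confidence:.0%})")
```

Averaging probabilities rather than counting votes lets a confident model outweigh an uncertain one, which is one reason an ensemble can beat its individual members.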
In choosing a project to develop over the course of the Data Science Boot Camp, Estefany emphasized that they took the time to discuss what they each wanted to get out of the project and that they agreed they wanted to solve a classification problem. Xiaoyu noted the group’s interest in Convolutional Neural Networks and that “instead of developing some algorithms ourselves, we wanted to utilize some already well-developed algorithms and do something fancy.” Computer vision stood out to them as an interesting direction, more so than working with a numeric data set. Ultimately, they were extremely pleased with their completed project. However, if given additional time, they agreed that they would add in more artists to enhance their app’s functionality. As Bryan noted, “the more artists we could use the more exciting it would be.”
Team Starry Night took the tools they learned in creating their artist classification system to further produce an Artist Style Transfer Tool, which allows the user to select the artist of their choice, upload a photo, and have their photo stylized like the artist’s most famous works. They accomplished this by fine-tuning their model to a famous painting representative of each artist’s style and using Magenta (from TensorFlow Hub) to apply the style to an image input by the user. Given more time to work on their classifier, they would like to build on their stylization tool, potentially imitating a style category instead of a specific artist. Alternatively, they considered a painter DNA test: instead of identifying the painter, they think it would be interesting to have the classifier break down the style of an input. For instance, an app could take a newer painting and tell you which artists influenced its style. Xiaoyu imagines “users can use their own paintings, so we can tell them your painting is in some genre, which would be cool.” Bryan added that it might say, for instance, “you’re 98% Claude Monet.”
The whole team agreed that the most rewarding part of the project came from the most challenging aspect: they were initially unfamiliar with many of the tools they set out to use. Both Estefany and Xiaoyu remarked on the joy of how far they were able to take this project when, after choosing their topic, they didn’t know where to start. They also enjoyed the small moments of success along the way, and at the end of the project, it was clear just how much they had learned; they were ecstatic that they were able to apply that knowledge to produce a successful classifier. Xiaozhou added that he enjoyed the different experience, as his research involves some coding but is more focused on the math and physics applications. They all agreed that it was a marked change from their usual research, which stretches over years with minimal feedback. Kai emphasized how encouraging it was to get immediate, positive feedback: to “see the output in a few weeks, and I can see it developing from scratch to a workable application.”
Reflecting on their process, Team Starry Night stressed that future boot camp participants should “be proactive and get started early,” and Estefany emphasized the importance of communicating with teammates. Team Starry Night attributes much of their success to strong communication and working together to solve problems when they got stuck. Kai noted the benefit of setting a clear goal from the start: to complete a project, knowing they would learn a lot just by finishing. This was Xiaoyu’s second time doing the boot camp, and “by finishing the project [the first time], I learned a lot,” but “if you’re not satisfied with the first project, then do the boot camp again and every time you learn something new!”
Congratulations again to Team Starry Night for being a Top 5 Project of The Erdős Institute’s Spring 2022 Data Science Boot Camp!