During my nightly piano key-mashing sessions, I usually try to pick apart the notes of modern songs for my own amusement. I rarely get far: having given up piano lessons years ago, I never developed a musician’s ear. So this time, after getting hung up on an Anberlin song, I decided to finish a side project I’d abandoned out of coding frustration. It was a remnant of the Conducting (of an orchestra) Robots course, where one of my group’s tangential brainstorms was to devise a way for our robot conductor to follow along with what it was hearing. We never got far, because we were working in a language called ChucK, which was unfamiliar and only moderately well documented.
After scanning through code libraries for Processing, I discovered one that did everything ChucK did, natively in Processing, so I didn’t have to shuttle data from one language to another via OSC. It’s called Minim, and it does one important thing for me: forward Fast Fourier Transforms. Using the FFT, I can take an audio sample (read: music) and break it down into its component frequencies. Those frequencies correspond to notes on a piano, which I rigged up to display in a crude visualization.
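The FFT-to-note step can be sketched outside Processing too. The snippet below is a hypothetical Python illustration (not my Minim code): it runs an FFT over an audio buffer, finds the loudest frequency bin, and converts that frequency to the nearest piano note using the standard MIDI formula (A4 = 440 Hz = MIDI note 69).

```python
import numpy as np

SAMPLE_RATE = 44100
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dominant_note(samples, sample_rate=SAMPLE_RATE):
    """Return the note name closest to the loudest frequency in the buffer."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    peak_hz = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin
    # Frequency -> nearest MIDI note number (A4 = 440 Hz = MIDI 69).
    midi = int(round(69 + 12 * np.log2(peak_hz / 440.0)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

# A 440 Hz sine wave should come out as A4.
t = np.arange(4096) / SAMPLE_RATE
buffer = np.sin(2 * np.pi * 440.0 * t)
```

With a 4096-sample window at 44.1 kHz, each bin is about 10.8 Hz wide, which is coarse in the bass register; that is the resolution trade-off mentioned below.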
There are a few areas that need polishing, most notably normalizing the audio and statistically determining whether a note was actually played. Absolute thresholds don’t work well at loud or soft volumes, and they break down further once extra instruments or vocals enter the mix. I may also need to increase the resolution of the FFT, but the way it’s coded, that requires no extra array manipulation on my end. I’ll keep working on it, so that maybe one day I can properly deconstruct a song. Not bad for a night’s work, I think… Proof-of-concept video below:
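One way around the absolute-threshold problem, just as a sketch of the idea (this isn’t my actual detection code): flag the bins that stand out statistically from the rest of the spectrum, e.g. above the mean by some number of standard deviations. Because the threshold is derived from the spectrum itself, it scales with the overall loudness of the sample.

```python
import numpy as np

def active_bins(spectrum, k=2.0):
    """Return indices of FFT bins whose magnitude exceeds mean + k * std."""
    spectrum = np.asarray(spectrum, dtype=float)
    threshold = spectrum.mean() + k * spectrum.std()
    return np.flatnonzero(spectrum > threshold)

# The same relative threshold picks out the peak at any volume:
quiet = np.array([0.1, 0.1, 0.1, 2.0, 0.1, 0.1, 0.1, 0.1])
loud = quiet * 100  # 100x louder, identical result
```

The tuning parameter `k` trades false positives against missed notes; polyphonic audio would still need something smarter, since chords produce several legitimate peaks plus overtones.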
One Republic – All The Right Moves (Cover, by Will Ting)