
University of Connecticut School of Engineering Electrical and Computer Engineering - Senior Design

Aaron Spaulding Personal Weekly Update – October 25, 2020

Written by Aaron Spaulding

Feature Extraction

This week we decided on a list of features and split up the task of writing the feature extraction code. Table 1 shows the features I defined and wrote code for this week.

As usual, our code is published and available on our GitHub here.

Dominant Frequency Percentiles

To capture the dominant frequency percentiles, we take the Fourier transform of every 12 ms window (with a 6 ms overlap), extract the dominant frequency from each window, and combine these to form a CDF for the audio clip. (At 44.1 kHz this corresponds to windows of 500 samples with a 250-sample overlap.) Seventeen sound frequency percentiles are then extracted from the CDF: the 30th, 40th, 50th, 60th, 70th, 80th, 90th, 91st, 92nd, 93rd, 94th, 95th, 96th, 97th, 98th, 99th, and 100th (Y. Tseng (2020)). We use a larger range of levels than were used in the paper since we are not facing severe computational limits. (We have access to the UConn HPC for our models, and we plan to select the highest-weighted features once the models are complete.) One note: the 100th percentile also represents the largest frequency in the sample, which was another feature we were planning to add.
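As a sketch of this pipeline (assuming NumPy; the function and parameter names are illustrative, not our actual code):

```python
import numpy as np

def dominant_frequency_percentiles(signal, sample_rate=44100,
                                   window=500, overlap=250):
    """Estimate dominant-frequency percentiles from windowed FFTs.

    `window` and `overlap` default to 500 and 250 samples,
    roughly 12 ms and 6 ms at 44.1 kHz.
    """
    step = window - overlap
    dominant = []
    for start in range(0, len(signal) - window + 1, step):
        chunk = signal[start:start + window]
        spectrum = np.abs(np.fft.rfft(chunk))
        freqs = np.fft.rfftfreq(window, d=1.0 / sample_rate)
        # dominant frequency of this window = bin with the largest magnitude
        dominant.append(freqs[np.argmax(spectrum)])

    levels = [30, 40, 50, 60, 70, 80, 90,
              91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
    return np.percentile(dominant, levels)
```

For a clip containing a single steady tone, all seventeen percentiles should sit near that tone's frequency (to within one FFT bin, about 88 Hz at these settings).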

Images 1 and 2 show CDFs created by this method; the dashed red lines mark the sample locations. These were created from random inputs from the “BirdVox-DCASE-20k” dataset (clips “000db435-a40f-4ad9-a74e-d1af284d2c44.wav” and “00053d90-e4b9-4045-a2f1-f39efc90cfa9.wav”).

“Better Than MFCC Audio Classification Features” (G. Ruben (2013))

The second set of features is implemented in “cepstralSpectral.py.” These include power, energy, spectral centroid, bandwidth, and zero-crossing rate. (The zero-crossing rate is the number of times the time-series signal crosses the x-axis.) This code also computes the first fifteen spectral and cepstral coefficients as defined in the paper (G. Ruben (2013)). We extract the first fifteen from each analysis since we do not face any major computational limits. These coefficients represent a set of features that may include principal components and are detailed further in the paper.
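A minimal sketch of the simpler features in this set (assuming NumPy; these are textbook definitions, not the contents of “cepstralSpectral.py”):

```python
import numpy as np

def basic_spectral_features(signal, sample_rate=44100):
    """Compute energy, power, zero-crossing rate, spectral centroid,
    and spectral bandwidth for a time-series clip."""
    energy = np.sum(signal ** 2)
    power = energy / len(signal)
    # zero-crossing rate: fraction of consecutive samples whose sign changes
    zcr = np.mean(np.abs(np.diff(np.sign(signal))) > 0)

    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    weights = spectrum / np.sum(spectrum)
    # centroid: magnitude-weighted mean frequency;
    # bandwidth: magnitude-weighted spread around the centroid
    centroid = np.sum(freqs * weights)
    bandwidth = np.sqrt(np.sum(((freqs - centroid) ** 2) * weights))
    return {"energy": energy, "power": power, "zcr": zcr,
            "centroid": centroid, "bandwidth": bandwidth}
```

A pure 1 kHz sine, for example, should give a centroid near 1000 Hz and a power near 0.5.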

Other Code

A large portion of the week was also spent writing supporting code for the feature extraction and for handling large amounts of data. I wrote a wrapper for the “pydub” Python library and defined my own Audio object that contains the pre-processed time-series and Fourier-series data we will need. The import code also supports multithreading, which should allow for faster imports when dealing with our larger datasets.
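The shape of this idea can be sketched with Python's standard thread pool (a hypothetical simplification; the real wrapper sits on top of pydub, and all names here are illustrative):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

class Audio:
    """Hypothetical container for one pre-processed clip: holds the raw
    time series plus its pre-computed Fourier transform."""
    def __init__(self, samples, sample_rate=44100):
        self.samples = np.asarray(samples, dtype=float)
        self.sample_rate = sample_rate
        self.spectrum = np.fft.rfft(self.samples)

def load_all(loaders, max_workers=4):
    """Run each zero-argument loader in a thread pool and wrap each
    result in an Audio object, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda load: Audio(load()), loaders))
```

Threads help here because audio importing is largely I/O-bound, so several clips can be read and decoded concurrently.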

In addition to these functions, I defined helper functions to make analysis easier, including a Python filter function and functions to plot and export graphs and spectrograms. These all support multithreading as well.

Beamforming

I also made some progress on the beamforming algorithm and was able to finalize some of the code for the linear array, which now works.
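The beamforming code itself is in the repository rather than in this post; as a rough illustration, a delay-and-sum beamformer for a uniform linear array can look like the following (assuming NumPy; the whole-sample delay rounding and all names are simplifications, not our actual implementation):

```python
import numpy as np

def delay_and_sum(signals, sample_rate, spacing, angle_deg, c=343.0):
    """Steer a uniform linear array toward `angle_deg` (broadside = 0 deg)
    by delaying each channel and averaging.

    `signals` has shape (n_mics, n_samples); `spacing` is the microphone
    spacing in metres; delays are rounded to whole samples for simplicity.
    """
    n_mics, n_samples = signals.shape
    theta = np.radians(angle_deg)
    # per-element delay in samples for a plane wave arriving from angle_deg
    delays = np.arange(n_mics) * spacing * np.sin(theta) / c * sample_rate
    delays = np.round(delays - delays.min()).astype(int)
    out = np.zeros(n_samples)
    for m in range(n_mics):
        # undo each channel's delay so the copies add coherently
        out += np.roll(signals[m], -delays[m])
    return out / n_mics
```

Steering toward the true arrival angle aligns the channels so they add coherently, while signals from other directions add incoherently and are attenuated.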


Y. Tseng (2020): Tseng, Yi-Chin, Bianca N. I. Eskelson, Kathy Martin & Valerie LeMay (2020). Automatic bird sound detection: logistic regression based acoustic occupancy model. Bioacoustics. DOI: 10.1080/09524622.2020.1730241

G. Ruben (2013): Gonzalez, Ruben (2013). Better Than MFCC Audio Classification Features. DOI: 10.1007/978-1-4614-3501-3_24. https://core.ac.uk/download/pdf/143870996.pdf
