
University of Connecticut School of Engineering Electrical and Computer Engineering - Senior Design


Aaron Spaulding Personal Weekly Update – Week 12

Written By Aaron Spaulding

This week I updated the classification model from the baseline result. I tested multiple configurations, varying the transfer learning approach, the augmentations applied, and the classification model architecture. Since I have many results, I include only the worst and best performers.

Worst Model

This model was by far the worst performer, with almost no predictive ability. The plot of accuracy vs. epochs shows that the model was unable to converge on either the training or test set, never passing 20% accuracy.

The confusion table reveals more detail and shows that the model predicts only one type of bird, the cardinal.

Best Model

The best model was built using “resnet_v2_152” as the base model with three dense layers for the classifier. The first two had sizes 64 and 32 with the ‘relu’ activation function, and the final layer had size 19 (one per class) with the ‘sigmoid’ function. I used a learning rate of 0.001. Early stopping ended training at 33 epochs, after the model converged; to double-check, I trained the model for 10 additional epochs, but it did not improve. The image below shows the model loss vs. epochs and may indicate some overfitting. I also applied all of the discussed augmentations and balanced the train and test sets.
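Below is a minimal Keras sketch of this configuration, assuming a TensorFlow Hub feature-vector backbone in a frozen-backbone transfer learning setup. The hub module URL, input size, loss function, and early-stopping patience are assumptions not stated above; only the layer sizes, activations, and learning rate come from the post.

```python
# A minimal sketch of the best model's configuration; hyperparameters not
# named in the post (loss, patience, input size) are assumptions.
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 19  # 19 bird species

# Pretrained ResNet V2 152 backbone; the exact TF Hub module URL is an assumption.
base = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/resnet_v2_152/feature_vector/5",
    trainable=False,  # transfer learning: keep the pretrained weights frozen
)

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),  # spectrograms resized to the backbone's input
    base,
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="sigmoid"),  # sigmoid output, as in the post
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # learning rate from the post
    loss="categorical_crossentropy",  # the loss function is an assumption
    metrics=["accuracy"],
)

# Early stopping as described; the patience value is an assumption.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
# model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=[early_stop])
```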

The confusion table is also much improved, with many more predictions along the diagonal.

This model achieved a final micro-F1 score of 63.54. This is a fantastic result: on a recent bird classification challenge on Kaggle, this score would be a top 2% submission.

Posted in Aaron Spaulding, Personal Update

Aaron Spaulding Personal Weekly Update – Week 11

Written By Aaron Spaulding

This week I received the initial results of our classification method. Most of the time this week was spent writing the code for the method and setting everything up to run properly. However, I was able to set a baseline result, which we can improve on in the upcoming weeks!

Baseline Result

It is important to note that this is a baseline result, meaning we use a simple model without applying augmentations or optimizations. It also means our next models should be improvements on this!

This baseline model was built with “resnet_v2_152” as the base model for transfer learning. I used three dense layers as my classification model and saved 25% of the dataset for validation.
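A minimal sketch of the 25% validation hold-out is below; the file paths, labels, and stratification choice are hypothetical stand-ins for illustration.

```python
# A minimal sketch of saving 25% of the dataset for validation; the paths
# and labels below are hypothetical placeholders.
from sklearn.model_selection import train_test_split

clip_paths = [f"clips/clip_{i:04d}.png" for i in range(100)]  # placeholder spectrogram files
clip_labels = [i % 19 for i in range(100)]                    # placeholder species labels

train_paths, val_paths, train_y, val_y = train_test_split(
    clip_paths, clip_labels,
    test_size=0.25,        # hold out 25% of the dataset for validation
    stratify=clip_labels,  # keep the class mix similar in both splits (an assumption)
    random_state=0,
)
```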

(Image source: https://towardsdatascience.com/understanding-and-visualizing-resnets-442284831be8)

The figures below show the loss and accuracy as the model is trained. The model quickly overfits the training set and stops learning on the test set. This leaves us with accuracy just below 60%.

To further test performance, I created a confusion table for all of our species. It is clear that the model is not a strong performer. In addition, we can see that the classes are not balanced, which may be contributing to the low accuracy score.

Posted in Aaron Spaulding, Group Update, Personal Update

Aaron Spaulding Personal Weekly Update – Week 10

Written By Aaron Spaulding

During the past week, I continued working on the classification model that will be used to determine the species of a bird sound after detection. Most of the week was spent processing the clips we labeled over break and applying augmentations. The image below shows the process I created, where I import the audio data for each clip, mix it with Gaussian noise, and then generate a spectrogram with a 251×251 shape. These processed clips will be used to train and validate our initial classification models.
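A minimal sketch of this pipeline is below, assuming librosa for audio loading and the STFT; the noise level, FFT settings, and resizing step are assumptions not stated in the post.

```python
# A minimal sketch of the clip-processing pipeline: load audio, mix in
# Gaussian noise, and produce a 251x251 spectrogram. Noise level and FFT
# parameters are assumptions.
import librosa
import numpy as np
from skimage.transform import resize

def clip_to_spectrogram(path, noise_std=0.005, size=251):
    audio, sr = librosa.load(path, sr=None)                        # import the clip's audio data
    audio = audio + np.random.normal(0.0, noise_std, audio.shape)  # mix with Gaussian noise
    spec = np.abs(librosa.stft(audio, n_fft=512, hop_length=128))  # magnitude spectrogram
    spec_db = librosa.amplitude_to_db(spec, ref=np.max)            # log scale for training
    return resize(spec_db, (size, size))                           # force the 251x251 shape
```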

One example of this process is shown below for the northern cardinal (Cardinalis cardinalis). The first image shows a picture of the bird taken this winter and the second shows the fully processed spectrogram ready for training.

Cardinalis cardinalis
Cardinalis cardinalis processed spectrogram

This week I also met with Paul to record the 24-hour dataset. We set the arrays and temperature sensor up to record in my yard. We were able to capture the full 24-hour dataset with birds and temperature!

Posted in Aaron Spaulding, Group Update, Personal Update

Aaron Spaulding Personal Weekly Update – Week 9

Written By Aaron Spaulding

During the past week, I started the classification model that will be used to determine the species of a bird sound after detection. Since this is an entirely new model, a lot of time was spent reading literature and reviewing other work on this topic.

After discussion with the sponsor, an outline for our method has been finalized. Some of the things we will try are augmentations, transfer learning, and training on spectrograms instead of audio clips.

Augmentations will be useful since we have a limited dataset. I plan to apply the following after separating my train/test sets to help expand the data.

  • Time shifts
  • Time reversal
  • Time scaling
  • Vertical frequency shifts

Combinations of these with different values will be applied to artificially expand the data and could help the models.
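Minimal sketches of these four augmentations are below, written against a spectrogram array of shape (frequency bins, time frames); the shift and scale amounts are arbitrary example values, not the project's settings.

```python
# Minimal sketches of the four augmentations; shift and scale amounts are
# arbitrary example values, not the project's settings.
import numpy as np

def time_shift(spec, frames=20):
    return np.roll(spec, frames, axis=1)   # circularly shift along the time axis

def time_reverse(spec):
    return spec[:, ::-1]                   # play the clip backwards in time

def time_scale(spec, factor=1.1):
    # Resample the time axis by nearest-neighbor indexing (stretch or compress).
    idx = np.clip((np.arange(spec.shape[1]) / factor).astype(int), 0, spec.shape[1] - 1)
    return spec[:, idx]

def freq_shift(spec, bins=5):
    return np.roll(spec, bins, axis=0)     # shift vertically along the frequency axis
```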

I also plan to use transfer learning. I found three models that may perform well: the VGG-19, ResNet V2, and EfficientNet models. Each of these has been pretrained on ImageNet, a collection of images with 1,000 classes, and may generate relevant features. The image below shows the VGG-19 architecture.

The final step will be to train our classifier model on top of the features generated by the pretrained model. For this, we will test a linear model and a simple neural net.
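A minimal sketch of the linear-model option is below: run each spectrogram through the frozen pretrained backbone once, then fit a logistic regression on the extracted features. The hub module URL, input size, and placeholder data are assumptions.

```python
# A minimal sketch of classifying on top of pretrained features; the module
# URL and the placeholder data are assumptions.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.linear_model import LogisticRegression

backbone = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/resnet_v2_152/feature_vector/5",
    trainable=False,  # the frozen pretrained model generates the features
)

def extract_features(images):
    # images: float32 array of shape (n, 224, 224, 3), scaled to [0, 1]
    return backbone(tf.convert_to_tensor(images)).numpy()

# Hypothetical placeholders standing in for the processed spectrograms.
X = extract_features(np.random.rand(32, 224, 224, 3).astype(np.float32))
y = np.random.randint(0, 19, size=32)

clf = LogisticRegression(max_iter=1000).fit(X, y)  # the "linear model" option
```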

Posted in Aaron Spaulding, Group Update, Personal Update

Aaron Spaulding Personal Weekly Update – Week 8

Written By Aaron Spaulding

During the past week, I focused on adjusting the plane-wave simulation scripts to work with our multichannel arrays. This week I also wrote code to extract the metrics the sponsor defined. A list of these is shown in the image below.

The images below show the raw beampattern and the labeled beampattern marked at the beamwidth locations.

Posted in Aaron Spaulding, Group Update, Personal Update

Aaron Spaulding Personal Weekly Update – Week 7

Written By Aaron Spaulding

During the past week, I focused on developing the scripts to simulate plane waves incident on the array. By calculating the time offsets of a single tone, I was able to create synthetic multichannel audio clips. The images below show the beamformer beam pattern outputs as a source rotates around the 4-channel array. These were calculated from the outputs of my script.
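A minimal sketch of this synthesis is below; the array geometry, arrival angle, tone frequency, and sample rate are example values, not the project's configuration.

```python
# A minimal sketch: compute each channel's time offset for a plane wave and
# synthesize delayed copies of a single tone. All values are example values.
import numpy as np

c = 343.0                 # speed of sound (m/s)
fs = 48_000               # sample rate (Hz)
f0 = 2_000.0              # tone frequency (Hz)
theta = np.deg2rad(30.0)  # plane-wave arrival angle

# Hypothetical 4-channel linear array with 5 cm spacing (positions in meters).
mic_x = np.array([0.0, 0.05, 0.10, 0.15])

# Time offset of each channel relative to the array origin.
delays = mic_x * np.cos(theta) / c

t = np.arange(int(0.1 * fs)) / fs  # a 100 ms clip
clips = np.sin(2 * np.pi * f0 * (t[None, :] - delays[:, None]))  # shape (4, n_samples)
```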

This will be used in the next steps of beamformer validation, where we will measure the beam responses and sponsor-defined metrics for our array and beamforming method!

Posted in Uncategorized

Aaron Spaulding Personal Weekly Update – Week 6

Written By Aaron Spaulding

During the past week, I explored other possible ensembling methods as we expand our model. I tested three methods: fitting a ridge regression, fitting a linear model, and taking an average of the three models used last week. I limited the ensembling methods to simpler regression methods to help reduce the overfitting that could develop when using larger models with more parameters.

Linear Ensemble

The first new method I tested was fitting a linear model to the model outputs. While this method should produce results close to the mean, its AUC of 78.18 showed a decrease in predictive ability, falling below each of the individual models.

Tikhonov Regularization (Ridge Regression)

I also tested ridge regression to see if ensemble performance could be improved with a more general solution. This method produced a nearly identical result and still suffered for many of the cases, with an AUC of 78.16.

Average Ensemble

This third method is the simplest and should not underperform any individual model by much. The average ensemble performed the strongest, with an AUC of 86.28.
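Minimal sketches of the three ensembling methods are below, assuming the three models' predicted probabilities are stacked into a matrix; the data is a hypothetical placeholder, and in practice the linear and ridge combiners would be fit on held-out predictions rather than in-sample.

```python
# Minimal sketches of the three ensembling methods; `preds` and `y` are
# hypothetical stand-ins for the models' CV predictions and true labels.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

preds = np.random.rand(1000, 3)         # columns: the three models' outputs
y = np.random.randint(0, 2, size=1000)  # true bird / no-bird labels

linear_ens = LinearRegression().fit(preds, y).predict(preds)  # linear ensemble
ridge_ens = Ridge(alpha=1.0).fit(preds, y).predict(preds)     # Tikhonov regularization
average_ens = preds.mean(axis=1)                              # average ensemble
```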

Posted in Uncategorized

Aaron Spaulding Personal Weekly Update – Week 4-5

Written By Aaron Spaulding

During the past two weeks, I focused on building the classification model and further expanding the current detection method.

Classification Method

The first step in developing our classification method is to construct our synthetic dataset of bird sounds native to our region. Last week I started writing the code to combine labeled bird sounds with our noise sources. The next step will be to run this with noise sounds captured by the array.

The Ensemble

This week I also expanded the detection model. Our current efforts have focused on developing a single well-performing model; however, building and testing an ensemble could improve overall performance. To test this, I constructed an ensemble composed of three models: the GBM, the RF, and the BART. Each of these was evaluated with a 4-fold random CV on the BirdVox-DCASE-20k dataset to help improve training speeds; a sketch of this evaluation pattern is shown below.
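The sketch below shows the 4-fold random CV and AUC evaluation pattern using scikit-learn's gradient boosting as a stand-in; the post's GBM, RF, and BART models were actually fit with other packages (some in R), and the data here is a hypothetical placeholder.

```python
# A minimal sketch of evaluating a model with 4-fold random CV and AUC;
# the model and data are stand-ins, not the project's exact setup.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.metrics import roc_auc_score

X = np.random.rand(500, 20)            # placeholder feature matrix
y = np.random.randint(0, 2, size=500)  # placeholder bird / no-bird labels

cv = KFold(n_splits=4, shuffle=True, random_state=0)  # 4-fold random CV
probs = cross_val_predict(
    GradientBoostingClassifier(), X, y, cv=cv, method="predict_proba"
)[:, 1]
print(f"AUC: {roc_auc_score(y, probs):.4f}")
```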

GBM

The GBM is the model currently in use by our detection method. It achieved an AUC of 86.29 (the slight drop in performance is due to the low number of folds in the cross-validation) and showed good separation between the predicted classes.

RF

The random forest often performs similarly to the GBM when tuned. I used the ranger package in R with a default value of 500 trees, using Gini impurity to measure variable importance. This model performed similarly to the GBM, with an AUC of 85.88. This may improve after parameter tuning (the GBM AUC increased by almost 1 point after parameter tuning with a 100-fold CV; a similar increase would make this model outperform the current GBM). The image below shows the CDF of model predictions.

BART

A Bayesian additive regression tree model was also built and evaluated. This model was built with 150 trees using the R BART package. It performed the lowest by AUC at 84.74; however, it also leaves the most room for improvement and could be tuned.

The Average Ensemble

Combining these models by averaging their outputs produced a result better than the RF or BART individually but slightly worse than the GBM, with an AUC of 86.28. This result shows that if the RF or BART models were tuned, ensemble performance could be greater than using the GBM alone. The image below shows the distribution of results for the combined model.

Future Work

The next steps for the ensemble should include tuning the BART and RF models. It may also be beneficial to compute optimized weighted averages for the ensemble, as well as possibly building a GLM or linear model to combine the outputs.

Posted in Uncategorized

Aaron Spaulding Personal Weekly Update – Week 3

Written By Aaron Spaulding

Variable Analysis

During the past semester and over the holiday, I designed new features in an effort to improve the detection model. While the detection model now performs well, we have over 240 predictors in our dataset, which is not ideal and could lead to overfitting. This should also be addressed since we may need to design new predictors when building the classification model.

Variable Importance

I previously used relative variable importance when developing new features. Over the break, I removed a low-weighted set of features, the frequency percentiles, and was able to slightly improve model performance. Figure 1 shows the variables with the highest relative importance to the GBM.

The “s_energy_band” feature set has almost half the relative importance. I designed the “energy_band” features to quantify the energy of normed bands of the spectrogram, an aspect that could be helpful when analyzing audio clips. Figure 2 shows the lowest and highest frequencies of the highest weighted bands.

A couple of conclusions are possible.

  • Bird calls commonly occur inside these frequency bands.
  • Non-bird sounds, such as noise or predators, commonly occur inside these bands.

PCA

Principal component analysis (PCA) is another method used when analyzing variables. Using 240+ variables is possible when the dataset is large but not when data is scarce. Reducing the number of dimensions can also decrease model training time and may provide insight into correlated variables.

Running PCA revealed that 90% of the dataset variability could be contained in just 10 principal components. This also revealed a high correlation within the frequency percentile set. One surprising result was the correlation between some energy bands and some spectral features, something we will have to discuss and investigate in the future.
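A minimal sketch of this check is below: standardize the feature matrix, fit PCA, and count the components needed to reach 90% of the variance. The feature matrix here is a hypothetical stand-in for the real 240+ predictors.

```python
# A minimal sketch of the PCA variance check; the feature matrix is a
# hypothetical stand-in for the dataset's 240+ predictors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(1000, 240)              # placeholder feature matrix
X_std = StandardScaler().fit_transform(X)  # scale features before PCA

pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.90)) + 1
print(f"{n_components} components explain 90% of the variance")
```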

Running the models on just the principal components did reduce performance with K-fold CV, bringing the AUC to 81.00. This is expected, however, and could be improved by including more principal components in the model.

Posted in Uncategorized

Aaron Spaulding Personal Weekly Update – November 29, 2020

Written By Aaron Spaulding

BirdVox-DCASE-20k Parameter File

This week I wrote the code to build the parameter file for the BirdVox-DCASE-20k dataset. This parameter file includes each feature for each audio clip and will be used to train and test our initial models. Each feature was calculated after processing each audio clip with our bandpass filter and our pre-emphasis filter. On my laptop, each audio file took about a second of processing time, and the total runtime was a little more than 6 hours.
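A minimal sketch of the per-clip preprocessing is below; the band edges, filter order, and pre-emphasis coefficient are assumptions, since the post does not state them.

```python
# A minimal sketch of the bandpass + pre-emphasis preprocessing; band edges,
# filter order, and the pre-emphasis coefficient are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess(audio, fs, low=1000.0, high=8000.0, alpha=0.97):
    sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, audio)  # bandpass filter
    # Pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]
    return np.append(filtered[0], filtered[1:] - alpha * filtered[:-1])

fs = 44_100
clip = np.random.randn(fs)       # hypothetical one-second clip
processed = preprocess(clip, fs)
```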

Variable Importance

Measuring variable importance can give valuable insight into how the model works and what features contribute the most to solving the bird-detection problem. Figure 1 shows the 18 highest weighted features of a GBM trained on 90% of the BirdVox-DCASE-20k dataset.

It is interesting that the model “likes” spectral features and does not heavily rely on frequency percentiles. (We expected otherwise.) It is possible that the frequency percentile and frequency-at-highest-energy features do not work well in extremely noisy environments, and steps may need to be taken to help reduce noise even further. It may be beneficial to use Mel-frequency spectral features since other spectral features have heavy weightings.

Detection Model

This week I also built the first detection model. It was run with 4-fold cross-validation to obtain predictions on the entire BirdVox-DCASE-20k dataset. Average NASH and RMSLE (0.21 and 0.31) both scored well across all four folds, and overall model performance was great for a baseline model, with an AUC of 79.37. (This model is untuned and has not been optimized.) This is a good baseline since similar models on similar data were able to achieve an AUC between 80 and 90, and none in “The first Bird Audio Detection challenge” [1] were able to exceed an AUC of 90.

Figure 2 shows a CDF of model predictions and is exactly what we would expect. Figure 3 shows the number of false positives and false negatives as functions of potential threshold values and represents decent, but not great, performance.
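A minimal sketch of the Figure 3 computation is below: count false positives and false negatives at each candidate threshold. The predictions and labels are hypothetical stand-ins for the CV outputs.

```python
# A minimal sketch of counting false positives / false negatives per
# threshold; `probs` and `labels` are hypothetical stand-ins.
import numpy as np

probs = np.random.rand(20_000)            # predicted bird probabilities
labels = np.random.randint(0, 2, 20_000)  # true bird / no-bird labels

thresholds = np.linspace(0.0, 1.0, 101)
false_pos = [(probs >= t)[labels == 0].sum() for t in thresholds]  # no bird, predicted bird
false_neg = [(probs < t)[labels == 1].sum() for t in thresholds]   # bird, predicted no bird
```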

In order to visualize model performance more accurately, I plotted the predicted class probabilities for each class. This clearly shows the separation of variables, with clear, separate peaks for the bird and non-bird cases.

To see the model performance in greater detail, I also plotted the normed predicted class probabilities of the predictions. A good model would have clear peaks and very little overlap in each region. While distinct peaks are visible, the model does struggle on a number of bird calls and incorrectly identifies a large portion of them as containing no bird call.

Next Steps

The next steps will include revisiting the beamforming algorithm to fix the errors in the FIR FD filter and examining some of the cases that the model struggles on to help improve performance.

[1] Stowell, D., Stylianou, Y., Wood, M., et al.: Automatic acoustic detection of birds through deep learning: the first bird audio detection challenge. Methods Ecol. Evol. 10, 368–380 (2018)

Posted in Aaron Spaulding, Group Update, Personal Update, Uncategorized