Android Question: Recognize a sound with the FFT library

Nkalampika

Active Member
Licensed User
Hello, I would like to recognize a sound (a beep) that is stored in a WAV file and compare it with a live sound. How can I do it?
 

stevel05

Expert
Licensed User
Identifying when a sound starts and ends is not too difficult: you would parse the incoming data and look for the points where the signal rises above, and falls back to, the base silence level. Identifying what that sound is would be far more difficult. You would probably need to delve into the realm of artificial intelligence. I couldn't find a ready-made Java library that would do it.
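
The start/end detection described above could look roughly like the following Python sketch. It is only an illustration, not code from any library mentioned in this thread: the function name `find_sound_span`, the block size, and the rule "the quietest block is the silence level" are all assumptions.

```python
import math

def find_sound_span(samples, block=256, factor=4.0):
    """Return (start_index, end_index) of the first stretch that is clearly
    louder than the base silence level, or None if nothing stands out."""
    # RMS level of each consecutive block of samples
    levels = []
    for i in range(0, len(samples) - block + 1, block):
        chunk = samples[i:i + block]
        levels.append(math.sqrt(sum(x * x for x in chunk) / block))
    if not levels:
        return None
    # Assumption: the quietest block represents the base silence level
    silence = min(levels)
    threshold = max(silence * factor, 1e-6)
    loud = [level > threshold for level in levels]
    if True not in loud:
        return None
    first = loud.index(True)
    last = first
    while last + 1 < len(loud) and loud[last + 1]:
        last += 1
    return first * block, (last + 1) * block

# Example: faint 50 Hz hum, then a 1 kHz beep, then hum again (8 kHz sampling)
RATE = 8000
hum = [0.01 * math.sin(2 * math.pi * 50 * i / RATE) for i in range(2048)]
beep = [math.sin(2 * math.pi * 1000 * i / RATE) for i in range(1024)]
span = find_sound_span(hum + beep + hum)
print(span)
```

The span is only as precise as the block size, so a smaller block gives finer timing at the cost of a noisier level estimate.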
 

MarkusR

Well-Known Member
Licensed User
I think a beep is defined by some simple waveforms. An FFT should output the frequencies it contains, and you define the frequency range.
To compare, I would feed your WAV file or the live recording into the FFT and save the result, maybe in an SQLite database.
After that you can run a query that returns the matching records.

 

canalrun

Well-Known Member
Licensed User
I have done that before: recognized a tone in real time using FFTs. I believe I may have used Klaus' FFT library, or I may have used my own. Search this forum for "Canalrun FFT"; maybe I uploaded it, I don't remember.

I captured a short period, maybe half a second, of raw, real-time microphone data, performed a 16-point FFT, computed the magnitude of each bin (the square root of the sum of the squared real and imaginary parts), and checked whether the bin corresponding to the tone frequency I was expecting had a power level above the computed background noise. Doing this requires a little signal-processing knowledge, but it's not too bad.

I took the first FFT that detected the tone as the start and counted the number of consecutive FFTs that contained the tone to estimate the tone duration.
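
A minimal Python sketch of the bin check described above, not Barry's original code. The names `bin_magnitude` and `tone_present` are made up for illustration, and estimating the background noise as the average of all other bins is an assumption, since the post does not say how the noise level was computed. Only the bins are evaluated directly here; a real FFT would give the same magnitudes faster.

```python
import cmath
import math

def bin_magnitude(block, k):
    """sqrt(re^2 + im^2) of DFT bin k for one block of samples."""
    n = len(block)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(block))
    return abs(acc)

def tone_present(block, sample_rate, tone_hz, ratio=5.0):
    """True if the bin nearest tone_hz stands well above the other bins."""
    n = len(block)
    k = round(tone_hz * n / sample_rate)  # bin index closest to the tone
    if not 1 <= k < n // 2:
        raise ValueError("tone frequency is outside the usable bins")
    mags = [bin_magnitude(block, b) for b in range(1, n // 2)]
    target = mags[k - 1]
    # Assumption: background noise = average magnitude of all other bins
    others = [m for b, m in enumerate(mags, start=1) if b != k]
    noise = sum(others) / len(others)
    return target > ratio * noise

# Example: 64-point blocks sampled at 8 kHz, so 1 kHz falls exactly on bin 8
RATE = 8000
N = 64
tone_block = [math.sin(2 * math.pi * 1000 * i / RATE) for i in range(N)]
other_block = [math.sin(2 * math.pi * 2000 * i / RATE) for i in range(N)]
print(tone_present(tone_block, RATE, 1000))
print(tone_present(other_block, RATE, 1000))
```

Each bin is sample_rate / N wide, so a small FFT is fast but cannot separate tones that are close together.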

It can be done on a fairly recent device. I believe I was using hardware comparable to an LG G2 to do this.

Barry.
 

techknight

Well-Known Member
Licensed User
I did a quick search; you didn't post anything of yours anywhere. Could you? I need something similar to detect a coach's whistle, which actually has 3 different frequencies and a beat frequency created by the 3 coupled together.
 

canalrun

Well-Known Member
Licensed User
I couldn't find it online in these forums either. I did this about four years ago, the software is on another computer, and unfortunately long gone.

Thinking about what I did:

I captured data from the microphone, guided by one of Erel's examples. I believe you can specify the number of samples you want, and you receive an event with a buffer containing those samples. I chose a power-of-two number of points, probably 1024. Once I had the buffer of points, I added it to a global list.

I had a timer firing every 1 to 5 ms. In the timer routine I would check whether the global list contained a buffer. If it did, I would run multiple sliding FFTs, probably 128-point, over the data array. I would compute the magnitude of each bin (the square root of the sum of the squared real and imaginary parts) and look for the tones within the resulting bins.

The only tricky part is that the number of audio samples imposes a time constraint. If you're sampling at 44.1 kHz with 1024 points, a new buffer arrives every 1024/44,100 s, or about 23 ms. You need to complete the timer's FFT computations within that time.
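
The time constraint can be worked out with a few lines of Python. This is only arithmetic on the figures in the post (44.1 kHz, 1024-point buffers, 128-point FFTs). The hop of 64 samples between sliding FFTs is an assumption, because the post does not say how far each FFT was shifted.

```python
def buffer_duration_ms(num_samples, sample_rate):
    """Time it takes the microphone to deliver one buffer."""
    return 1000.0 * num_samples / sample_rate

def window_count(buffer_size, fft_size, hop):
    """Number of sliding FFT windows that fit inside one buffer."""
    return (buffer_size - fft_size) // hop + 1

budget = buffer_duration_ms(1024, 44100)   # time available per buffer
windows = window_count(1024, 128, 64)      # hop of 64 samples is assumed
print(f"budget per buffer: {budget:.1f} ms")
print(f"sliding FFTs per buffer: {windows}")
print(f"time available per FFT: {budget / windows:.2f} ms")
```

If the FFTs for one buffer take longer than the buffer duration, unprocessed buffers pile up in the list and detection falls further and further behind real time.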

I did get this working, but it did take some testing and tweaking.

Barry.
 

techknight

Well-Known Member
Licensed User
Whew. My brain is fried. That one went over my head like a fart in a fan factory.

I'm not a math guy, so it's one of those projects that just continues to sit on the back burner. I appreciate the details though.
 

canalrun

Well-Known Member
Licensed User
You might also have a look at the OpenCV B4A library. It will do FFTs, magnitude squared, and maybe microphone input. I've never used the B4A version of this library, but I have used OpenCV in other projects.

Barry.
 

klaus

Expert
Licensed User
I had a deeper look into your problem; the attached project is a demonstrator for your request.

To test beeps, the program includes three beep MP3 files:

w_500_4.mp3 is the reference beep, composed of 4 frequencies (500, 1000, 1500, 2000 Hz)
w_520_4.mp3 is a comparison beep, composed of 4 frequencies (520, 1020, 1520, 2020 Hz)
w_530_4.mp3 is a comparison beep, composed of 4 frequencies (530, 1030, 1530, 2030 Hz)

Program flow:
1. Recording of a sound signal (the beeps)
The program reads 8192 time samples (can be changed).
2. FFT calculation
3. Peak detection. There is a peak threshold: only peaks with a magnitude higher than the threshold are taken into account.
The threshold level in the program is 15% of the max peak level (can be changed).
4. After a click on Beep, the beep is compared to the reference beep by the number of frequency components and by their frequencies.
If the number of frequency components is different, the beeps are considered different.
If the frequency of every component is within a limit (25 Hz in the program) of the corresponding reference frequency, the beeps are considered the same.
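
The program flow above can be sketched in Python as follows. This is not the code from BeepTest.zip. It is an independent illustration that reuses the numbers from the post (8192 samples, 15% threshold, 25 Hz limit). The function names, the pure-Python FFT, and the synthetic beeps that stand in for the MP3 files are all assumptions.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def peak_frequencies(samples, sample_rate, threshold=0.15):
    """Frequencies of local spectral maxima above threshold * highest peak."""
    n = len(samples)
    mags = [abs(c) for c in fft(samples)[:n // 2]]
    limit = threshold * max(mags[1:])
    peaks = []
    for k in range(1, n // 2 - 1):
        if mags[k] > limit and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]:
            peaks.append(k * sample_rate / n)
    return peaks

def same_beep(ref_peaks, test_peaks, tolerance_hz=25.0):
    """Same number of components, each within tolerance of the reference."""
    if len(ref_peaks) != len(test_peaks):
        return False
    return all(abs(r - t) <= tolerance_hz for r, t in zip(ref_peaks, test_peaks))

def make_beep(freqs, sample_rate=44100, n=8192):
    """Synthetic stand-in for a beep file: a sum of equal-amplitude sines."""
    return [sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs)
            for i in range(n)]

RATE = 44100
ref = peak_frequencies(make_beep([500, 1000, 1500, 2000]), RATE)
p520 = peak_frequencies(make_beep([520, 1020, 1520, 2020]), RATE)
p530 = peak_frequencies(make_beep([530, 1030, 1530, 2030]), RATE)
print(ref)
print(same_beep(ref, p520), same_beep(ref, p530))
```

Peak frequencies come out as multiples of the bin width (about 5.4 Hz here), so the measured difference between two beeps can be off by a few hertz from the true difference.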

upload_2018-9-5_19-44-18.png


Test:

1. Click on Sound
This records the reference beep.
The time signal is shown.

2. Click on FFT
Shows the FFT graph.
You see a horizontal red line, which is the peak detector threshold level.
On the right you see the detected peaks with their frequency.

3. Click on Beep
You see a red FFT graph for the generated beep.
A toast message appears showing whether the beep is considered the same or not.

A click on REC records the mic input, like a spectrum analyser.

Some information about FFT.
You need to know the relationship between the sampling frequency, the number of time samples, the acquisition time and the frequency resolution.

The table below shows it:

upload_2018-9-5_19-44-45.png


The first line shows 44100, which is the sampling frequency; I put it in the table only for comparison. It cannot be used for FFT calculations, because the number of samples for an FFT must be a power of 2.

I found that 8192 time-signal samples is a good compromise: the acquisition time is less than 200 ms and the frequency resolution is about 5 Hz.
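
The relationship between these quantities is simple enough to compute directly: acquisition time = number of samples / sampling frequency, and frequency resolution = sampling frequency / number of samples. The Python sketch below prints the values for a few power-of-two sample counts at 44100 Hz. It does not claim to reproduce the table in the image, and the function name `fft_timing` is made up for illustration.

```python
def fft_timing(sample_rate, num_samples):
    """Return (acquisition time in ms, frequency resolution in Hz)."""
    acquisition_ms = 1000.0 * num_samples / sample_rate
    resolution_hz = sample_rate / num_samples
    return acquisition_ms, resolution_hz

print("samples   time [ms]   resolution [Hz]")
for n in (512, 1024, 2048, 4096, 8192, 16384):
    t_ms, df = fft_timing(44100, n)
    print(f"{n:7d}   {t_ms:9.1f}   {df:15.2f}")
```

Halving the number of samples halves the acquisition time but doubles the bin width, which is why short pulses and fine frequency resolution pull in opposite directions.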
 

Attachments

  • upload_2018-9-5_19-43-17.png
  • BeepTest.zip

jemajuca

Member
Licensed User
Hi klaus!
I need to detect a known frequency in a Morse code signal using your example.
The Morse dot duration is 100 ms, the dash is 300 ms, and the frequency of the signal is 2 kHz.
For this application it is essential to measure how long the frequency is present, either by detecting the start and end of the pulse or by taking enough measurements to determine the pulse duration.
So I tested different combinations of sampling frequency and number of signal samples, but only these three run:
11025/512
22050/1024
44100/2048
All three take a similar time, around 160 ms, which is excessive.
I think I should select a lower number of samples, e.g. 512 at 44100, as you commented in your post https://www.b4x.com/android/forum/threads/fft-fast-fourier-transform-library.6989/page-3#post-296146 but then the app does not run at all.
Only the three combinations listed work.
Any idea?
 

canalrun

Well-Known Member
Licensed User
Klaus is the person to ask, but I'll chime in since I did a similar project.

I reduced the FFT size to 64 or 16. These should be somewhat faster.

I sampled the audio at 22050, mono, I think.

I acquired an audio buffer whose size was a power of two, about one half second long.

I then computed consecutive FFTs, "sliding" the start of the FFT along the buffer of data samples. The number of consecutive FFTs in which a signal was detected gave me the time length of the signal.
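
The sliding approach could be sketched in Python like this. It is an illustration under stated assumptions, not Barry's lost code. The 64-point size and 22050 Hz rate come from the post, and the 2 kHz tone with 100 ms and 300 ms pulses comes from the Morse question above. The hop of 32 samples, the threshold of half the full-scale bin magnitude, and the names are assumptions. Only the one bin of interest is evaluated, which gives the same value an FFT would give for that bin.

```python
import cmath
import math

def bin_magnitude(window, k):
    """Magnitude of DFT bin k for one window of samples."""
    n = len(window)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(window)))

def tone_duration_ms(samples, sample_rate, tone_hz, fft_size=64, hop=32,
                     threshold=0.5):
    """Estimate the longest continuous presence of tone_hz, in milliseconds."""
    k = round(tone_hz * fft_size / sample_rate)
    # A full-scale sine (amplitude 1.0) centred on a bin gives fft_size / 2
    limit = threshold * fft_size / 2
    longest = run = 0
    for start in range(0, len(samples) - fft_size + 1, hop):
        if bin_magnitude(samples[start:start + fft_size], k) > limit:
            run += 1                      # tone still present in this window
            longest = max(longest, run)
        else:
            run = 0                       # tone ended, start counting again
    # Each consecutive detection advances the window start by one hop
    return 1000.0 * longest * hop / sample_rate

RATE = 22050

def pulse(ms):
    """Silence, a 2 kHz tone lasting ms milliseconds, then silence."""
    n = int(RATE * ms / 1000)
    tone = [math.sin(2 * math.pi * 2000 * i / RATE) for i in range(n)]
    return [0.0] * 1000 + tone + [0.0] * 1000

dot_ms = tone_duration_ms(pulse(100), RATE, 2000)
dash_ms = tone_duration_ms(pulse(300), RATE, 2000)
print(round(dot_ms), round(dash_ms))
```

With a hop of 32 samples at 22050 Hz the duration is resolved in steps of about 1.5 ms, which is ample for telling a 100 ms dot from a 300 ms dash.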

I also did something similar on an Arduino-type processor using B4R. For that I found an integer FFT on the web that was significantly faster. That took a lot of searching, however.

I did all this four or five years ago. Unfortunately, the software I developed is long gone.

Barry.
 

klaus

Expert
Licensed User
I'm not sure that FFT is the best solution for this.
Couldn't you test whether the amplitude of the time signal is above a given level, to check if there is a signal or not?
Similar to what canalrun explained, but without the FFT.
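
An amplitude-only detector along these lines might look like the Python sketch below. It is a hypothetical illustration: the 10 ms block length, the fixed level, the rule "shorter than two dot lengths is a dot", and the function name `morse_marks` are assumptions, while the 100 ms dot and 300 ms dash come from the question.

```python
import math

def morse_marks(samples, sample_rate, level, block_ms=10, dot_ms=100):
    """Return '.' or '-' for each run of blocks whose peak exceeds level."""
    block = int(sample_rate * block_ms / 1000)

    def classify(blocks_on):
        # Assumption: anything shorter than two dot lengths counts as a dot
        return '.' if blocks_on * block_ms < 2 * dot_ms else '-'

    marks = []
    run = 0
    for start in range(0, len(samples) - block + 1, block):
        loud = max(abs(x) for x in samples[start:start + block]) > level
        if loud:
            run += 1                      # signal present, extend the pulse
        elif run:
            marks.append(classify(run))   # pulse just ended
            run = 0
    if run:
        marks.append(classify(run))       # pulse ran to the end of the data
    return ''.join(marks)

# Example at 8 kHz: dot (100 ms), gap (100 ms), dash (300 ms) of a 2 kHz tone
RATE = 8000

def tone(ms):
    return [math.sin(2 * math.pi * 2000 * i / RATE)
            for i in range(RATE * ms // 1000)]

def gap(ms):
    return [0.0] * (RATE * ms // 1000)

signal = gap(50) + tone(100) + gap(100) + tone(300) + gap(50)
marks = morse_marks(signal, RATE, level=0.5)
print(marks)
```

This is much cheaper than an FFT, but any sound louder than the level counts as signal regardless of its frequency.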
 

jemajuca

Member
Licensed User
Hi.
Do you think there is any reason why I cannot select, e.g., 22050/512 or 44100/1024?
Klaus, yes, I was thinking about that, measuring amplitudes only, but using FFT I can discriminate against some noise sources.
 

klaus

Expert
Licensed User
What exactly is the problem?
There could be a problem when the calculation time is longer than the acquisition time.
 