For a traditional waveform you don't need an FFT. If you want a mono view, sum the channels and divide by the number of channels. Then, for every block of 'n' samples, take either the average or the peak (maximum and minimum) values, your choice. Higher values of 'n' will give you a more compressed waveform.
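The steps above could be sketched roughly as follows. This is only a minimal sketch using the peak (min/max) variant. The sub names MixToMono and BuildWaveform and the parameter names ChanL, ChanR, Samples and N are made up for illustration, and it assumes two channels of equal length already converted to doubles.

```b4x
' Hypothetical helper: mix two channels down to mono by averaging them.
' Assumes ChanL and ChanR have the same length.
Sub MixToMono(ChanL() As Double, ChanR() As Double) As Double()
    Dim Mono(ChanL.Length) As Double
    For i = 0 To ChanL.Length - 1
        Mono(i) = (ChanL(i) + ChanR(i)) / 2
    Next
    Return Mono
End Sub

' Hypothetical helper: reduce Samples to one min/max pair per block of N samples.
' Result(2 * b) is the minimum of block b, Result(2 * b + 1) is the maximum.
' Any leftover samples that do not fill a whole block are ignored.
Sub BuildWaveform(Samples() As Double, N As Int) As Double()
    Dim Blocks As Int = Samples.Length / N
    Dim Result(Blocks * 2) As Double
    For b = 0 To Blocks - 1
        Dim Lo As Double = Samples(b * N)
        Dim Hi As Double = Lo
        For i = b * N + 1 To b * N + N - 1
            Lo = Min(Lo, Samples(i))
            Hi = Max(Hi, Samples(i))
        Next
        Result(b * 2) = Lo
        Result(b * 2 + 1) = Hi
    Next
    Return Result
End Sub
```

To draw it, you would call something like BuildWaveform(MixToMono(ChanL, ChanR), 256) and draw one vertical line per block from the minimum to the maximum. For the averaged variant, replace the Min/Max tracking with a running sum divided by N.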
For the FFT, your source data must be an array of doubles (TimeReal in the example below).
Example:
Number of samples: 1024
B4X:
Dim NN = 1024 As Int
Dim NN2 = NN / 2 As Int
Dim TimeReal(NN) As Double
Dim FFTReal(NN2) As Double
Dim FFTImag(NN2) As Double
Dim FFTAmpl(NN2) As Double
Dim FFTPhas(NN2) As Double
FFT.Transform2(TimeReal, FFTReal, FFTImag)
FFTAmpl = FFT.ToAmplitude(FFTReal, FFTImag)
FFTPhas = FFT.ToPhase(FFTReal, FFTImag)