I've read that C has an extremely slight speed advantage over C++, but that slight advantage is probably outweighed by the inconvenience of giving up classes.
Have you tried parallelizing your code in B4A before porting it to C/C++? If you're running single-threaded right now, you could cut your execution time by up to a factor equal to the number of cores on your device, provided your calculations are heavy enough to amortize the threading overhead.
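To make the idea concrete, here's a minimal plain-Java sketch of splitting a computation evenly across the available cores with an `ExecutorService` (the class and method names here are made up for illustration; the same pattern can be reached from B4A via a Java library):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    // Sum of squares, with the array split into one chunk per core.
    static double sumOfSquares(final double[] data) throws Exception {
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        int chunk = (data.length + nThreads - 1) / nThreads;
        List<Future<Double>> parts = new ArrayList<>();
        for (int t = 0; t < nThreads; t++) {
            final int start = t * chunk;
            final int end = Math.min(start + chunk, data.length);
            parts.add(pool.submit(() -> {
                double s = 0;
                for (int i = start; i < end; i++) s += data[i] * data[i];
                return s;
            }));
        }
        double total = 0;
        for (Future<Double> f : parts) total += f.get(); // blocks until each chunk finishes
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sumOfSquares(new double[]{1, 2, 3, 4})); // 1 + 4 + 9 + 16 = 30.0
    }
}
```

The speedup only materializes when each chunk does enough work to outweigh the cost of creating and joining the tasks.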
Also, if your calculations are SIMD-style (single instruction, multiple data), you can get massive speed improvements by running them on the GPU with RenderScript. For example, if you are applying essentially the same transformation to 32768 geometric points and you can accept single-precision floating-point results, you can transform all of those points simultaneously on the GPU. On the desktop, I've seen performance improvements of 50-100x going from single-threaded CPU code to the GPU (via OpenCL). I haven't done any GPU computing on an Android device, but I'd imagine the GPU/CPU performance ratios are similar. And the effective ratio is probably a good deal better, since Dalvik/ART are so much slower than Oracle's HotSpot JVM.
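To illustrate what "same transformation on every point" means, here's a hypothetical sketch in plain Java of rotating a batch of 2D points. Because each output element depends only on one input element, the loop body is exactly the kind of per-element function that RenderScript runs as a kernel over an Allocation, with one work item per point instead of a loop:

```java
public class PointTransform {
    // Rotate each (x, y) point by the same angle. Every point is
    // independent of the others, so this is data-parallel (SIMD-style):
    // on the GPU the loop disappears and the body runs once per point.
    static float[] rotateAll(float[] xs, float[] ys, float angle) {
        float c = (float) Math.cos(angle);
        float s = (float) Math.sin(angle);
        float[] out = new float[xs.length * 2]; // interleaved x0,y0,x1,y1,...
        for (int i = 0; i < xs.length; i++) {
            out[2 * i]     = c * xs[i] - s * ys[i];
            out[2 * i + 1] = s * xs[i] + c * ys[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Rotating (1, 0) by 90 degrees should land at approximately (0, 1).
        float[] r = rotateAll(new float[]{1f}, new float[]{0f}, (float) (Math.PI / 2));
        System.out.println(r[0] + " " + r[1]);
    }
}
```

Note the use of `float` throughout: that matches the single-precision restriction mentioned above, which is what most mobile GPUs handle at full speed.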