Working example for STM32H7 (Cortex-M7, 32-bit)

Hi all.

I have finally managed to get ACADOS and the main_pendulum_ode example up and running on the STM32H743, a 32-bit ARM Cortex-M7 device running at 480 MHz with a double-precision floating-point unit (FPU) :tada:

This example, and I believe any useful ACADOS problem, is quite memory-hungry. It consumed 468 KB of RAM, which is why I had to use the D1 memory bank of the STM32H7. That bank is still inside the MCU but sits outside the CPU core. This causes a slight memory bottleneck, but luckily the STM32H7 implements a D-Cache that speeds up access.
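For readers who want to try something similar, here is a minimal sketch of reserving a large static buffer in a specific RAM region and handing out 8-byte-aligned chunks from it. This is not taken from the project: the section name `.ram_d1`, the arena size, and the `arena_alloc` helper are all hypothetical, and the section name would have to match whatever your linker script defines for the D1 RAM.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size: slightly above the 468 KB reported for this example. */
#define ARENA_SIZE (480u * 1024u)

/* Assumption: the linker script maps a section named ".ram_d1" to the D1 RAM.
 * On any other target the attribute is dropped so the code still builds. */
#if defined(__GNUC__) && defined(__arm__)
#define ARENA_SECTION __attribute__((section(".ram_d1")))
#else
#define ARENA_SECTION
#endif

/* The backing storage, 8-byte aligned so doubles and pointers fit safely. */
static _Alignas(8) uint8_t arena[ARENA_SIZE] ARENA_SECTION;
static size_t arena_used = 0;

/* Simple bump allocator: returns an 8-byte-aligned chunk from the arena,
 * or NULL when the request does not fit. Memory is never freed. */
void *arena_alloc(size_t size)
{
    size_t rounded = (size + 7u) & ~(size_t)7u; /* round up to multiple of 8 */
    if (rounded < size || rounded > ARENA_SIZE - arena_used) {
        return NULL; /* overflow in rounding, or arena exhausted */
    }
    void *p = &arena[arena_used];
    arena_used += rounded;
    return p;
}
```

Whether the solver's allocations can be routed through such a helper depends on how the library is built; an alternative is to place the heap itself in that region through the linker script.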

  • Without any code optimizations (-O0) and no cache the solve time is 1.27 seconds.
  • With code optimizations (-O3) and no cache the solve time is 463 milliseconds.
  • With code optimizations (-O3) and with D-Cache enabled the solve time is 182 milliseconds.

In my opinion 182 ms on an embedded platform compared to the 19.20 ms on my x64 machine is not too bad.
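To put the timings above in relative terms, the ratios can be worked out with a few lines of C. The helper names `speedup` and `print_speedups` are made up for this illustration; the numbers are the ones quoted in this post.

```c
#include <stdio.h>

/* Ratio of two solve times: how many times faster "after" is than "before". */
double speedup(double before_ms, double after_ms)
{
    return before_ms / after_ms;
}

/* Print the ratios between the configurations reported in the post. */
void print_speedups(void)
{
    const double t_o0_nocache = 1270.0; /* -O0, no cache */
    const double t_o3_nocache = 463.0;  /* -O3, no cache */
    const double t_o3_dcache  = 182.0;  /* -O3, D-Cache enabled */
    const double t_x64        = 19.20;  /* x64 reference machine */

    printf("-O0 -> -O3:          %.2fx\n", speedup(t_o0_nocache, t_o3_nocache));
    printf("no cache -> D-Cache: %.2fx\n", speedup(t_o3_nocache, t_o3_dcache));
    printf("STM32H7 vs x64:      %.2fx slower\n", speedup(t_o3_dcache, t_x64));
}
```

Roughly, compiler optimization and the D-Cache each give a bit more than a 2.5x improvement, and the final result is a little under 10x slower than the x64 machine.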

And yes, I did verify the results against the same problem and solver compiled on my x64 machine - and they were identical :partying_face:

It took quite some effort and tweaking, including some modifications to ACADOS to deal with a 32-bit alignment issue (yes, I will be opening a PR as well).
I will be uploading the project to my GitHub soon, so stay tuned :smiley:

Best regards
Thomas Jespersen

PR with the necessary memory alignment fix can be found here: Add missing pointer alignments necessary on 32-bit architectures by mindThomas · Pull Request #704 · acados/acados · GitHub

Congrats! I was trying to do something similar last year. Did you have to use non-standard build options? I will try on the Teensy 4.0 (IMXRT1062DVL6A), which should perform better (600 MHz, 512 KB tightly coupled RAM).

I have now uploaded both an STM32CubeIDE version and a CMake-based version of the project

I apologize for the CMake version not being as polished as I want it to be, but I wanted to get it out to you as quickly as possible.

@tmmsartor I did have to specify the build options to match the architecture (mostly covered by setting TARGET=GENERIC). The build options can be seen here: acados-STM32/acados.cmake at main · mindThomas/acados-STM32 · GitHub
More importantly, I had to disable position-independent code and compile all libraries as STATIC. Without the latter, the memory addresses assigned in callbacks were incorrect, causing HardFaults. This is done here: Disable position independent code · mindThomas/acados@3732ff4 · GitHub
Finally, I also had to fix a 32-bit memory alignment issue with pointers (see my PR mentioned above). The fix is applied to my own ACADOS branch, which the example project already includes automatically: C: pointer alignment check for 4 / 8 byte alignment depending on size… · mindThomas/acados@fe3e5ea · GitHub
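To illustrate the kind of alignment issue involved, here is a small sketch of rounding an address up so that pointers can be stored there, using `sizeof(void *)` so that the result is 4-byte aligned on a 32-bit target and 8-byte aligned on a 64-bit one. This only shows the general idea and is not the code from the PR; `align_up` and `align_for_pointers` are names invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment.
 * alignment must be a power of two (for example 4 or 8). */
uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    uintptr_t mask = (uintptr_t)alignment - 1u;
    return (addr + mask) & ~mask;
}

/* Advance a raw byte pointer to the next address where a pointer-sized
 * object can be stored: 4 bytes on 32-bit targets, 8 on 64-bit ones. */
char *align_for_pointers(char *raw)
{
    return (char *)align_up((uintptr_t)raw, sizeof(void *));
}
```

Hard-coding 8 here would still produce valid addresses on a 32-bit target, but a memory-size calculation and the matching assignment step have to agree on the same rule, otherwise the two can drift apart.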

Just a slight correction to the original post. When I performed these tests the processor was only running at 400 MHz, not 480 MHz.