Hi,
I am running an SQP_RTI solver generated by acados and deploying it to a dSPACE MicroAutoBox 3 (MABX3) using Simulink Coder (dSPACE RTI).
I am observing a significant execution-time discrepancy between the Simulink simulation on the host PC and the compiled code running on the target hardware (MABX3).
System Setup:
- Solver: SQP_RTI
- QP Solver: PARTIAL_CONDENSING_HPIPM
- Prediction Horizon: 100 steps
- Target Hardware: dSPACE MicroAutoBox 3
- Interface: Simulink (S-Function) → dSPACE RTI Build
The Issue:
- In Simulink (Host PC): The solver execution time is approx. 5 ms.
- On MicroAutoBox 3 (Target): The solver execution time increases to 35 ms.
Context: In my previous experience with other optimization setups (e.g., fmincon-based generated code), the performance gap between the Host PC and the MABX3 was usually negligible. However, with acados, I am seeing a 7x slowdown.
Questions:
- Is this performance drop expected due to the difference in CPU architecture/clock speed between a standard PC and the MABX3?
- Could this be related to compiler optimization flags (e.g., `-O2` vs `-O3`) during the cross-compilation for dSPACE? I suspect the library or the generated code might not be fully optimized for the target.
- Given that I am using N = 100, are there specific memory or cache considerations on embedded targets that could cause this drastic slowdown compared to a PC?
Any insights on how to debug this latency or specific flags to check for dSPACE deployment would be greatly appreciated.
Thanks!