Hi

I am using Python to generate C code for an MPC controller and then use that C code in a C++ environment. Everything works as expected with my MPC, and on the desktop computer I get solve times below 1 ms, which is awesome. However, when I run the same code on an NVIDIA Jetson TX2 platform, the solve times are 5x-8x larger. Is this normal? Is there anything that can be done to improve the solve times on ARM-based platforms?

Right now I compile acados on the Jetson and also generate the code on the Jetson (kudos to the topic "Problems with t_renderer", which I needed to get this working).

These are the options I use:

```
ocp.solver_options.qp_solver = "PARTIAL_CONDENSING_HPIPM" # "PARTIAL_CONDENSING_HPIPM", "FULL_CONDENSING_HPIPM"
ocp.solver_options.nlp_solver_type = "SQP_RTI" # "SQP", "SQP_RTI"
ocp.solver_options.hessian_approx = "GAUSS_NEWTON" # "GAUSS_NEWTON", "EXACT"
ocp.solver_options.integrator_type = "ERK" # "ERK", "IRK", "GNSF"
```
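In case it helps to compare numbers between the two machines, here is a minimal sketch of how the timings could be summarized. It assumes the per-solve times (in seconds) have already been collected into a list on each platform, for example from the solver's `get_stats("time_tot")` in the Python interface. The helper names `summarize_solve_times` and `slowdown` and the sample values are hypothetical placeholders, not my actual measurements.

```python
import statistics


def summarize_solve_times(times_s):
    """Return min/median/max of a list of solve times, converted to milliseconds."""
    if not times_s:
        raise ValueError("need at least one solve time")
    times_ms = [t * 1e3 for t in times_s]
    return {
        "min_ms": min(times_ms),
        "median_ms": statistics.median(times_ms),
        "max_ms": max(times_ms),
    }


def slowdown(reference_s, target_s):
    """Ratio of median solve times (target / reference)."""
    return statistics.median(target_s) / statistics.median(reference_s)


if __name__ == "__main__":
    # Hypothetical placeholder values in seconds, NOT real measurements.
    desktop = [0.0008, 0.0009, 0.0010]
    jetson = [0.0045, 0.0054, 0.0070]

    print("desktop:", summarize_solve_times(desktop))
    print("jetson: ", summarize_solve_times(jetson))
    print(f"median slowdown: {slowdown(desktop, jetson):.1f}x")
```

Using the median rather than the mean keeps a single slow first iteration (for example, a cold cache) from dominating the comparison.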

Any tips to get it to run faster?

Thanks a lot!

Best,

Angel.