Solver runs slower on the NVIDIA Jetson TX2 platform?

Hi :wave:

I am using Python to generate C code for an MPC controller and then use that C code in a C++ environment. Everything works as expected, and on the desktop computer I get solve times below 1 ms, which is awesome. However, when I run the same code on an NVIDIA Jetson TX2, the solve times are 5x-8x larger. Is this normal? Is there anything that can be done to improve solve times on ARM-based platforms?
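For a like-for-like comparison between the desktop and the Jetson, it can help to time many solver calls and look at the minimum, median and maximum rather than a single run. The sketch below is only an illustration: `benchmark` and `dummy_solve` are hypothetical names, and `dummy_solve` stands in for the real call (for example a wrapper around the generated solver). If the Python interface is used, `ocp_solver.get_stats("time_tot")` should also report the solver's internal timing.

```python
import statistics
import time


def benchmark(solve, n_runs=200, n_warmup=20):
    """Call `solve` repeatedly and return wall-clock statistics in milliseconds."""
    # Warm-up calls let caches and CPU frequency scaling settle before measuring.
    for _ in range(n_warmup):
        solve()

    samples_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        solve()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    return {
        "min_ms": min(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
    }


def dummy_solve():
    # Placeholder workload (hypothetical); replace with the real solver call.
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    print(benchmark(dummy_solve))
```

Running the same script on both machines gives numbers that can be compared directly, and the spread between minimum and maximum hints at whether frequency scaling or other processes are interfering.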

Right now I compile acados on the Jetson and generate the code on the Jetson as well (kudos to this topic, which was necessary: Problems with t_renderer).

These are the options I use:

    ocp.solver_options.qp_solver = "PARTIAL_CONDENSING_HPIPM"  # "PARTIAL_CONDENSING_HPIPM", "FULL_CONDENSING_HPIPM"
    ocp.solver_options.nlp_solver_type = "SQP_RTI"     # "SQP", "SQP_RTI"
    ocp.solver_options.hessian_approx = "GAUSS_NEWTON"  # "GAUSS_NEWTON", "EXACT"
    ocp.solver_options.integrator_type = "ERK"   # "ERK", "IRK", "GNSF"

Maybe any tips to get it to run faster?

Thanks a lot!

Best,

Angel.

Hi,

I guess it is to be expected that the solver is slower on this platform.

> Maybe any tips to get it to run faster?

Did you compile BLASFEO with the dedicated target?
I guess you should go for ARMV8A_ARM_CORTEX_A57.
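As a rough way to check which target fits a given board, the sketch below reads the `CPU part` field from `/proc/cpuinfo` and maps it to a BLASFEO target name. The helper `suggest_blasfeo_targets` and the table `PART_TO_TARGET` are hypothetical names for illustration, and only two part IDs are filled in (0xd03 for Cortex-A53 and 0xd07 for Cortex-A57); any other core falls back to GENERIC. The printed command assumes acados is configured through CMake with the `BLASFEO_TARGET` option, so check it against your checkout.

```python
# Mapping from the ARM "CPU part" field to a BLASFEO target name.
# Only the two entries below are filled in; anything else falls back to GENERIC.
PART_TO_TARGET = {
    "0xd03": "ARMV8A_ARM_CORTEX_A53",
    "0xd07": "ARMV8A_ARM_CORTEX_A57",
}


def suggest_blasfeo_targets(cpuinfo_text):
    """Return the sorted BLASFEO targets matching the cores listed in cpuinfo_text."""
    targets = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "cpu part":
            targets.add(PART_TO_TARGET.get(value.strip().lower(), "GENERIC"))
    return sorted(targets)


if __name__ == "__main__":
    # Linux only: /proc/cpuinfo lists one "CPU part" entry per ARM core.
    with open("/proc/cpuinfo") as f:
        for target in suggest_blasfeo_targets(f.read()):
            print(f"cmake -DBLASFEO_TARGET={target} ..")
```

If several targets are printed, the board mixes core types, and the target should match the cores the solver actually runs on.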

Cheers,
Jonathan

Hi,

Thanks a lot for this suggestion, I get an improvement of around 15-20% in solve times with this :slight_smile:

Another thing I’ll try is to pose my problem with the NONLINEAR_LS cost module instead of the EXTERNAL one (I would then need this trick: Gradient term in Linear LS cost function - #6 by jdeschut) and see whether there is any performance improvement.
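Assuming the linked trick amounts to completing the square (an assumption, since the linked post is not reproduced here), a cost 0.5 * y^T W y + g^T y can be written as a least-squares cost 0.5 * (y - y_ref)^T W (y - y_ref) with y_ref = -W^{-1} g. The two differ only by a constant, which does not change the minimizer. The sketch below checks this numerically for a diagonal W; all function names and numbers are hypothetical.

```python
def quadratic_plus_linear(y, w_diag, g):
    """Cost 0.5 * y^T W y + g^T y with diagonal W given by w_diag."""
    return sum(0.5 * w * yi * yi + gi * yi for yi, w, gi in zip(y, w_diag, g))


def least_squares(y, w_diag, y_ref):
    """Cost 0.5 * (y - y_ref)^T W (y - y_ref) with diagonal W."""
    return sum(0.5 * w * (yi - ri) ** 2 for yi, w, ri in zip(y, w_diag, y_ref))


def shifted_reference(w_diag, g):
    """Reference that absorbs the gradient term: y_ref = -W^{-1} g."""
    return [-gi / w for w, gi in zip(w_diag, g)]


if __name__ == "__main__":
    w_diag = [2.0, 0.5]  # hypothetical weights
    g = [1.0, -3.0]      # hypothetical gradient term
    y_ref = shifted_reference(w_diag, g)
    for y in ([0.0, 0.0], [1.0, 2.0], [-4.0, 0.3]):
        gap = least_squares(y, w_diag, y_ref) - quadratic_plus_linear(y, w_diag, g)
        print(y, gap)  # the gap is the same constant for every y
```

In an acados setup this would mean passing the shifted reference as `yref` of the least-squares cost, provided the original external cost really has this quadratic-plus-linear form.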

Thanks a lot!

Hi,

Did you get any performance improvement after implementing the trick?

Yes, I got another ~20% performance improvement!