Static acados deployment on Zynq-7000 (Cortex-A9, no NEON) – minimal build and BLASFEO configuration

I am trying to deploy acados on a Zynq-7000 platform (Cortex-A9) used in the Inteco UnTrans (two-wheeled unstable transporter). The system runs PetaLinux on a single Cortex-A9 core. According to the hardware vendor, NEON is not used. The Linux side occupies around 750 MB of RAM, and about 250 MB of RAM is available on the FPGA side. The controller is intended to run in real time, and the final output must be a standalone ELF binary.

My workflow is the following: I generate an NMPC solver using acados from MATLAB/Simulink. The solver is integrated as an S-function inside a Simulink model. Code is generated via Simulink Coder, and the acados-generated C files (from c_generated_code) are manually added to a custom .tmf makefile template used by the vendor’s ARM cross-compilation toolchain (arm-xilinx-eabi-gcc). The final goal is to build a single .elf binary for the Cortex-A9 target.

I am not using shared libraries – everything is compiled and linked statically into the final ELF. During compilation, I include acados core sources, HPIPM, BLASFEO and the generated solver sources. However, I am running into multiple compilation/linking issues (undefined references and inconsistencies related to BLASFEO/HPIPM symbols). It is not entirely clear whether the problem comes from an incomplete set of source files, inconsistent compile-time flags, or an incorrect BLASFEO target configuration.

My main questions are:

  1. What is the minimal required subset of acados components for a standalone embedded NMPC deployment?
    As far as I understand, for a typical OCP NLP solver with HPIPM backend, I should only need:
  • acados core (ocp_nlp, sim, utils, ocp_qp interface),

  • HPIPM (ocp_qp and/or dense_qp parts actually used),

  • BLASFEO,

  • generated solver sources.
    Is that correct? Are any other components strictly required at link time?

  2. For Cortex-A9 (ARMv7-A) without NEON, what is the correct BLASFEO target configuration?
    Should I explicitly set something like TARGET=ARMV7A or force a reference implementation?
    Is it recommended to compile BLASFEO in REFERENCE mode (e.g. LA_REFERENCE) for maximum portability?
    Are there specific compiler flags required to avoid accidental NEON usage?

  3. Is there an officially recommended way to build acados for bare-metal / embedded ARM targets without the full CMake install workflow (i.e. by directly compiling source files into a custom toolchain build system)?
    If yes, which directories/files are mandatory and which can be safely excluded (e.g. examples, MATLAB interface, other QP solvers, etc.)?

  4. Has anyone successfully deployed acados on Zynq-7000 (Cortex-A9) without NEON?
    If so, could you share the relevant build configuration (BLASFEO target, compiler flags, static vs shared, etc.)?

The key constraints in my case are:

  • static linking only,

  • minimal binary size,

  • no NEON,

  • cross-compilation with arm-xilinx-eabi-gcc,

  • integration inside a Simulink-generated project (custom .tmf-based build system).

Any guidance on the minimal required source set and correct BLASFEO/HPIPM configuration for ARMv7-A without NEON would be highly appreciated.

Hi,

that sounds interesting, and I am not sure whether it has been done before.

  1. The minimal required subset of acados components depends strongly on your problem.
    In addition to what you listed, the interfaces/acados_c folder is needed.
    Otherwise, nothing else should be needed. You could also remove redundant components from HPIPM and BLASFEO (e.g. all single-precision routines).

  2. TARGET=ARMV7A should be the best for performance in general. However, for very small problems REFERENCE can be even faster, and I would suggest trying that first just to rule out any issues.

  3. Unfortunately, such a workflow is not established yet.
    But I think just using the components mentioned in point 1) should work.

Best,
Jonathan

Hi,

thank you for the clarification.

Regarding the BLASFEO target: unfortunately TARGET=ARMV7A does not build with my toolchain. The cross-compiler (arm-xilinx-eabi-gcc, provided with the Zynq toolchain) reports “unknown architecture” when ARMV7A is selected. Because of that, I currently have to use the REFERENCE implementation together with TARGET_GENERIC, so at the moment I am compiling BLASFEO in reference mode only.
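A compile-time guard can make accidental NEON use visible early, independent of which BLASFEO target is selected. The sketch below relies only on GCC's predefined macros (`__arm__`, `__ARM_NEON` / `__ARM_NEON__`, `__ARM_PCS_VFP`, `__SOFTFP__`). The function name `arm_float_abi` is made up for illustration, and it is an assumption that the vendor toolchain defines these macros in the usual GCC way.

```c
/* Stop the build if a 32-bit ARM compile has NEON enabled.
 * __ARM_NEON is the ACLE spelling, __ARM_NEON__ the older GCC one. */
#if defined(__arm__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#error "NEON is enabled: check the -mcpu / -mfpu flags in the build template"
#endif

/* Hypothetical helper: describes the floating-point ABI this translation
 * unit was compiled for, based on GCC's predefined macros. */
const char *arm_float_abi(void)
{
#if !defined(__arm__)
    return "not a 32-bit ARM build";
#elif defined(__ARM_PCS_VFP)
    return "hard-float ABI (arguments passed in VFP registers)";
#elif defined(__SOFTFP__)
    return "soft-float (no FPU instructions emitted)";
#else
    return "softfp (FPU instructions, integer-register calling convention)";
#endif
}
```

Compiling this file with exactly the flags used for BLASFEO, HPIPM and acados, and printing the returned string once at startup, would show whether all objects were built with the same floating-point settings. Mixed float ABIs are one possible cause of link-time inconsistencies, though not necessarily the one here.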

My problem size is relatively small: 6 states and 2 control inputs. I am using SQP_RTI (not full SQP). So there is only one QP per sampling step and no full nonlinear convergence loop. Given this size and RTI scheme, the computational load for a Cortex-A9 should be very small, so performance is not my main concern right now – correctness and a clean minimal build are.
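One way to confirm that the per-step load really is small is to time the solver call in a loop and keep the worst case. This is a minimal sketch, assuming a POSIX monotonic clock is available (true on a Linux host, not necessarily on the bare-metal side, where a hardware timer read would replace `now_seconds`). The names `worst_case_seconds` and `dummy_step` are hypothetical; `dummy_step` only stands in for one call of the generated solver.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime */
#endif
#include <time.h>

/* Current monotonic time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Calls step(ctx) `runs` times and returns the longest single call
 * in seconds. The return value of step is ignored here. */
double worst_case_seconds(int (*step)(void *), void *ctx, int runs)
{
    double worst = 0.0;
    double t0, dt;
    int i;

    for (i = 0; i < runs; ++i) {
        t0 = now_seconds();
        (void)step(ctx);
        dt = now_seconds() - t0;
        if (dt > worst) {
            worst = dt;
        }
    }
    return worst;
}

/* Hypothetical stand-in for one controller step: counts its calls. */
int dummy_step(void *ctx)
{
    int *counter = (int *)ctx;
    ++*counter;
    return 0;
}
```

The worst case, not the average, is the number to compare against the sampling period, since a single overrun is what breaks a real-time loop.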

The difficulty I am facing is determining exactly which source files are strictly required for compilation and linking when building everything statically. Since I am not using CMake but integrating acados manually into a custom .tmf-based build system (Simulink S-function + cross-compilation), I need to explicitly list all source files. It is not entirely clear to me how to systematically determine the minimal required subset.

From your previous answer I understand that I need:

  • acados core (ocp_nlp, sim, utils, ocp_qp, dense_qp),
  • interfaces/acados_c,
  • HPIPM (only the parts actually used),
  • BLASFEO (double precision only, no single routines),
  • generated solver sources.

However, I am not sure how to verify whether I am missing any required components other than by trial-and-error (i.e. waiting for undefined reference errors at link time). Is there a recommended way to determine the exact dependency set for a given solver configuration (e.g. SQP_RTI with HPIPM backend)?

To summarize:

  • problem size: nx = 6, nu = 2
  • solver: SQP_RTI
  • static linking only
  • Cortex-A9 (Zynq-7000), no NEON
  • BLASFEO: currently REFERENCE + TARGET_GENERIC
  • custom cross-compilation workflow (no standard CMake install)

If you have any guidance on how to systematically identify the minimal required source set for SQP_RTI + HPIPM, that would be very helpful.

You could systematically determine which functions are used by running callgrind on a minimal main program that calls your controller, and then including every source file that defines one of the functions that were called.

Sorry for the late reply, but I had a lot of work responsibilities and couldn’t work on this project every day. I just wanted to let you know that I finally managed to successfully compile the solver and link everything into a single .elf binary. I will be testing it on the actual hardware a bit later.

To resolve the linking issues in my custom makefile, I had to do a few things:

  1. Heavily filter the BLASFEO and HPIPM source files – I excluded all hardware and vector-optimized variants (e.g., hp_cm, lib4, lib8, and the blasfeo_api folder), keeping only the pure C REFERENCE implementations.
  2. Manually include the BLASFEO auxiliary files (like blasfeo_stdlib.c and blasfeo_processor_features.c), which were missing from my source list and are needed for memory allocation.

Before running it on the Zynq, I will probably also need to adjust the memory available for malloc in the linker script (heap_size and stack_size) and tweak the sampling time so the processor can keep up with the calculations.
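To check whether the heap configured in the linker script is large enough, a startup probe of the largest block malloc can provide may help. This is a rough sketch, not an acados facility: `largest_malloc_block` is a hypothetical helper, and it assumes that if a request of some size succeeds, smaller requests succeed as well.

```c
#include <stddef.h>
#include <stdlib.h>

/* Largest single block (in bytes, at most `limit`) that malloc can
 * currently provide, found by bisection. Every probe is freed again. */
size_t largest_malloc_block(size_t limit)
{
    size_t lo = 0;        /* largest size known to succeed */
    size_t hi = limit;    /* largest size still worth trying */
    size_t mid;
    void *p;

    while (lo < hi) {
        mid = lo + (hi - lo + 1) / 2;   /* round up so lo always advances */
        p = malloc(mid);
        if (p != NULL) {
            /* Touch the block so the compiler cannot drop the allocation. */
            *(volatile char *)p = 0;
            free(p);
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    return lo;
}
```

Calling this once before the solver is created, with `limit` set to the configured heap size, gives a quick indication of whether the heap setting took effect. It does not say how much memory the solver itself needs.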
