Solver getting slow with a large number of parameters

Hi.

I am trying to understand what exactly it is that slows down the solver when the number of parameters gets large.
So far I have found that CasADi creates some massive global variables, but why would this slow down every single iteration of the solver, assuming these variables are only loaded once?

A code example is shown below where I create a parameter vector with 4 million parameters.
Try running the script to create and run the solver, and then try running just the solver by executing the last section only. Even re-running an already created solver still takes quite some time (a small timing snippet is included after the script).

clear all
import casadi.*

%% Longitudinal model
% States
x = SX.sym('x');
vel = SX.sym('vel');
accel = SX.sym('accel');
x_ = vertcat(x, vel, accel);
nx = length(x_);

% Controls
jerk = SX.sym('jerk');
u_ = vertcat(jerk);
nu = length(u_);

% Parameters
a = SX.sym('a');
b = SX.sym('b');
c = SX.sym('c', 4000000); % 4 million dummy parameters to provoke the slowdown
p_ = vertcat(a, b, c);

% Dynamics
f = vertcat(...
            vel,...   % dx
            accel,... % dvel
            jerk...   % daccel
);

%% Problem parameters
ts = 0.1;
N = 50;
T = N * ts; % time horizon length

moved_lower_constraint = true; % try and toggle this

% Initial condition
x0 = [10, 0, 0]';
u0 = 0;

%% Construct problem
ocp_model = acados_ocp_model();

ocp_model.set('name', 'point_mass');
ocp_model.set('T', T); % time horizon length

% Symbolics
ocp_model.set('sym_x', x_);
ocp_model.set('sym_u', u_);
ocp_model.set('sym_p', p_);

% Dynamics
ocp_model.set('dyn_type', 'explicit');
ocp_model.set('dyn_expr_f', f);

% Constraints
% States
ocp_model.set('constr_Jbx', eye(nx));
if (moved_lower_constraint)
    ocp_model.set('constr_lbx', [0,  -0.1, -1]);
else
    ocp_model.set('constr_lbx', [0,  0, -1]);
end
ocp_model.set('constr_ubx', [100, 10,  1]);
% Input
ocp_model.set('constr_Jbu', eye(nu));
ocp_model.set('constr_lbu', [-1]);
ocp_model.set('constr_ubu', [1]);

% Cost
% Define cost on all states and input
Vx = [eye(nx); zeros(nu,nx)];
Vu = [zeros(nx,nu); eye(nu)];
Vx_e = eye(nx);
% Reference value for state and input
y_ref = [x0; u0];
y_ref_e = [x0];
% Diagonal cost
W   = diag([1, 0.01, 0.01, 0.1]);
W_e = diag([1, 0.01, 0.01]);
ocp_model.set('cost_type', 'linear_ls');
ocp_model.set('cost_type_e', 'linear_ls');
ocp_model.set('cost_Vx', Vx);
ocp_model.set('cost_Vu', Vu);
ocp_model.set('cost_Vx_e', Vx_e);
ocp_model.set('cost_y_ref', y_ref);
ocp_model.set('cost_y_ref_e', y_ref_e);
ocp_model.set('cost_W', W);
ocp_model.set('cost_W_e', W_e);

% Initial state
% Fix the initial state
ocp_model.set('constr_x0', x0);


%% Solver parameters
ocp_opts = acados_ocp_opts();
ocp_opts.set('compile_interface', 'auto');
ocp_opts.set('codgen_model', 'true');
ocp_opts.set('param_scheme_N', N);
ocp_opts.set('nlp_solver', 'sqp');
ocp_opts.set('qp_solver', 'partial_condensing_hpipm');
ocp_opts.set('qp_solver_cond_N', 5);
ocp_opts.set('sim_method', 'erk');

%% Construct solver and run
ocp = acados_ocp(ocp_model, ocp_opts);

%% Enable verbose output
ocp.set('print_level', 5);

% Set solver initial guess (if not set, the previous solution is used internally)
ocp.set('init_x', repmat(x0, [1,N+1]));
ocp.set('init_u', repmat(u0, [1,N]));

% Set reference
y_ref(1) = 20;
y_ref_e(1) = 20;
ocp.set('cost_y_ref', y_ref);
ocp.set('cost_y_ref_e', y_ref_e);
ocp.set('cost_y_ref', y_ref_e, N); 

% Run the solver
ocp.solve();
x_opt = ocp.get('x')
u_opt = ocp.get('u');
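
To isolate the per-call overhead, here is a minimal timing sketch for re-running the already created solver (assuming the 'time_tot' statistic is available in your acados version):

%% Time repeated calls of the already created solver
for k = 1:5
    tic;
    ocp.solve();
    t_wall = toc;                    % wall-clock time of this call
    t_acados = ocp.get('time_tot');  % solver-internal total time
    fprintf('call %d: wall %.3f s, acados time_tot %.3f s\n', k, t_wall, t_acados);
end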

Best regards
Thomas Jespersen

Hi Thomas,

I think 4 million parameters is quite a big number.
I guess the main overhead here is that we copy the parameter values into the CasADi function at every call; this happens in the external function wrapper (external_function_param_casadi).

So, this overhead should scale linearly with the number of solver iterations, the number of shooting nodes, and the number of parameters.
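
As a rough, purely illustrative back-of-the-envelope check of that scaling (assuming double-precision parameters and ignoring that several problem functions per node may each trigger the copy):

np      = 4e6;               % number of parameters
n_nodes = 50 + 1;            % shooting nodes
mb_per_copy = np * 8 / 1e6;  % 8 bytes per double -> 32 MB per copy
fprintf('%.0f MB per copy, at least %.1f GB moved per SQP iteration\n', ...
        mb_per_copy, mb_per_copy * n_nodes / 1e3);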

I guess one could add a flag to the external function memory that indicates whether the parameter values have changed and need to be copied again.

Best,
Jonathan

For now I solved it by storing only a memory pointer in the parameter vector, together with mex functions for allocating and freeing that memory region. This works as intended and avoids the big memory copy overhead.

This seems quite MATLAB-specific, but if you have a commit with this, I’d be interested in having a look.

Greetings!

This PR https://github.com/acados/acados/pull/681
avoids copying the parameter values at each call by setting them directly in
fun->args (for external_function_param_casadi).

Hi all,

I want to understand a bit better what the practical issue is here.
Surely copying the parameter vector as done there (which could be slightly optimized with some temporary pointers, in case the compiler is dumb enough to perform all the struct accesses and array indexing at each loop iteration) is slow when the vector size is 4 million (which, by the way, is a massive 32 MB of memory, more than fits in the cache of most machines).
But, in practice, merely copying the parameter vector is surely cheaper than anything else you may want to do with those parameters, with the only exception of not using the parameters at all.

So, in practice, I don’t see a case where the parameter vector copy would become the bottleneck, except the case where you have millions of parameters which you simply don’t use.
Do you see other cases where this would indeed become the bottleneck in practice?

Best regards,

Gianluca

That is probably right.
I was working with Rudolf, who uses parametric splines to globally describe the boundaries of a race track in his formulation.
There you get a lot of parameter values very quickly, but each problem function probably uses only a few of them, since the car is in a specific region of the track and each shooting node in an even smaller one.
I guess it would make more sense to switch to a more local formulation (a sketch of what I mean is below).
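
For illustration, a minimal sketch of such a stage-wise parameter update with the MATLAB interface; the small per-stage parameter count and the helper local_spline_coeffs are hypothetical and not Rudolf's actual formulation:

np_local = 8;                      % only the coefficients of the nearby spline segment
% (the model would then declare sym_p with np_local entries instead of millions)
x_traj = ocp.get('x');             % predicted trajectory from the previous solve
for stage = 0:N
    s = x_traj(1, stage+1);        % position of this shooting node along the track
    p_stage = local_spline_coeffs(s, np_local);  % hypothetical lookup of the local segment
    ocp.set('p', p_stage, stage);  % stage-wise parameter update, tiny copy per node
end
ocp.solve();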

Moreover, with the Python & MATLAB interfaces every problem function is built with the same number of parameters.
Thus these spline parameters also end up in the integrator, although they are only used in the constraints.
Avoiding the copy resulted in a speedup of more than 10% for his problem.

@FreyJo Thanks for the PR. I haven’t tried it yet to check whether it yields the same speedup as the pointer-based method. But I am doing exactly something similar with global parametric splines, where only a subset of the parameters is needed in each iteration, and that subset depends on the current state (i.e. on some of the optimization variables).