Solver getting slow with a large number of parameters

Hi.

I am trying to understand what exactly slows down the solver when the number of parameters gets large.
So far I have found that CasADi creates some massive global variables, but why would this slow down every single iteration of the solver, assuming that these variables are only loaded once?

A code example is shown below, where I create a parameter vector with 4 million parameters.
Try running the script to create and run the solver, and then try running just the solver by executing the last section only. Why does running an already created solver still take quite some time?

clear all
import casadi.*

%% Longitudinal model
% States
x = SX.sym('x');
vel = SX.sym('vel');
accel = SX.sym('accel');
x_ = vertcat(x, vel, accel);
nx = length(x_);

% Controls
jerk = SX.sym('jerk');
u_ = vertcat(jerk);
nu = length(u_);

% Parameters
a = SX.sym('a');
b = SX.sym('b');
c = SX.sym('c', 4000000);
p_ = vertcat(a, b, c);

% Dynamics
f = vertcat(...
            vel,...   % dx
            accel,... % dvel
            jerk...   % daccel
);

%% Problem parameters
ts = 0.1;
N = 50;
T = N * ts; % time horizon length

moved_lower_constraint = true; % try and toggle this

% Initial condition
x0 = [10, 0, 0]';
u0 = 0;

%% Construct problem
ocp_model = acados_ocp_model();

ocp_model.set('name', 'point_mass');
ocp_model.set('T', N * ts); % time horizon length

% Symbolics
ocp_model.set('sym_x', x_);
ocp_model.set('sym_u', u_);
ocp_model.set('sym_p', p_);

% Dynamics
ocp_model.set('dyn_type', 'explicit');
ocp_model.set('dyn_expr_f', f);

% Constraints
% States
ocp_model.set('constr_Jbx', eye(nx));
if (moved_lower_constraint)
    ocp_model.set('constr_lbx', [0,  -0.1, -1]);
else
    ocp_model.set('constr_lbx', [0,  0, -1]);
end
ocp_model.set('constr_ubx', [100, 10,  1]);
% Input
ocp_model.set('constr_Jbu', eye(nu));
ocp_model.set('constr_lbu', [-1]);
ocp_model.set('constr_ubu', [1]);

% Cost
% Define cost on all states and input
Vx = [eye(nx); zeros(nu,nx)];
Vu = [zeros(nx,nu); eye(nu)];
Vx_e = eye(nx);
% Reference value for state and input
y_ref = [x0; u0];
y_ref_e = [x0];
% Diagonal cost
W   = diag([1, 0.01, 0.01, 0.1]);
W_e = diag([1, 0.01, 0.01]);
ocp_model.set('cost_type', 'linear_ls');
ocp_model.set('cost_type_e', 'linear_ls');
ocp_model.set('cost_Vx', Vx);
ocp_model.set('cost_Vu', Vu);
ocp_model.set('cost_Vx_e', Vx_e);
ocp_model.set('cost_y_ref', y_ref);
ocp_model.set('cost_y_ref_e', y_ref_e);
ocp_model.set('cost_W', W);
ocp_model.set('cost_W_e', W_e);

% Initial state
% Fix the initial state
ocp_model.set('constr_x0', x0);


%% Solver parameters
ocp_opts = acados_ocp_opts();
ocp_opts.set('compile_interface', 'auto');
ocp_opts.set('codgen_model', 'true');
ocp_opts.set('param_scheme_N', N);
ocp_opts.set('nlp_solver', 'sqp');
ocp_opts.set('qp_solver', 'partial_condensing_hpipm');
ocp_opts.set('qp_solver_cond_N', 5);
ocp_opts.set('sim_method', 'erk');

%% Construct solver and run
ocp = acados_ocp(ocp_model, ocp_opts);

%% Enable verbose output
ocp.set('print_level', 5);

% Set solver initial guess (if not set, the previous solution is used internally)
ocp.set('init_x', repmat(x0, [1,N+1]));
ocp.set('init_u', repmat(u0, [1,N]));

% Set reference
y_ref(1) = 20;
y_ref_e(1) = 20;
ocp.set('cost_y_ref', y_ref);
ocp.set('cost_y_ref_e', y_ref_e);
ocp.set('cost_y_ref', y_ref_e, N); 

% Run the solver
ocp.solve();
x_opt = ocp.get('x');
u_opt = ocp.get('u');

Best regards
Thomas Jespersen

Hi Thomas,

I think 4 million parameters is quite a big number.
I guess the main overhead here is that we copy the parameter values into the external function at every call.
This is done here:

So, this overhead should scale linearly with the number of solver iterations, the number of shooting nodes, and the number of parameters.
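
As a rough back-of-envelope illustration of that scaling (the iteration count below is just an assumed example, and one copy per shooting node per iteration is a lower bound if several external functions receive the parameters), the data volume per solve adds up quickly:

% Rough estimate of the parameter-copy volume per solve.
% Assumptions: 8-byte doubles, one copy per shooting node per SQP iteration.
np     = 4e6;                        % number of parameters
N      = 50;                         % number of shooting nodes
n_iter = 10;                         % assumed number of SQP iterations
bytes  = np * 8 * (N + 1) * n_iter;  % total bytes copied
fprintf('~%.1f GB of parameter data copied per solve\n', bytes / 1e9);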

I guess one could add a flag to the external function memory that indicates whether the parameter values have changed and need to be copied again.

Best,
Jonathan

For now I solved it by storing only a memory pointer in the parameter vector, together with MEX functions for allocating and freeing the memory block. This works as intended and without the big memory-copy overhead.
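
A minimal sketch of that idea from the MATLAB side, assuming hypothetical custom MEX helpers alloc_param_buffer and free_param_buffer (not part of acados or CasADi) that copy the data once into a C buffer and return/release its address encoded as a double; the model's C code would then cast that parameter back to a pointer:

% Hypothetical sketch of the pointer-in-parameter workaround.
% alloc_param_buffer / free_param_buffer are assumed custom MEX functions.
data = rand(4000000, 1);          % the large data set
ptr  = alloc_param_buffer(data);  % copy once into a C buffer, return its address as a double
a_val = 1; b_val = 2;             % example values for the scalar parameters a and b
% The parameter vector now only needs 3 entries: a, b and the pointer.
ocp.set('p', [a_val; b_val; ptr]);
ocp.solve();
free_param_buffer(ptr);           % release the buffer when no longer needed

Encoding the address as a double is admittedly hacky, but typical 48-bit addresses fit within a double's 53-bit mantissa, and it keeps the per-call parameter copy down to a handful of values.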