Summary error with cmdstan [number of columns in sample does not match chains]

Jochen · June 29, 2021, 8:27am

Hi everyone,

I’ve been running a model using cmdstan with 2 chains and 2000 iterations each. When I run the summary command (ex post, once both chains are done):

cd $HOME/cmdstan-2.26.0/
./bin/stansummary ./jobfiles/model1/output_*.csv

I get the following error message:

Error during processing. add(stan_csv): number of columns in sample does not match chains
terminate called after throwing an instance of 'std::invalid_argument'
  what():  add(stan_csv): number of columns in sample does not match chains

I’ve checked the two .csv output files and they have exactly the same structure (number of columns, rows etc.).

Any ideas what’s going on?

rok_cesnovar · June 29, 2021, 8:37am

Hi,

does running the summary on one of the CSVs work?

./bin/stansummary ./jobfiles/model1/output_1.csv
./bin/stansummary ./jobfiles/model1/output_2.csv

And if both of these work, does this work?

./bin/stansummary ./jobfiles/model1/output_1.csv ./jobfiles/model1/output_2.csv

Jochen · June 29, 2021, 9:45am

Hi Rok,

Thanks for the quick response.

Running the summary of each CSV separately works!
Running it on both together still fails with: Error during processing. add(stan_csv): number of columns in sample does not match chains

rok_cesnovar · June 29, 2021, 9:48am

What are the ids in the CSV files? Search for a line “id = …”. Are they the same by any chance?
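One quick way to look this up without opening the files is to grep the header comments. This is only a sketch: `show_id_seed` is a hypothetical helper, not part of CmdStan, and it assumes the header lines look like `# id = 0 (Default)` and `#   seed = ...`.

```shell
# Print the "id" and "seed" header comment lines of each Stan CSV file.
# Assumption: these lines start with "#", optional whitespace, then "id =" or "seed =".
show_id_seed() {
  for f in "$@"; do
    echo "== $f"
    grep -E '^#[[:space:]]*(id|seed)[[:space:]]*=' "$f"
  done
}

# Usage (paths as in the original post):
# show_id_seed ./jobfiles/model1/output_*.csv
```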

Jochen · June 29, 2021, 10:00am

The ids are the same in both files:
# id = 0 (Default)

Can I now fix this ex post setting a different id in the .csv output files? How should I fix this when starting the estimation? I was wondering because other estimations with exactly the same structure didn’t throw that error message.

Thanks.

rok_cesnovar · June 29, 2021, 10:11am

You can go and change the id in the CSV file and you will be able to read in the CSV files with stansummary. Before you do that, make sure the seeds are different though (line starting with "seed = "). If those are also the same then the results should also be the same.
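If you prefer not to edit the files by hand, something along these lines should do it. It is a minimal sketch, not an official tool: `set_chain_id` is a hypothetical helper, and it assumes the id line has the form `# id = 0 (Default)`. It writes to a new file, so the original output stays untouched.

```shell
# Rewrite the "id" header comment of a Stan CSV file into a copy.
# Assumption: the id line looks like "# id = <number> ..." as a header comment.
set_chain_id() {
  new_id=$1   # id to write, e.g. 1 or 2
  src=$2      # original CSV (left unmodified)
  dst=$3      # edited copy
  # Keep the "# id = " prefix as-is and replace the rest of that line with the new id.
  sed -E "s/^(#[[:space:]]*id[[:space:]]*=[[:space:]]*)[0-9]+.*/\1${new_id}/" "$src" > "$dst"
}

# Usage (the *_fixed.csv names are only an example):
# set_chain_id 1 output_1.csv output_1_fixed.csv
# set_chain_id 2 output_2.csv output_2_fixed.csv
```

Only the id line changes, so the column structure of the file is left alone.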

If the seeds are different, you are most likely fine (that is, the pseudorandom numbers generated are not correlated), though I would suggest rerunning the experiments if this isn't a really long experiment. Run the chains with the same seed and different ids. That is the recommended way of using multiple chains. The actual seed used in sampling is the supplied seed with a stride based on the id values.

Example

./bernoulli sample data file=bernoulli.data.json output file=output_1.csv id=1 random seed=123 &
./bernoulli sample data file=bernoulli.data.json output file=output_2.csv id=2 random seed=123 &

Thanks for reporting this. The error message should be improved in this case, as should the documentation example in “4 MCMC Sampling” of the CmdStan User’s Guide.

Jochen · June 29, 2021, 11:37am

Thank you. I will add the ids in the sampling call in the future. I ran this model for ~10 days, so it would be quite helpful to get access to the summary now. I followed your recommendation and changed the ids to 1 and 2 using Excel, and now stansummary throws the following error message:

Warning: non-fatal error reading adaptation data
Error: expected 2731 columns, but found 2656 instead for row 3
Warning: non-fatal error reading samples
Error during processing. No sampling draws found in Stan CSV file: ./jobfiles/model1/output_1.csv.
Processing csv files: ./jobfiles/model1/output_1.csvWarning: non-fatal error reading adaptation data
Error: expected 2731 columns, but found 2656 instead for row 3
Warning: non-fatal error reading samples
, ./jobfiles/model1/output_2.csvWarning: non-fatal error reading adaptation data
Error: expected 2731 columns, but found 2655 instead for row 3
Warning: non-fatal error reading samples
terminate called after throwing an instance of 'std::invalid_argument'
  what():  add(stan_csv): number of columns in sample does not match chains
/var/log/slurm/spool_slurmd//job15759076/slurm_script: line 18:  4421 Aborted                 (core dumped) ./bin/diagnose ./jobfiles/model1/output_*.csv

rok_cesnovar · June 29, 2021, 11:43am

Hm, do you still have the original files? You may have broken the structure by saving it in Excel :/
Are you able to read one CSV at a time?

Jochen · June 29, 2021, 12:00pm

I have backed up the output files, no worries :) I’ve tried changing the ids using Notepad++ instead of Excel, and now it again reports:

Error during processing. add(stan_csv): number of columns in sample does not match chains

There’s nothing suspicious about the .csv files, and I ran jobs with a similar structure over and over again, and it worked.

It does not look like two jobs have been writing to the same output file or so…

Jochen · June 29, 2021, 12:04pm

I have spotted the error. The two jobs have been using the same model, but different datasets. Sorry for taking your time, the id thing is still useful, thank you! :)
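For anyone landing here with the same message: a rough way to check whether the files really have the same number of columns is to count the fields in each header row. This is a sketch under assumptions: `count_columns` is a hypothetical helper, and it takes the first non-comment, non-empty line of each file to be the column header.

```shell
# Report the number of comma-separated columns in the header row of each Stan CSV file.
# Assumption: comment lines start with "#" and the first other non-empty line is the header.
count_columns() {
  for f in "$@"; do
    n=$(awk -F',' '!/^#/ && NF { print NF; exit }' "$f")
    echo "$f: $n columns"
  done
}

# Usage (paths as in the original post):
# count_columns ./jobfiles/model1/output_*.csv
```

If the counts differ, the files most likely come from runs with different data or a different model, and stansummary cannot combine them.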

rok_cesnovar · June 29, 2021, 12:07pm

Haha, happens to everyone. No worries, glad you figured it out.
