Summary error with cmdstan [number of columns in sample does not match chains]

Hi everyone,

I’ve been running a model using cmdstan with 2 chains and 2000 iterations each. When running the summary command (I’m running it ex post, once both chains are done):

cd $HOME/cmdstan-2.26.0/
./bin/stansummary ./jobfiles/model1/output_*.csv

I get the following error message:

Error during processing. add(stan_csv): number of columns in sample does not match chains
terminate called after throwing an instance of 'std::invalid_argument'
  what():  add(stan_csv): number of columns in sample does not match chains

I’ve checked the two .csv output files and they have exactly the same structure (number of columns, rows etc.).

Any ideas what’s going on?

Hi,

does running the summary on one of the CSVs work?

./bin/stansummary ./jobfiles/model1/output_1.csv
./bin/stansummary ./jobfiles/model1/output_2.csv

And if both of these work, does this work?

./bin/stansummary ./jobfiles/model1/output_1.csv ./jobfiles/model1/output_2.csv

Hi Rok,

Thanks for the quick response.

  1. Running the summary of each CSV separately works!

  2. Running it on both files together fails with: Error during processing. add(stan_csv): number of columns in sample does not match chains

What are the ids in the CSV files? Search for a line “id = …”. Are they the same by any chance?

The ids are the same in both files:
# id = 0 (Default)

Can I now fix this ex post by setting a different id in the .csv output files? And how should I avoid this when starting the estimation? I am asking because other estimations with exactly the same structure didn’t throw that error message.

Thanks.

You can go and change the id in the CSV file and you will be able to read in the CSV files with stansummary. Before you do that, make sure the seeds are different though (line starting with "seed = "). If those are also the same then the results should also be the same.
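A quick way to compare the ids and seeds without opening the files in an editor is to grep the header comments. This is only a sketch: it assumes the header lines look like the `# id = 0 (Default)` line quoted above, with the seed on a similarly formatted `seed = ` comment line, and `show_id_seed` is a made-up helper name, not part of CmdStan.

```shell
# Hypothetical helper: print the "id" and "seed" header comment lines
# of each Stan CSV file passed as an argument.
show_id_seed() {
  for f in "$@"; do
    printf '%s\n' "$f"
    # Header comments start with '#'; allow any indentation before the key.
    grep -E '^# *(id|seed) *=' "$f"
  done
}
```

For example, `show_id_seed ./jobfiles/model1/output_*.csv` lists each file name followed by its id and seed lines.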

If the seeds are different, you are most likely fine (that is, the pseudorandomly generated numbers are not correlated), though I would suggest rerunning the experiments if this isn’t a really long experiment. Run the chains with the same seed and different ids. That is the recommended way of using multiple chains. The actual seed used in sampling is the supplied seed with a stride based on the id values.

Example

./bernoulli sample data file=bernoulli.data.json output file=output_1.csv id=1 random seed=123 &
./bernoulli sample data file=bernoulli.data.json output file=output_2.csv id=2 random seed=123 &

Thanks for reporting this. The error message should be improved in this case, as should the documentation example in “4 MCMC Sampling” of the CmdStan User’s Guide.


Thank you. I will add the ids in the sampling call in the future. I ran this model for ~10 days, so it would be quite helpful to get access to the summary now. I followed your recommendation and changed the ids to 1 and 2 using Excel, and now stansummary throws the following error message:

Warning: non-fatal error reading adaptation data
Error: expected 2731 columns, but found 2656 instead for row 3
Warning: non-fatal error reading samples
Error during processing. No sampling draws found in Stan CSV file: ./jobfiles/model1/output_1.csv.
Processing csv files: ./jobfiles/model1/output_1.csvWarning: non-fatal error reading adaptation data
Error: expected 2731 columns, but found 2656 instead for row 3
Warning: non-fatal error reading samples
, ./jobfiles/model1/output_2.csvWarning: non-fatal error reading adaptation data
Error: expected 2731 columns, but found 2655 instead for row 3
Warning: non-fatal error reading samples
terminate called after throwing an instance of 'std::invalid_argument'
  what():  add(stan_csv): number of columns in sample does not match chains
/var/log/slurm/spool_slurmd//job15759076/slurm_script: line 18:  4421 Aborted                 (core dumped) ./bin/diagnose ./jobfiles/model1/output_*.csv

Hm, do you still have the original files? You may have broken the structure by saving it in Excel :/
Are you able to read one CSV at a time?
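A spreadsheet program may rewrite the whole file when saving, so a plain-text substitution is a safer way to change the id. Here is a minimal sketch, assuming the header contains a line of the form `# id = 0 (Default)` as quoted earlier in the thread; `set_chain_id` is a hypothetical helper name, and it writes to standard output so the original file stays untouched.

```shell
# Hypothetical helper: set_chain_id FILE NEW_ID
# Prints FILE with its "# id = ..." header line replaced by "# id = NEW_ID".
# All other lines (other comments, header row, draws) pass through unchanged.
set_chain_id() {
  sed -E "s/^# id = [0-9]+.*$/# id = $2/" "$1"
}
```

Usage would look like `set_chain_id output_2.csv 2 > output_2_fixed.csv` (the `_fixed` file name is just an example), after which the original can be kept as a backup.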

I have backed up the output files, no worries :) I’ve tried changing the ids using Notepad++ instead of Excel, and now it again reports that

Error during processing. add(stan_csv): number of columns in sample does not match chains

There’s nothing suspicious about the .csv files, and I ran jobs with a similar structure over and over again, and it worked.

It does not look like two jobs have been writing to the same output file or so…

I have spotted the error. The two jobs have been using the same model, but different datasets. Sorry for taking your time, the id thing is still useful, thank you! :)

Haha, happens to everyone. No worries, glad you figured it out.
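For anyone landing here with the same message: since the cause turned out to be chains fitted to different datasets, one quick check is to count the columns in each file's header row. This is a sketch under the assumption that the first line not starting with `#` is the CSV header; `count_columns` is a made-up helper name, not a CmdStan tool.

```shell
# Hypothetical helper: print the number of comma-separated columns
# in the header row (first non-comment line) of a Stan CSV file.
count_columns() {
  grep -v '^#' "$1" | head -n 1 | awk -F',' '{ print NF }'
}
```

Running `for f in ./jobfiles/model1/output_*.csv; do printf '%s: %s\n' "$f" "$(count_columns "$f")"; done` prints one count per file; if the counts differ, the chains do not share the same set of output columns and cannot be summarised together.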