Summary

We can easily customise the summary statistics reported by $summary() and $print().

fit <- cmdstanr::cmdstanr_example("schools", method = "sample")
fit$summary()
Warning: 130 of 4000 (3.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
   variable  mean median  sd mad     q5 q95 rhat ess_bulk ess_tail
1      lp__ -58.9  -59.2 5.0 5.1 -66.97 -50    1      224       84
2        mu   6.6    6.7 4.3 4.3  -0.55  14    1      394      115
3       tau   5.8    5.0 3.7 3.4   1.33  13    1      223       92
4  theta[1]   9.7    9.0 7.3 6.3  -0.99  23    1     1066     2034
5  theta[2]   7.0    6.8 5.9 5.6  -2.54  16    1      900     2321
6  theta[3]   5.5    5.8 6.9 6.1  -6.25  16    1      841     2201
7  theta[4]   6.8    6.9 6.2 6.0  -3.13  17    1      781     2193
8  theta[5]   4.6    5.0 6.0 5.8  -5.63  14    1      513      940
9  theta[6]   5.5    5.7 6.3 5.6  -5.61  15    1      782     1784
10 theta[7]   9.5    9.1 6.2 5.8   0.14  20    1      882     2164
11 theta[8]   7.1    7.0 7.3 6.3  -4.80  18    1      976     2151

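By default, all variables are summarised with the following functions, followed by the convergence diagnostics rhat, ess_bulk and ess_tail seen in the table above:

posterior::default_summary_measures()
[1] "mean"      "median"    "sd"        "mad"       "quantile2"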

To change which variables are summarised, we use the variables argument:

fit$summary(variables = c("mu", "tau"))
  variable mean median  sd mad    q5 q95 rhat ess_bulk ess_tail
1       mu  6.6    6.7 4.3 4.3 -0.55  14    1      394      115
2      tau  5.8    5.0 3.7 3.4  1.33  13    1      223       92

We can additionally change which functions are used:

fit$summary(variables = c("mu", "tau"), mean, sd)
  variable mean  sd
1       mu  6.6 4.3
2      tau  5.8 3.7

To summarise all variables with non-default functions, it is necessary to set the variables argument explicitly, either to NULL or to the full vector of variable names.

fit$metadata()$model_params
 [1] "lp__"     "mu"       "tau"      "theta[1]" "theta[2]" "theta[3]"
 [7] "theta[4]" "theta[5]" "theta[6]" "theta[7]" "theta[8]"
fit$summary(variables = NULL, "mean", "median")
   variable  mean median
1      lp__ -58.9  -59.2
2        mu   6.6    6.7
3       tau   5.8    5.0
4  theta[1]   9.7    9.0
5  theta[2]   7.0    6.8
6  theta[3]   5.5    5.8
7  theta[4]   6.8    6.9
8  theta[5]   4.6    5.0
9  theta[6]   5.5    5.7
10 theta[7]   9.5    9.1
11 theta[8]   7.1    7.0

Summary functions can be specified by character string, by function, or by formula (or anything else supported by rlang::as_function()). If these arguments are named, those names are used as column names in the tibble output. If a summary function returns a named result, the names of the result take precedence.

my_sd <- function(x) c(My_SD = sd(x))
fit$summary(
  c("mu", "tau"),
  MEAN = mean,
  "median",
  my_sd,
  ~quantile(.x, probs = c(0.1, 0.9)),
  Minimum = function(x) min(x)
)
  variable MEAN median My_SD  10% 90% Minimum
1       mu  6.6    6.7   4.3 0.98  12   -11.7
2      tau  5.8    5.0   3.7 1.81  11     0.9
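
The formula shorthand may be easier to follow once it is converted by hand. The following is a minimal sketch, assuming the rlang package is installed, of how a one-sided formula becomes an ordinary function whose first argument is .x. The names q_fun and draws are illustrative only, and the draws are simulated rather than taken from the fit above.

```r
library(rlang)

# Convert a one-sided formula into a function; `.x` refers to its first argument.
q_fun <- as_function(~ quantile(.x, probs = c(0.1, 0.9)))

# Simulated draws standing in for posterior samples (illustrative only).
set.seed(1)
draws <- rnorm(1000)

# The result is a named vector, which is why the output columns
# above are labelled 10% and 90%.
res <- q_fun(draws)
names(res)
```

Because the result is already named, naming the formula argument in the call would have no effect on these column labels.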

Arguments to all summary functions can also be specified with .args.

fit$summary(c("mu", "tau"), quantile, .args = list(probs = c(0.025, 0.05, 0.95, 0.975)))
  variable 2.5%    5% 95% 97.5%
1       mu -2.0 -0.55  14    15
2      tau  1.1  1.33  13    15

The summary functions are applied to the array of sample values for each variable, which has dimension iter_sampling × chains.

fit$summary(variables = NULL, dim, colMeans)
   variable dim.1 dim.2     1     2     3     4
1      lp__  1000     4 -58.8 -58.4 -59.0 -59.4
2        mu  1000     4   6.8   6.7   6.6   6.1
3       tau  1000     4   5.7   5.6   5.7   6.1
4  theta[1]  1000     4   9.9   9.5   9.8   9.5
5  theta[2]  1000     4   7.4   7.2   7.0   6.3
6  theta[3]  1000     4   5.8   5.7   5.6   4.8
7  theta[4]  1000     4   6.9   6.7   7.0   6.7
8  theta[5]  1000     4   4.9   4.8   4.6   4.1
9  theta[6]  1000     4   5.7   5.8   5.6   4.8
10 theta[7]  1000     4   9.6   9.8   9.4   9.2
11 theta[8]  1000     4   7.0   7.3   7.0   7.0

For this reason, calling stats::var() directly may give unexpected results: applied to a matrix, it returns the covariance matrix of the columns (here, the chains) rather than a single variance. An alternative is distributional::variance(), which is also available as posterior::variance().

fit$summary(c("mu", "tau"), posterior::variance, ~var(as.vector(.x)))
  variable posterior::variance ~var(as.vector(.x))
1       mu                  19                  19
2      tau                  14                  14
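
The difference between the two approaches can be reproduced in base R without a fitted model. This is a small sketch using simulated values; the matrix x is a hypothetical stand-in for one variable's draws arranged as iterations by chains.

```r
# Simulated draws arranged as iterations x chains (illustrative only).
set.seed(1)
x <- matrix(rnorm(4000), nrow = 1000, ncol = 4)

# var() on a matrix returns the covariance matrix between columns (chains).
cov_mat <- var(x)
dim(cov_mat)

# Flattening to a vector first gives one pooled variance across all draws.
pooled <- var(as.vector(x))
length(pooled)
```

The first call yields a 4 x 4 matrix and the second a single number, which is the form a summary column needs.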

Summary functions need not return numeric values, but non-numeric summaries won’t work with $print().

strict_pos <- function(x) if (all(x > 0)) "yes" else "no"
fit$summary(variables = NULL, "Strictly Positive" = strict_pos)
# fit$print(variables = NULL, "Strictly Positive" = strict_pos)
   variable Strictly Positive
1      lp__                no
2        mu                no
3       tau               yes
4  theta[1]                no
5  theta[2]                no
6  theta[3]                no
7  theta[4]                no
8  theta[5]                no
9  theta[6]                no
10 theta[7]                no
11 theta[8]                no

For more information, see posterior::summarise_draws(), which is called by $summary().
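
To experiment with the same interface without a CmdStan installation, posterior::summarise_draws() can be called directly. The sketch below assumes the posterior package is installed and uses its bundled example draws rather than the fit above, so the numbers will differ from the tables in this section. The names x, x_sub and res are illustrative.

```r
library(posterior)

# Bundled example draws (an eight schools model), so no CmdStan is required.
x <- example_draws()

# Keep only mu and tau, then summarise with two functions given by name.
x_sub <- subset_draws(x, variable = c("mu", "tau"))
res <- summarise_draws(x_sub, "mean", "sd")
res
```

The result has one row per variable and one column per summary, mirroring what $summary() returns.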