We can easily customise the summary statistics reported by $summary() and $print().
fit <- cmdstanr::cmdstanr_example("schools", method = "sample")
fit$summary()
Warning: 130 of 4000 (3.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
1 lp__ -58.9 -59.2 5.0 5.1 -66.97 -50 1 224 84
2 mu 6.6 6.7 4.3 4.3 -0.55 14 1 394 115
3 tau 5.8 5.0 3.7 3.4 1.33 13 1 223 92
4 theta[1] 9.7 9.0 7.3 6.3 -0.99 23 1 1066 2034
5 theta[2] 7.0 6.8 5.9 5.6 -2.54 16 1 900 2321
6 theta[3] 5.5 5.8 6.9 6.1 -6.25 16 1 841 2201
7 theta[4] 6.8 6.9 6.2 6.0 -3.13 17 1 781 2193
8 theta[5] 4.6 5.0 6.0 5.8 -5.63 14 1 513 940
9 theta[6] 5.5 5.7 6.3 5.6 -5.61 15 1 782 1784
10 theta[7] 9.5 9.1 6.2 5.8 0.14 20 1 882 2164
11 theta[8] 7.1 7.0 7.3 6.3 -4.80 18 1 976 2151
By default, all variables are summarised with the following functions:
posterior::default_summary_measures()
[1] "mean" "median" "sd" "mad" "quantile2"
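The default table above also carries rhat, ess_bulk and ess_tail columns, which are not in this list. As far as we are aware these come from posterior::default_convergence_measures(). The sketch below checks this on posterior's bundled eight-schools example draws (an assumption made so that it runs without CmdStan; it does not use the fit object from this section).

```r
library(posterior)

# Bundled example draws (eight-schools model); no model fitting required.
x <- example_draws("eight_schools")

# The two default sets of summary functions.
default_summary_measures()
default_convergence_measures()

# With no summary functions supplied, both sets are applied.
s <- summarise_draws(x)
colnames(s)
```

The quantile2 entry expands into the q5 and q95 columns.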
To change which variables are summarised, we use the variables argument:
fit$summary(variables = c("mu", "tau"))
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
1 mu 6.6 6.7 4.3 4.3 -0.55 14 1 394 115
2 tau 5.8 5.0 3.7 3.4 1.33 13 1 223 92
We can additionally change which functions are used:
fit$summary(variables = c("mu", "tau"), mean, sd)
variable mean sd
1 mu 6.6 4.3
2 tau 5.8 3.7
To summarise all variables with non-default functions, it is necessary to set the variables argument explicitly, either to NULL or to the full vector of variable names.
fit$metadata()$model_params
[1] "lp__" "mu" "tau" "theta[1]" "theta[2]" "theta[3]"
[7] "theta[4]" "theta[5]" "theta[6]" "theta[7]" "theta[8]"
fit$summary(variables = NULL, "mean", "median")
variable mean median
1 lp__ -58.9 -59.2
2 mu 6.6 6.7
3 tau 5.8 5.0
4 theta[1] 9.7 9.0
5 theta[2] 7.0 6.8
6 theta[3] 5.5 5.8
7 theta[4] 6.8 6.9
8 theta[5] 4.6 5.0
9 theta[6] 5.5 5.7
10 theta[7] 9.5 9.1
11 theta[8] 7.1 7.0
Summary functions can be specified as a character string, a function, or a formula (or anything else supported by rlang::as_function()). If these arguments are named, those names are used in the tibble output. If a summary function returns a named result, the names of the result take precedence.
my_sd <- function(x) c(My_SD = sd(x))
fit$summary(
c("mu", "tau"),
MEAN = mean,
"median",
my_sd,
~quantile(.x, probs = c(0.1, 0.9)),
Minimum = function(x) min(x)
)
variable MEAN median My_SD 10% 90% Minimum
1 mu 6.6 6.7 4.3 0.98 12 -11.7
2 tau 5.8 5.0 3.7 1.81 11 0.9
Arguments to all summary functions can also be specified with .args.
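A minimal sketch of such a call, assuming quantile() as the summary function and the probabilities implied by the column names in the output that follows. To stay runnable without CmdStan it calls posterior::summarise_draws() on posterior's bundled eight-schools draws rather than on fit, so the values will differ from that output; the analogous fit$summary() call is given as a comment.

```r
library(posterior)

# Bundled example draws, restricted to the two variables of interest.
x <- subset_draws(example_draws("eight_schools"), variable = c("mu", "tau"))
probs <- c(0.025, 0.05, 0.95, 0.975)

# Every element of .args is passed on to each summary function, here quantile().
q <- summarise_draws(x, quantile, .args = list(probs = probs))
print(q)

# Analogous call on the cmdstanr fit from this section (assumed, not run here):
# fit$summary(c("mu", "tau"), quantile, .args = list(probs = probs))
```

Because .args applies to every summary function in the call, it is best suited to calls where all functions accept the same extra arguments.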
variable 2.5% 5% 95% 97.5%
1 mu -2.0 -0.55 14 15
2 tau 1.1 1.33 13 15
The summary functions are applied to the array of sample values, with dimension iter_sampling x chains.
fit$summary(variables = NULL, dim, colMeans)
variable dim.1 dim.2 1 2 3 4
1 lp__ 1000 4 -58.8 -58.4 -59.0 -59.4
2 mu 1000 4 6.8 6.7 6.6 6.1
3 tau 1000 4 5.7 5.6 5.7 6.1
4 theta[1] 1000 4 9.9 9.5 9.8 9.5
5 theta[2] 1000 4 7.4 7.2 7.0 6.3
6 theta[3] 1000 4 5.8 5.7 5.6 4.8
7 theta[4] 1000 4 6.9 6.7 7.0 6.7
8 theta[5] 1000 4 4.9 4.8 4.6 4.1
9 theta[6] 1000 4 5.7 5.8 5.6 4.8
10 theta[7] 1000 4 9.6 9.8 9.4 9.2
11 theta[8] 1000 4 7.0 7.3 7.0 7.0
For this reason, users may get unexpected results if they use stats::var() directly, as it will return a covariance matrix. An alternative is the distributional::variance() function, which can also be accessed via posterior::variance().
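A minimal sketch comparing the two approaches, with the call shape inferred from the column names in the output that follows. It uses posterior's bundled eight-schools draws instead of fit (an assumption made so that it runs without CmdStan), so the values will differ from that output.

```r
library(posterior)

# Bundled example draws, restricted to the two variables of interest.
x <- subset_draws(example_draws("eight_schools"), variable = c("mu", "tau"))

# Each summary function receives an iterations-by-chains matrix, so a bare
# stats::var() would compute a chains-by-chains covariance matrix.
# posterior::variance() and var(as.vector(.x)) both pool all draws instead.
v <- summarise_draws(x, posterior::variance, ~var(as.vector(.x)))
print(v)

# Analogous call on the cmdstanr fit from this section (assumed, not run here):
# fit$summary(c("mu", "tau"), posterior::variance, ~var(as.vector(.x)))
```

The two columns should agree, since both compute a single variance over the pooled draws of each variable.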
variable posterior::variance ~var(as.vector(.x))
1 mu 19 19
2 tau 14 14
Summary functions need not be numeric, but non-numeric summaries won't work with $print().
strict_pos <- function(x) if (all(x > 0)) "yes" else "no"
fit$summary(variables = NULL, "Strictly Positive" = strict_pos)
# fit$print(variables = NULL, "Strictly Positive" = strict_pos)
variable Strictly Positive
1 lp__ no
2 mu no
3 tau yes
4 theta[1] no
5 theta[2] no
6 theta[3] no
7 theta[4] no
8 theta[5] no
9 theta[6] no
10 theta[7] no
11 theta[8] no
For more information, see posterior::summarise_draws(), which is called by $summary().