CmdStanModel-method-sample.Rd
The sample method of a CmdStanModel object runs the default MCMC algorithm in CmdStan (algorithm=hmc engine=nuts) to produce a set of draws from the posterior distribution of a model conditioned on some data.
$sample(
  num_chains = 1,
  # num_cores = NULL, # not yet available
  data = NULL,
  num_warmup = NULL,
  num_samples = NULL,
  save_warmup = FALSE,
  thin = NULL,
  refresh = NULL,
  init = NULL,
  seed = NULL,
  max_depth = NULL,
  metric = NULL,
  stepsize = NULL,
  adapt_engaged = NULL,
  adapt_delta = NULL
)
The following arguments can be specified for any of the fitting methods (sample, optimize, variational). Arguments left at NULL use the default of the installed version of CmdStan.
data: (multiple options) The data to use:
  - A named list of R objects, as for RStan;
  - A path to a data file compatible with CmdStan (R dump or JSON). See the appendices in the CmdStan manual for details on using these formats.
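The two forms of data can be sketched as follows. This is a minimal illustration rather than part of the package: it assumes the Bernoulli example model used in the Examples on this page, writes the JSON file by hand with base R, and wraps the call in a hypothetical helper, sample_bernoulli(), so that nothing is run until a compiled model object is supplied. The file name and its location under tempdir() are assumptions of the sketch.

```r
# Data for the Bernoulli example model, as a named list (names match the Stan data block)
standata <- list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))

# The same data written by hand as a JSON file (hypothetical file name)
json_file <- file.path(tempdir(), "bernoulli.data.json")
writeLines(
  sprintf('{"N": %d, "y": [%s]}', standata$N, paste(standata$y, collapse = ", ")),
  json_file
)

# Hypothetical wrapper: `data` may be either the named list or the file path
sample_bernoulli <- function(mod, data) {
  mod$sample(data = data, seed = 123)
}
```

With the compiled model object mod from the Examples, sample_bernoulli(mod, standata) and sample_bernoulli(mod, json_file) should then fit the same model to the same data.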
seed: (positive integer) A seed for the (P)RNG to pass to CmdStan.
refresh: (non-negative integer) The number of iterations between screen updates.
init: (multiple options) The initialization method:
  - A real number x>0 initializes randomly between [-x,x] (on the unconstrained parameter space);
  - 0 initializes to 0;
  - A character vector of data file paths (one per chain) to initialization files.
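The three forms of init might be used as in the sketch below. It assumes the Bernoulli example model from the Examples, whose only parameter is theta; write_init_files() and sample_with_init() are hypothetical helpers introduced for illustration, and the starting values 0.2 and 0.7 are arbitrary. Values in an initialization file are given on the constrained scale, so theta must lie in (0, 1) here.

```r
# Hypothetical helper: write one JSON initialization file per chain.
write_init_files <- function(theta_values, dir = tempdir()) {
  paths <- file.path(dir, sprintf("init-%d.json", seq_along(theta_values)))
  for (i in seq_along(theta_values)) {
    writeLines(sprintf('{"theta": %s}', format(theta_values[i])), paths[i])
  }
  paths
}

# One file per chain, each with a different starting value for theta
init_files <- write_init_files(c(0.2, 0.7))

# Hypothetical wrapper; argument names follow the signature shown on this page.
sample_with_init <- function(mod, data, init, num_chains = 2) {
  mod$sample(data = data, init = init, num_chains = num_chains, seed = 123)
}

# With `mod` and `standata` from the Examples:
# sample_with_init(mod, standata, init = 0.5)        # random inits in [-0.5, 0.5]
# sample_with_init(mod, standata, init = 0)          # all inits at 0
# sample_with_init(mod, standata, init = init_files) # one file per chain
```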
sample method
In addition to the arguments above, the sample method also has its own set of arguments. These arguments are described briefly here and in greater detail in the CmdStan manual. Arguments left at NULL use the default of the installed version of CmdStan.
num_samples: (positive integer) The number of sampling iterations.
num_warmup: (positive integer) The number of warmup iterations.
save_warmup: (logical) Should warmup iterations also be streamed to the output?
thin: (positive integer) The period between saved samples. This should typically be left at its default (no thinning).
adapt_engaged: (logical) Do warmup adaptation?
adapt_delta: (real in (0,1)) The adaptation target acceptance statistic.
stepsize: (positive real) The initial step size for the discrete approximation to continuous Hamiltonian dynamics. This is further tuned during warmup.
metric: (character) The geometry of the base manifold. One of the following:
  - A single string from among "diag_e", "dense_e", "unit_e";
  - A character vector containing paths to files (one per chain) compatible with CmdStan that contain precomputed metrics. Each path must be to a JSON or Rdump file that contains an entry inv_metric whose value is either the diagonal vector or the full covariance matrix.
  If you want to turn off adaptation when using a precomputed metric, set adapt_engaged=FALSE; otherwise the precomputed metric is used only as an initial guess during adaptation. See the Euclidean Metric section of the CmdStan manual for more details on these options.
max_depth: (positive integer) The maximum allowed tree depth. See the Tree Depth section of the CmdStan manual for more details.
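As a rough sketch of how the sampler arguments above combine, the block below collects a set of non-default settings in a list and passes them to the sample method. The specific values (1500 warmup iterations, adapt_delta of 0.95, max_depth of 12, a dense metric) are illustrative assumptions, not recommendations from this page, and sample_tuned() is a hypothetical helper. Argument names follow the signature shown above.

```r
# Illustrative (assumed) settings; anything omitted stays at the CmdStan default
tuning <- list(
  num_warmup = 1500,
  num_samples = 1000,
  adapt_delta = 0.95,
  max_depth = 12,
  metric = "dense_e"
)

# Sanity checks mirroring the ranges documented for each argument
stopifnot(
  tuning$adapt_delta > 0, tuning$adapt_delta < 1,
  tuning$max_depth >= 1,
  tuning$metric %in% c("diag_e", "dense_e", "unit_e")
)

# Hypothetical helper: run two chains with the settings in `tuning`
sample_tuned <- function(mod, data, tuning) {
  args <- c(list(data = data, seed = 123, num_chains = 2), tuning)
  do.call(mod$sample, args)
}
```

With mod and standata from the Examples, sample_tuned(mod, standata, tuning) would run the sampler with these settings. Keeping the settings in a list makes it easy to reuse or compare configurations across fits.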
The sample method returns a CmdStanMCMC object.
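The CSV files written by CmdStan can also be read directly with base R. The sketch below relies only on output_files(), which is used in the Examples on this page; read_draws() is a hypothetical helper, not a cmdstanr function. It depends on the fact that CmdStan writes its configuration and adaptation information as lines starting with #, which read.csv() can skip as comments.

```r
# Hypothetical helper: read one CmdStan output CSV per chain into a single data frame
read_draws <- function(csv_files) {
  per_chain <- lapply(seq_along(csv_files), function(i) {
    draws <- read.csv(csv_files[i], comment.char = "#")
    draws$chain <- i  # record which chain each row came from
    draws
  })
  do.call(rbind, per_chain)
}

# With `fit_mcmc` from the Examples:
# draws <- read_draws(fit_mcmc$output_files())
# mean(draws$theta)
```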
The CmdStanR website (mc-stan.org/cmdstanr) for online documentation and tutorials.
The Stan and CmdStan documentation:
  - Stan doc (html or pdf): mc-stan.org/users/documentation/
  - CmdStan doc (pdf): github.com/stan-dev/cmdstan/releases/
Other CmdStanModel methods: CmdStanModel-method-compile, CmdStanModel-method-optimize, CmdStanModel-method-variational
# \dontrun{
# Set path to cmdstan
# Note: if you installed CmdStan via install_cmdstan() with default settings
# then default below should work. Otherwise use the `path` argument to
# specify the location of your CmdStan installation.
set_cmdstan_path(path = NULL)
#>

# Create a CmdStan model object from a Stan program,
# here using the example model that comes with CmdStan
stan_program <- file.path(cmdstan_path(), "examples/bernoulli/bernoulli.stan")
mod <- cmdstan_model(stan_program)
mod$print()
#> data {
#>   int<lower=0> N;
#>   int<lower=0,upper=1> y[N];
#> }
#> parameters {
#>   real<lower=0,upper=1> theta;
#> }
#> model {
#>   theta ~ beta(1,1);
#>   for (n in 1:N)
#>     y[n] ~ bernoulli(theta);
#> }

# Compile to create executable
mod$compile()
#> Running make /Users/jgabry/.cmdstanr/cmdstan/examples/bernoulli/bernoulli
#> make: `/Users/jgabry/.cmdstanr/cmdstan/examples/bernoulli/bernoulli' is up to date.

# Run sample method (MCMC via Stan's dynamic HMC/NUTS),
# specifying data as a named list (like RStan)
standata <- list(N = 10, y = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1))
fit_mcmc <- mod$sample(data = standata, seed = 123, num_chains = 2)
#> method = sample (Default)
#>   sample
#>     num_samples = 1000 (Default)
#>     num_warmup = 1000 (Default)
#>     save_warmup = 0 (Default)
#>     thin = 1 (Default)
#>     adapt
#>       engaged = 1 (Default)
#>       gamma = 0.050000000000000003 (Default)
#>       delta = 0.80000000000000004 (Default)
#>       kappa = 0.75 (Default)
#>       t0 = 10 (Default)
#>       init_buffer = 75 (Default)
#>       term_buffer = 50 (Default)
#>       window = 25 (Default)
#>     algorithm = hmc (Default)
#>       hmc
#>         engine = nuts (Default)
#>           nuts
#>             max_depth = 10 (Default)
#>         metric = diag_e (Default)
#>         metric_file = (Default)
#>         stepsize = 1 (Default)
#>         stepsize_jitter = 0 (Default)
#> id = 1
#> data
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/Rtmpy8TKSY/standata-b02199f3176.data.R
#> init = 2 (Default)
#> random
#>   seed = 123
#> output
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-sample-1.csv
#>   diagnostic_file = (Default)
#>   refresh = 100 (Default)
#>
#>
#> Gradient evaluation took 1.7e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.17 seconds.
#> Adjust your expectations accordingly!
#>
#>
#> Iteration:    1 / 2000 [  0%]  (Warmup)
#> Iteration:  100 / 2000 [  5%]  (Warmup)
#> Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Iteration:  300 / 2000 [ 15%]  (Warmup)
#> Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Iteration:  500 / 2000 [ 25%]  (Warmup)
#> Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Iteration:  700 / 2000 [ 35%]  (Warmup)
#> Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Iteration:  900 / 2000 [ 45%]  (Warmup)
#> Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Iteration: 1100 / 2000 [ 55%]  (Sampling)
#> Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Iteration: 1300 / 2000 [ 65%]  (Sampling)
#> Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Iteration: 1500 / 2000 [ 75%]  (Sampling)
#> Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Iteration: 1700 / 2000 [ 85%]  (Sampling)
#> Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Iteration: 1900 / 2000 [ 95%]  (Sampling)
#> Iteration: 2000 / 2000 [100%]  (Sampling)
#>
#>  Elapsed Time: 0.012149 seconds (Warm-up)
#>                0.018859 seconds (Sampling)
#>                0.031008 seconds (Total)
#>
#> method = sample (Default)
#>   sample
#>     num_samples = 1000 (Default)
#>     num_warmup = 1000 (Default)
#>     save_warmup = 0 (Default)
#>     thin = 1 (Default)
#>     adapt
#>       engaged = 1 (Default)
#>       gamma = 0.050000000000000003 (Default)
#>       delta = 0.80000000000000004 (Default)
#>       kappa = 0.75 (Default)
#>       t0 = 10 (Default)
#>       init_buffer = 75 (Default)
#>       term_buffer = 50 (Default)
#>       window = 25 (Default)
#>     algorithm = hmc (Default)
#>       hmc
#>         engine = nuts (Default)
#>           nuts
#>             max_depth = 10 (Default)
#>         metric = diag_e (Default)
#>         metric_file = (Default)
#>         stepsize = 1 (Default)
#>         stepsize_jitter = 0 (Default)
#> id = 2
#> data
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/Rtmpy8TKSY/standata-b02199f3176.data.R
#> init = 2 (Default)
#> random
#>   seed = 124
#> output
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-sample-2.csv
#>   diagnostic_file = (Default)
#>   refresh = 100 (Default)
#>
#>
#> Gradient evaluation took 1.8e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
#> Adjust your expectations accordingly!
#>
#>
#> Iteration:    1 / 2000 [  0%]  (Warmup)
#> Iteration:  100 / 2000 [  5%]  (Warmup)
#> Iteration:  200 / 2000 [ 10%]  (Warmup)
#> Iteration:  300 / 2000 [ 15%]  (Warmup)
#> Iteration:  400 / 2000 [ 20%]  (Warmup)
#> Iteration:  500 / 2000 [ 25%]  (Warmup)
#> Iteration:  600 / 2000 [ 30%]  (Warmup)
#> Iteration:  700 / 2000 [ 35%]  (Warmup)
#> Iteration:  800 / 2000 [ 40%]  (Warmup)
#> Iteration:  900 / 2000 [ 45%]  (Warmup)
#> Iteration: 1000 / 2000 [ 50%]  (Warmup)
#> Iteration: 1001 / 2000 [ 50%]  (Sampling)
#> Iteration: 1100 / 2000 [ 55%]  (Sampling)
#> Iteration: 1200 / 2000 [ 60%]  (Sampling)
#> Iteration: 1300 / 2000 [ 65%]  (Sampling)
#> Iteration: 1400 / 2000 [ 70%]  (Sampling)
#> Iteration: 1500 / 2000 [ 75%]  (Sampling)
#> Iteration: 1600 / 2000 [ 80%]  (Sampling)
#> Iteration: 1700 / 2000 [ 85%]  (Sampling)
#> Iteration: 1800 / 2000 [ 90%]  (Sampling)
#> Iteration: 1900 / 2000 [ 95%]  (Sampling)
#> Iteration: 2000 / 2000 [100%]  (Sampling)
#>
#>  Elapsed Time: 0.011733 seconds (Warm-up)
#>                0.019533 seconds (Sampling)
#>                0.031266 seconds (Total)
#>
#> Running bin/stansummary \
#>   /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-sample-1.csv \
#>   /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-sample-2.csv
#> Inference for Stan model: bernoulli_model
#> 2 chains: each with iter=(1000,1000); warmup=(0,0); thin=(1,1); 2000 iterations saved.
#>
#> Warmup took (0.012, 0.012) seconds, 0.024 seconds total
#> Sampling took (0.019, 0.020) seconds, 0.038 seconds total
#>
#>                 Mean     MCSE   StdDev     5%    50%    95%  N_Eff  N_Eff/s    R_hat
#> lp__            -7.3  2.7e-02  7.3e-01   -8.9   -7.0   -6.8    738    19235  1.0e+00
#> accept_stat__   0.92  3.1e-03  1.3e-01   0.64   0.97    1.0   1670    43487  1.0e+00
#> stepsize__      0.92  1.7e-03  1.7e-03   0.92   0.92   0.92    1.0       26  1.4e+12
#> treedepth__      1.3  1.1e-02  4.7e-01    1.0    1.0    2.0   1968    51255  1.0e+00
#> n_leapfrog__     2.4  2.5e-02  1.0e+00    1.0    3.0    3.0   1640    42721  1.0e+00
#> divergent__     0.00  0.0e+00  0.0e+00   0.00   0.00   0.00   1000    26047      nan
#> energy__         7.8  4.0e-02  1.0e+00    6.8    7.5    9.8    647    16846  1.0e+00
#> theta           0.24  4.6e-03  1.2e-01  0.077   0.22   0.47    720    18757  1.0e+00
#>
#> Samples were drawn using hmc with nuts.
#> For each parameter, N_Eff is a crude measure of effective sample size,
#> and R_hat is the potential scale reduction factor on split chains (at
#> convergence, R_hat=1).
#>

# Run optimization method (default is Stan's LBFGS algorithm)
# and also demonstrate specifying data as a path to a file (readable by CmdStan)
my_data_file <- file.path(cmdstan_path(), "examples/bernoulli/bernoulli.data.R")
fit_optim <- mod$optimize(data = my_data_file, seed = 123)
#> Warning: Optimization method is experimental and the structure of returned object may change.
#> method = optimize
#>   optimize
#>     algorithm = lbfgs (Default)
#>       lbfgs
#>         init_alpha = 0.001 (Default)
#>         tol_obj = 9.9999999999999998e-13 (Default)
#>         tol_rel_obj = 10000 (Default)
#>         tol_grad = 1e-08 (Default)
#>         tol_rel_grad = 10000000 (Default)
#>         tol_param = 1e-08 (Default)
#>         history_size = 5 (Default)
#>     iter = 2000 (Default)
#>     save_iterations = 0 (Default)
#> id = 1
#> data
#>   file = /Users/jgabry/.cmdstanr/cmdstan/examples/bernoulli/bernoulli.data.R
#> init = 2 (Default)
#> random
#>   seed = 123
#> output
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-optimize-1.csv
#>   diagnostic_file = (Default)
#>   refresh = 100 (Default)
#>
#> Initial log joint probability = -9.51104
#>     Iter      log prob        ||dx||      ||grad||       alpha      alpha0  # evals  Notes
#>        6      -5.00402   0.000103557   2.55661e-07           1           1        9
#> Optimization terminated normally:
#>   Convergence detected: relative gradient magnitude is below tolerance
#> Estimates from optimization:
#>    theta     lp__
#>  0.20000 -5.00402

# Run variational Bayes method (default is meanfield ADVI)
fit_vb <- mod$variational(data = standata, seed = 123)
#> Warning: Variational inference method is experimental and the structure of returned object may change.
#> method = variational
#>   variational
#>     algorithm = meanfield (Default)
#>       meanfield
#>     iter = 10000 (Default)
#>     grad_samples = 1 (Default)
#>     elbo_samples = 100 (Default)
#>     eta = 1 (Default)
#>     adapt
#>       engaged = 1 (Default)
#>       iter = 50 (Default)
#>     tol_rel_obj = 0.01 (Default)
#>     eval_elbo = 100 (Default)
#>     output_samples = 1000 (Default)
#> id = 1
#> data
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/Rtmpy8TKSY/standata-b022268471e.data.R
#> init = 2 (Default)
#> random
#>   seed = 123
#> output
#>   file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-variational-1.csv
#>   diagnostic_file = (Default)
#>   refresh = 100 (Default)
#>
#> ------------------------------------------------------------
#> EXPERIMENTAL ALGORITHM:
#>   This procedure has not been thoroughly tested and may be unstable
#>   or buggy. The interface is subject to change.
#> ------------------------------------------------------------
#>
#>
#>
#> Gradient evaluation took 2.1e-05 seconds
#> 1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
#> Adjust your expectations accordingly!
#>
#>
#> Begin eta adaptation.
#> Iteration:   1 / 250 [  0%]  (Adaptation)
#> Iteration:  50 / 250 [ 20%]  (Adaptation)
#> Iteration: 100 / 250 [ 40%]  (Adaptation)
#> Iteration: 150 / 250 [ 60%]  (Adaptation)
#> Iteration: 200 / 250 [ 80%]  (Adaptation)
#> Success! Found best value [eta = 1] earlier than expected.
#>
#> Begin stochastic gradient ascent.
#>   iter             ELBO   delta_ELBO_mean   delta_ELBO_med   notes
#>    100           -6.258             1.000            1.000
#>    200           -6.475             0.517            1.000
#>    300           -6.228             0.358            0.040
#>    400           -6.220             0.269            0.040
#>    500           -6.379             0.220            0.034
#>    600           -6.195             0.188            0.034
#>    700           -6.262             0.163            0.030
#>    800           -6.345             0.144            0.030
#>    900           -6.201             0.131            0.025
#>   1000           -6.307             0.119            0.025
#>   1100           -6.290             0.020            0.023
#>   1200           -6.238             0.017            0.017
#>   1300           -6.182             0.014            0.013
#>   1400           -6.167             0.014            0.013
#>   1500           -6.219             0.012            0.011
#>   1600           -6.164             0.010            0.009   MEDIAN ELBO CONVERGED
#>
#> Drawing a sample of size 1000 from the approximate posterior...
#> COMPLETED.
#> Running bin/stansummary \
#>   /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T//Rtmpy8TKSY/bernoulli-stan-variational-1.csv
#> Warning: non-fatal error reading adapation data
#> Inference for Stan model: bernoulli_model
#> 1 chains: each with iter=(1001); warmup=(0); thin=(0); 1001 iterations saved.
#>
#> Warmup took (0.00) seconds, 0.00 seconds total
#> Sampling took (0.00) seconds, 0.00 seconds total
#>
#>            Mean     MCSE  StdDev     5%    50%       95%  N_Eff  N_Eff/s    R_hat
#> lp__       0.00  0.0e+00    0.00   0.00   0.00   0.0e+00    500      inf      nan
#> log_p__    -7.2  2.5e-02    0.72   -8.6   -7.0  -6.8e+00    789      inf  1.0e+00
#> log_g__   -0.54  2.9e-02    0.76   -2.1  -0.27  -1.5e-03    679      inf  1.0e+00
#> theta      0.26  4.2e-03    0.12  0.091   0.23   4.9e-01    823      inf  1.0e+00
#>
#> Samples were drawn using meanfield with .
#> For each parameter, N_Eff is a crude measure of effective sample size,
#> and R_hat is the potential scale reduction factor on split chains (at
#> convergence, R_hat=1).
#>

# For models fit using MCMC, if you like working with RStan's stanfit objects
# then you can create one with rstan::read_stan_csv()
if (require(rstan, quietly = TRUE)) {
  stanfit <- rstan::read_stan_csv(fit_mcmc$output_files())
  print(stanfit)
}
#> Inference for Stan model: bernoulli-stan-sample-1.
#> 2 chains, each with iter=2000; warmup=1000; thin=1;
#> post-warmup draws per chain=1000, total post-warmup draws=2000.
#>
#>        mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
#> theta  0.24    0.00 0.12  0.06  0.14  0.22  0.32  0.52   720    1
#> lp__  -7.32    0.03 0.73 -9.37 -7.53 -7.05 -6.81 -6.75   737    1
#>
#> Samples were drawn using NUTS(diag_e) at Mon Oct 14 21:41:31 2019.
#> For each parameter, n_eff is a crude measure of effective sample size,
#> and Rhat is the potential scale reduction factor on split chains (at
#> convergence, Rhat=1).
# }