bootclust

 Performs balanced bootstrap (or bootknife) resampling of clustered data and 
 calculates bootstrap bias, standard errors and confidence intervals.

 -- Function File: bootclust (DATA)
 -- Function File: bootclust (DATA, NBOOT)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN)
 -- Function File: bootclust ({D1, D2, ...}, NBOOT, BOOTFUN)
 -- Function File: bootclust (DATA, NBOOT, {BOOTFUN, ...})
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTSZ)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO)
 -- Function File: bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO, SEED)
 -- Function File: STATS = bootclust (...)
 -- Function File: [STATS, BOOTSTAT] = bootclust (...)

     'bootclust (DATA)' uses nonparametric balanced bootstrap resampling
     to generate 1999 resamples from clusters of rows of the DATA (column
     vector or matrix). By default, each row is its own cluster (i.e. no
     clustering). The means of the resamples are then computed and the
     following statistics are displayed:
        - original: the original estimate(s) calculated by applying BOOTFUN
          to the DATA
        - bias: bootstrap bias of the estimate(s)
        - std_error: bootstrap standard error of the estimate(s)
        - CI_lower: lower bound(s) of the 95% bootstrap confidence interval
        - CI_upper: upper bound(s) of the 95% bootstrap confidence interval
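
     As a rough illustration of what "balanced" resampling means, the sketch
     below builds a set of resamples in which every row (or cluster) index
     appears exactly NBOOT times overall. The variable names are hypothetical
     and this is not taken from the bootclust source; it only shows the idea.

```octave
% Sketch of balanced resampling of row (or cluster) indices.
% All names here are illustrative assumptions, not bootclust internals.
n = 5;                                   % number of rows (or clusters)
nboot = 4;                               % bootclust defaults to 1999
pool = repmat ((1:n)', nboot, 1);        % each index appears nboot times
pool = pool(randperm (n * nboot));       % shuffle the pooled indices
bootsam = reshape (pool, n, nboot);      % one resample per column
counts = accumarray (bootsam(:), 1);     % total appearances of each index
disp (counts');                          % every count equals nboot
```

     Because each index is used equally often across the whole set of
     resamples, the bootstrap bias estimate for the mean is essentially zero,
     which is consistent with the tiny bias reported in Demonstration 1.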

     'bootclust (DATA, NBOOT)' specifies the number of bootstrap resamples,
     where NBOOT is a scalar, positive integer corresponding to the number
     of bootstrap resamples. The default value of NBOOT is the scalar: 1999.

     'bootclust (DATA, NBOOT, BOOTFUN)' also specifies BOOTFUN: the function
     calculated on the original sample and the bootstrap resamples. BOOTFUN
     must be either a:
       <> function handle or anonymous function,
       <> string of function name, or
       <> a cell array where the first cell is one of the above function
          definitions and the remaining cells are (additional) input arguments 
          to that function (other than the data arguments).
        In all cases BOOTFUN must take DATA for the initial input argument(s).
        BOOTFUN can return a scalar or any multidimensional numeric variable,
        but the output will be reshaped as a column vector. BOOTFUN must
        calculate a statistic representative of the finite data sample; it
        should NOT be an estimate of a population parameter (unless they are
        one and the same). If BOOTFUN is @mean or 'mean', the narrowness bias
        of the single-bootstrap confidence intervals is reduced by expanding
        the probabilities of the percentiles using Student's t-distribution
        [1]. By default, BOOTFUN is @mean.
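
     The cell-array form of BOOTFUN can be read as "call this function with
     the data first and the extra arguments after". The sketch below shows
     that reading, assuming the hypothetical helper name f_expanded; how
     bootclust evaluates the cell array internally may differ.

```octave
% Sketch: the cell-array form {@var, 1} compared with an anonymous function.
% f_expanded is a hypothetical helper written for this example only.
data = [48 36 20 29 42 42 20 42 22 41].';
f_anon = @(x) var (x, 1);                % anonymous function form
f_cell = {@var, 1};                      % cell form: function, then extras
f_expanded = @(varargin) feval (f_cell{1}, varargin{:}, f_cell{2:end});

v_anon = f_anon (data);                  % variance normalised by N
v_cell = f_expanded (data);              % same call, built from the cell

% Any non-scalar result would be flattened into a column vector
flat = reshape (magic (3), [], 1);       % 3-by-3 output becomes 9-by-1
```

     Both forms give the same value here, so the choice is mostly a matter of
     convenience when extra arguments need to be passed.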

     'bootclust ({D1, D2, ...}, NBOOT, BOOTFUN)' resamples from the clusters
     of rows of the data vectors D1, D2 etc and the resamples are passed onto
     BOOTFUN as multiple data input arguments. All data vectors and matrices
     (D1, D2 etc) must have the same number of rows.
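
     The requirement for equal numbers of rows follows from the rows being
     resampled jointly. A minimal sketch of one such draw is shown below; it
     uses ordinary (not balanced) resampling and made-up data purely to show
     that the same row indices are applied to every data argument.

```octave
% Sketch: one joint bootstrap draw applied to several data arguments.
% The data values and variable names are made up for illustration.
y = [1.2; 0.7; 3.1; 2.4; 1.9; 2.8];
X = [ones(6, 1), (1:6)'];
bootfun = @(y, X) X \ y;                 % regression coefficients
n = size (y, 1);
idx = randi (n, n, 1);                   % row indices drawn with replacement
d = {y, X};
dstar = cellfun (@(v) v(idx, :), d, 'UniformOutput', false);
bstar = bootfun (dstar{:});              % same rows used for y and for X
```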

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA)' also sets ALPHA, a numeric
     value (or pair of values) that defines the lower and upper bounds of the
     confidence interval(s). The value(s) of ALPHA must be between 0 and 1.
     ALPHA can be either:
       <> scalar: To set the (nominal) central coverage of equal-tailed
                  percentile confidence intervals to 100*(1-ALPHA)%.
       <> vector: A pair of probabilities defining the (nominal) lower and
                  upper percentiles of the confidence interval(s) as
                  100*(ALPHA(1))% and 100*(ALPHA(2))% respectively. The
                  percentiles are bias-corrected and accelerated (BCa) [2].
        The default value of ALPHA is the vector: [.025, .975], for a 95%
        BCa confidence interval.
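
     The arithmetic relating ALPHA to the nominal coverage can be sketched
     as follows. The variable names are illustrative; the BCa adjustment of
     the percentiles themselves is not shown.

```octave
% Sketch: nominal percentiles and coverage implied by ALPHA.
alpha = 0.1;                             % scalar form
probs = [alpha / 2, 1 - alpha / 2];      % equal-tailed: 5% and 95%
coverage = 100 * (1 - alpha);            % 90% nominal coverage

alpha_vec = [0.025, 0.975];              % vector form (the default)
coverage_vec = 100 * (alpha_vec(2) - alpha_vec(1));   % 95% nominal coverage
```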

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID)' also sets CLUSTID,
     which are identifiers that define the grouping of the DATA rows for
     cluster bootstrap case resampling. CLUSTID should be a column vector or
     cell array with the same number of rows as the DATA. Rows in DATA with
     the same CLUSTID value are treated as clusters of observations that are
     resampled together.
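
     To make the grouping concrete, the sketch below maps cluster
     identifiers to groups of row numbers and then draws whole clusters with
     replacement. It is only an illustration with hypothetical variable
     names, and it uses ordinary rather than balanced resampling.

```octave
% Sketch: resampling whole clusters of rows defined by CLUSTID.
% Variable names are assumptions for illustration only.
data = [48 36 20 29 42 42 20 42].';
clustid = {'a'; 'a'; 'b'; 'b'; 'a'; 'c'; 'c'; 'd'};
[ids, ~, ic] = unique (clustid);         % ic(i) is the cluster number of row i
ic = ic(:);
N = numel (ids);                         % number of clusters
rows = arrayfun (@(k) find (ic == k), (1:N)', 'UniformOutput', false);
pick = randi (N, N, 1);                  % draw N clusters with replacement
resample = data(vertcat (rows{pick}));   % all rows of each drawn cluster
```

     Note that the resample can contain a different number of rows from the
     original data when the clusters differ in size.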

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTSZ)' groups consecutive
     DATA rows into clusters of length CLUSTSZ. This is equivalent to block
     bootstrap resampling. By default, CLUSTSZ is 1.
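
     One way to picture the effect of CLUSTSZ is as a cluster index built
     from consecutive row numbers, as in the sketch below. The treatment of
     a final, shorter block (when the number of rows is not a multiple of
     CLUSTSZ) is an assumption of this example and may not match bootclust.

```octave
% Sketch: grouping consecutive rows into blocks of length clustsz.
n = 10;                                  % number of data rows
clustsz = 3;                             % block length
ic = ceil ((1:n)' / clustsz);            % block number for each row
disp (ic');                              % 1 1 1 2 2 2 3 3 3 4
```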

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO)' sets the
     resampling method. If LOO is false, the resampling method used is
     balanced bootstrap resampling. If LOO is true, the resampling method used
     is balanced bootknife resampling [3]. Where N is the number of clusters,
     bootknife cluster resampling involves creating leave-one-out jackknife
     samples of size N - 1, and then drawing resamples of size N with
     replacement from the jackknife samples, thereby incorporating Bessel's
     correction into the resampling procedure. LOO must be a scalar logical
     value. The default value of LOO is false.
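
     The bootknife procedure described above can be sketched in a few lines.
     This simplified version omits a randomly chosen cluster for each
     resample and does not attempt to balance the resamples, so it should be
     read as an outline of the idea rather than as the bootclust algorithm.

```octave
% Sketch: bootknife resampling of cluster indices (not balanced).
% Variable names are hypothetical.
N = 6;                                   % number of clusters
nboot = 5;                               % number of resamples
bootsam = zeros (N, nboot);
for b = 1:nboot
  omit = randi (N);                      % cluster left out of this resample
  keep = setdiff (1:N, omit);            % jackknife sample of size N - 1
  draw = keep(randi (N - 1, 1, N));      % N draws with replacement
  bootsam(:, b) = draw(:);
end
```

     Each column of bootsam holds N cluster indices drawn from only N - 1
     distinct clusters.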

     'bootclust (DATA, NBOOT, BOOTFUN, ALPHA, CLUSTID, LOO, SEED)' initialises
     the Mersenne Twister random number generator using an integer SEED value
     so that bootclust results are reproducible.
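
     The effect of seeding can be illustrated with the built-in generator.
     The sketch assumes the legacy 'twister' syntax of rand, which Octave
     accepts; whether bootclust seeds the generator in exactly this way is
     not stated here.

```octave
% Sketch: reseeding the generator reproduces the same random draws.
seed = 1;
rand ('twister', seed);
a = rand (3, 1);
rand ('twister', seed);
b = rand (3, 1);
disp (isequal (a, b));                   % the two draws are identical
```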

     'STATS = bootclust (...)' returns a structure with the following fields
     (defined above): original, bias, std_error, CI_lower, CI_upper.

     '[STATS, BOOTSTAT] = bootclust (...)' returns BOOTSTAT, a vector or matrix
     of bootstrap statistics calculated over the bootstrap resamples.
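
     Given BOOTSTAT, the bias and standard error can be reproduced using
     their conventional bootstrap definitions, as sketched below for a
     single statistic. The example generates its own bootstrap statistics
     with ordinary resampling and made-up settings, so the numbers will not
     match bootclust output.

```octave
% Sketch: conventional bootstrap bias and standard error from BOOTSTAT.
% Uses ordinary (not balanced) resampling; names are illustrative.
data = [48 36 20 29 42 42 20 42 22 41].';
n = numel (data);
nboot = 199;
bootfun = @(x) var (x, 1);
original = bootfun (data);               % estimate from the original sample
bootstat = zeros (1, nboot);
for b = 1:nboot
  bootstat(b) = bootfun (data(randi (n, n, 1)));
end
bias = mean (bootstat) - original;       % bootstrap estimate of bias
std_error = std (bootstat);              % bootstrap standard error
```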

  REQUIREMENTS:
    The function file boot.m (or, preferably, the compiled boot.mex) and
    bootcdf, which are distributed with the statistics-resampling package.

  BIBLIOGRAPHY:
  [1] Hesterberg, Tim (2014), What Teachers Should Know about the 
        Bootstrap: Resampling in the Undergraduate Statistics Curriculum, 
        http://arxiv.org/abs/1411.5279
  [2] Efron and Tibshirani (1993) An Introduction to the Bootstrap. 
        New York, NY: Chapman & Hall
  [3] Hesterberg T.C. (2004) Unbiasing the Bootstrap—Bootknife Sampling 
        vs. Smoothing; Proceedings of the Section on Statistics & the 
        Environment. Alexandria, VA: American Statistical Association.

  bootclust (version 2023.09.20)
  Author: Andrew Charles Penn
  https://www.researchgate.net/profile/Andrew_Penn/

  Copyright 2019 Andrew Charles Penn
  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see http://www.gnu.org/licenses/

Demonstration 1

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';

 % 95% expanded BCa bootstrap confidence intervals for the mean
 bootclust (data, 1999, @mean);

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: mean
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 95% (1.1%, 97.4%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +29.65       +5.329e-14   +2.476       +23.84       +34.21

Demonstration 2

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';
 clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
            'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};

 % 95% expanded BCa bootstrap confidence intervals for the mean with
 % cluster resampling
 bootclust (data, 1999, @mean, [0.025,0.975], clustid);

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: mean
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Expanded bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 95% (1.1%, 98.8%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +29.65       -0.03235     +2.918       +23.05       +36.04

Demonstration 3

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';

 % 90% equal-tailed percentile bootstrap confidence intervals for
 % the variance
 bootclust (data, 1999, {@var, 1}, 0.1);

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: var
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Percentile (equal-tailed)
 Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +171.5       -6.745       +41.93       +96.66       +236.5

Demonstration 4

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';
 clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
            'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};

 % 90% equal-tailed percentile bootstrap confidence intervals for
 % the variance with cluster resampling
 bootclust (data, 1999, {@var, 1}, 0.1, clustid);

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: var
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Percentile (equal-tailed)
 Nominal coverage (and the percentiles used): 90% (5.0%, 95.0%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +171.5       -9.772       +33.42       +104.2       +214.3

Demonstration 5

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';

 % 90% BCa bootstrap confidence intervals for the variance
 bootclust (data, 1999, {@var, 1}, [0.05 0.95]);

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: var
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 90% (10.7%, 98.4%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +171.5       -6.598       +42.13       +113.2       +257.0

Demonstration 6

The following code


 % Input univariate dataset
 data = [48 36 20 29 42 42 20 42 22 41 45 14 6 ...
         0 33 28 34 4 32 24 47 41 24 26 30 41].';
 clustid = {'a';'a';'b';'b';'a';'c';'c';'d';'e';'e';'e';'f';'f'; ...
            'g';'g';'g';'h';'h';'i';'i';'j';'j';'k';'l';'m';'m'};

 % 90% BCa bootstrap confidence intervals for the variance with
 % cluster resampling
 bootclust (data, 1999, {@var, 1}, [0.05 0.95], clustid);

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: var
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 90% (12.1%, 98.5%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +171.5       -9.417       +32.82       +122.2       +227.7

Demonstration 7

The following code


 % Input dataset
 y = randn (20,1); x = randn (20,1); X = [ones(20,1), x];

 % 90% BCa confidence interval for regression coefficients 
 bootclust ({y,X}, 1999, @(y,X) X\y, [0.05 0.95]); % Could also use @regress

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: @(y, X) X \ y
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage: 90%

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 -0.1792      +0.008118    +0.2790      -0.7047      +0.2353    
 -0.2602      +0.01050     +0.1644      -0.5326      -0.004009

Demonstration 8

The following code


 % Input dataset
 y = randn (20,1); x = randn (20,1); X = [ones(20,1), x];
 clustid = [1;1;1;1;2;2;2;3;3;3;3;4;4;4;4;4;5;5;5;6];

 % 90% BCa confidence interval for regression coefficients with
 % cluster resampling
 bootclust ({y,X}, 1999, @(y,X) X\y, [0.05 0.95], clustid);

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: @(y, X) X \ y
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage: 90%

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +0.1049      +0.01881     +0.3165      -0.5583      +0.5361    
 +0.09241     -0.05815     +0.3103      -0.4276      +0.5900

Demonstration 9

The following code


 % Input bivariate dataset
 x = [576 635 558 578 666 580 555 661 651 605 653 575 545 572 594].';
 y = [3.39 3.3 2.81 3.03 3.44 3.07 3 3.43 ...
      3.36 3.13 3.12 2.74 2.76 2.88 2.96].';
 clustid = [1;1;3;1;1;2;2;2;2;3;1;3;3;3;2];

 % 95% BCa bootstrap confidence intervals for the correlation coefficient
 bootclust ({x, y}, 1999, @cor, [], clustid);

 % Please be patient, the calculations will be completed soon...

Produces the following output

Summary of nonparametric cluster bootstrap estimates of bias and precision
******************************************************************************

Bootstrap settings: 
 Function: cor
 Resampling method: Balanced, bootstrap cluster resampling 
 Number of resamples: 1999 
 Confidence interval (CI) type: Bias-corrected and accelerated (BCa) 
 Nominal coverage (and the percentiles used): 95% (1.8%, 96.8%)

Bootstrap Statistics: 
 original     bias         std_error    CI_lower     CI_upper  
 +0.7764      -0.02407     +0.1452      +0.3905      +0.9958

Package: statistics-resampling