Resampling Methods — A Simple Introduction to The Bootstrap Method

Intuition, motivation, and how the bootstrap method works

Introduction

According to Wikipedia, the bootstrap is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample, most often with the purpose of deriving robust estimates of standard errors and confidence intervals of a population parameter such as a mean, median, proportion, odds ratio, correlation coefficient, or regression coefficient.

Motivation - why we need a robust estimate of a population parameter

Intuition - How does the bootstrap method estimate a population parameter?

Figure 1

Looking at Figure 1, there are several new samples (blue boxes) derived from the original sample. If we look at how each new sample is formed, every item drawn from the sample space is returned to it before the next draw. The sample space therefore stays the same for every draw, so the same item can appear more than once in a new sample. This process is called sampling with replacement.
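To make sampling with replacement concrete, here is a minimal sketch using NumPy. The values in `original_sample` and the seed are made up for illustration and are not the data behind Figure 1.

```python
import numpy as np

# Seed chosen only so the sketch is reproducible (an assumption, not from the article).
rng = np.random.default_rng(seed=0)

# Hypothetical original sample; illustrative values only.
original_sample = np.array([2, 4, 5, 6, 8, 9])

# Draw one new sample of the same size. Because replace=True, every item
# goes back into the pool after it is drawn, so it can be picked again.
bootstrap_sample = rng.choice(
    original_sample, size=original_sample.size, replace=True
)

print("original :", original_sample)
print("resampled:", bootstrap_sample)
print("resampled mean:", bootstrap_sample.mean())
```

Running it a few times with different seeds shows that some items repeat and others are left out, which is why each new sample gives a slightly different mean.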

For each new sample that is formed, the resulting statistic is recorded. In this case, the first bootstrap sample yields a mean of 4.11 and the second a mean of 5.55. The process is repeated many times, and the collected values are used to build a robust estimate of the population parameter.

Generally, bootstrap involves the following steps:

  1. Original sample of size N: draw a sample of size N from the population.
  2. B bootstrap samples of size N: draw a random sample with replacement from the original sample, with the same size N as the original sample. Repeat this B times to obtain B bootstrap samples in total.
  3. B bootstrap estimates: compute the statistic of interest (the estimate of the population parameter) on each bootstrap sample.
  4. Further inference: use the B estimates to derive robust estimates such as the mean, standard error, and confidence interval.
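The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `bootstrap_mean`, the sample values, the choice of B = 5000, and the use of the percentile interval for the confidence interval are all choices made here, not taken from the article.

```python
import numpy as np

def bootstrap_mean(sample, n_boot=5000, alpha=0.05, seed=0):
    """Bootstrap the sample mean.

    Returns (bootstrap mean, standard error, (ci_low, ci_high)).
    The interval is a simple percentile interval (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = sample.size

    # Step 2: B bootstrap samples of size N, drawn with replacement.
    # Each row of idx holds the positions picked for one bootstrap sample.
    idx = rng.integers(0, n, size=(n_boot, n))

    # Step 3: one estimate (here the mean) per bootstrap sample.
    boot_means = sample[idx].mean(axis=1)

    # Step 4: further inference from the B estimates.
    std_error = boot_means.std(ddof=1)
    ci_low, ci_high = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return boot_means.mean(), std_error, (ci_low, ci_high)

# Step 1: a hypothetical original sample of size N = 10.
sample = [3.1, 4.7, 5.2, 2.9, 6.4, 5.5, 4.1, 3.8, 5.9, 4.4]

estimate, std_error, (ci_low, ci_high) = bootstrap_mean(sample)
print(f"mean estimate : {estimate:.3f}")
print(f"standard error: {std_error:.3f}")
print(f"95% interval  : ({ci_low:.3f}, {ci_high:.3f})")
```

Swapping `mean` for another statistic such as the median only changes the line in step 3, which is a large part of the method's appeal.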

Notes on the Bootstrap Method

  1. The bootstrap distribution usually approximates the shape, spread, and bias of the actual sampling distribution.
  2. The bootstrap process does not replace the original data or add new data. It only reuses the observations already in the sample.
  3. Bootstrapping is not reliable when:
  • The sample is so small that it does not represent the values in the population
  • The data contain many outliers
  • The data are a time series, because the bootstrap is based on the assumption that observations are independent
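As an illustration of the first note, the sketch below uses the bootstrap distribution to estimate the bias and spread of a statistic. The sample values, the seed, B = 5000, and the choice of the plug-in variance (dividing by N, which is known to be biased downward) are assumptions made for this example.

```python
import numpy as np

# Seed chosen only for reproducibility (an assumption of this sketch).
rng = np.random.default_rng(seed=1)

# Hypothetical original sample; illustrative values only.
sample = np.array([3.1, 4.7, 5.2, 2.9, 6.4, 5.5, 4.1, 3.8, 5.9, 4.4])

def statistic(x):
    # Plug-in variance (divides by N), computed along the last axis.
    return np.var(x, axis=-1)

# Estimate on the original sample.
theta_hat = statistic(sample)

# 5000 bootstrap samples of size N, one estimate per sample.
idx = rng.integers(0, sample.size, size=(5000, sample.size))
boot_stats = statistic(sample[idx])

# Bootstrap estimate of bias: average bootstrap estimate minus original estimate.
bias = boot_stats.mean() - theta_hat
# Bootstrap estimate of spread: standard deviation of the bootstrap estimates.
spread = boot_stats.std(ddof=1)

print(f"estimate on original sample: {theta_hat:.3f}")
print(f"bootstrap bias estimate    : {bias:.3f}")
print(f"bootstrap spread (std. err): {spread:.3f}")
```

The bias estimate comes out negative, which matches the known downward bias of the plug-in variance. A histogram of `boot_stats` would show the shape of the bootstrap distribution.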

Continue Learning - How the Bootstrap Method Works in Machine Learning

About Me

References

  1. Wikipedia — Bootstrapping (statistics)
  2. Widhiarso, Whayu. Introduce to the bootstrap
  3. Yen, Lorna. An Introduction to the Bootstrap Method

Data Scientist and Artificial Intelligence Enthusiast
