Resampling and Simulation

233 views, uploaded 2020-08-18 09:15:02

The following article is from 神经语用学 (Neuropragmatics)


Resampling

01



In statistics, resampling is any of a variety of methods for doing one of the following:

1. Estimating the precision of sample statistics (medians, variances, percentiles) by using subsets of available data (jackknifing) or drawing randomly with replacement from a set of data points (bootstrapping)

2. Exchanging labels on data points when performing significance tests (permutation tests, also called exact tests, randomization tests, or re-randomization tests)

3. Validating models by using random subsets (bootstrapping, cross validation)
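As an illustration of item 2, a permutation test can be sketched in a few lines. This is a minimal sketch, not a reference implementation: it uses Python's standard library rather than R, the function name `permutation_test` is hypothetical, and the two groups of measurements are invented purely for illustration.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration: labels are exchanged by shuffling
    the pooled data and splitting it back into two groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # exchange the group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Invented measurements, not real data.
group_a = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 5.4, 5.8]
group_b = [4.2, 4.9, 4.4, 5.0, 4.1, 4.7, 4.6, 4.3]
p_value = permutation_test(group_a, group_b)
print(f"permutation p-value: {p_value:.4f}")
```

The p-value is the share of label shufflings that produce a difference at least as large as the observed one, so no distributional assumption is needed.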

“Offered the choice between mastery of a five-foot shelf of analytical statistics books and middling ability at performing statistical Monte Carlo simulations, we would surely choose to have the latter skill.”



Monte Carlo Simulation

02



The concept of Monte Carlo simulation was devised by the mathematicians Stan Ulam and Nicholas Metropolis, who were working to develop an atomic weapon as part of the Manhattan Project. They needed to compute the average distance that a neutron would travel in a substance before it collided with an atomic nucleus, but they could not compute this using standard mathematics.

“Ulam realized that these computations could be simulated using random numbers, just like a casino game. His uncle had gambled at Monte Carlo, which is apparently where the name came from for their new technique.”

Four steps to performing a Monte Carlo simulation:

1. Define a domain of possible values.

2. Generate random numbers within that domain from a probability distribution.

3. Perform a computation using the random numbers.

4. Combine the results across many repetitions.
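The four steps can be sketched on a small casino-style problem: estimating the probability that two fair dice sum to 7. The problem and the function name `estimate_probability_of_seven` are chosen here for illustration and are not from the article. The sketch uses Python's standard library.

```python
import random


def estimate_probability_of_seven(n_repetitions=100_000, seed=1):
    """Monte Carlo estimate of P(two fair dice sum to 7)."""
    rng = random.Random(seed)
    faces = [1, 2, 3, 4, 5, 6]  # step 1: the domain of possible values
    hits = 0
    for _ in range(n_repetitions):
        # Step 2: draw random values from a (discrete uniform) distribution.
        first = rng.choice(faces)
        second = rng.choice(faces)
        # Step 3: perform a computation with the random values.
        if first + second == 7:
            hits += 1
    # Step 4: combine the results across all repetitions.
    return hits / n_repetitions


estimate = estimate_probability_of_seven()
print(f"estimated probability: {estimate:.4f} (exact value is 1/6)")
```

Because the exact answer is known here, the estimate can be checked against it. In problems like the neutron calculation, the simulation is the only practical route.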



Randomness in Statistics

03



In statistics, random means unpredictable. But unpredictable does not mean “not deterministic”: a process can follow fixed rules and still be unpredictable to an observer who cannot see all of its inputs.

People have a fairly poor sense of randomness:

  • We tend to see patterns when they don’t exist (“pareidolia”).

  • People tend to think of random processes as self-correcting (the gambler’s fallacy).
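The second point can be checked by simulation. The sketch below, in Python's standard library, flips a simulated fair coin many times and looks only at flips that come right after a run of heads. If the process were self-correcting, tails would be more common there. The function name and the streak length of 3 are illustrative choices.

```python
import random


def heads_rate_after_streak(n_flips=200_000, streak_length=3, seed=2):
    """Fraction of heads among flips that directly follow a run of heads."""
    rng = random.Random(seed)
    followers = []
    run = 0  # consecutive heads seen immediately before the current flip
    for _ in range(n_flips):
        is_heads = rng.random() < 0.5
        if run >= streak_length:
            followers.append(is_heads)
        run = run + 1 if is_heads else 0
    return sum(followers) / len(followers)


rate = heads_rate_after_streak()
print(f"P(heads | previous 3 flips were heads) is about {rate:.3f}")
```

The rate stays close to 0.5, because each flip is independent of the flips before it.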


Generating Random Numbers

04



A truly random number can only be generated through a physical process. In R, a computer algorithm generates pseudo-random numbers instead.

R has a function for generating random numbers from each of the major probability distributions:

  • runif(): uniform distribution

  • rnorm(): normal distribution

  • rbinom(): binomial distribution
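For readers who do not work in R, rough counterparts exist in Python's standard `random` module. The sketch below is an illustration under that substitution, not the article's own code. The helper name `draw_samples` is hypothetical, and the binomial draw is built by counting successes in repeated Bernoulli trials. It also shows what "pseudo-random" means in practice: the same seed reproduces the same numbers.

```python
import random


def draw_samples(seed, n=5):
    """Draw n values from each of three distributions using a seeded generator."""
    rng = random.Random(seed)
    # Uniform on [0, 1], comparable to runif(n) with its default range.
    uniform = [rng.uniform(0.0, 1.0) for _ in range(n)]
    # Normal with mean 0 and sd 1, comparable to rnorm(n) with its defaults.
    normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Binomial(size=10, prob=0.5), comparable to rbinom(n, 10, 0.5):
    # count the successes in 10 independent Bernoulli trials.
    binomial = [sum(rng.random() < 0.5 for _ in range(10)) for _ in range(n)]
    return uniform, normal, binomial


first_run = draw_samples(seed=42)
second_run = draw_samples(seed=42)
print("same seed, same numbers:", first_run == second_run)
print("uniform draws:", [round(x, 3) for x in first_run[0]])
```

Fixing the seed makes a simulation reproducible, which is why simulation scripts usually set one at the top.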


Using Monte Carlo Simulation

05



We want to know how much time to allow for an in-class quiz. Completion times are normally distributed, with a mean of 5 minutes and a standard deviation of 1 minute. We want the allotted time to be long enough for everyone to finish the quiz 99% of the time.

  • Using mathematical theory: the statistics of extreme values

  • Using Monte Carlo simulation in R
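A minimal sketch of the simulation follows, written in Python's standard library rather than R. The class size of 150 and the 5,000 repetitions are assumptions made here, since the text gives neither. Each repetition simulates one class and records the slowest finisher, and the answer is the 99th percentile of those maxima. For comparison, the sketch also computes the exact cutoff, which holds if students finish independently.

```python
import random
from statistics import NormalDist, quantiles

MEAN_MINUTES = 5.0     # from the text
SD_MINUTES = 1.0       # from the text
N_STUDENTS = 150       # ASSUMPTION: class size is not stated in the text
N_SIMULATIONS = 5_000  # ASSUMPTION: number of simulated classes


def simulate_slowest_finishers(n_students, n_simulations, seed=3):
    """Return the longest completion time in each simulated class."""
    rng = random.Random(seed)
    return [
        max(rng.gauss(MEAN_MINUTES, SD_MINUTES) for _ in range(n_students))
        for _ in range(n_simulations)
    ]


slowest = simulate_slowest_finishers(N_STUDENTS, N_SIMULATIONS)
# quantiles(..., n=100) returns 99 cut points; index 98 is the 99th percentile.
simulated_cutoff = quantiles(slowest, n=100)[98]

# Exact check under independence:
# P(all finish by t) = P(one student finishes by t) ** N_STUDENTS = 0.99
per_student_probability = 0.99 ** (1 / N_STUDENTS)
analytic_cutoff = NormalDist(MEAN_MINUTES, SD_MINUTES).inv_cdf(per_student_probability)

print(f"simulated cutoff: {simulated_cutoff:.2f} min")
print(f"analytic cutoff:  {analytic_cutoff:.2f} min")
```

The two numbers should agree closely. The simulation has the advantage that it still works if the completion times follow a distribution with no convenient closed form.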


Using Simulation for Statistics

06



If we can’t assume that the estimates are normally distributed, or we don’t know their distribution, we can use the data themselves to estimate an answer. This is the bootstrap.

The idea behind the bootstrap is that we repeatedly sample from the actual dataset. Importantly, we sample with replacement, so the same data point will often appear multiple times within a single sample. We then compute the statistic of interest on each resample and use the spread of those values to estimate its uncertainty.
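A minimal sketch of the procedure, using Python's standard library, estimates the standard error of a sample median. The data are synthetic and deliberately skewed, generated here only so the example runs. The function name `bootstrap_standard_error` and the choice of 2,000 resamples are illustrative assumptions.

```python
import random
from statistics import median, stdev


def bootstrap_standard_error(data, statistic, n_resamples=2_000, seed=4):
    """Estimate the standard error of `statistic` by resampling `data`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Sample with replacement: same size as the data, duplicates allowed.
        resample = rng.choices(data, k=len(data))
        estimates.append(statistic(resample))
    # The spread of the resampled statistics approximates the standard error.
    return stdev(estimates)


# Synthetic skewed sample (exponential), not real measurements.
data_rng = random.Random(5)
sample = [data_rng.expovariate(1.0) for _ in range(100)]

se_median = bootstrap_standard_error(sample, median)
print(f"sample median: {median(sample):.3f}")
print(f"bootstrap standard error of the median: {se_median:.3f}")
```

Passing the statistic as a function means the same routine works for a trimmed mean, a percentile, or any other estimator.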

Bootstrap: Discussion

07



Advantages:

  • simplicity;

  • a straightforward way to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution;

  • a convenient way to avoid the cost of repeating experiments.


We would not usually employ the bootstrap to compute confidence intervals for the mean (since we can generally assume that the normal distribution is appropriate for the sampling distribution of the mean, as long as our sample is large enough), but this example shows how the method gives us roughly the same result as the standard method based on the normal distribution. The bootstrap would more often be used to generate standard errors for estimates of other statistics where we know or suspect that the normal distribution is not appropriate.
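The example this paragraph refers to is not reproduced in the text. A minimal sketch of that kind of comparison follows, in Python's standard library and on synthetic data invented here. It computes a 95% confidence interval for the mean twice: once from the normal approximation and once from the percentiles of bootstrap means. The function names, sample size and number of resamples are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import fmean, quantiles, stdev


def normal_ci(data, z=1.96):
    """95% confidence interval for the mean via the normal approximation."""
    center = fmean(data)
    half_width = z * stdev(data) / sqrt(len(data))
    return center - half_width, center + half_width


def bootstrap_percentile_ci(data, n_resamples=5_000, seed=6):
    """95% percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = [fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples)]
    # n=40 gives cut points every 2.5%; the first and last bound the middle 95%.
    cuts = quantiles(means, n=40)
    return cuts[0], cuts[-1]


# Synthetic sample, not real measurements.
data_rng = random.Random(7)
sample = [data_rng.gauss(5.0, 1.0) for _ in range(100)]

normal_low, normal_high = normal_ci(sample)
boot_low, boot_high = bootstrap_percentile_ci(sample)
print(f"normal-theory CI: ({normal_low:.3f}, {normal_high:.3f})")
print(f"bootstrap CI:     ({boot_low:.3f}, {boot_high:.3f})")
```

For the mean of a reasonably large sample, the two intervals should nearly coincide. The bootstrap version becomes useful when `fmean` is replaced by a statistic with no simple standard-error formula.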




