profitsbas.blogg.se - Stata bootstrap

Ordered probit, Self reported health status
Conditional Poisson and NB fixed-effect models
Odds/ratio in logit smoking/low birth weig
Probit/logit, Evans, Farrelly, Montgomery, AER

Programs and data sets to accompany the tutorial
Cps87.csv the same data set but in csv (comma delimited format)
Cps87.do STATA program that generates all the results in the tutorial

The examples have been executed on a Dell Vostro 3300 notebook running Ubuntu 14. Please notice that the only examples actually designed to show potential speed gains are parfor and bootstrap.

#STATA BOOTSTRAP PDF#

Introduction to STATA tutorial (in pdf format)

#STATA BOOTSTRAP DOWNLOAD#

Below is an introduction to STATA that accompanies this class. The tutorial is designed to be interactive: you type along with the worksheet. Please download and print the tutorial, find a computer with STATA, then start.

rsample2 <- function(data=tdt, id.unit=id.u, id.cluster=id.

#STATA BOOTSTRAP CODE#

The answer seems to be that the resampling process needs to take account of the structure of the data. There is a nice explanation here (along with some R code to implement this). Thanks to the pointer from the UCLA Statistical Consulting Group. I have written a speedier (but less flexible) version of the code snippet linked to above; check here for updates and details.

#STATA BOOTSTRAP SERIES#

In practice there is no one-size-fits-all solution, but with complicated data structures you should choose the bootstrap sampling scheme that best fits your data and your problem and, if possible, use a simulation study to compare different solutions.

Davison, A.C.
Ren, S., Lai, H., Tong, W., Aminzadeh, M., Hou, X., & Lai, S. Nonparametric bootstrapping for hierarchical data.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(3), 369-390.

Since there is only one class per school, the second level is nonexistent in your data. We can assume that there are some similarities within schools and differences between schools. If there are similarities within schools, then sampling pupils at random without regard to their school membership could destroy the hierarchical structure of your data.

The options are: (1) sample individual students with replacement, ignoring schools; (2) sample whole schools with replacement; (3) first sample schools with replacement and then sample students (a) with replacement, or (b) without replacement.

It appears that the first approach is the worst one. Recall that bootstrap sampling should imitate the sampling process in your study, and you were sampling schools rather than individual students. Choosing between (2) and (3) is more complicated, but hopefully you can find research papers considering this topic (e.g. Ren et al., "Nonparametric bootstrapping for hierarchical data"). Generally options (2) or (3b) are preferable, as it seems that including too many levels of sampling with replacement leads to biased results. You can find more information about this topic in the books by Efron and Tibshirani (1994) and Davison and Hinkley (1997).

Notice that we have a similar problem with bootstrapping time-series data, and in that case we also sample whole blocks of the series (e.g. a whole season if we assume seasonality) rather than individual observations, because otherwise the time structure would be destroyed.
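The resampling options for the school example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code the answer links to: option (1) is read here as resampling individual students, option (2) as resampling whole schools, and options (3a)/(3b) as resampling schools and then students with or without replacement. The function name, the `(school, score)` row layout and the toy data are all hypothetical.

```python
import random
from collections import defaultdict


def resample_hierarchical(rows, scheme, rng):
    """Draw one bootstrap sample from a list of (school, score) rows.

    scheme "1":  resample students, ignoring school membership
    scheme "2":  resample whole schools with replacement
    scheme "3a": resample schools, then students with replacement
    scheme "3b": resample schools, then students without replacement
    """
    if scheme not in ("1", "2", "3a", "3b"):
        raise ValueError(f"unknown scheme: {scheme}")

    if scheme == "1":
        # Every student is drawn independently; the clustering is lost.
        return [rng.choice(rows) for _ in rows]

    by_school = defaultdict(list)
    for school, score in rows:
        by_school[school].append(score)
    schools = list(by_school)

    sample = []
    for new_id in range(len(schools)):
        scores = by_school[rng.choice(schools)]
        if scheme == "3a":
            scores = [rng.choice(scores) for _ in scores]
        elif scheme == "3b":
            # Drawing all students without replacement only reorders them,
            # so this matches scheme "2" up to ordering.
            scores = rng.sample(scores, len(scores))
        # Relabel so that a school drawn twice counts as two clusters.
        sample.extend((new_id, s) for s in scores)
    return sample


# Hypothetical toy data: three schools, one class each.
rows = [("A", 61), ("A", 64), ("A", 70),
        ("B", 48), ("B", 52),
        ("C", 80), ("C", 83), ("C", 85)]

rng = random.Random(1)
for scheme in ("1", "2", "3a", "3b"):
    print(scheme, resample_hierarchical(rows, scheme, rng))
```

Relabelling the drawn schools with a fresh id is a design choice in this sketch: without it, a school drawn twice would be merged into a single, larger cluster when a multilevel model is refitted on the resampled data.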

#STATA BOOTSTRAP HOW TO#

I have a survival model with patients nested in hospitals that includes a random effect for the hospitals. The random effect is gamma-distributed, and I am trying to report the 'relevance' of this term on a scale that is easily understood. I have found the following references which use the Median Hazard Ratio (a bit like the Median Odds Ratio), and calculated this.

Bengtsson T, Dribe M: Historical Methods 43:15, 2010

However, now I wish to report the uncertainty associated with this estimate using the bootstrap. The data is survival data, and hence there are multiple observations per patient, and multiple patients per hospital. It seems obvious that I need to cluster the patient observations when re-sampling. But I don't know if I should cluster the hospitals too (i.e. resample hospitals, rather than patients). I am wondering if the answer depends on the parameter of interest, and so would be different if the target was something that was relevant at the patient level rather than the hospital level? I have listed the Stata code below in case that helps.

    local mhr = exp(sqrt(2*e(theta))*invF(`twoinvtheta2',`twoinvtheta2',0.75))
    bootstrap r(mhr), reps(50) cluster(hospital): est_mhr

Imagine that you conducted a study about children's educational achievement. You took a random sample of schools from some area, and from each school one class was included in the study. You conducted the analysis and now want to use the bootstrap to obtain confidence intervals for your estimates. How to do it?

First, notice that your data is hierarchical; it has several levels: schools, classes within schools, and students within classes.
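For the hospital question, the `cluster(hospital)` option in the Stata call above asks for whole hospitals, rather than individual records, to be resampled. A rough Python sketch of that idea with a percentile interval follows. The statistic used here (the spread of hospital means) is only a hypothetical stand-in for the Median Hazard Ratio, whose full definition is not shown in the post, and the function names and toy data are likewise assumptions.

```python
import random
import statistics
from collections import defaultdict


def cluster_bootstrap_ci(rows, statistic, reps=200, alpha=0.05, seed=0):
    """Simple percentile interval for statistic(rows), resampling clusters.

    rows is a list of (cluster_id, value) pairs; statistic takes such a list.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for cluster, value in rows:
        by_cluster[cluster].append(value)
    clusters = list(by_cluster)

    estimates = []
    for _ in range(reps):
        sample = []
        for new_id in range(len(clusters)):
            chosen = rng.choice(clusters)  # draw a hospital with replacement
            # Keep all of its records together, under a fresh cluster id.
            sample.extend((new_id, v) for v in by_cluster[chosen])
        estimates.append(statistic(sample))

    estimates.sort()
    lower = estimates[int((alpha / 2) * reps)]
    upper = estimates[min(reps - 1, int((1 - alpha / 2) * reps))]
    return lower, upper


def sd_of_cluster_means(rows):
    """Hospital-level summary: population SD of the per-hospital means."""
    by_cluster = defaultdict(list)
    for cluster, value in rows:
        by_cluster[cluster].append(value)
    means = [statistics.mean(values) for values in by_cluster.values()]
    return statistics.pstdev(means)


# Hypothetical toy data: (hospital, outcome) pairs.
rows = [("h1", 2.1), ("h1", 2.5), ("h1", 1.9),
        ("h2", 3.4), ("h2", 3.0),
        ("h3", 1.2), ("h3", 1.6), ("h3", 1.1),
        ("h4", 2.8), ("h4", 2.6)]

print(cluster_bootstrap_ci(rows, sd_of_cluster_means, reps=200, seed=1))
```

The sketch gives each drawn hospital a fresh id so that a hospital drawn twice is treated as two clusters. In Stata, the `idcluster()` option of `bootstrap` plays a similar relabelling role; whether it is needed here depends on how `est_mhr` uses the hospital identifier, which the post does not show.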