Population estimates from samples
gbnewby at pglaf.org
Mon May 31 08:51:59 PDT 2021
Karl had been asking about how to estimate population statistics from a sample.
I came across a fascinating approach to this, called bootstrapping or the bootstrap method. I learned about it in an edX data science course offered by Berkeley.
The method is described in the course's free online textbook:
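This method makes some assumptions, chiefly that your sample is reasonably large and was drawn at random, so that it resembles the population. It does not require the population itself to be normally distributed, although percentile-based intervals work best when the distribution of the resampled statistic is roughly bell-shaped.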
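The basic approach is to take your sample and then repeatedly re-sample from it at random, with replacement, drawing as many items each time as the original sample contains. Computing the statistic of interest on each re-sample builds up an empirical distribution of that statistic, which approximates how the estimate would vary across samples drawn from the population.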
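To make the re-sampling idea concrete, here is a minimal sketch in plain Python (standard library only, not the course's own framework). The function name bootstrap_interval, its parameters, and the synthetic exponential data are all my own assumptions for illustration; the interval is the simple percentile interval, which is only one of several ways to turn bootstrap estimates into a range.

```python
import random
import statistics


def bootstrap_interval(sample, statistic=statistics.median,
                       n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for `statistic` (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        # Draw n items from the sample WITH replacement.
        resample = rng.choices(sample, k=n)
        estimates.append(statistic(resample))
    estimates.sort()
    # Cut off an equal share of the estimates from each tail.
    k = round((1 - level) / 2 * n_resamples)
    lower = estimates[k]
    upper = estimates[n_resamples - 1 - k]
    return lower, upper, estimates


# Synthetic, deliberately skewed data standing in for a real sample
# (an assumption for the demo, not data from the course).
data_rng = random.Random(42)
sample = [data_rng.expovariate(0.1) for _ in range(500)]

lower, upper, estimates = bootstrap_interval(sample)
print("sample median:", statistics.median(sample))
print("approx. 95% interval for the population median:", lower, upper)
```

Any other statistic (mean, a percentile, and so on) can be passed in place of the median. The seed is fixed only so that repeated runs give the same numbers.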
The text includes some worked examples. The course uses Python (Jupyter notebooks) and a computational framework based on Pandas.
The course sequence: https://www.edx.org/professional-certificate/berkeleyx-foundations-of-data-science
It's the second course, "Inferential thinking through simulations," which introduces and builds on the bootstrap concept. Lots of the materials are free - including the computational framework and examples. I'm not sure whether you can audit the (self-paced) course for free.
I found the bootstrap method to be very interesting. It is not something that came up in my many grad and undergrad statistics courses or other research methods (maybe it emerged since I was a university student).
I hope this helps.