Population and Sample
In statistics and quantitative methodology, data sets are collected and selected from a statistical population using defined procedures. There are two different types of data sets: population and sample. When we calculate the mean deviation, variance and standard deviation, we need to know whether we are referring to the entire population or only to sample data. The population size is usually denoted by N and the sample size by n. The population variance and standard deviation divide by N, while the sample variance and standard deviation divide by n - 1 instead of n (Bessel's correction). Let us take a look at population data sets and sample data sets in detail.
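The population and sample formulas differ only in their divisor: population formulas divide by the population size N, sample formulas by n - 1. The minimal sketch below shows this with Python's standard `statistics` module, where `pvariance`/`pstdev` use the population formulas and `variance`/`stdev` use the sample formulas. The data values are made up for illustration.

```python
import statistics

# Hypothetical data, used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Library versions: p-prefixed functions treat the data as the whole population
pop_var = statistics.pvariance(data)     # divides by N
sample_var = statistics.variance(data)   # divides by n - 1
pop_sd = statistics.pstdev(data)
sample_sd = statistics.stdev(data)

# The same variances by hand, showing where N and n - 1 enter
mean = sum(data) / len(data)
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
manual_pop_var = ss / len(data)
manual_sample_var = ss / (len(data) - 1)

print(f"population variance: {pop_var:.3f}, sample variance: {sample_var:.3f}")
print(f"population SD: {pop_sd:.3f}, sample SD: {sample_sd:.3f}")
```

For these eight values the sum of squared deviations is 32, so the population variance is 32/8 = 4 while the sample variance is 32/7, about 4.57.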
A population includes all the elements of a data set, and measurable characteristics of the population, such as its mean and standard deviation, are known as parameters. For example, all the people living in India make up the population of India.
There are different types of population. They are:
- Finite Population
- Infinite Population
- Existent Population
- Hypothetical Population
Let us discuss all the types one by one.
A finite population, also known as a countable population, is one whose units can be counted. In other words, it is a population made up of a finite number of individuals or objects. For statistical analysis, a finite population is more convenient to work with than an infinite population. Examples of finite populations are the employees of a company and the potential consumers in a market.
An infinite population, also known as an uncountable population, is one in which counting the units is not possible. An example of an infinite population is the germs in a patient's body, which cannot be counted.
An existent population is a population of concrete individuals. In other words, a population whose units are available in solid form is known as an existent population. Examples are books, students, etc.
A population whose units are not available in solid form is known as a hypothetical population. A population consists of a set of observations, objects, etc. that all have something in common, and in some situations the population is only hypothetical. Examples are the outcomes of rolling a die or tossing a coin.
A sample consists of one or more observations drawn from the population, and a measurable characteristic of a sample is called a statistic. Sampling is the process of selecting a sample from the population. For example, some of the people living in India form a sample of the population of India.
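The difference between a parameter and a statistic can be sketched in a few lines. This is a minimal illustration, assuming a simulated finite population of 10,000 values (not real data): the population mean is a parameter, and the mean of a sample drawn without replacement with `random.sample` is a statistic that estimates it.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical finite population of 10,000 values (simulated, not real data)
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a characteristic of the whole population
mu = statistics.fmean(population)

# Sampling: select 100 units without replacement
sample = random.sample(population, k=100)

# Statistic: the same characteristic computed on the sample only
x_bar = statistics.fmean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

The two numbers are close but generally not identical; that gap is the sampling error the rest of this article deals with.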
There are two basic types of sampling. They are:
- Probability sampling
- Non-probability sampling
In probability sampling, population units are not selected at the discretion of the researcher. Instead, selection follows procedures which ensure that every unit of the population has a fixed, known probability of being included in the sample. Such a method is also called random sampling. Some of the techniques used for probability sampling are:
- Simple random sampling
- Cluster sampling
- Stratified sampling
- Disproportionate sampling
- Proportionate sampling
- Optimum allocation stratified sampling
- Multi-stage sampling
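Two of these techniques can be sketched in Python. The example assumes a hypothetical population of 1,000 units, each tagged with a made-up stratum label (a region). Simple random sampling gives every unit the same chance of selection; proportionate stratified sampling draws a simple random sample from each stratum, sized in proportion to that stratum's share of the population. The function names are illustrative, not from any library.

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical population of (unit_id, stratum) pairs:
# 600 "north", 300 "south", 100 "east"
population = ([(i, "north") for i in range(600)]
              + [(i, "south") for i in range(600, 900)]
              + [(i, "east") for i in range(900, 1000)])

def simple_random_sample(units, n):
    """Every unit has the same probability n / N of being selected."""
    return random.sample(units, n)

def proportionate_stratified_sample(units, n):
    """Group units by stratum, then draw a simple random sample from each
    stratum with size proportional to the stratum's share of the population."""
    strata = defaultdict(list)
    for unit in units:
        strata[unit[1]].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))  # proportional allocation
        sample.extend(random.sample(members, k))
    return sample

srs = simple_random_sample(population, 50)
strat = proportionate_stratified_sample(population, 50)
```

With n = 50 the stratified sample always contains 30 "north", 15 "south" and 5 "east" units, whereas the simple random sample only matches those proportions on average. With other sizes, rounding the per-stratum allocation can make the total differ slightly from n.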
Non-Probability Sampling
In non-probability sampling, population units are selected at the discretion of the researcher. These methods rely on human judgement to select units and have no theoretical basis for estimating the characteristics of the population. Some of the techniques used for non-probability sampling are:
- Quota sampling
- Judgement sampling
- Purposive sampling
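For contrast, here is a sketch of quota sampling, assuming a hypothetical stream of respondents that arrive in whatever order the researcher happens to meet them. Respondents are accepted until each group's quota is filled, so which units end up in the sample depends on the arrival order rather than on a random mechanism, and no selection probability can be attached to any unit. The function `quota_sample` is a hypothetical helper written for this illustration.

```python
from collections import Counter

def quota_sample(arrivals, quotas):
    """Accept respondents in arrival order until each group's quota is met.

    arrivals: iterable of (respondent_id, group) pairs, in the order encountered
    quotas:   dict mapping group -> number of respondents wanted
    """
    filled = Counter()
    sample = []
    for respondent, group in arrivals:
        if filled[group] < quotas.get(group, 0):
            sample.append((respondent, group))
            filled[group] += 1
        if all(filled[g] >= q for g, q in quotas.items()):
            break  # every quota is met, stop recruiting
    return sample

# Hypothetical arrivals at a street-corner survey
arrivals = [(1, "male"), (2, "male"), (3, "female"), (4, "male"),
            (5, "female"), (6, "female"), (7, "male")]
print(quota_sample(arrivals, {"male": 2, "female": 2}))
```

Respondent 4 is skipped only because the male quota was already full when he arrived, which is exactly the kind of non-random selection that makes the method non-probabilistic.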
Population and Sample Examples
- All the people who have ID proofs are the population, and a group of people who have only a voter ID with them is a sample.
- All the students in a class are the population, whereas the top 10 students in the class are a sample.
- All the members of parliament are the population, and the female members among them are a sample.
Populations and Samples
- Population: the entire set of things we are observing, such as humans, events or animals. It has parameters such as the mean, median, mode and standard deviation, among others.
- Sample: a random subset of the population. Samples are usually used when the population is too large to analyze as a whole. A sample does not have parameters; it has statistics.
Central Limit Theorem
The Central Limit Theorem is often described as one of the most important theorems in statistics and probability, and it is very useful for assessing real-world problems. The Central Limit Theorem states that
the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, regardless of the shape of the population you are analyzing (provided the population has a finite variance).
As we've seen, you take a sample to estimate the parameters of the whole population. However, a single sample does not always give you an accurate estimate of the population's true parameters.
Instead of taking a single sample, what if we take several samples from our population? For each sample, we calculate the mean. In the end, we have several estimates of the mean, which we can plot on a chart.
This is called the sampling distribution of the sample mean.
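The procedure just described translates directly into a simulation. This is a minimal sketch, assuming a simulated population of heights rather than real data: draw many samples of size n, record each sample's mean, and the resulting list of means is an empirical sampling distribution of the sample mean. The function name `sampling_distribution` is illustrative.

```python
import random
import statistics

def sampling_distribution(population, n, num_samples, rng=random):
    """Return the means of num_samples random samples of size n,
    each drawn without replacement from population."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]

rng = random.Random(1)
# Hypothetical population of 50,000 heights in cm (simulated, not real data)
population = [rng.gauss(175, 7) for _ in range(50_000)]

means = sampling_distribution(population, n=30, num_samples=2_000, rng=rng)

# A crude text histogram of the sample means, in 1 cm bins
bins = {}
for m in means:
    b = int(m)  # truncates to the whole centimetre (values are positive)
    bins[b] = bins.get(b, 0) + 1
for b in sorted(bins):
    print(f"{b} cm | {'#' * (bins[b] // 20)}")
```

The histogram piles up around the population mean, with far fewer sample means in the outer bins.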
Central Limit Theorem — Intuition
Let's learn by looking at an example. Imagine we wanted to see the distribution of the heights of all males in the Portuguese population.
First, we take several samples (heights of different men) from our population and calculate the mean of each sample group. For example, we might get groups with a mean height of 176 cm, others with 182 cm, others with 172 cm, and so on. We then plot the distribution of these sample means. The following picture depicts the distribution of our several samples, with the x mark in each referring to its mean value.
You can see that although some sample means (red X marks) might lie at the extremes of the overall distribution, most of them tend to be closer to the center.
In the end, the distribution of all these sample means will be approximately normal. Take a look at the last chart, which shows the distribution of all the sample means.
With only 5 samples you can already see that most of the means tend to concentrate towards the center of the sampling distribution. And remarkably, in the end, the mean of the sampling distribution will match the mean of the original population distribution.
So there are three key results from the Central Limit Theorem:
- The sampling distribution of the mean will be normal or close to normal, and closer to normal as the sample size grows;
- The mean of the sampling distribution will be equal to the population mean;
- The standard error of the sampling distribution is directly linked to the standard deviation of the original population: $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the number of values you take for each sample.
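A quick numerical check of these results, again on a simulated population (an assumption, not real data): the mean of the sample means should sit close to the population mean, and their standard deviation should be close to the predicted standard error σ/√n. The sketch draws with replacement (`random.Random.choices`) so each sample consists of independent draws and σ/√n applies without a finite-population correction.

```python
import math
import random
import statistics

rng = random.Random(7)

# Hypothetical population (simulated): 20,000 values
population = [rng.gauss(50, 12) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

results = {}
for n in (5, 20, 80):
    # 5,000 samples of size n, drawn with replacement; keep each sample's mean
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(5_000)]
    mean_of_means = statistics.fmean(means)
    observed_se = statistics.pstdev(means)   # spread of the sampling distribution
    predicted_se = sigma / math.sqrt(n)      # CLT prediction: sigma / sqrt(n)
    results[n] = (mean_of_means, observed_se, predicted_se)
    print(f"n={n:3d}  mean of means={mean_of_means:6.2f} (mu={mu:.2f})  "
          f"SE observed={observed_se:.3f}  predicted={predicted_se:.3f}")
```

Quadrupling the sample size from 5 to 20, and again from 20 to 80, should roughly halve the standard error each time, since it scales with 1/√n.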
Let's look at a more visual example with this GIF. Here the population already has a normal distribution, but later you can try other shapes and even draw your own. We start by taking samples of size n = 5 from the population and calculating the mean of each. As the number of samples of size n = 5 increases, the distribution of the means starts to take the shape of a normal distribution. When we repeat the process thousands of times, we get a normal distribution with a mean equal to the population's mean. Taking more samples makes this shape smoother, while a larger sample size n makes the normal distribution narrower, because its standard error is σ/√n.
Now try it yourself! On this website you can literally draw your own distribution with your mouse, like the one you see here.
In summary, if we take many samples, calculate the mean of each one, and plot those means as a sampling distribution, we obtain an approximately normal distribution centered on the population mean.
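The same effect can be checked in code for a population that is far from normal. This sketch assumes a simulated, strongly right-skewed exponential population and measures the skewness of the sample means (skewness is 0 for a symmetric distribution such as the normal). As n grows, the skewness of the sample means should shrink toward 0, in line with the Central Limit Theorem. The `skewness` helper is written for this illustration and computes the moment skewness (the average cubed z-score).

```python
import random
import statistics

def skewness(values):
    """Moment skewness: the average cubed z-score of the values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(3)
# Hypothetical, strongly right-skewed population (exponential, mean 1)
population = [rng.expovariate(1.0) for _ in range(100_000)]

skew_by_n = {}
for n in (1, 5, 30):
    # 10,000 samples of size n, drawn with replacement; keep each sample's mean
    means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(10_000)]
    skew_by_n[n] = skewness(means)
    print(f"n={n:2d}  skewness of sample means = {skew_by_n[n]:.2f}")
```

For an exponential population the theoretical skewness of the mean of n independent draws is 2/√n, so the printed values should land roughly near 2, 0.89 and 0.37.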