Sampling error and how many responses to collect

Why do quantitative research?

We need quantitative research to obtain reasonably accurate data that can be used for analysis and data-driven decisions. It helps us find out how many people face a certain problem, or how common certain preferences, interests, pains, needs or fears are within a selected audience segment. We need such data when we want to draw conclusions based on facts rather than guesswork.

Quantitative research helps test hypotheses and predict what might happen in the future. If a company thinks that improving a feature of a product (or developing a new feature) will make it more popular, then quantitative research can either confirm this idea or show that this is not the case. This allows you to minimize risks and allocate resources more wisely, without wasting time and money on things that will not benefit the product.

In other words, data obtained from quantitative research can be analyzed statistically to answer questions like these in advance:

  • What percentage of the target audience is interested in the product, and what factors influence their choice?

  • Is it worth launching a new product, given the number of people interested in it?

  • Is there sufficient interest in the product in a new GEO (region) to minimize risks when entering new markets?

  • What percentage of the audience prefers competitors' products, and for what reasons? This helps you understand weaknesses and improve the product or service.

  • How in demand are certain product features among the target audience? For example, 70% of respondents may point to the need for a new feature, or to the complete uselessness of an existing one.

Quantitative research should be carried out when you need to collect information from a large group of people, obtain representative, statistically sound data and draw conclusions about the entire target audience (or a specific segment or persona).

For the results to be reliable enough to trust and apply, it is important to understand how many respondents need to be surveyed and to estimate the sampling error, which indicates how far the sample results may deviate from the true values for the whole audience.

What is sampling error?

Sampling error is an indicator that reflects how far sample data deviate from the data for the whole population.

Now in simple words: sampling error shows how the results of a survey of a small group of people may differ from the actual opinions of the entire target audience.

Even simpler and with an example: let's say the entire audience of a site or application is 10,000 users (registered and with a paid subscription), and you surveyed 500 of them. Sampling error indicates how far the results of surveying these 500 people may differ from what the entire audience of 10,000 users thinks.

Sampling error arises due to the fact that we cannot survey absolutely everyone, but collect information only from a certain part (sample) of the entire target audience.

For example, if we want to find out the opinion of all users of a social network, but conduct a survey only among a small group of users, then the results obtained will not completely coincide with the opinion of the entire audience. This difference is the sampling error. The larger the sample size, the smaller its error.

How is sampling error calculated?

There are formulas and definitions ahead, but it is better to get the theory straight right away. Below I explain everything in plain words and give real examples.

Formula for calculating sampling error:

$$E = Z \times \sqrt{\frac{p \times q}{n}}$$

Where:

  • E – sampling error (the error we want to calculate).

  • Z – coefficient associated with the confidence level. For example, for a confidence level of 95%, Z equals 1.96.

  • p – the proportion that expresses the probability that a respondent will choose a specific answer option. For example, p = 0.5 is used for maximum uncertainty.

  • n – sample size, or number of respondents.

  • q = 1 – p. This simplifies the notation and makes the formula more convenient to use.
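The definitions above translate almost directly into code. Below is a minimal Python sketch, assuming the standard form E = Z × sqrt(p × q / n); the function name `margin_of_error` and its default arguments are my own choices for illustration, not part of any library.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Sampling error E = Z * sqrt(p * q / n), with q = 1 - p."""
    q = 1 - p
    return z * math.sqrt(p * q / n)

# 500 respondents, maximum uncertainty (p = 0.5), 95% confidence level
print(f"{margin_of_error(500):.1%}")  # prints 4.4%
```

The defaults correspond to the most common setup (95% confidence level, maximum uncertainty), so in everyday use only the sample size has to be passed in.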

Before moving on, let's go through each of these terms right away. If the theory is already crystal clear to you, skip straight to the next part.

E – sampling error

Sampling error measures how much survey results may differ from the actual values for the entire audience. It arises because we study not the entire audience but only a part of it (a sample). The smaller E is, the more accurate our results are considered.

Let's say we want to find out what percentage of the aforementioned app/site subscribers prefer a dark color theme.
We have 10,000 subscribers in the application, but we surveyed a random 500 of them. After analysis, we get the result: 60% of respondents prefer a dark color theme.

But since we surveyed only a small part of the audience, there may be an error, for example ±5%. This means that the actual percentage of people who prefer a dark color theme lies somewhere between 55% and 65%. E, or sampling error, shows this possible range of deviation. It explains how our results may differ from the actual situation on the site.


Z – coefficient associated with the confidence level

This coefficient is determined based on the normal distribution (Gaussian distribution).

[Figure: bell curve]

Imagine that you measure the heights of people in a large group and plot the results; the graph resembles a bell (this is called a normal distribution). In the middle of the graph is the average height. Most people have a height close to this average, and the further from the average, the fewer people there are with that height. The two central zones of the curve, 34.1% each, cover the range of the average plus or minus one standard deviation, which is where most people (about 68%) fall.

Now, when we talk about the Z coefficient, we are trying to understand how wide a range around the average we need to take to cover most of the values. For example, if you want 95% of all values to fall within your range (say, the heights of 95% of all people), you need to stretch this range to 1.96 standard deviations on each side of the average. That number, 1.96, is the value of Z.

That is, the range is the area on the graph that covers the mean plus or minus a certain number of standard deviations. The value of Z (for example, 1.96 for a confidence level of 95%) tells us how wide this range must be to cover 95% of all possible values. This gives us confidence that our study results fall within this range in 95% of cases.

If it's still not clear, just accept that in most cases you can leave this coefficient at 1.96. And if you want to dig deeper into mathematical statistics, watch a couple of videos on YouTube. That's not what this article is about.
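If you ever need a confidence level other than 95%, the coefficient can be looked up from the normal distribution rather than memorized. Here is a small sketch using Python's standard `statistics.NormalDist`; the helper name `z_for_confidence` is hypothetical and introduced only for this example.

```python
from statistics import NormalDist

def z_for_confidence(confidence):
    """Two-sided Z coefficient for a confidence level such as 0.95."""
    # Split the leftover probability equally between the two tails.
    tail = (1 - confidence) / 2
    # inv_cdf returns the point below which the given share of values lies.
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```

For 90%, 95% and 99% this gives roughly 1.645, 1.96 and 2.576: the more confidence you want, the wider the range has to be.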


p – sample fraction or sample proportion

p is a number that shows how likely it is that a respondent will choose a certain answer option. For example, p = 0.5 means that the probability is 50%, that is, the chance that a person will choose a specific answer is the same as the chance that they will not choose it (i.e., will choose any other). This is used when we know nothing in advance and assume that both outcomes (will choose or will not choose) are equally likely. This is called maximum uncertainty.

If p = 0.9, the probability that the respondent will choose a specific answer option (any, but specific) is 90%. This is a fairly high probability, so the answers are more predictable. The remaining 10% (because 1 – p = 0.1) is the probability that the respondent will choose any other option.

At p = 0.9 the sampling error will be smaller than at p = 0.5 with the same sample size. The reason is that a high probability (e.g. 90%) means that most respondents will choose the same option, so there is less variation in the data.

Therefore, when we do not know in advance how respondents will choose, we take p = 0.5, the case of maximum uncertainty.
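To see why p = 0.5 is the worst case, it helps to look at the product p × q that sits under the square root. A short derivation, assuming the usual formula E = Z × sqrt(p × q / n) with q = 1 – p:

$$p q = p(1 - p) = p - p^2, \qquad \frac{d}{dp}\left(p - p^2\right) = 1 - 2p = 0 \;\Rightarrow\; p = 0.5, \quad p q = 0.25$$

$$\frac{E_{p=0.9}}{E_{p=0.5}} = \sqrt{\frac{0.9 \times 0.1}{0.5 \times 0.5}} = \sqrt{0.36} = 0.6$$

So with the same n and Z, the error at p = 0.9 is 60% of the error at p = 0.5, and no value of p gives a larger error than p = 0.5.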


n – sample size

n is the sample size, or the number of people (respondents) you survey in your study. It shows how many people you have selected from the entire target audience to obtain results from which you can draw conclusions about the entire audience.

For example, if you want to find out what the users of an app with 10,000 subscribers think, surveying all 10,000 is not always possible. Instead you choose a certain number of people, for example 500, and these 500 people make up your sample, so n = 500.


Examples of error calculations

Example 1

  • Confidence level – 95% (Z = 1.96);

  • Proportion p – 0.5 (maximum uncertainty);

  • Sample size n – 400.

The formula will look like this:

$$E = 1.96 \times \sqrt{\frac{0.5 \times 0.5}{400}} = 1.96 \times \sqrt{0.000625} = 1.96 \times 0.025 = 0.049$$

In this case, the sampling error is about 4.9%.


Example 2

With the same confidence level and proportion but a sample of 1,000 respondents, the formula will look like this:

$$E = 1.96 \times \sqrt{\frac{0.5 \times 0.5}{1000}} \approx 0.031$$

The sampling error in this case is about 3.1%.


Relationship between sample size, audience and margin of error

Taking the size of the audience into account

To take the size of the audience (the general population) into account, the formula for the sampling error is extended with a finite population correction. This is especially important when the sample is a significant proportion of the entire audience, for example when studying a small group of people.

The finite population correction is applied as follows:

$$E = Z \times \sqrt{\frac{p(1 - p)}{n}} \times \sqrt{\frac{N - n}{N - 1}}$$

Where:

  • E – sampling error.

  • Z – coefficient associated with the confidence level (for example, 1.96 for 95%).

  • p – proportion (probability of choosing a specific answer option).

  • n – sample size.

  • N – size of the audience (general population).

Why is the correction not so important for a large audience?

When the audience is huge, for example 1 million people, a survey of 400 people still provides representative data. The error depends mainly on the sample size itself rather than on the share of the audience surveyed, and it shrinks ever more slowly as the sample grows (500, 600 or 700 people instead of 400 improve accuracy only slightly). In other words, when the audience is very large, even a small sample reflects the distribution of opinions and preferences in the group fairly well, and each further increase in the sample gives a smaller gain in accuracy. At the same time, when the sample is a tiny fraction of the audience, the correction factor is almost exactly 1. This is why, with large audiences, the finite population correction has practically no effect.
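A quick way to check this is to compute the correction factor on its own for a fixed sample and several audience sizes. The sketch below assumes the common form of the correction, sqrt((N – n) / (N – 1)); the function name `fpc_factor` is made up for this illustration.

```python
import math

def fpc_factor(n, N):
    """Finite population correction: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

n = 400  # same sample size for every audience
for N in (1_000, 10_000, 100_000, 1_000_000):
    print(f"N = {N:>9,}: factor = {fpc_factor(n, N):.4f}")
```

For an audience of 1,000 the factor is about 0.775, so the correction matters. For 1,000,000 it is about 0.9998, which leaves the error practically unchanged.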

Why is correction important for small audiences?

Why is it especially important to consider sample size and apply the finite population correction when researching small groups of people?

When the audience is small and you survey a significant portion of it, the sample becomes more representative, since you cover a large percentage of the entire group.

Imagine that you have a small audience of just 100 people and you decide to survey 80 of them. In this case, your 80 people represent almost the entire audience, that is, 80% of all people. This means that your research comes very close to capturing the opinions of the entire group. Therefore, the error will be small, because you interviewed almost everyone.

But if we apply the formula without the correction, we get:

$$E = 1.96 \times \sqrt{\frac{0.5 \times 0.5}{80}} \approx 0.1096$$

The calculated error is 10.96%. This is the error we would get if the audience were large: it is very high because only 80 people were surveyed.

If we ignored the fact that the audience is small (100 people, not 1,000,000) and that the sample covers almost all of it (80 out of 100), the formula would show the same error as if you had polled a small part of a large audience (80 out of 1,000,000). But in fact, since you polled almost everyone, your results are more accurate, so a correction is needed to reduce the calculated error, since the real error is lower.

The finite population correction does just that: it reduces the margin of error because your survey includes a significant portion of the entire audience, making your results more reliable.

Let's recalculate with the correction, using this formula:

$$E = Z \times \sqrt{\frac{p(1 - p)}{n}} \times \sqrt{\frac{N - n}{N - 1}}$$

Let's substitute the values:

$$E = 1.96 \times \sqrt{\frac{0.5 \times 0.5}{80}} \times \sqrt{\frac{100 - 80}{100 - 1}} \approx 0.1096 \times 0.4495 \approx 0.049$$

The calculated error, taking into account the finite population correction, is about 4.9%. This is significantly lower than the 10.96% obtained without the correction, which shows how much the size of the audience matters when the sample makes up a significant part of it.
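The same comparison can be reproduced in a few lines of Python. This is a sketch under the assumption that the correction has the common form sqrt((N – n) / (N – 1)); the function name `margin_of_error_fpc` is hypothetical. With this form the corrected result comes out at roughly 4.9%; differences in the second decimal place depend on rounding and on the exact variant of the correction used.

```python
import math

def margin_of_error_fpc(n, N, p=0.5, z=1.96):
    """Sampling error with the finite population correction applied."""
    base = z * math.sqrt(p * (1 - p) / n)      # error for a very large audience
    correction = math.sqrt((N - n) / (N - 1))  # shrinks as n approaches N
    return base * correction

print(f"{margin_of_error_fpc(80, 1_000_000):.2%}")  # 80 out of a million: about 10.96%
print(f"{margin_of_error_fpc(80, 100):.2%}")        # 80 out of 100: about 4.92%
```

Keeping the base error and the correction as separate factors makes it easy to see which part of the result comes from the sample size and which from the audience size.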

So how many responses should you collect during your research?

To determine the required sample size in a quantitative study, you need to consider the desired margin of error.

Determining sample size and desired margin of error

If we want results with an error of no more than 5% at a confidence level of 95%, this means that in 95% of cases the survey results will be close to the real value, and the deviation will be no more than ±5%. To achieve this you will need to poll approximately 400 people. The higher the desired accuracy (the smaller the error), the more respondents need to be surveyed.

Example:

Let's say we have a large audience – 1 million people.

  • To determine what percentage of the audience will use the new paid feature "XXX", we decide to survey 400 people.

  • If 20% of respondents answer "Yes, we will", then we can be 95% sure that in reality 15-25% of the audience will use it.

  • That is, the research results will be close to the real values with a high probability, with an error of no more than ±5%.

  • In practice, this means that 15-25% of the 1 million people will use the paid feature "XXX".

  • So after the release we can expect from 150,000 to 250,000 users of the new feature "XXX".

If we want to reduce the error to 3%, we will need to increase the sample to about 1,000 people. As the sample size increases, accuracy increases and error decreases, but the costs of collecting and processing data also increase.
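The required sample size follows from solving the error formula for n, which gives n = Z² × p × q / E². A minimal sketch for a large audience (no finite population correction); the function name `required_sample_size` is my own for this example.

```python
import math

def required_sample_size(error, p=0.5, z=1.96):
    """Smallest n for which Z * sqrt(p * q / n) does not exceed `error`."""
    return math.ceil(z ** 2 * p * (1 - p) / error ** 2)

print(required_sample_size(0.05))  # 385
print(required_sample_size(0.03))  # 1068
```

The exact values are 385 and 1,068 respondents; the figures of 400 and 1,000 used in the text are convenient roundings of these.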


Effect of sample size on uncertainty

The larger the sample size, the smaller the error. For example, for audiences of millions of people, the difference between surveying 400 people and 1,000 people is modest in terms of error (about 5% and 3%, respectively). However, each additional response increases the costs and resources spent on the study.

Example for a large audience:

If the audience consists of 1 million people and we survey 400 of them, the error will be approximately 5%. By increasing the sample to 1,000 people, we reduce the error to about 3%. But with a further increase in the sample (for example, to 2,000 respondents), the reduction is no longer so significant: the error falls only to about 2.2%, which is not always justified given the costs and resources required for such a large-scale survey.
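The diminishing returns are easy to see by listing the error for a few sample sizes side by side. A small sketch for a large audience, assuming p = 0.5 and Z = 1.96; the helper name `error_table` is hypothetical.

```python
import math

def error_table(sizes, p=0.5, z=1.96):
    """Return (n, error) pairs for each sample size, without any correction."""
    return [(n, z * math.sqrt(p * (1 - p) / n)) for n in sizes]

for n, error in error_table([400, 1000, 2000, 4000]):
    print(f"n = {n:>4}: ±{error:.1%}")
```

Because the error is proportional to 1 / sqrt(n), halving it requires four times as many respondents, which is why the gains flatten out so quickly.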

Example for a small audience:

Now imagine an audience of 1,000 people. If we survey 400 of them, we cover 40% of the entire audience, and our margin of error will be noticeably smaller than it would be for a large audience: the finite population correction reduces it from about 4.9% to about 3.8%, since the sample covers a significant portion of the entire population. In this case, a survey of 400 people will give accurate results, close to what the entire audience thinks.


General rules for determining sample size

  1. The larger the sample size, the smaller the margin of error. However, beyond a certain level (for example, 1,000-2,000 respondents), a further increase in sample size brings only a slight decrease in error, which may not be justified from the point of view of research costs.

  2. The finite population correction matters when the sample makes up a significant portion of the audience. For a small audience (for example, several thousand people), the larger the share of it the sample covers, the higher the accuracy and the smaller the error.

To determine the required sample size in a quantitative study, the desired level of error must be considered.

  • If you want an error of 5% at a confidence level of 95%, you need to survey approximately 400 people.

  • For an error of 3%, you will need about 1,000 respondents.

  • With a sample of 200 people and a 95% confidence level, you will get an error of 6.93%.
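For small audiences, the figures above can be reduced by taking the audience size into account. The sketch below is my own rearrangement, not a formula from the article: it assumes the correction sqrt((N – n) / (N – 1)) and solves the corrected error formula for n, which gives n = n0 / (1 + (n0 – 1) / N), where n0 is the sample size for a very large audience. The function name `required_sample_size_fpc` is hypothetical.

```python
import math

def required_sample_size_fpc(error, N, p=0.5, z=1.96):
    """Sample size for a target error when the audience has only N people."""
    n0 = z ** 2 * p * (1 - p) / error ** 2     # size for a very large audience
    return math.ceil(n0 / (1 + (n0 - 1) / N))  # shrink it for a finite N

for N in (500, 1_000, 10_000, 1_000_000):
    print(N, required_sample_size_fpc(0.05, N))
```

Under these assumptions, an audience of 1,000 people needs about 278 responses for a 5% error instead of roughly 400, while for a million people the result is the same as without the correction.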

Conclusion

To determine sample size in quantitative research, it is important to consider the desired margin of error and confidence level. The larger the sample, the smaller the error. However, beyond a certain level, further increases in the sample do not significantly improve the results, while research costs continue to rise.

When working with small audiences, adjustment for the finite population reduces the error, since a significant part of the audience is surveyed. For large audiences, standard calculation methods provide reasonable accuracy at a reasonable cost, even if the sample is relatively small.

My telegram: t.me/mr_ponder
