How Does Sample Size Affect Confidence Interval

Understanding Confidence Intervals | Easy Examples & Formulas

Published on August 7, 2020 by Rebecca Bevans. Revised on February 11, 2021.

When you make an estimate in statistics, whether it is a summary statistic or a test statistic, there is always uncertainty around that estimate because the number is based on a sample of the population you are studying.

The confidence interval is the range of values that you expect your estimate to fall between a certain percentage of the time if you run your experiment again or re-sample the population in the same way.

The confidence level is the percentage of times you expect to reproduce an estimate between the upper and lower bounds of the confidence interval, and is set by the alpha value.

What exactly is a confidence interval?

A confidence interval is the mean of your estimate plus and minus the variation in that estimate. This is the range of values you expect your estimate to fall between if you redo your test, within a certain level of confidence.

Confidence, in statistics, is another way to describe probability. For example, if you construct a confidence interval with a 95% confidence level, you are confident that 95 out of 100 times the estimate will fall between the upper and lower values specified by the confidence interval.

Your desired confidence level is usually one minus the alpha (α) value you used in your statistical test:

Confidence level = 1 − α

So if you use an alpha value of p < 0.05 for statistical significance, then your confidence level would be 1 − 0.05 = 0.95, or 95%.
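One way to see what a 95% confidence level means is to simulate repeated sampling. The sketch below is only an illustration: the population mean of 35, the standard deviation of 5, the sample size of 100, the number of trials, and the function name `coverage` are all assumptions chosen for the demonstration, and it uses the large-sample z-based interval described in this article.

```python
import random
from statistics import NormalDist, mean, stdev

def coverage(true_mean=35.0, true_sd=5.0, n=100, alpha=0.05, trials=2000, seed=0):
    """Return the fraction of simulated intervals that contain true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample from the assumed normal population
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = mean(sample)
        margin = z * stdev(sample) / n ** 0.5
        if m - margin <= true_mean <= m + margin:
            hits += 1
    return hits / trials

confidence_level = 1 - 0.05
observed = coverage()
print(f"confidence level: {confidence_level:.2f}, observed coverage: {observed:.3f}")
```

Under these assumptions the observed coverage should land close to the chosen confidence level, which is the sense in which the level is "set by the alpha value".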

When do you use confidence intervals?

You can calculate confidence intervals for many kinds of statistical estimates, including:

Proportions
Population means
Differences between population means or proportions
Estimates of variation among groups

These are all point estimates, and don't give any information about the variation around the number. Confidence intervals are useful for communicating the variation around a point estimate.

Example: Variation around an estimate. Graph showing two sample populations with the same mean but different levels of variation around the mean.

Calculating a confidence interval: what you need to know

Most statistical programs will include the confidence interval of the estimate when you run a statistical test.

If you want to calculate a confidence interval on your own, you need to know:

The point estimate you are constructing the confidence interval for
The critical values for the test statistic
The standard deviation of the sample
The sample size

Once you know each of these components, you can calculate the confidence interval for your estimate by plugging them into the confidence interval formula that corresponds to your data.

Point estimate

The point estimate of your confidence interval will be whatever statistical estimate you are making (e.g. population mean, the difference between population means, proportions, variation among groups).

Example: Point estimate

In the TV-watching example, the point estimate is the mean number of hours watched: 35.

Finding the critical value

Critical values tell you how many standard deviations away from the mean you need to go in order to reach the desired confidence level for your confidence interval.

There are three steps to find the critical value.

Choose your alpha (α) value.

The alpha value is the probability threshold for statistical significance. The most common alpha value is p = 0.05, but 0.1, 0.01, and even 0.001 are sometimes used. It's best to look at the papers published in your field to decide which alpha value to use.

Decide if you need a one-tailed interval or a two-tailed interval.

You will most likely use a two-tailed interval unless you are doing a one-tailed t-test.

For a two-tailed interval, divide your alpha by two to get the alpha value for the upper and lower tails.

Look up the critical value that corresponds with the alpha value.

If your data follows a normal distribution, or if you have a large sample size (n > 30) that is approximately normally distributed, you can use the z-distribution to find your critical values.

For a z-statistic, some of the most common values are shown in this table:

Confidence level	90%	95%	99%
alpha for 1-tailed CI	0.1	0.05	0.01
alpha for 2-tailed CI	0.05	0.025	0.005
z-statistic	1.64	1.96	2.58
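If you would rather compute the z critical value than look it up, Python's standard library can do it. This is a minimal sketch; the helper name `z_critical` and its `two_tailed` flag are illustrative choices, not part of any library.

```python
from statistics import NormalDist

def z_critical(confidence_level, two_tailed=True):
    """Critical value of the standard normal (z) distribution."""
    alpha = 1 - confidence_level
    # For a two-tailed interval, alpha is split evenly between the tails
    tail = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    # Two-tailed values are approximately 1.645, 1.960 and 2.576
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
```

Published tables usually round these values to two decimal places.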

If you are using a small dataset (n ≤ 30) that is approximately normally distributed, use the t-distribution instead.

The t-distribution follows the same shape as the z-distribution, but corrects for small sample sizes. For the t-distribution, you need to know your degrees of freedom (sample size minus 1).

Check out this set of t tables to find your t-statistic. The author has included the confidence level and p-values for both one-tailed and two-tailed tests to help you find the t-value you need.

For symmetric, bell-shaped distributions, like the t-distribution and z-distribution, the critical value is the same on either side of the mean.

Example: Critical value

In the TV-watching survey, there are more than 30 observations and the data follow an approximately normal distribution (bell curve), so we can use the z-distribution for our test statistics.

For a two-tailed 95% confidence interval, the alpha value in each tail is 0.025, and the corresponding critical value is 1.96.

This means that to calculate the upper and lower bounds of the confidence interval, we can take the mean ± 1.96 standard deviations from the mean.

Finding the standard deviation

Most statistical software will have a built-in function to calculate your standard deviation, but to find it by hand you can first find your sample variance, then take the square root to get the standard deviation.

Find the sample variance

Sample variance is defined as the sum of squared differences from the mean divided by n − 1, also known as the mean-squared-error (MSE):

s² = Σ (xᵢ − x̂)² / (n − 1), where x̂ is the sample mean

To find the MSE, subtract your sample mean from each value in the dataset, square the resulting number, and divide that number by n − 1 (sample size minus 1).

Then add up all of these numbers to get your total sample variance (s²). For larger sample sets, it's easiest to do this in Excel.

Find the standard deviation.

The standard deviation of your estimate (s) is equal to the square root of the sample variance/sample error (s²):

s = √(s²)

Example: Standard deviation

In the TV-watching survey, the variance in the GB estimate is 100, while the variance in the USA estimate is 25. Taking the square root of the variance gives us a sample standard deviation (s) of:

10 for the GB estimate.
5 for the USA estimate.
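The by-hand steps for the variance and standard deviation can be sketched in Python as follows. The five values in `hours` are made-up data for illustration only (they are not the survey data), and `sample_variance` is a hypothetical helper name; the standard library's `variance` and `stdev` are shown as a cross-check.

```python
from statistics import mean, stdev, variance

hours = [30, 42, 35, 28, 40]  # hypothetical weekly TV hours, not the survey data

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

s2 = sample_variance(hours)  # sample variance
s = s2 ** 0.5                # sample standard deviation

print(s2, variance(hours))   # the two variances should agree
print(s, stdev(hours))       # the two standard deviations should agree
```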

Sample size

The sample size is the number of observations in your data set.

Example: Sample size

In our survey of Americans and Brits, the sample size is 100 for each group.
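Sample size matters because the margin of error (the critical value times the standard deviation divided by √n) shrinks as n grows. The sketch below assumes a z-based interval and uses the GB standard deviation of 10 from the survey; the sample sizes 25 and 400 and the function name `margin_of_error` are hypothetical additions for comparison.

```python
from statistics import NormalDist

def margin_of_error(sd, n, confidence_level=0.95):
    """Half-width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sd / n ** 0.5

for n in (25, 100, 400):
    # Each fourfold increase in n halves the margin of error
    print(n, round(margin_of_error(10, n), 2))
```

Because n enters through its square root, quadrupling the sample size only halves the width of the interval.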

Confidence interval for the mean of normally-distributed data

Normally-distributed data forms a bell shape when plotted on a graph, with the sample mean in the middle and the rest of the data distributed fairly evenly on either side of the mean.

The confidence interval for data which follows a standard normal distribution is:

CI = X̄ ± Z* × σ / √n

Where:

CI = the confidence interval
X̄ = the population mean
Z* = the critical value of the z-distribution
σ = the population standard deviation
√n = the square root of the sample size

The confidence interval for the t-distribution follows the same formula, but replaces the Z* with the t*.

In real life, you never know the true values for the population (unless you can do a complete census). Instead, we replace the population values with the values from our sample data, so the formula becomes:

CI = x̂ ± Z* × s / √n

Where:

x̂ = the sample mean
s = the sample standard deviation

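Example: Calculating the confidence interval (US data)

With a sample mean of 35, a sample standard deviation of 5, a sample size of 100, and a critical value of 1.96: CI = 35 ± 1.96 × 5 / √100 = 35 ± 0.98, so the 95% confidence interval runs from 34.02 to 35.98.

Example: Calculating the confidence interval (UK data)

With a sample mean of 35, a sample standard deviation of 10, a sample size of 100, and a critical value of 1.96: CI = 35 ± 1.96 × 10 / √100 = 35 ± 1.96, so the 95% confidence interval runs from 33.04 to 36.96.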
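The arithmetic for the two survey groups can be checked with a short Python sketch. It takes the values stated in this article (mean 35, standard deviations 5 and 10, sample size 100) and assumes a z-based interval; the function name `mean_confidence_interval` is a hypothetical helper, not a library function.

```python
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence_level=0.95):
    """Return (lower, upper) bounds of a z-based interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sample_sd / n ** 0.5
    return sample_mean - margin, sample_mean + margin

usa = mean_confidence_interval(35, 5, 100)
gb = mean_confidence_interval(35, 10, 100)

print("USA:", [round(bound, 2) for bound in usa])  # [34.02, 35.98]
print("GB: ", [round(bound, 2) for bound in gb])   # [33.04, 36.96]
```

Both groups share the same point estimate, but the larger standard deviation in the GB group doubles the width of its interval.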

Confidence interval for proportions

The confidence interval for a proportion follows the same pattern as the confidence interval for means, but in place of the standard deviation you use the sample proportion times one minus the proportion:

CI = p̂ ± Z* × √( p̂ (1 − p̂) / n )

Where:

p̂ = the proportion in your sample (e.g. the proportion of respondents who said they watched any television at all)
Z* = the critical value of the z-distribution
n = the sample size
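As a sketch of the proportion formula in Python: the inputs below (a sample proportion of 0.52 and a sample size of 600) are hypothetical numbers picked for illustration, and `proportion_confidence_interval` is an assumed helper name.

```python
from statistics import NormalDist

def proportion_confidence_interval(p_hat, n, confidence_level=0.95):
    """Return (lower, upper) bounds of a z-based interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 52% of 600 respondents
low, high = proportion_confidence_interval(0.52, 600)
print(round(low, 2), round(high, 2))  # 0.48 0.56
```

This normal-approximation interval is the one the formula describes; it tends to work less well when the sample is small or the proportion is close to 0 or 1.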

Confidence interval for non-normally distributed data

To calculate a confidence interval around the mean of data that is not normally distributed, you have two choices:

You can find a distribution that matches the shape of your data and use that distribution to calculate the confidence interval.
You can perform a transformation on your data to make it fit a normal distribution, and then find the confidence interval for the transformed data.

Performing data transformations is very common in statistics, for example, when data follows a logarithmic curve but we want to use it alongside linear data. You just have to remember to do the reverse transformation on your data when you calculate the upper and lower bounds of the confidence interval.
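The transform-then-reverse approach might look like the following sketch, which assumes a log transformation is appropriate. The data are simulated right-skewed values (an assumption for illustration), and `log_transformed_ci` is a hypothetical helper. Note that after exponentiating, the interval describes the geometric mean of the original data rather than the arithmetic mean.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def log_transformed_ci(values, confidence_level=0.95):
    """z-based interval computed on log(values), then back-transformed."""
    logs = [math.log(v) for v in values]  # forward transformation
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    m = mean(logs)
    margin = z * stdev(logs) / len(logs) ** 0.5
    # Reverse transformation applied to both bounds
    return math.exp(m - margin), math.exp(m + margin)

rng = random.Random(1)
skewed = [rng.lognormvariate(1.0, 0.5) for _ in range(200)]  # simulated data

low, high = log_transformed_ci(skewed)
print(f"95% CI for the geometric mean: {low:.2f} to {high:.2f}")
```

The back-transformed interval is not symmetric around its centre, which is expected: the symmetry holds on the log scale, not the original scale.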

Reporting confidence intervals

Confidence intervals are sometimes reported in papers, though researchers more often report the standard deviation of their estimate.

If you are asked to report the confidence interval, you should include the upper and lower bounds of the confidence interval.

Example: Reporting a confidence interval

"We found that both the US and Great Britain averaged 35 hours of television watched per week, although there was more variation in the estimate for Great Britain (95% CI = 33.04, 36.96) than for the United States (95% CI = 34.02, 35.98)."

One place that confidence intervals are frequently used is in graphs. When showing the differences between groups, or plotting a linear regression, researchers will often include the confidence interval to give a visual representation of the variation around the estimate.

Example: Confidence interval in a graph. The mean and 95% confidence interval around the mean for the average hours of television watched.

Caution when using confidence intervals

Confidence intervals are sometimes interpreted as saying that the 'true value' of your estimate lies within the bounds of the confidence interval.

This is not the case. The confidence interval cannot tell you how likely it is that you found the true value of your statistical estimate because it is based on a sample, not on the whole population.

The confidence interval only tells you what range of values you can expect to find if you re-do your sampling or run your experiment again in the exact same way.

The more accurate your sampling plan, or the more realistic your experiment, the greater the chance that your confidence interval includes the true value of your estimate. But this accuracy is determined by your research methods, not by the statistics you do after you have collected the data!

Frequently asked questions about confidence intervals

What is the difference between a confidence interval and a confidence level?

The confidence level is the percentage of times you expect to get close to the same estimate if you run your experiment again or resample the population in the same way.

The confidence interval consists of the upper and lower bounds of the estimate you expect to find at a given level of confidence.

For example, if you are estimating a 95% confidence interval around the mean proportion of female babies born every year based on a random sample of babies, you might find an upper bound of 0.56 and a lower bound of 0.48. These are the upper and lower bounds of the confidence interval. The confidence level is 95%.

This means that if you repeated the sampling many times, about 95% of the confidence intervals calculated this way would contain the true mean of the population.

What are z-scores and t-scores?

The z-score and t-score (aka z-value and t-value) show how many standard deviations away from the mean of the distribution you are, assuming your data follow a z-distribution or a t-distribution.

These scores are used in statistical tests to show how far from the mean of the predicted distribution your statistical estimate is. If your test produces a z-score of 2.5, this means that your estimate is 2.5 standard deviations from the predicted mean.

The predicted mean and distribution of your estimate are generated by the null hypothesis of the statistical test you are using. The more standard deviations away from the predicted mean your estimate is, the less likely it is that the estimate could have occurred under the null hypothesis.

What is a critical value?

A critical value is the value of the test statistic which defines the upper and lower bounds of a confidence interval, or which defines the threshold of statistical significance in a statistical test. It describes how far from the mean of the distribution you have to go to cover a certain amount of the total variation in the data (i.e. 90%, 95%, 99%).

If you are constructing a 95% confidence interval and are using a threshold of statistical significance of p = 0.05, then your critical value will be identical in both cases.

What does it mean if my confidence interval includes zero?

If your confidence interval for a difference between groups includes zero, that means that if you run your experiment again you have a good chance of finding no difference between groups.

If your confidence interval for a correlation or regression includes zero, that means that if you run your experiment again there is a good chance of finding no correlation in your data.

In both of these cases, you will also find a high p-value when you run your statistical test, meaning that your results could have occurred under the null hypothesis of no relationship between variables or no difference between groups.
