Hypothesis Testing with Z-Test: Significance Level and Rejection Region
If you want to understand why hypothesis testing works, you should first have an idea about the significance level and the rejection region. We assume you already know what a hypothesis is, so let's jump right into the action.
What Is the Significance Level?
First, we must define the term significance level.
Normally, we aim to reject the null if it is false.
However, as with any test, there is a small chance that we could get it wrong and reject a null hypothesis that is true.
How Is the Significance Level Denoted?
The significance level is denoted by α and is the probability of rejecting the null hypothesis when it is, in fact, true.
In other words, it is the probability of making this error.
Typical values for α are 0.01, 0.05 and 0.1. It is a value that we select based on the certainty we need. In most cases, the choice of α is determined by the context we are operating in, but 0.05 is the most commonly used value.
A Case in Point
Say we need to test whether a machine is working properly. We would expect the test to make few or no mistakes. Since we want to be very precise, we should pick a low significance level such as 0.01.
The famous Coca-Cola glass bottle holds 12 ounces. If the machine pours 12.1 ounces, some of the liquid will spill and the label may be damaged as well. So, in certain situations, we need to be as accurate as possible.
Higher Degree of Error
However, if we are analyzing humans or companies, we would expect more random, or at least uncertain, behavior and, hence, we can tolerate a higher degree of error.
For instance, if we want to predict how much Coca-Cola consumers drink on average, the difference between 12 ounces and 12.1 ounces will not be that crucial. So, we can choose a higher significance level like 0.05 or 0.1.
Hypothesis Testing: Performing a Z-Test
Now that we have an idea about the significance level, let's get to the mechanics of hypothesis testing.
Imagine you are consulting a university and want to carry out an analysis on how students are performing on average.
The university dean believes that on average students have a GPA of 70%. Being the data-driven researcher that you are, you can’t simply agree with his opinion, so you start testing.
The null hypothesis is: The population mean grade is 70%.
This is a hypothesized value.
The alternative hypothesis is: The population mean grade is not 70%. In symbols, the two hypotheses are denoted H0: μ = 70% and H1: μ ≠ 70%.
Visualizing the Grades
Assuming that the population of grades is normally distributed, the grades received by students form a bell-shaped distribution centered on the true population mean.
Performing a Z-test
Now, a test we would normally perform is the Z-test. The formula is:
Z equals the sample mean, minus the hypothesized mean, divided by the standard error.
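Written out (a standard formulation; x̄ is the sample mean, μ₀ the hypothesized mean, and the standard error is σ/√n when the population standard deviation σ is known, or s/√n with the sample standard deviation otherwise):

Z = (x̄ − μ₀) / standard error = (x̄ − μ₀) / (σ/√n)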
The idea is the following.
We are standardizing, or scaling, the sample mean we obtained. If the sample mean is close enough to the hypothesized mean, then Z will be close to 0. Otherwise, it will be far away from it. Naturally, if the sample mean is exactly equal to the hypothesized mean, Z will be 0.
In all these cases, we would accept the null hypothesis.
What Is the Rejection Region?
The question here is the following:
How big should Z be for us to reject the null hypothesis?
Well, there is a cut-off line. Since we are conducting a two-sided or a two-tailed test, there are two cut-off lines, one on each side.
When we calculate Z, we will get a value. If this value falls into the middle part, then we cannot reject the null. If it falls outside, in the shaded region, then we reject the null hypothesis.
That is why the shaded part is called the rejection region, as you can see below.
What Does the Rejection Region Depend on?
The area that is cut off actually depends on the significance level.
Say the level of significance, α, is 0.05. Then we have α divided by 2, or 0.025, on the left side and 0.025 on the right side.
Now these are values we can check in the z-table. The z-value that leaves 0.025 in each tail is 1.96, so the cut-offs are 1.96 on the right side and minus 1.96 on the left side.
Therefore, if the value we get for Z from the test is lower than minus 1.96, or higher than 1.96, we will reject the null hypothesis. Otherwise, we will accept it.
That’s more or less how hypothesis testing works.
We scale the sample mean with respect to the hypothesized value. If Z is close to 0, then we cannot reject the null. If it is far away from 0, then we reject the null hypothesis.
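As a quick illustration, here is a minimal sketch in Python (assuming SciPy is available; the GPA numbers are hypothetical, chosen only to show the mechanics):

```python
from scipy.stats import norm

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)   # two-tailed cut-off: about 1.96 for alpha = 0.05

def two_tailed_decision(sample_mean, hypothesized_mean, standard_error):
    """Standardize the sample mean and compare it against the cut-off lines."""
    z = (sample_mean - hypothesized_mean) / standard_error
    return "reject H0" if abs(z) > z_crit else "cannot reject H0"

# Hypothetical numbers: a sample mean GPA of 73% with a standard error of 1.2
print(two_tailed_decision(73, 70, 1.2))   # z = 2.5, so: reject H0
```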
Example of One Tailed Test
What about one-sided tests? We have those too!
Let’s consider the following situation.
Paul says data scientists earn more than $125,000, so that claim goes into the null hypothesis: H0: μ ≥ $125,000 (by convention, the equality always stays with the null).
The alternative is that the mean salary is lower: H1: μ < $125,000.
Using the same significance level, this time the whole rejection region is on the left. So, the rejection region has an area of α. Looking at the z-table, that corresponds to a Z-score of 1.645. Since the region is on the left, the cut-off carries a minus sign: −1.645.
Accept or Reject
Now, when calculating our test statistic Z, if we get a value lower than −1.645, we would reject the null hypothesis, because we have statistical evidence that the average data scientist salary is less than $125,000. Otherwise, we would accept it.
Another One-Tailed Test
To exhaust all possibilities, let’s explore another one-tailed test.
Say the university dean told you that the average GPA students get is lower than 70%. In that case, the null hypothesis is:
μ ≤ 70% (the dean's claim, with the equality kept in the null),
while the alternative is:
μ > 70%.
In this situation, the rejection region is on the right side. So, if the test statistic is bigger than the cut-off z-score, we would reject the null; otherwise, we wouldn't.
Importance of the Significance Level and the Rejection Region
To sum up, the significance level and the rejection region are crucial in the process of hypothesis testing. The level of significance governs how strict the test is: we (the researchers) choose it depending on how big of a difference a possible error could make. The rejection region, on the other hand, tells us whether or not to reject the null hypothesis. After reading this and putting both of them to use, you will realize how convenient they make your work.
Z-Test for Statistical Hypothesis Testing Explained
The Z-test is a statistical hypothesis test in which the distribution of the test statistic we are measuring, such as the mean, can be approximated by a normal distribution.
There are multiple types of Z-tests; however, we'll focus on the easiest and most well-known one, the one-sample mean test. This is used to determine whether the difference between the mean of a sample and the mean of a population is statistically significant.
What Is a Z-Test?
A Z-test is a type of statistical hypothesis test where the test-statistic follows a normal distribution.
The name Z-test comes from the Z-score of the normal distribution. This is a measure of how many standard deviations away a raw score or sample statistic is from the population mean.
Z-tests are the most common statistical tests conducted in fields such as healthcare and data science . Therefore, it’s an essential concept to understand.
Requirements for a Z-Test
In order to conduct a Z-test, your statistics need to meet a few requirements, including:
- A sample size that's greater than 30. This is because we want to ensure our sample mean comes from a distribution that is approximately normal. By the central limit theorem, the sampling distribution of the mean can be approximated as normal once the sample contains more than about 30 data points.
- The standard deviation and mean of the population are known .
- The sample data is collected/acquired randomly .
Z-Test Steps
There are four steps to complete a Z-test. Let’s examine each one.
4 Steps to a Z-Test
- State the null hypothesis.
- State the alternate hypothesis.
- Choose your critical value.
- Calculate your Z-test statistics.
1. State the Null Hypothesis
The first step in a Z-test is to state the null hypothesis, H_0. This is what you believe to be true about the population; for the one-sample mean test, it is that the population mean equals a hypothesized value, μ_0: H_0: μ = μ_0.
2. State the Alternate Hypothesis
Next, state the alternate hypothesis, H_1. This is what you observe from your sample. If the sample mean is different from the population's mean, then we say the mean is not equal to μ_0: H_1: μ ≠ μ_0.
3. Choose Your Critical Value
Then, choose your significance level, α, and the corresponding critical value, which together determine whether you accept or reject the null hypothesis. Typically for a Z-test we would use a statistical significance level of 5 percent, which for a two-tailed test corresponds to z = ±1.96 standard deviations from the population's mean in the normal distribution.
This critical value is based on confidence intervals.
4. Calculate Your Z-Test Statistic
Compute the Z-test statistic using the sample mean, μ_1, the population mean, μ_0, the number of data points in the sample, n, and the population's standard deviation, σ: Z = (μ_1 − μ_0) / (σ / √n).
If the test statistic is greater (or lower, depending on the direction of the test) than the critical value, we reject the null hypothesis in favor of the alternate hypothesis, because the sample mean differs from the population mean by a statistically significant amount.
Another way to think about this: if the sample mean is that far away from the population mean, either the alternate hypothesis is true or the sample is a complete anomaly.
Z-Test Example
Let’s go through an example to fully understand the one-sample mean Z-test.
A school says that its pupils are, on average, smarter than other schools. It takes a sample of 50 students whose average IQ measures to be 110. The population, or the rest of the schools, has an average IQ of 100 and standard deviation of 20. Is the school’s claim correct?
The null and alternate hypotheses are H_0: μ = 100 and H_1: μ > 100, where we are saying that our sample, the school, has a higher mean IQ than the population mean.
Now, this is what's called a right-sided, one-tailed test, as our alternate hypothesis claims the sample mean is greater than the population's mean. So, choosing a significance level of 5 percent, which for a one-tailed test equals a critical Z-score of 1.645, we can only reject the null hypothesis if our Z-test statistic is greater than 1.645.
If the school had claimed its students' IQs were an average of 90, then we would use a left-tailed test. We would then only reject the null hypothesis if our Z-test statistic is less than −1.645.
Computing our Z-test statistic, we see: Z = (110 − 100) / (20 / √50) ≈ 3.54.
Since 3.54 is greater than the critical value of 1.645, we have sufficient evidence to reject the null hypothesis, and the school's claim is supported.
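A quick way to check these numbers (a sketch in Python with SciPy; only the figures stated in the example are used):

```python
import math
from scipy.stats import norm

sample_mean, pop_mean, pop_sd, n = 110, 100, 20, 50

z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
z_crit = norm.ppf(0.95)          # one-tailed critical value at the 5 percent level
p_value = 1 - norm.cdf(z)        # right-tailed p-value

print(round(z, 2), round(z_crit, 3), round(p_value, 4))
# 3.54 1.645 0.0002  ->  z exceeds the critical value, so we reject the null hypothesis
```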
Hope you enjoyed this article on Z-tests. In this post, we only addressed the simplest case, the one-sample mean test. There are other types of Z-tests, but they all follow the same process, just with some small nuances.
Chapter 10: Hypothesis Testing with Z
Setting Up the Hypotheses
When setting up the hypotheses with z, the parameter of interest is a population mean (in the previous chapter's examples, the null value used was 0). Using z is an occasion in which the null hypothesis is a value other than 0. For example, if we are working with mothers in the U.S. whose children are at risk of low birth weight, we can use 7.47 pounds, the average birth weight in the U.S., as our null value and test for differences against that. For now, we will focus on testing a value of a single mean against what we expect from the population.
Using birth weight as an example, our null hypothesis takes the form: H0: μ = 7.47. Notice that we are testing the value of μ, the population parameter, NOT the sample statistic X̄ (or M). We are referring to the data in raw form right now (we have not standardized it using z yet). Again, using inferential statistics, we are interested in understanding the population by drawing on our sample observations. For the research question, we have a mean value from the sample to use; it is observed and used as a comparison against the set point.
As mentioned earlier, the alternative hypothesis is simply the reverse of the null hypothesis, and there are three options, depending on where we expect the difference to lie. We will set the criteria for rejecting the null hypothesis based on the directionality (greater than, less than, or not equal to) of the alternative.
If we expect our obtained sample mean to be above or below the null hypothesis value (knowing which direction), we set a directional hypothesis. Our alternative hypothesis takes its form from the research question itself. In our example with birth weight, this could be presented as HA: μ > 7.47 or HA: μ < 7.47.
Note that we should only use a directional hypothesis if we have a good reason, based on prior observations or research, to suspect a particular direction. When we do not know the direction, such as when we are entering a new area of research, we use a non-directional alternative hypothesis. In our birth weight example, this could be set as HA: μ ≠ 7.47.
In working with data for this course, we will need to set a critical value of the test statistic for a chosen alpha (α), using the test statistic tables in the back of the book. This determines the critical rejection region, which has a set critical value based on α.
Determining Critical Value from α
We set alpha (α) before collecting data in order to determine whether or not we should reject the null hypothesis. We set this value beforehand to avoid biasing ourselves by viewing our results and then determining what criteria we should use.
When a research hypothesis predicts an effect but does not predict a direction for the effect, it is called a non-directional hypothesis . To test the significance of a non-directional hypothesis, we have to consider the possibility that the sample could be extreme at either tail of the comparison distribution. We call this a two-tailed test .
Figure 1. A two-tailed test for a non-directional hypothesis for z; area C is the critical rejection region.
When a research hypothesis predicts a direction for the effect, it is called a directional hypothesis . To test the significance of a directional hypothesis, we have to consider the possibility that the sample could be extreme at one-tail of the comparison distribution. We call this a one-tailed test .
Figure 2. A one-tailed test for a directional hypothesis (predicting an increase) for z; area C is the critical rejection region.
Determining Cutoff Scores with Two-Tailed Tests
Typically we specify an α level before analyzing the data. If the data analysis results in a probability value below the α level, then the null hypothesis is rejected; if it is not, then the null hypothesis is not rejected. In other words, if our data produce values that meet or exceed this threshold, then we have sufficient evidence to reject the null hypothesis ; if not, we fail to reject the null (we never “accept” the null). According to this perspective, if a result is significant, then it does not matter how significant it is. Moreover, if it is not significant, then it does not matter how close to being significant it is. Therefore, if the 0.05 level is being used, then probability values of 0.049 and 0.001 are treated identically. Similarly, probability values of 0.06 and 0.34 are treated identically. Note we will discuss ways to address effect size (which is related to this challenge of NHST).
When setting the probability value, there is a special complication in a two-tailed test. We have to divide the significance percentage between the two tails. For example, with a 5% significance level, we reject the null hypothesis only if the sample is so extreme that it is in either the top 2.5% or the bottom 2.5% of the comparison distribution. This keeps the overall level of significance at a total of 5%. A one-tailed test places the entire 5% in a single tail, because only one side of the distribution is considered.
Figure 3. Critical value differences in one- and two-tailed tests.
Let's review the set critical values for Z.
We discussed z-scores and probability in chapter 8. If we revisit the z-score for 5% and 1%, we can identify the critical regions for the critical rejection areas from the unit standard normal table.
- A two-tailed test at the 5% level has a critical boundary Z score of +1.96 and -1.96
- A one-tailed test at the 5% level has a critical boundary Z score of +1.64 or -1.64
- A two-tailed test at the 1% level has a critical boundary Z score of +2.58 and -2.58
- A one-tailed test at the 1% level has a critical boundary Z score of +2.33 or -2.33.
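These cut-offs can be reproduced directly from the standard normal distribution. A short Python check (assuming SciPy is available):

```python
from scipy.stats import norm

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: two-tailed ±{norm.ppf(1 - alpha / 2):.2f}, "
          f"one-tailed {norm.ppf(1 - alpha):.2f}")
# alpha = 0.05: two-tailed ±1.96, one-tailed 1.64
# alpha = 0.01: two-tailed ±2.58, one-tailed 2.33
```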
Review: Critical values, p-values, and significance level
There are two criteria we use to assess whether our data meet the thresholds established by our chosen significance level, and they both have to do with our discussions of probability and distributions. Recall that probability refers to the likelihood of an event, given some situation or set of conditions. In hypothesis testing, that situation is the assumption that the null hypothesis value is the correct value, or that there is no effect. The value laid out in H0 is our condition under which we interpret our results. To reject this assumption, and thereby reject the null hypothesis, we need results that would be very unlikely if the null was true.
Now recall that values of z which fall in the tails of the standard normal distribution represent unlikely values. That is, the proportion of the area under the curve as or more extreme than z is very small as we get into the tails of the distribution. Our significance level corresponds to the area under the tail that is exactly equal to α: if we use our normal criterion of α = .05, then 5% of the area under the curve becomes what we call the rejection region (also called the critical region) of the distribution. This is illustrated in Figure 4.
Figure 4: The rejection region for a one-tailed test
The shaded rejection region takes up 5% of the area under the curve. Any result which falls in that region is sufficient evidence to reject the null hypothesis.
The rejection region is bounded by a specific z-value, as is any area under the curve. In hypothesis testing, the value corresponding to a specific rejection region is called the critical value, z crit ("z-crit") or z* (hence the other name "critical region"). Finding the critical value works exactly the same as finding the z-score corresponding to any area under the curve, as we did in Unit 1. If we go to the normal table, we will find that the z-score leaving 5% of the area in the tail is 1.645 (z = 1.64 leaves 0.0505 in the tail and z = 1.65 leaves 0.0495, so .05 is exactly in between them) if we go to the right, and −1.645 if we go to the left. The direction must be determined by your alternative hypothesis, and drawing and then shading the distribution is helpful for keeping directionality straight.
Suppose, however, that we want to do a non-directional test. We need to put the critical region in both tails, but we don’t want to increase the overall size of the rejection region (for reasons we will see later). To do this, we simply split it in half so that an equal proportion of the area under the curve falls in each tail’s rejection region. For α = .05, this means 2.5% of the area is in each tail, which, based on the z-table, corresponds to critical values of z* = ±1.96. This is shown in Figure 5.
Figure 5: Two-tailed rejection region
Thus, any z-score falling outside ±1.96 (greater than 1.96 in absolute value) falls in the rejection region. When we use z-scores in this way, the obtained value of z (sometimes called z-obtained) is something known as a test statistic, which is simply an inferential statistic used to test a null hypothesis.
Calculate the test statistic: Z
Now that we understand setting up the hypothesis and determining the outcome, let's examine hypothesis testing with z! The next step is to carry out the study and get the actual results for our sample. Central to the hypothesis test is the comparison of the population and sample means. To make our calculation and determine where the sample falls in the hypothesized distribution, we calculate the z for the sample data.
Make a decision
To decide whether to reject the null hypothesis, we compare our sample’s Z score to the Z score that marks our critical boundary. If our sample Z score falls inside the rejection region of the comparison distribution (is greater than the z-score critical boundary) we reject the null hypothesis.
The formula for our z-statistic has not changed: z = (X̄ − μ) / (σ/√n).
To formally test our hypothesis, we compare our obtained z-statistic to our critical z-value. If z obt > z crit , that means it falls in the rejection region (to see why, draw a line for z = 2.5 on Figure 1 or Figure 2) and so we reject H 0 . If z obt < z crit , we fail to reject. Remember that as z gets larger, the corresponding area under the curve beyond z gets smaller. Thus, the proportion, or p-value, will be smaller than the area for α, and if the area is smaller, the probability gets smaller. Specifically, the probability of obtaining that result, or a more extreme result, under the condition that the null hypothesis is true gets smaller.
Conversely, if we fail to reject, we know that the proportion will be larger than α because the z-statistic will not be as far into the tail. This is illustrated for a one- tailed test in Figure 6.
Figure 6. Relation between α, z obt , and p
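To make the relation between α, the obtained z, and the p-value concrete, here is a small Python check (using z = 2.5, the value referenced above; SciPy assumed):

```python
from scipy.stats import norm

z_obt = 2.5
p_one_tailed = 1 - norm.cdf(z_obt)             # area beyond z in a single tail
p_two_tailed = 2 * (1 - norm.cdf(abs(z_obt)))  # both tails, for a non-directional test

print(round(p_one_tailed, 4), round(p_two_tailed, 4))
# 0.0062 0.0124 -- both below alpha = .05, so z = 2.5 lands in the rejection region
```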
When the null hypothesis is rejected, the effect is said to be statistically significant . Do not confuse statistical significance with practical significance. A small effect can be highly significant if the sample size is large enough.
Why does the word “significant” in the phrase “statistically significant” mean something so different from other uses of the word? Interestingly, this is because the meaning of “significant” in everyday language has changed. It turns out that when the procedures for hypothesis testing were developed, something was “significant” if it signified something. Thus, finding that an effect is statistically significant signifies that the effect is real and not due to chance. Over the years, the meaning of “significant” changed, leading to the potential misinterpretation.
Review: Steps of the Hypothesis Testing Process
The process of testing hypotheses follows a simple four-step procedure. This process will be what we use for the remainder of the textbook and course, and though the hypotheses and statistics we use will change, this process will not.
Step 1: State the Hypotheses
Your hypotheses are the first thing you need to lay out. Otherwise, there is nothing to test! You have to state the null hypothesis (which is what we test) and the alternative hypothesis (which is what we expect). These should be stated mathematically as they were presented above AND in words, explaining in normal English what each one means in terms of the research question.
Step 2: Find the Critical Values
Next, we formally lay out the criteria we will use to test our hypotheses. There are two pieces of information that inform our critical values: α, which determines how much of the area under the curve composes our rejection region, and the directionality of the test, which determines where the region will be.
Step 3: Compute the Test Statistic
Once we have our hypotheses and the standards we use to test them, we can collect data and calculate our test statistic, in this case z . This step is where the vast majority of differences in future chapters will arise: different tests used for different data are calculated in different ways, but the way we use and interpret them remains the same.
Step 4: Make the Decision
Finally, once we have our obtained test statistic, we can compare it to our critical value and decide whether we should reject or fail to reject the null hypothesis. When we do this, we must interpret the decision in relation to our research question, stating what we concluded, what we based our conclusion on, and the specific statistics we obtained.
Example: Movie Popcorn
Let's see how hypothesis testing works in action by working through an example. Say that a movie theater owner likes to keep a very close eye on how much popcorn goes into each bag sold, so he knows that the average bag has 8 cups of popcorn and that this varies a little bit, about half a cup. That is, the known population mean is μ = 8.00 and the known population standard deviation is σ = 0.50. The owner wants to make sure that the newest employee is filling bags correctly, so over the course of a week he randomly assesses 25 bags filled by the employee to test for a difference (n = 25). He doesn't want bags overfilled or underfilled, so he looks for differences in both directions. This scenario has all of the information we need to begin our hypothesis testing procedure.
Our manager is looking for a difference in the mean cups of popcorn bags compared to the population mean of 8. We will need both a null and an alternative hypothesis written both mathematically and in words. We’ll always start with the null hypothesis:
H 0 : There is no difference in the cups of popcorn bags from this employee H 0 : μ = 8.00
Notice that we phrase the hypothesis in terms of the population parameter μ, which in this case would be the true average cups of bags filled by the new employee.
Our assumption of no difference, the null hypothesis, is that this mean is exactly the same as the known population mean value we want it to match, 8.00. Now let's do the alternative:
H A : There is a difference in the cups of popcorn bags from this employee H A : μ ≠ 8.00
In this case, we don’t know if the bags will be too full or not full enough, so we do a two-tailed alternative hypothesis that there is a difference.
Our critical values are based on two things: the directionality of the test and the level of significance. We decided in step 1 that a two-tailed test is the appropriate directionality. We were given no information about the level of significance, so we assume that α = 0.05 is what we will use. As stated earlier in the chapter, the critical values for a two-tailed z-test at α = 0.05 are z* = ±1.96. These will be the criteria we use to test our hypothesis. We can now draw out our distribution so we can visualize the rejection region and make sure it makes sense.
Figure 7: Rejection region for z* = ±1.96
Step 3: Calculate the Test Statistic
Now we come to our formal calculations. Let's say that the manager collects data and finds that the average cups of this employee's popcorn bags is X̄ = 7.75 cups. We can now plug this value, along with the values presented in the original problem, into our equation for z: z = (7.75 − 8.00) / (0.50/√25) = −0.25 / 0.10 = −2.50.
So our test statistic is z = -2.50, which we can draw onto our rejection region distribution:
Figure 8: Test statistic location
Looking at Figure 8, we can see that our obtained z-statistic falls in the rejection region. We can also directly compare it to our critical value: in terms of absolute value, 2.50 > 1.96, so we reject the null hypothesis. We can now write our conclusion:
Based on the sample of 25 bags, we conclude that the average cups of popcorn in bags filled by the new employee (X̄ = 7.75) is statistically significantly different from the target of 8.00 cups, z = −2.50, p < .05.
When we write our conclusion, we write out the words to communicate what it actually means, but we also include the sample mean we calculated (the exact location doesn't matter, just somewhere that flows naturally and makes sense) and the z-statistic and p-value. We don't know the exact p-value, but we do know that because we rejected the null, it must be less than α.
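To verify the popcorn numbers, a quick Python sketch (SciPy assumed; only the values given in the scenario are used):

```python
import math
from scipy.stats import norm

x_bar, mu, sigma, n = 7.75, 8.00, 0.50, 25

z = (x_bar - mu) / (sigma / math.sqrt(n))
p_value = 2 * (1 - norm.cdf(abs(z)))     # two-tailed test

print(round(z, 2), round(p_value, 4))
# -2.5 0.0124 -> |z| exceeds 1.96 and p < .05, so we reject H0
```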
Effect Size
When we reject the null hypothesis, we are stating that the difference we found was statistically significant, but we have mentioned several times that this tells us nothing about practical significance. To get an idea of the actual size of what we found, we can compute a new statistic called an effect size. Effect sizes give us an idea of how large, important, or meaningful a statistically significant effect is.
For mean differences like we calculated here, our effect size is Cohen’s d :
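In symbols (matching the worked calculations later in the chapter), d is the raw mean difference divided by the population standard deviation:

d = (X̄ − μ) / σ

For the popcorn example, d = (7.75 − 8.00) / 0.50 = −0.50, a medium-sized effect in absolute value.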
Effect sizes are incredibly useful and provide important information and clarification that overcomes some of the weakness of hypothesis testing. Whenever you find a significant result, you should always calculate an effect size.
d | Interpretation |
---|---|
0.0 – 0.2 | negligible |
0.2 – 0.5 | small |
0.5 – 0.8 | medium |
0.8 and above | large |
Table 1. Interpretation of Cohen’s d
Example: Office Temperature
Let's do another example to solidify our understanding. Let's say that the office building you work in is supposed to be kept at 74 degrees Fahrenheit but is allowed to vary by 1 degree in either direction. You suspect that, as a cost-saving measure, the temperature was secretly set higher. You set up a formal way to test your hypothesis.
You start by laying out the null hypothesis:
H 0 : There is no difference in the average building temperature H 0 : μ = 74
Next you state the alternative hypothesis. You have reason to suspect a specific direction of change, so you make a one-tailed test:
H A : The average building temperature is higher than claimed H A : μ > 74
Now that you have everything set up, you spend one week collecting temperature data:
Day | Temp |
Monday | 77 |
Tuesday | 76 |
Wednesday | 74 |
Thursday | 78 |
Friday | 78 |
You calculate the average of these scores to be 𝑋̅ = 76.6 degrees. You use this to calculate the test statistic, using μ = 74 (the supposed average temperature), σ = 1.00 (how much the temperature should vary), and n = 5 (how many data points you collected):
z = (76.60 − 74.00) / (1.00/√5) = 2.60 / 0.45 ≈ 5.78
This value falls so far into the tail that it cannot even be plotted on the distribution!
Figure 9: Obtained z-statistic
You compare your obtained z-statistic, z = 5.78, to the critical value, z* = 1.645, and find that z > z*. Therefore you reject the null hypothesis, concluding: Based on 5 observations, the average temperature (X̄ = 76.6 degrees) is statistically significantly higher than it is supposed to be, z = 5.78, p < .05.
d = (76.60 − 74.00) / 1.00 = 2.60
The effect size you calculate is definitely large, meaning someone has some explaining to do!
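A Python sketch of the same calculation (the hand computation above rounds the standard error to 0.45, which gives 5.78; carrying full precision gives about 5.81, and either way the decision is the same):

```python
import math

temps = [77, 76, 74, 78, 78]
x_bar = sum(temps) / len(temps)          # 76.6
mu, sigma, n = 74.0, 1.0, len(temps)

z = (x_bar - mu) / (sigma / math.sqrt(n))
d = (x_bar - mu) / sigma                 # Cohen's d

print(round(x_bar, 1), round(z, 2), round(d, 2))
# 76.6 5.81 2.6 -- still far beyond the one-tailed cut-off of 1.645
```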
Example: Different Significance Level
First, let’s take a look at an example phrased in generic terms, rather than in the context of a specific research question, to see the individual pieces one more time. This time, however, we will use a stricter significance level, α = 0.01, to test the hypothesis.
We will use 60 as an arbitrary null hypothesis value: H0: The average score does not differ from the population; H0: μ = 60.
We will assume a two-tailed test: HA: The average score does differ; HA: μ ≠ 60.
We have seen the critical values for z-tests at the α = 0.05 level of significance several times. To find the values for α = 0.01, we will go to the standard normal table and find the z-score cutting off 0.005 (0.01 divided by 2 for a two-tailed test) of the area in each tail, which is z crit* = ±2.575. Notice that this cutoff is much higher than it was for α = 0.05. This is because we need much less of the area in the tail, so we need to go very far out to find the cutoff. As a result, this will require a much larger effect or much larger sample size in order to reject the null hypothesis.
We can now calculate our test statistic. The average of 10 scores is M = 60.40 with µ = 60. We will use σ = 10 as our known population standard deviation. From this information, we calculate our z-statistic as: z = (60.40 − 60) / (10/√10) = 0.40 / 3.16 ≈ 0.13.
Our obtained z-statistic, z = 0.13, is very small. It is much less than our critical value of 2.575. Thus, this time, we fail to reject the null hypothesis. Our conclusion would look something like: Based on the sample of 10 scores (M = 60.40), we fail to find a statistically significant difference from the hypothesized average of 60, z = 0.13, p > .01.
Notice two things about the end of the conclusion. First, we wrote that p is greater than α instead of less than, as we did in the previous two examples. This is because we failed to reject the null hypothesis. We don't know exactly what the p-value is, but we know it must be larger than the α level we used to test our hypothesis. Second, we used 0.01 instead of the usual 0.05, because this time we tested at a different level. The number you compare to the p-value should always be the significance level you test at. Because we did not detect a statistically significant effect, we do not need to calculate an effect size. Note: some statisticians suggest always calculating an effect size because of the possibility of a Type II error. Although the result was not significant, calculating d = (60.4 − 60)/10 = 0.04 suggests essentially no effect (and thus little concern about a Type II error).
Review Considerations in Hypothesis Testing
Errors in Hypothesis Testing
Keep in mind that rejecting the null hypothesis is not an all-or-nothing decision. The Type I error rate is affected by the α level: the lower the α level the lower the Type I error rate. It might seem that α is the probability of a Type I error. However, this is not correct. Instead, α is the probability of a Type I error given that the null hypothesis is true. If the null hypothesis is false, then it is impossible to make a Type I error. The second type of error that can be made in significance testing is failing to reject a false null hypothesis. This kind of error is called a Type II error.
Statistical Power
The statistical power of a research design is the probability of rejecting the null hypothesis given the sample size and expected relationship strength. Statistical power is the complement of the probability of committing a Type II error. Clearly, researchers should be interested in the power of their research designs if they want to avoid making Type II errors. In particular, they should make sure their research design has adequate power before collecting data. A common guideline is that a power of .80 is adequate. This means that there is an 80% chance of rejecting the null hypothesis for the expected relationship strength.
Given that statistical power depends primarily on relationship strength and sample size, there are essentially two steps you can take to increase statistical power: increase the strength of the relationship or increase the sample size. Increasing the strength of the relationship can sometimes be accomplished by using a stronger manipulation or by more carefully controlling extraneous variables to reduce the amount of noise in the data (e.g., by using a within-subjects design rather than a between-subjects design). The usual strategy, however, is to increase the sample size. For any expected relationship strength, there will always be some sample large enough to achieve adequate power.
Inferential statistics uses data from a sample of individuals to reach conclusions about the whole population. The degree to which our inferences are valid depends upon how we selected the sample (sampling technique) and the characteristics (parameters) of population data. Statistical analyses assume that sample(s) and population(s) meet certain conditions called statistical assumptions.
It is easy to check assumptions when using statistical software and it is important as a researcher to check for violations; if violations of statistical assumptions are not appropriately addressed then results may be interpreted incorrectly.
Learning Objectives
Having read the chapter, students should be able to:
- Conduct a hypothesis test using a z-score statistic, locate the critical region, and make a statistical decision, including a written conclusion.
- Explain the purpose of measuring effect size and power, and be able to compute Cohen’s d.
Exercises – Ch. 10
- List the main steps for hypothesis testing with the z-statistic. When and why do you calculate an effect size?
- For each of the following, determine whether you would reject or fail to reject the null hypothesis:
  - z = 1.99, two-tailed test at α = 0.05
  - z = 1.99, two-tailed test at α = 0.01
  - z = 1.99, one-tailed test at α = 0.05
- You are part of a trivia team and have tracked your team’s performance since you started playing, so you know that your scores are normally distributed with μ = 78 and σ = 12. Recently, a new person joined the team, and you think the scores have gotten better. Use hypothesis testing to see if the average score has improved based on the following 8 weeks’ worth of score data: 82, 74, 62, 68, 79, 94, 90, 81, 80.
- A study examines self-esteem and depression in teenagers. A sample of 25 teens with a low self-esteem are given the Beck Depression Inventory. The average score for the group is 20.9. For the general population, the average score is 18.3 with σ = 12. Use a two-tail test with α = 0.05 to examine whether teenagers with low self-esteem show significant differences in depression.
- You get hired as a server at a local restaurant, and the manager tells you that servers’ tips are $42 on average but vary about $12 (μ = 42, σ = 12). You decide to track your tips to see if you make a different amount, but because this is your first job as a server, you don’t know if you will make more or less in tips. After working 16 shifts, you find that your average nightly amount is $44.50 from tips. Test for a difference between this value and the population mean at the α = 0.05 level of significance.
Answers to Odd-Numbered Exercises – Ch. 10
1. List hypotheses. Determine critical region. Calculate z. Compare z to critical region. Draw Conclusion. We calculate an effect size when we find a statistically significant result to see if our result is practically meaningful or important
5. Step 1: H0: μ = 42, "My average tips do not differ from other servers"; HA: μ ≠ 42, "My average tips do differ from others"
Introduction to Statistics for Psychology Copyright © 2021 by Alisa Beyer is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Z Test: Definition & Two Proportion Z-Test
What Is a Z Test?
A Z test is a type of hypothesis test. For example, if someone said they had found a new drug that cures cancer, you would want to be sure it was probably true. A hypothesis test will tell you if it's probably true, or probably not true. A Z test is used when your data are approximately normally distributed (i.e., the data have the shape of a bell curve when you graph them).
When You Can Run a Z Test
Several different types of tests are used in statistics (i.e. f test , chi square test , t test ). You would use a Z test if:
- Your sample size is greater than 30 . Otherwise, use a t test .
- Data points should be independent from each other. In other words, one data point isn’t related or doesn’t affect another data point.
- Your data should be normally distributed . However, for large sample sizes (over 30) this doesn’t always matter.
- Your data should be randomly selected from a population, where each item has an equal chance of being selected.
- Sample sizes should be equal if at all possible.
How do I run a Z Test?
Running a Z test on your data requires five steps:
- State the null hypothesis and alternate hypothesis .
- Choose an alpha level .
- Find the critical value of z in a z table .
- Calculate the z test statistic (see below).
- Compare the test statistic to the critical z value and decide if you should support or reject the null hypothesis .
You could perform all these steps by hand. For example, you could find a critical value by hand, or calculate a z value by hand. You could also use technology, for example:
- Two sample z test in Excel .
- Find a critical z value on the TI 83 .
- Find a critical value on the TI 89 (left-tail) .
Two Proportion Z-Test
A Two Proportion Z-Test (or Z-interval) allows you to calculate the true difference in proportions of two independent groups to a given confidence interval .
There are a few familiar conditions that need to be met for the Two Proportion Z-Interval to be valid.
- The groups must be independent. Subjects can be in one group or the other, but not both – like teens and adults.
- The data must be selected randomly and independently from a homogenous population. A survey is a common example.
- The population should be at least ten times bigger than the sample size. If the population is teenagers for example, there should be at least ten times as many total teenagers as the number of teenagers being surveyed.
- The null hypothesis (H 0 ) for the test is that the proportions are the same.
- The alternate hypothesis (H 1 ) is that the proportions are not the same.
Example question: let’s say you’re testing two flu drugs A and B. Drug A works on 41 people out of a sample of 195. Drug B works on 351 people in a sample of 605. Are the two drugs comparable? Use a 5% alpha level .
Step 1: Find the two proportions:
- P 1 = 41/195 = 0.21 (that’s 21%)
- P 2 = 351/605 = 0.58 (that’s 58%).
Set these numbers aside for a moment.
Step 2: Find the overall sample proportion . The numerator will be the total number of “positive” results for the two samples and the denominator is the total number of people in the two samples.
- p = (41 + 351) / (195 + 605) = 0.49.
Set this number aside for a moment.
Step 3: Insert the numbers into the test statistic formula: z = (P1 − P2) / √[ p(1 − p)(1/n1 + 1/n2) ], where p is the overall sample proportion from Step 2. Solving the formula, we get Z = 8.99 (in absolute value).
Step 4: Find the critical z-value from the z-table. For a 5% alpha level and a two-tailed test, the critical value is 1.96. We need to find out if our calculated z-score falls into the "rejection region."
Step 5: Compare the calculated z-score from Step 3 with the table z-score from Step 4. If the calculated z-score is larger, you can reject the null hypothesis.
8.99 > 1.96, so we can reject the null hypothesis .
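A quick numerical check in Python (using only the counts from the example):

```python
import math

x1, n1 = 41, 195      # drug A: successes, sample size
x2, n2 = 351, 605     # drug B: successes, sample size

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)            # overall sample proportion from Step 2 (0.49)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(round(abs(z), 2))   # 8.99 -> larger than 1.96, so reject the null hypothesis
```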
Example 2: Suppose that in a survey of 700 women and 700 men, 35% of women and 30% of men indicated that they support a particular presidential candidate. Let’s say we wanted to find the true difference in proportions of these two groups to a 95% confidence interval .
At first glance the survey indicates that women support the candidate more than men by about 5% . However, for this statistical inference to be valid we need to construct a range of values to a given confidence interval.
To do this, we use the formula for the Two Proportion Z-Interval: (p1 − p2) ± z √[ p1(1 − p1)/n1 + p2(1 − p2)/n2 ].
Plugging in the values (p1 = 0.35, p2 = 0.30, n1 = n2 = 700, z = 1.96 for 95% confidence), we find the true difference in proportions to be 0.05 ± 0.049.
Based on the results of the survey, we are 95% confident that the difference in proportions of women and men that support the presidential candidate is between about 0% and 10%.
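The same interval, verified with a short Python sketch (the proportions and sample sizes are those from the survey example):

```python
import math

p1, n1 = 0.35, 700    # women supporting the candidate
p2, n2 = 0.30, 700    # men supporting the candidate
z_star = 1.96         # 95 percent confidence

se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff = p1 - p2
lower, upper = diff - z_star * se, diff + z_star * se

print(round(lower, 3), round(upper, 3))   # 0.001 0.099 -> roughly 0% to 10%
```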
Null Hypothesis: Definition, Rejecting & Examples
By Jim Frost
What is a Null Hypothesis?
The null hypothesis in statistics states that there is no difference between groups or no relationship between variables. It is one of two mutually exclusive hypotheses about a population in a hypothesis test.
- Null Hypothesis H 0 : No effect exists in the population.
- Alternative Hypothesis H A : The effect exists in the population.
In every study or experiment, researchers assess an effect or relationship. This effect can be the effectiveness of a new drug, building material, or other intervention that has benefits. There is a benefit or connection that the researchers hope to identify. Unfortunately, no effect may exist. In statistics, we call this lack of an effect the null hypothesis. Researchers assume that this notion of no effect is correct until they have enough evidence to suggest otherwise, similar to how a trial presumes innocence.
In this context, the analysts don’t necessarily believe the null hypothesis is correct. In fact, they typically want to reject it because that leads to more exciting finds about an effect or relationship. The new vaccine works!
You can think of it as the default theory that requires sufficiently strong evidence to reject. Like a prosecutor, researchers must collect sufficient evidence to overturn the presumption of no effect. Investigators must work hard to set up a study and a data collection system to obtain evidence that can reject the null hypothesis.
Null Hypothesis Examples
Null hypotheses start as research questions that the investigator rephrases as a statement indicating there is no effect or relationship.
Research Question | Null Hypothesis |
---|---|
Does the vaccine prevent infections? | The vaccine does not affect the infection rate. |
Does the new additive increase product strength? | The additive does not affect mean product strength. |
Does the exercise intervention increase bone mineral density? | The intervention does not affect bone mineral density. |
As screen time increases, does test performance decrease? | There is no relationship between screen time and test performance. |
After reading these examples, you might think they’re a bit boring and pointless. However, the key is to remember that the null hypothesis defines the condition that the researchers need to discredit before suggesting an effect exists.
Let’s see how you reject the null hypothesis and get to those more exciting findings!
When to Reject the Null Hypothesis
So, you want to reject the null hypothesis, but how and when can you do that? To start, you’ll need to perform a statistical test on your data. The following is an overview of performing a study that uses a hypothesis test.
The first step is to devise a research question and the appropriate null hypothesis. After that, the investigators need to formulate an experimental design and data collection procedures that will allow them to gather data that can answer the research question. Then they collect the data. For more information about designing a scientific study that uses statistics, read my post 5 Steps for Conducting Studies with Statistics .
After data collection is complete, statistics and hypothesis testing enter the picture. Hypothesis testing takes your sample data and evaluates how consistent they are with the null hypothesis. The p-value is a crucial part of the statistical results because it quantifies how strongly the sample data contradict the null hypothesis.
When the sample data provide sufficient evidence, you can reject the null hypothesis. In a hypothesis test, this process involves comparing the p-value to your significance level .
Rejecting the Null Hypothesis
Reject the null hypothesis when the p-value is less than or equal to your significance level. Your sample data favor the alternative hypothesis, which suggests that the effect exists in the population. For a mnemonic device, remember—when the p-value is low, the null must go!
When you can reject the null hypothesis, your results are statistically significant. Learn more about Statistical Significance: Definition & Meaning .
Failing to Reject the Null Hypothesis
Conversely, when the p-value is greater than your significance level, you fail to reject the null hypothesis. The sample data provides insufficient data to conclude that the effect exists in the population. When the p-value is high, the null must fly!
Note that failing to reject the null is not the same as proving it. For more information about the difference, read my post about Failing to Reject the Null .
That’s a very general look at the process. But I hope you can see how the path to more exciting findings depends on being able to rule out the less exciting null hypothesis that states there’s nothing to see here!
Let’s move on to learning how to write the null hypothesis for different types of effects, relationships, and tests.
How to Write a Null Hypothesis
The null hypothesis varies by the type of statistic and hypothesis test. Remember that inferential statistics use samples to draw conclusions about populations. Consequently, when you write a null hypothesis, it must make a claim about the relevant population parameter . Further, that claim usually indicates that the effect does not exist in the population. Below are typical examples of writing a null hypothesis for various parameters and hypothesis tests.
Group Means
T-tests and ANOVA assess the differences between group means. For these tests, the null hypothesis states that there is no difference between group means in the population. In other words, the experimental conditions that define the groups do not affect the mean outcome. Mu (µ) is the population parameter for the mean, and you’ll need to include it in the statement for this type of study.
For example, an experiment compares the mean bone density changes for a new osteoporosis medication. The control group does not receive the medicine, while the treatment group does. The null states that the mean bone density changes for the control and treatment groups are equal.
- Null Hypothesis H 0 : Group means are equal in the population: µ 1 = µ 2 , or µ 1 – µ 2 = 0
- Alternative Hypothesis H A : Group means are not equal in the population: µ 1 ≠ µ 2 , or µ 1 – µ 2 ≠ 0.
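For instance, a null hypothesis of equal group means can be tested directly in software. Here is a hypothetical sketch using SciPy's two-sample t-test; the bone-density figures are invented purely to illustrate the decision rule, not taken from any real study:

```python
from scipy import stats

# Hypothetical bone-density changes (percent) for the control and treatment groups
control   = [0.2, -0.1, 0.4, 0.0, 0.3, 0.1, -0.2, 0.2]
treatment = [0.9,  0.6, 1.1, 0.7, 0.8, 1.0,  0.5, 0.9]

t_stat, p_value = stats.ttest_ind(treatment, control)

alpha = 0.05
if p_value <= alpha:
    print(f"p = {p_value:.4g} <= {alpha}: reject H0 -- the group means differ")
else:
    print(f"p = {p_value:.4g} > {alpha}: fail to reject H0")
```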
Group Proportions
Proportions tests assess the differences between group proportions. For these tests, the null hypothesis states that there is no difference between group proportions. Again, the experimental conditions did not affect the proportion of events in the groups. P is the population proportion parameter that you’ll need to include.
For example, a vaccine experiment compares the infection rate in the treatment group to the control group. The treatment group receives the vaccine, while the control group does not. The null states that the infection rates for the control and treatment groups are equal.
- Null Hypothesis H 0 : Group proportions are equal in the population: p 1 = p 2 .
- Alternative Hypothesis H A : Group proportions are not equal in the population: p 1 ≠ p 2 .
Correlation and Regression Coefficients
Some studies assess the relationship between two continuous variables rather than differences between groups.
In these studies, analysts often use either correlation or regression analysis . For these tests, the null states that there is no relationship between the variables. Specifically, it says that the correlation or regression coefficient is zero. As one variable increases, there is no tendency for the other variable to increase or decrease. Rho (ρ) is the population correlation parameter and beta (β) is the regression coefficient parameter.
For example, a study assesses the relationship between screen time and test performance. The null states that there is no correlation between this pair of variables. As screen time increases, test performance does not tend to increase or decrease.
- Null Hypothesis H 0 : The correlation in the population is zero: ρ = 0.
- Alternative Hypothesis H A : The correlation in the population is not zero: ρ ≠ 0.
For all these cases, the analysts define the hypotheses before the study. After collecting the data, they perform a hypothesis test to determine whether they can reject the null hypothesis.
The preceding examples are all for two-tailed hypothesis tests. To learn about one-tailed tests and how to write a null hypothesis for them, read my post One-Tailed vs. Two-Tailed Tests .
Reader Interactions
January 10, 2024 at 1:23 pm
Hi Jim, In your comment you state that equivalence test null and alternate hypotheses are reversed. For hypothesis tests of data fits to a probability distribution, the null hypothesis is that the probability distribution fits the data. Is this correct?
January 10, 2024 at 2:15 pm
Those are two separate things, equivalence testing and normality tests. But, yes, you're correct for both.
Hypotheses are switched for equivalence testing. You need to “work” (i.e., collect a large sample of good quality data) to be able to reject the null that the groups are different to be able to conclude they’re the same.
With typical hypothesis tests, if you have low quality data and a low sample size, you’ll fail to reject the null that they’re the same, concluding they’re equivalent. But that’s more a statement about the low quality and small sample size than anything to do with the groups being equal.
So, equivalence testing make you work to obtain a finding that the groups are the same (at least within some amount you define as a trivial difference).
For normality testing, and other distribution tests, the null states that the data follow the distribution (normal or whatever). If you reject the null, you have sufficient evidence to conclude that your sample data don’t follow the probability distribution. That’s a rare case where you hope to fail to reject the null. And it suffers from the problem I describe above where you might fail to reject the null simply because you have a small sample size. In that case, you’d conclude the data follow the probability distribution but it’s more that you don’t have enough data for the test to register the deviation. In this scenario, if you had a larger sample size, you’d reject the null and conclude it doesn’t follow that distribution.
I don’t know of any equivalence testing type approach for distribution fit tests where you’d need to work to show the data follow a distribution, although I haven’t looked for one either!
February 20, 2022 at 9:26 pm
Is a null hypothesis regularly (always) stated in the negative? “there is no” or “does not”
February 23, 2022 at 9:21 pm
Typically, the null hypothesis includes an equal sign. The null hypothesis states that the population parameter equals a particular value. That value is usually one that represents no effect. In the case of a one-sided hypothesis test, the null still contains an equal sign but it’s “greater than or equal to” or “less than or equal to.” If you wanted to translate the null hypothesis from its native mathematical expression, you could use the expression “there is no effect.” But the mathematical form more specifically states what it’s testing.
It’s the alternative hypothesis that typically contains does not equal.
There are some exceptions. For example, in an equivalence test where the researchers want to show that two things are equal, the null hypothesis states that they’re not equal.
In short, the null hypothesis states the condition that the researchers hope to reject. They need to work hard to set up an experiment and data collection that’ll gather enough evidence to be able to reject the null condition.