Hypothesis Testing (Paired and Unpaired T-tests)

Barbara Yam

4 May 2020

# Q1

test1 <- read.csv("bp1.csv")
head(test1)
##   Sample   CoA   CoB
## 1      1 -11.0 -32.0
## 2      2  -4.8   7.5
## 3      3  -2.5   1.8
## 4      4 -25.2 -14.0
## 5      5  -4.6  11.2
## 6      6  -8.5 -30.2
summary(test1)
##      Sample           CoA               CoB         
##  Min.   : 1.00   Min.   :-25.200   Min.   :-32.000  
##  1st Qu.: 6.75   1st Qu.:-10.850   1st Qu.:-14.825  
##  Median :12.50   Median : -8.650   Median : -5.100  
##  Mean   :12.50   Mean   : -9.229   Mean   : -6.283  
##  3rd Qu.:18.25   3rd Qu.: -5.575   3rd Qu.:  3.125  
##  Max.   :24.00   Max.   : -0.300   Max.   : 18.900

The summary shows that CoA has a lower mean (-9.229) than CoB (-6.283). However, since there is no prior reason to expect a difference in one particular direction, it is safer to use a two-tailed test for the hypothesis testing.

Q1i. H0 : Drug A and Drug B have similar effects in reducing blood pressure. Mean of Drug A results - Mean of Drug B results = 0
H1: One drug is better than the other drug in reducing blood pressure. Mean of Drug A results - Mean of Drug B results is not 0.

Q1ii.

diff1 <- test1$CoA - test1$CoB
shapiro.test(diff1)
## 
##  Shapiro-Wilk normality test
## 
## data:  diff1
## W = 0.93116, p-value = 0.1035
var(test1$CoA)
## [1] 29.73433
var(test1$CoB)
## [1] 183.8545
var.test(test1$CoA,test1$CoB)
## 
##  F test to compare two variances
## 
## data:  test1$CoA and test1$CoB
## F = 0.16173, num df = 23, denom df = 23, p-value = 4.683e-05
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.06996222 0.37385588
## sample estimates:
## ratio of variances 
##          0.1617275

The unpaired t-test is used because the following criteria were checked:
i. Drug A and Drug B are administered to 2 independent sample groups.
ii. The Shapiro-Wilk normality test (applied here to CoA - CoB) gives p = 0.1035 > 0.05, so the distribution of the data is not significantly different from a normal distribution. In other words, we can assume normality.
iii. The variances of CoA and CoB are 29.73433 and 183.8545, which suggests that the variances are not equal. The F test, with p = 4.683e-05 < 0.05, confirms that the variances differ significantly. As a result, the Welch Two Sample t-test, which does not assume equal variances, is used.
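As a rough check of criterion iii, the F statistic and its p-value can be recomputed from the two variances printed above. This is only a sketch: the variable names are my own, and the group size of 24 is taken from the reported degrees of freedom (23 and 23).

```r
# Sample variances as reported in the output above
var_a <- 29.73433   # var(test1$CoA)
var_b <- 183.8545   # var(test1$CoB)
n <- 24             # observations per group (num df = denom df = 23)

# The F statistic is the ratio of the two sample variances
f_stat <- var_a / var_b

# Two-sided p-value: double the smaller tail of the F distribution
p_lower <- pf(f_stat, df1 = n - 1, df2 = n - 1)
p_value <- 2 * min(p_lower, 1 - p_lower)

print(f_stat)
print(p_value)
```

The two printed values should agree closely with the `F` and `p-value` lines of the `var.test` output, up to rounding of the inputs.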

Q1iii.

t.test(test1$CoA,test1$CoB, alternative="two.sided", var.equal = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  test1$CoA and test1$CoB
## t = -0.98747, df = 30.25, p-value = 0.3312
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.036249  3.144582
## sample estimates:
## mean of x mean of y 
## -9.229167 -6.283333

Since p = 0.3312 > 0.05, at the 5% significance level (95% confidence level) there is insufficient evidence to reject H0. The 95% confidence interval for the difference in means (-9.036 to 3.145) also contains 0, so there is no conclusive evidence that one drug is better than the other.

Neither company can claim that its drug is better than the other.
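To see where the Welch result comes from, the t statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value can be rebuilt from the summary statistics reported above. This is a minimal sketch with variable names of my own choosing; it uses rounded inputs, so the last digits may differ slightly from the `t.test` output.

```r
# Summary statistics as reported in the outputs above
mean_a <- -9.229167; mean_b <- -6.283333
var_a  <- 29.73433;  var_b  <- 183.8545
n_a    <- 24;        n_b    <- 24

# Squared standard error contributed by each group
se2_a <- var_a / n_a
se2_b <- var_b / n_b

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean_a - mean_b) / sqrt(se2_a + se2_b)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_a + se2_b)^2 /
  (se2_a^2 / (n_a - 1) + se2_b^2 / (n_b - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The two-sided p-value is compared directly with 0.05; no halving of the threshold is needed because both tails are already included in `p_value`.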

Q2.

test2 <- read.csv("bp2.csv")
head(test2)
##     CoA   CoB
## 1 168.2 139.2
## 2 158.5 150.3
## 3 131.2 116.8
## 4 140.2  98.2
## 5 128.5 125.9
## 6 125.6 146.8
Q2i. H0 : Drug A and Drug B have similar effects in reducing blood pressure. Mean of Drug A results - Mean of Drug B results = 0
H1: One drug is better than the other drug in reducing blood pressure. Mean of Drug A results - Mean of Drug B results is not 0.
summary(test2)
##       CoA             CoB       
##  Min.   :104.0   Min.   : 94.8  
##  1st Qu.:135.0   1st Qu.:126.4  
##  Median :145.2   Median :140.2  
##  Mean   :144.4   Mean   :139.1  
##  3rd Qu.:155.3   3rd Qu.:155.6  
##  Max.   :168.2   Max.   :193.8
diff2 <- test2$CoA - test2$CoB
shapiro.test(diff2)
## 
##  Shapiro-Wilk normality test
## 
## data:  diff2
## W = 0.96491, p-value = 0.5445
OutVals <- boxplot(diff2)$out

OutVals
## numeric(0)
var(test2$CoA)
## [1] 209.9891
var(test2$CoB)
## [1] 540.713
var.test(test2$CoA,test2$CoB)
## 
##  F test to compare two variances
## 
## data:  test2$CoA and test2$CoB
## F = 0.38836, num df = 23, denom df = 23, p-value = 0.02752
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.1680002 0.8977394
## sample estimates:
## ratio of variances 
##           0.388356

The paired t-test is used because the following criteria were checked:
i. Drug A and Drug B are administered to the same individuals over different periods. The same target group is used.
ii. The Shapiro-Wilk normality test on the differences gives p = 0.5445 > 0.05, so the distribution of the differences is not significantly different from a normal distribution. In other words, we can assume normality.
iii. OutVals and the boxplot indicate that there are no significant outliers in the differences between the two related groups.
iv. The variances of CoA and CoB are 209.9891 and 540.713, which suggests that the variances are not equal. The F test, with p = 0.02752 < 0.05, confirms this. Unequal variances do not affect the paired test, which is carried out on the differences only.
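The reason the checks focus on the differences is that a paired t-test is the same as a one-sample t-test of the differences against 0. The sketch below illustrates this with simulated values, not the `bp2.csv` data; `drug_a`, `drug_b` and the seed are hypothetical choices made only for illustration.

```r
set.seed(42)  # arbitrary seed so the simulated example is reproducible

# Simulated stand-ins for two readings on the same 24 individuals
drug_a <- rnorm(24, mean = 145, sd = 15)
drug_b <- drug_a - rnorm(24, mean = 5, sd = 20)

# Paired t-test on the two columns
paired_res <- t.test(drug_a, drug_b, alternative = "two.sided", paired = TRUE)

# One-sample t-test on the within-subject differences
one_sample_res <- t.test(drug_a - drug_b, mu = 0, alternative = "two.sided")

# Both approaches give the same statistic and p-value
print(unname(paired_res$statistic))
print(unname(one_sample_res$statistic))
print(c(paired = paired_res$p.value, one_sample = one_sample_res$p.value))
```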

Q2iii.

t.test(test2$CoA,test2$CoB, alternative="two.sided",paired = TRUE)
## 
##  Paired t-test
## 
## data:  test2$CoA and test2$CoB
## t = 1.1762, df = 23, p-value = 0.2516
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.012238 14.587238
## sample estimates:
## mean of the differences 
##                  5.2875

Since the p-value 0.2516 > 0.05, at the 5% significance level (95% confidence level) there is insufficient evidence to reject the null hypothesis. The 95% confidence interval for the mean difference (-4.012 to 14.587) also contains 0.

Therefore, there is no conclusive evidence that either drug is better than the other.

More samples could be gathered to increase the power of the test; with the current data, there is simply no evidence that one drug is better than the other at the 95% confidence level.
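As a rough cross-check of the paired result, the p-value and the 95% confidence interval can be rebuilt from the mean difference, t statistic and degrees of freedom printed above. This is a sketch with variable names of my own; because it starts from rounded values, it matches the `t.test` output only to about three decimal places.

```r
# Values as reported in the paired t-test output above
mean_diff <- 5.2875
t_stat    <- 1.1762
df_paired <- 23

# Standard error of the mean difference, recovered from t = mean_diff / se
se_diff <- mean_diff / t_stat

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_paired)

# 95% confidence interval: mean difference +/- critical t value * standard error
t_crit <- qt(0.975, df = df_paired)
ci <- mean_diff + c(-1, 1) * t_crit * se_diff

print(p_value)
print(ci)
```

Because the interval spans 0, it leads to the same conclusion as comparing the p-value with 0.05.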
