# R-programming-statistics

R is a programming language with an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series, and statistical inference, to name a few. Most R libraries are written in R itself. Many large companies use the R programming language, including Uber, Google, Airbnb, and Facebook.


## Two sample t-test

> x = c(70, 82, 78, 74, 94, 82)
> n = length(x)
> # y holds the m = 8 observations of the second sample
> y = c(64, 72, 60, 76, 72, 80, 84, 68)
> m = length(y)
> # We will test H0: μ1 = μ2 versus H1: μ1 > μ2.
> x_bar = mean(x)
> s_x = sd(x)
> y_bar = mean(y)
> s_y = sd(y)
> s_p = sqrt(((n - 1) * s_x ^ 2 + (m - 1) * s_y ^ 2) / (n + m - 2))
> t = ((x_bar - y_bar) - 0) / (s_p * sqrt(1 / n + 1 / m))
> t
[1] 1.823369
> 1 - pt(t, df = n + m - 2)
[1] 0.04661961
> t.test(x, y, alternative = c("greater"), var.equal = TRUE)

Two Sample t-test

data:  x and y
t = 1.8234, df = 12, p-value = 0.04662
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
0.1802451       Inf
sample estimates:
mean of x mean of y
80        72
> t_test_data = data.frame(values = c(x, y), group = c(rep("A", length(x)), rep("B", length(y))))
> t_test_data
values group
1      70     A
2      82     A
3      78     A
4      74     A
5      94     A
6      82     A
7      64     B
8      72     B
9      60     B
10     76     B
11     72     B
12     80     B
13     84     B
14     68     B
> t.test(values ~ group, data = t_test_data, alternative = c("greater"), var.equal = TRUE)
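The manual steps in the transcript above can be collected into one function and checked against `t.test()`. This is a minimal sketch, assuming equal variances and the one-sided alternative H1: μ1 > μ2 used in this example; `pooled_t_test` is a hypothetical helper name, not something from the original post or from base R.

```r
# Hypothetical helper (not from the original post): pooled two-sample t-test
# for H0: mu1 = mu2 versus H1: mu1 > mu2, assuming equal variances.
pooled_t_test <- function(x, y) {
  n <- length(x)
  m <- length(y)
  # Pooled standard deviation from the two sample variances
  s_p <- sqrt(((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2))
  t_stat <- (mean(x) - mean(y)) / (s_p * sqrt(1 / n + 1 / m))
  df <- n + m - 2
  # Upper-tail p-value, matching alternative = "greater"
  p_value <- pt(t_stat, df = df, lower.tail = FALSE)
  list(statistic = t_stat, df = df, p_value = p_value)
}

x <- c(70, 82, 78, 74, 94, 82)
y <- c(64, 72, 60, 76, 72, 80, 84, 68)

manual <- pooled_t_test(x, y)
builtin <- t.test(x, y, alternative = "greater", var.equal = TRUE)

# Both comparisons should be TRUE if the manual steps match t.test()
all.equal(manual$statistic, unname(builtin$statistic))
all.equal(manual$p_value, builtin$p.value)
```

Returning a list keeps the statistic, degrees of freedom, and p-value together, much as the object returned by `t.test()` does.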

## What is the basic structure of the hypothesis test?

A model and its assumptions are stated first; the most common assumption is that the observations follow a normal distribution.
The null hypothesis (H0) and the alternative hypothesis (H1) are specified. Usually the null specifies a particular value of a parameter.
With the given data, the value of the test statistic is calculated.
Under the general assumptions, and assuming H0 is true, the distribution of the test statistic is known.

Given the distribution and value of the test statistic, and the form of H1, we can calculate the p-value of the test.
Based on the p-value and a pre-specified level of significance, we make one of two decisions: fail to reject H0, or reject H0.
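
The final decision step can be written as a small function. This is only an illustrative sketch: `decide` is a hypothetical helper, the default significance level of 0.05 is an assumption, and the two p-values are the ones reported by the t-tests in this post.

```r
# Hypothetical helper: turn a p-value and a significance level into a decision.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "Reject H0" else "Fail to reject H0"
}

decide(0.04662)  # two-sample example: "Reject H0"
decide(0.1322)   # one-sample example: "Fail to reject H0"
```

With a stricter level such as `alpha = 0.01`, the two-sample result would no longer lead to rejection, which is why the level should be fixed before looking at the data.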

### One sample t-test in R

We take a random sample of 9 observations and test H0: μ = 16 versus H1: μ < 16.

> apt_crisp = data.frame(weight = c(15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2))
> x_bar = mean(apt_crisp$weight)
> s = sd(apt_crisp$weight)
> mu_0 = 16
> n = 9
> t = (x_bar - mu_0) / (s / sqrt(n))
> t
[1] -1.2
> pt(t, df = n - 1)
[1] 0.1322336
> t.test(x = apt_crisp$weight, mu = 16, alternative = c("less"), conf.level = 0.95)

One Sample t-test

data:  apt_crisp$weight
t = -1.2, df = 8, p-value = 0.1322
alternative hypothesis: true mean is less than 16
95 percent confidence interval:
-Inf 16.05496
sample estimates:
mean of x
15.9
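
As a worked check of the statistic in the transcript above: the sample mean is 15.9, the sample standard deviation works out to 0.25 for these nine weights, and the hypothesized mean is 16, so

$$
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{15.9 - 16}{0.25 / \sqrt{9}} = \frac{-0.1}{0.08333\ldots} = -1.2
$$

which agrees with the value reported by `t.test()`, on n − 1 = 8 degrees of freedom.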

> apt_test_results = t.test(apt_crisp$weight, mu = 16,
+ alternative = c("two.sided"), conf.level = 0.95)
> names(apt_test_results)
[1] "statistic"   "parameter"   "p.value"
[4] "conf.int"    "estimate"    "null.value"
[7] "stderr"      "alternative" "method"
[10] "data.name"
> qt(0.975, df = 8)
[1] 2.306004
> apt_test_results$conf.int
[1] 15.70783 16.09217
attr(,"conf.level")
[1] 0.95
> c(mean(apt_crisp$weight) - qt(0.975, df = 8) * sd(apt_crisp$weight) / sqrt(9),
+   mean(apt_crisp$weight) + qt(0.975, df = 8) * sd(apt_crisp$weight) / sqrt(9))
[1] 15.70783 16.09217
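
The hand-computed interval above hard-codes `0.975`, `df = 8`, and `sqrt(9)`. A small function can derive those from the data and a chosen confidence level instead. This is a sketch under the same normality assumption; `t_conf_int` is a hypothetical helper name, not part of base R.

```r
# Hypothetical helper: two-sided t confidence interval for a population mean.
t_conf_int <- function(x, conf_level = 0.95) {
  n <- length(x)
  # Critical value leaves (1 - conf_level) / 2 in each tail
  crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  margin <- crit * sd(x) / sqrt(n)
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

weight <- c(15.5, 16.2, 16.1, 15.8, 15.6, 16.0, 15.8, 15.9, 16.2)
ci_95 <- t_conf_int(weight)
ci_90 <- t_conf_int(weight, conf_level = 0.90)

# Cross-check against the interval stored in the t.test() result; should be TRUE
all.equal(unname(ci_95), as.numeric(t.test(weight, mu = 16)$conf.int))
```

Lowering `conf_level` to 0.90 gives a narrower interval, since a smaller critical value is used.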