For linear regression in R, one uses the lm() function to create a model in the standard form fit = lm(Y ~ X).

> library(alr3) # assumption: the snake and water datasets come from the alr3 package
> data(snake)
>
> attach(snake)
> dim(snake)
[1] 17  2
> head(snake)
     X    Y
1 23.1 10.5
2 32.8 16.7
3 31.8 18.2
4 32.0 17.0
5 30.4 16.3
6 24.0 10.5
> names(snake) = c("Independent", "dependent")
> attach(snake) # reattach the data with the new names
> head(snake)
  Independent dependent
1        23.1      10.5
2        32.8      16.7
3        31.8      18.2
4        32.0      17.0
5        30.4      16.3
6        24.0      10.5
>
> plot(Independent, dependent, xlab="water content of snow", ylab="water yield")
>
> # For linear regression in R, one uses the lm() function to create a model in the standard form fit = lm(Y ~ X).
> yield.fit = lm(dependent~Independent)
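
For a single predictor, the estimates that lm() returns can also be computed by hand, which may help to show what the fit is doing. The sketch below is only an illustration: it uses the six rows printed by head(snake) above rather than all 17 observations, so its numbers will not match yield.fit, and the names x, y and fit_ols are made up for this example.

```r
# First six observations, copied from the head(snake) output above.
# The full dataset has 17 rows, so these estimates differ from yield.fit.
x <- c(23.1, 32.8, 31.8, 32.0, 30.4, 24.0)  # Independent
y <- c(10.5, 16.7, 18.2, 17.0, 16.3, 10.5)  # dependent

# Closed-form least-squares estimates for one predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same estimates obtained from lm()
fit_ols <- lm(y ~ x)

print(c(intercept = intercept, slope = slope))
print(coef(fit_ols))
```

Both print() calls should show the same pair of numbers, since lm() solves the same least-squares problem.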
>
> summary(yield.fit)

Call:
lm(formula = dependent ~ Independent)

Residuals:
    Min      1Q  Median      3Q     Max
-2.1793 -1.5149 -0.3624  1.6276  3.1973

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.72538    1.54882   0.468    0.646
Independent  0.49808    0.04952  10.058 4.63e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.743 on 15 degrees of freedom
Multiple R-squared:  0.8709, Adjusted R-squared:  0.8623
F-statistic: 101.2 on 1 and 15 DF,  p-value: 4.632e-08
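
Several of the numbers in this summary are tied together by simple arithmetic, and recomputing them is a quick check on how to read the table. The sketch below copies the printed values by hand (so small rounding differences are expected) and uses variable names invented for this example.

```r
# Values copied from the summary(yield.fit) output above
estimate  <- 0.49808  # slope for Independent
std_error <- 0.04952  # its standard error
r_squared <- 0.8709   # Multiple R-squared
df_resid  <- 15       # residual degrees of freedom
n_pred    <- 1        # number of predictors

# t value = estimate / standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# F-statistic from R-squared
f_stat <- (r_squared / n_pred) / ((1 - r_squared) / df_resid)

# Adjusted R-squared; n - 1 equals df_resid + n_pred
adj_r2 <- 1 - (1 - r_squared) * (df_resid + n_pred) / df_resid

print(round(c(t = t_value, F = f_stat, adj.R2 = adj_r2), 4))
print(signif(p_value, 3))
```

With one predictor, the F-statistic is also the square of the slope's t value, up to rounding of the printed inputs.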

>
> plot(Independent,dependent)
> abline(yield.fit, lwd=3, col="red")
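
The red line drawn by abline() is the set of fitted values, and predict() gives the same values for any new input. Here is a minimal sketch, again limited to the six rows shown by head(snake), so the prediction is illustrative only. The names snake6 and fit6 and the new value of 28 are hypothetical. It passes data = to lm() instead of relying on attach(), which keeps the variable lookup explicit.

```r
# First six observations from the head(snake) output above (not the full 17 rows)
snake6 <- data.frame(
  Independent = c(23.1, 32.8, 31.8, 32.0, 30.4, 24.0),
  dependent   = c(10.5, 16.7, 18.2, 17.0, 16.3, 10.5)
)

# Fit with an explicit data argument instead of attach()
fit6 <- lm(dependent ~ Independent, data = snake6)

# Hypothetical new water content value to predict for
new_obs <- data.frame(Independent = 28)

# Point prediction with a 95% confidence interval for the mean response
pred <- predict(fit6, newdata = new_obs, interval = "confidence")

# The same point prediction computed as intercept + slope * 28
manual <- sum(coef(fit6) * c(1, 28))

print(pred)
print(manual)
```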
>

> par(mfrow=c(2,2))
>
> plot(yield.fit)
>
> library(car) # provides qqPlot()
> qqPlot(yield.fit)
[1]  7 10
>
> data(water)
>
> str(water)
'data.frame': 43 obs. of  8 variables:
 $ Year   : int  1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 ...
 $ APMAM  : num  9.13 5.28 4.2 4.6 7.15 9.7 5.02 6.7 10.5 9.1 ...
 $ APSAB  : num  3.58 4.82 3.77 4.46 4.99 5.65 1.45 7.44 5.85 6.13 ...
 $ APSLAKE: num  3.91 5.2 3.67 3.93 4.88 4.91 1.77 6.51 3.38 4.08 ...
 $ OPBPC  : num  4.1 7.55 9.52 11.14 16.34 ...
 $ OPRC   : num  7.43 11.11 12.2 15.15 20.05 ...
 $ OPSLAKE: num  6.47 10.26 11.35 11.13 22.81 ...
 $ BSAAM  : int  54235 67567 66161 68094 107080 67594 65356 67909 92715 70024 ...
>
> socal.water = water[ ,-1] #new dataframe with the deletion of column 1
> head(socal.water)
  APMAM APSAB APSLAKE OPBPC  OPRC OPSLAKE  BSAAM
1  9.13  3.58    3.91  4.10  7.43    6.47  54235
2  5.28  4.82    5.20  7.55 11.11   10.26  67567
3  4.20  3.77    3.67  9.52 12.20   11.35  66161
4  4.60  4.46    3.93 11.14 15.15   11.13  68094
5  7.15  4.99    4.88 16.34 20.05   22.81 107080
6  9.70  5.65    4.91  8.88  8.15    7.41  67594
> water.cor = cor(socal.water)
> water.cor
            APMAM      APSAB    APSLAKE      OPBPC      OPRC
APMAM   1.0000000 0.82768637 0.81607595 0.12238567 0.1544155
APSAB   0.8276864 1.00000000 0.90030474 0.03954211 0.1056396
APSLAKE 0.8160760 0.90030474 1.00000000 0.09344773 0.1063836
OPBPC   0.1223857 0.03954211 0.09344773 1.00000000 0.8647073
OPRC    0.1544155 0.10563959 0.10638359 0.86470733 1.0000000
OPSLAKE 0.1075421 0.02961175 0.10058669 0.94334741 0.9191447
BSAAM   0.2385695 0.18329499 0.24934094 0.88574778 0.9196270
           OPSLAKE     BSAAM
APMAM   0.10754212 0.2385695
APSAB   0.02961175 0.1832950
APSLAKE 0.10058669 0.2493409
OPBPC   0.94334741 0.8857478
OPRC    0.91914467 0.9196270
OPSLAKE 1.00000000 0.9384360
BSAAM   0.93843604 1.0000000
> library(corrplot)
> corrplot(water.cor, method="ellipse")
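
The ellipse plot is easier to read alongside a plain list of the strongly correlated pairs. The sketch below rebuilds the matrix from the water.cor output printed above, rounded to three decimals, and lists every pair with an absolute correlation above 0.8. The 0.8 cutoff and the names vars, r, idx and strong are choices made for this example only.

```r
vars <- c("APMAM", "APSAB", "APSLAKE", "OPBPC", "OPRC", "OPSLAKE", "BSAAM")

# Correlations copied from the water.cor output above, rounded to 3 decimals
r <- matrix(c(
  1.000, 0.828, 0.816, 0.122, 0.154, 0.108, 0.239,
  0.828, 1.000, 0.900, 0.040, 0.106, 0.030, 0.183,
  0.816, 0.900, 1.000, 0.093, 0.106, 0.101, 0.249,
  0.122, 0.040, 0.093, 1.000, 0.865, 0.943, 0.886,
  0.154, 0.106, 0.106, 0.865, 1.000, 0.919, 0.920,
  0.108, 0.030, 0.101, 0.943, 0.919, 1.000, 0.938,
  0.239, 0.183, 0.249, 0.886, 0.920, 0.938, 1.000
), nrow = 7, byrow = TRUE, dimnames = list(vars, vars))

# Upper triangle only, so each pair appears once and the diagonal is skipped
idx <- which(abs(r) > 0.8 & upper.tri(r), arr.ind = TRUE)

strong <- data.frame(
  var1 = vars[idx[, "row"]],
  var2 = vars[idx[, "col"]],
  r    = r[idx]
)

# Strongest correlations first
strong <- strong[order(-abs(strong$r)), ]
print(strong, row.names = FALSE)
```

With the full data loaded, the same filter could be applied directly to water.cor instead of the hand-copied matrix.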
