Sunday, March 22, 2020

Black-Scholes formula-R


> BlackScholes <- function(TypeFlag = c("c", "p"), S, X, Time, r, b, sigma) {
+     TypeFlag = TypeFlag[1]
+     d1 = (log(S/X) + (b + sigma * sigma/2) * Time)/(sigma * sqrt(Time))
+     d2 = d1 - sigma * sqrt(Time)
+     if (TypeFlag == "c")
+         price = S * exp((b - r) * Time) * pnorm(d1) - X * exp(-r * Time) * pnorm(d2)
+     else if (TypeFlag == "p")
+         price = X * exp(-r * Time) * pnorm(-d2) - S * exp((b - r) * Time) * pnorm(-d1)
+     param <- list(TypeFlag = TypeFlag, S = S, X = X, Time = Time, r = r, b = b, sigma = sigma)
+     ans <- list(parameters = param, price = price, option = "Black Scholes")
+     class(ans) <- c("option", "list")
+     ans
+ }
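As a quick sanity check of the function above, an at-the-money call should come out near the textbook value of about 10.45. The parameter values below (S = X = 100, one year to expiry, r = b = 0.05, sigma = 0.2) are my own illustrative choices, not from the original post; the function is repeated so the sketch runs on its own.

```r
# The BlackScholes() pricer from above, slightly condensed so this sketch
# is self-contained; parameters below are illustrative only.
BlackScholes <- function(TypeFlag = c("c", "p"), S, X, Time, r, b, sigma) {
    TypeFlag = TypeFlag[1]
    d1 = (log(S/X) + (b + sigma * sigma/2) * Time)/(sigma * sqrt(Time))
    d2 = d1 - sigma * sqrt(Time)
    if (TypeFlag == "c")
        price = S * exp((b - r) * Time) * pnorm(d1) - X * exp(-r * Time) * pnorm(d2)
    else if (TypeFlag == "p")
        price = X * exp(-r * Time) * pnorm(-d2) - S * exp((b - r) * Time) * pnorm(-d1)
    list(price = price, option = "Black Scholes")
}

call <- BlackScholes("c", S = 100, X = 100, Time = 1, r = 0.05, b = 0.05, sigma = 0.2)
put  <- BlackScholes("p", S = 100, X = 100, Time = 1, r = 0.05, b = 0.05, sigma = 0.2)
round(call$price, 4)            # about 10.4506, the standard textbook value
# Put-call parity check: C - P should equal S - X * exp(-r * Time)
round(call$price - put$price, 4)
```

Put-call parity is a useful independent check: it must hold exactly for any arbitrage-free pricer, regardless of the volatility input.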


> library("ggplot2", lib.loc="~/R/win-library/3.6")
> x <- seq(-4, 4, l = 50)
> y <- x
> myf <- function(x, y) {
+     sin(x) + cos(y)
+ }
> z <- outer(x, y, FUN = myf)
> persp(x, y, z, theta = 45, phi = 45, shade = 0.45)
> Cap = floor(Capitalization/1000)
> Cap
                2003  2004 2005  2006  2007 2008
Euronext US     1328 12707 3632 15421 15650 9208
TSX Group        888  1177 1482  1700  2186 1033
Australian SE    585   776  804  1095  1298  683
Bombay SE        278   386  553   818  1819  647
Hong Kong SE     714   861 1054  1714  2654 1328
NSE India        252   363  515   774  1660  600
Shanghai SE      360   314  286   917  3694 1425
Tokyo SE        2953  3557 4572  4614  4330 3115
BME Spanish SE   726   940  959  1322  1781  948
Deutsche Boerse 1079  1194 1221  1637  2105 1110
London SE       2460  2865 3058  3794  3851 1868
Euronext EU     2076  2441 2706  3712  4222 2101
SIX SE           727   826  935  1212  1271  857
> barplot(t(Cap)/1e+06, beside = TRUE, las = 2, ylab = "Capitalization [Mio USD]")
> title(main = "Major Stock Markets")


> mtext(side = 3, "2003 - 2008")
> barplot(Cap/1e+06, beside = TRUE, ylab = "Capitalization [Mio USD]")
> palette(rainbow(13, s = 0.6, v = 0.75))
> stars(t(log(Cap)), draw.segments = TRUE, ncol = 3, nrow = 2,
+       key.loc = c(4.6, -0.5), mar = c(15, 0, 0, 0))
> mtext(side = 3, line = 2.2, text = "Growth and Decline of Major Stock Markets",
+       cex = 1.5, font = 2)
> abline(h = 0.9)

Friday, March 20, 2020

Logistic regression

We will focus on the logistic function, which is the function used in logistic regression.
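A minimal sketch of the logistic (sigmoid) function, written from its standard formula (my own illustration, not code from the original post): it maps any real-valued input into a probability between 0 and 1.

```r
# Logistic (sigmoid) function: p = 1 / (1 + exp(-x))
logistic <- function(x) 1 / (1 + exp(-x))

logistic(0)            # 0.5, the midpoint of the curve
logistic(c(-4, 0, 4))  # all outputs squeezed into (0, 1)

# The characteristic S-shaped curve
curve(logistic, from = -6, to = 6, ylab = "probability")
```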
> library(ISLR)
> data(Carseats)
> str(Carseats)
'data.frame': 400 obs. of  11 variables:
 $ Sales      : num  9.5 11.22 10.06 7.4 4.15 ...
 $ CompPrice  : num  138 111 113 117 141 124 115 136 132 132 ...
 $ Income     : num  73 48 35 100 64 113 105 81 110 113 ...
 $ Advertising: num  11 16 10 4 3 13 0 15 0 0 ...
 $ Population : num  276 260 269 466 340 501 45 425 108 131 ...
 $ Price      : num  120 83 80 97 128 72 108 120 124 124 ...
 $ ShelveLoc  : Factor w/ 3 levels "Bad","Good","Medium": 1 2 3 3 1 1 3 2 3 3 ...
 $ Age        : num  42 65 59 55 38 78 71 67 76 76 ...
 $ Education  : num  17 10 12 14 13 16 15 10 10 17 ...
 $ Urban      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 2 2 1 1 ...
 $ US         : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 1 2 ...
>
> data(Smarke)
Warning message:
In data(Smarke) : data set ‘Smarke’ not found
> data(Smarket)
> str(Smarket)
'data.frame': 1250 obs. of  9 variables:
 $ Year     : num  2001 2001 2001 2001 2001 ...
 $ Lag1     : num  0.381 0.959 1.032 -0.623 0.614 ...
 $ Lag2     : num  -0.192 0.381 0.959 1.032 -0.623 ...
 $ Lag3     : num  -2.624 -0.192 0.381 0.959 1.032 ...
 $ Lag4     : num  -1.055 -2.624 -0.192 0.381 0.959 ...
 $ Lag5     : num  5.01 -1.055 -2.624 -0.192 0.381 ...
 $ Volume   : num  1.19 1.3 1.41 1.28 1.21 ...
 $ Today    : num  0.959 1.032 -0.623 0.614 0.213 ...
 $ Direction: Factor w/ 2 levels "Down","Up": 2 2 1 2 2 2 1 2 2 2 ...
> sales.fit = lm(Sales~Advertising+ShelveLoc, data=Carseats)
>
> summary(sales.fit)

Call:
lm(formula = Sales ~ Advertising + ShelveLoc, data = Carseats)

Residuals:
    Min      1Q  Median      3Q     Max
-6.6480 -1.6198 -0.0476  1.5308  6.4098

Coefficients:
                Estimate Std. Error t value Pr(>|t|)   
(Intercept)      4.89662    0.25207  19.426  < 2e-16 ***
Advertising      0.10071    0.01692   5.951 5.88e-09 ***
ShelveLocGood    4.57686    0.33479  13.671  < 2e-16 ***
ShelveLocMedium  1.75142    0.27475   6.375 5.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.244 on 396 degrees of freedom
Multiple R-squared:  0.3733, Adjusted R-squared:  0.3685
F-statistic: 78.62 on 3 and 396 DF,  p-value: < 2.2e-16

> contrasts(Carseats$ShelveLoc)
       Good Medium
Bad       0      0
Good      1      0
Medium    0      1
> contrasts(Carseats$Urban)
    Yes
No    0
Yes   1
> contrasts(Carseats$Us)
Error in contrasts(Carseats$Us) : contrasts apply only to factors
> contrasts(Carseats$us)
Error in contrasts(Carseats$us) : contrasts apply only to factors
> contrasts(Carseats$US)
    Yes
No    0
Yes   1
> contrasts(Carseats$Price)
Error in contrasts(Carseats$Price) : contrasts apply only to factors
> lm(Today~Lag1+Lag2,data=Smarket)

Call:
lm(formula = Today ~ Lag1 + Lag2, data = Smarket)

Coefficients:
(Intercept)         Lag1         Lag2 
   0.003283    -0.026444    -0.010946 

> library("MASS", lib.loc="C:/Program Files/R/R-3.6.1/library")
> data(biopsy)
>
> str(biopsy)
'data.frame': 699 obs. of  11 variables:
 $ ID   : chr  "1000025" "1002945" "1015425" "1016277" ...
 $ V1   : int  5 5 3 6 4 8 1 2 2 4 ...
 $ V2   : int  1 4 1 8 1 10 1 1 1 2 ...
 $ V3   : int  1 4 1 8 1 10 1 2 1 1 ...
 $ V4   : int  1 5 1 1 3 8 1 1 1 1 ...
 $ V5   : int  2 7 2 3 2 7 2 2 2 2 ...
 $ V6   : int  1 10 2 4 1 10 10 1 1 1 ...
 $ V7   : int  3 3 3 3 3 9 3 3 1 2 ...
 $ V8   : int  1 2 1 7 1 7 1 1 1 1 ...
 $ V9   : int  1 1 1 1 1 1 1 1 5 1 ...
 $ class: Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 1 1 1 ...
> biopsy$ID = NULL
>
> names(biopsy) = c("thick", "u.size", "u.shape", "adhsn", "s.size",
+                   "nucl", "chrom", "n.nuc", "mit", "class")
>
> names(biopsy)
 [1] "thick"   "u.size"  "u.shape" "adhsn"   "s.size"  "nucl" 
 [7] "chrom"   "n.nuc"   "mit"     "class" 
>
> biopsy.v2 = na.omit(biopsy)
>
> library("reshape2", lib.loc="~/R/win-library/3.6")
> library("ggplot2", lib.loc="~/R/win-library/3.6")
> biop.m = melt(biopsy.v2, id.var="class")
>
> ggplot(data=biop.m, aes(x=class, y=value)) + geom_boxplot()
> +facet_wrap(~variable,ncol = 3)
Error: Cannot use `+.gg()` with a single argument. Did you accidentally put + on a new line?
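The error happens because the `+` started a new line, so R treated the first line as a complete expression. Leaving the `+` at the end of the previous line fixes it. A sketch of the corrected call, built on a small stand-in data frame (my own toy data, not the biop.m object above) so it runs on its own, assuming ggplot2 is installed:

```r
library(ggplot2)

# Stand-in for the melted biop.m data frame: one value column plus
# class and variable identifiers, as melt() would produce.
toy <- data.frame(class    = rep(c("benign", "malignant"), each = 30),
                  variable = rep(c("thick", "u.size", "nucl"), times = 20),
                  value    = runif(60, 1, 10))

# The + must end the line so the parser knows the expression continues
p <- ggplot(data = toy, aes(x = class, y = value)) +
    geom_boxplot() +
    facet_wrap(~ variable, ncol = 3)
p
```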
>
>
> library("corrplot", lib.loc="~/R/win-library/3.6")
corrplot 0.84 loaded
Warning message:
package ‘corrplot’ was built under R version 3.6.3
> bc = cor(biopsy.v2[ ,1:9])
> corrplot.mixed(bc)
>
> set.seed(123)
> ind = sample(2, nrow(biopsy.v2), replace=TRUE, prob=c(0.7, 0.3))
>
> train = biopsy.v2[ind==1,]
> test = biopsy.v2[ind==2,]
> str(test)
'data.frame': 209 obs. of  10 variables:
 $ thick  : int  5 6 4 2 1 7 6 7 1 3 ...
 $ u.size : int  4 8 1 1 1 4 1 3 1 2 ...
 $ u.shape: int  4 8 1 2 1 6 1 2 1 1 ...
 $ adhsn  : int  5 1 3 1 1 4 1 10 1 1 ...
 $ s.size : int  7 3 2 2 1 6 2 5 2 1 ...
 $ nucl   : int  10 4 1 1 1 1 1 10 1 1 ...
 $ chrom  : int  3 3 3 3 3 4 3 5 3 2 ...
 $ n.nuc  : int  2 7 1 1 1 3 1 4 1 1 ...
 $ mit    : int  1 1 1 1 1 1 1 4 1 1 ...
 $ class  : Factor w/ 2 levels "benign","malignant": 1 1 1 1 1 2 1 2 1 1 ...
 - attr(*, "na.action")= 'omit' Named int  24 41 140 146 159 165 236 250 276 293 ...
  ..- attr(*, "names")= chr  "24" "41" "140" "146" ...
> table(train$class)

   benign malignant
      302       172
>
> table(test$class)

   benign malignant
      142        67
>
> full.fit = glm(class~., family=binomial, data=train)
> summary(full.fit)

Call:
glm(formula = class ~ ., family = binomial, data = train)

Deviance Residuals:
    Min       1Q   Median       3Q      Max 
-3.3397  -0.1387  -0.0716   0.0321   2.3559 

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -9.4293     1.2273  -7.683 1.55e-14
thick         0.5252     0.1601   3.280 0.001039
u.size       -0.1045     0.2446  -0.427 0.669165
u.shape       0.2798     0.2526   1.108 0.268044
adhsn         0.3086     0.1738   1.776 0.075722
s.size        0.2866     0.2074   1.382 0.167021
nucl          0.4057     0.1213   3.344 0.000826
chrom         0.2737     0.2174   1.259 0.208006
n.nuc         0.2244     0.1373   1.635 0.102126
mit           0.4296     0.3393   1.266 0.205402
             
(Intercept) ***
thick       **
u.size       
u.shape       
adhsn       . 
s.size       
nucl        ***
chrom         
n.nuc         
mit           
---
Signif. codes: 
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 620.989  on 473  degrees of freedom
Residual deviance:  78.373  on 464  degrees of freedom
AIC: 98.373

Number of Fisher Scoring iterations: 8

>
> confint(full.fit)
Waiting for profiling to be done...
                   2.5 %     97.5 %
(Intercept) -12.23786660 -7.3421509
thick         0.23250518  0.8712407
u.size       -0.56108960  0.4212527
u.shape      -0.24551513  0.7725505
adhsn        -0.02257952  0.6760586
s.size       -0.11769714  0.7024139
nucl          0.17687420  0.6582354
chrom        -0.13992177  0.7232904
n.nuc        -0.03813490  0.5110293
mit          -0.14099177  1.0142786
> exp(coef(full.fit))
 (Intercept)        thick       u.size
8.033466e-05 1.690879e+00 9.007478e-01
     u.shape        adhsn       s.size
1.322844e+00 1.361533e+00 1.331940e+00
        nucl        chrom        n.nuc
1.500309e+00 1.314783e+00 1.251551e+00
         mit
1.536709e+00

Monday, March 16, 2020

Decision-making.

We can make powerful and insightful predictions to support decision-making.

> library(leaps)
> head(leaps)
                                                                                         
1 function (x, y, wt = rep(1, NROW(x)), int = TRUE, method = c("Cp",                     
2     "adjr2", "r2"), nbest = 10, names = NULL, df = NROW(x), strictly.compatible = TRUE)
3 {                                                                                     
4     if (!is.logical(int))                                                             
5         stop("int should be TRUE or FALSE")                                           
6     if (!is.null(names))                                                               
> fit=lm(BSAAM~., data=socal.water)
Error in is.data.frame(data) : object 'socal.water' not found
>
> x<-matrix(rnorm(100),ncol=4)
> y<-rnorm(25)
> leaps(x,y)
$which
      1     2     3     4
1 FALSE FALSE FALSE  TRUE
1  TRUE FALSE FALSE FALSE
1 FALSE FALSE  TRUE FALSE
1 FALSE  TRUE FALSE FALSE
2  TRUE FALSE FALSE  TRUE
2 FALSE FALSE  TRUE  TRUE
2 FALSE  TRUE FALSE  TRUE
2  TRUE FALSE  TRUE FALSE
2  TRUE  TRUE FALSE FALSE
2 FALSE  TRUE  TRUE FALSE
3  TRUE FALSE  TRUE  TRUE
3  TRUE  TRUE FALSE  TRUE
3 FALSE  TRUE  TRUE  TRUE
3  TRUE  TRUE  TRUE FALSE
4  TRUE  TRUE  TRUE  TRUE

$label
[1] "(Intercept)" "1"           "2"           "3"         
[5] "4"         

$size
 [1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5

$Cp
 [1] 0.291457 1.330360 2.597818 2.800123 1.479960 2.123237 2.214481
 [8] 2.669777 3.315351 4.290161 3.099933 3.465247 3.937318 4.529656
[15] 5.000000

> data(swiss)
> a<-regsubsets(as.matrix(swiss[,-1]),swiss[,1])
> summary(a)
Subset selection object
5 Variables  (and intercept)
                 Forced in Forced out
Agriculture          FALSE      FALSE
Examination          FALSE      FALSE
Education            FALSE      FALSE
Catholic             FALSE      FALSE
Infant.Mortality     FALSE      FALSE
1 subsets of each size up to 5
Selection Algorithm: exhaustive
         Agriculture Examination Education Catholic Infant.Mortality
1  ( 1 ) " "         " "         "*"       " "      " "           
2  ( 1 ) " "         " "         "*"       "*"      " "           
3  ( 1 ) " "         " "         "*"       "*"      "*"           
4  ( 1 ) "*"         " "         "*"       "*"      "*"           
5  ( 1 ) "*"         "*"         "*"       "*"      "*"           
> b<-regsubsets(Fertility~.,data=swiss,nbest=2)
> summary(b)
Subset selection object
Call: regsubsets.formula(Fertility ~ ., data = swiss, nbest = 2)
5 Variables  (and intercept)
                 Forced in Forced out
Agriculture          FALSE      FALSE
Examination          FALSE      FALSE
Education            FALSE      FALSE
Catholic             FALSE      FALSE
Infant.Mortality     FALSE      FALSE
2 subsets of each size up to 5
Selection Algorithm: exhaustive
         Agriculture Examination Education Catholic Infant.Mortality
1  ( 1 ) " "         " "         "*"       " "      " "           
1  ( 2 ) " "         "*"         " "       " "      " "           
2  ( 1 ) " "         " "         "*"       "*"      " "           
2  ( 2 ) " "         " "         "*"       " "      "*"           
3  ( 1 ) " "         " "         "*"       "*"      "*"           
3  ( 2 ) "*"         " "         "*"       "*"      " "           
4  ( 1 ) "*"         " "         "*"       "*"      "*"           
4  ( 2 ) " "         "*"         "*"       "*"      "*"           
5  ( 1 ) "*"         "*"         "*"       "*"      "*"           
> coef(a, 1:3)
[[1]]
(Intercept)   Education
 79.6100585  -0.8623503

[[2]]
(Intercept)   Education    Catholic
 74.2336892  -0.7883293   0.1109210

[[3]]
     (Intercept)        Education         Catholic Infant.Mortality
     48.67707330      -0.75924577       0.09606607       1.29614813

> vcov(a, 3)
                  (Intercept)     Education      Catholic
(Intercept)      62.711883147 -0.2349982009 -0.0011120059
Education        -0.234998201  0.0136416868  0.0004427309
Catholic         -0.001112006  0.0004427309  0.0007408169
Infant.Mortality -2.952862263  0.0033603646 -0.0017163629
                 Infant.Mortality
(Intercept)          -2.952862263
Education             0.003360365
Catholic             -0.001716363
Infant.Mortality      0.149759535
> plot(a)
> plot(a,scale="r2")

Sunday, March 15, 2020

Linear regression in R

For linear regression in R, one uses the lm() function to create a model in the standard form of fit = lm(Y~X).
> library(alr3)
> data(snake)
>
> attach(snake)
> dim(snake)
[1] 17  2
>
> head(snake)
     X    Y
1 23.1 10.5
2 32.8 16.7
3 31.8 18.2
4 32.0 17.0
5 30.4 16.3
6 24.0 10.5
> names(snake) = c("Independent", "dependent")
> attach(snake)
> head(snake)
  Independent dependent
1        23.1      10.5
2        32.8      16.7
3        31.8      18.2
4        32.0      17.0
5        30.4      16.3
6        24.0      10.5
>
> plot(Independent, dependent, xlab="water content of snow", ylab="water yield")
>
> # linear regression in R, one uses the lm() function to create a model in the
> standard form of fit = lm(Y~X).
Error: unexpected symbol in "standard form"
> yield.fit = lm(dependent~Independent)
>
> summary(yield.fit)

Call:
lm(formula = dependent ~ Independent)

Residuals:
    Min      1Q  Median      3Q     Max
-2.1793 -1.5149 -0.3624  1.6276  3.1973

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.72538    1.54882   0.468    0.646   
Independent  0.49808    0.04952  10.058 4.63e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.743 on 15 degrees of freedom
Multiple R-squared:  0.8709, Adjusted R-squared:  0.8623
F-statistic: 101.2 on 1 and 15 DF,  p-value: 4.632e-08

>
> plot(Independent,dependent)
> abline(yield.fit, lwd=3, col="red")
>

> par(mfrow=c(2,2))
>
> plot(yield.fit)
>
> library(car)
> qqPlot(yield.fit)
[1]  7 10
>
> data(water)
>
> str(water)
'data.frame': 43 obs. of  8 variables:
 $ Year   : int  1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 ...
 $ APMAM  : num  9.13 5.28 4.2 4.6 7.15 9.7 5.02 6.7 10.5 9.1 ...
 $ APSAB  : num  3.58 4.82 3.77 4.46 4.99 5.65 1.45 7.44 5.85 6.13 ...
 $ APSLAKE: num  3.91 5.2 3.67 3.93 4.88 4.91 1.77 6.51 3.38 4.08 ...
 $ OPBPC  : num  4.1 7.55 9.52 11.14 16.34 ...
 $ OPRC   : num  7.43 11.11 12.2 15.15 20.05 ...
 $ OPSLAKE: num  6.47 10.26 11.35 11.13 22.81 ...
 $ BSAAM  : int  54235 67567 66161 68094 107080 67594 65356 67909 92715 70024 ...
>
> socal.water = water[ ,-1] #new dataframe with the deletion of column 1
>
> head(socal.water)
  APMAM APSAB APSLAKE OPBPC  OPRC OPSLAKE  BSAAM
1  9.13  3.58    3.91  4.10  7.43    6.47  54235
2  5.28  4.82    5.20  7.55 11.11   10.26  67567
3  4.20  3.77    3.67  9.52 12.20   11.35  66161
4  4.60  4.46    3.93 11.14 15.15   11.13  68094
5  7.15  4.99    4.88 16.34 20.05   22.81 107080
6  9.70  5.65    4.91  8.88  8.15    7.41  67594
> water.cor = cor(socal.water)
> water.cor
            APMAM      APSAB    APSLAKE      OPBPC      OPRC
APMAM   1.0000000 0.82768637 0.81607595 0.12238567 0.1544155
APSAB   0.8276864 1.00000000 0.90030474 0.03954211 0.1056396
APSLAKE 0.8160760 0.90030474 1.00000000 0.09344773 0.1063836
OPBPC   0.1223857 0.03954211 0.09344773 1.00000000 0.8647073
OPRC    0.1544155 0.10563959 0.10638359 0.86470733 1.0000000
OPSLAKE 0.1075421 0.02961175 0.10058669 0.94334741 0.9191447
BSAAM   0.2385695 0.18329499 0.24934094 0.88574778 0.9196270
           OPSLAKE     BSAAM
APMAM   0.10754212 0.2385695
APSAB   0.02961175 0.1832950
APSLAKE 0.10058669 0.2493409
OPBPC   0.94334741 0.8857478
OPRC    0.91914467 0.9196270
OPSLAKE 1.00000000 0.9384360
BSAAM   0.93843604 1.0000000
> library("corrplot", lib.loc="~/R/win-library/3.6")
> corrplot(water.cor, method="ellipse")

Saturday, March 14, 2020

How a survey was used for the metro rail survey

How a survey was used for the Delhi metro rail survey.

A few months ago, the Delhi Metro was struggling to save its commuter-friendly reputation. A comprehensive survey was commissioned by the Delhi Metro Rail Corporation in March 2016. A total of 39,987 commuters were interviewed, which accounts for more than 2% of the total daily commuters on the metro. The survey was conducted through the random distribution of questionnaires among the commuters on all six lines.

The purpose of the survey was to assess the overall satisfaction level of commuters with the Delhi Metro, which presently caters to an average of 50 lakh commuters a day. The findings of the survey show that 90% of commuters are very happy with the ticketing facility of the metro, while only 3% felt the need for improvement. Regarding punctuality, 6% of commuters said the metro needed improvement; 3% felt the trains and stations should be tidier and cleaner, while 4% felt the comfort level needs improvement.

While 93% of commuters were satisfied with its punctuality, 85% felt that the metro stations and trains are clean and well maintained. A comfortable journey on the metro left 86% of commuters satisfied.

Friday, March 13, 2020

Univariate linear regression in data science

How to apply univariate linear regression in data science with R.

We are going to predict a quantitative response Y from one predictor variable X, where Y has a linear relationship with X.
Y = b0 + b1*X + e
b0 = intercept
b1 = slope
e = error term.
Least squares chooses the model parameters that minimize the residual sum of squares (RSS) between the predicted values of Y and the actual values of Y.
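In the one-predictor case, least squares has a closed form: b1 = cov(X, Y) / var(X) and b0 = mean(Y) - b1 * mean(X). A sketch verifying this against lm() on the built-in anscombe data (my own check, not from the original post):

```r
data(anscombe)

# Closed-form least-squares estimates for the regression y1 ~ x1
b1 <- cov(anscombe$x1, anscombe$y1) / var(anscombe$x1)
b0 <- mean(anscombe$y1) - b1 * mean(anscombe$x1)

fit <- lm(y1 ~ x1, data = anscombe)
round(c(b0 = b0, b1 = b1), 4)   # matches coef(fit)
round(coef(fit), 4)
```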

> data(anscombe)
>
> attach(anscombe)
>
> anscombe
   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89
> #correlation of x1 and y1
> cor(x1, y1)
[1] 0.8164205
> cor(x2,y2)
[1] 0.8162365
> cor(x3,y3)
[1] 0.8162867
> cor(x4,y4)
[1] 0.8165214
> cor(x2, y1)
[1] 0.8164205
> #create a 2x2 grid for plotting
>
univariate linear regression in data science
FIG1

> par(mfrow=c(2,2))
> plot(x1, y1, main="Plot 1")
>
> plot(x2, y2, main="Plot 2")
>
> plot(x3, y3, main="Plot 3")
>
> plot(x4, y4, main="Plot 4")
> # Plot 1 appears to have a true linear relationship, Plot 2 is curvilinear,
> # Plot 3 has a dangerous outlier, and Plot 4 is driven by the one outlier

Wednesday, March 11, 2020

How the R language is used in data science.

The R language is used in data science.

People have helped me write this blog; similar ideas came from the students and teachers of Introduction to Data Science with R. A data scientist who knows how to program has scientific superpowers: the computer can do all of these things quickly and error-free. R gives you a language to speak in; it gives you a way to talk to your computer. You can save data into an object like p or q, and whenever R encounters that object, it will replace it with the data saved inside. You can name an object in R almost anything you want, but there are a few rules. First, a name cannot start with a number. Second, a name cannot use certain special symbols, such as ^, €, *, $, !, +, -, @.
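A minimal sketch of saving data into objects and the naming rules (my own illustration):

```r
# Saving data into objects: whenever R encounters the object name,
# it substitutes the data stored inside.
p <- 1:6        # the numbers 1 through 6
q <- p * 2      # uses the value currently stored in p
q               # 2 4 6 8 10 12

# Valid names:   my_data, q2, total.sales
# Invalid names: 2q (starts with a number), my-data (uses a special symbol)
```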

R uses element-wise execution.

When we use two or more vectors in operations, R will line up the vectors and perform a sequence of individual operations.
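A minimal sketch of element-wise execution (my own illustration): each operation is applied to corresponding pairs of elements, and a shorter operand is recycled.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

a + b    # 11 22 33: each pair of elements is added individually
a * b    # 10 40 90: element-wise multiplication, not a dot product
a * 2    # 2 4 6: the shorter operand (length 1) is recycled across a
```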
> library("ggplot2", lib.loc="~/R/win-library/3.6")
> x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
>
> x
 [1] -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0
> y=x^3
> y
 [1] -1.000 -0.512 -0.216 -0.064 -0.008  0.000  0.008  0.064
 [9]  0.216  0.512  1.000
> qplot(x,y)
> qplot(y,x)
> qplot(x, binwidth = 1)
> qplot(y, binwidth = 1)


>
> roll <- function() {
+     die <- 1:6
+     dice <- sample(die, size = 2, replace = TRUE,
+                    prob = c(1/8, 1/8, 1/8, 1/8, 1/8, 3/8))
+ sum(dice)
+ }
>
> rolls <- replicate(10000, roll())
>
> qplot(rolls, binwidth = 1)
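The simulated distribution of rolls should center on the theoretical expected value of the loaded dice: one die has E = (1+2+3+4+5)*(1/8) + 6*(3/8) = 33/8 = 4.125, so the sum of two dice has expectation 8.25. A quick check (my own addition to the post):

```r
# Expected value of one loaded die: faces 1-5 with prob 1/8, face 6 with prob 3/8
die  <- 1:6
prob <- c(1/8, 1/8, 1/8, 1/8, 1/8, 3/8)
e.die <- sum(die * prob)   # 4.125
e.sum <- 2 * e.die         # 8.25, expected sum of two dice

# Monte Carlo check against the roll() simulation from above
roll <- function() {
    dice <- sample(die, size = 2, replace = TRUE, prob = prob)
    sum(dice)
}
set.seed(1)
rolls <- replicate(10000, roll())
mean(rolls)                # close to 8.25
```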
Conclusion

R is free to learn online.
