R-programming-statistics

R is a programming language with an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series analysis, and statistical inference, to name a few. Most R libraries are written in R itself, and many large companies use the R programming language, including Uber, Google, Airbnb, and Facebook.


Thursday, February 6, 2020

Arrays are similar to matrices

How to create an array with the array() function in R

The example below creates a three-dimensional array of numbers. Matrices are two-dimensional and, like vectors, can contain only one data type. When more than two dimensions are needed, use the array() function.
Creating an array
array() is a function

> dim1 <- c("A1", "A2")
> dim2 <- c("B1", "B2", "B3")
> dim3 <- c("C1", "C2", "C3", "C4")
> z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3))
>
> z
, , C1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4

   B1 B2 B3
A1 19 21 23
A2 20 22 24
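
Once the array exists, individual cells and whole slices can be pulled out by name or by position. The following is a minimal sketch that rebuilds the same array (the dimension names are passed inline here only to keep the block self-contained) and shows a few common lookups.

```r
# Rebuild the 2 x 3 x 4 array from the example above
z <- array(1:24, c(2, 3, 4),
           dimnames = list(c("A1", "A2"),
                           c("B1", "B2", "B3"),
                           c("C1", "C2", "C3", "C4")))

dim(z)                    # the three dimension sizes

# A single cell, by name and by position (both refer to the same element)
z["A1", "B2", "C3"]
z[1, 2, 3]

# A whole 2 x 3 slice for one level of the third dimension
z[, , "C2"]

# Sum of each slice along the third dimension
slice_sums <- apply(z, 3, sum)
slice_sums
```

Leaving an index position empty keeps every element along that dimension, which is why `z[, , "C2"]` returns a full matrix.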

> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientID, age, diabetes, status)
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
> with(mtcars, {
+     summary(mpg, disp, wt)
+     plot(mpg, disp)
+     plot(mpg, wt)
+ })
> with(mtcars, {
+     stats <- summary(mpg)
+     stats
+ })
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.40   15.43   19.20   20.09   22.80   33.90
> with(mtcars, {
+     nokeepstats <- summary(mpg)
+     keepstats <<- summary(mpg)
+ })
> nokeepstats
Error: object 'nokeepstats' not found
> keepstats
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.40   15.43   19.20   20.09   22.80   33.90 

Wednesday, February 5, 2020

Data frame R

Data Frames



A data frame is similar to the datasets used in SAS, SPSS, and Stata. In R, a data frame is created with
Data <- data.frame(col1, col2, col3, ...), where col1, col2, col3, ... are column vectors. Data frames are close to what analysts typically think of as datasets. There are several ways to identify the elements of a data frame. The example below uses student data as a data frame.


> studentID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 32)
> score <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> studentdata <- data.frame(studentID, age, score, status)
> studentdata
  studentID age score    status
1         1  25 Type1      Poor
2         2  34 Type2  Improved
3         3  28 Type1 Excellent
4         4  32 Type1      Poor
> studentdata[1:2]
  studentID age
1         1  25
2         2  34
3         3  28
4         4  32
> studentdata[c("score", "status")]
  score    status
1 Type1      Poor
2 Type2  Improved
3 Type1 Excellent
4 Type1      Poor
> studentdata$age
[1] 25 34 28 32
> table(studentdata$score, studentdata$status)
     
        Excellent Improved Poor
  Type1         1        0    2
  Type2         0        1    0
> studentdata <- data.frame(studentID, age, score, status,
+                           row.names=studentID)
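
Beyond selecting columns, rows can be filtered with a logical condition. The sketch below rebuilds a reduced version of the student data (the `score` column is omitted for brevity, and `older` and `poor_ages` are illustrative names) and shows row filtering alongside single-cell access.

```r
# A reduced copy of the student data used above
studentID <- c(1, 2, 3, 4)
age       <- c(25, 34, 28, 32)
status    <- c("Poor", "Improved", "Excellent", "Poor")
studentdata <- data.frame(studentID, age, status, stringsAsFactors = FALSE)

# Keep only the rows where age is above 30
older <- studentdata[studentdata$age > 30, ]
older

# Ages of the students whose status is "Poor", and their mean
poor_ages <- studentdata$age[studentdata$status == "Poor"]
mean(poor_ages)

# One cell: row 2, column "status"
studentdata[2, "status"]
```

The comma in `studentdata[rows, ]` matters: without it, R treats the index as a column selection.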
Importing data from Excel
Download and install the RODBC package. Note that odbcConnectExcel() is available only on Windows.
> install.packages("RODBC")
> library(RODBC)
> channel <- odbcConnectExcel("myfile.xls")
Importing data from SPSS
> install.packages("Hmisc")
> library(Hmisc)
> mydataframe <- spss.get("mydata.sav", use.value.labels=TRUE)
Importing data from SAS
SAS program (run in SAS, not in R, to export the dataset as a CSV file):
    proc export data=mydata
        outfile="mydata.csv"
        dbms=csv;
    run;
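
Once a CSV file has been exported, it can be read into a data frame with read.csv(). The sketch below is self-contained: it writes a small temporary file first, so the column names and values are made up for illustration and stand in for a real export such as mydata.csv.

```r
# Write a tiny stand-in CSV to a temporary file (hypothetical contents)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,age,status",
             "1,25,Poor",
             "2,34,Improved"), tmp)

# Read it back as a data frame; keep text columns as character vectors
mydata <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
str(mydata)

# Clean up the temporary file
unlink(tmp)
```

For a real export, replace `tmp` with the path to the CSV file produced by the SAS step.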

Vector and matrix with R



> url_to_open
[1] "http://finviz.com/export.ashx?v=152&c=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68"
> summary(finviz)
                                                                                         X..DOCTYPE.html.
 \t\t\t                <td><img src=/img/elite/no.png srcset=/img/elite/no_2x.png 2x alt=No></td>   : 24 
 \t\t                </tr>                                                                         : 18 
 \t\t                <tr>                                                                          : 18 
                 </div>                                                                          : 15 
                     </div>                                                                      : 12 
 \t\t\t                <td><img src=/img/elite/yes.png srcset=/img/elite/yes_2x.png 2x alt=Yes></td>:  8 
 (Other)                                                                                         :356 
> clean_numeric <- function(s){
+     s <- gsub("%|\\$|,|\\)|\\(", "", s)
+     s <- as.numeric(s)
+ }
> finviz <- cbind(finviz[,1:6],apply(finviz[,7:68], 2,
+                                    clean_numeric))
Error in `[.data.frame`(finviz, , 1:6) : undefined columns selected
> hist(finviz$Price, breaks=100, main="Price Distribution",
+      xlab="Price")
Error in hist.default(finviz$Price, breaks = 100, main = "Price Distribution",  :
  'x' must be numeric
> industry_avg_prices <-
+     aggregate(Price~Sector+Industry,data=finviz,FUN="mean")
Error in eval(predvars, data, env) : object 'Price' not found
> url <-
+     paste("http://sports.yahoo.com/nfl/stats/byteam?group=Offense&
+ cat=Total&conference=NFL&year=season_",year,"&sort=530&old_category=Total&old_group=Offense")
Error in paste("http://sports.yahoo.com/nfl/stats/byteam?group=Offense&\ncat=Total&conference=NFL&year=season_",  :
  object 'year' not found
> sector_avg <-
+     subset(sector_avg,variable%in%c("Price","P.E","PEG","P.S","P.B"))
Error in subset(sector_avg, variable %in% c("Price", "P.E", "PEG", "P.S",  :
  object 'sector_avg' not found
> a <- c(1, 2, 5, 3, 6, -2, 4)
> b <- c("one", "two", "three")
> c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
> a <- c(1, 2, 5, 3, 6, -2, 4)
> a[3]
[1] 5
> a[c(1, 3, 5)]
[1] 1 5 6
> a[2:6]
[1]  2  5  3  6 -2
> # General form (a template; the argument values below are placeholders, not objects):
> # mymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns,
> #                    byrow=logical_value,
> #                    dimnames=list(char_vector_rownames, char_vector_colnames))
> y <- matrix(1:20, nrow=5, ncol=4)
> M <- matrix(1:20, nrow = 5, ncol = 4)
> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
> x <- pretty(c(-5, 5), 30)
> y <- dnorm(x)
> plot(x, y, type = "l", xlab = "Normal Deviate", ylab = "Density", yaxs = "i")
> x <- pretty(c(-3,3), 30)
> y <- dnorm(x)
> plot(x, y,
+      type = "l",
+      xlab = "Normal Deviate",
+      ylab = "Density",
+      yaxs = "i"
+ )
> plot(x, y, type = "l", xlab = "Normal Deviate", ylab = "Density", yaxs = "i")
> pnorm(1.96)
[1] 0.9750021
> lm(mpg~wt, data=mtcars)

Call:
lm(formula = mpg ~ wt, data = mtcars)

Coefficients:
(Intercept)           wt
     37.285       -5.344

> lmfit <- lm(mpg~wt, data=mtcars)
> summary(lmfit)

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|) 
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

> plot(lmfit)
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
> cook<-cooks.distance(lmfit)
> plot(cook)
> # predict() needs a data frame with a wt column; the weights here are hypothetical
> mynewdata <- data.frame(wt = c(2.5, 3.5))
> predict(lmfit, mynewdata)
> help(lm)
> library("vcd", lib.loc="~/R/win-library/3.6")
Loading required package: grid
> cells <- c(1,26,24,68)
> rnames <- c("R1", "R2")
> cnames <- c("C1", "C2")
> mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
+                    dimnames=list(rnames, cnames))
> mymatrix
   C1 C2
R1  1 26
R2 24 68
> mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=FALSE,
+                    dimnames=list(rnames, cnames))
> mymatrix
   C1 C2
R1  1 24
R2 26 68
>
> x <- matrix(1:10, nrow=2)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> x[2,]
[1]  2  4  6  8 10
> x[,2]
[1] 3 4
> x[,4]
[1] 7 8
> x[1,4]
[1] 7
> x[1, c(4,5)]
[1] 7 9
> # General form (a template; the arguments are placeholders, not objects):
> # myarray <- array(vector, dimensions, dimnames)
> dim1 <- c("A1", "A2")
> dim2 <- c("B1", "B2", "B3")
> dim3 <- c("C1", "C2", "C3", "C4")
> z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3))
> z
, , C1

   B1 B2 B3
A1  1  3  5
A2  2  4  6

, , C2

   B1 B2 B3
A1  7  9 11
A2  8 10 12

, , C3

   B1 B2 B3
A1 13 15 17
A2 14 16 18

, , C4

   B1 B2 B3
A1 19 21 23
A2 20 22 24

> patientID <- c(1, 2, 3, 4)
> age <- c(25, 34, 28, 52)
> diabetes <- c("Type1", "Type2", "Type1", "Type1")
> status <- c("Poor", "Improved", "Excellent", "Poor")
> patientdata <- data.frame(patientID, age, diabetes, status)
> patientdata
  patientID age diabetes    status
1         1  25    Type1      Poor
2         2  34    Type2  Improved
3         3  28    Type1 Excellent
4         4  52    Type1      Poor
> patientdata[1,2]
[1] 25
> patientdata[1:2]
  patientID age
1         1  25
2         2  34
3         3  28
4         4  52
> patientdata[c("diabetes", "status")]
  diabetes    status
1    Type1      Poor
2    Type2  Improved
3    Type1 Excellent
4    Type1      Poor
> patientdata$age
[1] 25 34 28 52
> table(patientdata$diabetes, patientdata$status)
     
        Excellent Improved Poor
  Type1         1        0    2
  Type2         0        1    0
> attach(mtcars)
The following object is masked from package:ggplot2:

    mpg

> summary(mpg)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.40   15.43   19.20   20.09   22.80   33.90
> plot(mpg, disp)
> plot(mpg, wt)
> detach(mtcars)

Tuesday, February 4, 2020

Put call ratio & PCR in option trading

How people can use certain instruments as trading or hedging tools in commodities and stocks.

Options future and PCR indicator

What are options?
Commodity exchanges have launched options that devolve into futures if exercised. A futures contract is an instrument that facilitates the purchase or sale of an underlying commodity, such as gold or silver, at a fixed price on a future date.
SEBI has approved the launch of options by exchanges such as MCX, NCDEX, and ICEX that are based on the underlying commodities rather than on futures.
What does PCR indicate?
Options are of two types: calls and puts. From a buyer's perspective, a call is purchased when the underlying is expected to rise, and a put is purchased when the underlying is expected to correct or fall. From a seller's perspective, a call is sold when the seller expects the underlying not to rise above the strike sold plus the premium received from the buyers.
A put is sold when the seller expects the underlying not to fall below the strike sold minus the premium received from the put buyers.
Open interest is the number of open buy and sell positions created by buyers or sellers. Since for every seller there has to be a buyer, and vice versa, open interest is counted single-sided. For example, in gold options on MCX expiring on March 27, the maximum number of calls sold is at the 32,000 per 10 gram strike, which has 249 lots open, and the maximum number of puts sold is at the 30,000 strike. This means sellers expect gold to trade in the 30,000 to 32,000 range till March 27.
Similarly, if you look at the entire option chain, the total open interest in puts divided by the total open interest in calls gives the overall put-call ratio (PCR) of gold for that month's series, and it tells you what the writers expect. This stands at 1.58. It can be read in two ways: buyers have hedged themselves against a potential fall by buying more puts than calls, but sellers have sold more puts, so they are bullish on the gold price.
If gold rises, the put seller gets to keep much or all of the premium paid by the buyers, but if it falls, put buyers can make a huge profit. PCR is therefore a contrarian indicator.
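
The PCR arithmetic can be sketched in a few lines of R. The open-interest figures below are hypothetical, chosen only so that the ratio comes out near the 1.58 quoted above. They are not actual MCX data, and the variable names are illustrative.

```r
# Hypothetical open interest (in lots) by strike; not real exchange data
strikes <- c(30000, 31000, 32000)
put_oi  <- c(400, 250, 140)
call_oi <- c(100, 151, 249)

# Put-call ratio: total put open interest over total call open interest
pcr <- sum(put_oi) / sum(call_oi)
pcr

# Strikes with the heaviest put and call writing mark the expected range
support    <- strikes[which.max(put_oi)]
resistance <- strikes[which.max(call_oi)]
c(support, resistance)
```

A ratio above 1 means more puts than calls are open, and the two strikes give the band in which writers expect the price to stay.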
What are the pros and cons of options against futures?
For option buyers, the loss is limited to the premium paid, while the profit can be very high. In futures, both loss and profit can be unlimited unless stop losses are placed. Also, to buy an option one pays a premium, which is normally less than the margin put up to trade a futures contract. There is no mark-to-market daily settlement in options, unlike in futures, although the premium value of options keeps changing. However, holding options as a buyer for long is risky, as options lose value due to time decay, or theta. This is not the case with futures, which can be rolled over. For sellers, the risk of writing options is technically unlimited, but profits are limited to the premium received. Also, option sellers have to put up margin, normally equal to the futures margin, to trade. However, 8 out of 10 times sellers make money while buyers lose. Option sellers are considered better informed and financially savvier than buyers.
How does it help traders?
For an experienced trader, a very high PCR indicates that the underlying might have topped, and he may sell either a futures contract or a call option to benefit from anticipated short covering by put sellers. Again, if the ratio is much below 1, say at 0.7 or 0.8, it might signal a potential bottoming in price, as too many have shorted calls and might be forced to buy them back to square off their positions in the event of a trigger. The buying back of calls raises the price, just as the buying back of puts by short put writers causes the price to decline. However, the PCR is only one indicator, and traders should ideally also consider fundamental and technical analysis before buying or selling, as derivative trading, being leveraged, is fraught with risk.

Thursday, January 30, 2020

Log-likelihood function

We create the log-likelihood function.


> LL = function(params, rets) {
+     alpha = params[1]; sigsq = params[2]
+     # both terms of the normal log-density must sit in one expression
+     logf = -log(sqrt(2*pi*sigsq)) - (rets - alpha)^2/(2*sigsq)
+     -sum(logf)
+ }
Note that the two terms of logf must be part of a single expression. If the second term starts on a new line, R treats it as a separate statement: alpha then never moves from its starting value and sigsq collapses toward zero.
We then go ahead and do the MLE using the nlm (non-linear minimization) function in R. It uses a Newton-type algorithm.
> # create starting guess for parameters
> params = c(0.001, 0.001)
> res = nlm(LL, params, rets)
> res
The result is a list with the components $minimum, $estimate, $gradient, $code and $iterations. The estimates should land near the sample mean and variance of rets.
We now pick off the results and manipulate them to get the annualized parameters {µ, σ}.

> h = 1/252
> alpha = res$estimate[1]
> sigsq = res$estimate[2]
> sigma = sqrt(sigsq/h)
> sigma
> mu = alpha/h + 0.5*sigma^2
> mu
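
To check that a log-likelihood is set up correctly, it helps to run the same procedure on simulated returns where the true parameters are known. The sketch below is an illustration under stated assumptions, not the code used above: `rets_sim`, `negLL` and the simulation parameters are made up, `dnorm(..., log = TRUE)` replaces the hand-written density, and the variance is optimised on the log scale so that it stays positive.

```r
set.seed(42)

# Simulated daily log returns with known mean and standard deviation
rets_sim <- rnorm(5000, mean = 0.0005, sd = 0.02)

# Negative log-likelihood of an i.i.d. normal sample.
# params[2] is log(sigsq), so exp() keeps the variance positive.
negLL <- function(params, rets) {
  alpha <- params[1]
  sigsq <- exp(params[2])
  -sum(dnorm(rets, mean = alpha, sd = sqrt(sigsq), log = TRUE))
}

# Start the mean at zero and the variance at the sample variance
start <- c(0, log(var(rets_sim)))
res_sim <- nlm(negLL, start, rets = rets_sim)

alpha_hat <- res_sim$estimate[1]
sigsq_hat <- exp(res_sim$estimate[2])

# The MLE should be close to the sample mean and the (biased) sample variance
c(alpha_hat, mean(rets_sim))
c(sigsq_hat, mean((rets_sim - mean(rets_sim))^2))

# Annualize with h = 1/252, as in the text
h <- 1/252
sigma_hat <- sqrt(sigsq_hat / h)
mu_hat    <- alpha_hat / h + 0.5 * sigma_hat^2
c(mu_hat, sigma_hat)
```

If the estimates do not track the sample mean and variance, the likelihood function is the first place to look.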

Sunday, January 26, 2020

Higher-Order Moments

How to find higher-order moments of stock data

> library(quantmod)   # assumed source of the price data
> getSymbols(c("AAPL", "CSCO"))
[1] "AAPL" "CSCO"

> tail(stkdata)
           AAPL.Adjusted CSCO.Adjusted
2020-01-16        315.24         49.05
2020-01-17        318.73         49.02
2020-01-21        316.57         48.80
2020-01-22        317.70         49.07
2020-01-23        319.23         49.00
2020-01-24        318.31         48.85
> aapl = AAPL['2007-01-03::2020-01-17']
> csco = CSCO['2007-01-03::2020-01-17']
> aapl = as.matrix(AAPL[ , 6 ])
> csco = as.matrix(CSCO[ , 6 ])
> stkdata = cbind (aapl,csco)
>
> dim(stkdata)
[1] 3288    2
>
> n = length(stkdata[,1])
> n
[1] 3288
> rets = log(stkdata[2:n, ]/stkdata[1:(n-1), ])

Mean daily returns

> colMeans(rets)
AAPL.Adjusted CSCO.Adjusted 
 0.0010403556  0.0002501597 
> cv = cov ( rets )
>
> print( cv , 2 )
              AAPL.Adjusted CSCO.Adjusted
AAPL.Adjusted       0.00039       0.00018
CSCO.Adjusted       0.00018       0.00033
>
> cr = cor( rets )
>
> print ( cr,2 )
              AAPL.Adjusted CSCO.Adjusted
AAPL.Adjusted          1.00          0.49
CSCO.Adjusted          0.49          1.00
> x = matrix(rnorm(4), 2, 2)
> x
          [,1]     [,2]
[1,]  0.512371 1.176699
[2,] -2.124443 0.676594
> print(t(x), 2)
     [,1]  [,2]
[1,] 0.51 -2.12
[2,] 1.18  0.68
> print(t(x) %*% x, 2)
      [,1]  [,2]
[1,]  4.78 -0.83
[2,] -0.83  1.84
> print(x %*% t(x), 2)
      [,1]  [,2]
[1,]  1.65 -0.29
[2,] -0.29  4.97
> cv_inv = solve( cv )

> print ( cv_inv , 2 )
              AAPL.Adjusted CSCO.Adjusted
AAPL.Adjusted          3411         -1828
CSCO.Adjusted         -1828          4039

> print ( cv_inv %*% cv , 2 )
              AAPL.Adjusted CSCO.Adjusted
AAPL.Adjusted             1      -1.1e-16
CSCO.Adjusted             0       1.0e+00

> library("corpcor", lib.loc="~/R/win-library/3.6")
> is.positive.definite( cv )
[1] TRUE
> is.positive.definite(x)
[1] FALSE
> is.positive.definite( x %*% t( x ))

[1] TRUE
With the stock data in place, we can compute daily returns and then convert those returns into annualized returns.
> rets_annual = rets*252
> print (c(mean( rets ) ,mean( rets_annual)))

[1] 0.0006452577     0.1626049311
Compute the daily and annualized standard deviation of returns
> r_sd = sd( rets )
>
> r_sd_annual = r_sd*sqrt ( 252 )
> print(r_sd_annual)
[1] 0.2999346
> print(c( r_sd, r_sd_annual))
[1] 0.0188941 0.2999346
>
> print(sd (rets*252))
[1] 4.761314
>
> print(sd (rets*252))/252
[1] 4.761314
[1] 0.0188941
> print( sd(rets*252))/sqrt(252 )
[1] 4.761314
[1] 0.2999346
The variance is easy as well.
> r_var = var( rets )

> r_var_annual = var ( rets )*252
> print ( c( r_var , r_var_annual ) )
[1] 0.0003869694 0.0001751147 0.0001751147
[4] 0.0003268013 0.0975162890 0.0441289039

[7] 0.0441289039 0.0823539370
Higher-Order Moments
Skewness means one tail is fatter than the other (asymmetry). Fatter right (left) tail implies positive (negative) skewness.
> library(moments)
> skewness(rets)
AAPL.Adjusted  CSCO.Adjusted 
   -0.4691675     -0.5079582 
Kurtosis means both tails are fatter than with a normal distribution.
> kurtosis (rets)
AAPL.Adjusted CSCO.Adjusted 
     10.22829      14.64755 
Example: for the normal distribution, skewness is zero and kurtosis is 3. Kurtosis minus three is called "excess kurtosis".
> skewness(rnorm(1000000))
[1] -0.0003906446

> kurtosis(rnorm (1000000))
[1] 3.007031
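
Both statistics are ratios of central moments, so they can also be computed in base R without an add-on package. The helper names below are hypothetical. The formulas are the plain moment-ratio definitions (third and fourth central moments scaled by powers of the second), which is an assumption about the convention in use. Other packages may apply small-sample corrections.

```r
# k-th central moment, dividing by n (not n - 1)
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: third central moment over the 3/2 power of the second
skew_manual <- function(x) central_moment(x, 3) / central_moment(x, 2)^(3/2)

# Kurtosis: fourth central moment over the square of the second
kurt_manual <- function(x) central_moment(x, 4) / central_moment(x, 2)^2

# A small symmetric sample: skewness should be zero
x <- c(-2, -1, 0, 1, 2)
skew_manual(x)
kurt_manual(x)

# Excess kurtosis relative to the normal distribution
kurt_manual(x) - 3
```

A symmetric sample has zero skewness by construction, which makes it a convenient sanity check for the helper.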
Using properties of the lognormal distribution, the conditional mean of the stock price becomes E[S(t + h) | S(t)] = S(t) · e^(µh)

Annualized volatility σ

> h = 1/252
> sigma = sd( rets)/sqrt(h) 
> sigma
[1] 0.2999346
The parameter µ is also easily estimated as
> mu = mean(rets)/h+0.5*sigma^2
> mu
[1] 0.2075853
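
The two estimators above can be traced back to the lognormal model. The derivation below is a sketch that assumes the stock price follows geometric Brownian motion with drift µ and volatility σ, which is consistent with the conditional mean stated earlier. The symbol α for the mean of the per-period log return is introduced here for illustration.

```latex
% Log return over an interval of length h under geometric Brownian motion
\ln\frac{S(t+h)}{S(t)} \sim N\!\left(\alpha,\ \sigma^{2}h\right),
\qquad \alpha = \left(\mu - \tfrac{1}{2}\sigma^{2}\right)h

% Solving the variance and mean for the annualized parameters
\sigma = \frac{\operatorname{sd}(\text{rets})}{\sqrt{h}},
\qquad
\mu = \frac{\alpha}{h} + \tfrac{1}{2}\sigma^{2}

% The conditional mean of the price follows from the lognormal mean
E\!\left[S(t+h)\mid S(t)\right]
  = S(t)\,e^{\alpha + \frac{1}{2}\sigma^{2}h}
  = S(t)\,e^{\mu h}
```

With h = 1/252, dividing the daily standard deviation by √h and the daily mean by h reproduces the two R expressions used for sigma and mu.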


Sunday, January 19, 2020

Negative volume index and positive volume index


The negative volume index (NVI) and positive volume index (PVI) are used in trading rules for stock and commodity markets.
NVI and PVI are cumulative indicators, so their absolute values depend on the starting date. The absolute values will differ between a one-year chart and a three-year chart. Because of this limitation, you cannot draw any conclusion from the absolute value. Instead, a long-term moving average, normally the 200-day moving average, is plotted on the PVI and NVI charts to identify the primary market trend.
The negative volume index usually remains above its long-term average during a bull market, and the positive volume index remains below its long-term average during a bear market. There is a 95% probability that a bull market is in progress if the negative volume index is placed above its long-term moving average. The probability of a bear market when the positive volume index is placed below its long-term moving average is 67%.
The PVI or NVI crossing its own long-term average can also give important signals. We have already seen that the positive volume index will be below its long-term average during a bear market, so the PVI going below its long-term average can be treated as the starting point of a bear market. Confirmation by the positive volume index is useful for people who trade on other technical indicators such as the 200-day moving average of price. For example, the Nifty was below its 200-day moving average during August and September of 2019. Since this is a bearish signal, it would have triggered a sale if one were using only that indicator. However, the positive volume index remained above its 200-day moving average, indicating that a bear market was yet to start, and saved the trader from squaring off his position.
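
The cumulative construction of the two indexes can be sketched in R. This uses the common textbook definition, taken here as an assumption: the NVI is updated by the day's percentage price change only when volume falls from the previous day, and the PVI only when volume rises. The function name, the starting value of 1000 and the sample data are all hypothetical.

```r
# Cumulative volume index.
# direction = -1 gives the NVI (updates on falling volume),
# direction = +1 gives the PVI (updates on rising volume).
volume_index <- function(close, volume, direction = -1, start = 1000) {
  n <- length(close)
  idx <- numeric(n)
  idx[1] <- start
  for (t in 2:n) {
    ret <- close[t] / close[t - 1] - 1            # daily price return
    moved <- direction * (volume[t] - volume[t - 1]) > 0
    idx[t] <- if (moved) idx[t - 1] * (1 + ret) else idx[t - 1]
  }
  idx
}

# Made-up prices and volumes for five days
close  <- c(100, 102, 101, 103, 104)
volume <- c(500, 400, 450, 300, 350)

nvi <- volume_index(close, volume, direction = -1)
pvi <- volume_index(close, volume, direction = +1)
cbind(close, volume, nvi, pvi)
```

Because each value builds on the previous one, changing the starting date or starting value shifts the whole series. That is why the index is read against its own moving average rather than by its level.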
