Sunday, March 8, 2020

Which measure of dispersion to use


The choice of a suitable measure depends on two factors:
1. The type of data available
If the data are few in number, or contain extreme values, avoid the standard deviation. If they are generally skewed, avoid the mean deviation as well. If they have gaps around the quartiles, the quartile deviation should be avoided. If there are open-end classes, the quartile measure of dispersion should be preferred.


2. The purpose of investigation
We should make use of the standard deviation for measuring variability. In an elementary treatment of a statistical series, in which a measure of variability is desired only for itself, any of the three measures, namely range, quartile deviation and average deviation, would be acceptable; probably the average deviation would be better. However, in usual practice the measure of variability is employed in further statistical analysis. For this purpose the standard deviation is used: it is free from the defects from which the other measures suffer, and it lends itself to the analysis of variability in terms of the normal curve of error. Practically all advanced statistical methods deal with variability and centre around the standard deviation.
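To make these measures concrete, here is a small R sketch (the data vector x is hypothetical, purely for illustration) computing each of the measures named above:

x <- c(12, 15, 17, 19, 22, 24, 29, 54)   # small, skewed sample with one extreme value
diff(range(x))                           # range
IQR(x) / 2                               # quartile deviation (semi-interquartile range)
mean(abs(x - mean(x)))                   # average (mean) deviation about the mean
sd(x)                                    # standard deviation

With data like these (few observations, one extreme value), the rules above suggest preferring the quartile deviation over the standard deviation.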

Monday, March 2, 2020

How to apply functions in R


Function:

chol(x): Cholesky decomposition
col(x): matrix with the column numbers of the elements
diag(x): create a diagonal matrix from a vector
ncol(x): returns the number of columns of a matrix
nrow(x): returns the number of rows of a matrix
qr(x): QR matrix decomposition
row(x): matrix with the row numbers of the elements
solve(A, b): solve the system Ax = b
solve(x): calculate the inverse
svd(x): singular value decomposition
var(x): covariance matrix of the columns
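A minimal sketch exercising several of these functions (the matrix A and vector b are hypothetical examples):

A <- matrix(c(4, 2, 2, 3), nrow = 2)   # a symmetric positive-definite matrix
b <- c(1, 2)
nrow(A); ncol(A)                 # number of rows and columns
row(A); col(A)                   # matrices of row and column indices
diag(c(1, 2, 3))                 # 3x3 diagonal matrix from a vector
chol(A)                          # upper-triangular R with t(R) %*% R equal to A
qr(A)$rank                       # rank via the QR decomposition
svd(A)$d                         # singular values
solve(A, b)                      # solve A x = b
solve(A)                         # inverse of A
var(cbind(rnorm(10), rnorm(10))) # covariance matrix of the columns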


Wednesday, February 26, 2020

Difference between sampling and non-sampling errors


Sampling error
Sampling gives rise to certain errors known as sampling errors; these errors would not be present in a complete enumeration survey. Sampling errors can, however, be controlled.

Biased Errors
These errors arise from any bias in selection, estimation, etc.
For example, if in place of simple random sampling, deliberate sampling has been used in a particular case, some bias is introduced in the result, and hence such errors are called biased sampling errors.
Unbiased Errors
These errors arise due to chance differences between the members of the population included in the sample and those not included. An error in statistics is the difference between the value of a statistic and that of the corresponding parameter.
Causes of Bias
Bias may arise due to:
Faulty process of selection
Faulty work during collection
Faulty method of analysis.
Faulty selection
Faulty selection of the sample may give rise to bias in a number of ways:
Deliberate selection of a "representative" sample
Conscious or unconscious bias in the selection of a random sample.
Substitution
Non-response
Bias due to faulty collection of data
Any consistent error in measurement will give rise to bias, whether the measurements are carried out on a sample or on all the units of the population.
Bias in Analysis
In addition to bias arising from a faulty process of selection and faulty collection of information, faulty methods of analysis may also introduce bias.
Avoidance of Bias
If the possibility of bias exists, fully objective conclusions cannot be drawn. The first essential of any sampling or census procedure must therefore be the elimination of all sources of bias.
Sampling error usually decreases with an increase in sample size; in many situations the decrease is inversely proportional to the square root of the sample size. A sample survey can therefore provide estimates within a permissible margin of error far more cheaply than a complete enumeration survey, since in the latter the effort and cost would be substantially higher in the attempt to reduce the sampling error to zero.
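The inverse-square-root law is easy to check by simulation; a minimal sketch (assuming, for illustration, a standard normal population):

set.seed(1)
for (n in c(100, 400, 1600)) {
  means <- replicate(2000, mean(rnorm(n)))   # 2000 sample means at each size
  cat("n =", n,
      " sd of sample means =", round(sd(means), 4),
      " 1/sqrt(n) =", round(1/sqrt(n), 4), "\n")
}
# Quadrupling the sample size halves the sampling error.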
Non-sampling errors
Non-sampling errors can occur at every stage of the planning and execution of a census or survey. Such errors can arise from a number of causes, such as defective methods of data collection and tabulation, faulty definitions, incomplete coverage of the population or sample, etc.




Tuesday, February 25, 2020

Statistics and other fields

Statistics and the state

In ancient times the ruling kings and chiefs relied heavily on statistics in framing suitable military and fiscal policies. Today, statistics help in framing suitable policies for all ministries and departments of the government, whether they be finance, transport, defence, railways, commerce, posts or industries. The transport department cannot solve the problem of transport in Delhi unless it knows how many buses are operating at present, what the total requirement is and, therefore, how many additional buses should be added to the existing fleet.

Statistics and business
The higher the degree of accuracy of a businessman's estimates, the greater is the success attending his business. Business activities can be grouped under the following heads:
Production
Sale
Purchase
Finance
Personnel
Accounting
Market and product research
Quality control.
Statistics and economics
Statistical data and statistical methods are of immense help in the proper understanding of economic problems and in the formulation of economic policies:
What to produce
How to produce
For whom to produce 
These are questions that need a lot of statistical data, in the absence of which it is not possible to arrive at correct decisions. Statistics of production help in adjusting supply to demand.
Statistics and physical science
Statistical techniques have proved to be extremely useful in the study of all natural sciences like astronomy, biology, medicine, zoology, botany, etc. For example, one has to rely heavily on statistics in conducting experiments about plants, the effect of temperature, type of soil, etc.

Statistics and research

Statistical methods affect research in medicine and public health. There is hardly any research work today that can be found complete without statistical data and statistical methods.
Also, it is impossible to understand the meaning and implications of most research findings in various disciplines of knowledge without having at least a speaking acquaintance with the subject of statistics.

Sunday, February 23, 2020

Data frames are very much like spreadsheets or tables

Data frames are very much like spreadsheets or tables, but they are also a lot like databases: a sort of happy medium. If you want to join two data frames, it is much the same as joining two database tables.
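A minimal sketch of such a join with merge() (the prices and sectors data frames are hypothetical, invented for illustration):

prices  <- data.frame(ticker = c("GOOGL", "AAPL", "CSCO"),
                      price  = c(1520, 324, 46))
sectors <- data.frame(ticker = c("AAPL", "CSCO", "IBM"),
                      sector = c("Technology", "Networking", "Services"))
merge(prices, sectors, by = "ticker")                # inner join on the key column
merge(prices, sectors, by = "ticker", all.x = TRUE)  # left outer join

The transcript below appears to pick up after a quantmod-style getSymbols() call that loaded the GOOGL, AAPL and CSCO price histories; the "[1] ..." line is presumably its return value.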
[1] "GOOGL" "AAPL"  "CSCO"
> google = as.matrix(GOOGL[, 6])
> aapl = as.matrix(AAPL[, 6])
> csco = as.matrix(CSCO[, 6])
> stkdata = cbind(google, aapl, csco)
> dim( stkdata )
[1] 3307    3
> n = length(stkdata[, 1])
> n
[1] 3307
> rets = log(stkdata[2:n, ] / stkdata[1:(n-1), ])
> colMeans ( rets )
GOOGL.Adjusted  AAPL.Adjusted  CSCO.Adjusted
  0.0005585866   0.0010300534   0.0002325053
> cv = cov ( rets )
> print ( cv , 2 )
               GOOGL.Adjusted AAPL.Adjusted CSCO.Adjusted
GOOGL.Adjusted        0.00032       0.00019       0.00016
AAPL.Adjusted         0.00019       0.00039       0.00018
CSCO.Adjusted         0.00016       0.00018       0.00033
>
> cr = cor(rets )
>
> print (cr,4 )
               GOOGL.Adjusted AAPL.Adjusted CSCO.Adjusted
GOOGL.Adjusted         1.0000        0.5451        0.5030
AAPL.Adjusted          0.5451        1.0000        0.4929
CSCO.Adjusted          0.5030        0.4929        1.0000
>
> x = matrix(rnorm ( 12 ) , 4 , 3 )
> x
            [,1]      [,2]       [,3]
[1,]  0.88410486  1.176699  1.3999856
[2,]  0.09300388  0.676594 -0.7676680
[3,]  0.51237100  1.673757 -0.1873588
[4,] -2.12444302 -1.384905 -0.4829591
> print ( t(x) , 3 )
      [,1]   [,2]   [,3]   [,4]
[1,] 0.884  0.093  0.512 -2.124
[2,] 1.177  0.677  1.674 -1.385
[3,] 1.400 -0.768 -0.187 -0.483
> print (t(x)%*%x , 3 )
     [,1] [,2] [,3]
[1,] 5.57 4.90 2.10
[2,] 4.90 6.56 1.48
[3,] 2.10 1.48 2.82
> print( x%*%t(x) , 3 )
       [,1]   [,2]  [,3]   [,4]
[1,]  4.126 -0.196  2.16 -4.184
[2,] -0.196  1.056  1.32 -0.764
[3,]  2.160  1.324  3.10 -3.316
[4,] -4.184 -0.764 -3.32  6.664
>
> cv_inv = solve ( cv )
>
> print ( cv_inv , 3 )
               GOOGL.Adjusted AAPL.Adjusted CSCO.Adjusted
GOOGL.Adjusted           5027         -1782         -1529
AAPL.Adjusted           -1782          4045         -1288
CSCO.Adjusted           -1529         -1288          4503
> print(cv_inv%*%cv , 3 )
               GOOGL.Adjusted AAPL.Adjusted CSCO.Adjusted
GOOGL.Adjusted       1.00e+00      2.22e-16      1.67e-16
AAPL.Adjusted       -2.78e-17      1.00e+00     -5.55e-17
CSCO.Adjusted        1.11e-16      0.00e+00      1.00e+00
>
> library("corpcor", lib.loc="~/R/win-library/3.6")
> is.positive.definite( cv )
[1] TRUE
> n = dim( google )
> n
[1] 3307    1

Kurtosis and skewness
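(A note on provenance: skewness() and kurtosis() are not base R functions. Since kurtosis(rnorm(1000000)) below comes out near 3 rather than 0, they presumably come from a package such as moments, which reports ordinary rather than excess kurtosis.)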

> skewness ( rets )
GOOGL.Adjusted  AAPL.Adjusted  CSCO.Adjusted
     0.4842460     -0.4681109     -0.5107760
>
> kurtosis (rets )
GOOGL.Adjusted  AAPL.Adjusted  CSCO.Adjusted
      14.51975       10.18243       14.56774
>
> skewness (rnorm ( 1000000 ) )
[1] -0.0003929433
>
> kurtosis(rnorm ( 1000000 ) )
[1] 3.007031
> h = 1/252
>
> sigma = sd ( rets )/sqrt( h )
>
> sigma
[1] 0.2941101
> mu = mean( rets)/h+0.5*sigma^2
>
> mu
[1] 0.1962266
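A worked reading of these two lines (my interpretation of the standard lognormal model, not spelled out in the original): with h = 1/252, sd(rets)/sqrt(h) annualizes the daily volatility, and since the mean of a log return over a period h under geometric Brownian motion is (mu - sigma^2/2) * h, the drift is recovered as mu = mean(rets)/h + 0.5 * sigma^2.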
>  LL = function(params, rets) {
+      alpha = params[1]; sigsq = params[2]
+      logf = -log(sqrt(2*pi*sigsq)) -
+          (rets - alpha)^2 / (2*sigsq)
+      LL = -sum(logf) }
>
> LL
function(params, rets) {
     alpha = params[1]; sigsq = params[2]
     logf = -log(sqrt(2*pi*sigsq)) -
         (rets - alpha)^2 / (2*sigsq)
     LL = -sum(logf) }
> params = c ( 0.001 , 0.001 )
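The transcript stops after setting the starting values. A natural continuation (an assumption on my part, not shown in the original) is to minimize the negative log-likelihood with nlm(), pooling all columns of rets:

> res = nlm(LL, params, rets = rets)
> res$estimate   # maximum-likelihood estimates of alpha and sigsq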

Saturday, February 22, 2020

17 observations, two variables: y is dependent on x

package ‘alr3’ successfully unpacked and MD5 sums checked
package ‘alr3’ was built under R version 3.6.2
> data("snake")
> head(snake)
     X    Y
1 23.1 10.5
2 32.8 16.7
3 31.8 18.2
4 32.0 17.0
5 30.4 16.3
6 24.0 10.5
> dim(snake)
[1] 17  2
There are 17 observations and two variables; y is dependent on x. Change x and y into meaningful variable names:

> names(snake) = c("content", "yield")
> attach(snake) #reattach data with new names
>
> head(snake)
  content yield
1    23.1  10.5
2    32.8  16.7
3    31.8  18.2
4    32.0  17.0
5    30.4  16.3
6    24.0  10.5
>
> plot(content, yield, xlab="water content of snow", ylab="water yield")
For a linear regression in R, one uses the lm() function to create a model in the standard form of fit = lm(Y ~ X).
> yield.fit = lm(yield~content)
>
> summary(yield.fit)

Call:
lm(formula = yield ~ content)

Residuals:
    Min      1Q  Median      3Q     Max
-2.1793 -1.5149 -0.3624  1.6276  3.1973

Coefficients:
            Estimate Std. Error t value Pr(>|t|) 
(Intercept)  0.72538    1.54882   0.468    0.646 
content      0.49808    0.04952  10.058 4.63e-08 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.743 on 15 degrees of freedom
Multiple R-squared:  0.8709, Adjusted R-squared:  0.8623
F-statistic: 101.2 on 1 and 15 DF,  p-value: 4.632e-08
Looking at the parameter estimates, the model tells us that the yield is equal to 0.72538 plus 0.49808 times the content. It can be stated that for every one-unit change in the content, the yield will increase by 0.49808 units. The F-statistic is used to test the null hypothesis that the model coefficients are all 0. The R-squared interpretation in this case is that 87 percent of the variation in the water yield can be explained by the water content of snow.
We can recall our scatterplot, and now add the best-fit line produced by our model:
> plot(content, yield)
>
> abline(yield.fit, lwd=3, col="red")
>
A linear regression model is only as good as the validity of its assumptions:
> par(mfrow=c(2,2))
>
> plot(yield.fit)
> qqPlot(yield.fit)
[1]  7 10
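(A note on the last call: qqPlot() is not base R; it presumably comes from the car package, which alr3 loads as a dependency. The printed indices 7 and 10 are the observations it flags as the most extreme.)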


> set.seed(123)
> ma.sim = arima.sim(list(order=c(0,0,1), ma=-0.5), n=200)
>
> plot(ma.sim)
>
> acf(ma.sim)
> pacf(ma.sim)
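Since the simulated series is MA(1) with theta = -0.5, the ACF should show a single significant spike at lag 1 and then cut off, while the PACF should tail off gradually; this is the standard signature for identifying an MA(1) process.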

Friday, February 21, 2020

R uses in data science

R uses the usual symbols for addition, subtraction, multiplication, division, and exponentiation; parentheses can be used to specify the order of operations.

1. Arithmetic
> pi
[1] 3.141593
2. Variables
A variable name can contain letters, numbers, dots and underscores.
> x <- 100
> x
[1] 100
Good programming practice is to use informative names for your variables to improve readability.
3. Functions
A function takes one or more inputs and produces one or more outputs.
> seq(from = 1, to = 8, by = 2)
[1] 1 3 5 7
4. Vectors
> (x <- seq(1, 13, by = 2))
[1]  1  3  5  7  9 11 13
5. Matrices
A matrix is created from a vector using the function matrix().
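For example (a minimal illustration of column-wise filling):

> matrix(1:6, nrow = 2)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6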

[1] "AAPL" "CSCO"
>
> aapl = as.matrix (AAPL[ , 6 ] )
>
> csco = as.matrix(CSCO[,6])
> stkdata = cbind ( aapl , csco )
>
> head(stkdata)
           AAPL.Adjusted CSCO.Adjusted
2007-01-03      10.39169      21.46619
2007-01-04      10.62234      22.03129
2007-01-05      10.54669      22.03904
2007-01-08      10.59878      22.16290
2007-01-09      11.47922      22.03904
2007-01-10      12.02857      22.20160
> tail(stkdata)
           AAPL.Adjusted CSCO.Adjusted
2020-02-11        319.61         49.13
2020-02-12        327.20         49.93
2020-02-13        324.87         47.32
2020-02-14        324.95         46.97
2020-02-18        319.00         46.59
2020-02-19        323.62         46.29
> dim( stkdata )
[1] 3305    2
Now, compute daily returns. This time, we do log returns in continuous time. The mean returns are:
> n = length ( stkdata [ , 1 ] )
> n
[1] 3305
> rets = log(stkdata[2:n, ] / stkdata[1:(n-1), ])
> colMeans ( rets )
AAPL.Adjusted CSCO.Adjusted
 0.0010407275  0.0002325807
We can find the covariance matrix and correlation matrix:
> cv = cov ( rets )
> cv
              AAPL.Adjusted CSCO.Adjusted
AAPL.Adjusted  0.0003870351  0.0001754235
CSCO.Adjusted  0.0001754235  0.0003271735
> print ( cv , 2 )
              AAPL.Adjusted CSCO.Adjusted
AAPL.Adjusted       0.00039       0.00018
CSCO.Adjusted       0.00018       0.00033
>
> cr = cor ( rets )
> cr
              AAPL.Adjusted CSCO.Adjusted
AAPL.Adjusted     1.0000000     0.4929735
CSCO.Adjusted     0.4929735     1.0000000

Black-Scholes formula-R

> BlackScholes <- function(TypeFlag = c("c", "p"), S, X, Time, r, b, sigma) { TypeFla...
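The teaser above is truncated mid-line. For completeness, a minimal sketch of a generalized Black-Scholes pricer with cost-of-carry b, matching the argument names in the teaser but not necessarily the author's original body:

BlackScholes <- function(TypeFlag = c("c", "p"), S, X, Time, r, b, sigma) {
  TypeFlag <- match.arg(TypeFlag)
  d1 <- (log(S / X) + (b + sigma^2 / 2) * Time) / (sigma * sqrt(Time))
  d2 <- d1 - sigma * sqrt(Time)
  if (TypeFlag == "c")   # call
    S * exp((b - r) * Time) * pnorm(d1) - X * exp(-r * Time) * pnorm(d2)
  else                   # put
    X * exp(-r * Time) * pnorm(-d2) - S * exp((b - r) * Time) * pnorm(-d1)
}
# Example: at-the-money call, one year, 20% volatility, b = r (no dividends)
BlackScholes("c", S = 100, X = 100, Time = 1, r = 0.05, b = 0.05, sigma = 0.2)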