## How a survey was used to assess the Delhi Metro Rail

A few months ago the Delhi Metro was striving to maintain its commuter-friendly reputation. A comprehensive survey was commissioned by the Delhi Metro Rail Corporation in March 2016. A total of 39,987 commuters were interviewed, which accounts for more than 2% of the total daily commuters on the metro. The survey was conducted through random distribution of questionnaires among the commuters on all 6 lines.

The purpose of the survey was to assess the overall satisfaction level of commuters with the Delhi Metro, which presently caters to an average of 50 lakh commuters a day. The findings of the survey show that 90% of commuters are very happy with the ticketing facility of the metro, while only 3% felt the need for improvement. Regarding punctuality, 6% of commuters said the metro needed improvement, 3% felt that trains and stations should be tidier and cleaner, while 4% felt that the comfort level needs improvement.

While 93% of commuters were satisfied with its punctuality, 85% felt that metro stations and trains are clean and well maintained, and a comfortable journey in the metro left 86% of commuters satisfied.

## How to apply univariate linear regression in data science with R

We are going to predict a quantitative response Y from a single predictor variable X, where Y has a linear relationship with X.
Y = b0 + b1·X + e

where
b0 = intercept
b1 = slope
e = error term
Least squares chooses the model parameters b0 and b1 that minimise the residual sum of squares (RSS) between the predicted Y values and the actual Y values.

```r
> data(anscombe)
> attach(anscombe)
> anscombe
   x1 x2 x3 x4    y1   y2    y3    y4
1  10 10 10  8  8.04 9.14  7.46  6.58
2   8  8  8  8  6.95 8.14  6.77  5.76
3  13 13 13  8  7.58 8.74 12.74  7.71
4   9  9  9  8  8.81 8.77  7.11  8.84
5  11 11 11  8  8.33 9.26  7.81  8.47
6  14 14 14  8  9.96 8.10  8.84  7.04
7   6  6  6  8  7.24 6.13  6.08  5.25
8   4  4  4 19  4.26 3.10  5.39 12.50
9  12 12 12  8 10.84 9.13  8.15  5.56
10  7  7  7  8  4.82 7.26  6.42  7.91
11  5  5  5  8  5.68 4.74  5.73  6.89
> # correlation of each x with its y
> cor(x1, y1)
[1] 0.8164205
> cor(x2, y2)
[1] 0.8162365
> cor(x3, y3)
[1] 0.8162867
> cor(x4, y4)
[1] 0.8165214
> cor(x2, y1)   # same as cor(x1, y1): x1 and x2 are identical vectors
[1] 0.8164205
> # create a 2x2 grid for plotting
> par(mfrow = c(2, 2))
> plot(x1, y1, main = "Plot 1")
> plot(x2, y2, main = "Plot 2")
> plot(x3, y3, main = "Plot 3")
> plot(x4, y4, main = "Plot 4")
```

Plot 1 appears to have a true linear relationship, Plot 2 is curvilinear, Plot 3 has a dangerous outlier, and Plot 4 is driven by a single outlier.
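To actually fit the least-squares line, here is a minimal sketch using lm() on the first Anscombe pair; all four pairs famously produce nearly the same fitted line.

```r
# Fit a univariate linear regression by least squares on Anscombe's first pair.
data(anscombe)
fit <- lm(y1 ~ x1, data = anscombe)
coef(fit)               # intercept b0 ~ 3.0, slope b1 ~ 0.5
summary(fit)$r.squared  # square of the correlation seen above
```

The coefficients returned by coef() are exactly the b0 and b1 that minimise the RSS.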

## The R language in data science

Many people have helped me write this blog; similarly, ideas came from the students and teachers of Introduction to Data Science with R. A data scientist who knows how to program has a scientific superpower: the computer can do all of these things quickly and error-free. R gives you a language to speak in, it gives you a way to talk to your computer, and it is free. You can save data into an object like p or q; whenever R encounters the object, it will replace it with the data saved inside. You can name an object in R almost anything you want, but there are a few rules. First, a name cannot start with a number. Second, a name cannot use certain special symbols such as ^, €, *, $, !, +, -, @.
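As a small illustration (the object names p and q here are arbitrary):

```r
# Save data into objects; R replaces each name with the data stored inside it.
p <- c(2, 4, 6)
q <- p + 1     # R substitutes the contents of p, giving 3 5 7
p
q
```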

### R uses element-wise execution

When we use two or more vectors in operations, R will line up the vectors and perform a sequence of individual operations.
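For example, with two vectors of the same length, each operation is applied position by position:

```r
# Element-wise arithmetic: R lines the vectors up and operates pairwise.
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)
a + b   # 11 22 33 44
a * b   # 10 40 90 160
```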
```r
> library("ggplot2", lib.loc = "~/R/win-library/3.6")
> x <- c(-1, -0.8, -0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6, 0.8, 1)
> x
 [1] -1.0 -0.8 -0.6 -0.4 -0.2  0.0  0.2  0.4  0.6  0.8  1.0
> y <- x^3
> y
 [1] -1.000 -0.512 -0.216 -0.064 -0.008  0.000  0.008  0.064
 [9]  0.216  0.512  1.000
> qplot(x, y)
> qplot(y, x)
> qplot(x, binwidth = 1)
> qplot(y, binwidth = 1)
> # roll two dice, with six weighted to come up more often
> roll <- function() {
+     die <- 1:6
+     dice <- sample(die, size = 2, replace = TRUE,
+                    prob = c(1/8, 1/8, 1/8, 1/8, 1/8, 3/8))
+     sum(dice)
+ }
> rolls <- replicate(10000, roll())
> qplot(rolls, binwidth = 1)
```
### Conclusion

R is free and can be learned online.

## Measures of dispersion

The choice of a suitable measure depends on two factors.

1. The type of data available
If the data are badly skewed, avoid the mean deviation; if they have gaps around the quartiles, the quartile deviation should be avoided; if there are open-end classes, a quartile measure of dispersion should be preferred; and if the values are few in number or contain extreme values, avoid the standard deviation.

2. The purpose of investigation
In an elementary treatment of statistical series, where a measure of variability is desired only for itself, any of the three measures, namely range, quartile deviation, and average deviation, would be acceptable, and the average deviation would probably be better. However, in usual practice the measure of variability is employed in further statistical analysis, and here the standard deviation is preferred: it is free from the defects of the other measures, it lends itself to the analysis of variability in terms of the normal curve of error, and practically all advanced statistical methods deal with variability centred around the standard deviation.
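These measures can all be computed directly in base R; the sample values below are made up for illustration.

```r
# The four common measures of dispersion for a small sample.
x <- c(4, 8, 15, 16, 23, 42)
rng <- max(x) - min(x)                                    # range
qd  <- unname(quantile(x, 0.75) - quantile(x, 0.25)) / 2  # quartile deviation
md  <- mean(abs(x - mean(x)))                             # average (mean) deviation
s   <- sd(x)                                              # standard deviation
c(range = rng, quartile.dev = qd, mean.dev = md, sd = s)
```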

## Matrix functions in R

chol(x): Choleski decomposition
col(x): matrix with the column numbers of the elements
diag(x): create a diagonal matrix from a vector
ncol(x): number of columns of a matrix
nrow(x): number of rows of a matrix
qr(x): QR matrix decomposition
row(x): matrix with the row numbers of the elements
solve(A, b): solve the system Ax = b
solve(x): calculate the inverse
svd(x): singular value decomposition
var(x): covariance matrix of the columns
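A quick sketch exercising a few of these functions; the 2x2 matrix here is an arbitrary positive-definite example.

```r
# Solve a linear system, invert a matrix, and build a diagonal matrix.
A <- matrix(c(4, 1, 1, 3), nrow = 2)   # symmetric positive definite
b <- c(1, 2)
x <- solve(A, b)        # solves A %*% x == b
A_inv <- solve(A)       # inverse of A
D <- diag(c(1, 2, 3))   # 3x3 diagonal matrix
nrow(A); ncol(A)        # 2 and 2
U <- chol(A)            # upper-triangular Choleski factor: t(U) %*% U == A
```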

## Difference between sampling and non-sampling errors

Sampling errors
Sampling gives rise to certain errors known as sampling errors. Such errors would not be present in a complete enumeration survey, and they can be controlled.

Biased errors
These errors arise from any bias in selection, estimation, etc. For example, if in place of simple random sampling deliberate sampling has been used in a particular case, some bias is introduced in the result, and hence such errors are called biased sampling errors.

Unbiased errors
These errors arise due to chance differences between the members of the population included in the sample and those not included. An error in statistics is the difference between the value of a statistic and that of the corresponding parameter.
Causes of bias
Bias may arise due to
a faulty process of selection
faulty work during the collection of data
a faulty method of analysis

Faulty selection
Faulty selection of the sample may give rise to bias in a number of ways:
deliberate selection of a "representative" sample
conscious or unconscious bias in the selection of a "random" sample
substitution
non-response

Bias due to faulty collection of data
Any consistent error in measurement will give rise to bias, whether the measurements are carried out on a sample or on all the units of the population.

Bias in analysis
In addition to bias which arises from a faulty process of selection and faulty collection of information, a faulty method of analysis may also introduce bias.

Avoidance of bias
If the possibility of bias exists, fully objective conclusions cannot be drawn. The first essential of any sampling or census procedure must therefore be the elimination of all sources of bias.
The sampling error usually decreases with an increase in sample size, and in many situations the decrease is inversely proportional to the square root of the sample size. Sample surveys therefore aim to provide estimates within a permissible margin of error rather than attempting a complete enumeration survey, since in the latter the effort and the cost needed would be substantially higher due to the attempt to reduce the sampling error to zero.
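The square-root law can be checked with a small simulation; the sample sizes 100 and 400 are arbitrary choices.

```r
# Standard error of the sample mean shrinks like 1/sqrt(n):
# quadrupling n should roughly halve the sampling error.
set.seed(42)
se_of_mean <- function(n) sd(replicate(2000, mean(rnorm(n))))
se_100 <- se_of_mean(100)
se_400 <- se_of_mean(400)
se_100 / se_400   # close to sqrt(400 / 100) = 2
```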
Non-sampling errors
Non-sampling errors can occur at every stage of the planning and execution of a census or survey. Such errors can arise due to a number of causes, such as defective methods of data collection and tabulation, faulty definitions, incomplete coverage of the population or sample, etc.

## Statistics and the state

In ancient times the ruling kings and chiefs relied heavily on statistics in framing suitable military and fiscal policies. Today statistics help in framing suitable policies in all ministries and departments of government, whether they be finance, transport, defence, railways, commerce, posts, or industry. The transport department cannot solve the problem of transport in Delhi unless it knows how many buses are operating at present, what the total requirement is, and therefore how many additional buses must be added to the existing fleet.

The higher the degree of accuracy of a businessman's estimates, the greater is the success attending his business. Business activities can be grouped under different heads:
Production
Sales
Purchases
Finance
Personnel
Accounting
Market and product research
Quality control
### Statistics and economics

Statistical data and statistical methods are of immense help in the proper understanding of economic problems and in the formulation of economic policies:
What to produce
How to produce
For whom to produce

These are questions that need a lot of statistical data, in the absence of which it is not possible to arrive at correct decisions. Statistics on production help in adjusting supply to demand.

### Statistics and the physical sciences

Statistical techniques have proved to be extremely useful in the study of all the natural sciences, like astronomy, biology, medicine, zoology, botany, etc. For example, one has to rely heavily on statistics when conducting experiments on plants regarding the effects of temperature, type of soil, etc.

### Statistics and research

Statistical methods affect research in medicine and public health. There is hardly any research work today that one can find complete without statistical data and statistical methods. It is also impossible to understand the meaning and implications of most research findings in the various disciplines of knowledge without having at least a speaking acquaintance with the subject of statistics.

## Data frames

Data frames are very much like spreadsheets or tables, but they are also a lot like databases: some sort of happy medium. If you want to join two data frames, it is much the same as joining two database tables.
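A minimal sketch of such a join using base R's merge(); the tickers and values below are made up for illustration.

```r
# Joining two data frames on a key column, much like a SQL join.
prices  <- data.frame(ticker = c("GOOGL", "AAPL", "CSCO"),
                      price  = c(1500, 300, 45))
sectors <- data.frame(ticker = c("AAPL", "GOOGL"),
                      sector = c("Tech", "Tech"))
inner <- merge(prices, sectors, by = "ticker")                # inner join: 2 rows
left  <- merge(prices, sectors, by = "ticker", all.x = TRUE)  # left join: 3 rows, NA for CSCO
```

The all.x = TRUE argument keeps every row of the first data frame, filling unmatched columns with NA, exactly as a SQL left join would.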
```r
[1] "GOOGL" "AAPL"  "CSCO"
> # adjusted closing prices (column 6) of each stock
> ibm    <- as.matrix(IBM[, 6])
> aapl   <- as.matrix(AAPL[, 6])
> csco   <- as.matrix(CSCO[, 6])
> google <- as.matrix(GOOGL[, 6])
> stkdata <- cbind(google, aapl, csco)
> dim(stkdata)
[1] 3307    3
> n <- length(stkdata[, 1])
> n
[1] 3307
> # daily log returns
> rets <- log(stkdata[2:n, ] / stkdata[1:(n - 1), ])
> colMeans(rets)
0.0005585866   0.0010300534   0.0002325053
> cv <- cov(rets)
> print(cv, 2)
> cr <- cor(rets)
> print(cr, 4)
> x <- matrix(rnorm(12), 4, 3)
> x
            [,1]      [,2]       [,3]
[1,]  0.88410486  1.176699  1.3999856
[2,]  0.09300388  0.676594 -0.7676680
[3,]  0.51237100  1.673757 -0.1873588
[4,] -2.12444302 -1.384905 -0.4829591
> print(t(x), 3)
      [,1]   [,2]   [,3]   [,4]
[1,] 0.884  0.093  0.512 -2.124
[2,] 1.177  0.677  1.674 -1.385
[3,] 1.400 -0.768 -0.187 -0.483
> print(t(x) %*% x, 3)
     [,1] [,2] [,3]
[1,] 5.57 4.90 2.10
[2,] 4.90 6.56 1.48
[3,] 2.10 1.48 2.82
> print(x %*% t(x), 3)
       [,1]   [,2]  [,3]   [,4]
[1,]  4.126 -0.196  2.16 -4.184
[2,] -0.196  1.056  1.32 -0.764
[3,]  2.160  1.324  3.10 -3.316
[4,] -4.184 -0.764 -3.32  6.664
> cv_inv <- solve(cv)
> print(cv_inv, 3)
> print(cv_inv %*% cv, 3)
> library("corpcor", lib.loc = "~/R/win-library/3.6")
> is.positive.definite(cv)
[1] TRUE
> n <- dim(google)
> n
[1] 3307    1
```

### Skewness and kurtosis

The skewness() and kurtosis() functions used below come from an add-on package (for example, the moments package), assumed to be loaded.

```r
> skewness(rets)
0.4842460     -0.4681109     -0.5107760
> kurtosis(rets)
14.51975       10.18243       14.56774
> # a large normal sample has skewness near 0 and kurtosis near 3
> skewness(rnorm(1000000))
[1] -0.0003929433
> kurtosis(rnorm(1000000))
[1] 3.007031
> # annualise the daily returns: h is one trading day in years
> h <- 1/252
> sigma <- sd(rets) / sqrt(h)
> sigma
[1] 0.2941101
> mu <- mean(rets) / h + 0.5 * sigma^2
> mu
[1] 0.1962266
> # negative log-likelihood of the returns under a normal model
> LL <- function(params, rets) {
+     alpha <- params[1]; sigsq <- params[2]
+     logf <- -log(sqrt(2 * pi * sigsq)) - (rets - alpha)^2 / (2 * sigsq)
+     -sum(logf)
+ }
> params <- c(0.001, 0.001)
```
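To put the likelihood to work, here is a sketch of maximum-likelihood estimation with optim(); the returns are simulated here, since the stock data above is not reproducible, and a guard against non-positive variance is added to keep the optimiser in the valid region.

```r
# MLE of a normal mean and variance by minimising the negative log-likelihood.
set.seed(1)
rets <- rnorm(5000, mean = 0.001, sd = 0.02)   # simulated daily returns

LL <- function(params, rets) {
  alpha <- params[1]; sigsq <- params[2]
  if (sigsq <= 0) return(1e10)   # invalid variance: return a huge penalty
  logf <- -log(sqrt(2 * pi * sigsq)) - (rets - alpha)^2 / (2 * sigsq)
  -sum(logf)                     # optim() minimises, so return the negative log-likelihood
}

params <- c(0.001, 0.001)        # starting values for alpha and sigsq
fit <- optim(params, LL, rets = rets)
fit$par                          # alpha near mean(rets), sigsq near var(rets)
```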