Data Science in R Programming

Question 1

Why should I learn R programming for a Data Science career?

Answer

R is one of the most prominent and powerful tools for extracting, cleaning and modeling large amounts of data, and leading data scientists at major companies use it. It is also one of the easier tools to learn and apply for data analysis, and R programming skills are widely expected for jobs in the field.

Question 2

What are the pre-requisites to learn Data Science in R?

Answer

The pre-requisites to learn Data Science in R are pretty straightforward. You need a strong aptitude for numbers, basic programming exposure and a command of college-level mathematics.

Question 3

Who will be my faculty?

Answer

All the faculty are leading Data Scientists at multinational analytics firms. Each has been approved to teach Data Science at ProjectPro after passing a series of stringent tests, so you can be assured that what you learn is cutting edge and industry relevant.

Question 4

What will I learn in this course?

Answer

We will begin the course by covering basic syntax in R programming, such as small programs to handle data, along with basic statistical concepts. We then move on to different statistical methods that summarize the data and draw conclusions from it. The next level is to implement all the statistical concepts in R to solve data analysis problems. The last level is to implement machine learning techniques to solve real industry problems.

Question 5

What is Back-propagation learning for Neural Networks?

Answer

In simple terms, back-propagation learning for Neural Networks is a gradient descent method. The weights of the network are first initialized to random values.
A forward pass through the layers is then run, saving the output of each layer. Next, an error is calculated as the difference between the desired output and the actual output. This error is propagated backward through the network to find the error at each layer, and the weights and biases of each layer are updated to reduce it. The same process is repeated until the error falls below a chosen threshold.
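
The steps above can be sketched in base R. This is only an illustrative sketch, not a production implementation: the toy data set (logical OR), the layer sizes, the sigmoid activation, the learning rate and the threshold are all assumptions chosen for the example.

```r
# Minimal back-propagation sketch: 2 inputs -> 3 hidden units -> 1 output.
# All sizes, data and hyperparameters below are assumptions for illustration.
set.seed(42)                                  # reproducible random weights

sigmoid <- function(z) 1 / (1 + exp(-z))

# Toy training set (logical OR); each row is one sample
X <- matrix(c(0, 0,
              0, 1,
              1, 0,
              1, 1), ncol = 2, byrow = TRUE)
y <- matrix(c(0, 1, 1, 1), ncol = 1)

n_hidden <- 3
W1 <- matrix(rnorm(2 * n_hidden), nrow = 2)   # input -> hidden weights
b1 <- rep(0, n_hidden)                        # hidden biases
W2 <- matrix(rnorm(n_hidden), ncol = 1)       # hidden -> output weights
b2 <- 0                                       # output bias

learning_rate <- 0.5
threshold     <- 0.01    # stop once the mean squared error falls below this
max_epochs    <- 10000
errors        <- numeric(0)

for (epoch in seq_len(max_epochs)) {
  # Forward pass: keep each layer's output for the backward pass
  H   <- sigmoid(sweep(X %*% W1, 2, b1, "+"))
  out <- sigmoid(H %*% W2 + b2)

  # Error between actual and desired output
  err    <- out - y
  errors <- c(errors, mean(err^2))
  if (mean(err^2) < threshold) break

  # Backward pass: propagate the error through each layer
  delta_out <- err * out * (1 - out)                  # output-layer error
  delta_hid <- (delta_out %*% t(W2)) * H * (1 - H)    # hidden-layer error

  # Gradient descent update of weights and biases
  W2 <- W2 - learning_rate * t(H) %*% delta_out
  b2 <- b2 - learning_rate * sum(delta_out)
  W1 <- W1 - learning_rate * t(X) %*% delta_hid
  b1 <- b1 - learning_rate * colSums(delta_hid)
}
```

The vector errors records the mean squared error per epoch, so plotting it with plot(errors, type = "l") shows how the error shrinks as training proceeds.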

Question 6

What are the advantages of Neural Network over Support Vector Machines?

Answer

Multi-layer feed-forward Artificial Neural Networks are comparable to Support Vector Machines. The clear benefit of these models over SVMs is that they are parametric models with a fixed number of nodes, while SVMs are non-parametric. An artificial Neural Network is made up of hidden layers with a variable number of nodes and bias parameters, depending upon the number of features. An SVM, on the other hand, consists of a set of support vectors with assigned weights calculated from the training set.
One of the key advantages of Neural Networks is that they can have multiple outputs, whereas an SVM produces only one output. Therefore, to create an n-ary classifier with SVMs, we need to create n SVMs and train them separately, while an n-ary classifier using a Neural Network can be trained in a single instance.

Question 7

What does .SD stand for in data.table in R?

Answer

.SD is a data.table containing the subset of data for each group, excluding the grouping columns. It can be used when grouping by 'i', when keying by 'by', when grouping by 'by' and with an ad hoc 'by'.
.SD stands for 'Subset of Data.table'. The full stop at the beginning of .SD makes a clash with a user-defined column name unlikely.
Consider a data.table:

library(data.table)
DT = data.table(x=rep(c("a","b","c"),each=2), y=c(1,3), v=1:6)
setkey(DT, y)
DT
#      y x v
# [1,] 1 a 1
# [2,] 1 b 3
# [3,] 1 c 5
# [4,] 3 a 2
# [5,] 3 b 4
# [6,] 3 c 6

Using .SD, you can paste together the x and v values within each group of y:

DT[, .SD[,paste(x,v, sep="", collapse="_")], by=y]
#      y       V1
# [1,] 1 a1_b3_c5
# [2,] 3 a2_b4_c6
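
A common companion to .SD is the .SDcols argument, which restricts .SD to the listed columns. The following is a small sketch assuming the data.table package is installed; the names group_sums and group_sizes are made up for this example.

```r
library(data.table)

# Same shape of data as above, with y written out in full
DT <- data.table(x = rep(c("a", "b", "c"), each = 2),
                 y = rep(c(1, 3), times = 3),
                 v = 1:6)

# Within each group of y, .SD holds the non-grouping columns (x and v).
# .SDcols narrows .SD to column v before sum() is applied to each column.
group_sums <- DT[, lapply(.SD, sum), by = y, .SDcols = "v"]

# .SD is itself a data.table, so nrow() gives the size of each group
group_sizes <- DT[, .(rows = nrow(.SD)), by = y]
```

Here group_sums has one row per value of y, holding the sum of v for that group.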

Question 8

How to count consecutive patterns in string using R?

Answer

Consider a string input with recurring characters. The objective of this task is to count each run of consecutive identical characters and print the count along with the repeated character. This can be done in R using the dplyr and stringi libraries. The code is shown below:

library(dplyr)
library(stringi)

# Input string of comma-separated, recurring characters
input <- "Z,Z,Z,Y,X,X,W,W"

tibble(test_string = input) %>%
  group_by(test_string) %>%
  # Split the string on commas and run-length encode the pieces
  do(.$test_string %>%
       first %>%
       stri_split_fixed(",") %>%
       first %>%
       rle %>%
       unclass %>%
       as.data.frame) %>%
  # Join each run length with its character
  summarize(output_string = paste(lengths, values, collapse = " , "))
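
The core of this approach is run-length encoding, which base R provides through rle(). As a sketch of the same idea without any extra packages (the variable names chars and runs are made up for this example):

```r
input <- "Z,Z,Z,Y,X,X,W,W"

# Split on commas, then run-length encode the resulting character vector
chars <- strsplit(input, ",", fixed = TRUE)[[1]]
runs  <- rle(chars)

# Pair each run length with the character that was repeated
output_string <- paste(runs$lengths, runs$values, collapse = " , ")
output_string
# "3 Z , 1 Y , 2 X , 2 W"
```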

Question 9

How to convert lists of different length vectors to data.frame in R?

Answer

Consider a list containing vectors of different lengths, which needs to be converted to a data.frame. For example, here is a list with two elements of unequal length:

SampleList <- list(A=c(1,2,3),B=c(1,2,3,4,5,6))

There are quite a few ways to accomplish this conversion; some of the notable ones are shown below. In each case the shorter vectors are padded with NA:

data.frame(lapply(SampleList, "length<-", max(lengths(SampleList))))

or

ListToDataFrame <- function(SampleList){
  # Pad each vector with NA up to the longest length, then combine as columns
  data.frame(lapply(SampleList, "length<-", max(lengths(SampleList))))
}

The function can then be called on any list:

ListToDataFrame(SampleList)
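
The conversion relies on the replacement function `length<-`, which extends a vector and fills the new positions with NA. A step-by-step sketch of what happens (the names n, padded and df are made up for this example):

```r
SampleList <- list(A = c(1, 2, 3), B = c(1, 2, 3, 4, 5, 6))

# Length of the longest vector in the list
n <- max(lengths(SampleList))

# `length<-` stretches each vector to length n, padding with NA
padded <- lapply(SampleList, `length<-`, n)

# All elements now have equal length, so data.frame() accepts them
df <- data.frame(padded)
df
```

Column A of the result holds 1, 2, 3 followed by three NA values, while column B is unchanged.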

Question 10

How to calculate mean and variance of a data set in R?

Answer

Suppose you have the following data frame, df:

	Name	Value
1	A	10
2	B	12
3	C	11
4	D	13

and so on. The objective is to calculate the mean and variance of Value for each Name in R. There are various commands which can be used to do the job:

with(df, tapply(Value, Name, function(x) c(mean(x), var(x))))

or

aggregate(Value ~ Name, df, function (x) c(mean(x), var(x)))

or

do.call(rbind, by(df, df$Name, function(x) c(mean(x$Value), var(x$Value))))

or

library(data.table)
setDT(df)[, list(Var = var(Value), Mean = mean(Value)), by = Name]
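
To try these commands out, each Name needs more than one Value, since the variance of a single observation is NA. Here is a small sketch using made-up data and the aggregate() approach:

```r
# Hypothetical data: two observations per Name
df <- data.frame(Name  = c("A", "A", "B", "B"),
                 Value = c(10, 12, 11, 15))

# Mean and variance of Value within each Name
stats <- aggregate(Value ~ Name, df,
                   function(x) c(mean = mean(x), var = var(x)))

# aggregate() returns the two statistics as one matrix column;
# this flattens it into separate Value.mean and Value.var columns
stats <- do.call(data.frame, stats)
stats
```

For this data, group A has mean 11 and variance 2, and group B has mean 13 and variance 8.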

Question 11

How to use Linear Regression and Group by function in R?

Answer

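A linear regression by group in R can be performed using the lme4 package, the plyr package or the nlme package. For a data set which has multiple vectors, a linear mixed model may be a better approach.

Using nlme:

library(nlme)
# data is your data frame; state1 is the grouping column.
# corAR1 expects an integer-valued covariate in 'form', so vector1 is assumed to be a time index here.
lme(response ~ vector1, random = ~ vector1 | state1,
    correlation = corAR1(form = ~ vector1 | state1), data = data)

Using lm() on each group with by():

# data is your data frame; state is your label for the states column
# Passing data = d makes each model use only the rows of its own state
modell <- by(data, data$state, function(d) lm(y ~ I(1/var1) + I(1/var2), data = d))
lapply(modell, summary)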
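
The group-by idea can also be written with split() and lapply() in base R. The sketch below uses made-up data (a data frame called sales with columns state, x and y) in which each state follows an exact straight line, so the fitted coefficients are easy to check:

```r
# Hypothetical data: state A follows y = 1 + 2x, state B follows y = 3 - x
sales <- data.frame(
  state = rep(c("A", "B"), each = 4),
  x     = rep(1:4, times = 2)
)
sales$y <- ifelse(sales$state == "A", 1 + 2 * sales$x, 3 - sales$x)

# Split the rows by state and fit one linear model per subset
fits <- lapply(split(sales, sales$state), function(d) lm(y ~ x, data = d))

# Collect the coefficients: one row per state
coef_table <- t(sapply(fits, coef))
coef_table
```

Each row of coef_table holds the intercept and slope recovered for one state.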

Question 12

What is the Neural Network Activation function in R?

Answer

An activation function converts the weighted inputs of a node in a Neural Network into its output activation. Various activation functions are used with Neural Networks; a few of them are listed below:

Step Function
Linear Combination
Continuous Log-Sigmoid Functions
Continuous Tan-Sigmoid Functions
Softmax Functions
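
Each of the functions listed above can be written in a line or two of base R. The function names below are made up for this sketch, and the step function is assumed to switch at zero:

```r
# Step function: 1 when the input is non-negative, otherwise 0
step_fn   <- function(z) ifelse(z >= 0, 1, 0)

# Linear combination: passes the weighted input through unchanged
linear_fn <- function(z) z

# Log-sigmoid: squashes the input into the range (0, 1)
log_sig   <- function(z) 1 / (1 + exp(-z))

# Tan-sigmoid: squashes the input into the range (-1, 1)
tan_sig   <- function(z) tanh(z)

# Softmax: turns a vector of scores into probabilities that sum to 1
# (subtracting max(z) first avoids overflow in exp())
softmax   <- function(z) {
  e <- exp(z - max(z))
  e / sum(e)
}

log_sig(0)          # 0.5
softmax(c(1, 1))    # 0.5 0.5
```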

Question 13

What is the difference between Data Science, Big Data and Business Analytics?

Answer

Big Data is the term used to refer to high volumes of data that can be generated from various sources and in different formats. Big Data is often too large and complex to be processed by traditional database management techniques. Data Science is the term which refers to the discipline of analyzing the data. A data scientist creates knowledge out of the data using traditional and non-traditional tools and techniques.

Business analytics usually builds on Data Science applications. A business analyst gathers insight from previous business performance and from the results obtained by data analytics.

Introduction to Data Science Methodologies

Correlation / Association, Regression, Categorical variables

Data Preparation

Logistic Regression

Cluster Analysis, Classification Models

Introduction to Forecasting Techniques

Advanced Time Series Modeling

Stock market prediction

Pharmaceuticals

Market Research

Machine Learning

Fraud Analytics

Text Analytics

Social Media Analytics
