This tutorial covers commands for importing data from web pages into R. It explores three main topics: importing .txt data from a web source, fetching data from an HTML table, and downloading XML or JSON content from web pages.
This tutorial contains a brief guide to importing data into R from relational databases. As the tutorial mentions, various packages are available on CRAN, including ones for non-relational systems such as Hadoop and MongoDB. It includes examples using the RMySQL and RODBC packages.
Excel is a spreadsheet application that many institutions use to store data. This tutorial gives a brief overview of reading, writing, and manipulating data in Excel files using R. We will learn about various R packages and extensions for reading and importing Excel files. The end of the section covers some common problems encountered while loading Excel files and spreadsheet data.
Machine learning, according to Tom Mitchell of Carnegie Mellon University, is defined as follows: “A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E”. In simple words, consider the task of predicting traffic patterns at a busy intersection (task T): you can run data on previous traffic patterns (experience E) through a machine learning algorithm, and if it learns successfully, the program will get better at predicting future traffic patterns (as judged by performance measure P).
Regression is a statistical method for establishing a relationship between a dependent variable and one or more independent variables.
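As a minimal sketch of this idea, the following fits a linear regression with base R's lm() on the built-in mtcars data set. The choice of mpg as the dependent variable and wt and hp as the independent variables is an assumption made purely for illustration, as is the hypothetical new car used for the prediction.

```r
# Linear regression sketch on the built-in mtcars data set.
# Assumption: mpg is the dependent variable; wt and hp are the
# independent variables (chosen only for illustration).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Estimated intercept and slopes
coefs <- coef(fit)
print(coefs)

# Predict mpg for a hypothetical car (weight 3000 lbs, 110 hp)
new_car <- data.frame(wt = 3.0, hp = 110)
pred <- predict(fit, newdata = new_car)
print(pred)
```

Calling summary(fit) prints standard errors and R-squared alongside the coefficients.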
Logistic regression measures the relationship between a categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative distribution function of the logistic distribution.
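A short sketch with base R's glm() may make this concrete. It assumes, for illustration only, that the categorical outcome is the transmission type (am) in the built-in mtcars data and that car weight (wt) is the single independent variable. The last step checks that the predicted probabilities are the logistic function, plogis(), applied to the linear predictor.

```r
# Logistic regression sketch on mtcars.
# Assumption: am (0 = automatic, 1 = manual) is the categorical outcome
# and wt (weight in 1000 lbs) is the only independent variable.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Two hypothetical cars, a lighter one and a heavier one
new_cars <- data.frame(wt = c(2.5, 3.5))

# Estimated probability of a manual transmission for each car
probs <- predict(fit, newdata = new_cars, type = "response")
print(probs)

# The same probabilities, obtained by passing the linear predictor
# through the logistic function (the logistic CDF, plogis)
eta <- predict(fit, newdata = new_cars, type = "link")
same <- isTRUE(all.equal(unname(probs), unname(plogis(eta))))
print(same)
```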
A support vector machine (SVM) is a machine learning technique that separates data into categories by maximizing the gap between them (a.k.a. the margin).
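The sketch below shows one way to fit a linear SVM in R. It assumes the e1071 package is installed (the text above does not name a package), and the two iris species and the two petal measurements are picked only to keep the example small.

```r
# Linear SVM sketch using the e1071 package (assumed to be installed).
library(e1071)

# Keep two of the three iris species so there is a single margin to maximize
two_class <- droplevels(subset(iris, Species != "setosa"))

# Fit a linear-kernel SVM on the two petal measurements
model <- svm(Species ~ Petal.Length + Petal.Width,
             data = two_class, kernel = "linear", cost = 1)

# Predict on the training data and compute the share classified correctly
pred <- predict(model, two_class)
acc <- mean(pred == two_class$Species)
print(acc)
```

The cost argument controls how strongly points on the wrong side of the margin are penalized; the value of 1 used here is simply the package default.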
Clustering groups similar data points together so that the resulting groups differ significantly from one another. It is used in unsupervised learning, where we have no knowledge of the dependent variable.
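As an illustration, here is a minimal k-means sketch using base R's kmeans(). The choice of k-means, the iris measurements, and three clusters are all assumptions for the example. The species labels are not given to the algorithm and are used only afterwards to compare against the clusters it found.

```r
# k-means clustering sketch on the four iris measurements.
# Assumption: three clusters; the Species column is NOT used for fitting.
set.seed(42)                      # make the random starts reproducible
features <- scale(iris[, 1:4])    # standardize so no variable dominates

km <- kmeans(features, centers = 3, nstart = 20)

# Compare the discovered clusters with the known species (for inspection only)
comparison <- table(cluster = km$cluster, species = iris$Species)
print(comparison)
```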
dplyr provides five main verbs, which cover the majority of data manipulation tasks you might want to perform on a table of data. These verbs are select(), filter(), mutate(), arrange(), and summarise().
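The five verbs can be sketched in one short pipeline. This example assumes the dplyr package is installed and uses the built-in mtcars data; the column choices, the weight cut-off, and the derived kpl column (kilometres per litre) are hypothetical and exist only to give each verb something to do.

```r
# The five dplyr verbs on the built-in mtcars data (dplyr assumed installed).
library(dplyr)

cars <- mtcars %>%
  select(mpg, cyl, wt) %>%          # keep only three columns
  filter(wt < 4) %>%                # keep cars lighter than 4000 lbs
  mutate(kpl = mpg * 0.4251) %>%    # add a column: miles/gallon to km/litre
  arrange(desc(kpl))                # sort from most to least efficient

# Collapse the table to a single summary row
overall <- cars %>%
  summarise(mean_kpl = mean(kpl), n_cars = n())

print(head(cars))
print(overall)
```

Each verb takes a data frame and returns a new one, which is why they chain naturally with the pipe operator.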
dplyr is a well-known R package for data manipulation. It is an upgraded version of the plyr package; both packages were written and are maintained by Hadley Wickham. It focuses on tools for working with data frames (hence the d in its name).