Introduction to Data Science with R
Data science is a multidisciplinary field that draws on software engineering, data engineering, business intelligence, scientific methods, visualization, statistics, and many other disciplines. R is a statistical programming language that helps us analyze data in fine detail. R plays a major role in data science today and opens up new areas to explore every day. This tutorial series explains how to build data science applications using the R programming language. First, let us go through R.
Ross Ihaka and Robert Gentleman released the R language as open-source software in 1995 to make the following tasks more user-friendly:
- Data analysis
- Graphical Models.
Why is R so popular?
What makes it unique compared with other software?
Advantages of R:
- Open-Source language
- Extensive graphics and plotting support
- More than 5000 packages available in the library
R packages are available from CRAN (the Comprehensive R Archive Network), Bioconductor, and GitHub.
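As a minimal sketch of how a CRAN package is usually obtained and loaded, the snippet below defines a small helper. The helper name load_or_install is hypothetical and not part of R or of this tutorial; install.packages(), requireNamespace(), and library() are standard R functions.

```r
# Hypothetical helper (not part of the original tutorial):
# install a package from CRAN only if it is missing, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN; needs internet access
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call attaches it without downloading anything.
load_or_install("stats")
```

Packages hosted on Bioconductor or GitHub are typically installed through helper packages rather than through install.packages() directly.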
Downloading and Installing R (Windows OS) (Operating Systems: Windows 8, Windows 7, Windows Vista)
Step 1: Download the R software from the https://cran.r-project.org/ website.
Step 2: Once the download finishes, go to the Downloads folder on your system and double-click the installer to run it.
Step 3: Select your preferred language. The default language is English. Once selected, press the OK button.
Step 4: A pop-up box showing the R for Windows version number appears on the screen. Press the Next button, which shows the license to read before using the software.
As it is open-source software, we are free to use it.
Step 5: Press the Next button, which shows the path where the software will be installed on the system.
By default, it creates a folder at C:\Program Files\R\R-<version number>. To install to a specific location, press the Browse button at the end of the text box and select the destination folder. Once done, press the Next button.
Step 6: Select the components to install. By default, all files are selected, and they will be helpful in the future, so there is no need to change anything. Press the Next button.
Step 7: Choose the startup options: customized or default.
Select either “Yes” (customized startup) or “No” (default startup), then press the Next button.
Step 8: If a separate menu folder is needed, create one here; otherwise, continue to the next step. Press the Next button.
Step 9: Press the Next button to start installing the software.
Step 10: Once the installation is done, a shortcut icon is created on the desktop.
The installation process is now complete.
To verify, check that the R folder exists in C:\Program Files, open it, and look in the library folder to see which packages are installed by default.
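The same check can also be done from inside R once it starts. The following is a small sketch using standard R functions; the variable names are illustrative only, and the exact output depends on the installed version and location.

```r
# Version string of the running R installation.
r_version <- R.version.string
print(r_version)

# Folder(s) where R looks for installed packages (the "library").
lib_paths <- .libPaths()
print(lib_paths)

# Packages that ship with R by default (base and recommended priority).
default_pkgs <- rownames(installed.packages(priority = c("base", "recommended")))
print(default_pkgs)
```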
Getting started with R:
Because R is a command-line-based language, all commands are entered directly in the console.
It is always good to start learning a programming language by using it as a pocket calculator.
The command line starts with the ( > ) symbol.
> 1+2      # addition
> 3-2      # subtraction
> 4*5      # multiplication
> 2^3      # exponentiation
> sqrt(3)  # square root
> log(10)  # logarithm function
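One detail worth knowing about the calculator commands above: log() in R computes the natural logarithm by default. The sketch below shows the other common bases and how a result can be stored in a variable for reuse; the variable names are illustrative only.

```r
natural_log <- log(10)           # natural logarithm (base e)
common_log  <- log10(100)        # base-10 logarithm
binary_log  <- log(8, base = 2)  # logarithm with an explicit base

# Results can be stored in a variable with <- and reused later.
total <- 1 + 2
print(total * 2)
```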
R's basic data types include:
- Numeric data type
- Character data type. Example: myname <- "what"
- Logical data type (Boolean type: TRUE or FALSE)
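The data types listed above can be inspected with the class() function. In this short sketch, myname comes from the example in the text, while age and is_new are made-up names for illustration.

```r
age    <- 25.5    # numeric
myname <- "what"  # character
is_new <- TRUE    # logical

class(age)     # "numeric"
class(myname)  # "character"
class(is_new)  # "logical"
```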
R has 4 standard object types.
- Data Frames
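As an illustration of a data frame, the sketch below builds a small table whose columns hold different data types. The data and column names are invented for this example.

```r
# Hypothetical example data; the names and values are illustrative only.
people <- data.frame(
  name   = c("Asha", "Ben", "Chen"),
  age    = c(28, 35, 42),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

str(people)                           # structure: 3 observations of 3 variables
people$age                            # one column, returned as a numeric vector
over_30 <- people[people$age > 30, ]  # keep rows where age is greater than 30
mean_age <- mean(people$age)          # average of the age column
```

Each column of a data frame has a single type, but different columns may have different types, which is why this structure suits tabular data.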
Data Science applications in R
Nowadays, whenever someone mentions data science, R is one of the first supporting languages that comes to mind. A data science project in R can be organized in many ways, but we will proceed with the following structure:
- Gathering the required data
- Loading the data into R (importing data into R)
- Data deduction / data reduction / data cleaning
- Exploratory Data Analysis
- Building Models based on requirement
- Applying Machine learning algorithms
- Bringing out insights from the data
- Optimizing the data
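The first few steps above can be sketched in a handful of lines. This is only an illustration under stated assumptions: the built-in mtcars data set stands in for gathered data, and a simple linear regression stands in for the modeling step. The machine learning and optimization steps are not shown.

```r
# Gather and load: mtcars is a data set that ships with R.
cars <- mtcars

# Clean: drop rows with missing values (mtcars has none, so nothing is lost).
cars <- na.omit(cars)

# Explore: summary statistics for fuel economy (mpg) and weight (wt).
print(summary(cars[, c("mpg", "wt")]))

# Model: linear regression of miles per gallon on weight.
fit <- lm(mpg ~ wt, data = cars)

# Insight: the slope is negative, so heavier cars in this data set
# tend to travel fewer miles per gallon.
slope <- coef(fit)[["wt"]]
print(slope)
```

In a real project the data would usually be imported from a file or database, for example with read.csv(), instead of coming from a built-in data set.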
Once all the above steps are done, visualization is where R stands out. Many business decisions can be supported with visualizations. We apply the R programming language and statistical analysis techniques to support marketing, business intelligence, and decision-making for the company.
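A minimal sketch of such a visualization, using base R graphics and the built-in mtcars data set as a stand-in for business data, could look like this. The plot is written to a temporary PDF file so that the example also runs without a display; the file name is generated by tempfile() and is not significant.

```r
# Write the plot to a temporary PDF so the example runs without a display.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Scatter plot of fuel economy against weight.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")

# Add a fitted linear trend line in red.
abline(lm(mpg ~ wt, data = mtcars), col = "red")

invisible(dev.off())  # close the device so the file is written
```

In an interactive R session, the pdf() and dev.off() calls can be left out and the plot appears in the graphics window.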