R is a language and software environment for statistical computing. Data imported into R is mostly some variation of a spreadsheet-like text file, and a text file is the easiest form of data to import.

### Principal Component Analysis Tutorial

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called Principal Components.
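The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the tutorial's own code: the `pca` helper and the synthetic two-column dataset are assumptions made for the example, and it uses the singular value decomposition of the centered data as one common way to obtain the orthogonal transformation.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Center each variable so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are orthonormal principal directions.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each observation in the new, uncorrelated basis.
    scores = X_centered @ components.T
    # Variance captured by each component.
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return scores, components, explained_variance

# Hypothetical data: two strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

scores, components, explained_variance = pca(X, n_components=2)
score_cov = np.cov(scores, rowvar=False)  # off-diagonal entries are close to zero
```

The off-diagonal entries of `score_cov` are numerically zero, which is what "linearly uncorrelated" means in practice, and almost all of the variance lands on the first component because the two input columns are nearly collinear.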

### Pandas Tutorial Part-3

This part of the tutorial will focus on intermediate topics like

- handling strings and dates
- reshaping data
- plotting
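
As a rough taste of these topics, the sketch below cleans a string column, parses dates, and reshapes the result. The column names and values are made up for illustration and are not taken from the tutorial.

```python
import pandas as pd

# Hypothetical sales records with untidy names and dates stored as text.
df = pd.DataFrame({
    "name": [" alice ", "BOB", "alice ", "bob"],
    "date": ["2015-01-05", "2015-01-05", "2015-02-10", "2015-02-10"],
    "sales": [10, 20, 30, 40],
})

# Strings: vectorized cleanup through the .str accessor.
df["name"] = df["name"].str.strip().str.lower()

# Dates: parse text into datetimes, then pull out parts with .dt.
df["date"] = pd.to_datetime(df["date"])
df["month"] = df["date"].dt.month

# Reshaping: long to wide with pivot, and back to long with melt.
wide = df.pivot(index="month", columns="name", values="sales")
long = wide.reset_index().melt(id_vars="month", var_name="name", value_name="sales")
```

Plotting is left out of the sketch because it needs matplotlib installed; with it available, a call such as `wide.plot()` would draw one line per name.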

### Pandas Tutorial Part-2

This part of the tutorial will focus on intermediate topics like

- by-group processing
- merging or concatenating data
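
A small sketch of both topics follows. The `orders` and `customers` tables are hypothetical and exist only to show the calls.

```python
import pandas as pd

# Hypothetical tables: one row per order, and one row per customer.
orders = pd.DataFrame({
    "customer_id": [1, 2, 1, 3],
    "amount": [50.0, 20.0, 30.0, 10.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Pune", "Delhi", "Pune"],
})

# By-group processing: total order amount per customer.
totals = orders.groupby("customer_id")["amount"].sum().reset_index()

# Merging: attach each customer's city, then aggregate again by city.
merged = totals.merge(customers, on="customer_id", how="left")
by_city = merged.groupby("city")["amount"].sum()

# Concatenating: stack another batch of orders under the first.
more_orders = pd.DataFrame({"customer_id": [2], "amount": [5.0]})
all_orders = pd.concat([orders, more_orders], ignore_index=True)
```

`merge` joins on key columns, much like a SQL join, while `concat` simply stacks frames that share the same columns.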

### Pandas Tutorial Part-1

Pandas is a software library focused on fast and easy data manipulation and analysis in Python. In particular, it offers high-level data structures (like DataFrame and Series) and data methods for manipulating and visualizing numerical tables and time series data. It is built on top of NumPy and is highly optimized for performance (about 15x faster), with critical code paths written in Cython or C. The ndarray data structure and NumPy’s broadcasting abilities are heavily used.
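
To make the two data structures concrete, here is a minimal sketch with invented values. It builds a Series and a DataFrame, applies vectorized arithmetic, and shows the NumPy array underneath.

```python
import numpy as np
import pandas as pd

# A Series is a labeled one-dimensional array.
prices = pd.Series([10.0, 12.5, 11.0], index=["a", "b", "c"], name="price")

# A DataFrame is a table of columns that share one index.
table = pd.DataFrame({"price": prices, "quantity": [3, 2, 5]})

# Column arithmetic is vectorized; no explicit Python loop is needed.
table["total"] = table["price"] * table["quantity"]

# Broadcasting: the per-column means are subtracted from every row.
centered = table - table.mean()

# The values are backed by a NumPy ndarray.
values = table.to_numpy()
```

The vectorized operations are where the performance benefit comes from: the work runs in compiled code over whole arrays rather than element by element in Python.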

Pandas creator Wes McKinney started developing the library in 2008 during his tenure at AQR, a quantitative investment management firm. He was motivated by a distinct set of data analysis requirements that were not well addressed by any single tool at his disposal at the time.

### Tutorial- Hadoop Multinode Cluster Setup on Ubuntu

This tutorial is a step-by-step guide to installing a Hadoop multinode cluster on Ubuntu 12.04. It covers setting up the cluster, configuring Hadoop MapReduce and YARN, and creating HDFS storage directories on multiple nodes.

### Data Visualizations Tools in R

This tutorial contains installation guidelines, getting-started notes, and examples for data visualization packages in R. The packages covered in this tutorial are GGPlot, GGVis, and Lattice.

### R Statistical and Language tutorial

R is a programming language and software environment for statistical computing and graphical visualization. It offers many built-in functions and also supports functional-style coding, so a task can be approached either way. R is freely available under the GNU General Public License. It provides a wide variety of statistical and graphical techniques, including linear and non-linear models, time series analysis, classification, clustering, forecasting, classical tests, and many more.

Nowadays R has become a data mining tool, used by many data miners. Out of the box, R provides only static graphics; dynamic graphics require installing additional packages.

### Introduction to Data Science with R

Data Science is a multidisciplinary field that draws on software engineering, data engineering, business intelligence, scientific methods, visualization, statistics, and a mishmash of many other disciplines. R is a statistical programming language that helps us analyze data in fine detail. R now plays a major role in data science, and there is more to explore in it every day. This tutorial series explains how to build data science applications using the R programming language.

### Apache Pig Tutorial: User Defined Function Example

This case study of Apache Pig programming covers how to write a user-defined function. A student grades database is used to illustrate writing and registering custom Python scripts for Apache Pig. The theme of the example is analyzing student performance. The database contains student, subject, and score details. The custom script, written in Python, calculates the weighted average, or grade point average, of each student.
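
The core of such a script might look like the sketch below. It is a plain Python illustration, not the case study's actual code: the function name `weighted_average`, the tuple layout, and the per-subject weights are all assumptions, since the summary does not say how the weights are defined.

```python
def weighted_average(records):
    """Return the weighted average score for one student.

    records: iterable of (subject, score, weight) tuples.
    The weight field is hypothetical (for example, credit hours).
    """
    records = list(records)
    total_weight = sum(weight for _, _, weight in records)
    if total_weight == 0:
        # No usable weights: return None rather than divide by zero.
        return None
    weighted_sum = sum(score * weight for _, score, weight in records)
    return weighted_sum / total_weight

# Inside Pig, a function like this would typically carry an
# @outputSchema decorator and be loaded with a REGISTER ... USING jython
# statement; both are omitted here so the sketch runs as plain Python.
student_records = [("math", 90, 3), ("art", 70, 1)]
gpa = weighted_average(student_records)  # (90*3 + 70*1) / 4 = 85.0
```

Returning `None` for an empty or zero-weight group is one possible design choice; it maps naturally to a null in the Pig result rather than failing the whole job.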