Importing Data from Excel
Excel is a spreadsheet application, which is widely used by many institutions to store data. This tutorial will give a brief of reading, writing and manipulating the data in Excel files using R. We will learn about various R packages and extensions to read and import Excel files. At the end of this section, we have written about some common problems encountered while loading Excel files and spreadsheet data.
Before we import the data from Excel spreadsheet into R, there are some standard practices to tone your data, to avoid any unnecessary error.
- The first column of the spreadsheet is used to identify the sample dataset, therefore it should be a unique key id. Similarly the first row is reserved for header, describing the scheme of the data.
- Concatenating words in the cells should be done using ‘.’. For example, ‘Sample.data’.
- The names and header of the data scheme should usually avoid symbols.
- All missing data points in the Excel spreadsheet should be indicated with ‘NA’.
Before you import the Excel data in R, you would need to set the console in R to working directory.
>getwd()//To get the working directory at the moment >setwd(“
Before we look into the packages available to extract data from Excel spreadsheet, we will show you simple R commands that can do the job. Utlis package is one of the core packages which contains bunch of basic utility functions and the following commands are part of this package.
”, header =TRUE)
The first argument of read.table() function is the name of the text file within the double quotes and if the data file has a header for data schema in the top row, the second argument will be true. This function will work for files, which are saved in .txt format.
Reading data from an excel file is incredibly easy and it can be done using several packages. You can export the Excel file to a Comma delimited file and import it using the method shown in the tutorial Importing Data from Flat Files in R. Another method to Import Data in R from Excel is using xlsx package, which I used to access Excel files. The first row should contain variable names.
//read in the excel sheet from workbook sample_excel.xls //variable name in the first row library(xlsx) sampledata<- read.xlsx(“sample_excel.xls”, sheetName=”sample_sheet1”)
It is necessary that while using read.xlsx function, we mention the sheet index or the sheet name. If the required dataset is bigger, then read.xlsx2() function is used.
Sample.data <- read.xlsx2(“sample_excel.xls”, sheetName=”sample_sheeet1”, startRow = 100, colIndex = 100)
Additionally in the function above, user can mention the end row or the data import can be limited to certain row and column index. xlsx package does a lot more than importing data from Excel files, it can also manipulate the data and write data frames into the spreadsheets. The data frames can be written to Excel workbook using the function write.xlsx().
>write.xlsx(Sample.data, “Sample_Sheet.xls”, sheetName=”sample_sheet1”)
Apart from the xlsx package, we have gdata package, which has functions that can read from data in the Excel format. gdata provides a cross platform solution for importing data from Excel files into R. The read.xls function in particular can read data from an Excel spreadsheet and gives data frame as output. Take for example a sample Excel spreadsheet, named ‘Sample_Sheet.xls’ and to use this method, you would require Perl runtime in your system.
>library(gdata)//Load gdata package >sample_data = read.xls(“Sample_Sheet.xls”)//Read data from the sheet
This function converts the Sample_Sheet.xls file into a temporary .csv or .tab limited file using Perl. While executing read.xls function, R will search for a path to the excel file and looks out for Perl on its way. If it doesn’t find perl.exe, then R will return an error. To avoid this error, another argument for the function can be given to search for the Perl executable file.
>sample.data <- read.xlsx(“Sample_Sheet.xlsx, sheetIndex = 1, perl = “C:/Perl/bin/perl.exe””)
gdata has several other functions to convert the Excel file into various other formats. Such as:
The input arguments for these functions are same as that for read.xlsx() function.
Another package that can do the job of importing data from Excel is the XLConnect package using the loadWorkbook function. This function can be used to read the entire workbook, followed by readWorksheet function to load the worksheets into R. Java is required to be pre-installed for this package to work. This package also provides function to create Excel workbooks, and export data to them.
>library(XLConnect) >Sample_Workbook = loadWorkbook(“Sample_Sheet.xls”) >Sample_Data = readWorksheet (Sample_Workbook, sheet=”Sheet1”)
Other arguments can also be added after the Index argument such as startCol or StartRow or endCol or endRow to indicate and limit the cells that are required to be imported from the Excel workbook. Another argument ‘region’ can also be used in this function to highlight the range of starting and ending rows and columns.