Importing Data from Relational Database
R has certain limitation regarding the processing of extremely large dataset and doesn’t support concurrent access to data. A Relational Database Management System on the other hand is capable of providing fast access to selected portions of the big data, and can provide concurrent access from multiple users running on various hosts. There are various R packages that can be used to communicate with RDBMS, each with a different level of abstraction. Some of these packages have the functionality of copying entire data frames to and from databases. Some of the packages available on CRAN for importing data from Relational Database are:
- RSQLite (This packages is used for bundled DBMS SQLite)
- RJDBC (This package uses Java and can connect to any DBMS with a JDBC driver)
- RMongo (This is an R interface for Java Client with MongoDB)
RMySQL package is an interface to MySQL DBMS. The current version of this package requires DBI package to be pre-installed.
The function dbDriver(“MySQL”) returns the database connection manager object, which can be followed up by functions like dbConnect and dbDisconnect to open a database connection and close the connection, respectively. Packages such as RSQLite, RPostgreSQL and ROracle needs to be installed before working on these DBMS using their respective call functions, dbDriver(“SQLite”), dbDriver(“RPostgreSQL”), dbDriver(“Oracle”).
- dbGetQuery sends the queries and fetches results as the data frame.
- dbSendQuery only sends the query and returns an object of class inheriting from “DBIResult”, this object of class can be used to fetch the required result.
- dbClearResult removes the result from cache memory.
- fetch returns few or all rows that were asked in query. The output of fetch function is a list.
- dbHasCompleted is used to check is all the rows are retrieved.
- dbReadTable and dbWriteTable functions are used to read and write the tables in Database from an R data frame.
>library(RMySQL) >connection <-dbConnect(dbDriver(“MySQL”), dbname = ”Test_Database”) ## Assuming that we are using MySQL tables for DBMS >dbListTables(connection) ##Loading a data frame into database >data(Sample_Data) >dbWriteTable(connection, “Column1”, Sample_Data, overwrite=True) ## To read the Column1 of in the database >dbReadTable(connection, “Column1”) ## Selecting from the loaded table as a query >dbGetQuery(Connection, paste(Row_Name, Variable_Name, Condition))
RODBC package provides an interface to database sources supporting on ODBC interface. This package is widely used because it needs the same R code to import data from different database system. RODBC package works on OS X, Windows and Unix/Linux across most of the database systems such as, MySQL, Microsoft SQL Server, Oracle, PostgreSQL.
- odbcConnect and obdcDriverConnect functions are used to open a database connection.
- odbcGetInfo function will provide details about the client and server.
- odbcClose function is used to close the database connection.
- sqlSave function saves the R data frame in argument into the database table.
- sqlFetch does the opposite, by saving the database table into an R data frame.
- sqlQuery is used to send an SQL query to the database, which returns an R data frame.
In the example mentioned below, we are using a PostgreSQL with an ODBC driver.
>library(RODBC) >connection <- odbcConnect(“Sample_Database”, uid=”Name”, case=”tolower”) >data(Sample_Data) >sqlSave(connection, Sample_Data, rowname=”Row1”, addPK=TRUE) >rm(Sample_Data) >sqlQuery(connection, “Select_Column, Condition”)
Importing Data from Non-Relational Database
R also has packages that support non-relational database for data import.
- rhbase is used for Hadoop Distributed File System
- RCassandra is used for Cassandra Database system
- Rmongodb is used for MongoDB.
>Library(rmongodb) >SampleDatabase <- “Test_Database” >MyMongoDB <- mongo.create(db=SampleDatabase) ## To insert a list >mongo.insert(MyMongoDB, “Test_Database.Column1”, list.name)