Access dashDB (or DB2) using ibmdbR from R notebook

IBM dashDB is a fully managed cloud data warehouse, purpose-built for analytics. In this article we will explore how to connect to it from a R notebook and use ibmdbR to perform in-database analytics.

To get started you will need credentials for a dashDB database. If you don't already have an instance of dashDB you can get one for free by following steps in this article.

Next, create a new Jupyter notebook in Data Scientist Workbench. Set it to use R.



The ibmdbR package (including the RODBC package that it depends on) and the dashDB (DB2) ODBC (CLI) driver are already installed and configured to use in your R Jupyter notebook in Data Scientist Workbench.

Create a new cell and load the ibmdbR package:
library(ibmdbR)
Enter the other connection details for your instance of dashDB (or DB2 database).
#Enter the values for you database connection
driver.name <- "DB2"   # this matches entry in odbc.ini for DB2 driver
db.name <- "BLUDB" # e.g. BLUDB
host.name <- "hostname" # e.g. awh-yp-small03.services.dal.bluemix.net
port <- "50000" # e.g. 50000
protocol="TCPIP" # i.e. TCPIP
user.name <- "username" # e.g. dash104434
pwd <- "password" # e.g. your secret password xxxx
Run the cell above. Create a new cell to create the connection and initialize:
con.text <- paste(driver.name,
                  ";Database=",db.name,
                  ";Hostname=",host.name,
                  ";Port=",port,
                  ";PROTOCOL=",protocol,
                  ";UID=", user.name,
                  ";PWD=",pwd,sep="")
con <- idaConnect(con.text)
idaInit(con)

Let's make sure the connection is working by getting the list of tables:
sqlTables(con,tableType="TABLE")
Here is an excerpt from the output:
TABLE_CAT     TABLE_SCHEM                   TABLE_NAME TABLE_TYPE
...
135 GOSALESRT ACTIVITY_STATUS_LOOKUP TABLE 136 GOSALESRT RETAILER TABLE 137 GOSALESRT RETAILER_ACTIVITY TABLE 138 GOSALESRT RETAILER_CONTACT TABLE 139 GOSALESRT RETAILER_SITE TABLE 140 GOSALESRT RETAILER_SITE_MB TABLE 141 GOSALESRT RETAILER_TYPE TABLE 142 SAMPLES ANCESTRY TABLE 143 SAMPLES CITSTATUS TABLE 144 SAMPLES COW TABLE 145 SAMPLES CUSTOMER_ACQUISITION TABLE 146 SAMPLES CUSTOMER_CHURN TABLE 147 SAMPLES CUST_RETENTION_DEMOGRAPHICS TABLE 148 SAMPLES CUST_RETENTION_LIFE_DURATION TABLE
You can use the idaQuery() function to perform SQL queries on the database and put the results in a database. Let's fetch rows from a sample table and display the contents of the dataframe:
tableName <- "SAMPLES.COW"
df <- idaQuery("SELECT * FROM ",tableName)
df
Here is the output:
   COW_CODE                                           COW_DESC
1         b                        Under 16 years/Never worked
2         1                Private for-profit company Employee
3         2                Private non-profit company Employee
4         3                          Local government employee
5         4                          State government employee
6         5                        Federal government employee
7         6              Self-employed in own not incorporated
8         7                  Self-employed in own incorporated
9         8     Working without pay in family business or farm
10        9 Unemployed and last worked 5 years ago or earlier 
 
Of course, the power of ibmdbR lies in "pushing down many basic and complex R operations into the database, which removes the main memory boundary of R and allows you to make full use of parallel processing in the underlying database". The ibmdbR library contains many functions for performing such operations which can be explored by going to:
https://cran.r-project.org/web/packages/ibmdbR/ibmdbR.pdf

There are also several articles examples on IBM developerWorks utilizing ibmdbR for in database analytics.

Finally, as a best practice we should close the database connection once we're done with it.
idaClose(con)

Feedback and Knowledge Base