Access dashDB (or DB2) using ibm_db from Python notebook

IBM dashDB is a fully managed cloud data warehouse, purpose-built for analytics. In this article we will explore how to connect to it from a Python notebook and perform data access using ibm_db and Pandas.

To get started you will need credentials for a dashDB database. If you don't already have an instance of dashDB you can get one for free by following steps in this article.

The steps outlined below are included in this sample notebook (also available on the Jupyter notebook Welcome screen of your Workbench) which you can import into your Python notebook by entering the link for it in the search bar (in the top-right corner).



Start by importing the ibm_db python library to perform database operations.
import ibm_db
Next you will need to enter your database credentials:
#Enter the values for you database connection
dsn_driver = "IBM DB2 ODBC DRIVER"
dsn_database = "BLUDB"           # e.g. "BLUDB"
dsn_hostname = "<Enter hostname>" # e.g.: "awh-yp-small03.services.dal.bluemix.net"
dsn_port = "50000"               # e.g. "50000"
dsn_protocol = "TCPIP"           # i.e. "TCPIP"
dsn_uid = "<Enter userID>"       # e.g. "dash104434"
dsn_pwd = "<Enter password"     # e.g. "7dBZ3jWt9xN6$o0JiX!m"
Then create the connection:
dsn = (
    "DRIVER={{IBM DB2 ODBC DRIVER}};"
    "DATABASE={0};"
    "HOSTNAME={1};"
    "PORT={2};"
    "PROTOCOL=TCPIP;"
    "UID={3};"
    "PWD={4};").format(dsn_database, dsn_hostname, dsn_port, dsn_uid, dsn_pwd)

conn = ibm_db.connect(dsn, "", "")
And start to execute queries and work with your data:
# test query
query = "SELECT * FROM GOSALESDW.EMP_EXPENSE_FACT;"
# run direct SQL
stmt = ibm_db.exec_immediate(conn, query)
ibm_db.fetch_both(stmt)
The results will look like:
{0: 20111130,
 1: 11103,
 2: 43637,
 3: 4010,
 4: 2104,
 5: 8050,
 6: 15.0,
 7: '208.33',
 'ACCOUNT_KEY': 8050,
 'DAY_KEY': 20111130,
 'EMPLOYEE_KEY': 4010,
 'EXPENSE_TOTAL': '208.33',
 'EXPENSE_TYPE_KEY': 2104,
 'EXPENSE_UNIT_QUANTITY': 15.0,
 'ORGANIZATION_KEY': 11103,
 'POSITION_KEY': 43637}
If you want to perform more advanced analytics with dashDB data using Python, consider using the ibmdbpy package. Instructions for getting started with ibmdbpy and dashDB are covered in another KB article.

(Optional): You can also use Pandas data frames to process and manipulate tabular data:
import pandas
import ibm_db_dbi
pconn = ibm_db_dbi.Connection(conn)
df = pandas.read_sql('SELECT * FROM SAMPLES.COW', pconn)
df

COW_CODE COW_DESC
0 b Under 16 years/Never worked
1 1 Private for-profit company Employee
2 2 Private non-profit company Employee
3 3 Local government employee
4 4 State government employee
5 5 Federal government employee
6 6 Self-employed in own not incorporated
7 7 Self-employed in own incorporated
8 8 Working without pay in family business or farm
9 9 Unemployed and last worked 5 years ago or earl...
Sometimes, its useful to utilize Spark on data from relational databases. Data Scientist Workbench comes with a local Spark environment and is already enabled in Python notebooks. Lets put the data into a spark data frame, filter it and view the results:
from pyspark.sql import SQLContext
SQLContext = SQLContext(sc)

sdf = SQLContext.createDataFrame(df)
sdf.filter(sdf['COW_CODE']>4).show()
The filtered results display as:
+--------+--------------------+
|COW_CODE|            COW_DESC|
+--------+--------------------+
|       5|Federal governmen...|
|       6|Self-employed in ...|
|       7|Self-employed in ...|
|       8|Working without p...|
|       9|Unemployed and la...|
+--------+--------------------+

There are other tutorials that show you how to work with Pandas and Spark data frames.

Good luck!

Feedback and Knowledge Base