Tuesday, February 2, 2016

Use Python to upload files to swift storage.

Use Python to upload files to swift storage.

NYPD Motor Vehicle Accidents
From Bluemix example NYPD Motor Vehicle Accidents, use the code upto the point where we create spark dataframe.

# adding the PySpark modul to SparkContext
sc.addPyFile("https://raw.githubusercontent.com/seahboonsiew/pyspark-csv/master/pyspark_csv.py")
import pyspark_csv as pycsv

collisions = sc.textFile("swift://hivecontainer." + credentials['name'] + "/NYPD_Motor_Vehicle_Collisions.csv")

# create Spark DataFrame using pyspark-csv
collisions_df = pycsv.csvToDataFrame(sqlContext, collisions_body, sep=",", columns=collisions_header_list)

#save the Spark DataFrame to local storage
collisions_df.toPandas().to_csv('mycsv.csv’)

# Now there are two ways to upload this resultant csv file back to swift object storage

#1. You can use install swift CLI and run swift commands with !(magic) https://www.ng.bluemix.net/docs/services/ObjectStorage/index.html#using-swift-cli but that
# seems to erring out since python-dev module and other module required for “!pip —user install python-swiftclient” and  pip install python-keystoneclient and 
# pip install urllib3 certifi pyopenssl

#2. You can use !(magic) with curl and swift storage REST API (which is tested and working fine)

!curl -i -H "Content-Type: application/json" -d '{"auth": {"identity": {"methods": ["password"],"password": {"user": {"id": "cc8b1374d0de412fa1c7e201a4e90bce","password": "jo..F04#N(bDR1OZ"}}},"scope": {"project": {"id": "e4321c16ed084c06a9dc62ba810a61bf"}}}}'  https://identity.open.softlayer.com/v3/auth/tokens

HTTP/1.1 201 Created
Date: Wed, 03 Feb 2016 01:43:18 GMT
Server: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_wsgi/3.4 Python/2.7.5
X-Subject-Token: gAAAAABWsVs3GT2stF………
Vary: X-Auth-Token
x-openstack-request-id: req-fb9adbc9-425d-4551-817b-9c19995f3107
Content-Length: 17448
Content-Type: application/json

Copy the X-Subject-Token value and save it to token variable for multiple uses.

token = “gAAAAABWsVs3GT2stFDdr…"

Now to get the size of the exported file run following
!ls -l mycsv.csv
-rw-r--r-- 1 s027-20bcfe6e4297e8-2c631c8ff999 users 145549225 Feb  1 15:08 mycsv.csv


Now replace the content-length value with the size of the exported file and replace other parameters to form object storage URL
as explain in the link specified here
https://www.ng.bluemix.net/docs/services/ObjectStorage/index.html#using-swift-restapi 

!curl -X PUT -H "X-Auth-Token:$token" -H "Content-Length: 145549225" https://dal.objectstorage.open.softlayer.com/v3/AUTH_e4321c16ed084c06a9dc62ba810a61bf/hivecontainer/mycsv.csv -T mycsv.csv


You can now verify that mycsv.csv is uploaded to swift storage by going to swift service.


Thanks,
Charles.

Monday, February 1, 2016

Connecting to any DB2 or (SQLDB powerwd by db2 on Bluemix) from IBM Bluemix - Juypter Notebooks on Spark

  1. Create an account in bluemix(ibm offers 30 days free trial) - https://console.ng.bluemix.net/registration/

  2. Create a spark service (https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html)

  3. Now create notebook with scala as language.
    1. Import the two classes SparkConf and SparkContext
      import org.apache.spark.sql.{DataFrame, SQLContext} 
    2. Simply replace url with your db2 url.
      dbtable with name of the table for which you want to create dataframe.
      replace user and password for your db2 database server.
      val url = "jdbc:db2://75.126.155.153:50000/SQLDB"
      val dbtable = "USER13878.MYTABLE"
      val user = "USER13878"
      val password = "<enter your password>"
      val options = scala.collection.Map("url" -> url, "driver" -> "com.ibm.db2.jcc.DB2Driver", "dbtable" ->dbtable,"user"->user,"password"->password)

    3. Now create new SQLContext from your new Spark Context which has db2 driver loaded
      val sqlContext = new SQLContext(sc)

    4. Create a dataframereader from your SQLContext for your table
      val dataFrameReader = sqlContext.read.format("jdbc").options(options)

    5. Call the load method to create DataFrame for your table.
      val tableDataFrame = dataFrameReader.load()

    6. Call show() method to display the table contents in the Notebook
      tableDataFrame.show()


  4. You have successfully created a dataframe for your db2 table , now you can do further processing according to your need.
  5.