Thursday, March 17, 2016

Connecting to MongoDB from IBM Bluemix - Jupyter Notebooks on Spark

  1. Create an account in Bluemix (IBM offers a 30-day free trial): https://console.ng.bluemix.net/registration/

  2. Create an Apache Spark service (https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html).

  3. Now create a notebook with Scala as the language.
    1. Add the UnityJDBC jar, which contains the MongoDB JDBC driver:
      %Addjar https://github.com/charles2588/SparkNotebooksJars/raw/master/unityjdbc.jar

    2. Add the MongoDB Java driver jar, which UnityJDBC depends on:
      %Addjar https://github.com/charles2588/SparkNotebooksJars/raw/master/mongo-java-driver-2.13.3.jar

    3. Test the following import. If it compiles, the driver jar is on the classpath:
      import mongodb.jdbc.MongoDriver

    4. Import the DataFrame and SQLContext classes:
      import org.apache.spark.sql.{DataFrame, SQLContext}
    5. Replace url with your MongoDB JDBC URL, dbtable with the name of the MongoDB collection you want to load as a DataFrame, and user and password with your MongoDB credentials:
      val url = "jdbc:mongo://ds045252.mlab.com:45252/samplemongodb"
      val dbtable = "Photos"
      val user = "charles2588"
      val password = "*****"
      val options = scala.collection.Map(
        "url" -> url,
        "driver" -> "mongodb.jdbc.MongoDriver",
        "dbtable" -> dbtable,
        "user" -> user,
        "password" -> password)

    6. Create a new SQLContext from the notebook's SparkContext (sc), which now has the MongoDB driver loaded:
      val sqlContext = new SQLContext(sc)

    7. Create a DataFrameReader from the SQLContext, configured with the JDBC options:
      val dataFrameReader = sqlContext.read.format("jdbc").options(options)

    8. Call load() to create a DataFrame for the collection:
      val tableDataFrame = dataFrameReader.load()

    9. Call show() to display the first rows of the DataFrame (20 by default) in the notebook:
      tableDataFrame.show()

  4. You have now created a DataFrame from MongoDB and can process it further as needed.
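As a complement to the import test in sub-step 3 of step 3, the sketch below checks at runtime whether the driver class can be loaded, which can be handy when a notebook cell should report a missing jar instead of failing to compile. The object name DriverCheck and the method driverAvailable are hypothetical; only the class name mongodb.jdbc.MongoDriver comes from this post. It assumes the jars added with %Addjar are visible to the class loader used by Class.forName, which may depend on your notebook kernel.

```scala
// Hypothetical helper (not part of UnityJDBC): reports whether a class can be
// loaded from the classpath visible to this code.
object DriverCheck {
  def driverAvailable(className: String): Boolean =
    try {
      Class.forName(className)
      true
    } catch {
      // Class missing entirely.
      case _: ClassNotFoundException => false
      // Class found but a dependency (e.g. the MongoDB Java driver) is missing
      // or its static initialization failed.
      case _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    // Driver class name taken from the post; it is only found if unityjdbc.jar is loaded.
    val name = "mongodb.jdbc.MongoDriver"
    println(s"$name available: ${driverAvailable(name)}")
  }
}
```

Catching LinkageError as well as ClassNotFoundException is a deliberate choice: if unityjdbc.jar is present but the MongoDB Java driver jar from sub-step 2 is not, loading the driver may fail with NoClassDefFoundError, and the helper then reports false instead of throwing.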
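The options map in sub-step 5 of step 3 can also be assembled from its parts, which makes it easier to swap in your own host, database and collection. The sketch below is an illustration only: the object MongoJdbcOptions and its build method are hypothetical names, while the jdbc:mongo://host:port/database URL format and the option keys (url, driver, dbtable, user, password) are the ones used in this post.

```scala
// Hypothetical helper that builds the JDBC options used in sub-step 5.
// The URL format and option keys follow the post; the rest is an assumption.
object MongoJdbcOptions {
  val DriverClass = "mongodb.jdbc.MongoDriver"

  def build(host: String, port: Int, database: String, collection: String,
            user: String, password: String): Map[String, String] = {
    require(port > 0 && port <= 65535, s"invalid port: $port")
    Map(
      "url"      -> s"jdbc:mongo://$host:$port/$database",
      "driver"   -> DriverClass,
      "dbtable"  -> collection,
      "user"     -> user,
      "password" -> password
    )
  }

  def main(args: Array[String]): Unit = {
    // Placeholder values from the post; substitute your own credentials.
    val opts = build("ds045252.mlab.com", 45252, "samplemongodb", "Photos",
                     "charles2588", "*****")
    // Print every option except the password, sorted by key.
    (opts - "password").toSeq.sortBy(_._1).foreach { case (k, v) =>
      println(s"$k = $v")
    }
  }
}
```

Because an immutable Map is also a scala.collection.Map, the result should be usable directly in place of options in step 7, for example sqlContext.read.format("jdbc").options(opts).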