Showing posts with label ibm data science experience. Show all posts

Thursday, July 28, 2016

Connect to Cloudant database from SparkR

How to Connect to Cloudant Database from SparkR kernel



Below I will show how to do it from Bluemix, but it will apply to a Jupyter Notebook running in any environment.

Connecting to Cloudant from IBM Bluemix - Jupyter Notebooks on Spark


  1. Create an account in Bluemix (IBM offers a 30-day free trial) - https://console.ng.bluemix.net/registration/

  2. Create a Spark service (https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html) 
  3. Now create a notebook with SparkR as the language.  
    • The Spark context needs to know which driver to use to connect to the Cloudant database. In the Bluemix Spark service environment, the driver is loaded by default. 
    •  https://github.com/cloudant-labs/spark-cloudant
    •  If you are in a different environment, you can use the release binary:
    • https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.3/cloudant-spark-v1.6.3-125.jar
    • For example, use %AddJar -f https://github.com/cloudant-labs/spark-cloudant/releases/download/v1.6.3/cloudant-spark-v1.6.3-125.jar to add it to your Spark kernel.
  4. Once you have the spark-cloudant connector in your Spark environment, you need to set three configuration parameters on your Spark context:
    • "cloudant.host" = "ACCOUNT.cloudant.com"
    • "cloudant.username" = "USERNAME"
    • "cloudant.password" = "PASSWORD"
       
      In SparkR, you need to use the sparkEnv argument to pass these
      configuration variables to all the executors:
      sc <- sparkR.init(sparkEnv = list("cloudant.host"="c8dca934-d2a4-4dcc-9123-2189ce9f5812-bluemix.cloudant.com","cloudant.username"="c8dca934-d2a4-4dcc-9123-2189ce9f5812-bluemix","cloudant.password"="XXXXXXXXXXXXXXXXXXXX")) 
       
       
      Once you execute the above, your Spark context is ready to use the cloudant-connector.
      All you need to do is specify that you are reading using com.cloudant.spark:
       
      people <- read.df(sqlContext, database, source = "com.cloudant.spark", header = 'true', inferSchema = 'true') 
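      To sanity-check the connection, you can inspect the returned DataFrame with the standard SparkR 1.6 API. This is a minimal sketch: the people DataFrame comes from the read.df call above, and the "age" column in the SQL query is a hypothetical example - your documents' fields will differ.

```r
# Show the schema that was inferred from the Cloudant documents
printSchema(people)

# Peek at the first few rows
head(people)

# Count the documents in the database
count(people)

# Register the DataFrame as a temp table and query it with SQL
# (the "age" column here is hypothetical - use a field from your own documents)
registerTempTable(people, "people")
adults <- sql(sqlContext, "SELECT * FROM people WHERE age > 18")
head(adults)
```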
      
      
      I have the complete notebook published on this GitHub repo. Feel free to use it.

Thursday, June 9, 2016

Free Beta Data Science Tools with Spark

Below are links to beta programs / community editions that let you test your Spark programs on hosted Spark servers without having to set up anything.

IBM

Sign Up for IBM Data Science Experience. Beta wait-list.
http://datascience.ibm.com/


Databricks

Sign up for the Community Edition.
This gives you a free Spark instance (beta wait-list).

https://databricks.com/try-databricks