Connecting to Postgres from IBM Bluemix - Jupyter Notebooks on Spark
- Create an account in Bluemix (IBM offers a 30-day free trial) - https://console.ng.bluemix.net/registration/
- Create a Spark service (https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html)
- Now create a notebook with Scala as the language.
-
Use the %Addjar magic to download the Postgres JDBC driver jar and add it to the notebook as a third-party jar.
%Addjar -f https://jdbc.postgresql.org/download/postgresql-9.4.1207.jre7.jar
-
Import the two classes SparkConf and SparkContext
import org.apache.spark.{SparkConf, SparkContext}
-
The first statement gets a SparkConf configuration object from the notebook's initial Spark context, "sc".
The conf.setJars call then specifies all the jars to be added to the new SparkContext we are going to create. Because we have downloaded the Postgres driver jar, it is included in that list and becomes available in the new context. (The statement is complex to modify, so simply copy and paste it as it is.)
val conf = sc.getConf
conf.setJars(ClassLoader.getSystemClassLoader.asInstanceOf[java.net.URLClassLoader].getURLs.map(_.toString).toSet.toSeq ++ kernel.interpreter.classLoader.asInstanceOf[java.net.URLClassLoader].getURLs.map(_.toString).toSeq)
conf.set("spark.driver.allowMultipleContexts", "true")
conf.set("spark.master","local[*]")
val scPostgres = new SparkContext(conf)
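The setJars statement above casts two class loaders to java.net.URLClassLoader and collects their URLs as strings. As a rough sketch of what each half of that expression does, the hypothetical helper jarUrls below (not part of the original post) lists the URLs a class loader knows about. Returning an empty list when the loader is not a URLClassLoader is an assumption about how one might guard the cast, and the example jar path is made up.

```scala
import java.net.{URL, URLClassLoader}

// Hypothetical helper (not from the original post): list the jar URLs a class loader knows about.
// Returns an empty Seq instead of throwing when the loader is not a URLClassLoader.
def jarUrls(loader: ClassLoader): Seq[String] = loader match {
  case urlLoader: URLClassLoader => urlLoader.getURLs.map(_.toString).toSeq
  case _                         => Seq.empty
}

// Example: a loader built from a single, made-up jar location.
val exampleLoader = new URLClassLoader(Array(new URL("file:/tmp/example.jar")))
val exampleJars = jarUrls(exampleLoader)
```

With a helper like this, the two halves of the setJars argument would read as jarUrls(ClassLoader.getSystemClassLoader) and jarUrls(kernel.interpreter.classLoader), assuming the notebook kernel exposes that loader as in the statement above.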
-
Import the SQLContext class, which is needed to create DataFrames.
import org.apache.spark.sql.{SQLContext}
Replace url with your Postgres URL.
Replace dbtable with the name of the table for which you want to create a DataFrame.
Replace user and password with the credentials for your Postgres database.
Note on the URL: you can remove the sslmode argument, depending on the configuration of the Postgres server.
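As a small sketch of the sslmode note, the hypothetical helpers below (not part of the original post) build the JDBC URL with or without the sslmode argument and assemble the option map. The host, database, and credential values are placeholders chosen for illustration only.

```scala
// Hypothetical helpers (not from the original post).

// Build a Postgres JDBC URL; pass sslMode = None to leave the sslmode argument out.
def postgresUrl(host: String, port: Int, database: String,
                sslMode: Option[String] = Some("require")): String = {
  val base = s"jdbc:postgresql://$host:$port/$database"
  sslMode.fold(base)(mode => s"$base?sslmode=$mode")
}

// Assemble the option map passed to the JDBC data source.
def jdbcOptions(url: String, dbtable: String, user: String, password: String): Map[String, String] =
  Map("url" -> url, "driver" -> "org.postgresql.Driver",
      "dbtable" -> dbtable, "user" -> user, "password" -> password)

// Placeholder values, for illustration only.
val withSsl = postgresUrl("localhost", 5432, "mydb")
val withoutSsl = postgresUrl("localhost", 5432, "mydb", sslMode = None)
val exampleOptions = jdbcOptions(withoutSsl, "public.test", "someuser", "somepassword")
```

The statements that follow set the same values directly, which is all the notebook needs; helpers like these only become useful when the same notebook connects to several databases.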
val url = "jdbc:postgresql://ec2-75-101-163-171.compute-1.amazonaws.com:5432/d7vad26hel3q5l?sslmode=require"
val dbtable = "public.test"
val user = " "
val password = " "
val options = scala.collection.Map("url" -> url, "driver" -> "org.postgresql.Driver", "dbtable" -> dbtable, "user" -> user, "password" -> password)
-
Now create a new SQLContext from your new Spark context, which has the Postgres driver loaded.
val ncsqlContext = new SQLContext(scPostgres)
-
Create a DataFrameReader from your SQLContext for your table.
val dataFrameReader = ncsqlContext.read.format("jdbc").options(options)
-
Call the load method to create a DataFrame for your table.
val tableDataFrame = dataFrameReader.load()
-
Call the show() method to display the table contents in the notebook.
tableDataFrame.show()
-
- You have successfully created a dataframe.