Spark Summit East Keynote: Apache Spark 2.0
Two ways to try it out:
1. Databricks Community Edition
2. Download it and set it up yourself
Major features:
- Tungsten Phase 2 speedups of 5-10x
- Structured Streaming real-time engine on SQL/DataFrames
- Unifying Datasets and DataFrames (see the sketch below)
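To make the last point concrete, here is a minimal Spark 2.0 sketch of the unified API, where a DataFrame is simply an alias for Dataset[Row]. The Photo case class and the photos.json path are illustrative assumptions, not part of the keynote material.

import org.apache.spark.sql.{Dataset, Row, SparkSession}

// Hypothetical case class used only for this illustration.
case class Photo(id: Long, title: String)

val spark = SparkSession.builder().appName("unification-sketch").getOrCreate()
import spark.implicits._

// In Spark 2.0 a DataFrame is just Dataset[Row], so the untyped and
// typed APIs share one implementation.
val df: Dataset[Row] = spark.read.json("photos.json") // a DataFrame
val ds: Dataset[Photo] = df.as[Photo]                 // the same data, typed

// The same operators work on both views of the data.
df.filter($"id" > 10).show()
ds.filter(_.id > 10).show()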
Add the UnityJDBC MongoDB JDBC driver and the MongoDB Java driver to the notebook classpath:
%AddJar https://github.com/charles2588/SparkNotebooksJars/raw/master/unityjdbc.jar
%AddJar https://github.com/charles2588/SparkNotebooksJars/raw/master/mongo-java-driver-2.13.3.jar
// Importing the driver class confirms that the UnityJDBC jar was loaded successfully.
import mongodb.jdbc.MongoDriver
import org.apache.spark.sql.{DataFrame, SQLContext}
Replace url with your own MongoDB connection URL, and adjust dbtable (the collection name), user, and password to match your database.

val url = "jdbc:mongo://ds045252.mlab.com:45252/samplemongodb"
val dbtable = "Photos"
val user = "charles2588"
val password = "*****"

val options = scala.collection.Map(
  "url" -> url,
  "driver" -> "mongodb.jdbc.MongoDriver",
  "dbtable" -> dbtable,
  "user" -> user,
  "password" -> password)
// Create an SQLContext from the notebook's existing SparkContext (sc).
val sqlContext = new SQLContext(sc)

// Configure a JDBC reader with the MongoDB connection options defined above.
val dataFrameReader = sqlContext.read.format("jdbc").options(options)

// Load the collection into a DataFrame and preview the first rows.
val tableDataFrame = dataFrameReader.load()
tableDataFrame.show()
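Once loaded, the collection behaves like any other DataFrame. As a minimal follow-up sketch, it can be registered as a temporary table and queried with Spark SQL; the table name photos is an arbitrary choice for this illustration, not something required by the driver.

// Register the DataFrame as a temporary table for SQL queries.
tableDataFrame.registerTempTable("photos")

// Run an ad-hoc SQL query against the MongoDB-backed table.
val firstTen = sqlContext.sql("SELECT * FROM photos LIMIT 10")
firstTen.show()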