Monday, June 13, 2016

Spark 2.0 is out


Spark Summit East Keynote: Apache Spark 2.0


How do you get your hands on Spark 2.0:-
1. Databricks Community Edition
2. Download and set it up


Major features:-
  1. Tungsten Phase 2: speedups of 5-10x
  2. Structured Streaming: a real-time engine on SQL/DataFrames
  3. Unified Datasets and DataFrames (see the sketch after this list)
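
A quick taste of point 3: in Spark 2.0, SparkSession becomes the single entry point and a DataFrame is just Dataset[Row]. Below is a minimal Scala sketch, assuming a local Spark 2.0 build; the Person case class and the sample rows are made up for illustration.

// Case class so the untyped DataFrame can be viewed as a typed Dataset.
case class Person(name: String, age: Long)

object Spark2UnifiedApi {
  import org.apache.spark.sql.SparkSession

  def main(args: Array[String]): Unit = {
    // SparkSession is the new unified entry point in 2.0,
    // replacing the separate SQLContext/HiveContext.
    val spark = SparkSession.builder()
      .appName("Spark2UnifiedApi")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A DataFrame in 2.0 is just Dataset[Row]...
    val df = Seq(("Alice", 30L), ("Bob", 25L)).toDF("name", "age")

    // ...and .as[T] turns it into a strongly typed Dataset[Person].
    val people = df.as[Person]
    people.filter(_.age > 26).show()

    spark.stop()
  }
}

Structured Streaming (point 2) exposes this same DataFrame/Dataset API over unbounded data.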

Thursday, June 9, 2016

Running your first R notebook on IBM Bluemix Apache Spark Service


The IBM Bluemix Apache Spark Service has introduced an R tech preview that lets users run R programs on a Spark cluster.
https://developer.ibm.com/clouddataservices/docs/spark/technical-previews/r-in-jupyter-notebooks/
So how do you get started with an R notebook on Spark?

You will need to create a new instance of the service, since the tech preview was introduced in May 2016. Please check it out.
I have a simple Pi calculator example here, if you just want to import it and give the service a try:- https://github.com/charles2588/bluemixsparknotebooks/raw/master/R/Pi_Bluemix.ipynb
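
The linked notebook is in R, but the computation is the classic Monte Carlo Pi estimate. Here is the same idea as a minimal Scala sketch, assuming the predefined SparkContext `sc` that these notebooks provide; the sample count n is an arbitrary choice.

// Monte Carlo Pi, the same idea the linked R notebook demonstrates.
// Assumes the notebook's predefined SparkContext `sc`.
val n = 1000000  // number of random points; arbitrary choice

val inside = sc.parallelize(1 to n).filter { _ =>
  val x = math.random
  val y = math.random
  x * x + y * y < 1  // point falls inside the unit quarter circle
}.count()

// The fraction of points inside the quarter circle approximates Pi/4.
println(s"Pi is roughly ${4.0 * inside / n}")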

Free Beta Data Science Tools with Spark

Below are links to beta programs / community editions that let you test your Spark programs on hosted Spark clusters without having to set up anything yourself.

IBM

Sign up for IBM Data Science Experience (beta wait-list).
http://datascience.ibm.com/


Databricks

Sign up for Community Edition.
This gives you a free Spark instance (beta wait-list).

https://databricks.com/try-databricks

Thursday, April 7, 2016

PANCAKE STACK -- New Data Science Stack

  1. Presto
  2. Arrow
  3. NiFi
  4. Cassandra
  5. Airflow
  6. Kafka
  7. ElasticSearch
  8. Apache Spark
  9. TensorFlow
  10. Algebird
  11. CoreNLP
  12. Kibana

Architecture:- (diagram from the original post not reproduced here)


Thursday, March 17, 2016

Connecting to MongoDB from IBM Bluemix - Jupyter Notebooks on Spark


  1. Create an account in Bluemix (IBM offers a 30-day free trial) - https://console.ng.bluemix.net/registration/

  2. Create a Spark service (https://www.ng.bluemix.net/docs/services/AnalyticsforApacheSpark/index.html)

  3. Now create a notebook with Scala as the language.
    1. Add the unityJDBC jar, which contains the MongoDB JDBC driver.
      %AddJar https://github.com/charles2588/SparkNotebooksJars/raw/master/unityjdbc.jar

    2. Add the Mongo Java Driver jar, which unityJDBC needs.
      %AddJar https://github.com/charles2588/SparkNotebooksJars/raw/master/mongo-java-driver-2.13.3.jar

    3. Test the import below.
      import mongodb.jdbc.MongoDriver

    4. Import the DataFrame and SQLContext classes.
      import org.apache.spark.sql.{DataFrame, SQLContext}
    5. Replace url with your MongoDB URL, dbtable with the name of the table (collection) for which you want to create a DataFrame, and user and password with the credentials for your MongoDB server.
      val url = "jdbc:mongo://ds045252.mlab.com:45252/samplemongodb"
      val dbtable = "Photos"
      val user = "charles2588"
      val password = "*****"
      val options = scala.collection.Map("url" -> url, "driver" -> "mongodb.jdbc.MongoDriver", "dbtable" -> dbtable, "user" -> user, "password" -> password)
      

    6. Now create a new SQLContext from your SparkContext, which has the MongoDB driver loaded.
      val sqlContext = new SQLContext(sc)

    7. Create a DataFrameReader from your SQLContext for your table.
      val dataFrameReader = sqlContext.read.format("jdbc").options(options)

    8. Call the load method to create a DataFrame for your table.
      val tableDataFrame = dataFrameReader.load()

    9. Call the show() method to display the table contents in the notebook.
      tableDataFrame.show()

  4. You have successfully created a DataFrame from MongoDB; now you can process it further as needed (the consolidated sketch below puts the sub-steps together).
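
Putting sub-steps 4-9 together, here is the whole flow as one sketch. The URL, table, user, and password are the placeholder values from the steps above, so swap in your own, and the two %AddJar cells must already have run.

import org.apache.spark.sql.{DataFrame, SQLContext}

// Connection options; replace these placeholder values with your own.
val url = "jdbc:mongo://ds045252.mlab.com:45252/samplemongodb"
val dbtable = "Photos"
val user = "charles2588"
val password = "*****"
val options = scala.collection.Map(
  "url" -> url,
  "driver" -> "mongodb.jdbc.MongoDriver",
  "dbtable" -> dbtable,
  "user" -> user,
  "password" -> password)

// `sc` is the SparkContext the notebook provides; the unityJDBC and
// Mongo Java Driver jars must already be on the classpath (%AddJar above).
val sqlContext = new SQLContext(sc)

// Build a JDBC reader over the options and load the collection as a DataFrame.
val tableDataFrame: DataFrame = sqlContext.read.format("jdbc").options(options).load()
tableDataFrame.show()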