Big Data Authority: June 2016

Friday, June 17, 2016

Difference Between Spark and Hadoop

Difference	Spark	Hadoop MapReduce
1. Performance	Iterative computations are performed in memory: transformations simply turn one RDD into another RDD, which saves disk I/O and network I/O and improves performance	Every map and reduce phase writes its output to disk, and the next phase reads it back, which incurs disk I/O and network I/O and adds latency
2. Programming Languages	Scala, Java, Python, R	Java
3. Basic Unit of Data	RDD (Resilient Distributed Dataset)	Key-value pairs (tuples)
4. Lines of Code for WordCount	As few as 6 lines of Python code. Refer here	As few as 73 lines of Java code. Refer here
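
To make the WordCount row concrete, here is a minimal plain-Python sketch of the same pipeline shape that a Spark job chains in memory. It does not use Spark: the helpers flat_map, reduce_by_key and word_count are hypothetical names written for this illustration, loosely mirroring the RDD operations flatMap, map and reduceByKey, and the two input lines are made-up sample data.

```python
from itertools import chain


def flat_map(fn, records):
    """Apply fn to every record and flatten the results (mimics RDD.flatMap)."""
    return chain.from_iterable(fn(r) for r in records)


def reduce_by_key(fn, pairs):
    """Merge values that share a key using fn (mimics RDD.reduceByKey)."""
    merged = {}
    for key, value in pairs:
        merged[key] = fn(merged[key], value) if key in merged else value
    return merged


def word_count(lines):
    # Each stage feeds the next lazily in memory; nothing is written
    # to disk between stages.
    words = flat_map(str.split, lines)       # line -> words
    pairs = ((word, 1) for word in words)    # word -> (word, 1)
    return reduce_by_key(lambda a, b: a + b, pairs)


# Made-up sample input, for illustration only.
lines = ["spark and hadoop", "spark in memory"]
counts = word_count(lines)
print(counts)

# For comparison, the PySpark form of the same pipeline (not run here,
# assumes an existing SparkContext `sc` and an input path `path`):
#   sc.textFile(path).flatMap(lambda l: l.split()) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
```

The intermediate results here are generators, so each word flows through all three stages without being materialised in between. That is only an analogy for how Spark avoids intermediate disk writes, not a description of its internals.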

Monday, June 13, 2016

Spark Summit East Keynote: Apache Spark 2.0

How to get your hands on Spark 2.0:
1. Databricks Community Edition
2. Download and set it up

Major features:-
