Big Data Authority: Difference Between Spark and Hadoop MapReduce

Friday, June 17, 2016

Difference Between Spark and Hadoop

Aspect	Spark	Hadoop MapReduce
1. Performance	Iterative computations run in memory: map-style transformations simply turn one RDD into another, which saves disk and network I/O and improves performance.	Each mapper and reducer writes its output to disk, and the next mapper or reducer reads it back, so every phase incurs disk and network I/O, which adds latency.
2. Programming Languages	Scala, Java, Python, R	Java (natively; other languages via Hadoop Streaming)
3. Basic Unit of Data	RDD (Resilient Distributed Dataset)	Key-value pairs (tuples)
4. Lines of Code for WordCount	As few as 6 lines of Python	As few as 73 lines of Java
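
To make rows 1 and 4 concrete, here is a minimal sketch of WordCount in PySpark. It is not the code the table's line count refers to. The names split_words, to_pair, word_count and main, and the input path input.txt, are assumptions made for illustration. Each step turns one RDD into the next, and nothing is computed until collect() is called.

```python
from operator import add


def split_words(line):
    """Split one line of text into whitespace-separated words."""
    return line.split()


def to_pair(word):
    """Pair a word with an initial count of 1."""
    return (word, 1)


def word_count(sc, path):
    """Count words in the text file at `path` using an existing SparkContext."""
    lines = sc.textFile(path)           # RDD of lines
    words = lines.flatMap(split_words)  # RDD of words
    pairs = words.map(to_pair)          # RDD of (word, 1)
    counts = pairs.reduceByKey(add)     # RDD of (word, total)
    return dict(counts.collect())       # collect() triggers the computation


def main(path="input.txt"):  # hypothetical input path
    # Imported here so the helpers above load even without PySpark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "WordCount")
    try:
        for word, total in sorted(word_count(sc, path).items()):
            print(word, total)
    finally:
        sc.stop()
```

Calling main() runs the job on a local Spark master. The pipeline itself is the four lines inside word_count, which is why the Python version stays so short compared with a Java MapReduce job that needs separate mapper, reducer and driver classes.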
