Difference Between Spark and Hadoop
Difference | Spark | Hadoop MapReduce |
---|---|---|
1. Performance | Iterative computations are performed in memory. Mapper functions simply transform one RDD into another RDD, which saves disk I/O and network I/O and improves performance | Every mapper/reducer writes its output to disk after the map and reduce phases, and the next mapper/reducer reads it back, which adds disk I/O and network I/O and causes latency |
2. Programming Languages | Scala, Java, Python, R | Java |
3. Basic Unit of Data | RDD (Resilient Distributed Dataset) | Key-value pairs (tuples) |
4. Lines of Code for WordCount | As few as 6 lines of Python | As few as 73 lines of Java |
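
To make the WordCount comparison concrete, here is a minimal sketch of the map, shuffle, and reduce steps in plain Python. It does not use the Spark or Hadoop APIs. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. Everything stays in memory here, which is closer in spirit to the Spark column of the table. A Hadoop MapReduce job would write the intermediate key-value pairs to disk between phases.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) tuple for every word, as a WordCount mapper would."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)


def shuffle_phase(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts for each word, as a WordCount reducer would."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three phases in memory (illustrative only, not a real cluster job)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["spark and hadoop", "Spark in memory"]
    print(word_count(sample))
```

The intermediate `(word, 1)` tuples are the key-value pairs listed under "Basic Unit of Data" for Hadoop. In Spark, the same pipeline would be expressed as a chain of transformations on an RDD.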