Spark and Hadoop are not mutually exclusive; they can work together. Spark is an execution engine that can run on top of Hadoop, broadening the kinds of computing workloads Hadoop handles while improving the performance of the big data framework.
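Apache Hadoop stores data on disk, whereas Spark keeps data in memory where it can. Spark uses RDDs (Resilient Distributed Datasets) and various data storage models to provide fault tolerance while minimizing network I/O: an RDD records the lineage of transformations that produced it, so a lost partition can be recomputed rather than copied over the network. Hadoop achieves fault tolerance through replication of data blocks across nodes.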
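The lineage idea can be illustrated with a toy sketch in plain Python. This is not Spark's API; the class `ToyRDD` and its methods `compute` and `lose_partition` are hypothetical names made up for this illustration, and real RDDs are distributed across a cluster rather than held in one process.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API).

    Each dataset remembers its parent and the function that derives it,
    so any in-memory partition can be rebuilt on demand.
    """

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source    # list of partitions, only for a source dataset
        self.parent = parent    # upstream ToyRDD, or None for a source dataset
        self.fn = fn            # per-element function applied to the parent
        self.cache = {}         # partition index -> result held in memory
        self.computations = 0   # how many times a partition was (re)built

    def map(self, fn):
        # Nothing is computed here; only the lineage is recorded.
        return ToyRDD(parent=self, fn=fn)

    def compute(self, index):
        if index in self.cache:
            return self.cache[index]
        if self.parent is None:
            result = list(self.source[index])
        else:
            result = [self.fn(x) for x in self.parent.compute(index)]
        self.computations += 1
        self.cache[index] = result
        return result

    def lose_partition(self, index):
        # Simulate a failed node: the in-memory copy disappears.
        self.cache.pop(index, None)


source = ToyRDD(source=[[1, 2], [3, 4]])
squared = source.map(lambda x: x * x)

first = squared.compute(1)    # builds partition 1 from the source
squared.lose_partition(1)     # the in-memory partition is lost
rebuilt = squared.compute(1)  # rebuilt from lineage, no replica needed
```

Only the lost partition is rebuilt, and only from the data it depends on. A replication-based design would instead have copied every partition to other nodes ahead of time.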
Spark is often faster than Hadoop MapReduce because it performs computation in memory: data is loaded from disk storage into RAM, and intermediate results are kept there. This reduces the time spent reading from and writing to slow hard drives, unlike Hadoop MapReduce, which writes intermediate results to disk between steps.
Spark vs. Hadoop – Workloads
If a big data application involves ETL-type computations whose resulting data sets are large and might exceed the total RAM of the cluster, Hadoop MapReduce is likely to outperform Spark. Spark is the more efficient choice for computations that involve iterative machine learning algorithms, because the same data set is reused on every pass and can stay in memory.
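Why iterative algorithms benefit from in-memory data can be sketched in plain Python, without Spark or Hadoop. The example below is a rough analogy only: it counts how often a small file is read when a gradient-descent loop reloads its input on every pass, compared with loading it once and reusing it. All function names here (`fit_rereading`, `fit_cached`, `run_demo`) are hypothetical, and the "model" is just a single number converging to the mean of the data.

```python
import os
import tempfile


def write_dataset(path, values):
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


def read_dataset(path, counter):
    # Every call stands in for a full pass over data stored on disk.
    counter["disk_reads"] += 1
    with open(path) as f:
        return [float(line) for line in f]


def gradient_step(data, w, lr=0.1):
    # One gradient-descent step on mean((w - x)^2); w converges to the mean.
    grad = sum(2 * (w - x) for x in data) / len(data)
    return w - lr * grad


def fit_rereading(path, iterations):
    # Disk-based style: the input is read again on every iteration.
    counter = {"disk_reads": 0}
    w = 0.0
    for _ in range(iterations):
        data = read_dataset(path, counter)
        w = gradient_step(data, w)
    return w, counter["disk_reads"]


def fit_cached(path, iterations):
    # In-memory style: the input is read once and reused.
    counter = {"disk_reads": 0}
    data = read_dataset(path, counter)
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(data, w)
    return w, counter["disk_reads"]


def run_demo(iterations=50):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "points.txt")
        write_dataset(path, [1.0, 2.0, 3.0, 4.0])
        return fit_rereading(path, iterations), fit_cached(path, iterations)
```

Both variants arrive at the same result, but the rereading variant touches the disk once per iteration while the cached variant touches it once in total. The trade-off described above still applies: the cached approach only works while the data fits in memory.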
Spark vs. Hadoop – Cost
Hadoop and Spark are both open source big data frameworks, but money still has to be spent on staffing and machinery. Hadoop is economical to implement because more Hadoop engineers are available than engineers with Spark expertise, and because of HaaS (Hadoop as a Service). Spark is cost-effective according to benchmarks, but staffing is expensive because of the shortage of personnel with Spark expertise.
Spark vs. Hadoop – Ease of Use
Programming in Hadoop is more difficult because it has no interactive mode. Spark does have one, an interactive shell in which code can be tried out line by line, and this makes it easier to program.