Spark or Hadoop: which one to use for big data mining?

Mark · Post by **Mark** » Sun Apr 10, 2022 12:16 am

I wonder whether to use Spark or Hadoop for big data mining. Any advice? Thanks

Alva · Post by **Alva** » Sun Apr 10, 2022 2:08 am

I heard that Spark is faster because it can keep data in memory instead of always reading from and writing to files. But that is all I know.

Lin · Post by **Lin** » Sun Apr 10, 2022 2:44 pm

Spark is newer. I recommend it over Hadoop.

admin · Post by **admin** » Mon Apr 11, 2022 6:43 am

I have not worked much on big data topics, but I think most people agree that Spark is quite good. The MapReduce model used by Hadoop is efficient, especially if the processing requires only a single iteration of Map and Reduce. If several iterations are needed, it becomes harder, because each iteration is a separate job that writes its results to disk before the next one can start.
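To illustrate the single-iteration case, here is a minimal sketch of the map, shuffle and reduce steps in plain Python, using word count as the task. This is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for the illustration, and everything runs in one process.

```python
from collections import defaultdict


def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in records:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """One full Map + Reduce iteration over the input lines."""
    return reduce_phase(shuffle(map_phase(records)))


counts = word_count(["spark or hadoop", "hadoop uses map reduce"])
```

An algorithm that needs several passes would have to chain calls like `word_count` one after another, which is the case where an in-memory engine such as Spark tends to have the advantage.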

There are also some alternatives to MapReduce and Spark. For example, I know of some newer models like RSP (Random Sample Partition), developed at the institute where I work. The idea of RSP is to support approximate computing by sampling the big data. Through sampling, random data blocks are created, which makes it possible to obtain a good approximation for data mining or machine learning tasks without using all the data. This can be faster than traditional big data models. There is also other work in this direction.
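To give a rough feeling for the sampling idea, here is a small sketch in plain Python. It is only a simplified illustration and not the actual RSP implementation: the helper names (`make_random_blocks`, `estimate_mean`), the block count and the seed are all assumptions chosen for the example. The data is split into random blocks, and the mean is then estimated from only a few of them.

```python
import random
import statistics


def make_random_blocks(data, n_blocks, seed=0):
    """Shuffle the data and split it into n_blocks random blocks."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # Every n_blocks-th element of the shuffled list goes to the same block.
    return [shuffled[i::n_blocks] for i in range(n_blocks)]


def estimate_mean(blocks, n_selected, seed=0):
    """Estimate the overall mean from a few randomly chosen blocks.

    Averaging the block means is only fair here because all blocks
    have the same size.
    """
    rng = random.Random(seed)
    chosen = rng.sample(blocks, n_selected)
    return statistics.mean(statistics.mean(block) for block in chosen)


data = range(1, 100001)
blocks = make_random_blocks(data, n_blocks=100)

# Use only 5 of the 100 blocks (5% of the data) for the estimate.
approx = estimate_mean(blocks, n_selected=5)
exact = statistics.mean(data)
```

Here `approx` should be close to `exact` even though only a small part of the data is read, which is the trade-off behind approximate computing: less work in exchange for a small error.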