Spark or Hadoop: which one to use for big data mining?

Here you can discuss any topics related to data mining and big data
Mark
Posts: 55
Joined: Wed Apr 06, 2022 3:23 am

Spark or Hadoop: which one to use for big data mining?

Post by Mark »

I wonder whether to use Spark or Hadoop for big data mining. Any advice? Thanks.
Alva
Posts: 44
Joined: Sun Apr 10, 2022 12:17 am

Re: Spark or Hadoop: which one to use for big data mining?

Post by Alva »

I heard that Spark is faster because it can keep data in memory instead of always reading from and writing to files. But that is all I know.
Lin
Posts: 19
Joined: Wed Apr 06, 2022 3:01 am

Re: Spark or Hadoop: which one to use for big data mining?

Post by Lin »

Spark is newer. I recommend it over Hadoop.
admin
Site Admin
Posts: 121
Joined: Tue Apr 05, 2022 12:47 am
Location: China

Re: Spark or Hadoop: which one to use for big data mining?

Post by admin »

I have not worked much on big data topics, but I think most people agree that Spark is quite good. The MapReduce model used by Hadoop is efficient, especially if the processing requires only a single iteration of Map and Reduce. If there are more than two iterations, then it becomes hard.
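
To make the single-iteration case concrete, here is a minimal sketch of one Map step followed by one Reduce step, written in plain Python as a word count. It does not use Hadoop at all; the names `map_phase` and `reduce_phase` are made up for this illustration. An iterative algorithm would have to run a job like this again and again, and in Hadoop MapReduce each job normally writes its result to disk before the next job reads it back, which is where the extra cost comes from.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs):
    """Shuffle + Reduce: group the pairs by word and sum their counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Toy input standing in for the lines of a large file (hypothetical data).
lines = [
    "spark or hadoop",
    "hadoop uses map reduce",
    "spark keeps data in memory",
]

# One Map pass followed by one Reduce pass: the case where MapReduce fits well.
word_counts = reduce_phase(map_phase(lines))
print(word_counts["spark"], word_counts["hadoop"])
```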

There are also some alternatives to MapReduce and Spark. For example, I know of some newer models like RSP (Random Sample Partitions), developed at the institute where I work. The idea of RSP is to support approximate computing by sampling the big data. The sampling creates random data blocks, which make it possible to obtain a good approximation for data mining or machine learning tasks without using all the data. This can be faster than traditional big data models. There is also other work in this direction.
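
The sampling idea can be illustrated with a small sketch in plain Python. This is not the actual RSP implementation, only a simplified illustration under my own assumptions: the data fits in a list, the function name `make_random_blocks` is hypothetical, and the task is just estimating a mean. The data is shuffled once and cut into blocks, so each block behaves like a random sample of the whole data set, and a statistic computed on one block approximates the statistic on all the data.

```python
import random


def make_random_blocks(data, num_blocks, seed=0):
    """Shuffle a copy of the data and split it into num_blocks blocks.

    After shuffling, every block is a random sample of the full data set.
    """
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    # Block i takes every num_blocks-th element starting at position i.
    return [shuffled[i::num_blocks] for i in range(num_blocks)]


# Synthetic "big" data set (hypothetical): 100,000 values around 50.
rng = random.Random(42)
data = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

blocks = make_random_blocks(data, num_blocks=100)

# Exact answer uses all 100,000 values; the estimate uses one block of 1,000.
full_mean = sum(data) / len(data)
approx_mean = sum(blocks[0]) / len(blocks[0])
print(f"full mean: {full_mean:.3f}, one-block estimate: {approx_mean:.3f}")
```

Using more blocks and averaging their estimates would tighten the approximation, at the cost of reading more data.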
gmc
Posts: 69
Joined: Tue Apr 05, 2022 4:48 pm

Re: Spark or Hadoop: which one to use for big data mining?

Post by gmc »

Nice, I will try to find information about this.