Spark or Hadoop: which one to use for big data mining?
I wonder whether to use Spark or Hadoop for big data mining. Any advice? Thanks.
Re: Spark or Hadoop: which one to use for big data mining?
I heard that Spark is faster because it can keep data in memory instead of always reading from and writing to files. But that is all I know.
Re: Spark or Hadoop: which one to use for big data mining?
Spark is newer. I recommend it over Hadoop.
Re: Spark or Hadoop: which one to use for big data mining?
I have not worked much on big data topics, but I think most people agree that Spark is quite good. The MapReduce model used by Hadoop is efficient, especially if the processing requires a single iteration of Map and Reduce. If several iterations are needed, it becomes harder, because each MapReduce job writes its results to disk before the next job can read them.
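To show what a single iteration of Map and Reduce looks like, here is a small sketch in plain Python. It is not Hadoop code, only a toy word count that runs on one machine, and the function names (map_phase, shuffle, reduce_phase) are made up for the illustration.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in records:
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine the values of each key into one result.
    return {key: sum(values) for key, values in groups.items()}

def word_count(records):
    # One full Map -> Shuffle -> Reduce pass over the input.
    return reduce_phase(shuffle(map_phase(records)))

lines = [
    "spark or hadoop",
    "hadoop uses map reduce",
    "spark keeps data in memory",
]
counts = word_count(lines)
print(counts)
```

An iterative algorithm would have to call something like word_count again on the previous output, and in Hadoop each of those passes is a separate job that goes through the disk.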
There are also some alternatives to MapReduce and Spark. For example, I know of some newer models like RSP (Random Sample Partitions), developed in the institute where I work. The idea of RSP is to support approximate computing by sampling the big data. Sampling creates random data blocks, which makes it possible to get a good approximation for data mining or machine learning tasks without using all the data. This can be faster than traditional big data models. There is also other work in this direction.
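A rough way to picture the sampling idea is the toy sketch below in plain Python. It is not the actual RSP implementation, only my assumption of how block-level sampling can give an approximate answer. The function names and the synthetic data are invented for the example.

```python
import random
import statistics

def make_random_blocks(data, n_blocks, seed=0):
    # Shuffle the records, then deal them into n_blocks blocks,
    # so that each block is itself a random sample of the data.
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::n_blocks] for i in range(n_blocks)]

def estimate_mean(blocks, n_sampled, seed=0):
    # Use only a few randomly chosen blocks instead of all the data.
    # Averaging block means is fine here because blocks have equal size.
    rng = random.Random(seed)
    chosen = rng.sample(blocks, n_sampled)
    return statistics.mean(statistics.mean(block) for block in chosen)

# Synthetic data standing in for a large data set (assumption).
data = [float(i % 100) for i in range(100_000)]

blocks = make_random_blocks(data, n_blocks=100)
approx = estimate_mean(blocks, n_sampled=5)   # touches 5% of the records
exact = statistics.mean(data)                 # touches all the records

print("approximate mean:", approx)
print("exact mean:", exact)
```

The approximate value is computed from 5 of the 100 blocks, so most of the data is never read. How close it gets depends on the data and on how many blocks are sampled.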
Re: Spark or Hadoop: which one to use for big data mining?
Nice, I will try to find more information about this.