Why is Pentaho used with Hadoop (Hortonworks)? In Hortonworks we have a beautiful GUI to create jobs and execute them on the cluster. Can you please tell me what advantage Pentaho provides, and what the need for it is? Can't we use only Hortonworks to process our data? I am confused about why people use Pentaho.
I'm not sure what "beautiful GUI" you are referring to, whether that is NiFi, or SAM, but these tools did not always exist, and they only exist as part of the HDF package, not native HDP.
Pentaho works with all Hadoop environments, not only HDP. As for why people use it, you would have to ask them, but if I had to guess, they were sold it by some vendor or consultant, or it was marketed to them through other channels.
I've been watching this question for a while, and I'm very surprised there are no answers.
However, I think that says a lot. I think you'll struggle to find anyone who can give an objective answer to this. I'm an expert on the Pentaho side, but I've never used NiFi or SAM, so any answer I give on why people use Pentaho would not be objective. Nevertheless, there are some benefits to using Pentaho.
In many ways, it's like the age-old question: why use an ETL tool when I can write it in Python or R? It's a question that really has no single correct answer; it depends on what you're doing, what skills you have in your team, and so on.
On the Pentaho side I've seen a huge uptick in use on Hortonworks recently, so something is going on! Previously, all the big-data Pentaho work I saw was on another Hadoop distribution. (Not that this makes any difference to Pentaho.) However, I'm based in the UK, and I believe market share in the UK/Europe is a bit different to that in the US.
Finally, just as a clarification: don't forget Pentaho is open source, and the CE (Community Edition) version is fully usable with Hadoop. I know this is true of NiFi too, but some people seem to think Pentaho is a commercial tool with only an EE version. I guess that's the marketing team in action!