Do you know of something like Kaggle for Hadoop?

Contributor

Excuse me, but my question may be out of scope.

I am looking for a platform where firms post their projects and data so that others can propose solutions.

The closest thing today is Kaggle, but I see that it specializes in machine learning.

I am not a machine learning practitioner because I am not at ease with maths. I would like to apply what I have learned about Hadoop to a real enterprise project so that I can list it as valuable experience on my CV.

For the moment I am preparing for the HDPCD certification exam, but it cannot replace enterprise experience.

For the moment, I am not able to work in an enterprise for personal reasons.

Thanks for your feedback.

1 ACCEPTED SOLUTION

Super Guru

@Oriane

I am not aware of a platform similar to Kaggle for Hadoop. Problems on Kaggle are very specific. There is a known input (data) and companies know exactly what they want. They don't have to share their data and in the end they walk out with the best data model in terms of accuracy. That's it. There are no ad hoc queries, and no one is talking about 10 different data sources, seven of which hold regulated data that you cannot copy for compliance reasons, so you must somehow work with the teams that own it to build reports.

Machine learning problems, like the ones shared on Kaggle, can be solved with the help of the Hadoop ecosystem, specifically Spark. But there is a whole set of other problems that are solved with Hadoop, and there is no Kaggle-like platform for those problems because they don't have a definitive outcome.

For example, if someone wants to run ad hoc queries using Hive LLAP with a 5-second SLA, you will have to do that in their environment. You can't just solve that on a platform like Kaggle and hand a solution to the customer. Or imagine someone trying to build an app that uses HBase as a backend. Some of the problems they will encounter are key design, sizing of the cluster, and ensuring SLAs for x number of concurrent queries (which comes under sizing and design). You can help guide them, but they will mostly not share their business requirements on a public platform. For whatever help they need, they can come to sites like this and get their questions answered. In most cases they will work directly with Hortonworks to come up with the best solution.

In a nutshell, machine learning problems tend to be definitive (math is definitive) and can be solved without access to real data. Kaggle is a perfect platform for companies to bring their problems, pay people to solve them, and walk away with the best solution (not just any solution) from some really good minds out there. It also gives companies an opportunity to hire the best talent.

The other set of problems, like the ones you are asking about, is not definitive and requires disclosing information that companies consider competitive, so you don't see a platform like Kaggle for such problems.

Hope this helps.

3 REPLIES

Contributor

Thanks a lot @mqureshi for this clear answer. I see that it is not possible to have this type of platform.

I will prepare for my certification and look for some use cases online.

Do you have some websites that could be helpful?

Thanks.

Super Guru

@Oriane

Glad it was helpful. If you are satisfied with the answer, please accept it.