Can I use Hortonworks and its technologies to do a similar job to Amazon Elastic MapReduce?

Explorer

Hi,

I have a project to work on for my Master's degree in CS, with a concentration in Data Mining, with the following requirements:

I have already e-mailed the professor for this class and mentioned using Hortonworks instead. I am not knowledgeable about either Amazon Elastic MapReduce or Hortonworks, so there will be a learning curve, but it will be worth it.

1) Can you please let me know whether Hortonworks and all the technologies involved are open source (free)?

2) If so, I am going to assume that I can do with Hortonworks what I am supposed to do in this school project with Amazon Elastic MapReduce. Am I right?

3) I have already implemented one of the algorithms for this project using Microsoft Visual Studio and .NET C#. Does Hortonworks support .NET/C#?

Big thanks in advance.

Marco Lanza

P.S.: I am very new to these technologies.

1 ACCEPTED SOLUTION

Master Mentor

1) Yes, 100%.

2) If you can get away with running MapReduce programs, Hive queries, or Pig scripts, then you should be fine.

3) Java is the primary language; there is no .NET support, at least not in HDP on Linux.

7 REPLIES

Explorer

Hi Artem,

Big thanks for your answer/reply.

In other words, I can work on my project using Hortonworks for free, but my problem lies in the programming language I am planning to use, C#, not being supported.

The solution is that I have to change the language to Java, and I will assume that C/C++ is supported as well.

Thanks again.

Master Mentor

@Marco Lanza you can take a look at Hadoop Pipes or Hadoop Streaming to use a language other than Java. I think if you plan to learn MapReduce on the Hortonworks platform, you should invest in Java. There's also Cascading (http://www.cascading.org/), and there are a couple of higher-level languages like Apache Pig or Apache Hive that have a gentler learning curve. You can also look at Apache Spark, as that's where the Big Data industry is going, and there you have support for multiple languages, including C#: http://research.microsoft.com/en-us/projects/spark-clr/

Hadoop Streaming reference below:

https://hadoop.apache.org/docs/current/hadoop-streaming/HadoopStreaming.html
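
To give a feel for what a non-Java job looks like under Hadoop Streaming, here is a minimal sketch in Python. It relies only on the Streaming contract described in the reference above: the mapper and reducer read lines from stdin and write tab-separated key/value lines to stdout, and the reducer sees its input sorted by key. The task (counting how often each item appears across transactions, a typical first pass of association mining), the file name `item_count.py`, the input file `transactions.txt`, and the `map`/`reduce` command-line argument are all assumptions made for illustration, not part of any Hadoop API.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'item<TAB>1' for every item in each transaction line."""
    for line in lines:
        for item in line.split():
            yield f"{item}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per key; the input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce in one process for testing."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    # Hypothetical convention: the first argument selects the role,
    # so one file can serve as both the mapper and the reducer.
    steps = {"map": map_lines, "reduce": reduce_lines}
    role = sys.argv[1] if len(sys.argv) > 1 else None
    if role in steps:
        for out in steps[role](sys.stdin):
            print(out)
```

Before touching a cluster you can check the logic with an ordinary shell pipe, for example `cat transactions.txt | python item_count.py map | sort | python item_count.py reduce`. On a cluster, the same two commands would be passed to the streaming job through its `-mapper` and `-reducer` options; the exact invocation depends on your installation, so follow the reference for that part.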

Explorer

@Artem Ervits

Thanks for your reply.

Explorer

Sorry, I forgot to provide the requirements for my school project, which are as follows:

"

This option is to implement a data mining algorithm (e.g. association mining, classification, clustering, etc.) of your choice on a dataset of your choice using Hadoop and MapReduce.

Cloud infrastructure: Amazon Elastic MapReduce (Amazon EMR)

Programming software: Hadoop/MapReduce on an AWS cluster in a master slave fashion with multiple nodes using a programming language of your preference (e.g. Java)

"

Big thanks again.

Master Mentor

Yep, Java is the way to go. Try MapReduce with Java; it's not too bad: https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduc...

Explorer

@Artem Ervits,

Thanks for the replies. You have provided valuable information.

For this project I have to stick with the requirements and use Amazon technologies, or choose another option for my school project that does not involve any of these technologies (that's why I want to choose this option, which uses MapReduce, the cloud, etc.).

I am planning to work in the field of data mining. I have been checking and noticed that, as you mentioned, there are companies using Apache Spark and similar technologies.

As for the language, I noticed too that Java is the way to go. When I previously asked my professor what the "top" language for data mining is, he answered as follows:

"

Depending on a professional's background, top data mining languages may vary.

For a professional with a computer science background, Java/SQL or Python is favored.

For a professional with a Statistics background, R is favored.

For a professional with an engineering background, Matlab is favored.

Keep in mind that data mining is everywhere and people with diverse backgrounds are working in this hot field.

"

Thanks again for your valuable information.