sentiment analysis with HDP

Hi all

I am a master's student and I want to write my thesis on sentiment analysis of Arabic text with Hadoop. My question: is this language supported here?

Thanks

Accepted Solutions

Re: sentiment analysis with HDP

It depends on what you want to do. Hadoop by itself doesn't provide any sentiment analysis, so you need to use a sentiment analysis package. Hadoop is mostly written in Java, so pretty much any Java package will work. Java strings are Unicode (UTF-16 internally), so Arabic text is supported out of the box.
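To make the Unicode point concrete, here is a minimal sketch in plain Java (no Hadoop needed). The class and method names are made up for illustration. It assumes your input files are UTF-8 encoded, which you would need to check for your own data, and it strips Arabic diacritics with a simple character range, which is only one common preprocessing choice.

```java
import java.nio.charset.StandardCharsets;

class ArabicStringDemo {
    // Remove Arabic short-vowel marks (U+064B..U+0652) and tatweel (U+0640).
    // This is an illustrative normalization step, not a complete one.
    static String stripDiacritics(String s) {
        return s.replaceAll("[\\u064B-\\u0652\\u0640]", "");
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode to UTF-8 and decode again, naming the charset explicitly
    // instead of relying on the platform default.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "sa'id" (happy), written with Unicode escapes so the source stays ASCII.
        String plain = "\u0633\u0639\u064A\u062F";
        // The same word with two vowel marks (fatha, kasra) added.
        String vocalized = "\u0633\u064E\u0639\u0650\u064A\u062F";

        System.out.println("chars: " + plain.length());
        System.out.println("UTF-8 bytes: " + utf8Length(plain));
        System.out.println("round trip ok: " + roundTrip(plain).equals(plain));
        System.out.println("diacritics stripped ok: "
                + stripDiacritics(vocalized).equals(plain));
    }
}
```

The part that usually goes wrong is not Java itself but the encoding at the edges, so it helps to name the charset explicitly wherever you turn bytes into strings.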

The biggest ones are:

- Stanford NLP

- OpenNLP

- GATE

From a quick Google search, both GATE and Stanford NLP support some Arabic features:

https://gate.ac.uk/gate/plugins/Lang_Arabic/src/arabic/

http://nlp.stanford.edu/projects/arabic.shtml

If you want to run these packages on Hadoop, you will have to decide how to run them:

- as MapReduce jobs

- as Pig UDFs, perhaps

- in Spark

(Hadoop Streaming and Spark also support Python, so you could use NLTK, but I would suggest Java.)
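Whichever of these you pick, the job has roughly the same shape: a per-record step that scores one piece of text, followed by an aggregation step. The sketch below shows that shape in plain Java, deliberately without the Hadoop or Spark APIs so it runs on its own. The two-word lexicon and all names in it are hypothetical placeholders. In a real job the `classify` method is where you would call Stanford NLP, OpenNLP or GATE instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SentimentSketch {
    // Hypothetical toy lexicon, for illustration only.
    static final Map<String, Integer> LEXICON = Map.of(
            "\u0633\u0639\u064A\u062F", 1,   // "sa'id" (happy)
            "\u062D\u0632\u064A\u0646", -1   // "hazin" (sad)
    );

    // Per-record step (the "map" side): label one line of text.
    static String classify(String line) {
        int score = 0;
        for (String token : line.trim().split("\\s+")) {
            score += LEXICON.getOrDefault(token, 0);
        }
        if (score > 0) return "positive";
        if (score < 0) return "negative";
        return "neutral";
    }

    // Aggregation step (the "reduce" side): count lines per label.
    static Map<String, Integer> countLabels(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            counts.merge(classify(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\u0623\u0646\u0627 \u0633\u0639\u064A\u062F",   // "ana sa'id" (I am happy)
                "\u0623\u0646\u0627 \u062D\u0632\u064A\u0646",   // "ana hazin" (I am sad)
                "\u0623\u0646\u0627");                           // "ana" (I), no lexicon hit
        System.out.println(countLabels(lines));
    }
}
```

In MapReduce the per-record step would live in a mapper and the counting in a reducer, in Pig it would be a UDF applied to each tuple, and in Spark it would be a function passed to a transformation. The text processing code itself stays much the same.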
