Hive Data Extraction and Processing

Hi everyone!

As a member of this community, I would like to ask for your help.

The issue is the following:

I've extracted data from Twitter and stored it in an external Hive table.

What I want to produce is a study of the words most commonly used by users who wrote about a specific subject.

A large part of the results should help us understand how people perceive the subject we are looking at.

Any ideas on how to complete this task?



Re: Hive Data Extraction and Processing

@Cristian Vasquez

The first step, getting the data into Hadoop, is done.

This problem can now be tackled in multiple ways in Hadoop.

In Hive, take a look at this:

You can also choose to use Spark here.

Let us know how you progress. All the best.

Re: Hive Data Extraction and Processing

@Cristian Vasquez

1. Hive

Hive provides a few statistics and data mining functions, such as ngrams() and context_ngrams().

ngrams() returns the k most frequent n-grams (single words when n = 1) found in one or more sequences of words.

context_ngrams() extends ngrams() by letting you add a context to the mining, i.e., in your case, a 'subject'.

Official wiki for stats and data mining functions in Hive

You could also refer to the section on "Analyze Tweet data in Hive" in this Hortonworks Tutorial and modify the queries to suit your requirements.
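
To make the idea behind ngrams() and context_ngrams() more concrete, here is a minimal plain-Scala sketch of the kind of counting they perform. It is not Hive's implementation (Hive estimates frequencies, while this counts exactly), and the names NgramSketch, tokenize, topNgrams and topWordsAfter are made up for illustration only.

```scala
object NgramSketch {
  // Hypothetical helper: split a tweet into lowercase word tokens.
  // A crude stand-in for the tokenizing Hive does before counting n-grams.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z0-9#@']+").filter(_.nonEmpty).toSeq

  // Exact top-k n-grams by frequency (ties broken alphabetically).
  // Illustrates what ngrams() reports; Hive itself returns estimates.
  def topNgrams(tweets: Seq[String], n: Int, k: Int): Seq[(Seq[String], Int)] =
    tweets
      .flatMap(t => tokenize(t).sliding(n).filter(_.size == n))
      .groupBy(identity)
      .map { case (gram, hits) => (gram, hits.size) }
      .toSeq
      .sortBy { case (gram, count) => (-count, gram.mkString(" ")) }
      .take(k)

  // Top-k words that directly follow a context word (the 'subject').
  // Similar in spirit to context_ngrams() with a one-word context.
  def topWordsAfter(tweets: Seq[String], context: String, k: Int): Seq[(String, Int)] =
    tweets
      .flatMap(t => tokenize(t).sliding(2).collect { case Seq(`context`, next) => next })
      .groupBy(identity)
      .map { case (word, hits) => (word, hits.size) }
      .toSeq
      .sortBy { case (word, count) => (-count, word) }
      .take(k)
}
```

For example, NgramSketch.topNgrams(tweets, 1, 10) would list the ten most frequent single words in a collection of tweet texts, and NgramSketch.topWordsAfter(tweets, "hadoop", 10) the ten words that most often follow "hadoop". In Hive, the same counting is done inside the query, so the data never has to leave the cluster.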

2. Spark

You can create a HiveContext instance in Spark (using Scala) like this:

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc) // sc is your existing SparkContext instance (predefined in spark-shell)

Then define your Hive query as:


You could refer to this tutorial to see the above use of HiveContext.