Member since 11-30-2016
21 Posts
3 Kudos Received
0 Solutions
03-07-2017
02:23 AM
Hi @Constantin Stanca
Thanks for replying. I've tried Zeppelin and its notebooks, but what I need is to create my own layouts and then export them to PDF, XLS or even RTF. Regards
03-07-2017
02:05 AM
2 Kudos
Dear Community, I was finally able to get my Twitter analysis working using NiFi + Hortonworks.
Now that my data source is in place, I would like to create a nice report output. Any suggestions? PS: I used to work with BI Publisher. Regards
Labels: Apache NiFi
02-20-2017
12:44 PM
Dear community, I've been wondering about something that I cannot figure out yet.
A quick summary:
I have been working for a while with Flume to gather data from Twitter and store it in an external table using Hive.
After processing the data with Hive, I finally got the results that I wanted.
All of these steps were done manually.
What I need now is to run the same process automatically, because I eventually need to make it public.
Any suggestions? Thanks for your help,
Regards
Labels: Apache Flume, Apache Hadoop
12-21-2016
03:01 AM
I added more RAM and the missing JAR path, and it worked perfectly. Thanks for your help
12-12-2016
02:26 AM
Dear Sunile, first of all, thanks for answering my question.
I really appreciate your recommendation, and I'll definitely use NiFi next time. For my use case, what do you think about processing the data with Hive?
I'm thinking about creating new tables and then inserting data filtered by the parameters I'm interested in (as I explained before).
I'd like to know if there is a best practice (or an alternative, such as programming something in Java for Hadoop or in Scala for Spark) for continuing to transform the data along the path I have chosen and getting results. Regards
12-11-2016
02:37 AM
Hi everyone reading this.
The main goal of this post is to get tips from you all so that I can make the best decision. I've been working with Hortonworks Sandbox 2.5 for about a month. I've been experimenting with Flume, collecting data from Twitter and trying to transform it into a readable format inside a Hive table. Streaming-twitter-data-using-flume/
So far, I have successfully loaded data into a Hive table (called tweets) using this guide
Tweets using Hive
My problem is the following: at this stage it is not easy for me to select and process the data. I would like to make it more readable (is that possible? If so, can you tell me how?). I would also like to filter the data. For example, I want to know:
1. The most used words
2. The most common times for tweeting
3. The most active users
and so on. Any technique or technology is welcome. I am trying to learn as fast as I can, and your help is always appreciated. Let's work together if you want. Regards, Cristian
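To make the three questions above concrete, here is a minimal Python sketch of the counting involved. It is only an illustration: the sample rows and the field names (user, created_at, text) are assumptions and may not match the columns of the actual tweets table, and the same aggregations could equally be written as Hive queries.

```python
from collections import Counter
from datetime import datetime
import re

# Hypothetical sample rows. The field names (user, created_at, text) are
# assumptions for illustration, not the real schema of the tweets table.
SAMPLE_TWEETS = [
    {"user": "alice", "created_at": "2016-12-10 09:15:00", "text": "Learning Hive with Flume"},
    {"user": "bob", "created_at": "2016-12-10 09:40:00", "text": "Hive makes Hadoop data readable"},
    {"user": "alice", "created_at": "2016-12-10 21:05:00", "text": "Flume and Hive on the sandbox"},
]


def top_words(tweets, n=3):
    """Return the n most frequent lowercase words across all tweet texts."""
    counts = Counter()
    for tweet in tweets:
        counts.update(re.findall(r"[a-z0-9_#@']+", tweet["text"].lower()))
    return counts.most_common(n)


def top_hours(tweets, n=3):
    """Return the n hours of the day (0-23) with the most tweets."""
    hours = Counter(
        datetime.strptime(tweet["created_at"], "%Y-%m-%d %H:%M:%S").hour
        for tweet in tweets
    )
    return hours.most_common(n)


def top_users(tweets, n=3):
    """Return the n users with the most tweets."""
    return Counter(tweet["user"] for tweet in tweets).most_common(n)


if __name__ == "__main__":
    print("Most used words:", top_words(SAMPLE_TWEETS))
    print("Busiest hours:", top_hours(SAMPLE_TWEETS))
    print("Most active users:", top_users(SAMPLE_TWEETS))
```

Each function is a group-and-count followed by a sort on the count, which is the same shape a GROUP BY with ORDER BY would take in a query language.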
Labels: Apache Flume, Apache Hadoop, Apache Hive
12-09-2016
03:27 PM
Hi Michael, I'm still trying to figure out where to find that log (is there a specific folder?).
I tried searching for the Query ID that Hive shows once I execute a query, but couldn't find it.
Query ID = root_20161205190741_fb2a555d-1633-404d-9128-0c3696d2d56a
So far, I have only found the following exception (through the CLI) after the execution fails:
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1480931794353_0004_1_00, diagnostics=[Task failed, taskId=task_1480931794353_0004_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
Any suggestions? Regards, and thank you Michael for helping me
12-07-2016
03:08 AM
Dear Jean-Phillipe, I tried to follow your instructions but, as you can see, this part is giving me headaches:
"YARN will need to fit at least one Tez AM (512 MB) and a couple Tez containers (512MB *2). You can check how much memory is allocated to YARN on the YARN config page "Memory allocated for all YARN containers on a node"."
Can you please guide me? Regards
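For reference, the arithmetic in the quoted advice can be sketched in a few lines of Python. The 512 MB sizes and the container count of two come from the quote; the function names and the example node sizes are hypothetical, and the real per-node value has to be read from the YARN config page.

```python
# Sizes taken from the quoted advice: one Tez AM plus two Tez containers.
TEZ_AM_MB = 512
TEZ_CONTAINER_MB = 512
TEZ_CONTAINER_COUNT = 2


def minimum_yarn_memory_mb(am_mb=TEZ_AM_MB,
                           container_mb=TEZ_CONTAINER_MB,
                           containers=TEZ_CONTAINER_COUNT):
    """Memory YARN needs to hold one Tez AM and the given Tez containers."""
    return am_mb + container_mb * containers


def fits(yarn_node_memory_mb):
    """Check a node's YARN allocation against the minimum (hypothetical helper)."""
    return yarn_node_memory_mb >= minimum_yarn_memory_mb()


if __name__ == "__main__":
    print("Minimum needed (MB):", minimum_yarn_memory_mb())
    # Example allocations only; substitute the value from the YARN config page.
    for allocated in (1024, 2048):
        print(allocated, "MB allocated ->", "fits" if fits(allocated) else "too small")
```

With the quoted numbers the minimum works out to 512 + 512 * 2 = 1536 MB, so the value under "Memory allocated for all YARN containers on a node" would need to be at least that.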