Member since: 11-30-2016
Posts: 21
Kudos Received: 3
Solutions: 0
03-07-2017
02:23 AM
Hi @Constantin Stanca
Thanks for replying. I've tried Zeppelin and its notebooks, but what I need is to create my own layouts and then export them to PDF, XLS, or even RTF. Regards
03-07-2017
02:05 AM
2 Kudos
Dear Community, I was finally able to build and manage my Twitter analysis using NiFi + Hortonworks.
Now that my data source is in place, I would like to create a nice report output. Any suggestions? P.S.: I used to work with BI Publisher. Regards
02-20-2017
12:44 PM
Dear community, I've been wondering about something that I can't figure out yet.
A quick summary:
I worked with Flume for a while to gather data from Twitter and store it in an external Hive table.
After processing the data with Hive, I finally got the results I wanted.
All of these steps were done manually.
What I need now is to run the same process automatically, because I eventually need to make it public.
Any suggestions? Thanks for your help,
Regards
- Tags:
- Flume
- hadoop
- Hadoop Core
- Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels:
- Apache Flume
- Apache Hadoop
12-21-2016
03:01 AM
I added more RAM and the missing JAR path, and it worked perfectly. Thanks for your help.
12-12-2016
02:26 AM
Dear Sunile, first of all, thanks for answering my question.
I really appreciate your recommendation; I'll definitely use NiFi next time. For my current use case, what do you think about processing the data with Hive?
I'm thinking about creating new tables and then inserting the data filtered by the parameters I'm interested in (as I explained before).
I'd like to know whether there is a best practice (or an alternative, such as writing something in Java for Hadoop or Scala for Spark) for continuing to transform the data along the path I have chosen and getting results. Regards
12-11-2016
02:37 AM
Hi everyone reading this.
The main purpose of this post is to get tips from all of you so I can make the best decision. I've been working with Hortonworks Sandbox 2.5 for about a month. I've been experimenting with Flume, collecting data from Twitter and trying to transform it into a readable format inside a Hive table. Streaming-twitter-data-using-flume/
So far, I have successfully loaded data into a Hive table (called tweets) using this guide:
Tweets using Hive
My problem is the following: at this stage, it's not easy for me to select and process the data. I would like to make it more readable (is that possible? If so, can you tell me how?). I would also like to filter the data. For example, I want to know:
1. The most used words
2. The most common times for tweeting
3. The most active users
and so on... Any technique or technology is welcome. I am trying to learn as fast as I can, but your help is always welcome. If you'd like, let's work together. Regards, Cristian
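The three statistics listed above boil down to counting and ranking. A minimal Python sketch, assuming the tweets have already been pulled out of the tweets table as records with hypothetical fields user, created_at and text (the real column names depend on the table schema from the guide), and assuming created_at uses the Twitter API's "Sat Dec 10 14:05:00 +0000 2016" style:

```python
from collections import Counter
from datetime import datetime
import re

# Hypothetical sample records; field names are assumptions, not the real Hive schema.
tweets = [
    {"user": "alice", "created_at": "Sat Dec 10 14:05:00 +0000 2016", "text": "Hadoop and Hive on the sandbox"},
    {"user": "bob", "created_at": "Sat Dec 10 14:40:00 +0000 2016", "text": "Hive queries are slow"},
    {"user": "alice", "created_at": "Sat Dec 10 21:15:00 +0000 2016", "text": "Flume streams tweets into Hive"},
]

# Assumed Twitter-style timestamp format.
TWITTER_TIME = "%a %b %d %H:%M:%S %z %Y"
# Tiny illustrative stopword list; a real analysis would use a larger one.
STOPWORDS = {"a", "and", "are", "into", "of", "on", "the"}


def most_used_words(records, n=3):
    """Count lowercase word tokens across all tweets, skipping stopwords."""
    counts = Counter()
    for r in records:
        for word in re.findall(r"[a-z']+", r["text"].lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


def busiest_hours(records, n=3):
    """Count tweets per hour of day (0-23) to find the most common tweeting times."""
    hours = Counter(datetime.strptime(r["created_at"], TWITTER_TIME).hour for r in records)
    return hours.most_common(n)


def most_active_users(records, n=3):
    """Count tweets per user and return the most active ones."""
    return Counter(r["user"] for r in records).most_common(n)


print(most_used_words(tweets))
print(busiest_hours(tweets))
print(most_active_users(tweets))
```

Each function corresponds to a GROUP BY plus ORDER BY on the relevant column, so the same logic can be expressed in Hive once the fields are extracted; Python is only used here to keep the sketch self-contained.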
Labels:
- Apache Flume
- Apache Hadoop
- Apache Hive
12-09-2016
03:27 PM
Hi Michael, I'm still trying to figure out where to find that log (is there a specific folder?).
I've tried searching for the Query ID that Hive shows when I execute a query, but couldn't find it.
Query ID = root_20161205190741_fb2a555d-1633-404d-9128-0c3696d2d56a
So far, the only thing I've found is the following exception (in the CLI) after the execution fails:
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1480931794353_0004_1_00, diagnostics=[Task failed, taskId=task_1480931794353_0004_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
Any suggestions? Regards, and thank you, Michael, for helping me.
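A "java.lang.OutOfMemoryError: Java heap space" inside a Tez task means the task's JVM ran out of heap within its container. A commonly cited HDP tuning rule of thumb, treated here as an assumption rather than a fixed requirement, is to set -Xmx in hive.tez.java.opts to roughly 80% of hive.tez.container.size, leaving the rest for non-heap JVM memory. The arithmetic is sketched below with example container sizes, not the values from this sandbox:

```python
def tez_heap_mb(container_mb: int, heap_fraction: float = 0.8) -> int:
    """Suggested -Xmx (MB) for a Tez container of the given size.

    heap_fraction=0.8 is an assumed rule of thumb, not a Hive/Tez default.
    """
    if container_mb <= 0:
        raise ValueError("container size must be positive")
    return int(container_mb * heap_fraction)


def java_opts(container_mb: int) -> str:
    """Example value for hive.tez.java.opts derived from the container size."""
    return f"-Xmx{tez_heap_mb(container_mb)}m"


# Example container sizes (MB); not taken from the poster's configuration.
for size in (512, 1024, 2048):
    print(size, java_opts(size))
```

Raising the heap only helps if the container itself grows too, and a bigger container in turn needs more free YARN memory, which is why heap errors and "does it fit" questions tend to show up together.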
12-07-2016
03:08 AM
Dear Jean-Phillipe, I tried to follow your instructions, but as you can see, this part is giving me headaches: "YARN will need to fit at least one Tez AM (512 MB) and a couple Tez containers (512MB *2). You can check how much memory is allocated to YARN on the YARN config page "Memory allocated for all YARN containers on a node"." Can you please guide me? Regards
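The requirement quoted above is plain arithmetic: one Tez AM (512 MB) plus two task containers (512 MB each) needs 1536 MB, and that total must not exceed the value shown as "Memory allocated for all YARN containers on a node". A minimal sketch of the check, assuming the scheduler rounds each request up to a multiple of its minimum allocation (yarn.scheduler.minimum-allocation-mb); the node memory values are hypothetical:

```python
import math


def normalize(request_mb: int, min_alloc_mb: int) -> int:
    """Round a container request up to a multiple of the minimum allocation
    (assumed YARN scheduler normalization)."""
    return math.ceil(request_mb / min_alloc_mb) * min_alloc_mb


def fits(yarn_node_mb: int, am_mb: int = 512, task_mb: int = 512,
         tasks: int = 2, min_alloc_mb: int = 512) -> bool:
    """True if one Tez AM plus `tasks` task containers fit in the YARN memory."""
    needed = normalize(am_mb, min_alloc_mb) + tasks * normalize(task_mb, min_alloc_mb)
    return needed <= yarn_node_mb


print(fits(1536))                     # 512 + 2 * 512 = 1536, fits exactly
print(fits(1024))                     # too small for the AM plus two tasks
print(fits(2048, min_alloc_mb=1024))  # each 512 MB request becomes 1024 MB
```

The last case shows why the minimum allocation matters: with a 1024 MB minimum, the same three 512 MB requests actually consume 3072 MB.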
12-07-2016
02:12 AM
Hey Ed, thanks for answering. I'm trying to get in touch with Michael, lol. As you can see above, my select count(*) is failing.
12-07-2016
02:06 AM
Dear Michael,
as you say, I'd love to know how many rows the tweets table has. That's why I'm trying to execute a select count(*). Regards
12-07-2016
01:29 AM
Hi there! I'm new to Hortonworks Sandbox 2.5, and I wonder why Hive behaves differently for these two queries:
hive> select * from tweets limit 2;
OK
(then it shows the fetched rows)
hive> select count(*) from tweets;
Query ID = hdfs_20161205070716_8b694554-f4fe-4aea-b4c9-507d2fc343e0
Total jobs = 1
Launching Job 1 out of 1
(then it shows)
--------------------------------------------------------------------------------
VERTICES      STATUS   TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1         FAILED       4          0        0        4      13       0
Reducer 2     KILLED       1          0        0        1       0       0
--------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 261.46 s
--------------------------------------------------------------------------------
Is it because the second query requires more resources? Regards
- Tags:
- Data Processing
- Hive
Labels:
- Apache Hive
12-02-2016
07:43 PM
Well, Zyang, as you suggested, I tried decreasing the Tez container size. Now I get the following error:
Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1480577643003_0005_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1480577643003_0005_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1480577643003_0005_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
Any suggestions? Regards
12-02-2016
07:10 PM
Zyang, what is the purpose of decreasing the Tez container size?
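One general reason for lowering the Tez container size is that more containers fit into the same amount of YARN memory, so tasks are less likely to sit in PENDING waiting for resources; the trade-off is less memory per task, which makes heap errors more likely. A minimal sketch of the count, assuming a queue may use a fixed fraction of the YARN memory and ignoring scheduler rounding; all numbers are hypothetical:

```python
def containers_that_fit(yarn_total_mb: int, queue_fraction: float, container_mb: int) -> int:
    """Whole containers of container_mb that fit in the queue's share of YARN memory."""
    if container_mb <= 0:
        raise ValueError("container size must be positive")
    queue_mb = int(yarn_total_mb * queue_fraction)
    return queue_mb // container_mb


# Hypothetical sandbox-like numbers: 2500 MB for YARN, queue allowed to use all of it.
for size in (2048, 1024, 512):
    print(size, containers_that_fit(2500, 1.0, size))
```

With 2048 MB containers only one fits, which is already taken by the Tez AM and leaves no room for tasks; at 512 MB, four fit.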
12-02-2016
06:43 PM
Hi Zyang, can you tell me where I can find that information? Regards
12-02-2016
06:09 PM
Jean-Phillippe,
as you may guess, I am new to the Sandbox.
How can I tell whether there's enough room in my queue to fit the containers? These are my settings.
12-02-2016
05:22 PM
Hi, I'm trying to run select count(*) from tweets; using the Hive CLI, but as you can see, it is taking too much time (I think). Is this normal? Any tips? Regards
Labels:
- Apache Hive
- Apache Tez