About zack_riesland

zack_riesland · ‎03-15-2017

Thanks @Sumesh

For several minutes, I get messages like this:

DAGClientRPCImpl: GetDAGStatus via AM for app: <application ID> dag:<dag ID>
IPC Client (client ID) connection to <data node> from <edge node> sending X
Got ping response for sessionid: <session ID> after 0ms
...

where X is a number that increases each time this gets printed. This entire time, it shows "-1" as the number of "total" and "pending" mappers, and 0 for everything else.

zack_riesland · ‎03-13-2017

Thanks Adnan, We use MR and Tez for different queries. But both query engines show this behavior.

zack_riesland · ‎03-13-2017

Lately, we've seen intermittent behavior where Hive queries take a very long time to start. For example, I submitted a query via Hive CLI on an edge node about 5 minutes ago, and it still isn't in the application manager. There is NOTHING else running. Zero use of our cluster happening right now. Usually, the queries eventually start. Occasionally, I get an error like:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: FAIL

Can anyone help me figure out why this is happening?

zack_riesland · ‎02-10-2017

Found it on another thread:
https://community.hortonworks.com/questions/69533/convert-unix-timestamp-to-timestamp-format.html

select from_unixtime(cast(time/1000 as bigint)) from table_1;
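The query above divides by 1000 because from_unixtime expects seconds, while the column holds milliseconds. As a rough illustration of the same arithmetic outside Hive, here is a small Python sketch. The function name millis_to_date_string is made up for this example, and it formats in UTC, whereas Hive's from_unixtime typically uses the server or session time zone, so values near midnight may land on a different day.

```python
from datetime import datetime, timezone


def millis_to_date_string(millis: int) -> str:
    """Convert a millisecond epoch value to an M/D/YYYY string (UTC)."""
    # Same step as cast(time/1000 as bigint): drop the millisecond part.
    seconds = millis // 1000
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Build M/D/YYYY by hand to avoid platform-specific strftime flags.
    return f"{moment.month}/{moment.day}/{moment.year}"


# Sample value taken from the original question.
print(millis_to_date_string(1485172800000))  # 1/23/2017
```

In Hive itself, from_unixtime also accepts an optional format pattern as a second argument, which should produce the same M/D/YYYY shape without any post-processing.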

zack_riesland · ‎02-10-2017

Given a column of type bigint, with a millisecond-precision timestamp, like this:

1485172800000

How can I get Hive to give me a date, like this:

1/23/2017

I've done it before and I don't believe a UDF is necessary, but I can't seem to get it to work for me today. Thanks!

zack_riesland · ‎01-19-2017

We were ultimately able to get everything back in shape, but it wasn't pretty. Too many steps to detail here.

zack_riesland · ‎01-19-2017

Ah! I see that there are 2 "archive" properties in the nifi.properties file:

nifi.flow.configuration.archive.enabled
nifi.content.repository.archive.enabled

I was setting the former, but I need to set the latter.
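Since the two property names are easy to mix up, a quick check of which one is actually set can help. The following is only a sketch in Python: read_archive_flags and the sample text are hypothetical, and the parsing covers simple key=value lines only, not the full Java properties syntax.

```python
# The two "archive" properties named in the post above.
ARCHIVE_KEYS = (
    "nifi.flow.configuration.archive.enabled",
    "nifi.content.repository.archive.enabled",
)


def read_archive_flags(properties_text: str) -> dict:
    """Return each archive property's value, or None if it is not present."""
    flags = {key: None for key in ARCHIVE_KEYS}
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in flags:
            flags[key] = value.strip()
    return flags


# Hypothetical excerpt, not copied from a real installation.
sample = """
nifi.flow.configuration.archive.enabled=false
nifi.content.repository.archive.enabled=true
"""

for key, value in read_archive_flags(sample).items():
    print(f"{key} = {value}")
```

To check a real installation, the text of nifi/conf/nifi.properties could be read from disk and passed in instead of the sample string.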

zack_riesland · ‎01-19-2017

I noticed that the disk drive on my HDF server is almost full. I did some exploring, and found that the nifi/content_repository folder has 1.4TB of data in it, even with nothing in any queue, and nifi stopped. I have had nifi.flow.configuration.archive.enabled=false in nifi/conf/nifi.properties (since I installed), but I see a whole folder full of data with today's date. The data in the content_repository folder is spread across thousands of folders, named numerically ("1", "2", etc). Each folder has a single file named with an epoch timestamp. Can someone give me some guidance on what I can safely delete and how to keep this tool from eating up so much disk storage?

zack_riesland · ‎01-18-2017

Thanks Pranay,

We followed these steps - though I just realized that we forgot to restart ambari-server. We were able to successfully delete the server. However, when we added the server, it now seems to be stuck on a modal dialog that says "Please wait while the hosts are being checked for potential problems..."

In the ambari-agent logs on the server (being added), we see this repeated over and over again:

INFO 2017-01-18 14:22:00,773 Controller.py:265 - Heartbeat response received (id = 109)
INFO 2017-01-18 14:22:00,773 RecoveryManager.py:260 - METRICS_COLLECTOR needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-01-18 14:22:00,773 RecoveryManager.py:795 - Recovery is paused, likely tasks waiting in pipeline for this host.
INFO 2017-01-18 14:22:10,674 Heartbeat.py:78 - Building Heartbeat: {responseId = 109, timestamp = 1484767330674, commandsInProgress = False, componentsMapped = False}
INFO 2017-01-18 14:22:10,716 Controller.py:265 - Heartbeat response received (id = 110)
INFO 2017-01-18 14:22:10,716 RecoveryManager.py:260 - METRICS_COLLECTOR needs recovery, desired = STARTED, and current = INSTALLED.
INFO 2017-01-18 14:22:10,716 RecoveryManager.py:795 - Recovery is paused, likely tasks waiting in pipeline for this host.
INFO 2017-01-18 14:22:20,617 Heartbeat.py:78 - Building Heartbeat: {responseId = 110, timestamp = 1484767340617, commandsInProgress = False, componentsMapped = False}
INFO 2017-01-18 14:22:20,660 Controller.py:265 - Heartbeat response received (id = 111)

Do you have any insights as to what may be the solution?

zack_riesland · ‎01-18-2017

One of the servers in our cluster failed due to multiple disk drive failures. The server was not a data node or master server - it was used as a journal node for HA, a ZooKeeper server, HST server, Apache Thrift server, and Grafana server. Using Ambari, we put the server in maintenance mode, and then rebuilt it. It is now ready to come back online. We have ambari-agent installed, as well as the necessary repos, etc. My question is: can I use the Ambari GUI to "delete" the server from the cluster, and then follow the steps to add the server back? Should I expect any issues, since it has the same name, IP address, etc.? Is there a better way to accomplish this? We basically want to bring it online and put it back to work doing everything it was doing before.

Member Since	‎02-04-2016 01:07 PM
Last Visited	‎06-10-2019 05:13 PM
Posts	189
Kudos received	70

Cloudera Community

Re: Help with spark partition syntax (scala)

Re: Can I control naming patterns for HDFS chunks

Re: How to connect to Spark2 Thrift Server via JDB...

Re: Hive: Convert int timestamp to date

Re: How to clear temp data from dataflow / nifi?

Re: Hive queries taking LONG time to start

Re: Hive queries taking LONG time to start

Hive queries taking LONG time to start

Re: Hive: Convert int timestamp to date

Hive: Convert int timestamp to date

Re: Best way to re-add a server to HDP cluster?

Re: How to clear temp data from dataflow / nifi?

How to clear temp data from dataflow / nifi?

Re: Best way to re-add a server to HDP cluster?

Best way to re-add a server to HDP cluster?