Member since: 02-11-2016
Posts: 53
Kudos Received: 21
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5667 | 11-03-2016 01:22 PM
 | 5971 | 10-31-2016 07:12 PM
 | 4309 | 02-12-2016 02:40 PM
06-30-2016
03:47 PM
1 Kudo
I wanted to post a quick follow-up on this thread. We recently found ourselves needing to deploy the HBase client code on an arbitrary number of machines and did not want the overhead of using Ambari. It was straightforward to set up the Hortonworks repository reference and pull down HBase; however, even after adding Phoenix, the hbase shell would fail at startup with the dreaded (and spectacularly uninformative) exception:

NativeException: java.io.IOException: java.lang.reflect.InvocationTargetException
  initialize at /usr/hdp/2.3.2.0-2950/hbase/lib/ruby/hbase/hbase.rb:42
  (root) at /usr/hdp/2.3.2.0-2950/hbase/bin/hirb.rb:131

After almost half a day of hair-pulling, I ran strace against the shell startup on a working node and compared it to the trace from the failing one. It turns out that the shell absolutely requires this directory path to exist (it can be empty): /hadoop/hbase/local/jars

Once I created that hierarchy, the shell was able to start successfully:

$ mkdir /hadoop
$ chmod 1777 /hadoop
$ mkdir -p /hadoop/hbase/local/jars
$ chmod -R 755 /hadoop/hbase

Hopefully this will save someone else the time and aggravation.
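For anyone scripting this across many nodes, the shell commands above can be sketched in Python; a minimal sketch, where the path and permission bits are the ones from the transcript, and the `base` parameter is my own addition so the helper can be exercised outside `/`:

```python
import os

def ensure_hbase_local_jars(base="/"):
    """Create the /hadoop/hbase/local/jars hierarchy the hbase shell
    requires at startup, mirroring the chmod values shown above.
    `base` is normally "/"; it is a parameter only for testing."""
    hadoop = os.path.join(base, "hadoop")
    os.makedirs(hadoop, exist_ok=True)
    os.chmod(hadoop, 0o1777)                      # chmod 1777 /hadoop
    jars = os.path.join(hadoop, "hbase", "local", "jars")
    os.makedirs(jars, exist_ok=True)              # mkdir -p .../local/jars
    for dirpath, _, _ in os.walk(os.path.join(hadoop, "hbase")):
        os.chmod(dirpath, 0o755)                  # chmod -R 755 /hadoop/hbase
    return jars
```

Running it as root with the default `base` reproduces the four shell commands exactly.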
03-16-2016
01:27 PM
@stevel Fantastic! That's a great example of useful and practical documentation. I'll let you know what I turn up from making the REST calls.
03-15-2016
04:59 PM
@stevel I made those two changes and restarted Spark. A job submitted with '--master yarn-client' still behaves as before, with the history server not correctly tracking the job. A job submitted with '--master yarn-cluster' does get picked up as a completed job in history, but when I drill in there is absolutely no information available about the job. The 'Environment' tab is populated, but not with anything obviously job-specific. The 'Executors' tab shows the attached executors.png, which is suspiciously devoid of any actual activity. 'Stages', 'Storage' and 'Jobs' are completely blank. I understand in the abstract what you're asking for in terms of querying the ATS server, but it will take me some time to determine the required web-service calls and put that together. It's something I probably need to know about, but I won't have time to dig in for a day or so. Thanks for your help to this point! I'll try to get the rest of the information later this week.
03-14-2016
06:49 PM
1 Kudo
@stevel I moved the YARN timeline directory to /hadoop/yarn and restarted. I'm no longer seeing the 500 error from the Spark History UI, but it continues to list completed Spark jobs under 'incomplete', telling me that there are hundreds of tasks remaining to be run. The YARN history UI does correctly report that the job is complete (see the attached incomplete.png). The developer who owns the application tells me that it appears to be returning proper results.
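For reference, the relocation described here is controlled by a single yarn-site.xml property; a sketch of what the change would look like, assuming the standard YARN property name and the /hadoop/yarn destination mentioned in this post:

```xml
<!-- yarn-site.xml: move the ATS leveldb store off /tmp -->
<property>
  <name>yarn.timeline-service.leveldb-timeline-store.path</name>
  <value>/hadoop/yarn</value>
</property>
```

The timeline server creates its leveldb-timeline-store.ldb directory under this path, so the value is the parent directory rather than the .ldb directory itself.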
03-14-2016
06:13 PM
1 Kudo
@stevel I am seeing that same 500 error when working directly from the browser. Moving the timeline storage path is not a problem. I've read some suggestions about moving it to HDFS, but I'm not sure what other ramifications that may have, so I'll stick with machine-local storage for now. Not sure if you saw one of my earlier posts where I mentioned that the Spark daemon log is filling with errors whose meaning is not clear (see the beginning of the thread above). Perhaps that will go away when I relocate the log directory.
03-14-2016
04:05 PM
@stevel Hi. We are using HDP-2.3.2.0-2950, with all nodes running CentOS 6.7.

I'm not sure I know how to answer the question about logs. For starters, it's not easy to understand where these would be. If I assume the history server to be the machine that I connect to for the Spark History UI, and I assume that job-related logs would be under /tmp, then there's nothing relevant on that box. If I look on the namenode I can see /tmp/hadoop/yarn/timeline with populated subdirectories. Are those what you are referring to?

I restarted the history server and now things are utterly non-functional. The Spark History UI shows nothing under either complete or incomplete and displays this error:

Last Operation Failure: java.io.IOException: Bad GET request: status code 500 against http://bigfoot6.watson.ibm.com:8188/ws/v1/timeline/spark_event_v01?fields=PRIMARYFILTERS,OTHERINFO; {"exception":"WebApplicationException","message":"java.io.IOException: org.iq80.leveldb.DBException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /tmp/hadoop/yarn/timeline/leveldb-timeline-store.ldb/005567.sst: No such file or directo

Indeed, there is no file by that particular name, but there are dozens of other .sst files present. What is causing it to look for that specific file and, further, why is it giving up completely after not finding it? We are using the YARN history service as the backend.

FYI: after restarting the history server, I'm getting this in the daemon logs on the history server host: spark-spark-orgapachesparkdeployhistoryhistoryserv.txt. It looks very unhappy.

All of this had been working fine as recently as late January, and I have not (knowingly) made any changes whatsoever to the Spark history configuration. Please let me know if you need any further information. I've looked through the Hortonworks courses on Hadoop management, but haven't seen any syllabus that claims to cover troubleshooting at a sufficiently low level. If that's not the case, can you advise which of them would provide enough background to be able to help in a case such as this?
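Querying the ATS over REST, as suggested earlier in the thread, needs nothing beyond the standard library; a minimal sketch, where the host, port, entity type, and fields are taken verbatim from the error message above, and the helper names are my own:

```python
import json
import urllib.request

def timeline_url(host, entity_type="spark_event_v01",
                 fields="PRIMARYFILTERS,OTHERINFO", port=8188):
    """Build the ATS v1 REST URL for a given timeline entity type."""
    return "http://%s:%d/ws/v1/timeline/%s?fields=%s" % (
        host, port, entity_type, fields)

def fetch_entities(host, **kwargs):
    """GET the timeline entities and decode the JSON payload."""
    with urllib.request.urlopen(timeline_url(host, **kwargs)) as resp:
        return json.load(resp)
```

A healthy timeline server returns a JSON document with an "entities" list; a 500 response here reproduces what the Spark History UI is reporting.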
03-09-2016
02:57 PM
1 Kudo
Thanks, but we do not have a support agreement. We'll just have to live with it. I've provided all the information I have.
03-08-2016
10:34 PM
2 Kudos
I'm starting to get concerned about this issue. We have run about 50 jobs in Spark that return results without any exceptional conditions, and which the YARN UI reports as complete. All of them are languishing in the Spark UI incomplete job listing with over 150 steps (it claims) left to go. The offending operation is either 'toPandas' or 'treeAggregate at GradientDescent.scala:189'. I do not see any sign that these processes are actually alive. Why are they not being reported as done?
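One possible explanation worth checking (my assumption, not something confirmed in this thread): the history server only moves an application from 'incomplete' to 'completed' after the driver stops its SparkContext, which is what writes the application-end event. A session that fetches results with toPandas() and then lingers, or exits without calling sc.stop(), stays listed as incomplete even though YARN shows it finished. A minimal sketch of the guard, using a generic context factory so the example does not depend on pyspark being installed:

```python
def run_job(make_context, job):
    """Run job(sc) and always stop the context afterwards, so the
    application-end event reaches the history server even if the job
    raises. In a real driver, make_context would construct a
    SparkContext (hypothetical wiring for illustration)."""
    sc = make_context()
    try:
        return job(sc)
    finally:
        sc.stop()  # without this, the app can stay listed as incomplete
```

The same effect is achieved in an interactive session simply by calling sc.stop() before exiting.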
03-08-2016
02:17 PM
2 Kudos
The Tez view appears to be working correctly. That endless cascade of exceptions from the history server must be pointing to something specific, but I unfortunately do not know how to interpret it. One of our users mentioned to me that the lingering jobs in the Spark UI are all using a Python method called 'toPandas', while the few that do get properly noted as complete do not. Is that a useful clue? The Spark "incomplete" history continues to pile up dozens of jobs that are reported on the console (and by YARN) as being finished.
03-07-2016
09:37 PM
2 Kudos
More information: around the time the history server stopped working correctly, a cascade of exceptions appeared in the Spark logs:

2016-02-04 14:55:09,035 WARN timeline.TimelineDataManager (TimelineDataManager.java:doPostEntities(366)) - Skip the timeline entity: { id: tez_container_e07_1453990729709_0165_01_000043, type: TEZ_CONTAINER_ID }
org.apache.hadoop.yarn.exceptions.YarnException: The domain of the timeline entity { id: tez_container_e07_1453990729709_0165_01_000043, type: TEZ_CONTAINER_ID } is not allowed to be changed from Tez_ATS_application_1453990729709_0165 to Tez_ATS_application_1453990729709_0165_wanghai_20160204145330_86a58f3a-0891-4c24-bf0f-0375575077da:1

Does that shed any light on the underlying problem? The log contains > 50 MB of such messages.