Member since 01-09-2019

- 401 Posts
- 163 Kudos Received
- 80 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2598 | 06-21-2017 03:53 PM |
| | 4306 | 03-14-2017 01:24 PM |
| | 2396 | 01-25-2017 03:36 PM |
| | 3841 | 12-20-2016 06:19 PM |
| | 2103 | 12-14-2016 05:24 PM |
05-25-2016 02:41 PM

This was a case of a corrupt pig.tar.gz in the HDFS /hdp/apps/<version>/pig folder. I am not sure how a corrupt copy ended up there on a fresh Ambari-based install, but once I manually replaced it with the pig.tar.gz from /usr/hdp/<version>/pig/, the error was resolved. The confusing part is that the Pig view throws a completely unrelated error (File does not exist at /user/rmutyal/pig/jobs/test_23-05-2016-14-46-54/stdout).
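A minimal sketch of that manual replacement, run as the HDFS superuser; the ownership and permission values below are assumptions, so mirror whatever the existing file in /hdp/apps uses:

```
# Sketch: overwrite the corrupt archive in HDFS with the local copy from the node.
# <version> is your HDP stack build directory; chown/chmod values are assumptions.
sudo -u hdfs hdfs dfs -put -f /usr/hdp/<version>/pig/pig.tar.gz /hdp/apps/<version>/pig/
sudo -u hdfs hdfs dfs -chown hdfs:hadoop /hdp/apps/<version>/pig/pig.tar.gz
sudo -u hdfs hdfs dfs -chmod 444 /hdp/apps/<version>/pig/pig.tar.gz
```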
05-23-2016 05:36 PM

If you enabled HA failover from Ambari, failover is automatic by default. MapReduce jobs won't fail during a failover scenario.
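If you want to confirm that on your cluster, a quick check might look like the sketch below; nn1/nn2 are assumed NameNode service IDs, so substitute the ones from dfs.ha.namenodes.<nameservice>:

```
# Sketch: confirm automatic failover is enabled and see which NameNode is active.
# nn1/nn2 are assumed service IDs -- take yours from dfs.ha.namenodes.<nameservice>.
hdfs getconf -confKey dfs.ha.automatic-failover.enabled
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```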
05-23-2016 04:12 PM

No luck with that. This is a cluster with HTTPS configured for Ambari.
05-23-2016 03:43 PM

Please check whether the proxy users are properly set: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_ambari_views_guide/content/_setup_HDFS_proxy_user.html
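As a quick sanity check from any cluster node, you can read the proxy-user entries back out of the client configuration; "ambariid" is the account running ambari-server in this thread, so substitute whatever user your Ambari server runs as:

```
# Sketch: verify the proxy-user settings that the Ambari views rely on.
# "ambariid" is assumed from this thread -- use your Ambari server's run-as user.
hdfs getconf -confKey hadoop.proxyuser.ambariid.hosts    # expect the Ambari host(s) or *
hdfs getconf -confKey hadoop.proxyuser.ambariid.groups   # expect the allowed groups or *
```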
05-23-2016 02:51 PM

Pig jobs work from the gateway node but fail from the Ambari view. I cross-checked the proxy-user configs, and I have ambariid (the user running ambari-server) configured there. The error from the RM shows this:

AM Container for appattempt_1463770749228_0048_000001 exited with exitCode: -1000
For more detailed output, check application tracking page: http://<hostname>:8088/cluster/app/application_1463770749228_0048 Then, click on links to logs of each attempt.
Diagnostics: ExitCodeException exitCode=2:
gzip: /grid/6/hadoop/yarn/local/filecache/33_tmp/tmp_pig.tar.gz: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Failing this attempt

The error from the Pig view is "File does not exist: /user/rmutyal/pig/jobs/test_23-05-2016-14-46-54/stdout", but the WebHCat user configuration looks alright. What am I missing?

Labels:
- Apache Ambari
- Apache Hive
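The gzip/tar diagnostics point at a truncated archive in the YARN filecache, which is localized from HDFS, so one way to check whether the copy in HDFS is itself corrupt (the scratch directory and <version> below are placeholders):

```
# Sketch: pull the staged archive out of HDFS and test whether it extracts cleanly.
mkdir -p /tmp/pig-check && cd /tmp/pig-check
hdfs dfs -get /hdp/apps/<version>/pig/pig.tar.gz .
tar -tzf pig.tar.gz > /dev/null && echo "archive OK" || echo "archive corrupt"
```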
05-23-2016 01:29 PM
2 Kudos

You don't need to restart the HDP 2.4 cluster, but it is recommended to decommission the node with the dead disk, change the disk, and add the node back to the cluster. This ensures that data is evenly distributed across all the data disks on that node.

1. To decommission, go to Ambari -> Hosts -> DataNode, which has an option to decommission.
2. Decommission the NodeManager on that host the same way.
3. Once the host reaches the decommissioned state, stop the DataNode and NodeManager on it and replace the disk.
4. Start the DataNode and NodeManager back up.
5. You will see a recommission option in the same place; click it to take the host out of the decommissioned state. (You can confirm the state from the command line, as in the sketch below.)

No other services across the cluster need to be stopped, and if you have more than 3 DataNodes and your default replication factor is 3, all services will continue to run.
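A minimal sketch for confirming the DataNode and NodeManager states from a shell during the disk swap, assuming CLI access on a cluster node:

```
# Sketch: check decommission status while swapping the disk.
sudo -u hdfs hdfs dfsadmin -report | grep "Decommission Status"
yarn node -list -all    # decommissioned NodeManagers show up here alongside RUNNING ones
```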
05-22-2016 05:56 PM

Hadoop is a distributed filesystem plus distributed compute, so you can store and process any kind of data. A lot of examples point to CSV and DB imports because they are the most common use cases.

Here is how each of the data types you listed can be stored and processed in Hadoop; you can find examples in blogs and public repos.

1. CSV: like you said, you will see a lot of examples, including in our sandbox tutorials.
2. doc: you can put raw 'doc' documents into HDFS and use Tika or Tesseract to extract text / do OCR on them.
3. Audio and video: you can again put the raw data in HDFS; processing depends on what you want to do with it, for example extracting metadata from it using YARN.
4. Relational DB: take a look at Sqoop examples for how to ingest relational DBs into HDFS and use Hive/HCatalog to access the data (a sketch follows below).
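A minimal Sqoop sketch for point 4; every connection detail (host, database, table, credentials, target directory) is a placeholder to replace with your own:

```
# Sketch: ingest one relational table into HDFS with Sqoop. All values are placeholders.
sqoop import \
  --connect jdbc:mysql://<db-host>/<database> \
  --username <db-user> -P \
  --table <table-name> \
  --target-dir /user/<your-user>/<table-name> \
  --num-mappers 4
```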
05-21-2016 01:50 AM

If this is a production cluster and you are on support, I suggest opening a support ticket, since any tweaks here can lead to data loss. Before you move further, please take a backup of the NameNode metadata and of the edits from the JournalNodes.
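A minimal backup sketch; the local directories below are assumed HDP defaults, so read the real ones from dfs.namenode.name.dir and dfs.journalnode.edits.dir first:

```
# Sketch: back up NameNode metadata and JournalNode edits before changing anything.
# Confirm the real directories first -- the paths below are assumptions.
hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.journalnode.edits.dir

# On the active NameNode host: grab a fresh fsimage plus the whole name directory.
sudo -u hdfs hdfs dfsadmin -fetchImage /tmp/nn-backup/
tar -czf /tmp/nn-backup/namedir.tar.gz -C /hadoop/hdfs/namenode .

# On each JournalNode host: archive the edits directory.
tar -czf /tmp/jn-edits-backup.tar.gz -C /hadoop/hdfs/journal .
```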
05-20-2016 08:21 PM
1 Kudo

If you are looking for open-source volume-level encryption tools, we have seen LUKS being used; there will be some overhead from LUKS. You can take a look at LUKS at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Security_Guide/sec-Encryption.html
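For illustration only, a minimal LUKS sketch for a brand-new data disk; the device name and mount point are placeholders, and luksFormat destroys whatever is on the device:

```
# Sketch: encrypt an empty data disk with LUKS and mount it for use as a data dir.
# WARNING: luksFormat wipes the device. <data-disk> and /grid/1 are placeholders.
cryptsetup luksFormat /dev/<data-disk>
cryptsetup luksOpen /dev/<data-disk> hdfsdata1
mkfs.ext4 /dev/mapper/hdfsdata1
mkdir -p /grid/1 && mount /dev/mapper/hdfsdata1 /grid/1
```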
05-20-2016 02:29 PM
1 Kudo

You can increase the memory on your mappers. Take a look at mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, mapreduce.map.java.opts and mapreduce.reduce.java.opts. I think your mapreduce.map.memory.mb is set to 256 MB based on the error. I don't know what else is running on your 3 GB node or how much heap it is given, but you may be able to allocate 1 GB of it to YARN (container memory). It is also possible to get the job to run only on the 15 GB node by using node labels. You can also switch off the NodeManager on the 3 GB node if other processes are running on it, so that the job uses the 15 GB node.
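A minimal per-job sketch for raising these values without touching cluster defaults; the 1024 MB container size and roughly 80% heap are assumptions sized for the smaller node, and the stock wordcount example is used just to show where the -D options go:

```
# Sketch: override map/reduce memory for one job via generic -D options.
# 1024 MB containers with ~80% heap (-Xmx819m) are assumed values -- tune to your nodes.
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount \
  -D mapreduce.map.memory.mb=1024 \
  -D mapreduce.map.java.opts=-Xmx819m \
  -D mapreduce.reduce.memory.mb=1024 \
  -D mapreduce.reduce.java.opts=-Xmx819m \
  /input /output
```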