Member since: 01-09-2019
Posts: 401
Kudos Received: 163
Solutions: 80
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1983 | 06-21-2017 03:53 PM |
|  | 3012 | 03-14-2017 01:24 PM |
|  | 1929 | 01-25-2017 03:36 PM |
|  | 3082 | 12-20-2016 06:19 PM |
|  | 1507 | 12-14-2016 05:24 PM |
05-25-2016 02:41 PM
This is a case of a corrupt pig.tar.gz in the HDFS /hdp/apps/<version>/pig folder. I am not sure how a corrupt copy ended up there on a fresh Ambari-based install, but once I manually replaced it with the pig.tar.gz from /usr/hdp/<version>/pig/, the error was resolved. The confusing part is that Pig View throws a completely unrelated error (File does not exist at /user/rmutyal/pig/jobs/test_23-05-2016-14-46-54/stdout).
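For anyone hitting the same issue, a minimal sketch of the manual fix; the <version> placeholder and the hdfs user are assumptions, so adjust for your HDP version and setup:

```bash
# Replace the corrupt Pig tarball in HDFS with the known-good local copy.
sudo -u hdfs hdfs dfs -rm /hdp/apps/<version>/pig/pig.tar.gz
sudo -u hdfs hdfs dfs -put /usr/hdp/<version>/pig/pig.tar.gz /hdp/apps/<version>/pig/
# Restore read-only permissions on the tarball.
sudo -u hdfs hdfs dfs -chmod 444 /hdp/apps/<version>/pig/pig.tar.gz
```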
05-22-2016 05:56 PM
Hadoop is a distributed filesystem plus distributed compute, so you can store and process any kind of data. A lot of examples focus on CSV and database imports because those are the most common use cases. Here is how each of the data types you listed can be stored and processed in Hadoop; you can also find examples in blogs and public repos.
1. CSV: as you said, you will see a lot of examples, including in our sandbox tutorials.
2. doc: you can put raw 'doc' documents into HDFS and use Tika or Tesseract to extract text (OCR) from them.
3. Audio and video: you can again put the raw data in HDFS. Processing depends on what you want to do with this data; for example, you can extract metadata from it using YARN applications.
4. Relational DB: take a look at Sqoop examples for ingesting relational databases into HDFS and using Hive/HCatalog to access the data (a sketch follows this list).
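For item 4, a minimal Sqoop sketch; the JDBC URL, database, credentials, and table name below are all placeholders for illustration:

```bash
# Import one relational table into HDFS; every connection detail here is hypothetical.
sqoop import \
  --connect jdbc:mysql://db-host.example.com:3306/sales \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/hive/warehouse/orders \
  --num-mappers 4
```

Adding --hive-import to the same command would create and load a Hive table directly, which is one way to get the Hive/HCatalog access mentioned above.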
05-26-2016 04:44 AM
You can get a sandbox from http://hortonworks.com/downloads/#sandbox, but the sandbox needs at least 8GB of RAM, so make sure you are on a machine with 12-16GB if you go that route. If you don't have a machine with that much RAM, Azure/AWS is your option. For any further questions, please open a new thread per question so this doesn't become one long thread of questions and answers.
05-17-2016 01:29 PM
@Sagar Shimpi I am not using HA with this cluster (it is a small demo cluster), but I will take note of that for when we build future clusters. Thanks!
05-18-2016 11:15 AM
@Saurabh Kumar Then I can only think of increasing the space available to yarn.nodemanager.log-dirs by adding multiple mount points. But I still suspect that something else is also occupying the space.
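A minimal sketch of that change in yarn-site.xml (or the equivalent field in Ambari); the /grid1 and /grid2 mount points are illustrative:

```xml
<!-- Comma-separated list of local dirs for container logs; this spreads log
     space across multiple mounts. Replace the paths with your actual mount points. -->
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/grid1/hadoop/yarn/log,/grid2/hadoop/yarn/log</value>
</property>
```

Before adding mounts, running du -sh on the existing log-dirs parent is a quick way to confirm whether YARN logs or something else is actually consuming the space.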
05-13-2016 09:36 PM
If there are no HA components (NameNode HA and two HiveServer2 instances), then there is no dependency on ZooKeeper. Check the hiveserver2.log at /var/log/hiveserver2/hiveserver2.log to see if there are any errors. If you have two HiveServer2 instances, they will register with ZooKeeper, which may be where they are running into issues.
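If you do have two HiveServer2 instances, a quick way to check their ZooKeeper registration (assuming the default hive.server2.zookeeper.namespace of "hiveserver2"; the ZK host is illustrative):

```bash
# Open a ZooKeeper CLI session against one of your ZK servers.
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk-host.example.com:2181
# Inside the zkCli shell, list the HiveServer2 registrations:
#   ls /hiveserver2
# Each healthy HiveServer2 instance should show up as a child znode.
```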
05-23-2016 05:46 PM
Thanks @Ravi Mutyala and @Artem Ervits. After getting stuck, I ended up starting over using the instructions on the Apache wiki. I'm not sure what exactly was different, but there were no password-related problems.
05-13-2016 05:21 PM
1. NN data lost. Did the disk with the NN directories crash, or did you delete them? Is this HA or non-HA? With non-HA, if both NN data directories have no data, you will run into data loss. You can revive to some state from the secondary NN's data directories, but there is no guarantee against data loss. If there is no data worth keeping, you can always format the NN and start fresh. (You will then need to manually upload the Tez and MapReduce apps; the manual install documentation has the details, and there is a sketch below.)
2. Ambari not starting. Clean up /var/log and start it back up.
3. Most likely the HDFS services are not up; a filled-up disk can kill processes. Once Ambari is up, see which services are running and which are not. Again, if the NN data is lost, the NN will not start.
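For steps 1 and 2, a rough sketch of the commands involved; only format the NameNode if you have accepted losing the old HDFS data, and the hdfs user is an assumption based on a standard HDP install:

```bash
# Step 2: find and clean up whatever is filling the disk.
du -sh /var/log/*            # identify the biggest log directories
# archive or delete old logs as appropriate, then restart Ambari

# Step 1 (DESTRUCTIVE): format the NameNode to start with a fresh filesystem.
sudo -u hdfs hdfs namenode -format
```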
05-06-2016 05:54 AM
@santosh rai Out of curiosity, do you have any specific use case for using 2.2?
05-05-2016 05:43 PM
I'm not sure how your sandbox was missing that folder in the first place. I sent you the steps from the manual install guide; the files should have been there from the start, but since they were not, we followed the manual steps to put them there. Tez and MapReduce use the tar.gz files under /hdp/apps/<hdp_version>/ when submitting applications.
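A sketch of that manual upload, following the HDP manual install steps; <hdp_version> is a placeholder for your actual version string:

```bash
# Upload the Tez tarball to the location YARN expects.
sudo -u hdfs hdfs dfs -mkdir -p /hdp/apps/<hdp_version>/tez/
sudo -u hdfs hdfs dfs -put /usr/hdp/<hdp_version>/tez/lib/tez.tar.gz /hdp/apps/<hdp_version>/tez/
# Upload the MapReduce tarball.
sudo -u hdfs hdfs dfs -mkdir -p /hdp/apps/<hdp_version>/mapreduce/
sudo -u hdfs hdfs dfs -put /usr/hdp/<hdp_version>/hadoop/mapreduce.tar.gz /hdp/apps/<hdp_version>/mapreduce/
# Lock down permissions as the install guide does: dirs traversable, tarballs read-only.
sudo -u hdfs hdfs dfs -chmod -R 555 /hdp/apps/<hdp_version>/tez /hdp/apps/<hdp_version>/mapreduce
sudo -u hdfs hdfs dfs -chmod 444 /hdp/apps/<hdp_version>/tez/tez.tar.gz /hdp/apps/<hdp_version>/mapreduce/mapreduce.tar.gz
```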