Member since: 07-18-2016
Posts: 26
Kudos Received: 4
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2573 | 10-23-2017 06:11 AM
 | 1337 | 03-09-2017 09:32 AM
 | 10515 | 02-06-2017 04:09 AM
04-17-2018
07:14 AM
Interesting, @Harsh J, we'll try that as well, and post back. Edit: it seems that this workaround indeed works - thanks again. Let's hope it gets patched soon - it seems relatively trivial to resolve, but I might be wrong. 🙂 Cheers
04-16-2018
07:52 AM
Hi @Harsh J, I know I'm reviving an old thread, but can you please comment on the fact that this "fix" still does not work in CDH 5.12.1, managed by CM? Even if the folders are owned by root, with permissions 700, the folders still get created, and data is being written to the underlying FS, often /, which is not really good, don't you agree? Thanks, Milan
12-25-2017
12:17 AM
Hi @SandyCT, well, this system is broken a bit more than I expected, since the ownership of the groups file is also damaged. What did you run exactly? If I had to guess, some recursive chmod on /, or /etc? Before you try the more drastic option below, try switching to a console (Ctrl+Alt+F1 on a normal PC, not sure about the VM) and logging in as root, with the password "cloudera". If that does not work, for whatever reason, here's a way to reboot CentOS 6 in "safe mode". I suggest you make a backup of the whole VM file/directory first. https://lintut.com/reset-forgotten-root-password-in-centos/ If that does not work either (I cannot test now, since I don't have my VM around), replace " 1 " in the tutorial with "rw init=/bin/bash". In either case, this will get you root, but fixing your VM might take a while. For example, your sudo command should be "---s--x--x" or something to that effect, /etc/sudoers "-r--r-----", and /etc/group "-rw-r--r--". Have fun & good luck! 🙂
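In case it helps, here is a rough sketch of the permission fixes I would try once you have a root shell via one of the methods above (standard CentOS 6 paths; the modes match the defaults I listed, so double-check before applying):
# if you booted with init=/bin/bash, the root filesystem may still be read-only
mount -o remount,rw /
# restore ownership and the setuid bit on sudo, plus sane permissions on its config files
chown root:root /usr/bin/sudo && chmod 4111 /usr/bin/sudo
chown root:root /etc/sudoers && chmod 0440 /etc/sudoers
chown root:root /etc/group && chmod 0644 /etc/group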
12-21-2017
11:47 AM
Hi @SandyCT, you should try the suggestion from my last message - if you didn't break "su" binary as well, command "su -" might give you root access. When prompted, give it the root password ("cloudera" by default, if you haven't changed it). Cheers, samurai
12-20-2017
01:06 AM
Actually, the suggestion from @csguna gives me an idea - @SandyCT, if you can open a normal terminal, you can also execute "su -" as a command. It will ask for the root password, so try giving it the same default password you had for the cloudera user - by default it is "cloudera". If you are lucky, and the "su" binary is unaffected by your chmod/chown, you might just get to root, and from there we'll manage. Please try, and let us know. Regards
12-20-2017
01:02 AM
Hello @csguna, I don't mean to intrude, but the problem described is caused by the permissions or owner of the sudo command having been changed, and the sudo command has built-in checks against tampering, so that has to be fixed externally, for example by rebooting into some form of rescue mode. That being said, the email I received from your post said "sudo su Password - cloudera". Did you edit your post? Or is there something else going on? Regards
12-18-2017
03:29 AM
Hi @SandyCT, can you give us more details? For example, which version of the Cloudera QuickStart VM are you using? There is a way to log in as root, but it's non-trivial, so I suggest we confirm there's no other way first. For starters, can you post the output of "ls -la /usr/bin/sudo" from that QuickStart VM? These virtual machines are based on CentOS 6.4, which by default can be rebooted into "safe mode", which should give you access to both your Hive data and the fix for the sudo ownership. But if you ask me, I would 1) fix sudo, 2) back up everything I need from that VM (Hive, etc.), and 3) start from scratch, using those backups. If you did what I think you did, sudo will not be the only thing that is not working. Regards, samurai
12-15-2017
08:19 AM
Hi reutsc, I guess it's a bit late, but in case you haven't managed to resolve this, the property that holds this information is called "dfs.journalnode.edits.dir", and it will point to a directory holding a folder named after the logical name of your cluster. So, for example, if dfs.journalnode.edits.dir is set to /data/jn, the whole path will be /data/jn/xyz-hdp03. That folder should contain a folder "current", and a file called "in_use.lock" if it is in use. You can also tell by the age of the lock file, and by searching for the journalnode process in your process list. In case you are using Cloudera Manager, you can find the mentioned property under cluster -> HDFS -> Configuration. Hope this helps, camypaj
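A quick sketch of how I would check that from the shell (using the example path and cluster name from above, so adjust them to your setup):
# list the journalnode storage for the example cluster name
ls -la /data/jn/xyz-hdp03/
# check whether the lock file exists and how old it is
stat /data/jn/xyz-hdp03/in_use.lock
# the edits themselves live under "current"
ls /data/jn/xyz-hdp03/current/
# and confirm the journalnode process is actually running
ps aux | grep -i journalnode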
11-09-2017
07:40 AM
1 Kudo
I had the same problem until just now, on 5.12.x, and in my case it was caused by a problem with SSL, so if you use SSL, please read the following: "this might be caused by having more than one Oracle Java installed, and even worse, any version of OpenJDK. Namely, make sure you have added the CA you used for SSL to the Java keystore of the Java version you are actually using (you can find that out in the process list). Also, make sure that the keytool you are using belongs to this version of Java - so it's best to have only one version installed, or (if that is unavoidable), use the full path to keytool. Hope it helps."
11-09-2017
07:37 AM
I've discovered through painful experience that this might be caused by having more than one Oracle Java installed, and even worse, any version of OpenJDK. Namely, make sure you have added the CA you used for SSL to the Java keystore of the Java version you are actually using (you can find that out in the process list). Also, make sure that the keytool you are using belongs to this version of Java - so it's best to have only one version installed, or (if that is unavoidable), use the full path to keytool. Hope it helps.
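For reference, a minimal sketch of what I mean (the JDK path is just an assumption - point it at the Java that actually runs your services, and note that the default cacerts password is "changeit"):
# find out which Java binary the running processes actually use
ps aux | grep java
# import the CA into that JDK's truststore, using that same JDK's keytool
/usr/java/jdk1.7.0_67-cloudera/bin/keytool -importcert -alias my-ssl-ca \
  -file /path/to/ca.pem \
  -keystore /usr/java/jdk1.7.0_67-cloudera/jre/lib/security/cacerts \
  -storepass changeit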
10-25-2017
01:25 AM
No problem, I'm glad if it helps. 🙂 By secure connection do you mean SSL or Kerberos/Sentry? I know SSL can be done, since I've done it in a production environment, but with these dev/testing local setups, I don't think I did. The same goes for Kerberos - when I tested it, I did enable those, but normally you don't need that with local virtual machines and dev/test data. Cheers 🙂
10-24-2017
08:16 AM
1 Kudo
Hi again Anna, regarding the Windows question, it doesn't seem to be supported: link to CM documentation. And regarding the QuickStart VM, if I understood your question correctly, you can always set up your hypervisor (VirtualBox, VMware, KVM, etc.) to forward certain ports from your workstation to the virtual machine, or to allow communication between 2 virtual machines, for example a QlikView VM with the CM VM. I've tested both of these scenarios, and I can confirm it is possible. For example, I've had a VM running RStudio connecting to a VM running the QuickStart image from Cloudera. If I didn't understand your question correctly, feel free to give me some pointers, or more details 🙂 Cheers
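As an illustration, this is roughly how a port forward looks in VirtualBox from the command line (the VM name and ports are made up; the VM has to be powered off, and VMware/KVM have equivalent settings):
# forward host port 17180 to Cloudera Manager's port 7180 inside a NAT'ed VM
VBoxManage modifyvm "cloudera-quickstart" --natpf1 "cm,tcp,,17180,,7180"
# for VM-to-VM traffic, attaching both VMs to the same host-only or internal network also works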
10-23-2017
06:11 AM
Hi Annkaa, welcome to the Cloudera community. 🙂 I like your idea of asking a question along with your intro message - it makes it more practical 🙂 As for your question, the QuickStart VM is definitely easier to begin with, but I would suggest you start testing in your own local VM, using for example VirtualBox. When you get the hang of it, you can proceed with the next step, installing a (pseudo-distributed) cluster with Cloudera Manager, preferably still in local virtual machines, and as the step after that I would go with the company servers. The only reason I would go through these stages is that you have better control over your local VMs, which is important for learning. One more thing you need to consider is that there is no way to "install" the QuickStart VM on an empty, clean machine - you just import a pre-installed, working, single-node cluster into a virtual machine. Also, the QuickStart VM is not meant for distributed, multi-node use, so I would definitely install Cloudera Manager if you want to manage more nodes in the company infrastructure. As for the server OS, I would (personally) go with some form of Linux, preferably a supported distro and version, because it will make things easier, and you'll find answers to common questions that apply to your situation more easily. I hope that answers some of your questions, but let me know if it doesn't. Cheers, samurai
03-09-2017
09:32 AM
2 Kudos
Hi @aawasthi, I know it has been a while since you asked this question, but I ran into a similar issue, and it can be caused by many things. What I would do in your case is check whether there is a firewall between the 2 datanodes (you can try with telnet), and if there isn't, check the number of *_WAIT connections on the source datanodes. I found that some of the replicas for what I was trying to copy were placed on a datanode which was technically working, but had a lot of connections in a CLOSE_WAIT state, which were eating into the overall connection limits. Feel free to take a look at the answer below:
https://community.hortonworks.com/questions/38822/hdfs-exception.html#answer-38817
and the one below that, if you need more details.
I hope it helps,
camypaj
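P.S. A rough sketch of the two checks I mentioned (hostname and port are only examples; 50010 is the usual datanode data-transfer port on these versions):
# can the destination actually reach the source datanode's data port?
telnet datanode02.example.com 50010
# how many connections are stuck in CLOSE_WAIT on that datanode?
netstat -an | grep -c CLOSE_WAIT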
02-20-2017
07:38 AM
Update on the current state: the standby-standby situation was caused by a corrupt state in ZooKeeper, and that was fixed by re-initializing the ZooKeeper state. The other questions still remain.
02-16-2017
02:11 AM
I guess I spoke too fast - on first attempt of "hadoop fs -ls", both namenodes ended up in a "standby" state..
02-16-2017
02:00 AM
Update: after some time, I opted to proceed with the standby namenode, and that went fast, without problems. Then I checked the web interface on the standby namenode, it was fine, and then I discovered that the primary namenode had started its web interface as well, at some point between my post and this reply. Unfortunately, nothing in the log indicates the exact time... I'll just assume that it is safe to proceed with the standby namenode after waiting 3-5 minutes 😕
02-15-2017
08:15 AM
Hi Cloudera Community, I hate to start a new topic, but I cannot seem to find a meaningful answer to this problem (I am encountering it for the second time now). Also, sorry for the rather long post.

In short, I (ops) am upgrading a prelive cluster from CDH 4.6 to CDH 5.7 (hosted on wheezy). The reason we are going with CDH 5.7 is that this is a multi-phase project, and the QA upgrade was tested with 5.7. Configs and packages are managed by Puppet, so they are relatively clean and consistent. I was following these steps https://www.cloudera.com/documentation/enterprise/5-7-x/topics/cdh_ig_earlier_cdh5_upgrade.html and came to the HDFS upgrade step:

sudo service hadoop-hdfs-namenode upgrade

The instructions say: "Look for a line that confirms the upgrade is complete, such as: /var/lib/hadoop-hdfs/cache/hadoop/dfs/<name> is complete. The NameNode upgrade process can take a while, depending on the number of files."

The command runs without error and exits (as I would expect from an init script). Well, first of all, I never found this "is complete" message in the log file, or any similar message, when I was doing this upgrade for the first time (on the QA cluster). Second of all, the namenode keeps running during (and after) this process (without the web interface, of course), so at some point my datanodes start to connect, and I basically have to stop the namenode before starting it regularly. So, it might be that the Cloudera steps are a bit misleading?

My questions:

0) Does it hurt that the datanodes are running? Should only the namenode(s) and journalnodes be up?

1) When is it safe to stop the namenode? In other words, when is the operation "done"? After I see which of these in the namenode log?

INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Starting upgrade of local storage directories.
   old LV = -40; old CTime = 0.
   new LV = -60; new CTime = XXXX
INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory XXXX
INFO org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector: No version file in XXXX
INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory XXXX
INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image? false (staleImage=false, haEnabled=true, isRollingUpgrade=false)
INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at XXXX

I also had these messages on all the journalnodes:

INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Starting upgrade of storage directory XXX
INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Starting upgrade of edits directory: .
   old LV = -40; old CTime = 0.
   new LV = -60; new CTime = XXXXXXX
INFO org.apache.hadoop.hdfs.server.namenode.NNUpgradeUtil: Performing upgrade of storage directory XXX

2) Should I be concerned about this line: INFO org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector: No version file in FolderX (this was already asked by a forum member, but there was no answer)

3) I have checked the source of "NNUpgradeUtil.java" for both CDH 5.7 and latest, and both indicate that the upgrade should be done once there are both "current" and "previous" folders in your data folder, since the last step is the rename of the tmp folder to "previous". Please note that I didn't finalize the change yet, and I don't want to, because at this stage I want to be able to roll back if needed.

4) My setup was HA-enabled before (in 4.6), and I came across these instructions from the Apache foundation website: https://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html Now, is this applicable in my case? And if it is, why isn't this way mentioned in the Cloudera docs? For example, their "way" offers a way to check the status of the upgrade, and that obviously fails with what I did, since it's not a "rolling upgrade".

Regards, Milan
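For anyone following along, a rough way to check whether the upgrade has finished on a storage directory (the paths are the example from the instructions above and the usual packaged log location, so adjust them to your dfs.namenode.name.dir and log dir):
# after the upgrade, the storage dir should contain both "current" and "previous",
# and no leftover "previous.tmp" (the rename to "previous" is the last step)
ls -la /var/lib/hadoop-hdfs/cache/hadoop/dfs/name/
# the namenode log can also be grepped for the completion message
grep -i "is complete" /var/log/hadoop-hdfs/hadoop-hdfs-namenode-*.log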
02-06-2017
04:09 AM
Hi sumit.nigam, I see that I'm a bit late to the party, but I found your thread while looking for a solution to a problem that I have as well. Are you hosting the ZooKeepers on virtual machines, or on real hardware? Is the ZooKeeper store on a dedicated disk? Depending on the version you are running, there are guides from ZooKeeper that might help: https://zookeeper.apache.org/doc/trunk/zookeeperStarted.html https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_maintenance For example, if your ZK cluster has been running for a while, maybe you need to clean up some of the old snapshots and transaction logs. In either case, it always helps to know more details about the setup you are debugging. 🙂 Hope it helps (someone), camypaj
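For example, a minimal sketch of the automatic purge settings in zoo.cfg (the values are just illustrative; see the maintenance section of the admin guide linked above):
# keep only the 5 most recent snapshots/transaction logs, purge once a day
autopurge.snapRetainCount=5
autopurge.purgeInterval=24
Older setups without autopurge can run bin/zkCleanup.sh from cron instead.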
09-07-2016
01:10 AM
Yes, I was wondering the same thing! Can anyone else confirm that this is working? What you could try, to pinpoint the problem, is to test your Spark setup by running:
MASTER=yarn-cluster /usr/lib/spark/bin/run-example SparkPi 10
or
MASTER=yarn-client /usr/lib/spark/bin/run-example SparkPi 10
It is a wrapper script that runs the job via spark-submit. For me, yarn-cluster mode will not print any results, but I can confirm that the job works and that it stores the result in the cluster. So, if it doesn't fail, and you can see that the Spark job was submitted and executed, the problem could still be on the Hive side.
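In case it helps, one way to see the Pi result from a yarn-cluster run is to pull the driver output out of the aggregated container logs (the application id below is obviously just an example, and this assumes YARN log aggregation is enabled):
# find the id of the finished SparkPi application
yarn application -list -appStates FINISHED
# the "Pi is roughly ..." line ends up in the ApplicationMaster/driver container log
yarn logs -applicationId application_1472000000000_0001 | grep -i "pi is roughly"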
09-06-2016
12:22 AM
Hm, I am seeing this on both 5.6 and 5.7... What are you using as spark.master? I know it depends on the version of Spark, but I'm using "yarn". Interesting tip: if you run "set;" in beeline, it will print out all the beeline settings (including the Hadoop ones, if configured properly).
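For example, something along these lines (the hostname/port are assumptions for a default HiveServer2 on the QuickStart VM):
# connect with beeline
beeline -u "jdbc:hive2://quickstart.cloudera:10000"
# then, at the beeline prompt:
#   set;               -- dumps every setting, Hadoop ones included
#   set spark.master;  -- prints just the property you care about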
09-05-2016
05:13 AM
No, unfortunately not - still having it. @invoker: Which versions do you have on your setup?
08-15-2016
11:43 AM
Thanks for the tip - I'll double-check my connection string.
08-12-2016
02:06 AM
Hi @rdub, can you please post how you solved it? I'm facing a set of similar issues, and I either end up with "Error, return code 1" or "Error, return code 2". Thanks a lot, M
07-27-2016
03:24 AM
Hi @jehalter, I'm having the same issue, and after checking the systems and the documentation, I've come to the conclusion that the CDH 5.7 RPM package for the NFS gateway does not support systemd properly, so rpcbind is never pulled in as a dependent service when the NFS gateway starts. That is the main reason why it worked before. Interestingly enough, even though rpcbind is enabled at startup, it will actually only start when started manually. It seems that if you install using parcels, a solution for this problem is already in place (judging by the Cloudera documentation). I'll post more information here as I test out different solutions for this problem. Regards
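For anyone else hitting this, a rough sketch of the direction I plan to test (the hadoop-hdfs-nfs3 service name is an assumption on my part - check what the package actually installed on your host):
# make sure rpcbind is enabled and running before the gateway starts
systemctl enable rpcbind
systemctl start rpcbind
# then restart the gateway
service hadoop-hdfs-nfs3 restart
# a more permanent option would be a systemd drop-in for the gateway unit with
# Requires=rpcbind.service and After=rpcbind.service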
07-26-2016
07:42 AM
Hello everybody, my name is Milan, and I'm pleased to meet you all 🙂 I have been in Hadoop operations for about a year now. I was educated in programming business information systems, but I've been working in systems administration (mostly Linux) for about 10 years. I'm currently looking into better ways of deploying and monitoring Cloudera Hadoop clusters, for example CM vs. rolling our own with Puppet. My hobbies include cycling, swimming, reading, hiking, aikido... 🙂 Edit: I've noticed that either Okta or the Cloudera Forums have a problem with a UTF (non-ASCII) character in my last name, and I have no idea how to fix it - there's no "edit profile" option. Can anyone help?