Member since: 01-15-2015
Posts: 12
Kudos Received: 2
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2149 | 03-31-2017 12:04 AM |
| | 9799 | 03-29-2017 08:03 PM |
02-27-2019
11:21 AM
1 Kudo
Hi @lwang: Yes, your resolution worked with one minor tweak: I needed hdfs:/// instead of hdfs://:

user$ hadoop jar /opt/cloudera/parcels/CDH/jars/parquet-tools-1.9.0-cdh6.1.0.jar cat hdfs:///tmp/1.parquet

or, if fully qualifying the HDFS host, then the following (where hdfs:// will do):

user$ hadoop jar /opt/cloudera/parcels/CDH/jars/parquet-tools-1.9.0-cdh6.1.0.jar cat hdfs://vps00:8020/tmp/1.parquet

Thank you so very much! =:)
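For anyone who lands here later: my understanding is that hdfs:/// leaves the URI authority empty, so the client falls back to fs.defaultFS from core-site.xml, whereas hdfs://tmp/1.parquet would try to treat "tmp" as a NameNode hostname. A quick way to see what the empty authority resolves to (on this cluster it should be the vps00:8020 endpoint used above; yours will differ):

user$ hdfs getconf -confKey fs.defaultFS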
02-21-2019
01:13 PM
Hello Friends:
On a relatively new installation of CDH 6.1 (parcels), with one node for Cloudera Manager and a second node for the Master and Slave services (combined), I'm getting this error:
org.apache.hadoop.fs.UnsupportedFileSystemException:
No FileSystem for scheme "hdfs"
after running this:
user$ /opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/bin/parquet-tools \
cat hdfs://tmp/1.parquet
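For comparison, reading the same file via an unqualified path (which goes through fs.defaultFS) is a useful sanity check that HDFS and the stock Hadoop CLI are themselves healthy, and that the failure is specific to how parquet-tools handles the hdfs:// URI:

user$ hadoop fs -ls /tmp/1.parquet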
Here is the output of hadoop classpath:
/etc/hadoop/conf:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-yarn/.//*
Some pertinent environment variables:
user$ env | egrep -i 'hadoop|classpath'
HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
Finally, there are two JAVA distributions installed: one OpenJDK, and the other installed by the CDH6.x installation wizard. I tried running the above parquet-tools command with JAVA_HOME pointed at each distribution in turn, but both yield the same error. Here are the JAVA distributions:
user$ ls -al /usr/java /usr/lib/jvm
/usr/java:
total 12
drwxr-xr-x 3 root root 4096 Feb 1 01:52 .
drwxr-xr-x 14 root root 4096 Jan 21 21:01 ..
lrwxrwxrwx 1 root root 21 Feb 1 01:52 current.d -> jdk1.8.0_141-cloudera
drwxrwxr-x 8 root root 4096 Jan 21 21:01 jdk1.8.0_141-cloudera
/usr/lib/jvm:
total 24
drwxr-xr-x 4 root root 4096 Jan 21 20:44 .
dr-xr-xr-x 44 root root 12288 Feb 6 19:02 ..
lrwxrwxrwx 1 root root 26 Jan 21 20:44 java -> /etc/alternatives/java_sdk
lrwxrwxrwx 1 root root 32 Jan 21 20:44 java-1.8.0 -> /etc/alternatives/java_sdk_1.8.0
lrwxrwxrwx 1 root root 40 Jan 21 20:44 java-1.8.0-openjdk -> /etc/alternatives/java_sdk_1.8.0_openjdk
drwxr-xr-x 7 root root 4096 Jan 21 20:44 java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.i386
drwxr-xr-x 7 root root 4096 Jan 21 20:44 java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64
lrwxrwxrwx 1 root root 34 Jan 21 20:44 java-openjdk -> /etc/alternatives/java_sdk_openjdk
lrwxrwxrwx 1 root root 21 Jan 21 20:44 jre -> /etc/alternatives/jre
lrwxrwxrwx 1 root root 27 Jan 21 20:44 jre-1.8.0 -> /etc/alternatives/jre_1.8.0
lrwxrwxrwx 1 root root 35 Jan 21 20:44 jre-1.8.0-openjdk -> /etc/alternatives/jre_1.8.0_openjdk
lrwxrwxrwx 1 root root 49 Jan 21 20:44 jre-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.i386 -> java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.i386/jre
lrwxrwxrwx 1 root root 51 Jan 21 20:44 jre-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64 -> java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre
lrwxrwxrwx 1 root root 29 Jan 21 20:44 jre-openjdk -> /etc/alternatives/jre_openjdk
Note that the setup/cluster is set to use/prefer CDH's JAVA.
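In case it matters, the way I switched between the two distributions was roughly the following (paths taken from the listings above; the export pattern is the only thing being illustrated):

user$ export JAVA_HOME=/usr/java/jdk1.8.0_141-cloudera
user$ /opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/bin/parquet-tools cat hdfs://tmp/1.parquet
user$ export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64
user$ /opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/bin/parquet-tools cat hdfs://tmp/1.parquet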
Any ideas?
P.S. But for this, the entire cluster is (and has been) running perfectly.
Thank you!
Labels:
- Cloudera Manager
- HDFS
03-31-2017
12:04 AM
So, I went for it, and here are the answers I found for myself ...

1. Having successfully upgraded to Parcels (so claims the CM UI), can I now remove all 110 CDH5 RPMs from those six servers? YES. You can remove all RPMs except the ones named cloudera-manager-*.rpm; those are still needed to start CM and manage the cluster. Very good.

2. Will doing so remove critical configurations in /etc/<service-type>/conf (and configurations elsewhere)? NO. Everything was preserved and is still consulted by the Parcel-converted Cloudera Manager. Very good.

3. What about the Unix start/stop scripts, chkconfig-on configurations, etc.? Will they be removed or affected? If yes, did the Parcel upgrade install its own version of these entries? BECAUSE the cloudera-manager-*.rpm packages were not removed (see above), all Unix start/stop scripts, chkconfig configurations, etc. were unaffected. Those RPMs are what manage those items, and they were not removed. Very good.

So everything went well. There will always be some post-conversion prodding and tweaking to get things totally clean and plumbed -- and this instance was no different -- but overall things went pretty well. I hope this helps others. =:)
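For anyone repeating this, the removal step itself boils down to something like the following (a sketch -- review what the grep matches on your own nodes before piping it anywhere, and note that the cloudera-manager-* packages are deliberately left alone):

user@vps$ rpm -qa | grep cdh5 | xargs sudo yum remove -y
user@vps$ rpm -qa | grep cloudera-manager   # these must remain installed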
03-30-2017
04:27 PM
Hello Friends:

First, some configuration information ...

-- CDH and CM-Express versions: v5.10
-- O/S dist and version: CentOS-6.8 Final

Six (qty. 6) nodes (actually LXC containers on one physical server):
[vps00]...........: The "master" for all services
[vps01 - vps04]...: The "workers" for all services
[vps10]...........: Where Cloudera Manager is installed

Via Cloudera Manager, I upgraded from RPMs to Parcels. Being 100% comfortable with RPMs and YUM, I didn't necessarily want to do this, but no alternative was provided to obtain Spark v2 RPMs. Each of the six servers is identical in terms of the CDH5.10 RPMs installed, so issuing the following command on any of them returns the same list of 110 CDH5.10 RPMs:

user@vps$ rpm -qa | grep cdh5

Questions:
1. Having successfully upgraded to Parcels (so claims the CM UI), can I now remove all 110 CDH5 RPMs from those six servers?
2. Will doing so remove critical configurations in /etc/<service-type>/conf (and configurations elsewhere)?
3. What about the Unix start/stop scripts, chkconfig-on configurations, etc.? Will they be removed or affected? If yes, did the Parcel upgrade install its own version of these entries?

Basically, when I reboot these systems after removing those RPMs, will everything still start and work?

Thank you in advance! =:)
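For context, a quick way to see which copy of a given command a node currently resolves to -- the package bits under /usr/lib versus the parcel bits under /opt/cloudera/parcels -- is something along these lines (standard CDH paths assumed):

user@vps$ which hadoop
user@vps$ rpm -qf $(readlink -f $(which hadoop)) || echo "not owned by an RPM (likely the parcel copy)"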
Labels:
- Cloudera Manager
- Manual Installation
03-29-2017
08:03 PM
1 Kudo
I ended up repairing the issue after more work. The UI eventually revealed to me that the versions of CDH (5.10) and CM (5.4) were not in sync. When I investigated why, I found that the entry in /etc/yum.repos.d/cloudera-manager.repo was pegged at CDH 5.4, so my 'yum update' runs did not update CM (though they updated everything else). So that made sense. I updated the repo file, yum-updated CM, and restarted. Then I let the UI walk me through a few upgrades and corrections of stale states, so unfortunately I don't know exactly where the fix came from. =:) But basically we can say that classpath.txt hadn't been updated properly; now it has the correct entries. I'm glad I didn't brute-force things (not my style anyway). I doubt this one-off issue will help anyone, but who knows. =:)
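For the record, the mechanical part of the fix was along these lines (a sketch -- the exact baseurl to put in the repo file depends on which CM/CDH line you are tracking):

user@vps10$ sudo vi /etc/yum.repos.d/cloudera-manager.repo          # point the pinned 5.4 baseurl at the 5.10 repo
user@vps10$ sudo yum clean all && sudo yum update 'cloudera-manager-*'
user@vps10$ sudo service cloudera-scm-server restart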
03-29-2017
04:27 PM
Hello Friends:

A quick preamble, and then a question ...

I run CDH 5.10 on CentOS6 Final for personal use (1 node for Master and CM; 4 nodes for Workers/Slaves). They are all Linux LXC containers. It's been a while since I spun the cluster up, so the first thing I did was a 'yum update' of the nodes. No issues there. The cluster is up and running, with all green statuses in CM.

However, one thing that used to work but now does not is the pyspark command. When I run it now, I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream

The jar file for that class is: /usr/lib/hadoop/client/hadoop-common.jar

After troubleshooting -- again, it's been a while since I used the cluster, so some things may have changed -- I determined that the SPARK_DIST_CLASSPATH environment variable was getting set, but did not contain any of the jars in that directory (including, of course, the one mentioned above). The script ultimately responsible for setting SPARK_DIST_CLASSPATH is /etc/spark/conf/spark-env.sh, and it consults the list of jars in the classpath.txt file to do so. Sadly, that file does not have any of the jars in the aforementioned directory. I could, of course, manually add them, but I found it odd that they were not there in the first place. It seems like an important directory of jars to have included in classpath.txt (again, /usr/lib/hadoop/client/).

So my questions (finally) ... =:)
1. Any idea why the jars in that directory weren't included in classpath.txt? Was it perhaps just an upgrade issue?
2. Has anyone had to manually add the jars in that directory (again, /usr/lib/hadoop/client/)?
3. Is /etc/spark/conf/classpath.txt meant to be edited?

I'm curious. It seems odd that they were left out, and I don't want to just blindly add them in.

Thank you in advance!
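(For reference, this is roughly the check used to confirm the gap; the paths are the standard package-install locations mentioned above:)

user$ grep -c '/usr/lib/hadoop/client/' /etc/spark/conf/classpath.txt   # 0 here matches what I described: none of that directory's jars are listed
user$ ls /usr/lib/hadoop/client/hadoop-common*.jar                      # the jar providing org.apache.hadoop.fs.FSDataInputStream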
Labels:
- Apache Spark
- Manual Installation
01-15-2015
02:40 PM
Hi Darren: I basically did what you prescribed, and it resolved the issue. I'll mark it as solved. Thank you again, prismal
01-15-2015
02:17 PM
Hi DLO: Thank you for the quick reply. Sadly, as it turns out, yes (good intuition on your part). =:) After building a pristine LXC (vps00), I cloned it. And while I changed the network information for each clone, I didn't change the UUID piece (as I wasn't aware of it). Thank you for bringing that to my attention.

CONFIRMATION:
user@lxchost$ ssh -l user vps00 "cat /var/lib/cloudera-scm-agent/uuid"
e8b6ade3-7838-47ed-ba8e-99bd3e5f97b5
user@lxchost$ ssh -l user vps01 "cat /var/lib/cloudera-scm-agent/uuid"
e8b6ade3-7838-47ed-ba8e-99bd3e5f97b5
user@lxchost$ ssh -l user vps02 "cat /var/lib/cloudera-scm-agent/uuid"
e8b6ade3-7838-47ed-ba8e-99bd3e5f97b5
user@lxchost$ ssh -l user vps03 "cat /var/lib/cloudera-scm-agent/uuid"
e8b6ade3-7838-47ed-ba8e-99bd3e5f97b5
user@lxchost$ ssh -l user vps04 "cat /var/lib/cloudera-scm-agent/uuid"
e8b6ade3-7838-47ed-ba8e-99bd3e5f97b5

Can this be hand-edited to fix it (after shutting down the agent first, of course)? For example, changing the last two characters to make the UUIDs unique, like so?
> vps00 -- UUID would end in '...00'
> vps01 -- UUID would end in '...01'
> vps02 -- UUID would end in '...02'
> vps03 -- UUID would end in '...03'
> vps04 -- UUID would end in '...04'
Or is there a preferred method for changing these? Also, do I have to purge some possibly cached information on the CM server? Finally, is there anything else lurking as non-unique in my clones (or just this UUID thing)? =:)

Thank you again DLO,
PSIAMAL
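(A sketch of the alternative I'm aware of, in case hand-editing the hex isn't the sanctioned route -- stop the agent, remove the uuid file, and let the agent write a fresh, unique one on restart; run on each clone, and I haven't verified this against the docs:)

user@vps01$ sudo service cloudera-scm-agent stop
user@vps01$ sudo rm /var/lib/cloudera-scm-agent/uuid
user@vps01$ sudo service cloudera-scm-agent start    # a new uuid is generated at startup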
01-15-2015
01:41 PM
Hello Friends:

THE DETAILS:
I have CDH5.3 (latest) installed on five CentOS-6.6 nodes [vps00 - vps04]. The CDH packages were installed via traditional YUM repositories. Each node is actually a Linux LXC container with its own IP address (192.168.0.[180-184]). Each node also has the (latest) CM Agent packages installed and running, and each /etc/cloudera-scm-agent/config.ini points to the CM. The Cloudera Manager server itself is also a CentOS-6.6 Linux LXC container (vps10), and its IP address is 192.168.0.190. All six hosts can communicate with each other without issue, and there are no port restrictions either. We always run clusters via the UNIX CLI without issue... HDFS, Map/Reduce jobs, Storm, Zookeeper, and anything else you can think of run flawlessly.

THE PROBLEM:
We decided to try the Cloudera Manager UI today, but not all vps hosts are showing up in the UI. Although on vps10 (the Cloudera Manager server) we see agent connections from ALL vps nodes, the HOSTS section of the CM UI shows only one vps node at a time, and refreshing the page changes which (single) vps is shown. Strange. It seems like everything is configured correctly, too.

user@vps10$ sudo netstat -an | grep 192.168.0.18
tcp 0 0 192.168.0.190:7182 vps10 192.168.0.180:58605 ESTABLISHED vps00
tcp 0 0 192.168.0.190:7182 vps10 192.168.0.181:59878 ESTABLISHED vps01
tcp 0 0 192.168.0.190:7182 vps10 192.168.0.182:36202 ESTABLISHED vps02
tcp 0 0 192.168.0.190:7182 vps10 192.168.0.183:49203 ESTABLISHED vps03
tcp 0 0 192.168.0.190:7182 vps10 192.168.0.184:46649 ESTABLISHED vps04

Note that we simply want to add these (pre-configured) nodes manually to CM and to a Cluster within it, as they are already configured with CDH and working.

Any ideas why all the hosts aren't showing up (i.e., why only one shows up at a time, changing with each page refresh)? It seems like a conflict. Hmm.

Thank you in advance,
PRISMAL
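(One comparison that might help with a diagnosis: whether every agent is reporting a distinct identity to CM. Something like the following quick loop, using the same passwordless-ssh setup as elsewhere in this thread -- the uuid path is the CM agent's standard per-host id file:)

user@lxchost$ for h in vps00 vps01 vps02 vps03 vps04; do ssh -l user $h "hostname; cat /var/lib/cloudera-scm-agent/uuid"; done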