Created on 02-21-2019 01:13 PM - edited 09-16-2022 07:10 AM
Hello Friends:
On a relatively new installation of CDH6.1 (parcels) with one node for CDH manager and a second node for Master and Slave services (combined), I'm getting this error:
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "hdfs"
after running this:
user$ /opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/bin/parquet-tools cat hdfs://tmp/1.parquet
Here is the output of hadoop classpath:
/etc/hadoop/conf:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop/lib/*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop/.//*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-hdfs/./:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-hdfs/lib/*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-hdfs/.//*:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/.//*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-yarn/lib/*:/opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/lib/hadoop/libexec/../../hadoop-yarn/.//*
Some pertinent environment variables:
user$ env | egrep -i 'hadoop|classpath'
HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
Finally, there are two JAVA distributions installed; one OpenJDK and the other installed by the CDH6.x installation wizard. I tried running the above parquet-tools command with each JAVA distribution exported, but both yield the same error. Here are the JAVA distributions:
user$ ls -al /usr/java /usr/lib/jvm
/usr/java:
total 12
drwxr-xr-x 3 root root 4096 Feb 1 01:52 .
drwxr-xr-x 14 root root 4096 Jan 21 21:01 ..
lrwxrwxrwx 1 root root 21 Feb 1 01:52 current.d -> jdk1.8.0_141-cloudera
drwxrwxr-x 8 root root 4096 Jan 21 21:01 jdk1.8.0_141-cloudera

/usr/lib/jvm:
total 24
drwxr-xr-x 4 root root 4096 Jan 21 20:44 .
dr-xr-xr-x 44 root root 12288 Feb 6 19:02 ..
lrwxrwxrwx 1 root root 26 Jan 21 20:44 java -> /etc/alternatives/java_sdk
lrwxrwxrwx 1 root root 32 Jan 21 20:44 java-1.8.0 -> /etc/alternatives/java_sdk_1.8.0
lrwxrwxrwx 1 root root 40 Jan 21 20:44 java-1.8.0-openjdk -> /etc/alternatives/java_sdk_1.8.0_openjdk
drwxr-xr-x 7 root root 4096 Jan 21 20:44 java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.i386
drwxr-xr-x 7 root root 4096 Jan 21 20:44 java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64
lrwxrwxrwx 1 root root 34 Jan 21 20:44 java-openjdk -> /etc/alternatives/java_sdk_openjdk
lrwxrwxrwx 1 root root 21 Jan 21 20:44 jre -> /etc/alternatives/jre
lrwxrwxrwx 1 root root 27 Jan 21 20:44 jre-1.8.0 -> /etc/alternatives/jre_1.8.0
lrwxrwxrwx 1 root root 35 Jan 21 20:44 jre-1.8.0-openjdk -> /etc/alternatives/jre_1.8.0_openjdk
lrwxrwxrwx 1 root root 49 Jan 21 20:44 jre-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.i386 -> java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.i386/jre
lrwxrwxrwx 1 root root 51 Jan 21 20:44 jre-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64 -> java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre
lrwxrwxrwx 1 root root 29 Jan 21 20:44 jre-openjdk -> /etc/alternatives/jre_openjdk
Note that the setup/cluster is set to use/prefer CDH's JAVA.
Any ideas?
P.S. Apart from this, the entire cluster is (and has been) running perfectly.
Thank you!
Created 02-26-2019 03:29 PM
Hi @prismalytics,
As documented in the parquet-tools README on the Apache GitHub, the tool needs to be run with the hadoop jar command to read a file on the HDFS filesystem.
---
#Run from hadoop
See Commands Usage for command to use
hadoop jar ./parquet-tools-<VERSION>.jar <command> my_parquet_file.lzo.parquet
---
So could you please run the hadoop jar command as follows?
hadoop jar /opt/cloudera/parcels/<CDH-VERSION>/jars/parquet-tools-<VERSION>.jar <command> <hdfs path to parquet file>
e.g.
hadoop jar /opt/cloudera/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702/jars/parquet-tools-1.9.0-cdh6.1.0.jar cat hdfs://tmp/1.parquet
Thanks and hope this helps,
Li
Li Wang, Technical Solution Manager
Created on 02-27-2019 11:21 AM - edited 02-27-2019 11:27 AM
Hi @lwang:
Yes, your resolution worked with one minor tweak:
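The URI needs hdfs:/// instead of hdfs:// (with only two slashes, "tmp" is parsed as the NameNode host rather than as part of the path):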
user$ hadoop jar /opt/cloudera/parcels/CDH/jars/parquet-tools-1.9.0-cdh6.1.0.jar cat hdfs:///tmp/1.parquet
or, if fully-qualifying the HDFS host, then the following (where hdfs:// will do):
user$ hadoop jar /opt/cloudera/parcels/CDH/jars/parquet-tools-1.9.0-cdh6.1.0.jar cat hdfs://vps00:8020/tmp/1.parquet
Thank you so very much! =:)
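For anyone who finds this thread later: the difference between the three URI forms above comes from generic URI syntax, where whatever follows "//" up to the next "/" is the authority (host and optional port). The sketch below only illustrates that rule using Python's standard urllib.parse; it is not Hadoop's own parsing code, and the describe helper is a hypothetical name used just for this example.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI into the pieces relevant to an HDFS location.

    Illustrative helper (not part of Hadoop): shows which part of the
    URI is treated as the host and which part as the file path.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None when the authority is empty
        "port": parts.port,      # None when no port is given
        "path": parts.path,
    }


if __name__ == "__main__":
    for uri in (
        "hdfs://tmp/1.parquet",             # "tmp" lands in the host slot
        "hdfs:///tmp/1.parquet",            # empty authority, full path kept
        "hdfs://vps00:8020/tmp/1.parquet",  # explicit host and port
    ):
        print(uri, "->", describe(uri))
```

With two slashes, "tmp" ends up in the host position and the path shrinks to /1.parquet. With three slashes the authority is empty, which leaves the cluster's configured default filesystem to supply the NameNode, and the path stays /tmp/1.parquet.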