Created on 06-16-2017 06:57 PM - edited 09-16-2022 04:46 AM
Hi,
I'm looking to ingest a large amount of data from a public Google Cloud Storage bucket, but our cluster is currently missing the Google Cloud Storage connector. I thought copying the connector jar into /opt/cloudera/parcels/CDH/lib/hadoop would be sufficient, but when I run the command below I get the following error: No FileSystem for scheme: gs.
hdfs dfs -cp gs://gnomad-public/release-170228/gnomad.genomes.r2.0.1.sites.vds /my/local/hdfs/filesystem
Are any additional steps beyond this necessary?
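For context, "No FileSystem for scheme: gs" generally means no FileSystem implementation is registered for the gs scheme. A minimal sketch of supplying that mapping directly on the command line, assuming the connector jar is already on the Hadoop classpath (these are the same property names used later in this thread; project and credential settings may still be needed):
hdfs dfs \
  -D fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
  -D fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
  -ls gs://gnomad-public/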
Created 06-16-2017 08:08 PM
Created 06-16-2017 09:13 PM
Appreciate the quick response! I was actually already following the guide you posted. In particular, I modified core-site.xml located at /etc/hadoop/conf as shown below. However, when running hdfs dfs -ls gs://gnomad-public (a public GCS bucket), I get the following ClassNotFoundException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
It appears that this issue is tied to the connector jar not being on the classpath, which is odd, as I created a symbolic link to it in both /opt/cloudera/parcels/CDH/lib/hadoop and /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce. (The file is actually located in /opt/cloudera/parcels/CDH/jars - this is where all of the default connector jars were initially placed, and I figured following that default placement would be best practice.) A quick way to verify the classpath is sketched after the configuration below.
<!-- GCloud connection modification here! -->
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
  <description>The FileSystem for gs: (GCS) uris.</description>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
  <description>
    The AbstractFileSystem for gs: (GCS) uris. Only necessary for use with Hadoop 2.
  </description>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value>PROJECTNAME</value>
  <description>
    Required. Google Cloud Project ID with access to configured GCS buckets.
  </description>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
  <description>
    Whether to use a service account for GCS authorization. If an email and
    keyfile are provided (see google.cloud.auth.service.account.email and
    google.cloud.auth.service.account.keyfile), then that service account
    will be used. Otherwise the connector will look to see if it is running on
    a GCE VM with some level of GCS access in its service account scope, and
    use that service account.
  </description>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/etc/hadoop/conf/KEYFILENAME.json</value>
  <description>
    The JSON key file of the service account used for GCS
    access when google.cloud.auth.service.account.enable is true.
  </description>
</property>
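As a quick sanity check on the classpath side (assuming the connector jar's file name contains "gcs"; adjust the pattern to whatever the jar is actually called):
# Verify the symlinks in the Hadoop lib directories resolve to the jar under /opt/cloudera/parcels/CDH/jars
ls -l /opt/cloudera/parcels/CDH/lib/hadoop/ /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/ | grep -i gcs
# Confirm the jar shows up on the client classpath
# (--glob expands the wildcard entries; available on Hadoop 2.6 and later)
hadoop classpath --glob | tr ':' '\n' | grep -i gcs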
Created 06-16-2017 09:48 PM
Created on 06-18-2017 07:25 AM - edited 06-18-2017 07:29 AM
Based on the error that you have, the jar file is clearly missing a .class file (GoogleHadoopFileSystem).
You can extract the jar using 7-Zip or any similar tool to inspect it.
The .class file path that it is missing:
com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
The solution is to download a fresh copy of the required jar, extract it to check that it contains all of the .class files, and add it to the classpath. This should resolve the error.
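A sketch of that check from the command line (the jar name below is just a placeholder for whatever file was downloaded):
# List the jar's contents and look for the missing class;
# a complete connector jar should list com/google/cloud/hadoop/fs/gcs/GoogleHadoopFileSystem.class
jar tf gcs-connector.jar | grep GoogleHadoopFileSystem
# unzip -l gcs-connector.jar works as well if the JDK's jar tool is not installed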
Created 06-18-2017 07:36 AM
Created 06-19-2017 09:38 AM
Went ahead and downloaded a fresh .jar and followed the steps in the guide posted above - got it working! Appreciate the help.