<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: config hdfs to distcp to/from google cloud storage in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143698#M52391</link>
    <description>&lt;A rel="user" href="https://community.cloudera.com/users/15584/phakinche.html" nodeid="15584"&gt;@Phakin Cheangkrachange&lt;/A&gt; DistCp is a MapReduce job, and the issue seems to be with the JVM created for the job. That is, "mapreduce.application.classpath" might not have picked up this jar file before the JVM was created. &lt;P&gt;Could you please add /var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar at the end of mapreduce.application.classpath in the MapReduce2 service from Ambari and restart the service so that the new JVMs pick up the jar?&lt;/P&gt;&lt;P&gt;Let me know if it helps.&lt;/P&gt;</description>
    <pubDate>Tue, 24 Jan 2017 18:59:37 GMT</pubDate>
    <dc:creator>ssivachandran</dc:creator>
    <dc:date>2017-01-24T18:59:37Z</dc:date>
    <item>
      <title>config hdfs to distcp to/from google cloud storage</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143697#M52390</link>
      <description>&lt;P&gt;Is there anyone who can guide me on how to add gcs-connector.jar to Hadoop on HDP 2.5 so that I can distcp from/to Google Cloud Storage?&lt;/P&gt;&lt;P&gt;I followed this &lt;A target="_blank" href="https://cloud.google.com/hadoop/google-cloud-storage-connector#manualinstallation"&gt;Manually installing the connector article&lt;/A&gt; and got this error:&lt;/P&gt;&lt;PRE&gt;[centos@namenode ~]$ hadoop distcp gs://bucket/image.png /
17/01/24 07:33:22 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.0-hadoop2
17/01/24 07:33:24 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[gs://bucket/image.png], targetPath=/, targetPathExists=true, filtersFile='null'}
17/01/24 07:33:25 INFO impl.TimelineClientImpl: Timeline service address: http://internal:8188/ws/v1/timeline/
17/01/24 07:33:25 INFO client.RMProxy: Connecting to ResourceManager at internal/xxx.xxx.xxx.xxx:8050
17/01/24 07:33:26 INFO client.AHSProxy: Connecting to Application History server at internal/xxx.xxx.xxx.xxx:10200
17/01/24 07:33:28 WARN gcs.GoogleHadoopFileSystemBase: No working directory configured, using default: 'gs://bucket/'
17/01/24 07:33:30 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
17/01/24 07:33:30 INFO tools.SimpleCopyListing: Build file listing completed.
17/01/24 07:33:30 INFO tools.DistCp: Number of paths in the copy list: 1
17/01/24 07:33:30 INFO tools.DistCp: Number of paths in the copy list: 1
17/01/24 07:33:31 INFO impl.TimelineClientImpl: Timeline service address: http://internal:8188/ws/v1/timeline/
17/01/24 07:33:31 INFO client.RMProxy: Connecting to ResourceManager at internal/xxx.xxx.xxx.xxx:8050
17/01/24 07:33:31 INFO client.AHSProxy: Connecting to Application History server at internal/xxx.xxx.xxx.xxx:10200
17/01/24 07:33:31 INFO mapreduce.JobSubmitter: number of splits:1
17/01/24 07:33:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1485241695662_0003
17/01/24 07:33:33 INFO impl.YarnClientImpl: Submitted application application_1485241695662_0003
17/01/24 07:33:33 INFO mapreduce.Job: The url to track the job: http://internal:8088/proxy/application_1485241695662_0003/
17/01/24 07:33:33 INFO tools.DistCp: DistCp job-id: job_1485241695662_0003
17/01/24 07:33:33 INFO mapreduce.Job: Running job: job_1485241695662_0003
17/01/24 07:33:39 INFO mapreduce.Job: Job job_1485241695662_0003 running in uber mode : false
17/01/24 07:33:39 INFO mapreduce.Job:  map 0% reduce 0%
17/01/24 07:33:47 INFO mapreduce.Job: Task Id : attempt_1485241695662_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2214)
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2746)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2759)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2795)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2777)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:218)
	at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2120)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2212)
	... 17 more

17/01/24 07:33:52 INFO mapreduce.Job: Task Id : attempt_1485241695662_0003_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
	[stack trace identical to attempt _0 above]

17/01/24 07:33:56 INFO mapreduce.Job: Task Id : attempt_1485241695662_0003_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
	[stack trace identical to attempt _0 above]

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

17/01/24 07:34:01 INFO mapreduce.Job:  map 100% reduce 0%
17/01/24 07:34:01 INFO mapreduce.Job: Job job_1485241695662_0003 failed with state FAILED due to: Task failed task_1485241695662_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

17/01/24 07:34:01 INFO mapreduce.Job: Counters: 8
	Job Counters 
		Failed map tasks=4
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=14884
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=14884
		Total vcore-milliseconds taken by all map tasks=14884
		Total megabyte-milliseconds taken by all map tasks=15241216
17/01/24 07:34:01 ERROR tools.DistCp: Exception encountered 
java.io.IOException: DistCp failure: Job job_1485241695662_0003 has failed: Task failed task_1485241695662_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
	at org.apache.hadoop.tools.DistCp.waitForJobCompletion(DistCp.java:215)
	at org.apache.hadoop.tools.DistCp.execute(DistCp.java:158)
	at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;hadoop fs -cp gs://... works&lt;/STRONG&gt; fine, but it's very slow when moving very large files.&lt;/P&gt;&lt;P&gt;I've added the gcs-connector.jar to every node in the cluster (NameNode, SNameNode, DataNodes) and also configured the classpath to include the jar file.&lt;/P&gt;&lt;P&gt;I've added this line to the "hadoop-env template" in the Ambari UI:&lt;/P&gt;&lt;PRE&gt;export HADOOP_CLASSPATH=/var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar:$HADOOP_CLASSPATH&lt;/PRE&gt;&lt;P&gt;Result from running "hadoop classpath" on the NameNode:&lt;/P&gt;&lt;PRE&gt;/usr/hdp/2.5.3.0-37/hadoop/conf:/usr/hdp/2.5.3.0-37/hadoop/lib/*:/usr/hdp/2.5.3.0-37/hadoop/.//*:/usr/hdp/2.5.3.0-37/hadoop-hdfs/./:/usr/hdp/2.5.3.0-37/hadoop-hdfs/lib/*:/usr/hdp/2.5.3.0-37/hadoop-hdfs/.//*:/usr/hdp/2.5.3.0-37/hadoop-yarn/lib/*:/usr/hdp/2.5.3.0-37/hadoop-yarn/.//*:/usr/hdp/2.5.3.0-37/hadoop-mapreduce/lib/*:/usr/hdp/2.5.3.0-37/hadoop-mapreduce/.//*:/var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar::/usr/hdp/2.5.3.0-37/tez/*:/usr/hdp/2.5.3.0-37/tez/lib/*:/usr/hdp/2.5.3.0-37/tez/conf&lt;/PRE&gt;&lt;P&gt;Result from running "hadoop classpath" on one of my DataNodes:&lt;/P&gt;&lt;PRE&gt;/usr/hdp/2.5.3.0-37/hadoop/conf:/usr/hdp/2.5.3.0-37/hadoop/lib/*:/usr/hdp/2.5.3.0-37/hadoop/.//*:/usr/hdp/2.5.3.0-37/hadoop-hdfs/./:/usr/hdp/2.5.3.0-37/hadoop-hdfs/lib/*:/usr/hdp/2.5.3.0-37/hadoop-hdfs/.//*:/usr/hdp/2.5.3.0-37/hadoop-yarn/lib/*:/usr/hdp/2.5.3.0-37/hadoop-yarn/.//*:/usr/hdp/2.5.3.0-37/hadoop-mapreduce/lib/*:/usr/hdp/2.5.3.0-37/hadoop-mapreduce/.//*:/var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar::mysql-connector-java.jar:/usr/hdp/2.5.3.0-37/tez/*:/usr/hdp/2.5.3.0-37/tez/lib/*:/usr/hdp/2.5.3.0-37/tez/conf&lt;/PRE&gt;&lt;P&gt;I can confirm that the file /var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar exists on every node.&lt;/P&gt;&lt;P&gt;I've also added these three properties to Custom core-site in the Ambari UI:&lt;/P&gt;&lt;PRE&gt;fs.gs.project.id
fs.gs.impl
fs.AbstractFileSystem.gs.impl
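
# Sketch (not from the original post): typical values for these properties,
# with fs.gs.project.id left as a placeholder for your own GCP project ID.
fs.gs.project.id=&lt;your-gcp-project-id&gt;
fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS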
&lt;/PRE&gt;&lt;P&gt;Any suggestions on how to make distcp work?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 15:55:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143697#M52390</guid>
      <dc:creator>phakin_che</dc:creator>
      <dc:date>2017-01-24T15:55:13Z</dc:date>
    </item>
    <item>
      <title>Re: config hdfs to distcp to/from google cloud storage</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143698#M52391</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/15584/phakinche.html" nodeid="15584"&gt;@Phakin Cheangkrachange&lt;/A&gt; DistCp is a MapReduce job, and the issue seems to be with the JVM created for the job. That is, "mapreduce.application.classpath" might not have picked up this jar file before the JVM was created. &lt;P&gt;Could you please add /var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar at the end of mapreduce.application.classpath in the MapReduce2 service from Ambari and restart the service so that the new JVMs pick up the jar?&lt;/P&gt;&lt;P&gt;Let me know if it helps.&lt;/P&gt;
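&lt;P&gt;For illustration only (a sketch; the existing HDP entries are placeholders and vary by version), the updated property would end up looking like:&lt;/P&gt;&lt;PRE&gt;mapreduce.application.classpath=...existing HDP entries...:/var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar&lt;/PRE&gt;</description>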
      <pubDate>Tue, 24 Jan 2017 18:59:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143698#M52391</guid>
      <dc:creator>ssivachandran</dc:creator>
      <dc:date>2017-01-24T18:59:37Z</dc:date>
    </item>
    <item>
      <title>Re: config hdfs to distcp to/from google cloud storage</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143699#M52392</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14990/ssivachandran.html" nodeid="14990"&gt;@ssivachandran&lt;/A&gt; thank you so much. It works now!!!&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 19:41:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143699#M52392</guid>
      <dc:creator>phakin_che</dc:creator>
      <dc:date>2017-01-24T19:41:02Z</dc:date>
    </item>
    <item>
      <title>Re: config hdfs to distcp to/from google cloud storage</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143700#M52393</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15584/phakinche.html" nodeid="15584"&gt;@Phakin Cheangkrachange&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Glad to know that it worked! Kindly vote for the answer, since it helped you resolve the issue.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jan 2017 19:59:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/config-hdfs-to-distcp-to-from-google-cloud-storage/m-p/143700#M52393</guid>
      <dc:creator>ssivachandran</dc:creator>
      <dc:date>2017-01-24T19:59:54Z</dc:date>
    </item>
  </channel>
</rss>