Member since: 01-24-2017
Posts: 2
Kudos Received: 0
Solutions: 0
01-24-2017 11:41 AM
@ssivachandran thank you so much. It works now!!!
01-24-2017 07:55 AM
Is there anyone who can guide me on how to add gcs-connector.jar to Hadoop on HDP 2.5 so that I can distcp from/to Google Cloud Storage? I followed the "Manually installing the connector" article and got this error:

[centos@namenode ~]$ hadoop distcp gs://bucket/image.png /
17/01/24 07:33:22 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.0-hadoop2
17/01/24 07:33:24 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[gs://bucket/image.png], targetPath=/, targetPathExists=true, filtersFile='null'}
17/01/24 07:33:25 INFO impl.TimelineClientImpl: Timeline service address: http://internal:8188/ws/v1/timeline/
17/01/24 07:33:25 INFO client.RMProxy: Connecting to ResourceManager at internal/xxx.xxx.xxx.xxx:8050
17/01/24 07:33:26 INFO client.AHSProxy: Connecting to Application History server at internal/xxx.xxx.xxx.xxx:10200
17/01/24 07:33:28 WARN gcs.GoogleHadoopFileSystemBase: No working directory configured, using default: 'gs://bucket/'
17/01/24 07:33:30 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
17/01/24 07:33:30 INFO tools.SimpleCopyListing: Build file listing completed.
17/01/24 07:33:30 INFO tools.DistCp: Number of paths in the copy list: 1
17/01/24 07:33:30 INFO tools.DistCp: Number of paths in the copy list: 1
17/01/24 07:33:31 INFO impl.TimelineClientImpl: Timeline service address: http://internal:8188/ws/v1/timeline/
17/01/24 07:33:31 INFO client.RMProxy: Connecting to ResourceManager at internal/xxx.xxx.xxx.xxx:8050
17/01/24 07:33:31 INFO client.AHSProxy: Connecting to Application History server at internal/xxx.xxx.xxx.xxx:10200
17/01/24 07:33:31 INFO mapreduce.JobSubmitter: number of splits:1
17/01/24 07:33:32 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1485241695662_0003
17/01/24 07:33:33 INFO impl.YarnClientImpl: Submitted application application_1485241695662_0003
17/01/24 07:33:33 INFO mapreduce.Job: The url to track the job: http://internal:8088/proxy/application_1485241695662_0003/
17/01/24 07:33:33 INFO tools.DistCp: DistCp job-id: job_1485241695662_0003
17/01/24 07:33:33 INFO mapreduce.Job: Running job: job_1485241695662_0003
17/01/24 07:33:39 INFO mapreduce.Job: Job job_1485241695662_0003 running in uber mode : false
17/01/24 07:33:39 INFO mapreduce.Job: map 0% reduce 0%
17/01/24 07:33:47 INFO mapreduce.Job: Task Id : attempt_1485241695662_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2214)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2746)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2759)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2777)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:218)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2120)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2212)
... 17 more
17/01/24 07:33:52 INFO mapreduce.Job: Task Id : attempt_1485241695662_0003_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2214)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2746)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2759)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2777)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:218)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2120)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2212)
... 17 more
17/01/24 07:33:56 INFO mapreduce.Job: Task Id : attempt_1485241695662_0003_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2214)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2746)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2759)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2777)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:386)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:218)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2120)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2212)
... 17 more
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
17/01/24 07:34:01 INFO mapreduce.Job: map 100% reduce 0%
17/01/24 07:34:01 INFO mapreduce.Job: Job job_1485241695662_0003 failed with state FAILED due to: Task failed task_1485241695662_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
17/01/24 07:34:01 INFO mapreduce.Job: Counters: 8
Job Counters
Failed map tasks=4
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=14884
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=14884
Total vcore-milliseconds taken by all map tasks=14884
Total megabyte-milliseconds taken by all map tasks=15241216
17/01/24 07:34:01 ERROR tools.DistCp: Exception encountered
java.io.IOException: DistCp failure: Job job_1485241695662_0003 has failed: Task failed task_1485241695662_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
at org.apache.hadoop.tools.DistCp.waitForJobCompletion(DistCp.java:215)
at org.apache.hadoop.tools.DistCp.execute(DistCp.java:158)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)

hadoop fs -cp gs://... works fine, but it is very slow when moving very large files.

I have added gcs-connector.jar to every node in the cluster (NameNode, SNameNode, DataNodes) and also configured the classpath to include the jar. I added this line to the "hadoop-env template" in the Ambari UI:

export HADOOP_CLASSPATH=/var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar:$HADOOP_CLASSPATH

Result of running "hadoop classpath" on the NameNode:

/usr/hdp/2.5.3.0-37/hadoop/conf:/usr/hdp/2.5.3.0-37/hadoop/lib/*:/usr/hdp/2.5.3.0-37/hadoop/.//*:/usr/hdp/2.5.3.0-37/hadoop-hdfs/./:/usr/hdp/2.5.3.0-37/hadoop-hdfs/lib/*:/usr/hdp/2.5.3.0-37/hadoop-hdfs/.//*:/usr/hdp/2.5.3.0-37/hadoop-yarn/lib/*:/usr/hdp/2.5.3.0-37/hadoop-yarn/.//*:/usr/hdp/2.5.3.0-37/hadoop-mapreduce/lib/*:/usr/hdp/2.5.3.0-37/hadoop-mapreduce/.//*:/var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar::/usr/hdp/2.5.3.0-37/tez/*:/usr/hdp/2.5.3.0-37/tez/lib/*:/usr/hdp/2.5.3.0-37/tez/conf

Result of running "hadoop classpath" on one of my DataNodes:

/usr/hdp/2.5.3.0-37/hadoop/conf:/usr/hdp/2.5.3.0-37/hadoop/lib/*:/usr/hdp/2.5.3.0-37/hadoop/.//*:/usr/hdp/2.5.3.0-37/hadoop-hdfs/./:/usr/hdp/2.5.3.0-37/hadoop-hdfs/lib/*:/usr/hdp/2.5.3.0-37/hadoop-hdfs/.//*:/usr/hdp/2.5.3.0-37/hadoop-yarn/lib/*:/usr/hdp/2.5.3.0-37/hadoop-yarn/.//*:/usr/hdp/2.5.3.0-37/hadoop-mapreduce/lib/*:/usr/hdp/2.5.3.0-37/hadoop-mapreduce/.//*:/var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar::mysql-connector-java.jar:/usr/hdp/2.5.3.0-37/tez/*:/usr/hdp/2.5.3.0-37/tez/lib/*:/usr/hdp/2.5.3.0-37/tez/conf

I can confirm that the file /var/lib/gcs-connector/gcs-connector-latest-hadoop2.jar exists on every node. I also added these three properties to Custom core-site in the Ambari UI (shown as core-site.xml entries below):

- fs.gs.project.id
- fs.gs.impl
- fs.AbstractFileSystem.gs.impl
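For reference, this is roughly what those three entries look like once Ambari writes them into core-site.xml. The two impl values are the standard class names from the GCS connector documentation, and "my-project-id" below is just a placeholder for my actual Google Cloud project id:

<property>
  <name>fs.gs.project.id</name>
  <!-- placeholder: replace with the actual GCP project id -->
  <value>my-project-id</value>
</property>
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>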
Any suggestions on how to make distcp work?
Labels:
- Apache Ambari
- Apache Hadoop