Hi,
I'm trying to copy a directory from Hadoop (HDFS) to GCP (Google Cloud Storage) with distcp:
source='hdfs://Prod17HA/user/hive/warehouse/dl_tables.db/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23'
target='gs://eca02696p9u987tcbb55e477t94r3599baaf2469d4a99a19f78t5eebc4/testing/sc2_scam/'
export email="****"
export key_id="***"
export key="****"
hadoop distcp \
  -libjars /usr/hdp/current/hadoop-client/gcs-connector-hadoop2-latest.jar \
  -Dmapred.job.queue.name=test \
  -Dfs.gs.impl="com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem" \
  -Dfs.gs.auth.service.account.email="$email" \
  -Dfs.gs.auth.service.account.private.key.id="$key_id" \
  -Dfs.gs.auth.service.account.private.key="$key" \
  -m $mapper "$source"/* "$target"
Error: java.lang.IllegalArgumentException: Invalid bucket name (eca02696p9u987tcbb55e477t94r3599baaf2469d4a99a19f78t5eebc4) or object name (testing/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23/000000_0)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.LegacyPathCodec.getPath(LegacyPathCodec.java:96)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.getGcsPath(GoogleHadoopFileSystem.java:172)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1044)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:236)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.net.URISyntaxException: Illegal character in path at index 109: gs://eca02696p9u987tcbb55e477t94r3599baaf2469d4a99a19f78t5eebc4/testing/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23/000000_0
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3105)
at java.net.URI$Parser.parse(URI.java:3053)
at java.net.URI.<init>(URI.java:588)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.LegacyPathCodec.getPath(LegacyPathCodec.java:91)
... 12 more
It seems the space in the directory (partition) name is causing the issue.
I want to keep the same name, i.e. '2020-11-25 23%3A25%3A23', in the target.
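As a quick sanity check on that guess, a sketch like the following locates the first space in the failing object name. `first_space_index` is a hypothetical helper name, not part of Hadoop or the GCS connector; it only inspects the string and does not call the connector, so it shows where the space sits rather than proving how the connector parses the path.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the zero-based index of the first space
# in the given string, or -1 if the string contains no space.
first_space_index() {
  local s="$1"
  # Strip the longest suffix that starts with a space,
  # leaving everything before the first space.
  local prefix="${s%% *}"
  if [ "$prefix" = "$s" ]; then
    echo -1
  else
    echo "${#prefix}"
  fi
}

# Object name copied from the error message above.
object_name='testing/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23/000000_0'
first_space_index "$object_name"
```

A result other than -1 means the name contains a literal space, which is consistent with the "Illegal character in path" message in the trace.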
Could someone please help?
Thank you.