
Distcp failing while creating target directory with space

New Contributor
Hi,

I'm trying to copy a directory from Hadoop to GCP using DistCp.
 
source='hdfs://Prod17HA/user/hive/warehouse/dl_tables.db/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23'
target='gs://eca02696p9u987tcbb55e477t94r3599baaf2469d4a99a19f78t5eebc4/testing/sc2_scam/'
export email="****"
export key_id="***"
export key="****"
hadoop distcp -libjars /usr/hdp/current/hadoop-client/gcs-connector-hadoop2-latest.jar -Dmapred.job.queue.name=test -Dfs.gs.impl="com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem" -Dfs.gs.auth.service.account.email="$email" -Dfs.gs.auth.service.account.private.key.id="$key_id" -Dfs.gs.auth.service.account.private.key="$key" -m $mapper "$source"/* $target
 
Error: java.lang.IllegalArgumentException: Invalid bucket name (eca02696p9u987tcbb55e477t94r3599baaf2469d4a99a19f78t5eebc4) or object name (testing/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23/000000_0)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.LegacyPathCodec.getPath(LegacyPathCodec.java:96)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.getGcsPath(GoogleHadoopFileSystem.java:172)
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.getFileStatus(GoogleHadoopFileSystemBase.java:1044)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:236)
at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.net.URISyntaxException: Illegal character in path at index 109: gs://eca02696p9u987tcbb55e477t94r3599baaf2469d4a99a19f78t5eebc4/testing/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23/000000_0
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.checkChars(URI.java:3021)
at java.net.URI$Parser.parseHierarchical(URI.java:3105)
at java.net.URI$Parser.parse(URI.java:3053)
at java.net.URI.<init>(URI.java:588)
at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.LegacyPathCodec.getPath(LegacyPathCodec.java:91)
... 12 more
 
It seems the space in the directory/partition name is causing the issue.
I would like to retain the same format, i.e. '2020-11-25 23%3A25%3A23', in the target.

Could someone please help me?
 
Thank you
1 REPLY

Master Guru

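@SangramM The issue is with the target path you are copying the data to, not with the copy itself.

Look at the exception: the target path is rejected because it contains a character that is not legal in a URI, most likely the space in the partition directory name 'src_rcv_ts=2020-11-25 23%3A25%3A23'.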

Error: java.lang.IllegalArgumentException: Invalid bucket name (eca02696p9u987tcbb55e477t94r3599baaf2469d4a99a19f78t5eebc4) or object name (testing/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23/000000_0)
Caused by: java.net.URISyntaxException: Illegal character in path at index 109: gs://eca02696p9u987tcbb55e477t94r3599baaf2469d4a99a19f78t5eebc4/testing/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23/000000_0

So I would suggest correcting the target path name so that it contains no illegal characters.

This general thread on handling URISyntaxException might help: https://stackoverflow.com/questions/749709/how-to-deal-with-the-urisyntaxexception
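
To illustrate why the path is rejected, here is a minimal sketch using only java.net.URI, which is the class that raises the exception in the stack trace. The bucket name "example-bucket" and the class and method names are placeholders made up for this example; it is not the connector's actual code, only a reproduction of the parsing behaviour it appears to rely on.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriSpaceDemo {

    // Returns true if the single-argument URI constructor accepts the string as-is.
    // A raw space is not a legal URI character, so such strings are rejected.
    public static boolean parses(String raw) {
        try {
            new URI(raw);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds a URI from separate components. The multi-argument constructor
    // percent-encodes illegal characters such as the space. It also always
    // encodes '%' itself, so an existing "%3A" turns into "%253A".
    public static String quoted(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "example-bucket" is a hypothetical bucket name used for illustration.
        String path = "/testing/sc2_scam/src_rcv_ts=2020-11-25 23%3A25%3A23/000000_0";
        System.out.println(parses("gs://example-bucket" + path));
        System.out.println(quoted("gs", "example-bucket", path));
    }
}
```

Note that the encoded form double-encodes the existing '%', so this only demonstrates the URI rules. Whether the connector would then store the object under the name you want is something you would need to test on your side.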


Cheers!