12-07-2017
07:13 PM
Thank you, the -filters option worked like a charm.
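For anyone who finds this later: the -filters option takes a file of patterns, one per line, and paths that match a pattern are left out of the copy. Below is a minimal sketch of how such a pattern can be checked against the file name from my question. It assumes the patterns are Java regular expressions that must match the whole path string; the class name, the shouldCopy helper, and the pattern itself are illustrative and are not taken from the distcp source.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FilterSketch {
    // Hypothetical helper: returns false when any exclude pattern
    // matches the entire path string, true otherwise.
    public static boolean shouldCopy(String path, List<Pattern> filters) {
        for (Pattern filter : filters) {
            if (filter.matcher(path).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative pattern: exclude any path ending in "Email Address.json".
        List<Pattern> filters = Arrays.asList(Pattern.compile(".*Email Address\\.json"));

        System.out.println(shouldCopy("/path/Email Address.json", filters));
        System.out.println(shouldCopy("/path/other.json", filters));
    }
}
```

With that assumption, the filters file for this case would hold a single line such as .*Email Address\.json (one backslash in the file; the doubled backslash above is only Java string escaping).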
11-29-2017
01:17 AM
Hello,

I'm running HDP 2.6 and attempting to use distcp to copy from a much older Hadoop cluster into HDP, so I'm running the distcp utility on the target cluster and accessing the source cluster via hftp://<host>:<port>/<path>. For example:

hadoop distcp -i -log /distcp/logpath hftp://oldhadoop.hostname:50070/path/ /newpath

In the source path there is a file with a space in its name, 'Email Address.json', and while distcp is building the copy listing, it appears to fail to decode the name properly (stack trace is below).

17/11/28 16:17:45 INFO tools.DistCp: DistCp job log path: /distcp/logpath
Exception in thread "pool-5-thread-1" java.lang.AssertionError: Failed to decode URI: /path/Email Address.json
at org.apache.hadoop.util.ServletUtil.decodePath(ServletUtil.java:128)
at org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.startElement(HftpFileSystem.java:446)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:465)
at org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.listStatus(HftpFileSystem.java:484)
at org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.listStatus(HftpFileSystem.java:492)
at org.apache.hadoop.hdfs.web.HftpFileSystem.listStatus(HftpFileSystem.java:499)
at org.apache.hadoop.tools.SimpleCopyListing$FileStatusProcessor.getFileStatus(SimpleCopyListing.java:535)
at org.apache.hadoop.tools.SimpleCopyListing$FileStatusProcessor.processItem(SimpleCopyListing.java:576)
at org.apache.hadoop.tools.util.ProducerConsumer$Worker.run(ProducerConsumer.java:190)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I was hoping that the -i (ignore) option would ignore errors while building the file listing as well as during the copy phase, but that doesn't appear to be the case. Is there any way to exclude certain file names from the file listing, or some other way to work around this issue?
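My guess from the trace is that the listing returns the name with a raw space, which is not a legal character in a URI path. Here is a small sketch that shows the behaviour using only java.net.URI. It assumes the decode step parses the path as a URI, which I have not confirmed against the Hadoop source for this version; the class and the tryDecode helper are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriDecodeSketch {
    // Hypothetical helper: returns the decoded path component,
    // or null when the input is not a syntactically valid URI.
    public static String tryDecode(String path) {
        try {
            return new URI(path).getPath();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Raw space: URI parsing rejects it, so tryDecode returns null.
        System.out.println(tryDecode("/path/Email Address.json"));

        // Percent-encoded space: parses, and getPath() decodes %20 to a space.
        System.out.println(tryDecode("/path/Email%20Address.json"));
    }
}
```

If that assumption holds, the name would have to arrive percent-encoded (Email%20Address.json) for the decode to succeed.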