HDFS command hdfs dfs -ls throws fatal internal error java.lang.ArrayIndexOutOfBoundsException: 1

Rising Star

Hello All,

I have a .har file on HDFS and I am trying to list the files archived in it, but I get the error below on a CDH 5.9.2 cluster.

[user1@usnbka700p ~]$ hdfs dfs -ls har:///user/user1/HDFSArchival/Output1/Archive-13-10-2017-03-10.har
-ls: Fatal internal error
java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.hadoop.fs.HarFileSystem$HarStatus.<init>(HarFileSystem.java:597)
at org.apache.hadoop.fs.HarFileSystem$HarMetaData.parseMetaData(HarFileSystem.java:1201)
at org.apache.hadoop.fs.HarFileSystem$HarMetaData.access$000(HarFileSystem.java:1098)
at org.apache.hadoop.fs.HarFileSystem.initialize(HarFileSystem.java:166)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2711)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:382)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:325)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:235)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:218)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:102)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:372)

However, I can see the size of the .har file:

hdfs dfs -du -s -h /user/user1/HDFSArchival/Output1/Archive-13-10-2017-03-10.har
16.5 G 49.5 G /user/user1/HDFSArchival/Output1/Archive-13-10-2017-03-10.har

Also, hdfs dfs -ls works for other archives, as the output below shows.

hdfs dfs -ls har:///user/user1/HDFSArchival/Output1/Archive-12-10-2017-07-10.har
Found 1 items
drwxr-xr-x - user1 user1 0 2017-10-12 07:12 har:///user/user1/HDFSArchival/Output1/Archive-12-10-2017-07-10.har/ArchivalTemp

Could you please suggest how to fix this?

Thanks,

Priya

6 REPLIES

Expert Contributor

It looks like your har file is malformed. Inside the har file there is an index file called _index.

Each line of the index file is expected to contain a <filename> <dir> pair, and in your file the latter part of a line seems to have been lost.
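One way to locate the damaged line is to scan a local copy of _index for lines that have fewer than two space-separated fields. This is only a sketch: the function name find_short_index_lines is made up for illustration, and the "at least two fields" rule is an assumption based on the pair format described above and on the exception reporting index 1.

```shell
# Print "<line number>: <content>" for every line of a local _index copy
# that has fewer than two whitespace-separated fields.
# Assumption: a healthy line carries at least a name and a dir/file type.
# $1 = path to the local copy of _index
find_short_index_lines() {
  awk 'NF < 2 { print NR ": " $0 }' "$1"
}
```

For example, once _index has been downloaded from the archive directory into the current directory, running find_short_index_lines ./_index would print the suspect lines, if there are any.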

Rising Star
Hi weichiu,

Thanks for the reply. Does that mean I need to archive the files into a har file again? I also found out that there are files with the .tsb extension in two of the folders that I am archiving.

Please suggest.

Thanks,
Priya

Expert Contributor
If you still have the source file, try to archive it again and see if it still produces the same error. If so, I'd be interested in knowing what's inside that _index file.

Rising Star
I don't have the source file, though. Do I need to restore it again?

Thanks,
Priya

Expert Contributor

I reproduced the error by intentionally corrupting the _index file.

If by "restore" you mean unarchiving the har file with the hdfs dfs -cp command, I found that it fails with the same ArrayIndexOutOfBoundsException (AIOOBE), so you won't be able to unarchive it.

Your best bet is to download the _index file, repair it manually, upload it in place of the corrupt one, and see how it goes.

Meanwhile, I filed Apache JIRA HADOOP-14950 to handle the AIOOBE more gracefully, but that won't fix your corrupt _index file.
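The download, repair, and replace steps could look roughly like the sketch below. It assumes the archive is addressed by its plain HDFS path (the .har directory itself, not the har:// URI). The function names fetch_index and replace_index and the HDFS variable are hypothetical helpers, not part of Hadoop. The manual repair of the local copy happens between the two calls.

```shell
# Command used to talk to HDFS; overridable so the helpers can be dry-run.
HDFS="${HDFS:-hdfs}"

# Download _index from the archive directory and keep an untouched local copy.
# $1 = plain HDFS path of the .har directory, $2 = local working directory
fetch_index() {
  mkdir -p "$2" &&
  "$HDFS" dfs -get "$1/_index" "$2/_index" &&
  cp "$2/_index" "$2/_index.orig"
}

# Upload the repaired local copy over the corrupt _index (-f overwrites).
# $1 = plain HDFS path of the .har directory, $2 = local working directory
replace_index() {
  "$HDFS" dfs -put -f "$2/_index" "$1/_index"
}
```

Usage would be along the lines of fetch_index /user/user1/HDFSArchival/Output1/Archive-13-10-2017-03-10.har ./har-fix, then editing ./har-fix/_index, then replace_index with the same two arguments, and finally rerunning the original hdfs dfs -ls command against the har:// URI to check the result. The _index.orig copy is kept so the original content can be put back if the repair makes things worse.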

Rising Star
Thanks weichiu for the help.