impala-shell returns impalad: TSocket read 0 bytes

New Contributor

Hello

 

System: CentOS 6.6

Hadoop 2.5.0-cdh5.3.0

impalad version 2.1.0

java version "1.8.0_11"

 

Running a simple query in impala-shell, I get the following error:

   Connected to n1:21000
   Query: select key from auth limit 4
   Query finished, fetching results ...
   Error communicating with impalad: TSocket read 0 bytes
   Could not execute command: select key from auth limit 4

 

The query itself is simple, but the dataset is quite big.

The Impala log file reports the following error:

    ==> impalad.INFO <==
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007f7ccbc94d20, pid=19181, tid=140170971166464
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_11-b12) (build 1.8.0_11-b12)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.11-b03 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C  [libc.so.6+0x89d20]  memcpy+0x3c0
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /var/run/cloudera-scm-agent/process/8600-impala-IMPALAD/hs_err_pid19181.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.sun.com/bugreport/crash.jsp
    #
   

Does anyone have any idea how to solve this error?

 

Thanks

 

5 REPLIES


It would be helpful if you could share the hs_err_pid*.log file that is mentioned in the error message.

 

What format is the auth table? Is there anything notable about the data, e.g., large strings?
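
For example, something along these lines in impala-shell should show the file format and storage details (using the auth table name from the query above):

    -- Show the table definition, including the file format
    SHOW CREATE TABLE auth;

    -- Or the full storage details: SerDe, input format, HDFS location
    DESCRIBE FORMATTED auth;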

New Contributor

Hello Tim,

 

Here is the link for the file:

https://drive.google.com/file/d/0B1h4gv1ES8DeOVJyR1BMZHYwNHM/view?usp=sharing

 

The table is stored in text format, and the result should be a series of integers.

 

Hope this helps (I'm kind of new to Hadoop).

 

Thanks


If the table is a large compressed text file, you're probably running into this issue: https://issues.cloudera.org/browse/IMPALA-2249 . Newer versions of Impala have a fix to prevent the crash, but compressed text files larger than 1 GB are still not supported for some compression formats.
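
One way to check whether any single file is over that limit is to look at the per-file sizes behind the table. A sketch (note that SHOW FILES was only added in a later Impala release, so on 2.1.0 you would instead look at the table's HDFS directory reported by DESCRIBE FORMATTED):

    -- List the data files behind the table, with their sizes
    -- (requires a newer Impala release than the 2.1.0 in this thread)
    SHOW FILES IN auth;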

New Contributor

Hello Tim,

 

Thanks.

We've actually confirmed that the datasets are compressed text files.

 

What would you recommend? Converting the text datasets to Parquet? Is that possible?

 

Again, thanks for all your help.

 

Best regards,

Pedro Silva


If you can switch to Parquet, that's probably the best solution: it's generally the most performant file format for reading and produces the smallest file sizes. If for some reason you need to stick with text, the uncompressed data size needs to be < 1GB per file.
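
For example, a conversion along these lines (just a sketch, reusing the auth table and key column from this thread; the auth_parquet name is made up) creates a Parquet copy that Impala can query directly:

    -- Write a Parquet copy of the text table in one step (CTAS)
    CREATE TABLE auth_parquet STORED AS PARQUET AS SELECT * FROM auth;

    -- Sanity-check the new table before dropping or renaming the old one
    SELECT key FROM auth_parquet LIMIT 4;

Note that the CTAS still has to scan the original text files, so if those are the oversized compressed files it may hit the same limit; in that case the Parquet copy may need to be produced from smaller or uncompressed source files (or through Hive) first.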