Support Questions

subdrives · ‎01-14-2016

Hello

system: centos 6.6

Hadoop 2.5.0-cdh5.3.0

impalad version 2.1.0

java version "1.8.0_11"

Running a simple query in impala-shell, I get the following error:

   Connected to n1:21000
   Query: select key from auth limit 4
   Query finished, fetching results ...
   Error communicating with impalad: TSocket read 0 bytes
   Could not execute command: select key from auth limit 4

The query is quite simple but the dataset is quite big.

The Impala log file reports the following error:

    ==> impalad.INFO <==
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x00007f7ccbc94d20, pid=19181, tid=140170971166464
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_11-b12) (build 1.8.0_11-b12)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.11-b03 mixed mode linux-amd64 compressed oops)
    # Problematic frame:
    # C [libc.so.6+0x89d20] memcpy+0x3c0
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /var/run/cloudera-scm-agent/process/8600-impala-IMPALAD/hs_err_pid19181.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.sun.com/bugreport/crash.jsp
    #

Does anyone have any idea how to solve this error?

Thanks

Tim Armstrong · ‎01-14-2016

It would be helpful if you had the hs_err_pid*.log file that is mentioned in the error message.

What format is the auth table? Is there anything notable about the data? E.g. large strings.

subdrives · ‎01-14-2016

Hello Tim,

Here is the link for the file:

https://drive.google.com/file/d/0B1h4gv1ES8DeOVJyR1BMZHYwNHM/view?usp=sharing

The table is stored in text format and the result should be a series of integers.

Hope this helps (I'm kind of new to Hadoop).

Thanks

Tim Armstrong · ‎01-14-2016

If the table is a large compressed text file, you're probably running into this issue: https://issues.cloudera.org/browse/IMPALA-2249 . We have a fix in newer versions of Impala to prevent the crash, but compressed text files of > 1GB are still not supported for some compression formats.
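To check whether a given file is over the limit mentioned here, its uncompressed size can be measured by streaming it. The following is only a minimal sketch: it assumes the files are gzip-compressed (the thread does not say which codec is used), it takes "1GB" to mean 2^30 bytes, and the function names are made up for illustration.

```python
import gzip

# Assumption: "1GB" is read as 2**30 bytes; the thread does not define the unit.
LIMIT_BYTES = 1 << 30


def uncompressed_size(path, chunk_size=1 << 20):
    """Return how many bytes the gzip file at `path` expands to.

    The file is read in chunks so that it never has to fit in memory.
    """
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def exceeds_limit(path, limit=LIMIT_BYTES):
    """True if the uncompressed size is not strictly below `limit`."""
    return uncompressed_size(path) >= limit
```

The files would first have to be copied out of HDFS to a local path for this to work as written. The size is counted by decompressing rather than read from the gzip trailer, because the trailer stores the length modulo 2^32 and is unreliable for very large files.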

subdrives · ‎01-18-2016

Hello Tim,

Thanks.

We've actually confirmed that the datasets are compressed text files.

What would you recommend? Converting the text datasets to Parquet? Is this possible?

Again thanks for all your help.

Best regards,

Pedro Silva

Tim Armstrong · ‎01-18-2016

If you can switch to Parquet, that's probably the best solution: it's generally the most performant file format for reading and produces the smallest file sizes. If for some reason you need to stick with text, the uncompressed data size needs to be < 1GB per file.
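For the case where the data has to stay as text, one way to get under the per-file limit is to re-split each large file into smaller compressed parts. The sketch below is an illustration under assumptions, not a recommended procedure from this thread: it assumes gzip compression, newline-delimited rows, and files that are available on a local path; the function name, the part naming scheme, and the 900 MiB default are all hypothetical choices.

```python
import gzip


def split_gzip_text(src_path, dst_prefix, max_bytes=900 * 1024 * 1024):
    """Split a gzip text file into numbered gzip parts on line boundaries.

    Each part holds at most `max_bytes` of uncompressed data, unless a
    single line is itself longer than `max_bytes`, in which case that line
    gets a part of its own. Returns the list of part paths in order.
    """
    parts = []
    out = None
    written = 0
    try:
        with gzip.open(src_path, "rb") as src:
            for line in src:
                # Start a new part for the first line, or when adding this
                # line would push the current part over the budget.
                if out is None or (written > 0 and written + len(line) > max_bytes):
                    if out is not None:
                        out.close()
                    part_path = "%s%05d.gz" % (dst_prefix, len(parts))
                    out = gzip.open(part_path, "wb")
                    parts.append(part_path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

Called as `split_gzip_text("auth.txt.gz", "auth_part_")`, this would write `auth_part_00000.gz`, `auth_part_00001.gz`, and so on (the input file name here is hypothetical). The resulting parts would then replace the original file in the table's data directory. Splitting on line boundaries keeps every row intact within a single part.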

impala-shell returns impalad: TSocket read 0 bytes