Created on 01-14-2016 10:23 AM - edited 09-16-2022 02:57 AM
Hello
System: CentOS 6.6
Hadoop 2.5.0-cdh5.3.0
impalad version 2.1.0
java version "1.8.0_11"
Running a simple query in impala-shell, I get the following error:
Connected to n1:21000
Query: select key from auth limit 4
Query finished, fetching results ...
Error communicating with impalad: TSocket read 0 bytes
Could not execute command: select key from auth limit 4
The query is quite simple but the dataset is quite big.
The Impala log file reports the following error:
==> impalad.INFO <==
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f7ccbc94d20, pid=19181, tid=140170971166464
#
# JRE version: Java(TM) SE Runtime Environment (8.0_11-b12) (build 1.8.0_11-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.11-b03 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C [libc.so.6+0x89d20] memcpy+0x3c0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /var/run/cloudera-scm-agent/process/8600-impala-IMPALAD/hs_err_pid19181.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
Does anyone have any idea how to solve this error?
Thanks
Created 01-14-2016 10:29 AM
It would be helpful if you could share the hs_err_pid*.log file mentioned in the error message.
What format is the auth table? Is there anything notable about the data? E.g. large strings.
Created 01-14-2016 11:09 AM
Hello Tim,
Here is the link for the file:
https://drive.google.com/file/d/0B1h4gv1ES8DeOVJyR1BMZHYwNHM/view?usp=sharing
The table is stored in text format and the result should be a series of integers.
Hope this helps (I'm kind of new to Hadoop).
Thanks
Created 01-14-2016 02:42 PM
If the table is a large compressed text file, you're probably running into this issue: https://issues.cloudera.org/browse/IMPALA-2249 . Newer versions of Impala include a fix that prevents the crash, but for some compression formats we still don't support compressed text files larger than 1GB.
Created 01-18-2016 08:19 AM
Hello Tim,
Thanks.
We've actually confirmed that the datasets are compressed text files.
What would you recommend? Converting the text datasets to Parquet? Is this possible?
Again thanks for all your help.
Best regards,
Pedro Silva
Created 01-18-2016 09:27 AM
If you can switch to Parquet, that's probably the best solution: it's generally the most performant file format for reading and produces the smallest file sizes. If for some reason you need to stick with text, the uncompressed data size needs to be < 1GB per file.
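As a rough sketch of the Parquet conversion (an illustration, not a tested recipe for this cluster): because Impala 2.1.0 crashes while scanning the large compressed text files, one option is to run the conversion in Hive and then query the new table from Impala. This assumes Hive on the cluster can read the auth table and supports STORED AS PARQUET. The table name auth_parquet is hypothetical; auth and key come from the query in the original post.

```sql
-- Run in Hive (assumption: Hive can read the compressed text table `auth`).
-- `auth_parquet` is a hypothetical name for the converted copy.
CREATE TABLE auth_parquet STORED AS PARQUET AS
SELECT * FROM auth;

-- Run in impala-shell: make the table created by Hive visible to Impala,
-- then retry the original query against the Parquet copy.
INVALIDATE METADATA auth_parquet;
SELECT key FROM auth_parquet LIMIT 4;
```

If the query works on the Parquet copy, the original text table can be kept as a backup or dropped once the results have been verified.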