Hi, I think I have found the problem, or at least how to avoid it. It seems that the source process that generates the files we upload has been changed, so that the last line of the generated CSV files no longer ends with a final "\n" character (0x0A). It seems that Impala doesn't like this for TEXT CSV LZO files. Editing the files and adding a final "\n" to the last line makes everything work again. This is strange to me, as I would expect a "\n" on the last line to mean there is another (empty) line. The team removed it because it caused problems in another part of the system, and it seems Impala doesn't like that. Now that I know what to look for, I have found the registered issue: https://issues.apache.org/jira/browse/IMPALA-1476 But it says it is fixed in version 2.2, and I'm using version v2.2.0-cdh5 and it still happens. So, is it really fixed in the final version? Do I need to open an issue for this?
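For anyone hitting the same thing, the workaround described above (adding the missing final "\n") could be sketched roughly like this. It assumes the check runs on the uncompressed CSV file before it is LZO-compressed and uploaded; the function name `ensure_trailing_newline` is made up for illustration and is not part of Impala or any library.

```python
import os


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline byte (0x0A) if the file's last byte is not one.

    Returns True if a newline was appended, False if the file was left as-is.
    Assumption: `path` points to an uncompressed CSV file.
    """
    if os.path.getsize(path) == 0:
        return False  # empty file: there is no last line to terminate

    with open(path, "rb+") as f:
        # Look only at the last byte instead of reading the whole file.
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False
        # Reposition explicitly at end of file before switching to writing.
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True
```

Calling it a second time on the same file is a no-op, so it should be safe to run on every generated file regardless of whether the source process already wrote the final newline.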
Hi, thanks for your quick response. Well, about upgrading: it is not possible at this time. This is an Oracle BDA at a client site with a CDH distribution, and upgrading would mean upgrading from the Oracle perspective: firmware, OS, Hadoop, etc., like the last time we did it, and this cluster supports a 24x7 service. About timestamp ranges: if you look at both of my queries, neither of them uses any column in the select clause other than the partition column. If both queries validate all columns, then both should fail on an out-of-range timestamp value, not only the one with the "GROUP BY" clause. I've been doing some testing, and it seems that, for whatever reason, Impala now doesn't like the "GROUP BY" clause, and many simple queries that use it fail. It fails with all the partitions I have tested; even the ones that worked before last Friday now fail. And as I pointed out in my previous post, we have another Oracle BDA with exactly the same CDH distribution and Impala version, where I copied the data of this partition, and in that cluster it works. So my impression is that, for some reason, either we have reached some limit or the metadata is corrupted. By some limit, I mean maybe some combination of the number of schemas, tables and partitions defined, which would be why everything started to fail since last Friday. By metadata corruption, I mean maybe table definitions, partition definitions or stats for those tables/partitions got corrupted, which would be why it now fails. I really don't know the reason. It is very strange.
Hi, thanks for your reply. I don't think this is a case of a corrupt LZO file. I have had other cases of corrupt LZO files, and they show up clearly in the log files; in those cases, any query against the partition that contains the corrupt LZO file fails. In this case you can see that a simple select count(1) works while the same query with a group by fails. There are no stats for this partition, so for the first query to work it needs to read the whole content of the several files in the partition. Also, as I pointed out, I copied the files in this partition (it fails in all partitions) to another platform with the same Impala version, and it works there. So I don't think it is a corrupt LZO case. Also, I have dropped and recreated the partition, and I still get the same error.
Hi, thanks for your reply. Trying to attach the file hs_err_pid16825.log always fails or says "The contents of the attachment doesn't match its file type". I have even tried compressing it and attaching the compressed file, but that doesn't work either. So, how can I post the hs_err_pid16825.log file?
Hi all, since last Friday, 2017-07-14, Impala has been crashing on most queries, but particularly on one specific table. Before that date everything worked fine; since then, even some simple queries fail on data stored before that date. Here is an example of a query that works and one that doesn't:
[ods1node06:21000] > select count(1) from subscriber_history_daily where partition_date = 20170701;
Query: select count(1) from subscriber_history_daily where partition_date = 20170701
| count(1) |
| 57266624 |
Fetched 1 row(s) in 13.05s
[ods1node06:21000] > select partition_date, count(1) from subscriber_history_daily where partition_date = 20170701 group by partition_date;
Query: select partition_date, count(1) from subscriber_history_daily where partition_date = 20170701 group by partition_date
[ods1node06:21000] >
As you can see, there is no response to the second one because several Impala daemons crashed. This is what I found in the log of one of those Impala daemons:
#
# A fatal error has been detected by the Java Runtime Environment:
# SIGSEGV (0xb) at pc=0x00007f7dcff4c0c3, pid=16825, tid=140175760123648
# JRE version: Java(TM) SE Runtime Environment (8.0_45-b14) (build 1.8.0_45-b14)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.45-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C 0x00007f7dcff4c0c3
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
# An error report file with more information is saved as:
# If you would like to submit a bug report, please visit:
#
It also fails when trying to compute stats for any of the table's partitions. The table has 1147 partitions with an average of 57 million rows each, about 8 TB of data in total. The table is an LZO-compressed CSV external table. Impala has 60 GB of memory assigned. I also tried another Hadoop platform we have with the same version of Impala: I recreated the table definition and partition there, copied the data and queried it, and it works perfectly. On this Impala it also worked before last Friday. Any pointers on how to find out what happened and why these simple queries now fail? Thanks