Member since 07-29-2015 | 535 Posts | 140 Kudos Received | 103 Solutions
01-20-2016
11:50 AM
You can create a 1-row dummy table like this:

    select 1 id, 'a' d from (select 1) dual where 1 = 1

You also have to rewrite the query to avoid an uncorrelated NOT EXISTS. You can do something like:

    select 1 id, 'a' d from (select 1) dual
    where (select count(*) from employee where empid > 20000) = 0

Computing the full count might be expensive, so you could add a limit so that the subquery stops at the first matching row:

    select 1 id, 'a' d from (select 1) dual
    where (select count(*) from (select empid from employee where empid > 20000 limit 1) emp) = 0
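To sanity-check that the count-based predicate behaves like NOT EXISTS, here is a minimal sketch that runs the limited variant against an in-memory SQLite database. SQLite is only a stand-in used for illustration, not Impala, and the single-column `employee` table and the `dummy_row_if_no_match` helper are assumptions made for this example.

```python
import sqlite3


def dummy_row_if_no_match(conn, threshold):
    """Return the 1-row dummy result only when no employee has empid > threshold."""
    query = """
        select 1 id, 'a' d
        from (select 1) dual
        where (select count(*)
               from (select empid from employee
                     where empid > ? limit 1) emp) = 0
    """
    return conn.execute(query, (threshold,)).fetchall()


# Hypothetical test data: SQLite stands in for Impala here.
conn = sqlite3.connect(":memory:")
conn.execute("create table employee (empid integer)")
conn.executemany("insert into employee values (?)", [(100,), (15000,)])

# No empid above 20000 yet, so the dummy row is returned.
no_match = dummy_row_if_no_match(conn, 20000)

# After adding a matching employee, the predicate filters the dummy row out.
conn.execute("insert into employee values (25000)")
has_match = dummy_row_if_no_match(conn, 20000)
```

The inner `limit 1` means the count can only be 0 or 1, which is all the outer comparison needs.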
01-18-2016
09:27 AM
1 Kudo
If you can switch to Parquet, that's probably the best solution: it's generally the most performant file format for reading and produces the smallest file sizes. If for some reason you need to stick with text, the uncompressed data size needs to be less than 1GB per file.
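One way to check a file against that limit is to stream-decompress it and count the bytes. This is a rough sketch under several assumptions: the file is gzip-compressed (the post does not say which compression formats are affected), it is readable from the local filesystem, and "1GB" is taken to mean 2**30 bytes. The function names are made up for the example.

```python
import gzip

ONE_GB = 1024 ** 3  # assumption: "1GB" interpreted as 2**30 bytes


def uncompressed_size(path, chunk_size=1024 * 1024):
    """Stream-decompress a gzip file and return its uncompressed size in bytes."""
    total = 0
    with gzip.open(path, "rb") as f:
        # Read in fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            total += len(chunk)
    return total


def exceeds_limit(path, limit=ONE_GB):
    """True if the uncompressed data is at or above the limit."""
    return uncompressed_size(path) >= limit
```

Files that exceed the limit would need to be split into smaller pieces before compression, or converted to Parquet as suggested above.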
01-14-2016
02:42 PM
1 Kudo
If the table is a large compressed text file, you're probably running into this issue: https://issues.cloudera.org/browse/IMPALA-2249 . Newer versions of Impala have a fix that prevents the crash, but we still don't support compressed text files larger than 1GB for some compression formats.
01-14-2016
10:29 AM
It would be helpful if you had the hs_err_pid*.log file that is mentioned in the error message. What format is the auth table? Is there anything notable about the data, e.g. large strings?
12-27-2015
07:23 PM
The advice in this thread is out of date: memory usage for joins and aggregations has been improved a lot in CDH5.5. Your issue is something different since the query doesn't have a join or group by (aggregation) in it. The first step to understand this better is to look at the impalad logs: there is usually some information in there about why the memory limit was exceeded and what operators were consuming memory.
12-18-2015
09:17 AM
1 Kudo
I misread your question and didn't realise you wanted the per-host peak. PerHostPeakMemUsage gives you exactly what you want.
12-18-2015
09:16 AM
You can find that information in the runtime profile. There are various ways to get it, e.g. from impala-shell you can run profile; after the query. The PerHostPeakMemUsage counter will tell you the peak memory usage for each Impala instance executing the query. I think getting the numbers for each host and summing them gives you roughly what you want. 200 joins sounds like an interesting query. Let us know how it goes.
12-02-2015
10:30 AM
It looks like your Impala process memory limit is set to 1GB. Queries can't use more than the process memory limit, even if the query memory limit is set higher. To work around this, you or your administrator need to restart Impala with a higher process memory limit. 1GB is very low and you will run out of memory on many queries.

The process memory limit is set when Impala is started, with the -mem_limit option to impalad. The default is 80% of the machine's physical memory. The valid options are described in impalad --help:

    -mem_limit (Process memory limit specified as number of bytes ('<int>[bB]?'), megabytes ('<float>[mM]'), gigabytes ('<float>[gG]'), or percentage of the physical memory ('<int>%'). Defaults to bytes if no unit is given) type: string default: "80%"
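To make the accepted value formats concrete, here is a small sketch that interprets a -mem_limit string according to the help text quoted above. It is not Impala's own parser: the function name `parse_mem_limit` is made up, the 1024-based multipliers for megabytes and gigabytes are an assumption, and any special values the real flag may accept are not handled.

```python
import re


def parse_mem_limit(spec: str, physical_mem_bytes: int) -> int:
    """Convert a -mem_limit style string into a byte count.

    Follows the grammar from the help text: '<int>[bB]?', '<float>[mM]',
    '<float>[gG]' or '<int>%'. Assumes 1 MB = 1024**2 and 1 GB = 1024**3.
    """
    spec = spec.strip()

    # Plain integer, optionally suffixed with b/B: bytes.
    m = re.fullmatch(r"(\d+)[bB]?", spec)
    if m:
        return int(m.group(1))

    # Float suffixed with m/M: megabytes.
    m = re.fullmatch(r"(\d+(?:\.\d+)?)[mM]", spec)
    if m:
        return int(float(m.group(1)) * 1024 ** 2)

    # Float suffixed with g/G: gigabytes.
    m = re.fullmatch(r"(\d+(?:\.\d+)?)[gG]", spec)
    if m:
        return int(float(m.group(1)) * 1024 ** 3)

    # Integer suffixed with %: percentage of physical memory.
    m = re.fullmatch(r"(\d+)%", spec)
    if m:
        return physical_mem_bytes * int(m.group(1)) // 100

    raise ValueError(f"unrecognised memory limit: {spec!r}")
```

For example, under these assumptions `parse_mem_limit("1g", total)` gives 1073741824 bytes regardless of `total`, while `parse_mem_limit("80%", total)` scales with the machine's physical memory, which is why the default adapts to the host.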
11-06-2015
09:06 AM
1 Kudo
The Impala documentation lists the supported file formats: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_file_formats.html
11-05-2015
10:46 PM
1 Kudo
The table uses the ORCFile format, which we don't support in Impala. We recommend using Parquet, which is supported by Hive, MapReduce, Impala and many other systems.