Member since: 07-29-2015
Posts: 535
Kudos Received: 140
Solutions: 103

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5811 | 12-18-2020 01:46 PM |
| | 3740 | 12-16-2020 12:11 PM |
| | 2653 | 12-07-2020 01:47 PM |
| | 1896 | 12-07-2020 09:21 AM |
| | 1228 | 10-14-2020 11:15 AM |
01-20-2016
11:50 AM
You can create a 1-row dummy table like this:

```sql
select 1 id, 'a' d from (select 1) dual where 1 = 1
```

You also have to rewrite the query to avoid an uncorrelated NOT EXISTS. You can do something like:

```sql
select 1 id, 'a' d
from (select 1) dual
where (select count(*) from employee where empid > 20000) = 0
```

Computing the count might be expensive, so you could add a limit:

```sql
select 1 id, 'a' d
from (select 1) dual
where (select count(*)
       from (select id from employee where empid > 20000 limit 1) emp) = 0
```
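For context, a minimal sketch of the kind of uncorrelated NOT EXISTS query this rewrite replaces (the employee table and empid threshold come from the examples above; the surrounding query shape is an assumption):

```sql
-- Hypothetical original form: an uncorrelated NOT EXISTS, which Impala
-- can't plan directly, hence the count(*) rewrite above.
select 1 id, 'a' d
from (select 1) dual
where not exists (select 1 from employee where empid > 20000)
```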
01-18-2016
09:27 AM
1 Kudo
If you can switch to Parquet, that's probably the best solution: it's generally the most performant file format for reading and produces the smallest file sizes. If for some reason you need to stick with text, the uncompressed data size needs to be < 1GB per file.
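If you do switch, a minimal sketch of converting with CTAS in Impala (table names are hypothetical):

```sql
-- Create a Parquet copy of the existing text-format table.
create table my_parquet_table stored as parquet
as select * from my_text_table;
```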
01-14-2016
02:42 PM
1 Kudo
If the table is backed by a large compressed text file, you're probably running into this issue: https://issues.cloudera.org/browse/IMPALA-2249 . Newer versions of Impala include a fix that prevents the crash, but compressed text files larger than 1GB remain unsupported for some compression formats.
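To check whether any single file crosses that threshold, a sketch assuming Impala 2.2 or later, where SHOW FILES is available (the table name is hypothetical):

```sql
-- Lists each data file backing the table along with its size.
show files in my_table;
```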
01-14-2016
10:29 AM
It would be helpful if you had the hs_err_pid*.log file mentioned in the error message. What format is the auth table? Is there anything notable about the data, e.g. large strings?
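To answer the format question, one quick way to check it (using the auth table named in your post):

```sql
-- The InputFormat, OutputFormat, and SerDe rows reveal the table's file format.
describe formatted auth;
```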
12-27-2015
07:23 PM
The advice in this thread is out of date: memory usage for joins and aggregations improved significantly in CDH 5.5. Your issue is something different, since the query doesn't contain a join or a group by (aggregation). The first step in understanding it better is to look at the impalad logs: there is usually information there about why the memory limit was exceeded and which operators were consuming memory.
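As a starting point, a sketch of searching for the relevant messages (the log path is an assumption for a default install; adjust it for your cluster, or browse the logs through Cloudera Manager):

```
# Look for memory-limit errors in the impalad INFO log (path may differ).
grep -i "memory limit exceeded" /var/log/impalad/impalad.INFO
```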
12-18-2015
09:17 AM
1 Kudo
I misread your question and didn't realise you wanted the per-host peak; PerHostPeakMemUsage gives you exactly what you want.
12-18-2015
09:16 AM
You can find that information in the runtime profile. There are various ways to get it; e.g. from impala-shell you can run "profile;" after the query. The PerHostPeakMemUsage counter will tell you the peak memory usage for each Impala instance executing the query. I think getting the numbers for each host and summing them gives you roughly what you want. 200 joins sounds like an interesting query - let us know how it goes.
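Concretely, a minimal sketch from impala-shell (the query itself is a placeholder):

```sql
-- Run the query, then print the runtime profile of the most recent query;
-- search the output for the PerHostPeakMemUsage counter, reported per impalad.
select count(*) from my_table;
profile;
```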
12-02-2015
10:30 AM
It looks like your Impala process memory limit is set to 1GB. Queries can't use more than the process memory limit, even if the query memory limit is set higher. To work around this, you or your administrator need to restart Impala with a higher process memory limit; 1GB is very low, and you will run out of memory on many queries.

The process memory limit is set when Impala is started, via the -mem_limit option to impalad. The default is 80% of the machine's physical memory. The valid values are described in impalad --help:

```
-mem_limit (Process memory limit specified as number of bytes ('<int>[bB]?'),
  megabytes ('<float>[mM]'), gigabytes ('<float>[gG]'), or percentage of the
  physical memory ('<int>%'). Defaults to bytes if no unit is given)
  type: string default: "80%"
```
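For example, a minimal sketch of restarting the daemon with a larger limit ("8G" is a hypothetical value; if you deploy with Cloudera Manager, set the equivalent Impala daemon memory configuration there instead of passing the flag by hand):

```
# -mem_limit accepts bytes, M, G, or a percentage; other startup flags omitted.
impalad -mem_limit=8G
```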
11-06-2015
09:06 AM
1 Kudo
The Impala documentation lists the supported file formats: http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/impala_file_formats.html
11-05-2015
10:46 PM
1 Kudo
The table uses the ORC file format, which Impala doesn't support. We recommend Parquet, which is supported by Hive, MapReduce, Impala, and many other systems.
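If the data currently lives in an ORC table, a sketch of converting it from the Hive side, since Impala can't read the ORC source (table names are hypothetical):

```sql
-- Run in Hive: create a Parquet copy of the ORC table that Impala can query.
create table my_table_parquet stored as parquet
as select * from my_orc_table;
```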