Member since: 12-13-2013
Posts: 39
Kudos Received: 8
Solutions: 1

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 5458 | 10-10-2017 08:05 PM |
06-04-2018 11:57 AM

We recently enabled HDFS caching for two tables to try to speed up a whole class of queries that are very similar, generally following this pattern:

SELECT x, y, z
FROM (
  SELECT x, y, z FROM table1 WHERE blah
  UNION ALL
  SELECT x, y, z FROM table2 WHERE blah
) x
ORDER BY x DESC, y DESC
LIMIT 20001 OFFSET 0

... but we didn't get much runtime improvement. Digging in, it looks like ~80% of the time is spent on CodeGen: 5.25s, of which CompileTime: 1.67s and OptimizationTime: 3.51s (see the profile fragment below for this sample run). With set DISABLE_CODEGEN=true the query goes from ~6 seconds to ~1 second, but the docs state this should not be used generally, so I'm hesitant to add that to actual live production reports, and would rather understand the root cause.

Both tables are Parquet and fully HDFS-cached. Both are wide-ish (253 and 126 columns respectively), but the inner queries project only 20 columns to the outer. CDH 5.13 / Impala 2.10. Happy to send the full profile file by direct mail.

Thanks in advance, -mauricio

78:MERGING-EXCHANGE   1   5s307ms    5s307ms    73  101    0          0         UNPARTITIONED
49:TOP-N              30  341.689us  880.634us  73  101    873.00 KB  39.28 KB
00:UNION              30  240.707us  3.190ms    73  1.61K  8.81 MB    0
...

F35:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=0B mem-reservation=0B
PLAN-ROOT SINK
|  mem-estimate=0B mem-reservation=0B
|
78:MERGING-EXCHANGE [UNPARTITIONED]
|  order by: action_date DESC, action_id ASC
|  limit: 101
|  mem-estimate=0B mem-reservation=0B
|  tuple-ids=47 row-size=398B cardinality=101
|
F34:PLAN FRAGMENT [RANDOM] hosts=18 instances=18
Per-Host Resources: mem-estimate=206.48MB mem-reservation=14.44MB
49:TOP-N [LIMIT=101]
|  order by: action_date DESC, action_id ASC
|  mem-estimate=39.28KB mem-reservation=0B
|  tuple-ids=47 row-size=398B cardinality=101
|
00:UNION
...

>>>>> F34 fragment for a sample node (all very similar):

Hdfs split stats (<volume id>:<# splits>/<split lengths>): 8:1/38.32 MB
Filter 4 arrival: 5s339ms
AverageThreadTokens: 1.00
BloomFilterBytes: 3.0 MiB
InactiveTotalTime: 0ns
PeakMemoryUsage: 27.4 MiB
PeakReservation: 14.4 MiB
PeakUsedReservation: 0 B
PerHostPeakMemUsage: 58.6 MiB
RowsProduced: 1
TotalNetworkReceiveTime: 261.74us
TotalNetworkSendTime: 313.68us
TotalStorageWaitTime: 4.96us
TotalThreadsInvoluntaryContextSwitches: 583
TotalThreadsTotalWallClockTime: 5.37s
TotalThreadsSysTime: 53ms
TotalThreadsUserTime: 5.20s
TotalThreadsVoluntaryContextSwitches: 169
TotalTime: 5.43s
>> Fragment Instance Lifecycle Timings (0ns)
>> DataStreamSender (dst_id=78) (1ms)
>> CodeGen (5.25s)
     CodegenTime: 26ms
     CompileTime: 1.67s   <<<<<<<<<<<<< ????
     InactiveTotalTime: 0ns
     LoadTime: 0ns
     ModuleBitcodeSize: 1.9 MiB
     NumFunctions: 729
     NumInstructions: 35,078
     OptimizationTime: 3.51s   <<<<<<<<<<<<< ????
     PeakMemoryUsage: 17.1 MiB
     PrepareTime: 66ms
     TotalTime: 5.25s
>> SORT_NODE (id=49) (94ms)
>> UNION_NODE (id=0) (93ms)
>> HASH_JOIN_NODE (id=48) (9ms)
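For reference, the A/B test I ran from impala-shell (a sketch only; table and column names are stand-ins, as in the pattern above):

-- comparison only: the docs advise against leaving this on in production
set DISABLE_CODEGEN=true;
-- run the report query: ~1 second
set DISABLE_CODEGEN=false;
-- run the same query again: ~6 seconds, most of it in CodeGen per the profile above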
Labels:
- Apache Impala
05-22-2018 11:05 AM

Can anyone explain what RowBatchQueueGetWaitTime is? I'm looking into a slow-ish query that takes 2 to 3 seconds to do the HDFS scan on most nodes, and I don't see why it should take that long: only 3 or so files per node, a couple of KB each, cached (and confirmed all read from cache). The only thing that looks odd is this metric. Here's a sample relevant profile fragment (about the same for all executors):

>>> HDFS_SCAN_NODE (id=0) (1.96s)
Hdfs split stats (<volume id>:<# splits>/<split lengths>): 18:1/69.15 KB 20:2/142.83 KB
ExecOption: PARQUET Codegen Enabled, Codegen enabled: 3 out of 3
Runtime filters: Not all filters arrived (arrived: [1], missing [0]), waited for 352ms
Hdfs Read Thread Concurrency Bucket: 0:100% 1:0% 2:0% 3:0% 4:0% 5:0% 6:0% 7:0% 8:0% 9:0% 10:0% 11:0% 12:0% 13:0% 14:0% 15:0%
File Formats: PARQUET/SNAPPY:156
AverageHdfsReadThreadConcurrency: 0.00
AverageScannerThreadConcurrency: 1.00
BytesRead: 228.0 KiB
BytesReadDataNodeCache: 228.0 KiB
BytesReadLocal: 228.0 KiB
BytesReadRemoteUnexpected: 0 B
BytesReadShortCircuit: 228.0 KiB
CachedFileHandlesHitCount: 0
CachedFileHandlesMissCount: 159
DecompressionTime: 188.47us
InactiveTotalTime: 0ns
MaxCompressedTextFileLength: 0 B
NumColumns: 52
NumDictFilteredRowGroups: 0
NumDisksAccessed: 0
NumRowGroups: 3
NumScannerThreadsStarted: 1
NumScannersWithNoReads: 0
NumStatsFilteredRowGroups: 0
PeakMemoryUsage: 499.3 KiB
PerReadThreadRawHdfsThroughput: 0 B/s
RemoteScanRanges: 0
RowBatchQueueGetWaitTime: 1.60s
RowBatchQueuePutWaitTime: 0ns
RowsRead: 426
RowsReturned: 2
RowsReturnedRate: 1 per second
ScanRangesComplete: 3
ScannerThreadsInvoluntaryContextSwitches: 8
ScannerThreadsTotalWallClockTime: 1.89s
MaterializeTupleTime(*): 16ms
ScannerThreadsSysTime: 10ms
ScannerThreadsUserTime: 73ms
ScannerThreadsVoluntaryContextSwitches: 393
TotalRawHdfsReadTime(*): 0ns
TotalReadThroughput: 88.4 KiB/s
TotalTime: 1.96s
>>> Filter 0 (1.00 MB) (0ns)
InactiveTotalTime: 0ns
Rows processed: 0
Rows rejected: 0
Rows total: 426
TotalTime: 0ns
>>> Filter 1 (1.00 MB) (0ns)
InactiveTotalTime: 0ns
Rows processed: 426
Rows rejected: 424
Rows total: 426
TotalTime: 0ns

Thanks in advance! -m
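P.S. One thing I plan to try, in case the runtime-filter wait ("waited for 352ms" above) is part of the stall. These are standard Impala query options; the values are just a sketch for comparison, not a recommendation:

-- don't block the scan waiting for late filters
set RUNTIME_FILTER_WAIT_TIME_MS=1;
-- or, for the comparison only, disable runtime filtering entirely
set RUNTIME_FILTER_MODE=OFF;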
Labels:
- Apache Impala
03-27-2018 01:57 PM

Thanks very much Tim for looking up the JIRA. Yikes, it's been open since 2014. As John pointed out there, the column-order info must be in the metastore, since Hive's show create table displays fine, so it seems like this should be a simple change to how Impala reads that info. Upvoted the JIRA.
03-26-2018 07:21 PM

When I create an Impala/Hive table over an HBase table, the columns in Impala appear in alphabetical order instead of as defined. Not a blocker, but really annoying, and it might become an issue down the road. Anyone know what could be happening? We're on CDH 5.13, thanks.

Defined in HBase with:

create 'irdw_sandbox:date_dim', {NAME => 'mcf', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'true', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'PREFIX_TREE', COMPRESSION => 'SNAPPY', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

Defined in Hive with:

CREATE EXTERNAL TABLE irdw_sandbox.hbase_date_dim (
key STRING,
id INT,
sqldate TIMESTAMP,
year INT,
quarter_of_year INT,
month_of_year INT,
week_of_year INT,
day_of_year INT,
day_name STRING,
day_of_week INT,
day_of_month INT,
day_type STRING,
month_name STRING,
week_of_year_mtos INT
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" =
":key,mcf:id,mcf:sqldate,mcf:year,mcf:quarter_of_year,mcf:month_of_year,mcf:week_of_year,mcf:day_of_year,mcf:day_name,mcf:day_of_week,mcf:day_of_month,mcf:day_type,mcf:month_name,mcf:week_of_year_mtos"
)
TBLPROPERTIES("hbase.table.name" = "irdw_sandbox:date_dim")

and a show create table in Hive looks fine:

CREATE EXTERNAL TABLE `hbase_date_dim`(
`key` string COMMENT 'from deserializer',
`id` int COMMENT 'from deserializer',
`sqldate` timestamp COMMENT 'from deserializer',
`year` int COMMENT 'from deserializer',
`quarter_of_year` int COMMENT 'from deserializer',
`month_of_year` int COMMENT 'from deserializer',
`week_of_year` int COMMENT 'from deserializer',
`day_of_year` int COMMENT 'from deserializer',
`day_name` string COMMENT 'from deserializer',
`day_of_week` int COMMENT 'from deserializer',
`day_of_month` int COMMENT 'from deserializer',
`day_type` string COMMENT 'from deserializer',
`month_name` string COMMENT 'from deserializer',
`week_of_year_mtos` int COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'=':key,mcf:id,mcf:sqldate,mcf:year,mcf:quarter_of_year,mcf:month_of_year,mcf:week_of_year,mcf:day_of_year,mcf:day_name,mcf:day_of_week,mcf:day_of_month,mcf:day_type,mcf:month_name,mcf:week_of_year_mtos',
'serialization.format'='1')
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='false',
'hbase.table.name'='irdw_sandbox:date_dim',
'numFiles'='0',
'numRows'='-1',
'rawDataSize'='-1',
'totalSize'='0')

but a show create table in Impala (after an invalidate metadata to recognize the new table) doesn't:

CREATE EXTERNAL TABLE irdw_sandbox.hbase_date_dim (
key STRING,
day_name STRING,
day_of_month INT,
day_of_week INT,
day_of_year INT,
day_type STRING,
id INT,
month_name STRING,
month_of_year INT,
quarter_of_year INT,
sqldate TIMESTAMP,
week_of_year INT,
week_of_year_mtos INT,
year INT
) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,mcf:id,mcf:sqldate,mcf:year,mcf:quarter_of_year,mcf:month_of_year,mcf:week_of_year,mcf:day_of_year,mcf:day_name,mcf:day_of_week,mcf:day_of_month,mcf:day_type,mcf:month_name,mcf:week_of_year_mtos', 'serialization.format'='1')
TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'hbase.table.name'='irdw_sandbox:date_dim', 'numFiles'='0', 'numRows'='-1', 'rawDataSize'='-1', 'storage_handler'='org.apache.hadoop.hive.hbase.HBaseStorageHandler', 'totalSize'='0')
Labels:
- Apache HBase
01-01-2018 09:14 PM

So this query hangs the daemon but other queries run fine? Wow, that's strange; I would expect it to fail with an out-of-memory error or something. Do you have really low memory on the daemon, like under 16GB? You should review the daemon INFO log under /var/log/impalad and search for the query id to see how it progresses and where it gets stuck (you can compare its entries vs. another query that runs fine). You'll see a lot of info about fragments being set up and distributed.
12-31-2017 05:38 PM

Well, if you can't access the Impala UI on that node, then you have bigger problems than that query. Perhaps your impalad is hung? Or maybe you have a firewall or network policy that is not allowing you to access that port? Could you first of all try restarting that impalad?
12-31-2017 05:33 PM

2 Kudos

For us it didn't appear to be any particular table having too many files or partitions, but rather the catalog tracking too many of them overall. So definitely compact the most fragmented ones to start with, but the goal is to lower total files.

We use Impala itself, doing an insert overwrite in place. This does result in a short outage, as queries will fail for a few seconds on that table (if reading the same partitions being overwritten), so we schedule this late at night. For a typical table partitioned by event_date_yearmonth and account_id_mod (i.e. account_id % 10), we typically compact the current month (which has data coming in throughout the day, so many new small files) with:

insert overwrite sometable partition(event_date_yearmonth, account_id_mod)
select * from sometable where event_date_yearmonth = '201712'

This will result in 1 file in each partition (or more if a partition's data is bigger than the block size), and all account_id_mod partitions for event_date_yearmonth 201712 will be rewritten, while other months will not be touched. Notice I didn't specify partition values in the partition clause, so it's fully dynamic, and therefore the * works (* returns all the schema columns AND the partition columns).

Caution though: having 1 file per partition will decrease parallelism and query performance if the common use case is to read a single partition at a time. If so, you can set PARQUET_FILE_SIZE before the insert to create files in each partition smaller than the default 128m, as in the sketch below.
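A minimal sketch of that last point (the size value is just an example; sometable and the partition columns are the same stand-ins as above):

-- cap output file size so each partition keeps several files for parallelism
set PARQUET_FILE_SIZE=64m;
insert overwrite sometable partition(event_date_yearmonth, account_id_mod)
select * from sometable where event_date_yearmonth = '201712';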
12-30-2017 05:38 PM

How are you submitting your query? If through impala-shell, you should see something like "Query progress can be monitored at: http://[coordinator_hostname]:25000/query_plan?query_id=984ed18511f4ae82:9ccc11c300000000" and you can go there to see its progress. Or you can start impala-shell with --live_summary and see the progress of each fragment in real time.

If through ODBC/JDBC, and you're specifying a node directly (not through haproxy), then you can go straight to http://[coordinator_hostname]:25000/queries on that node and see any queries running there, even if for some reason they're not coming up in CM.

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_live_summary.html#live_summary
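A quick sketch of the impala-shell route (the query itself is a placeholder): once a statement finishes, the same session can pull the per-operator breakdown and the full runtime profile:

select count(*) from some_table;  -- placeholder query
summary;   -- per-operator summary table for the last query
profile;   -- full runtime profile of the last query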
12-30-2017 03:48 PM

2 Kudos

We had this issue too (on different versions, most recently with 5.10) and tried many things over a 6-month period:
- Fiddled with catalog server config, such as -num_metadata_loading_threads=32 in the advanced config snippet.
- Increased catalog memory to 24GB I think, but since it wasn't helpful we ended up going back to 16GB.
- Refactored many of our jobs to drastically lower the number of refreshes and invalidate metadatas.
- Went from doing a daily compute stats on our tables to weekly on Saturdays.

All was pretty much to no avail. Then we noticed some tables were not being defragmented by our maintenance jobs and had upwards of 100K files (each!). We fixed that and started compacting others more aggressively, so our tables went from having over 2 million total files to about 1.4 million. That did the trick: no more long catalog metadata operations. Hope that helps.
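To spot the worst offenders, show table stats lists per-partition file counts (the table name here is a placeholder):

-- #Files per partition; total them across tables to gauge the catalog's load
show table stats sometable;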
11-14-2017 11:26 AM

We often need to restart a node to do some quick maintenance, such as reconfiguring a disk or changing an OS setting that requires a machine restart. Also, we use Impala not only for interactive user queries but also for many of our ETL job queries, and these slightly longer queries of course die if a single node processing their fragments becomes unavailable, killing the corresponding job and exposing us to data corruption. Therefore we are always forced to pause all our jobs. I know we can gracefully decommission a node, but it can take hours to move all the dfs data out and then back in, so it's not worth it when trying to do a quick restart. So is there a way (via CM, shell, or API) to tell an impalad to simply stop taking new fragments in preparation for a restart? (We can also easily remove it from haproxy so it doesn't take new queries as coordinator.) Thanks!
Labels:
- Apache Impala
- Cloudera Manager