Cannot process row that is bigger than the IO size impala

I am getting the following warning:

WARNINGS: Cannot process row that is bigger than the IO size (row_size=8.13 MB). To run this query, increase the IO size (--read_size option).

How do I increase the IO size? I am trying to run an Impala query with a fairly large group_concat expression.

Explain plan:
----------------
Estimated Per-Host Requirements: Memory=5.31GB VCores=2

F03:PLAN FRAGMENT [HASH(stf.reporting_date)]
  WRITE TO HDFS [zetl_processing.temp_payment_thresholds_detail_array_07day, OVERWRITE=true, PARTITION-KEYS=(stf.reporting_date)]
  |  partitions=5263
  |  hosts=6 per-host-mem=418.43MB
  |
  07:EXCHANGE [HASH(stf.reporting_date)]
     hosts=6 per-host-mem=0B
     tuple-ids=3 row-size=113B cardinality=23305710

F02:PLAN FRAGMENT [HASH(stf.account,stf.account_role,stf.reporting_date)]
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=07, HASH(stf.reporting_date)]
  06:AGGREGATE [FINALIZE]
  |  output: group_concat:merge(stf.array_details)
  |  group by: stf.account, stf.account_role, stf.reporting_date
  |  hosts=6 per-host-mem=2.62GB
  |  tuple-ids=3 row-size=113B cardinality=23305710
  |
  05:EXCHANGE [HASH(stf.account,stf.account_role,stf.reporting_date)]
     hosts=6 per-host-mem=0B
     tuple-ids=3 row-size=113B cardinality=23305710

F00:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(stf.account,stf.account_role,stf.reporting_date)]
  03:AGGREGATE
  |  output: group_concat(concat('"tx_hash":"', tpt.tx_hash, '"', ' , "executed_time":', CAST(tpt.executed_time AS STRING), ' , "USD_Value":', CAST((tpt.delivered_amount * coalesce(weightedavg_rate_xrp, CAST(0 AS DOUBLE))) * coalesce(weighted_avg_rate_xrp_to_usd_snapswap, CAST(0 AS DOUBLE)) AS STRING), ' , "Original_Currency":"', tpt.destination_currency, '"', CASE WHEN tpt.destination_currency != 'XRP' THEN concat(' , "Original_Issuer":"', coalesce(tpt.destination_issuer, ''), '"') ELSE '' END, ' , "Original_Amount":"', CAST(tpt.delivered_amount AS STRING), '"'))
  |  group by: tpt.account, tpt.account_role, lu7.desired_close_date
  |  hosts=6 per-host-mem=2.70GB
  |  tuple-ids=3 row-size=113B cardinality=23305710
  |
  02:HASH JOIN [INNER JOIN, BROADCAST]
  |  hash predicates: tpt.close_date_human = lu7.lookup_prior_7_days
  |  hosts=6 per-host-mem=19.59KB
  |  tuple-ids=0,1 row-size=317B cardinality=23305710
  |
  |--04:EXCHANGE [BROADCAST]
  |     hosts=3 per-host-mem=0B
  |     tuple-ids=1 row-size=48B cardinality=380
  |
  00:SCAN HDFS [zetl_processing.temp_payment_thresholds_base tpt, RANDOM]
     partitions=863/863 files=863 size=1.12GB
     table stats: 23305710 rows total
     column stats: all
     hosts=6 per-host-mem=144.00MB
     tuple-ids=0 row-size=269B cardinality=23305710

F01:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=04, BROADCAST]
  01:SCAN HDFS [commonlookup.lu_rolling_07day_window lu7, RANDOM]
     partitions=1/1 files=1 size=287.83KB
     predicates: lu7.desired_close_date < '20140801', lu7.desired_close_date >= '20140601'
     table stats: 37984 rows total
     column stats: all
     hosts=3 per-host-mem=32.00MB
     tuple-ids=1 row-size=48B cardinality=380

Query Info:
Query ID: d944a0ee63451bba:d5f663d6231b1b98
User: ubuntu
Database: zETL_processing
Coordinator: hdpnode2
Query Type: DML
Query State: EXCEPTION
Start Time: May 14, 2015 10:35:11 PM
End Time: May 14, 2015 10:35:17 PM
Duration: 6s
Rows Produced: 0
Aggregate Peak Memory Usage: 1.7 GiB
Bytes Streamed: 295.9 MiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: ubuntu
Estimated per Node Peak Memory: 5.3 GiB
File Formats: PARQUET/SNAPPY
HDFS Average Scan Range: 1.4 MiB
HDFS Bytes Read: 812.5 MiB
HDFS Bytes Read From Cache: 812.5 MiB
HDFS Bytes Read From Cache Percentage: 100
HDFS Bytes Written: 0 B
HDFS Local Bytes Read: 812.5 MiB
HDFS Local Bytes Read Percentage: 100
HDFS Read Throughput: 2.4 GiB/s
HDFS Remote Bytes Read: 0 B
HDFS Remote Bytes Read Percentage: 0
HDFS Short Circuit Bytes Read: 812.5 MiB
HDFS Short Circuit Bytes Read Percentage: 100
Impala Version: impalad version 2.1.3-cdh5 RELEASE (build 20816e26a150d20d2c92c470aa40b342521b243e)
Memory Accrual: 4,037,728,440 byte seconds
Network Address: 172.30.0.99:60117
Node with Peak Memory Usage: hdpnode1:22000
Per Node Peak Memory Usage: 1.7 GiB
Planning Wait Time: 37ms
Planning Wait Time Percentage: 1
Pool: root.ubuntu
Query Status: Cannot process row that is bigger than the IO size (row_size=10.97 MB). To run this query, increase the IO size (--read_size option).
Rows Inserted: 0
Session ID: 6949ce8608cd351a:cf18a1473f815e90
Session Type: BEESWAX
Statistics Missing: false
Threads: CPU Time: 1.3m
Threads: CPU Time Percentage: 83
Threads: Network Receive Wait Time: 0ms
Threads: Network Receive Wait Time Percentage: 0
Threads: Network Send Wait Time: 943ms
Threads: Network Send Wait Time Percentage: 1
Threads: Storage Wait Time: 14.99s
Threads: Storage Wait Time Percentage: 16
Threads: Total Time: 1.6m
Work CPU Time: 1.3m

Query Timeline:
Start execution: 36.18us (36.18us)
Planning finished: 37ms (37ms)
Ready to start remote fragments: 44ms (6ms)
Remote fragments started: 766ms (721ms)
Request finished: 5.98s (5.22s)
Unregister query: 5.99s (7ms)
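
For anyone hitting the same error: --read_size is an impalad startup flag, not a per-query option, so raising it means changing the daemon's startup arguments and restarting the Impala service. A minimal sketch, assuming the default of 8 MB (8388608 bytes) and picking 16 MB to clear the 10.97 MB row reported in the query status above; the exact Cloudera Manager field name should be verified against your CDH release:

# Hypothetical sketch: start impalad with a 16 MB IO buffer instead of the 8 MB default
impalad --read_size=16777216 ...

# Under Cloudera Manager, the same flag would typically be added to the Impala Daemon
# "Command Line Argument Advanced Configuration Snippet (Safety Valve)" field,
# followed by a restart of the Impala service:
--read_size=16777216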
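
It can also help to gauge how large the biggest concatenated groups actually are before (or instead of) raising the buffer, since group_concat output grows with group size. A rough diagnostic sketch against the base table, with an assumed ~200 bytes of fixed JSON text per row; note the real query additionally groups across the 7-day window join, so actual payloads will be larger than this estimate:

-- Hypothetical diagnostic: approximate the concatenated payload per group
-- and compare the largest groups against the 8 MB default IO size.
SELECT tpt.account,
       tpt.account_role,
       COUNT(*) AS rows_in_group,
       SUM(LENGTH(tpt.tx_hash) + 200) AS approx_concat_bytes  -- ~200 B fixed JSON per row (assumption)
FROM zetl_processing.temp_payment_thresholds_base tpt
GROUP BY tpt.account, tpt.account_role
ORDER BY approx_concat_bytes DESC
LIMIT 20;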
