Cannot process row that is bigger than the IO size impala

I am getting the following warning:

WARNINGS: Cannot process row that is bigger than the IO size (row_size=8.13 MB). To run this query, increase the IO size (--read_size option).

How do I increase the IO size? I am trying to run an Impala query with a fairly large group_concat expression.

Explain plan:
----------------
Estimated Per-Host Requirements: Memory=5.31GB VCores=2

F03:PLAN FRAGMENT [HASH(stf.reporting_date)]
  WRITE TO HDFS [zetl_processing.temp_payment_thresholds_detail_array_07day, OVERWRITE=true, PARTITION-KEYS=(stf.reporting_date)]
  |  partitions=5263
  |  hosts=6 per-host-mem=418.43MB
  |
  07:EXCHANGE [HASH(stf.reporting_date)]
     hosts=6 per-host-mem=0B
     tuple-ids=3 row-size=113B cardinality=23305710

F02:PLAN FRAGMENT [HASH(stf.account,stf.account_role,stf.reporting_date)]
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=07, HASH(stf.reporting_date)]
  06:AGGREGATE [FINALIZE]
  |  output: group_concat:merge(stf.array_details)
  |  group by: stf.account, stf.account_role, stf.reporting_date
  |  hosts=6 per-host-mem=2.62GB
  |  tuple-ids=3 row-size=113B cardinality=23305710
  |
  05:EXCHANGE [HASH(stf.account,stf.account_role,stf.reporting_date)]
     hosts=6 per-host-mem=0B
     tuple-ids=3 row-size=113B cardinality=23305710

F00:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(stf.account,stf.account_role,stf.reporting_date)]
  03:AGGREGATE
  |  output: group_concat(concat('"tx_hash":"', tpt.tx_hash, '"', ' , "executed_time":', CAST(tpt.executed_time AS STRING), ' , "USD_Value":', CAST((tpt.delivered_amount * coalesce(weightedavg_rate_xrp, CAST(0 AS DOUBLE))) * coalesce(weighted_avg_rate_xrp_to_usd_snapswap, CAST(0 AS DOUBLE)) AS STRING), ' , "Original_Currency":"', tpt.destination_currency, '"', CASE WHEN tpt.destination_currency != 'XRP' THEN concat(' , "Original_Issuer":"', coalesce(tpt.destination_issuer, ''), '"') ELSE '' END, ' , "Original_Amount":"', CAST(tpt.delivered_amount AS STRING), '"'))
  |  group by: tpt.account, tpt.account_role, lu7.desired_close_date
  |  hosts=6 per-host-mem=2.70GB
  |  tuple-ids=3 row-size=113B cardinality=23305710
  |
  02:HASH JOIN [INNER JOIN, BROADCAST]
  |  hash predicates: tpt.close_date_human = lu7.lookup_prior_7_days
  |  hosts=6 per-host-mem=19.59KB
  |  tuple-ids=0,1 row-size=317B cardinality=23305710
  |
  |--04:EXCHANGE [BROADCAST]
  |     hosts=3 per-host-mem=0B
  |     tuple-ids=1 row-size=48B cardinality=380
  |
  00:SCAN HDFS [zetl_processing.temp_payment_thresholds_base tpt, RANDOM]
     partitions=863/863 files=863 size=1.12GB
     table stats: 23305710 rows total
     column stats: all
     hosts=6 per-host-mem=144.00MB
     tuple-ids=0 row-size=269B cardinality=23305710

F01:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=04, BROADCAST]
  01:SCAN HDFS [commonlookup.lu_rolling_07day_window lu7, RANDOM]
     partitions=1/1 files=1 size=287.83KB
     predicates: lu7.desired_close_date < '20140801', lu7.desired_close_date >= '20140601'
     table stats: 37984 rows total
     column stats: all
     hosts=3 per-host-mem=32.00MB
     tuple-ids=1 row-size=48B cardinality=380

Query Info:
Query ID: d944a0ee63451bba:d5f663d6231b1b98
User: ubuntu
Database: zETL_processing
Coordinator: hdpnode2
Query Type: DML
Query State: EXCEPTION
Start Time: May 14, 2015 10:35:11 PM
End Time: May 14, 2015 10:35:17 PM
Duration: 6s
Rows Produced: 0
Aggregate Peak Memory Usage: 1.7 GiB
Bytes Streamed: 295.9 MiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: ubuntu
Estimated per Node Peak Memory: 5.3 GiB
File Formats: PARQUET/SNAPPY
HDFS Average Scan Range: 1.4 MiB
HDFS Bytes Read: 812.5 MiB
HDFS Bytes Read From Cache: 812.5 MiB
HDFS Bytes Read From Cache Percentage: 100
HDFS Bytes Written: 0 B
HDFS Local Bytes Read: 812.5 MiB
HDFS Local Bytes Read Percentage: 100
HDFS Read Throughput: 2.4 GiB/s
HDFS Remote Bytes Read: 0 B
HDFS Remote Bytes Read Percentage: 0
HDFS Short Circuit Bytes Read: 812.5 MiB
HDFS Short Circuit Bytes Read Percentage: 100
Impala Version: impalad version 2.1.3-cdh5 RELEASE (build 20816e26a150d20d2c92c470aa40b342521b243e)
Memory Accrual: 4,037,728,440 byte seconds
Network Address: 172.30.0.99:60117
Node with Peak Memory Usage: hdpnode1:22000
Per Node Peak Memory Usage: 1.7 GiB
Planning Wait Time: 37ms
Planning Wait Time Percentage: 1
Pool: root.ubuntu
Query Status: Cannot process row that is bigger than the IO size (row_size=10.97 MB). To run this query, increase the IO size (--read_size option).
Rows Inserted: 0
Session ID: 6949ce8608cd351a:cf18a1473f815e90
Session Type: BEESWAX
Statistics Missing: false
Threads: CPU Time: 1.3m
Threads: CPU Time Percentage: 83
Threads: Network Receive Wait Time: 0ms
Threads: Network Receive Wait Time Percentage: 0
Threads: Network Send Wait Time: 943ms
Threads: Network Send Wait Time Percentage: 1
Threads: Storage Wait Time: 14.99s
Threads: Storage Wait Time Percentage: 16
Threads: Total Time: 1.6m
Work CPU Time: 1.3m

Query Timeline:
Start execution: 36.18us (36.18us)
Planning finished: 37ms (37ms)
Ready to start remote fragments: 44ms (6ms)
Remote fragments started: 766ms (721ms)
Request finished: 5.98s (5.22s)
Unregister query: 5.99s (7ms)
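
For anyone hitting the same error: --read_size is an impalad startup flag, not a per-query option, so raising it means changing the daemon's startup arguments and restarting the Impala service. A minimal sketch, assuming the default of 8 MB (8388608 bytes) and picking 16 MB to clear the 10.97 MB row reported in the query status above; the exact Cloudera Manager field name should be verified against your CDH release:

# Hypothetical sketch: start impalad with a 16 MB IO buffer instead of the 8 MB default
impalad --read_size=16777216 ...

# Under Cloudera Manager, the same flag would typically be added to the Impala Daemon
# "Command Line Argument Advanced Configuration Snippet (Safety Valve)" field,
# followed by a restart of the Impala service:
--read_size=16777216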
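
It can also help to gauge how large the biggest concatenated groups actually are before (or instead of) raising the buffer, since group_concat output grows with group size. A rough diagnostic sketch against the base table, with an assumed ~200 bytes of fixed JSON text per row; note the real query additionally groups across the 7-day window join, so actual payloads will be larger than this estimate:

-- Hypothetical diagnostic: approximate the concatenated payload per group
-- and compare the largest groups against the 8 MB default IO size.
SELECT tpt.account,
       tpt.account_role,
       COUNT(*) AS rows_in_group,
       SUM(LENGTH(tpt.tx_hash) + 200) AS approx_concat_bytes  -- ~200 B fixed JSON per row (assumption)
FROM zetl_processing.temp_payment_thresholds_base tpt
GROUP BY tpt.account, tpt.account_role
ORDER BY approx_concat_bytes DESC
LIMIT 20;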
