
Cannot process row that is bigger than the IO size impala

Contributor

I am getting the following warning:

WARNINGS: Cannot process row that is bigger than the IO size (row_size=8.13 MB). To run this query, increase the IO size (--read_size option).

How do I increase the IO size?

I am trying to run Impala with a fairly good-sized group_concat function.

Explain plan
----------------
Estimated Per-Host Requirements: Memory=5.31GB VCores=2

F03:PLAN FRAGMENT [HASH(stf.reporting_date)]
  WRITE TO HDFS [zetl_processing.temp_payment_thresholds_detail_array_07day, OVERWRITE=true, PARTITION-KEYS=(stf.reporting_date)]
  |  partitions=5263
  |  hosts=6 per-host-mem=418.43MB
  |
  07:EXCHANGE [HASH(stf.reporting_date)]
     hosts=6 per-host-mem=0B
     tuple-ids=3 row-size=113B cardinality=23305710

F02:PLAN FRAGMENT [HASH(stf.account,stf.account_role,stf.reporting_date)]
  DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=07, HASH(stf.reporting_date)]
  06:AGGREGATE [FINALIZE]
  |  output: group_concat:merge(stf.array_details)
  |  group by: stf.account, stf.account_role, stf.reporting_date
  |  hosts=6 per-host-mem=2.62GB
  |  tuple-ids=3 row-size=113B cardinality=23305710
  |
  05:EXCHANGE [HASH(stf.account,stf.account_role,stf.reporting_date)]
     hosts=6 per-host-mem=0B
     tuple-ids=3 row-size=113B cardinality=23305710

F00:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(stf.account,stf.account_role,stf.reporting_date)]
  03:AGGREGATE
  |  output: group_concat(concat('"tx_hash":"', tpt.tx_hash, '"', ' , "executed_time":', CAST(tpt.executed_time AS STRING), ' , "USD_Value":', CAST((tpt.delivered_amount * coalesce(weightedavg_rate_xrp, CAST(0 AS DOUBLE))) * coalesce(weighted_avg_rate_xrp_to_usd_snapswap, CAST(0 AS DOUBLE)) AS STRING), ' , "Original_Currency":"', tpt.destination_currency, '"', CASE WHEN tpt.destination_currency != 'XRP' THEN concat(' , "Original_Issuer":"', coalesce(tpt.destination_issuer, ''), '"') ELSE '' END, ' , "Original_Amount":"', CAST(tpt.delivered_amount AS STRING), '"'))
  |  group by: tpt.account, tpt.account_role, lu7.desired_close_date
  |  hosts=6 per-host-mem=2.70GB
  |  tuple-ids=3 row-size=113B cardinality=23305710
  |
  02:HASH JOIN [INNER JOIN, BROADCAST]
  |  hash predicates: tpt.close_date_human = lu7.lookup_prior_7_days
  |  hosts=6 per-host-mem=19.59KB
  |  tuple-ids=0,1 row-size=317B cardinality=23305710
  |
  |--04:EXCHANGE [BROADCAST]
  |     hosts=3 per-host-mem=0B
  |     tuple-ids=1 row-size=48B cardinality=380
  |
  00:SCAN HDFS [zetl_processing.temp_payment_thresholds_base tpt, RANDOM]
     partitions=863/863 files=863 size=1.12GB
     table stats: 23305710 rows total
     column stats: all
     hosts=6 per-host-mem=144.00MB
     tuple-ids=0 row-size=269B cardinality=23305710

F01:PLAN FRAGMENT [RANDOM]
  DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=04, BROADCAST]
  01:SCAN HDFS [commonlookup.lu_rolling_07day_window lu7, RANDOM]
     partitions=1/1 files=1 size=287.83KB
     predicates: lu7.desired_close_date < '20140801', lu7.desired_close_date >= '20140601'
     table stats: 37984 rows total
     column stats: all
     hosts=3 per-host-mem=32.00MB
     tuple-ids=1 row-size=48B cardinality=380

Query Info
Query ID: d944a0ee63451bba:d5f663d6231b1b98
User: ubuntu
Database: zETL_processing
Coordinator: hdpnode2
Query Type: DML
Query State: EXCEPTION
Start Time: May 14, 2015 10:35:11 PM
End Time: May 14, 2015 10:35:17 PM
Duration: 6s
Rows Produced: 0
Aggregate Peak Memory Usage: 1.7 GiB
Bytes Streamed: 295.9 MiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: ubuntu
Estimated per Node Peak Memory: 5.3 GiB
File Formats: PARQUET/SNAPPY
HDFS Average Scan Range: 1.4 MiB
HDFS Bytes Read: 812.5 MiB
HDFS Bytes Read From Cache: 812.5 MiB
HDFS Bytes Read From Cache Percentage: 100
HDFS Bytes Written: 0 B
HDFS Local Bytes Read: 812.5 MiB
HDFS Local Bytes Read Percentage: 100
HDFS Read Throughput: 2.4 GiB/s
HDFS Remote Bytes Read: 0 B
HDFS Remote Bytes Read Percentage: 0
HDFS Short Circuit Bytes Read: 812.5 MiB
HDFS Short Circuit Bytes Read Percentage: 100
Impala Version: impalad version 2.1.3-cdh5 RELEASE (build 20816e26a150d20d2c92c470aa40b342521b243e)
Memory Accrual: 4,037,728,440 byte seconds
Network Address: 172.30.0.99:60117
Node with Peak Memory Usage: hdpnode1:22000
Per Node Peak Memory Usage: 1.7 GiB
Planning Wait Time: 37ms
Planning Wait Time Percentage: 1
Pool: root.ubuntu
Query Status: Cannot process row that is bigger than the IO size (row_size=10.97 MB). To run this query, increase the IO size (--read_size option).
Rows Inserted: 0
Session ID: 6949ce8608cd351a:cf18a1473f815e90
Session Type: BEESWAX
Statistics Missing: false
Threads: CPU Time: 1.3m
Threads: CPU Time Percentage: 83
Threads: Network Receive Wait Time: 0ms
Threads: Network Receive Wait Time Percentage: 0
Threads: Network Send Wait Time: 943ms
Threads: Network Send Wait Time Percentage: 1
Threads: Storage Wait Time: 14.99s
Threads: Storage Wait Time Percentage: 16
Threads: Total Time: 1.6m
Work CPU Time: 1.3m

Query Timeline
Start execution: 36.18us (36.18us)
Planning finished: 37ms (37ms)
Ready to start remote fragments: 44ms (6ms)
Remote fragments started: 766ms (721ms)
Request finished: 5.98s (5.22s)
Unregister query: 5.99s (7ms)

3 REPLIES

Re: Cannot process row that is bigger than the IO size impala

Master Collaborator

The suggested "--read_size" option is an impalad startup option. You will have to increase it to a value larger than the row size reported in the error.

Please refer to these docs to see how to set those options via Cloudera Manager:

http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/v2-0-x/topics/impala_confi...

Re: Cannot process row that is bigger than the IO size impala

Contributor

Thanks.

The documentation is very general.

I assume that I have to add it to

 Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)

as -read_size=1000000
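
For anyone sizing this value: a rough sketch of how one might derive a number from the error text, assuming the flag is interpreted in bytes and that the "MB" in the message means 1024 * 1024 bytes (neither is confirmed in this thread). Under those assumptions, 1000000 is a little under 1 MB, which is smaller than the 10.97 MB row in the query status, so a larger value would be needed. The helper name suggest_read_size and the 1.5x headroom factor are hypothetical choices for illustration only.

```python
import math
import re

# Assumption: --read_size is specified in bytes and "MB" means 1024 * 1024 bytes.
MIB = 1024 * 1024


def suggest_read_size(message: str, headroom: float = 1.5) -> int:
    """Return a byte count larger than the biggest row_size found in the message.

    `headroom` is an arbitrary safety factor (hypothetical), not an Impala setting.
    """
    sizes_mb = [float(s) for s in re.findall(r"row_size=([\d.]+) MB", message)]
    if not sizes_mb:
        raise ValueError("no row_size found in message")
    needed_bytes = max(sizes_mb) * headroom * MIB
    # Round up to a whole number of MiB so the value is easy to read back.
    return math.ceil(needed_bytes / MIB) * MIB


msg = (
    "Cannot process row that is bigger than the IO size (row_size=10.97 MB). "
    "To run this query, increase the IO size (--read_size option)."
)
print(f"-read_size={suggest_read_size(msg)}")
```

The printed line is what would go into the safety valve field. A larger read size probably also means more memory used per scan, so it may be worth keeping the value only as large as the widest group_concat row requires.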

Re: Cannot process row that is bigger than the IO size impala

Explorer

@scratch28 wrote:

I am getting the following warning: WARNINGS: Cannot process row that is bigger than the IO size (row_size=8.13 MB). To run this query, increase the IO size (--read_size option). How do I increase the IO size?

I am getting the same issue.

Please let me know whether your issue was resolved. I could not follow the information at the URL. Let me know what fixed it and how.

Error:
Error during Execute
S1000(110)[Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] : Runtime Error: Query c84c7d6cb8a0327c:5ecc0ab24b4c689e: 71% Complete (26354 out of 36624)
Cannot process row that is bigger than the IO size (row_size=10.11 MB). To run this query, increase the IO size (--read_size option).
Cannot process row that is bigger than the IO size (row_size=10.11 MB). To run this query, increase the IO size (--read_size option).
Cannot process row that is bigger than the IO size (row_size=10.11 MB). T (54.96 secs)