Member since: 12-13-2013
Posts: 39
Kudos Received: 8
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 5480 | 10-10-2017 08:05 PM |
10-10-2017 08:10 PM
Another option I forgot to mention: if your table is partitioned, and your insert query uses dynamic partitioning, it will generate 1 file per partition: insert into table2 partition(par1,par2) select col1, col2 .. colN, par1, par2 from table1; ... again up to the max parquet file size currently set, so you can play with that max to achieve 2 files per partition. https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_partitioning.html#partition_static_dynamic
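A rough sketch of that approach, reusing the table and column names from the example above; the PARQUET_FILE_SIZE value is only illustrative and would need to be tuned:

    -- Cap the max Parquet file size; tune this value by trial and error
    -- until each partition splits into the number of files you want.
    SET PARQUET_FILE_SIZE=134217728;  -- 128 MB, in bytes

    -- Dynamic-partition insert: partition columns go last in the select list.
    INSERT INTO table2 PARTITION (par1, par2)
    SELECT col1, col2, par1, par2
    FROM table1;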
10-10-2017 08:05 PM
You can do "set NUM_NODES=1" in your session (before your query), which will cause the query to be processed on a single node (just the coordinator). It will produce 1 file, up to the default max Parquet file size. You can do "set PARQUET_FILE_SIZE=XX" to fine-tune that max file size up or down until you get it split into exactly 2 files (it will take some trial and error because this is an upper bound; in my experience files come out quite a bit smaller than the limit). But beware: the docs state NUM_NODES is not for production use, especially on big tables, as it can put a lot of pressure on a single host and crash that impalad. https://www.cloudera.com/documentation/enterprise/latest/topics/impala_query_options.html -m
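As a sketch, assuming a hypothetical source_table and parquet_table; the file-size value is only a starting point for the trial-and-error tuning described above:

    -- Process the insert on the coordinator only; per the docs, not for production use.
    SET NUM_NODES=1;
    -- Upper bound on each Parquet file written; lower it until the output splits into 2 files.
    SET PARQUET_FILE_SIZE=268435456;  -- 256 MB, in bytes

    INSERT OVERWRITE TABLE parquet_table
    SELECT * FROM source_table;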
09-19-2017 05:48 PM
I see no way to specify a compression default in the CREATE TABLE statement, so I tried: SET COMPRESSION_CODEC=gzip; insert overwrite <text_table> select .. from <another_table> and got "Writing to compressed text table is not supported. Use query option ALLOW_UNSUPPORTED_FORMATS to override." But ALLOW_UNSUPPORTED_FORMATS shouldn't be used, according to the docs. Is there a trick to getting Impala to write compressed text files? -Mauricio
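For reference, a sketch of the attempt described above (table names are placeholders; this reproduces the error rather than working around it):

    SET COMPRESSION_CODEC=gzip;
    -- my_text_table is a hypothetical TEXTFILE-format table.
    INSERT OVERWRITE TABLE my_text_table
    SELECT * FROM another_table;
    -- Error: "Writing to compressed text table is not supported.
    --         Use query option ALLOW_UNSUPPORTED_FORMATS to override."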
Labels:
- Apache Impala
11-15-2016 07:22 PM
I've opened IMPALA-4492. Thanks Tim. -m
11-08-2016 11:04 AM
Thanks Tim, yeah, even just having the real-time start timestamp for each fragment/instance would be very helpful for isolating consistently slow worker nodes. Should I open a JIRA for that? Quick follow-up question (last one, promise): here's another weird instance of that same query, which happens often: a relatively small (by comparison) HDFS scan of 1 MB and 86K rows, all local, all from cache, and it still took 8 seconds vs. the normal ~400ms on all other nodes. Since disk is not a factor here, what else could be holding up the read from cache? The cluster is barely loaded at 20% CPU.
11-04-2016 08:31 PM
I'm not sure where to go from here with this insight, though... there are no actual timestamps in the fragments, only individual timings, so I have no idea how to find out which fragments on which nodes started late. Any ideas on that, anyone? Thanks, -m
11-04-2016 08:10 PM
Thanks so much Tim. I compared summaries out of the profiles from a fast and a slow run of (basically) the same query. They are pretty much the same! However, I did notice in the timeline that the slow run spent about 6.5 seconds between 'Ready to start fragments' and 'All fragments started' (36,909,268 ns vs. 6,567,144,968 ns), whereas the fast run had all fragments started within about 40 ms. So I guess there were some straggler nodes, but outside of fragment processing, because none of the 'Max Time' values look high: FAST:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
----------------------------------------------------------------------------------------------------------------------------------
20:MERGING-EXCHANGE 1 99.260us 99.260us 7 10 0 -1.00 B UNPARTITIONED
10:SORT 22 677.665us 828.653us 7 10 24.02 MB 16.00 MB
19:AGGREGATE 22 201.337ms 244.372ms 7 10 2.28 MB 10.00 MB FINALIZE
18:EXCHANGE 22 20.432us 140.472us 28 10 0 0 HASH(CASE WHEN cld.platform...
09:AGGREGATE 22 219.170ms 287.685ms 28 10 1.98 MB 10.00 MB STREAMING
08:HASH JOIN 22 7.359ms 71.569ms 153.93K 851 2.04 MB 761.00 B INNER JOIN, PARTITIONED
|--17:EXCHANGE 22 7.094us 20.544us 7 761 0 0 HASH(dd.id)
| 04:SCAN HDFS 1 8.870ms 8.870ms 7 761 175.15 KB 32.00 MB irdw_prod.date_dim dd
16:EXCHANGE 22 1.589ms 8.964ms 183.28K 851 0 0 HASH(event_date_local_dim_id)
07:HASH JOIN 22 200.085ms 267.669ms 183.28K 851 12.04 MB 1.71 MB INNER JOIN, PARTITIONED
|--15:EXCHANGE 22 1.728ms 2.233ms 1.43M 1.43M 0 0 HASH(cld.id)
| 03:SCAN HDFS 2 11.642ms 12.940ms 1.43M 1.43M 11.59 MB 64.00 MB irdw_prod.client_dim cld
14:EXCHANGE 22 1.203ms 2.218ms 183.28K 851 0 0 HASH(fact.client_dim_id)
06:HASH JOIN 22 204.626ms 256.269ms 183.28K 851 6.03 MB 428.48 KB INNER JOIN, PARTITIONED
|--13:EXCHANGE 22 2.719ms 11.037ms 2.19M 2.19M 0 0 HASH(md.id)
| 02:SCAN HDFS 8 4.838ms 6.003ms 2.19M 2.19M 6.11 MB 40.00 MB irdw_prod.media_dim md
12:EXCHANGE 22 825.834us 4.673ms 183.28K 851 0 0 HASH(fact.media_dim_id)
05:HASH JOIN 22 199.360ms 253.233ms 183.28K 851 2.02 MB 18.00 B INNER JOIN, BROADCAST
|--11:EXCHANGE 22 8.630us 10.408us 3 2 0 0 BROADCAST
| 01:SCAN HDFS 1 23.969ms 23.969ms 3 2 181.51 KB 32.00 MB irdw_prod.campaign_dim cd
00:SCAN HDFS 22 814.857ms 1s106ms 183.28K 2.86M 3.18 MB 320.00 MB irdw_prod.agg_daily_activit...
Ready to start 122 remote fragments: 24,632,776
All 122 remote fragments started: 40,539,024
First dynamic filter received: 523,467,712
Rows available: 1,742,258,728
SLOW:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
---------------------------------------------------------------------------------------------------------------------------------
20:MERGING-EXCHANGE 1 115.304us 115.304us 4 10 0 -1.00 B UNPARTITIONED
10:SORT 22 711.942us 971.640us 4 10 24.02 MB 16.00 MB
19:AGGREGATE 22 224.158ms 356.244ms 4 10 2.28 MB 10.00 MB FINALIZE
18:EXCHANGE 22 17.424us 116.992us 24 10 0 0 HASH(CASE WHEN cld.platform...
09:AGGREGATE 22 239.932ms 376.739ms 24 10 1.98 MB 10.00 MB STREAMING
08:HASH JOIN 22 9.258ms 126.508ms 7.00K 1.27K 2.04 MB 761.00 B INNER JOIN, PARTITIONED
|--17:EXCHANGE 22 6.449us 14.136us 7 761 0 0 HASH(dd.id)
| 04:SCAN HDFS 1 31.094ms 31.094ms 7 761 175.15 KB 32.00 MB irdw_prod.date_dim dd
16:EXCHANGE 22 313.646us 762.564us 24.74K 1.27K 0 0 HASH(event_date_local_dim_id)
07:HASH JOIN 22 222.134ms 336.441ms 24.74K 1.27K 12.04 MB 1.71 MB INNER JOIN, PARTITIONED
|--15:EXCHANGE 22 2.364ms 3.331ms 1.43M 1.43M 0 0 HASH(cld.id)
| 03:SCAN HDFS 2 17.363ms 21.651ms 1.43M 1.43M 11.66 MB 64.00 MB irdw_prod.client_dim cld
14:EXCHANGE 22 319.401us 541.207us 24.74K 1.27K 0 0 HASH(fact.client_dim_id)
06:HASH JOIN 22 238.946ms 399.160ms 24.74K 1.27K 6.03 MB 428.04 KB INNER JOIN, PARTITIONED
|--13:EXCHANGE 22 2.509ms 3.938ms 2.19M 2.19M 0 0 HASH(md.id)
| 02:SCAN HDFS 7 14.627ms 28.996ms 2.19M 2.19M 3.27 MB 48.00 MB irdw_prod.media_dim md
12:EXCHANGE 22 265.672us 600.188us 24.74K 1.27K 0 0 HASH(fact.media_dim_id)
05:HASH JOIN 22 220.025ms 363.591ms 24.74K 1.27K 2.02 MB 18.00 B INNER JOIN, BROADCAST
|--11:EXCHANGE 22 12.656us 17.408us 2 2 0 0 BROADCAST
| 01:SCAN HDFS 1 10.060ms 10.060ms 2 2 181.48 KB 32.00 MB irdw_prod.campaign_dim cd
00:SCAN HDFS 22 551.595ms 1s062ms 24.74K 4.26M 2.59 MB 320.00 MB irdw_prod.agg_daily_activit...
Ready to start 121 remote fragments: 36,909,268
All 121 remote fragments started: 6,567,144,968
First dynamic filter received: 6,567,170,788
Rows available: 8,395,137,540
11-04-2016 02:39 PM
Hello everyone, trying to hunt down some slow query times here.
Focusing on just one type of query: it normally runs in about 1 second, but about 1/3 of the time it takes 6 or even 8 seconds. Looking at the profile of a slow run, the HDFS scans are super fast (under 1 second, usually closer to 400ms) on all 22 nodes, as expected, because we're partition pruning aggressively and using HDFS caching. HOWEVER, the entire 00 fragment takes a whole 6 seconds, most of it in TotalNetworkSendTime.
How can I tell which send time to which other plan node is the culprit, since this node (00) seems to be connected to all other nodes in the plan (the query has 4 inner joins)?
Using CDH 5.7. Happy to send the entire profile directly, but I don't want to post it publicly.
Thanks in advance!
-m
|
00:SCAN HDFS [irdw_prod.agg_daily_activity_cam_mp_client_performance fact, RANDOM]
partitions=2/482 files=194 size=76.01MB
runtime filters: RF000 -> event_date_local_dim_id, RF001 -> fact.client_dim_id, RF002 -> fact.media_dim_id, RF003 -> campaign_dim_id
table stats: 1006261351 rows total
column stats: all
hosts=22 per-host-mem=320.00MB
tuple-ids=0 row-size=88B cardinality=4259958
Labels:
- Apache Impala
- HDFS
12-03-2015 03:02 PM
Thanks Dimitris. I've commented in the ticket and upvoted it. Hope you guys can get it in progress soon!
12-01-2015 04:08 PM
1 Kudo
Could anyone help point me to where I should look into why the catalog update takes so long after an insert overwrite? From the profile I can see that the data was written in about 7s, but it took another 107s to update the catalog:
Remote fragments started: 2,249,771,266
DML data written: 7,268,752,004
DML Metastore update finished: 114,757,731,166
Request finished: 114,786,664,689
...
- MetastoreUpdateTimer: 107,517,902,343
It uses dynamic partitioning, though I don't think that's a factor here: insert overwrite table action_fact_a partition(p_action_date_ym='201511',p_campaign_id_mod=5, p_publisher_id_mod) select field1, etc.... There are 10 p_publisher_id_mod partitions, so 10 files are generated, each only a couple of MB, so no more than 30MB altogether, and I don't think more than 10 blocks were deleted and 10 inserted. The cluster is not particularly under load, and the performance of this operation is pretty stable. 10 DNs. The source table is text; the target (partitioned) table is Parquet. Much appreciated!
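A sketch of the statement shape involved, with placeholder field and source-table names, in case it helps reproduce the timing (one could also try a fully static partition spec to rule dynamic partitioning in or out):

    -- Mixed static/dynamic partition insert; the dynamic partition column
    -- (p_publisher_id_mod) must come last in the select list.
    INSERT OVERWRITE TABLE action_fact_a
      PARTITION (p_action_date_ym='201511', p_campaign_id_mod=5, p_publisher_id_mod)
    SELECT field1, field2, p_publisher_id_mod
    FROM action_fact_text_staging;  -- hypothetical text-format source table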