Member since
12-09-2015
8
Posts
0
Kudos Received
0
Solutions
04-29-2016
02:02 AM
Can you give me some advice on benchmarking Impala, including the tools, dataset, etc.?
04-28-2016
10:23 PM
Tim, thank you very much. I think I should try another benchmark dataset.
04-28-2016
02:43 AM
Tim, thank you for the reply. Yes, I see that I really do have data skew, but how can I avoid it? By the way, how can I get the details of the partitioned join?
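For the partitioned-join details asked about here, one option is to raise Impala's explain level before running EXPLAIN on the join; levels 2 and 3 print per-operator memory estimates and the hash partitioning of each exchange. A sketch, reusing the tables from the join discussed in this thread (the exact output format varies by Impala version):

```sql
-- EXPLAIN_LEVEL=3 ('verbose') shows fragment boundaries, exchange
-- partitioning, and per-host memory estimates for the partitioned join.
SET EXPLAIN_LEVEL=3;
EXPLAIN
SELECT o.buyer_id, SUM(i.goods_amount) AS total
FROM parquet_bigdatabench_dw_order_300g o
JOIN [shuffle] parquet_bigdatabench_dw_item_300g i
  ON i.order_id = o.order_id
GROUP BY o.buyer_id
LIMIT 10;
```

This only inspects the plan without running the query, so it is a cheap way to compare the [shuffle] and [broadcast] variants of the join.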
04-26-2016
07:31 PM
Hi:
I use three nodes with 64 GB of memory each to run a join operation, but the memory on one node is exceeded no matter whether I set the mem_limit property to -1B, 60G, 50G, or 40G, while the other nodes use only 10+ GB of memory. Here is part of the profile:
Query (id=6043a6bbde2f997a:3b3072b5dcfd1d89):
Summary:
Session ID: 4141715fae32f3b5:5d20c4f989cea7b8
Session Type: BEESWAX
Start Time: 2016-04-26 18:11:31.958572000
End Time: 2016-04-27 04:19:46.109358000
Query Type: DML
Query State: EXCEPTION
Query Status:
Memory limit exceeded
Cannot perform hash join at node with id 2. Repartitioning did not reduce the size of a spilled partition. Repartitioning level 8. Number of rows 2052516353.
Impala Version: impalad version 2.3.0-cdh5.5.0 RELEASE (build 0c891d79aa38f297d244855a32f1e17280e2129b)
User: root
Connected User: root
Delegated User:
Network Address: ::ffff:192.168.55.247:40383
Default Db: default
Sql Statement: insert into table result select parquet_bigdatabench_dw_order_300g.buyer_id,sum(parquet_bigdatabench_dw_item_300g.goods_amount) as total from parquet_bigdatabench_dw_order_300g join [shuffle] parquet_bigdatabench_dw_item_300g on parquet_bigdatabench_dw_item_300g.order_id = parquet_bigdatabench_dw_order_300g.order_id group by parquet_bigdatabench_dw_order_300g.buyer_id limit 10
Coordinator: bigdata3:22000
Plan:
----------------
Estimated Per-Host Requirements: Memory=39.27GB VCores=2
F04:PLAN FRAGMENT [UNPARTITIONED]
WRITE TO HDFS [default.result, OVERWRITE=false]
| partitions=1
| hosts=1 per-host-mem=unavailable
|
08:EXCHANGE [UNPARTITIONED]
limit: 10
hosts=3 per-host-mem=unavailable
tuple-ids=2 row-size=12B cardinality=10
F03:PLAN FRAGMENT [HASH(parquet_bigdatabench_dw_order_300g.buyer_id)]
DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=08, UNPARTITIONED]
07:AGGREGATE [FINALIZE]
| output: sum:merge(parquet_bigdatabench_dw_item_300g.goods_amount)
| group by: parquet_bigdatabench_dw_order_300g.buyer_id
| limit: 10
| hosts=3 per-host-mem=9.21GB
| tuple-ids=2 row-size=12B cardinality=2247426048
|
06:EXCHANGE [HASH(parquet_bigdatabench_dw_order_300g.buyer_id)]
hosts=3 per-host-mem=0B
tuple-ids=2 row-size=12B cardinality=2247426048
F02:PLAN FRAGMENT [HASH(parquet_bigdatabench_dw_item_300g.order_id)]
DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, HASH(parquet_bigdatabench_dw_order_300g.buyer_id)]
03:AGGREGATE
| output: sum(parquet_bigdatabench_dw_item_300g.goods_amount)
| group by: parquet_bigdatabench_dw_order_300g.buyer_id
| hosts=3 per-host-mem=27.63GB
| tuple-ids=2 row-size=12B cardinality=2247426048
|
02:HASH JOIN [INNER JOIN, PARTITIONED]
| hash predicates: parquet_bigdatabench_dw_item_300g.order_id = parquet_bigdatabench_dw_order_300g.order_id
| hosts=3 per-host-mem=11.47GB
| tuple-ids=1,0 row-size=20B cardinality=4103971316
|
|--05:EXCHANGE [HASH(parquet_bigdatabench_dw_order_300g.order_id)]
| hosts=3 per-host-mem=0B
| tuple-ids=0 row-size=8B cardinality=4200000000
|
04:EXCHANGE [HASH(parquet_bigdatabench_dw_item_300g.order_id)]
hosts=3 per-host-mem=0B
tuple-ids=1 row-size=12B cardinality=4200000000
F01:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(parquet_bigdatabench_dw_order_300g.order_id)]
00:SCAN HDFS [default.parquet_bigdatabench_dw_order_300g, RANDOM]
partitions=1/1 files=87 size=21.15GB
table stats: 4200000000 rows total
column stats: all
hosts=3 per-host-mem=176.00MB
tuple-ids=0 row-size=8B cardinality=4200000000
F00:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(parquet_bigdatabench_dw_item_300g.order_id)]
01:SCAN HDFS [default.parquet_bigdatabench_dw_item_300g, RANDOM]
partitions=1/1 files=258 size=63.82GB
table stats: 4200000000 rows total
column stats: all
hosts=3 per-host-mem=176.00MB
tuple-ids=1 row-size=12B cardinality=4200000000
----------------
Estimated Per-Host Mem: 42170573209
Estimated Per-Host VCores: 2
Admission result: Admitted immediately
Request Pool: root.root
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
---------------------------------------------------------------------------------------------------------------------------
08:EXCHANGE 1 10h8m 10h8m 0 10 0 -1.00 B UNPARTITIONED
07:AGGREGATE 3 135.686ms 407.60ms 0 2.25B 155.03 MB 9.21 GB FINALIZE
06:EXCHANGE 3 15.975ms 24.767ms 1.20M 2.25B 0 0 HASH(parquet_bigdatabench_d...
03:AGGREGATE 3 887.849ms 1s340ms 1.20M 2.25B 155.02 MB 27.63 GB
02:HASH JOIN 3 3h19m 9h53m 1.50M 4.10B 31.46 GB 11.47 GB INNER JOIN, PARTITIONED
|--05:EXCHANGE 3 1m2s 2m5s 4.20B 4.20B 0 0 HASH(parquet_bigdatabench_d...
| 00:SCAN HDFS 3 12s695ms 16s494ms 4.20B 4.20B 485.76 MB 176.00 MB default.parquet_bigdatabenc...
04:EXCHANGE 3 59s722ms 2m59s 4.20B 4.20B 0 0 HASH(parquet_bigdatabench_d...
01:SCAN HDFS 3 14s341ms 19s831ms 4.20B 4.20B 205.20 MB 176.00 MB default.parquet_bigdatabenc...
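Given the "Repartitioning did not reduce the size of a spilled partition" error above, one follow-up worth trying is to look at the join-key distribution directly: repeated repartitioning only fails like this when very many rows share the same key, so a spilled partition can never shrink. A sketch using the tables from the statement above (run it for both sides of the join):

```sql
-- Find the heaviest join keys; a single order_id (often NULL or a
-- default value) holding a huge share of rows would explain a spilled
-- partition that repartitioning cannot reduce.
SELECT order_id, COUNT(*) AS cnt
FROM parquet_bigdatabench_dw_item_300g
GROUP BY order_id
ORDER BY cnt DESC
LIMIT 20;
```

If a few keys dominate, common workarounds are filtering them out, handling them in a separate query, or pre-aggregating before the join.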
Labels:
- Apache Impala
04-11-2016
07:35 PM
Hi: The datanode has three directories to store data: /data/1, /data/2, and /data/3. I deleted /data/1 on the datanode by mistake, and then HDFS showed missing blocks. I copied the data from /data/3 to /data/1, but it didn't work. Thanks and regards, leezy
Labels:
- HDFS
04-08-2016
01:36 AM
Hi: My cluster went down when I changed the namenode directory. When I recovered it, the Hive table was still in the metastore, but its location was wrong and I could not select any rows. I then wanted to change the location of the table, but it showed me this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://nameservice1parquet_armin_flow
The Hive metastore error log follows:
2016-04-08 16:35:19,101 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-3-thread-134]: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://nameservice1parquet_armin_flow)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5417)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3452)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_cascade(HiveMetaStore.java:3404)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:133)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
    at com.sun.proxy.$Proxy12.alter_table_with_cascade(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_cascade.getResult(ThriftHiveMetastore.java:9400)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_cascade.getResult(ThriftHiveMetastore.java:9384)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://nameservice1parquet_armin_flow
    at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    at org.apache.hadoop.fs.Path.<init>(Path.java:116)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:188)
    at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:542)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:179)
    at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:237)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3432)
    ... 21 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://nameservice1parquet_armin_flow
    at java.net.URI.checkPath(URI.java:1804)
    at java.net.URI.<init>(URI.java:752)
    at org.apache.hadoop.fs.Path.initialize(Path.java:203)
    ... 28 more
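Worth noting about the error above: the URI hdfs://nameservice1parquet_armin_flow is missing the "/" between the nameservice and the path, which is why the metastore rejects it as a relative path. A hedged sketch of the kind of statement that sets a fully qualified location (the /user/hive/warehouse path is the Hive default and is an assumption here; substitute the table's real directory):

```sql
-- Hypothetical fix: replace the malformed location with an absolute URI.
-- The warehouse path below is the default one, assumed for illustration.
ALTER TABLE parquet_armin_flow
SET LOCATION 'hdfs://nameservice1/user/hive/warehouse/parquet_armin_flow';
```

If ALTER TABLE itself keeps failing because the stored URI is already malformed, rewriting locations with the Hive metatool (hive --service metatool -updateLocation) is the usual fallback; check its documentation before running it against a production metastore.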
Labels:
- Apache Hadoop
- Apache Hive
- HDFS
- Security
12-09-2015
11:43 PM
I use CM version 5.4.5. After I rebooted the cluster, I also had the same problem: the Admin Console shows "Request to the Service Monitor failed. This may cause slow page responses. View the status of the Service Monitor." Who can solve this?