Member since
12-09-2015
8
Posts
0
Kudos Received
0
Solutions
04-29-2016
02:02 AM
Can you give me some advice on benchmarking Impala, including the tools, dataset, etc.?
04-28-2016
10:23 PM
Tim, thank you very much. I think I should try another benchmark dataset.
04-28-2016
02:43 AM
Tim, thank you for the reply. Yes, I see that I really do have data skew, but how can I avoid it? By the way, how can I get the details of the partitioned join?
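For the partitioned-join details asked about here, one option is to raise Impala's explain level before running EXPLAIN on the join; levels 2 and 3 print per-operator memory estimates and the hash partitioning of each exchange. A sketch, reusing the tables from the join discussed in this thread (the exact output format varies by Impala version):

```sql
-- EXPLAIN_LEVEL=3 ('verbose') shows fragment boundaries, exchange
-- partitioning, and per-host memory estimates for the partitioned join.
SET EXPLAIN_LEVEL=3;
EXPLAIN
SELECT o.buyer_id, SUM(i.goods_amount) AS total
FROM parquet_bigdatabench_dw_order_300g o
JOIN [shuffle] parquet_bigdatabench_dw_item_300g i
  ON i.order_id = o.order_id
GROUP BY o.buyer_id
LIMIT 10;
```

This only inspects the plan without running the query, so it is a cheap way to compare the [shuffle] and [broadcast] variants of the join.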
04-26-2016
07:31 PM
Hi:
I use three nodes with 64 GB of memory each to run a join operation, but the memory on one node is exceeded no matter whether I set the mem_limit property to -1B, 60G, 50G, or 40G, while the other nodes use only 10+ GB of memory. Here is part of the profile:
Query (id=6043a6bbde2f997a:3b3072b5dcfd1d89):
Summary:
Session ID: 4141715fae32f3b5:5d20c4f989cea7b8
Session Type: BEESWAX
Start Time: 2016-04-26 18:11:31.958572000
End Time: 2016-04-27 04:19:46.109358000
Query Type: DML
Query State: EXCEPTION
Query Status:
Memory limit exceeded
Cannot perform hash join at node with id 2. Repartitioning did not reduce the size of a spilled partition. Repartitioning level 8. Number of rows 2052516353.
Impala Version: impalad version 2.3.0-cdh5.5.0 RELEASE (build 0c891d79aa38f297d244855a32f1e17280e2129b)
User: root
Connected User: root
Delegated User:
Network Address: ::ffff:192.168.55.247:40383
Default Db: default
Sql Statement: insert into table result select parquet_bigdatabench_dw_order_300g.buyer_id,sum(parquet_bigdatabench_dw_item_300g.goods_amount) as total from parquet_bigdatabench_dw_order_300g join [shuffle] parquet_bigdatabench_dw_item_300g on parquet_bigdatabench_dw_item_300g.order_id = parquet_bigdatabench_dw_order_300g.order_id group by parquet_bigdatabench_dw_order_300g.buyer_id limit 10
Coordinator: bigdata3:22000
Plan:
----------------
Estimated Per-Host Requirements: Memory=39.27GB VCores=2
F04:PLAN FRAGMENT [UNPARTITIONED]
WRITE TO HDFS [default.result, OVERWRITE=false]
| partitions=1
| hosts=1 per-host-mem=unavailable
|
08:EXCHANGE [UNPARTITIONED]
limit: 10
hosts=3 per-host-mem=unavailable
tuple-ids=2 row-size=12B cardinality=10
F03:PLAN FRAGMENT [HASH(parquet_bigdatabench_dw_order_300g.buyer_id)]
DATASTREAM SINK [FRAGMENT=F04, EXCHANGE=08, UNPARTITIONED]
07:AGGREGATE [FINALIZE]
| output: sum:merge(parquet_bigdatabench_dw_item_300g.goods_amount)
| group by: parquet_bigdatabench_dw_order_300g.buyer_id
| limit: 10
| hosts=3 per-host-mem=9.21GB
| tuple-ids=2 row-size=12B cardinality=2247426048
|
06:EXCHANGE [HASH(parquet_bigdatabench_dw_order_300g.buyer_id)]
hosts=3 per-host-mem=0B
tuple-ids=2 row-size=12B cardinality=2247426048
F02:PLAN FRAGMENT [HASH(parquet_bigdatabench_dw_item_300g.order_id)]
DATASTREAM SINK [FRAGMENT=F03, EXCHANGE=06, HASH(parquet_bigdatabench_dw_order_300g.buyer_id)]
03:AGGREGATE
| output: sum(parquet_bigdatabench_dw_item_300g.goods_amount)
| group by: parquet_bigdatabench_dw_order_300g.buyer_id
| hosts=3 per-host-mem=27.63GB
| tuple-ids=2 row-size=12B cardinality=2247426048
|
02:HASH JOIN [INNER JOIN, PARTITIONED]
| hash predicates: parquet_bigdatabench_dw_item_300g.order_id = parquet_bigdatabench_dw_order_300g.order_id
| hosts=3 per-host-mem=11.47GB
| tuple-ids=1,0 row-size=20B cardinality=4103971316
|
|--05:EXCHANGE [HASH(parquet_bigdatabench_dw_order_300g.order_id)]
| hosts=3 per-host-mem=0B
| tuple-ids=0 row-size=8B cardinality=4200000000
|
04:EXCHANGE [HASH(parquet_bigdatabench_dw_item_300g.order_id)]
hosts=3 per-host-mem=0B
tuple-ids=1 row-size=12B cardinality=4200000000
F01:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=05, HASH(parquet_bigdatabench_dw_order_300g.order_id)]
00:SCAN HDFS [default.parquet_bigdatabench_dw_order_300g, RANDOM]
partitions=1/1 files=87 size=21.15GB
table stats: 4200000000 rows total
column stats: all
hosts=3 per-host-mem=176.00MB
tuple-ids=0 row-size=8B cardinality=4200000000
F00:PLAN FRAGMENT [RANDOM]
DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, HASH(parquet_bigdatabench_dw_item_300g.order_id)]
01:SCAN HDFS [default.parquet_bigdatabench_dw_item_300g, RANDOM]
partitions=1/1 files=258 size=63.82GB
table stats: 4200000000 rows total
column stats: all
hosts=3 per-host-mem=176.00MB
tuple-ids=1 row-size=12B cardinality=4200000000
----------------
Estimated Per-Host Mem: 42170573209
Estimated Per-Host VCores: 2
Admission result: Admitted immediately
Request Pool: root.root
ExecSummary:
Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
---------------------------------------------------------------------------------------------------------------------------
08:EXCHANGE 1 10h8m 10h8m 0 10 0 -1.00 B UNPARTITIONED
07:AGGREGATE 3 135.686ms 407.60ms 0 2.25B 155.03 MB 9.21 GB FINALIZE
06:EXCHANGE 3 15.975ms 24.767ms 1.20M 2.25B 0 0 HASH(parquet_bigdatabench_d...
03:AGGREGATE 3 887.849ms 1s340ms 1.20M 2.25B 155.02 MB 27.63 GB
02:HASH JOIN 3 3h19m 9h53m 1.50M 4.10B 31.46 GB 11.47 GB INNER JOIN, PARTITIONED
|--05:EXCHANGE 3 1m2s 2m5s 4.20B 4.20B 0 0 HASH(parquet_bigdatabench_d...
| 00:SCAN HDFS 3 12s695ms 16s494ms 4.20B 4.20B 485.76 MB 176.00 MB default.parquet_bigdatabenc...
04:EXCHANGE 3 59s722ms 2m59s 4.20B 4.20B 0 0 HASH(parquet_bigdatabench_d...
01:SCAN HDFS 3 14s341ms 19s831ms 4.20B 4.20B 205.20 MB 176.00 MB default.parquet_bigdatabenc...
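Given the "Repartitioning did not reduce the size of a spilled partition" error above, one follow-up worth trying is to look at the join-key distribution directly: repeated repartitioning only fails like this when very many rows share the same key, so a spilled partition can never shrink. A sketch using the tables from the statement above (run it for both sides of the join):

```sql
-- Find the heaviest join keys; a single order_id (often NULL or a
-- default value) holding a huge share of rows would explain a spilled
-- partition that repartitioning cannot reduce.
SELECT order_id, COUNT(*) AS cnt
FROM parquet_bigdatabench_dw_item_300g
GROUP BY order_id
ORDER BY cnt DESC
LIMIT 20;
```

If a few keys dominate, common workarounds are filtering them out, handling them in a separate query, or pre-aggregating before the join.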
Labels:
- Apache Impala
04-11-2016
07:35 PM
Hi: The datanode has three directories to store data: /data/1, /data/2, and /data/3. I deleted /data/1 on the datanode by mistake, and then HDFS showed missing blocks. I copied the data from /data/3 to /data/1, but it didn't work. Thanks and regards, leezy
Labels:
- HDFS
04-08-2016
01:36 AM
Hi: My cluster went down when I changed the namenode directory. When I recovered it, the Hive table was still in the metastore, but its location was wrong and I could not select any rows. I then wanted to change the location of the table, but it showed me this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://nameservice1parquet_armin_flow
The Hive metastore error log follows:
2016-04-08 16:35:19,101 ERROR org.apache.hadoop.hive.metastore.RetryingHMSHandler: [pool-3-thread-134]: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://nameservice1parquet_armin_flow)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5417)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3452)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_cascade(HiveMetaStore.java:3404)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:133)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
    at com.sun.proxy.$Proxy12.alter_table_with_cascade(Unknown Source)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_cascade.getResult(ThriftHiveMetastore.java:9400)
    at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_cascade.getResult(ThriftHiveMetastore.java:9384)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://nameservice1parquet_armin_flow
    at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    at org.apache.hadoop.fs.Path.<init>(Path.java:116)
    at org.apache.hadoop.fs.Path.<init>(Path.java:94)
    at org.apache.hadoop.hive.metastore.Warehouse.getTablePath(Warehouse.java:188)
    at org.apache.hadoop.hive.metastore.Warehouse.getFileStatusesForUnpartitionedTable(Warehouse.java:542)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.updateUnpartitionedTableStatsFast(MetaStoreUtils.java:179)
    at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:237)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3432)
    ... 21 more
Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://nameservice1parquet_armin_flow
    at java.net.URI.checkPath(URI.java:1804)
    at java.net.URI.<init>(URI.java:752)
    at org.apache.hadoop.fs.Path.initialize(Path.java:203)
    ... 28 more
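Worth noting about the error above: the URI hdfs://nameservice1parquet_armin_flow is missing the "/" between the nameservice and the path, which is why the metastore rejects it as a relative path. A hedged sketch of the kind of statement that sets a fully qualified location (the /user/hive/warehouse path is the Hive default and is an assumption here; substitute the table's real directory):

```sql
-- Hypothetical fix: replace the malformed location with an absolute URI.
-- The warehouse path below is the default one, assumed for illustration.
ALTER TABLE parquet_armin_flow
SET LOCATION 'hdfs://nameservice1/user/hive/warehouse/parquet_armin_flow';
```

If ALTER TABLE itself keeps failing because the stored URI is already malformed, rewriting locations with the Hive metatool (hive --service metatool -updateLocation) is the usual fallback; check its documentation before running it against a production metastore.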
Labels:
- Apache Hadoop
- Apache Hive
- HDFS
- Security
12-09-2015
11:43 PM
I use CM version 5.4.5. After I rebooted the cluster, I also had the same problem: the Admin Console shows "Request to the Service Monitor failed. This may cause slow page responses. View the status of the Service Monitor." Who can solve this?