Member since
07-01-2015
460
Posts
78
Kudos Received
43
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1370 | 11-26-2019 11:47 PM | |
1314 | 11-25-2019 11:44 AM | |
9550 | 08-07-2019 12:48 AM | |
2199 | 04-17-2019 03:09 AM | |
3535 | 02-18-2019 12:23 AM |
09-06-2018
12:12 AM
Thanks for the clarification, this is the hdfs-site.xml. I am little bit confused, so the WebHDFS REST API is listening on the same port as the NameNode's UI? Thanks <property>
<name>dfs.ha.namenodes.hanameservice</name>
<value>namenode18,namenode21</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hanameservice.namenode18</name>
<value>ip-10-197-31-86.eu-west-1.compute.internal:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.hanameservice.namenode18</name>
<value>ip-10-197-31-86.eu-west-1.compute.internal:8022</value>
</property>
<property>
<name>dfs.namenode.http-address.hanameservice.namenode18</name>
<value>ip-10-197-31-86.eu-west-1.compute.internal:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.hanameservice.namenode18</name>
<value>ip-10-197-31-86.eu-west-1.compute.internal:50470</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hanameservice.namenode21</name>
<value>ip-10-197-7-125.eu-west-1.compute.internal:8020</value>
</property>
<property>
<name>dfs.namenode.servicerpc-address.hanameservice.namenode21</name>
<value>ip-10-197-7-125.eu-west-1.compute.internal:8022</value>
</property>
<property>
<name>dfs.namenode.http-address.hanameservice.namenode21</name>
<value>ip-10-197-7-125.eu-west-1.compute.internal:50070</value>
</property>
<property>
<name>dfs.namenode.https-address.hanameservice.namenode21</name>
<value>ip-10-197-7-125.eu-west-1.compute.internal:50470</value>
... View more
09-05-2018
11:35 PM
Without knowing more details it looks like your agent does not have (configured or stored) the certificate of the Cloudera Manager server, thus cannot connect to 7183 port. Make sure you follow Cloudera's guide how to setup TLS https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_sg_config_tls_security.html
... View more
09-04-2018
10:51 PM
1 Kudo
Yes it is. But you have to prepare the keytab files in advance on your REST API server and prepare a mapping file. Then you can switch between users. See the Simba documentation for more details
... View more
09-03-2018
01:00 AM
1 Kudo
This is also my experience. There is no way to escape from this large memory hungry queries by hoping that you trade it off for a disk spill and you deliver the results (slower than expected but still deliver) The Impala just works in memory, you have to accept it. And the memory limit is not for the spilling, but for exactly this reason, to kill the query, to avoid consuming all the resources. So if you were hoping that there is a setting, which can tell the Impala to not consume more memory but still finish the query, I will dissapoint you, I did not find anything like that.
... View more
09-03-2018
12:46 AM
1 Kudo
As far as I understand the docs the zookeeper is "open" and if you want to create something "protected" you have to do it explicitly. Create a protected znode from within the ZooKeeper CLI. Make sure that you substitute YOUR-REALM as appropriate. create /znode1 znode1data sasl:zkcli@{{YOUR-REALM}}:cdwra https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cdh_sg_zookeeper_security.html
... View more
08-30-2018
06:22 AM
1 Kudo
Hi, you have to create an ssl_context before you open the connection and point it to the CA certificate: import ssl
context = ssl.create_default_context(cafile=cmcertpath)
api = ApiResource( cm_host, '7183', username=username, password=password, use_tls=True, ssl_context=context)
... View more
08-30-2018
05:59 AM
Hi,
having a non-ha setup of kerberized HDFS where the Namenode is running but
netstat -an | grep 50070
does not return anything (running as root on namenode host)
The WebHDFS is enabled in CM:
dfs.webhdfs.enabled=True
Any hints why this port is not open and why the NN is not listening? Thanks
... View more
Labels:
- Labels:
-
Cloudera Manager
-
HDFS
-
Kerberos
08-30-2018
01:46 AM
1 Kudo
Problem resolved: the issue was with the overall memory consumption on the HiveServer2. The hive job submitted do not run in the HS2 JVM, but as an extra java process and the OS was killing it because it could not allocate such amount of memory.
... View more
08-29-2018
03:57 AM
You can materialize this metric table with 60 rows, and then Impala will broadcast it to every node (or Hive will run a map-side join) so it will not affect the shuffling over network. However the result set will be N times larger, but that's the point of the query right? (produce from 100k with 10 metrics a 1M table)
... View more
08-29-2018
12:36 AM
Hi, some of the Hive queries started to fail with this error message, so I checked the web and community, and usually the response is the setting the autoconvert.join to false can help. The error appears very quickly, almost immediately after the query is submitted. I know that Hive is trying to create a map-side join but I dont understand why it fails, when X months it was running ok. The actual tables are very small (<1000 rows) so I dont think the hash-table can cause the out-of-memory. 2018-08-29 04:23:46,468 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,468 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,468 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: <PERFLOG method=acquireReadWriteLocks from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,480 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: </PERFLOG method=acquireReadWriteLocks start=1535509426468 end=1535509426480 duration=12 from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,480 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,480 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-138349]: Executing command(queryId=hive_20180829042323_49ab08f3-b5e3-45db-9ca5-94ce91d63b11): INSERT OVERWRITE TABLE `prod_work`.`table` SELECT
`esub`.`yesterday` AS `yesterday`,
`esub`.`product_group` AS `product_group`,
`esub`.`priceplan` AS `priceplan`,
`esub`.`brand` AS `brand`,
`esub`.`brand_desc` AS `brand_desc`,
`esub`.`segment` AS `segment`,
`esub`.`product_payment_type` AS `product_payment_type`,
`esub`.`esub` AS `esub`,
`rnch`.`rnch` AS `rnch`,
`gradd`.`gradd` AS `gradd`
FROM `prod_work`.`tablea` `esub`
LEFT JOIN `prod_work`.`tableb` `rnch`
ON `esub`.`priceplan` = `rnch`.`priceplan`
LEFT JOIN `prod_work`.`tablec` `gradd`
ON `esub`.`priceplan` = `gradd`.`priceplan`
2018-08-29 04:23:46,480 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-138349]: Query ID = hive_20180829042323_49ab08f3-b5e3-45db-9ca5-94ce91d63b11
2018-08-29 04:23:46,480 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-138349]: Total jobs = 1
2018-08-29 04:23:46,480 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: </PERFLOG method=TimeToSubmit start=1535509426468 end=1535509426480 duration=12 from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,480 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,481 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-138349]: Starting task [Stage-6:MAPREDLOCAL] in serial mode
2018-08-29 04:23:46,481 INFO org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask: [HiveServer2-Background-Pool: Thread-138349]: Generating plan file file:/tmp/hive/dbb7f6a5-a988-466b-8170-58bbcc924fb9/hive_2018-08-29_04-23-46_256_3553242332203703514-2517/-local-10004/plan.xml
2018-08-29 04:23:46,481 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: <PERFLOG method=serializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2018-08-29 04:23:46,481 INFO org.apache.hadoop.hive.ql.exec.Utilities: [HiveServer2-Background-Pool: Thread-138349]: Serializing MapredLocalWork via kryo
2018-08-29 04:23:46,483 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: </PERFLOG method=serializePlan start=1535509426481 end=1535509426483 duration=2 from=org.apache.hadoop.hive.ql.exec.Utilities>
2018-08-29 04:23:46,532 WARN org.apache.hadoop.hive.conf.HiveConf: [HiveServer2-Background-Pool: Thread-138349]: HiveConf of name hive.server2.idle.session.timeout_check_operation does not exist
2018-08-29 04:23:46,532 WARN org.apache.hadoop.hive.conf.HiveConf: [HiveServer2-Background-Pool: Thread-138349]: HiveConf of name hive.sentry.conf.url does not exist
2018-08-29 04:23:46,532 WARN org.apache.hadoop.hive.conf.HiveConf: [HiveServer2-Background-Pool: Thread-138349]: HiveConf of name hive.entity.capture.input.URI does not exist
2018-08-29 04:23:46,543 INFO org.apache.hadoop.hdfs.DFSClient: [HiveServer2-Background-Pool: Thread-138349]: Created token for hive: HDFS_DELEGATION_TOKEN owner=hive/ip-10-197-23-43.eu-west-1.compute.internal@DOMAIN.LOCAL, renewer=hive, realUser=, issueDate=1535509426541, maxDate=1536114226541, sequenceNumber=1381844, masterKeyId=457 on ha-hdfs:hanameservice
2018-08-29 04:23:46,544 INFO org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask: [HiveServer2-Background-Pool: Thread-138349]: Executing: /opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/hadoop/bin/hadoop jar /opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/jars/hive-common-1.1.0-cdh5.13.3.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -libjars file:///opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/hive/auxlib/hive-exec-1.1.0-cdh5.13.3-core.jar,file:///opt/cloudera/parcels/CDH-5.13.3-1.cdh5.13.3.p0.2/lib/hive/auxlib/hive-exec-core.jar -localtask -plan file:/tmp/hive/dbb7f6a5-a988-466b-8170-58bbcc924fb9/hive_2018-08-29_04-23-46_256_3553242332203703514-2517/-local-10004/plan.xml -jobconffile file:/tmp/hive/dbb7f6a5-a988-466b-8170-58bbcc924fb9/hive_2018-08-29_04-23-46_256_3553242332203703514-2517/-local-10005/jobconf.xml
2018-08-29 04:23:46,583 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-138349]: Execution failed with exit status: 1
2018-08-29 04:23:46,583 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-138349]: Obtaining error information
2018-08-29 04:23:46,583 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-138349]:
Task failed!
Task ID:
Stage-6
Logs:
2018-08-29 04:23:46,583 ERROR org.apache.hadoop.hive.ql.exec.Task: [HiveServer2-Background-Pool: Thread-138349]: /var/log/hive/hadoop-cmf-CD-HIVE-nRaFPvFN-HIVESERVER2-ip-10-197-23-43.eu-west-1.compute.internal.log.out
2018-08-29 04:23:46,583 ERROR org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask: [HiveServer2-Background-Pool: Thread-138349]: Execution failed with exit status: 1
2018-08-29 04:23:46,584 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-138349]: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
2018-08-29 04:23:46,584 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: </PERFLOG method=Driver.execute start=1535509426480 end=1535509426584 duration=104 from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,584 INFO org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-138349]: Completed executing command(queryId=hive_20180829042323_49ab08f3-b5e3-45db-9ca5-94ce91d63b11); Time taken: 0.104 seconds
2018-08-29 04:23:46,584 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,594 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-138349]: </PERFLOG method=releaseLocks start=1535509426584 end=1535509426594 duration=10 from=org.apache.hadoop.hive.ql.Driver>
2018-08-29 04:23:46,595 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-138349]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748) Is it probably related to the HiveServer memory configuration? Do I understand it correctly that this fail happens inside the HiveServer2 JVM, when it tries to build a hash-table for a map-side join? Thanks
... View more
Labels:
- Labels:
-
Apache Hive