Member since: 12-10-2018
Posts: 11
Kudos Received: 3
Solutions: 0
04-07-2020
10:27 AM
Hi All, while working with the Spark action in Oozie, we are facing an issue with the hive-warehouse-connector. This only happens when we run the Spark action through Oozie in YARN cluster mode. hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar is a fat jar that bundles many other jars, and these conflict with the Spark jars. We don't know how to proceed. We tried putting the connector jar in Oozie's Spark sharelib folder and passing it with the Spark --jars option, but no luck.

Oozie Spark action example:

<spark>
  <master>${sparkMaster}</master>
  <mode>cluster</mode>
  <name>XXX</name>
  <class>XXXX</class>
  <jar>${nameNode}${baseDir}/analytics-1.0.jar</jar>
  <spark-opts>--jars /usr/hdp/current/spark2-client/jars/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar --conf spark.yarn.appMasterEnv.HADOOP_USER_NAME=${hiveDbUser} --conf spark.driver.userClassPathFirst=false</spark-opts>
</spark>

Error:

SLF4J: Found binding in [jar:file:/hadoop/yarn/local/filecache/9331/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop/yarn/local/usercache/hiveuser1/filecache/323/zz-hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.io.retry.RetryUtils.getDefaultRetryPolicy(Lorg/apache/hadoop/conf/Configuration;Ljava/lang/String;ZLjava/lang/String;Ljava/lang/String;Ljava/lang/Class;)Lorg/apache/hadoop/io/retry/RetryPolicy;
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:318)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:235)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:234)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:232)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:232)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:197)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:197)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:838)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

Thanks
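As a diagnostic sketch (not from the original post; Scala, assuming it is run from spark-shell or added temporarily to the driver code): a NoSuchMethodError like the one above usually means org.apache.hadoop.io.retry.RetryUtils is being resolved from a different Hadoop copy than the cluster's, and printing the class's code source shows which jar actually won on the classpath.

// Print where the conflicting class was loaded from. If this points at the
// hive-warehouse-connector assembly instead of the cluster's Hadoop jars,
// the fat jar is shadowing the Hadoop client classes.
val cls = Class.forName("org.apache.hadoop.io.retry.RetryUtils")
println(cls.getProtectionDomain.getCodeSource.getLocation)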
02-10-2020
02:42 AM
2 Kudos
We are already in February. Are we making CDP (Cloudera Manager, etc.) open source? Earlier it was said it would be open sourced in Feb 2020. Right now CDP 7.x is only available as a 2-month trial; otherwise it is a dead end without a subscription.
01-08-2020
12:54 AM
It's still not clear. If it's open source, why is Cloudera asking for a trial-version installation? It was not like that with CDH 5 or 6: at least we could download the parcels, install them as CDH, and purchase later if we wanted. It's confusing 🙂
01-08-2020
12:46 AM
Hi All, while working with Spark 2.3, Hive, and YARN (HDP 3.1), every job works fine and completes gracefully. But the overall Spark job takes about the same time on very small data as it does on large data. For example, scheduling the job on YARN takes some time whether the data is large or small. So even simple Spark queries on small data take 45 seconds to over a minute to finish (I guess this includes YARN's scheduling and resource-management time), whereas a database takes only a few seconds to run the same query. Can we reduce this time with Spark on our HDP 3.1 cluster (6 machines)? Or is there another mode to run Spark against small data in Hive in less time, at least for testing? Thanks
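A minimal sketch of the kind of quick local test asked about above, assuming Spark 2.3 with Hive support on a node where hive-site.xml is on the classpath; the database and table names are placeholders, and on HDP 3.1 Hive managed (ACID) tables would still need the Hive Warehouse Connector. Running with master local[*] skips YARN scheduling and container allocation entirely, so it is only suitable for small test data:

import org.apache.spark.sql.SparkSession

object LocalSmokeTest {
  def main(args: Array[String]): Unit = {
    // Run in-process instead of yarn-cluster: no YARN queueing or AM startup cost.
    val spark = SparkSession.builder()
      .appName("local-smoke-test")
      .master("local[*]")
      .enableHiveSupport()   // needs hive-site.xml on the classpath
      .getOrCreate()

    spark.sql("SELECT count(*) FROM some_db.some_table").show()  // placeholder table
    spark.stop()
  }
}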
01-06-2020
02:52 AM
1 Kudo
I have a simple question: is CDP (CDH 7.x) open source?
When I try to download CM7 from this URL, https://archive.cloudera.com/p/cm7/,
it asks for a password.
I am following this URL https://docs.cloudera.com/cdpdc/7.0/installation/topics/cdpdc-configure-repository.html
Thanks
11-05-2019
02:52 AM
Hi All,
We are using HDP 3.1.0 and trying to run a Spark 2.3.2 job using Oozie, but we are getting the error below. Please let me know what we are missing.
workflow.xml file:
<workflow-app name="ABC" xmlns="uri:oozie:workflow:0.4"> <parameters> <property> <name>sparkMaster</name> <value>yarn</value> </property> <property> <name>oozie.use.system.libpath</name> <value>true</value> </property> </parameters> <start to="ABC"/> <action name="Demand_History_Rollup"> <spark xmlns="uri:oozie:spark-action:0.1"> <job-tracker>${resourceManager}</job-tracker> <name-node>${nameNode}</name-node> <configuration> <property> <name>mapred.job.queue.name</name> <value>default</value> </property> <property> <name>oozie.use.system.libpath</name> <value>true</value> </property> <property> <name>oozie.libpath</name> <value>${nameNode}/user/oozie/share/lib*</value> </property> <property> <name>oozie.action.sharelib.for.spark</name> <value>spark,hive2</value> </property> </configuration> <master>yarn</master> <mode>cluster</mode> <name>Demand_History_Rollup</name> <class>com.xxx.ABC</class> <jar>${nameNode}${baseDir}/workflows/spark/abc.jar</jar> <spark-opts>--files ${nameNode}${baseDir}/workflows/hive-site.xml,${nameNode}${baseDir}/workflows/core-site.xml,${nameNode}${baseDir}/workflows/hdfs-site.xml</spark-opts> <arg>HIVE_JDBC_STRING=${hiveConnectionString}</arg> <arg>HIVE_DB_NAME=${hiveDbName}</arg> <arg>HIVE_DB_USER=${hiveDbUser}</arg> <arg>HIVE_DB_PASSWORD=${hiveDbPassword}</arg> <arg>HADOOP_USER_NAME=${hiveDbUser}</arg> <arg>SPARK_MASTER=${sparkMaster}</arg> </spark> <ok to="end"/> <error to="kill"/> </action> <kill name="kill"> <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> </kill> <end name="end"/> </workflow-app>
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hadoop/yarn/local/filecache/83112/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop/yarn/local/filecache/83012/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop/yarn/local/filecache/83123/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.AbstractMethodError: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy()Lorg/apache/hadoop/io/retry/FailoverProxyProvider$ProxyInfo;
at org.apache.hadoop.io.retry.RetryInvocationHandler$ProxyDescriptor.<init>(RetryInvocationHandler.java:197)
at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:328)
at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:322)
at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:59)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:147)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:234)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:232)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:232)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:197)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:197)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:838)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
06-28-2019
03:21 PM
I did the Hive integration with Ranger and PAM authentication, which works correctly in a single-machine environment. Can somebody explain the difference between the PAM services sshd and passwd? Which one should we use in a multi-machine HDP environment, where Ranger Usersync and HiveServer2 are on two different hosts? Thanks in advance
06-26-2019
03:03 AM
We have integrated Hive and Ranger (HDP 3.1) and we are able to log in using Beeline. We created a username and password using Ranger's user-creation page, but Beeline only validates the username, not the password: it accepts any password for that user and still grants access to the tables and schema. 1) If we provide a wrong username, it says access denied. 2) If we provide a correct username and a wrong password, it still logs in and can access data, which is wrong; it should say access denied. Thanks in advance.
06-22-2019
01:41 PM
Hi All, I am working on integrating Ranger and Presto and am following this URL, but I ended up with an error and the integration is not successful.

Steps:
1. Created a ranger-${RANGER_VERSION}-presto-plugin.tar.gz file.
2. Extracted the tar file to ranger-presto-plugin.
3. Changed the install.properties file.
4. Ran ./enable-presto-plugin.sh.

The script ran successfully, but it did not create the access-control.properties and rules.json files, so I created these files manually in /etc/presto. Now Presto does not start and fails with the error below.

2019-06-21T04:59:21.731Z ERROR main com.facebook.presto.server.PrestoServer Access control ranger is not registered
java.lang.IllegalStateException: Access control ranger is not registered
    at com.google.common.base.Preconditions.checkState(Preconditions.java:585)
    at com.facebook.presto.security.AccessControlManager.setSystemAccessControl(AccessControlManager.java:136)
    at com.facebook.presto.security.AccessControlManager.loadSystemAccessControl(AccessControlManager.java:118)
    at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:142)
    at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:73)

Do we need to use ranger-servicedef-presto.json anywhere? It is not mentioned in the Ranger-Presto integration URL. Or is there another link I can follow that has all the information?

Thanks
Bharat Bhushan
05-04-2019
03:43 PM
Hi All, I am working on the Ranger Kylin plugin installation and have completed it. Ranger shows the Kylin service and policy on its web page, the Ranger admin UI connects to Kylin successfully, and the policy-cache JSON file in /etc/ranger is also created. I have also updated the kylin.properties file with "kylin.server.external-acl-provider". Now, when I create a new user in the Ranger admin UI and give that user permission on the Kylin policy, I am not able to log in to the Kylin GUI with that same user. I checked the logs and they say "User not found". Only Spring Security code appears in the exception; there is no Ranger code in the stack trace. I guess Kylin has not switched to Ranger authentication. Do I need to add more properties in kylin.properties to switch to the Ranger policy, so Kylin can authenticate using Ranger users? Thanks
12-10-2018
07:37 PM
Hi All, I have an already set up HDP 3.0 cluster with 7 machines in total, 3 of them worker nodes. I have also installed Ranger for security, and a user is already created for Hive. Now I want to access a Hive table using Spark. I have tried both spark-shell and Java code, but both get the same error. I am using spark-llap to provide the username/password for Hive, but Spark throws the exception "No service instances found in registry".

Log:

scala> df.show
[Stage 0:> (0 + 0) / 1]18/12/10 17:08:51 WARN TaskSetManager: Stage 0 contains a task of very large size (423 KB). The maximum recommended task size is 100 KB.
[Stage 0:> (0 + 1) / 1]18/12/10 17:08:55 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, s1.us-east4-c.c.asdf-224606.internal, executor 2): java.lang.RuntimeException: java.io.IOException: No service instances found in registry
    at com.hortonworks.spark.sql.hive.llap.HiveWarehouseDataReaderFactory.createDataReader(HiveWarehouseDataReaderFactory.java:66)

If my table has 0 records, it works fine and shows me a blank table, but if there are more than 0 records it throws the exception above. Thanks in advance.
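A minimal sketch, assuming an HDP 3.x cluster with Hive LLAP (HiveServer2 Interactive) running, of the configuration the Hive Warehouse Connector needs before HiveWarehouseSession can locate LLAP daemons; the ZooKeeper hosts, JDBC URL, LLAP app name, and table name below are placeholders to verify against the cluster. "No service instances found in registry" typically appears when these settings are missing or LLAP is not running:

import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

// These configs are normally passed via --conf on spark-shell/spark-submit;
// they are shown on the builder here only for readability.
val spark = SparkSession.builder()
  .appName("hwc-read-test")
  .config("spark.sql.hive.hiveserver2.jdbc.url",
    "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive")  // placeholder
  .config("spark.hadoop.hive.llap.daemon.service.hosts", "@llap0")              // default LLAP app name, verify on the cluster
  .config("spark.hadoop.hive.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181")   // placeholder
  .getOrCreate()

val hive = HiveWarehouseSession.session(spark).build()
hive.executeQuery("SELECT * FROM some_db.some_table").show()   // placeholder table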