Member since: 05-11-2016
Posts: 35
Kudos Received: 4
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 851 | 12-06-2016 04:19 PM
02-27-2020
07:45 AM
Turning on debug mode showed a little additional information and eventually led me to look at the code. I added a few debug lines of my own to /usr/hdp/current/superset/lib/python3.6/site-packages/flask_appbuilder/security/manager.py in _search_ldap to show the filter_str and username being passed to the LDAP search. I saw the filter_str was set to userPrincipalName=jeff.watson@our.domain, so I got rid of the @our.domain by adding AUTH_LDAP_APPEND_DOMAIN, but that still didn't work. I finally remembered that Ranger uses sAMAccountName as an AD search name, so I added AUTH_LDAP_UID_FIELD set to sAMAccountName and, poof, LDAP logins work. Note: Ambari settings aren't saved where the command-line version can find them until you save and restart Superset in Ambari; I then stopped it again so I could run it interactively and see the debug logging. I'm busy and lazy, so I didn't start removing other settings to see what I did or didn't need. Here are the settings that worked for me; our cluster is Kerberized and uses self-signed certificates:

AUTH_LDAP_UID_FIELD=sAMAccountName
AUTH_LDAP_BIND_USER=CN=Bind,OU=Admin,dc=our,dc=domain
AUTH_LDAP_SEARCH=OU=Employees,dc=our,dc=domain
AUTH_LDAP_SERVER=ldap://our.domain
AUTH_LDAP=AUTH_LDAP
AUTH_LDAP_ALLOW_SELF_SIGNED=True
AUTH_LDAP_APPEND_DOMAIN=False
AUTH_LDAP_FIRSTNAME_FIELD=givenName
AUTH_LDAP_LASTNAME_FIELD=sn
AUTH_LDAP_USE_TLS=False
AUTH_USER_REGISTRATION=True
ENABLE_KERBEROS_AUTHENTICATION=True
KERBEROS_KEYTAB=/etc/security/keytabs/superset.headless.keytab
KERBEROS_PRINCIPAL=superset-sdrdev@OUR.DOMAIN
11-14-2019
02:34 PM
Ditto on the need. I filled in all the fields in Ambari (in a Kerberized cluster) and had no luck. Is there supposed to be any logging somewhere other than /var/log/superset/superset.log that shows either issues or attempts to log in?
04-17-2019
05:47 PM
As the last step in my process, I need to check whether any more files exist in an HDFS directory. I tried FetchHDFS, which can take an incoming flow file (unlike ListHDFS, which won't accept one), but I discovered the hard way that FetchHDFS can't take wildcards, only an HDFS path and filename. I looked for, but can't find, anything on calling the existing Java HDFS methods from ExecuteScript and Groovy. I was hoping not to need to build a custom processor. The only option I've come up with so far is to write a small standalone Java app and call it using ExecuteStreamCommand, but that (presumably) loads a JVM every time. Any other ideas?
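For context, here is a minimal sketch of the plain Hadoop FileSystem calls I'd like to reach from ExecuteScript; the directory, wildcard pattern, and class name are placeholders I made up for illustration, and it assumes the Hadoop client jars and config are available to whatever runs it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsGlobCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // globStatus accepts wildcards; placeholder directory and pattern.
        FileStatus[] matches = fs.globStatus(new Path("/data/incoming/*.dat"));

        // globStatus returns null if the parent path does not exist.
        boolean moreFilesExist = matches != null && matches.length > 0;
        System.out.println("Files remaining: " + moreFilesExist);
    }
}

The same FileSystem/globStatus calls should be callable from a Groovy ExecuteScript body, assuming the Hadoop client jars are on the script's classpath.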
Labels:
- Apache NiFi
10-02-2018
12:40 PM
When I finally started building it last night, that idea dawned on me. Thanks for the confirmation. And the pretty colored boxes (labels) to segregate the flows visually make it a little less confusing.
10-01-2018
11:03 PM
Well, perhaps I should have read the quick start guide, which happily indicates that MiNiFi doesn't support process groups. So, where does that leave me? Only one process group to run a few unrelated applications in one massive, scary mess? I get that MiNiFi isn't supposed to be NiFi... but... now what?
10-01-2018
09:30 PM
It appears, from what I've looked at so far, that we can convert only one process group to a YAML file and have MiNiFi process it. We are planning to use MiNiFi to run a few different custom-built conversion applications that are unrelated to one another and then ship the data to NiFi. Is it possible to have multiple process groups processed by MiNiFi (or multiple MiNiFi instances running on the same server)? Or do I just bundle the process groups into one larger process group, convert that, and have it run on MiNiFi on Windows?
Labels:
- Apache MiNiFi
04-03-2018
06:32 PM
It's less a matter of ticket timeout and more about restricting the same user from creating lots of sessions. I'm looking into custom Ranger plugins, which look promising.
04-03-2018
06:19 PM
FYI, if there is a place to hook some Java code into HBase as a coprocessor or via some other mechanism, we're willing to take it on. Our customer stops short of letting us monkey with the actual HBase/Phoenix code base (we are using Phoenix on top of HBase).
04-03-2018
05:30 PM
We went down the route of using Phoenix/HBase as a DW but abandoned it as our data size grew, because we would have had to dramatically increase the number of servers to have enough region servers. The query times were horrible because we were almost always ignoring the row keys, other than dates, so we ended up doing a lot of large table scans. HBase is not designed to perform well when you ignore row keys, and that's what treating Phoenix like a data warehouse ultimately caused. We had to raise the timeout to 1 hour, which Ambari didn't like since the default is a 3-minute timeout. Better to stick with Hive, which is intended as a DW and with LLAP can cache data to improve speed.
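For anyone hitting the same wall, this is roughly how a longer Phoenix query timeout can be set on the client side. It's a sketch using the phoenix.query.timeoutMs and hbase.client.scanner.timeout.period properties with placeholder values and a made-up table and quorum; it is not the exact configuration we used back then (we set ours through Ambari), and the server-side HBase RPC timeouts generally have to be raised to match.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.Properties;

public class LongRunningPhoenixQuery {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Client-side Phoenix query timeout: 1 hour, in milliseconds.
        props.setProperty("phoenix.query.timeoutMs", "3600000");
        // Scanner timeout for long scans; placeholder value for illustration.
        props.setProperty("hbase.client.scanner.timeout.period", "3600000");

        // Placeholder ZooKeeper quorum and znode parent.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-host:2181:/hbase-unsecure", props);
             ResultSet rs = conn.createStatement()
                     .executeQuery("SELECT COUNT(*) FROM SOME_BIG_TABLE")) {
            while (rs.next()) {
                System.out.println("Row count: " + rs.getLong(1));
            }
        }
    }
}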
04-03-2018
05:09 PM
Is there a way to limit the number of connections per user? We are having to respond to security guidelines for databases, and they want to cap the maximum number of connections for each user (we're using Kerberos and AD) to avoid DDoS attacks.
Tags:
- Data Processing
- HBase
Labels:
- Apache HBase
03-27-2018
07:04 PM
Valid question, though. Any idea when you're going to move up to 4.8, 4.9, 4.10, or 4.11?
11-03-2017
07:08 PM
1 Kudo
Something to consider is the downstream use of Hive as well. We used String initially (mainly because we weren't given data types for the data we were loading, so String let us ignore the issue). However, when we started using SAS, those String fields were all converted to varchar(32k) and caused headaches on that end. We converted to varchar(x).
08-07-2017
10:45 PM
Artem, can you tell me how or where the ambari-admin-password-reset command is located? I'm guessing it's an alias, but it's not set for root on the Ambari server. So where is it defined, since you're saying it's not a file of some sort?
07-04-2017
12:59 AM
I forgot I ripped the --jars part out of the spark-submit above because the text was too long. Here's that part. I'm seeing on some other gripes in StackOverflow about database drivers missing points to the driver not being included in the class path. As you can see in the --conf spark.driver.extraClassPath, --conf spark.executor.extraClassPath, and --jars, I tried to provide the /usr/hdp/current/phoenix-client/phoenix-client.jar driver in all contexts. spark-submit --jars /home/jwatson/sdr/bin/e2parser-1.0.jar,/home/jwatson/sdr/bin/f18parser-1.0.jar,/home/jwatson/sdr/bin/mdanparser-1.0.jar,/home/jwatson/sdr/bin/regimerecog-1.0.jar,/home/jwatson/sdr/bin/tsvparser-1.0.jar,/home/jwatson/sdr/bin/xmlparser-1.0.jar,/home/jwatson/sdr/bin/aws-java-sdk-1.11.40.jar,/home/jwatson/sdr/bin/aws-java-sdk-s3-1.11.40.jar,/home/jwatson/sdr/bin/jackson-annotations-2.6.5.jar,/home/jwatson/sdr/bin/jackson-core-2.6.5.jar,/home/jwatson/sdr/bin/jackson-databind-2.6.5.jar,/home/jwatson/sdr/bin/jackson-module-paranamer-2.6.5.jar,/home/jwatson/sdr/bin/jackson-module-scala_2.10-2.6.5.jar,/home/jwatson/sdr/bin/miglayout-swing-4.2.jar,/home/jwatson/sdr/bin/commons-configuration-1.6.jar,/home/jwatson/sdr/bin/xml-security-impl-1.0.jar,/home/jwatson/sdr/bin/metrics-core-2.2.0.jar,/home/jwatson/sdr/bin/jcommon-1.0.0.jar,/home/jwatson/sdr/bin/ojdbc6.jar,/home/jwatson/sdr/bin/jopt-simple-4.5.jar,/home/jwatson/sdr/bin/ucanaccess-3.0.1.jar,/home/jwatson/sdr/bin/httpcore-nio-4.4.5.jar,/home/jwatson/sdr/bin/nifi-site-to-site-client-1.0.0.jar,/home/jwatson/sdr/bin/nifi-spark-receiver-1.0.0.jar,/home/jwatson/sdr/bin/commons-compiler-2.7.8.jar,/home/jwatson/sdr/bin/janino-2.7.8.jar,/home/jwatson/sdr/bin/hsqldb-2.3.1.jar,/home/jwatson/sdr/bin/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar,/home/jwatson/sdr/bin/slf4j-api-1.7.21.jar,/home/jwatson/sdr/bin/slf4j-log4j12-1.7.21.jar,/home/jwatson/sdr/bin/slf4j-simple-1.7.21.jar,/home/jwatson/sdr/bin/snappy-java-1.1.1.7.jar,/home/jwatson/sdr/bin/snakeyaml-1.7.jar,local://usr/hdp/current/hadoop-client/client/hadoop-common.jar,local://usr/hdp/current/hadoop-client/client/hadoop-mapreduce-client-core.jar,local://usr/hdp/current/hadoop-client/client/jetty-util.jar,local://usr/hdp/current/hadoop-client/client/netty-all-4.0.23.Final.jar,local://usr/hdp/current/hadoop-client/client/paranamer-2.3.jar,local://usr/hdp/current/hadoop-client/lib/commons-cli-1.2.jar,local://usr/hdp/current/hadoop-client/lib/httpclient-4.5.2.jar,local://usr/hdp/current/hadoop-client/lib/jetty-6.1.26.hwx.jar,local://usr/hdp/current/hadoop-client/lib/joda-time-2.8.1.jar,local://usr/hdp/current/hadoop-client/lib/log4j-1.2.17.jar,local://usr/hdp/current/hbase-client/lib/hbase-client.jar,local://usr/hdp/current/hbase-client/lib/hbase-common.jar,local://usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar,local://usr/hdp/current/hbase-client/lib/hbase-protocol.jar,local://usr/hdp/current/hbase-client/lib/hbase-server.jar,local://usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar,local://usr/hdp/current/hive-client/lib/antlr-runtime-3.4.jar,local://usr/hdp/current/hive-client/lib/commons-collections-3.2.2.jar,local://usr/hdp/current/hive-client/lib/commons-dbcp-1.4.jar,local://usr/hdp/current/hive-client/lib/commons-pool-1.5.4.jar,local://usr/hdp/current/hive-client/lib/datanucleus-api-jdo-4.2.1.jar,local://usr/hdp/current/hive-client/lib/datanucleus-core-4.1.6.jar,local://usr/hdp/current/hive-client/lib/datanucleus-rdbms-4.1.7.jar,local://usr/hdp/current/hive-client/lib/geronimo-jta_1.1_spec-1.1.1.jar,local://usr/hdp
/current/hive-client/lib/hive-exec.jar,local://usr/hdp/current/hive-client/lib/hive-jdbc.jar,local://usr/hdp/current/hive-client/lib/hive-metastore.jar,local://usr/hdp/current/hive-client/lib/jdo-api-3.0.1.jar,local://usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar,local://usr/hdp/current/phoenix-client/phoenix-client.jar,local://usr/hdp/current/spark-client/lib/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar
07-04-2017
12:48 AM
As you can see from the comments above, I have the libraries defined in the spark-submit command, although for kicks I added what you recommended in Ambari, but I got the same error. I'm writing in Java and calling newAPIHadoopRDD(), which is ultimately making a JDBC connection.
07-04-2017
12:44 AM
One final note: this code runs successfully in Eclipse using --master local[*]; it's just YARN cluster mode where things break down. Go figure.
07-04-2017
12:41 AM
One last point: the driver upstream from the above connects to Phoenix successfully during app startup. The code above is where we query Phoenix with the SQL query shown, to pull rows and kick off an RDD per row returned. It seems like we enter a different context in the call to sparkContext.newAPIHadoopRDD() and the foreach(rdd -> ...), and the stack trace gives me the impression we are (duh) somewhere between the driver and the executors that are trying to instantiate the Phoenix driver. In other parts of the code I had to add a Class.forName("org.apache.phoenix.jdbc.PhoenixDriver") to get rid of this exception, so I added that call before creating the Java Spark Context, before the call to newAPIHadoopRDD(), and at the start of the foreach(rdd -> ...), but to no avail.
07-04-2017
12:31 AM
Here is the slightly longer exception that is logged by the outer part of my application:
2017-06-28 21:28:25 ERROR AppDriver Fatal exception encountered. Job aborted.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, master.vm.local): java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:116)
... 12 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1642)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:622)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1869)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1882)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1953)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:919)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:917)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:917)
at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:332)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:46)
at mil.navy.navair.sdr.common.framework.ReloadInputReader.readInputInDriver(ReloadInputReader.java:137)
Caused by: java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
07-04-2017
12:28 AM
FYI, in other parts of the app I added code that forced the Phoenix JDBC driver to load, but it doesn't seem to work in this context. The call to ConnectionUtil.getInputConnection(configuration, props) is the code I tracked down in the stack trace below that builds the connection; I used it to verify I was getting the correct connection (it is the valid JDBC URL).

// Build the HBase/Phoenix configuration for the Phoenix input format.
final Configuration configuration = HBaseConfiguration.create();
configuration.set(HConstants.ZOOKEEPER_CLIENT_PORT, "2181");
configuration.set(HConstants.ZOOKEEPER_ZNODE_PARENT, quorumParentNode);
configuration.set(HConstants.ZOOKEEPER_QUORUM, quorum);

// Verify the connection the input format will build is the one we expect.
Properties props = new Properties();
Connection conn = ConnectionUtil.getInputConnection(configuration, props);
log.info("Connection: " + conn.getMetaData().getURL());
log.info("Ingest DBC: " + ingestDbConn);
log.info("driver host name: " + driverHost);
log.info("Zookeeper quorum: " + quorum);
log.info("Reload query: " + sqlQuery);

// Tell Phoenix which table, query, and writable class to use for input.
PhoenixConfigurationUtil.setPhysicalTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setInputTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setOutputTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setInputQuery(configuration, sqlQuery);
PhoenixConfigurationUtil.setInputClass(configuration, FileContentsWritable.class);
PhoenixConfigurationUtil.setUpsertColumnNames(configuration, FileContentsWritable.COLUMN_NAMES);

// Force the Phoenix driver to register in the driver JVM.
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

@SuppressWarnings("unchecked")
JavaPairRDD<NullWritable, FileContentsWritable> fileContentsRDD = sparkContext.newAPIHadoopRDD(configuration, PhoenixInputFormat.class, NullWritable.class, FileContentsWritable.class);
fileContentsRDD.foreach(rdd ->
{
    // Force the Phoenix driver to register in the executor JVM as well.
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
    FileContentsBean fileContentsBean = rdd._2.getFileContentsBean();
    // ... remainder of the per-row processing elided ...
});
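For debugging, here is a small helper I could call from both the driver and inside the foreach closure to see which JDBC drivers each JVM's DriverManager actually has registered; JdbcDriverDump is just a name I made up for this sketch, not something in the app.

import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public final class JdbcDriverDump {
    // Diagnostic only: list every driver registered with DriverManager in the
    // current JVM. Calling it in the driver and again inside the foreach
    // closure shows whether the executors ever see the Phoenix driver.
    public static void dumpRegisteredDrivers(String where) {
        System.out.println("JDBC drivers visible at " + where + ":");
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            System.out.println("  " + drivers.nextElement().getClass().getName());
        }
    }
}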
07-04-2017
12:19 AM
Here is the output:
2017-06-28 21:28:13 INFO ReloadInputReader Connection: jdbc:phoenix:master:2181:/hbase-unsecure;
2017-06-28 21:28:13 INFO ReloadInputReader Ingest DBC: jdbc:phoenix:master:2181:/hbase-unsecure
2017-06-28 21:28:13 INFO ReloadInputReader driver host name: master.vm.local
2017-06-28 21:28:13 INFO ReloadInputReader Zookeeper quorum: master
2017-06-28 21:28:13 INFO ReloadInputReader Reload query: SELECT FILE_NAME, TM, DATASET, WORKER_NAME, FILE_CONTENTS FROM JOBS.FILE_CONTENTS WHERE FILE_NAME in (SELECT FILE_NAME FROM JOBS.file_loaded WHERE file_name='B162836D20090316T0854.AAD')
2017-06-28 21:28:13 INFO MemoryStore Block broadcast_1 stored as values in memory (estimated size 428.6 KB, free 465.7 KB)
2017-06-28 21:28:13 INFO MemoryStore Block broadcast_1_piece0 stored as bytes in memory (estimated size 34.9 KB, free 500.7 KB)
2017-06-28 21:28:13 INFO BlockManagerInfo Added broadcast_1_piece0 in memory on 192.168.56.2:51844 (size: 34.9 KB, free: 457.8 MB)
2017-06-28 21:28:13 INFO SparkContext Created broadcast 1 from newAPIHadoopRDD at ReloadInputReader.java:135
2017-06-28 21:28:14 INFO SparkContext Starting job: foreach at ReloadInputReader.java:137
2017-06-28 21:28:14 INFO DAGScheduler Got job 0 (foreach at ReloadInputReader.java:137) with 1 output partitions
2017-06-28 21:28:14 INFO DAGScheduler Final stage: ResultStage 0 (foreach at ReloadInputReader.java:137)
2017-06-28 21:28:14 INFO DAGScheduler Parents of final stage: List()
2017-06-28 21:28:14 INFO DAGScheduler Missing parents: List()
2017-06-28 21:28:14 INFO DAGScheduler Submitting ResultStage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ReloadInputReader.java:135), which has no missing parents
2017-06-28 21:28:14 INFO MemoryStore Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 503.6 KB)
2017-06-28 21:28:14 INFO MemoryStore Block broadcast_2_piece0 stored as bytes in memory (estimated size 1845.0 B, free 505.4 KB)
2017-06-28 21:28:14 INFO BlockManagerInfo Added broadcast_2_piece0 in memory on 192.168.56.2:51844 (size: 1845.0 B, free: 457.8 MB)
2017-06-28 21:28:14 INFO SparkContext Created broadcast 2 from broadcast at DAGScheduler.scala:1008
2017-06-28 21:28:14 INFO DAGScheduler Submitting 1 missing tasks from ResultStage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ReloadInputReader.java:135)
2017-06-28 21:28:14 INFO YarnClusterScheduler Adding task set 0.0 with 1 tasks
2017-06-28 21:28:14 INFO TaskSetManager Starting task 0.0 in stage 0.0 (TID 0, master.vm.local, partition 0,PROCESS_LOCAL, 2494 bytes)
2017-06-28 21:28:18 INFO BlockManagerInfo Added broadcast_2_piece0 in memory on master.vm.local:40246 (size: 1845.0 B, free: 511.1 MB)
2017-06-28 21:28:18 INFO BlockManagerInfo Added broadcast_1_piece0 in memory on master.vm.local:40246 (size: 34.9 KB, free: 511.1 MB)
2017-06-28 21:28:20 WARN TaskSetManager Lost task 0.0 in stage 0.0 (TID 0, master.vm.local): java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:116)
... 12 more
07-04-2017
12:13 AM
Here is the (wordy) spark-submit command (line breaks added for clarity):
spark-submit
--conf spark.driver.extraClassPath=__app__.jar:e2parser-1.0.jar:f18parser-1.0.jar:mdanparser-1.0.jar:regimerecog-1.0.jar:tsvparser-1.0.jar:xmlparser-1.0.jar:log4j.script.properties:common-1.0.jar:aws-java-sdk-1.11.40.jar:aws-java-sdk-s3-1.11.40.jar:jackson-annotations-2.6.5.jar:jackson-core-2.6.5.jar:jackson-databind-2.6.5.jar:jackson-module-paranamer-2.6.5.jar:jackson-module-scala_2.10-2.6.5.jar:miglayout-swing-4.2.jar:commons-configuration-1.6.jar:xml-security-impl-1.0.jar:metrics-core-2.2.0.jar:jcommon-1.0.0.jar:ojdbc6.jar:jopt-simple-4.5.jar:ucanaccess-3.0.1.jar:httpcore-nio-4.4.5.jar:nifi-site-to-site-client-1.0.0.jar:nifi-spark-receiver-1.0.0.jar:commons-compiler-2.7.8.jar:janino-2.7.8.jar:hsqldb-2.3.1.jar:pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar:slf4j-api-1.7.21.jar:slf4j-log4j12-1.7.21.jar:slf4j-simple-1.7.21.jar:snappy-java-1.1.1.7.jar:snakeyaml-1.7.jar:/usr/hdp/current/hadoop-client/client/hadoop-common.jar:/usr/hdp/current/hadoop-client/client/hadoop-mapreduce-client-core.jar:/usr/hdp/current/hadoop-client/client/jetty-util.jar:/usr/hdp/current/hadoop-client/client/netty-all-4.0.23.Final.jar:/usr/hdp/current/hadoop-client/client/paranamer-2.3.jar:/usr/hdp/current/hadoop-client/lib/commons-cli-1.2.jar:/usr/hdp/current/hadoop-client/lib/httpclient-4.5.2.jar:/usr/hdp/current/hadoop-client/lib/jetty-6.1.26.hwx.jar:/usr/hdp/current/hadoop-client/lib/joda-time-2.8.1.jar:/usr/hdp/current/hadoop-client/lib/log4j-1.2.17.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar:/usr/hdp/current/hive-client/lib/antlr-runtime-3.4.jar:/usr/hdp/current/hive-client/lib/commons-collections-3.2.2.jar:/usr/hdp/current/hive-client/lib/commons-dbcp-1.4.jar:/usr/hdp/current/hive-client/lib/commons-pool-1.5.4.jar:/usr/hdp/current/hive-client/lib/datanucleus-api-jdo-4.2.1.jar:/usr/hdp/current/hive-client/lib/datanucleus-core-4.1.6.jar:/usr/hdp/current/hive-client/lib/datanucleus-rdbms-4.1.7.jar:/usr/hdp/current/hive-client/lib/geronimo-jta_1.1_spec-1.1.1.jar:/usr/hdp/current/hive-client/lib/hive-exec.jar:/usr/hdp/current/hive-client/lib/hive-jdbc.jar:/usr/hdp/current/hive-client/lib/hive-metastore.jar:/usr/hdp/current/hive-client/lib/jdo-api-3.0.1.jar:/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar \ --conf spark.executor.extraClassPath=<same-as-above> --master yarn --deploy-mode cluster --class <app.class.name> <my-jar-file>
07-04-2017
12:04 AM
No joy. I checked, and the spark-submit job already contains those libraries. I'll post more above.
06-23-2017
07:21 PM
Well, now that's a relief. I think what threw me is that (a) we just rewrote our apps to use Spark instead of MapReduce, and (b) the package that contains PhoenixConfigurationUtil is org.apache.phoenix.mapreduce.util, so I thought we might be accidentally jumping backwards. Thanks for clarifying.
06-21-2017
06:45 PM
1 Kudo
Mark, were you at the HBase/Phoenix birds-of-a-feather at the San Jose summit last week? If so, I was sitting three seats away. This question, or something very similar, was asked there.
06-21-2017
06:36 PM
1 Kudo
I did get this working on Spark 1 (Spark 2 is tech preview). The issue was needing to use both --jars as a comma-separated list and --conf as a colon-separated list. However, I'm back to failing with the JDBC driver not found when using sparkContext.newAPIHadoopRDD(). The Phoenix driver is definitely in the --jars and --conf command-line args to spark-submit. I added Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"). This is a Java app.
06-19-2017
07:08 PM
1 Kudo
We are writing a Java app to ingest into Phoenix. This small side application needs to query the files we've loaded into Phoenix to identify which files our support team should reprocess if issues in our rules and custom binary file parsers are fixed and we need to reload data. The application uses Spark to run lots of parsers on the cluster in parallel. I was going to start with the PhoenixConfigurationUtil class, since we choose which files to reprocess using a SQL WHERE clause and PhoenixConfigurationUtil.setInputQuery lets us provide a SQL query. It's also one of the few examples of using Spark with Phoenix that's written in Java; all the other examples are in Scala. We are using HDP 2.5.3, Phoenix 4.7, and Spark 1.6. What I noticed in Spark 1.6, and it appears in Spark 2.0 as well, is that all the Scala variations mentioned on the Phoenix site related to Spark, which show calls to phoenixTableAsRDD and phoenixTableAsDataFrame, end up calling PhoenixConfigurationUtil, which is part of the org.apache.phoenix.mapreduce.util package. So my question is whether anyone recommends using PhoenixConfigurationUtil (which seems to be MapReduce-based) or one of the Scala Spark-based APIs, which at least right now call the same PhoenixConfigurationUtil. Are you expecting a pure Spark or Spark SQL based solution at some point in the future, so that phoenixTableAsRDD or phoenixTableAsDataFrame will no longer call the MapReduce Phoenix code? Or is it expected that those APIs will remain joined at the hip for the indefinite future?
Labels:
- Apache Hadoop
- Apache Phoenix
- Apache Spark
03-08-2017
10:41 PM
I've created a Spark streaming application (and I swear a month or two ago I had this working) and it runs fine in Eclipse. When I run the job using spark-submit and specify --jars, including my application jars and /usr/hdp/current/phoenix-client/phoenix-client.jar (or skip the symlink and use /usr/hdp/current/phoenix-4.7.0.2.5.3.0-37-client.jar), I get an error indicating classNotFound: org.apache.phoenix.jdbc.PhoenixDriver. In the YARN log output I can see the following entries in directory.info:

lrwxrwxrwx 1 yarn hadoop 70 Mar 7 15:37 phoenix-client.jar -> /hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar
3016594 100180 -r-x------ 1 yarn hadoop 102581542 Mar 7 15:37 ./phoenix-client.jar

In launch_container.sh I see the following:

ln -sf "/hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar" "phoenix-client.jar"

So it seems the right things are happening. I finally broke down and put the following in the driver to see what I got for class files:

ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader) cl).getURLs();
for (URL url : urls)
    System.out.println(url.getFile());

And it shows none of the jar files I added via the --jars option to spark-submit. What am I missing? As a corollary, should we build a fat jar instead and toss everything into that? What's the most efficient approach that avoids copying jar files that are already on the cluster servers (HDP 2.5.3)?
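One thing I want to rule out is that the jars passed with --jars end up on a child classloader rather than the system classloader, in which case the system loader's URL list wouldn't show them; that's my assumption, not something I've confirmed. Here is a variant of the diagnostic above that walks the whole classloader chain starting from the thread context classloader (ClassLoaderDump is just a throwaway name for the sketch).

import java.net.URL;
import java.net.URLClassLoader;

public final class ClassLoaderDump {
    // Walk the classloader chain from the thread context classloader upward,
    // printing the URLs of every URLClassLoader along the way.
    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        while (cl != null) {
            System.out.println("Classloader: " + cl);
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    System.out.println("  " + url.getFile());
                }
            }
            cl = cl.getParent();
        }
    }
}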
Labels:
- Apache Phoenix
- Apache Spark
01-31-2017
02:42 PM
Duh, looking in the Phoenix pom.xml, I find they are really using... commons-csv version 1.0. Ugh. But now I see that the Phoenix-4.7.0-HBase-1.1 pom.xml is actually for HBase version 1.1.3 and Hadoop version 2.5.1, not Hadoop version 2.5.3, which is what I'm running. I'll keep checking out releases. According to http://hortonworks.com/products/data-center/hdp/, HDP 2.5.3 should be using Phoenix 4.7.0 and HBase 1.1.2, so I'm a little lost on which version of Phoenix I should be grabbing.
01-31-2017
02:36 PM
OK, I just discovered one issue: I assumed that /usr/hdp/2.5.3.0-37/phoenix/commons-csv-1.0.jar was the same version as the org.apache.commons.csv.CSVParser bundled in phoenix-4.7.0.2.5.3.0-37-client.jar. However, the sizes of the class files are fairly different between the two jar files. That might explain why, when I step through the CSVParser code, I step right into the middle of some source code.
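A quick way to check which jar actually supplied CSVParser at runtime is a throwaway diagnostic like the one below; WhichJar is just a made-up name for the sketch.

public final class WhichJar {
    // Print the code source (jar) the running JVM loaded CSVParser from,
    // to compare against the commons-csv jar sitting on disk.
    public static void main(String[] args) {
        Class<?> clazz = org.apache.commons.csv.CSVParser.class;
        java.security.CodeSource src = clazz.getProtectionDomain().getCodeSource();
        System.out.println("CSVParser loaded from: "
                + (src != null ? src.getLocation() : "<no code source>"));
    }
}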