Member since
05-11-2016
35
Posts
4
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
1470 | 12-06-2016 04:19 PM |
02-27-2020
07:45 AM
Turning on debug mode showed a little bit of additional information and eventually got me to look at the code and added a few debug lines of my own to /usr/hdp/current/superset/lib/python3.6/site-packages/flask_appbuilder/security/manager.py in _search_ldap to show the filter_str and username being passed to the LDAP search. I saw the filter_str was set to userPrinicipalName=jeff.watson@our.domain, so I got rid of the @our.domain adding AUTH_LDAP_APPEND_DOMAIN, but that still didn't work. I finally remembered that Ranger used sSAMAcountName as an AD search name,so I changed add AUTH_LDAP_UID_FIELD as sAMAccountName and poof, LDAP logins work. Note: Ambari settings aren't saved where the command line version can find them until I saved and restarted superset in Ambari, then stopped it again so I could run it interactively to see the debug logging. I'm busy and lazy, so I didn't start removing other settings to see what I needed or didn't need, so here are the settings that worked for me. Our cluster is Kerberized and uses self signed certificates. AUTH_LDAP_UID_FIELD=sAMAccountName AUTH_LDAP_BIND_USER=CN=Bind,OU=Admin,dc=our,dc=domain AUTH_LDAP_SEARCH=OU=Employees,dc=our,dc=domain AUTH_LDAP_SERVER=ldap://our.domain AUTH_LDAP=AUTH_LDAP AUTH_LDAP_ALLOW_SELF_SIGNED=True AUTH_LDAP_APPEND_DOMAIN=False AUTH_LDAP_FIRSTNAME_FIELD=givenName AUTH_LDAP_LASTNAME_FIELD=sn AUTH_LDAP_USE_TLS=False AUTH_USER_REGISTRATION=True ENABLE_KERBEROS_AUTHENTICATION=True KERBEROS_KEYTAB=/etc/security/keytabs/superset.headless.keytab KERBEROS_PRINCIPAL=superset-sdrdev@OUR.DOMAIN
... View more
11-14-2019
02:34 PM
Ditto on the need. I filled in all the fields in Ambari (in a Kerberized cluster) and no luck. Is there supposed to be any logging somewhere other than /var/log/superset/superset.log to show either issues or attempts to login?
... View more
04-17-2019
05:47 PM
As the last step in my process, I need to check to see if any more files exist in an HDFS directory. I tried using FetchHDFS which can take an existing flow file (unlike ListHDFS which won't accept an incoming flow file), but I discovered the hard way that FetchHDFS can't take wildcards, only an HDFS path and filename. I looked for, but can't find anything on calling existing Java HDFS methods from ExecuteScript and groovy. I was hoping not to need to build a custom processor. The only option I've come up with so far is to write a small standalone Java app and call it using ExecuteStreamCommand. But that loads a JVM every time (presumably). Any other ideas?
... View more
Labels:
- Labels:
-
Apache NiFi
11-03-2017
07:08 PM
1 Kudo
Something to consider is the downstream of Hive uses as well. We used String initially (mainly because for the data we were loading, we weren't given the data types, so String allowed us to ignore it). However, when we started using SAS, those String fields all converted to varchar(32k) and caused headaches on that end. We converted to varchar(x).
... View more
08-07-2017
10:45 PM
Artem, can you tell me how or where the ambari-admin-password-reset command is located. I'm guessing it's an alias, but it's not set in root on the ambari server. So where is it defined since you're saying it's not a file of some sort.
... View more
07-04-2017
12:59 AM
I forgot I ripped the --jars part out of the spark-submit above because the text was too long. Here's that part. I'm seeing on some other gripes in StackOverflow about database drivers missing points to the driver not being included in the class path. As you can see in the --conf spark.driver.extraClassPath, --conf spark.executor.extraClassPath, and --jars, I tried to provide the /usr/hdp/current/phoenix-client/phoenix-client.jar driver in all contexts. spark-submit --jars /home/jwatson/sdr/bin/e2parser-1.0.jar,/home/jwatson/sdr/bin/f18parser-1.0.jar,/home/jwatson/sdr/bin/mdanparser-1.0.jar,/home/jwatson/sdr/bin/regimerecog-1.0.jar,/home/jwatson/sdr/bin/tsvparser-1.0.jar,/home/jwatson/sdr/bin/xmlparser-1.0.jar,/home/jwatson/sdr/bin/aws-java-sdk-1.11.40.jar,/home/jwatson/sdr/bin/aws-java-sdk-s3-1.11.40.jar,/home/jwatson/sdr/bin/jackson-annotations-2.6.5.jar,/home/jwatson/sdr/bin/jackson-core-2.6.5.jar,/home/jwatson/sdr/bin/jackson-databind-2.6.5.jar,/home/jwatson/sdr/bin/jackson-module-paranamer-2.6.5.jar,/home/jwatson/sdr/bin/jackson-module-scala_2.10-2.6.5.jar,/home/jwatson/sdr/bin/miglayout-swing-4.2.jar,/home/jwatson/sdr/bin/commons-configuration-1.6.jar,/home/jwatson/sdr/bin/xml-security-impl-1.0.jar,/home/jwatson/sdr/bin/metrics-core-2.2.0.jar,/home/jwatson/sdr/bin/jcommon-1.0.0.jar,/home/jwatson/sdr/bin/ojdbc6.jar,/home/jwatson/sdr/bin/jopt-simple-4.5.jar,/home/jwatson/sdr/bin/ucanaccess-3.0.1.jar,/home/jwatson/sdr/bin/httpcore-nio-4.4.5.jar,/home/jwatson/sdr/bin/nifi-site-to-site-client-1.0.0.jar,/home/jwatson/sdr/bin/nifi-spark-receiver-1.0.0.jar,/home/jwatson/sdr/bin/commons-compiler-2.7.8.jar,/home/jwatson/sdr/bin/janino-2.7.8.jar,/home/jwatson/sdr/bin/hsqldb-2.3.1.jar,/home/jwatson/sdr/bin/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar,/home/jwatson/sdr/bin/slf4j-api-1.7.21.jar,/home/jwatson/sdr/bin/slf4j-log4j12-1.7.21.jar,/home/jwatson/sdr/bin/slf4j-simple-1.7.21.jar,/home/jwatson/sdr/bin/snappy-java-1.1.1.7.jar,/home/jwatson/sdr/bin/snakeyaml-1.7.jar,local://usr/hdp/current/hadoop-client/client/hadoop-common.jar,local://usr/hdp/current/hadoop-client/client/hadoop-mapreduce-client-core.jar,local://usr/hdp/current/hadoop-client/client/jetty-util.jar,local://usr/hdp/current/hadoop-client/client/netty-all-4.0.23.Final.jar,local://usr/hdp/current/hadoop-client/client/paranamer-2.3.jar,local://usr/hdp/current/hadoop-client/lib/commons-cli-1.2.jar,local://usr/hdp/current/hadoop-client/lib/httpclient-4.5.2.jar,local://usr/hdp/current/hadoop-client/lib/jetty-6.1.26.hwx.jar,local://usr/hdp/current/hadoop-client/lib/joda-time-2.8.1.jar,local://usr/hdp/current/hadoop-client/lib/log4j-1.2.17.jar,local://usr/hdp/current/hbase-client/lib/hbase-client.jar,local://usr/hdp/current/hbase-client/lib/hbase-common.jar,local://usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar,local://usr/hdp/current/hbase-client/lib/hbase-protocol.jar,local://usr/hdp/current/hbase-client/lib/hbase-server.jar,local://usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar,local://usr/hdp/current/hive-client/lib/antlr-runtime-3.4.jar,local://usr/hdp/current/hive-client/lib/commons-collections-3.2.2.jar,local://usr/hdp/current/hive-client/lib/commons-dbcp-1.4.jar,local://usr/hdp/current/hive-client/lib/commons-pool-1.5.4.jar,local://usr/hdp/current/hive-client/lib/datanucleus-api-jdo-4.2.1.jar,local://usr/hdp/current/hive-client/lib/datanucleus-core-4.1.6.jar,local://usr/hdp/current/hive-client/lib/datanucleus-rdbms-4.1.7.jar,local://usr/hdp/current/hive-client/lib/geronimo-jta_1.1_spec-1.1.1.jar,local://usr/hdp/current/hive-client/lib/hive-exec.jar,local://usr/hdp/current/hive-client/lib/hive-jdbc.jar,local://usr/hdp/current/hive-client/lib/hive-metastore.jar,local://usr/hdp/current/hive-client/lib/jdo-api-3.0.1.jar,local://usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar,local://usr/hdp/current/phoenix-client/phoenix-client.jar,local://usr/hdp/current/spark-client/lib/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar
... View more
07-04-2017
12:48 AM
As you can see by the comments above, I have the libraries defined in the spark-submit command, although for kicks I added what you recommended in Ambari, but I got the same error. I'm writing in Java, which I'm calling through newAPIHadoopRDD() which is ultimately making a JDBC connection.
... View more
07-04-2017
12:44 AM
One final note: this code runs successfully in Eclipse using --master local[*], it's just the YARN cluster mode where things break down. Go figure.
... View more
07-04-2017
12:41 AM
One last point, the driver upstream from the above connects to Phoenix successfully during the app startup. The code above is where we are querying Phoenix with the SQL query shown to pull rows and kick off an RDD per row returned. It seems like we enter a different context in the call to sparkContext.newAPIHadoopRDD() and the foreach(rdd -> ....) and the stack trace gives me the impression we are (duh) somewhere between the driver and the executors that are trying to instantiate the Phoenix driver. In other parts of the code I had to add a Class.forName("org.apache.phoenix.jdbc.PhoenixDriver") to get rid of this exception, thus I added that code prior to creating the Java Spark Context, before the call to newAPIHadoopRDD() and in the start of the foreach(rdd ->....), but to no avail.
... View more
07-04-2017
12:31 AM
Here is the slightly longer exception that is logged by the outer part of my application 2017-06-28 21:28:25 ERROR AppDriver Fatal exception encountered. Job aborted.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, master.vm.local): java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:116)
... 12 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1642)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:622)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1869)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1882)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1953)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:919)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:917)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:917)
at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:332)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:46)
at mil.navy.navair.sdr.common.framework.ReloadInputReader.readInputInDriver(ReloadInputReader.java:137)
Caused by: java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
... View more