Member since: 05-11-2016
Posts: 35
Kudos Received: 4
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 851 | 12-06-2016 04:19 PM
02-27-2020
07:45 AM
Turning on debug mode showed a little additional information and eventually led me to look at the code. I added a few debug lines of my own to /usr/hdp/current/superset/lib/python3.6/site-packages/flask_appbuilder/security/manager.py in _search_ldap to show the filter_str and username being passed to the LDAP search. I saw the filter_str was set to userPrincipalName=jeff.watson@our.domain, so I got rid of the @our.domain by adding AUTH_LDAP_APPEND_DOMAIN, but that still didn't work. I finally remembered that Ranger uses sAMAccountName as an AD search name, so I added AUTH_LDAP_UID_FIELD set to sAMAccountName and, poof, LDAP logins work. Note: Ambari settings aren't saved where the command-line version can find them until you save and restart Superset in Ambari; I then stopped it again so I could run it interactively and see the debug logging. I'm busy and lazy, so I didn't start removing other settings to see what I did or didn't need. Here are the settings that worked for me; our cluster is Kerberized and uses self-signed certificates:

AUTH_LDAP_UID_FIELD=sAMAccountName
AUTH_LDAP_BIND_USER=CN=Bind,OU=Admin,dc=our,dc=domain
AUTH_LDAP_SEARCH=OU=Employees,dc=our,dc=domain
AUTH_LDAP_SERVER=ldap://our.domain
AUTH_LDAP=AUTH_LDAP
AUTH_LDAP_ALLOW_SELF_SIGNED=True
AUTH_LDAP_APPEND_DOMAIN=False
AUTH_LDAP_FIRSTNAME_FIELD=givenName
AUTH_LDAP_LASTNAME_FIELD=sn
AUTH_LDAP_USE_TLS=False
AUTH_USER_REGISTRATION=True
ENABLE_KERBEROS_AUTHENTICATION=True
KERBEROS_KEYTAB=/etc/security/keytabs/superset.headless.keytab
KERBEROS_PRINCIPAL=superset-sdrdev@OUR.DOMAIN
11-14-2019
02:34 PM
Ditto on the need. I filled in all the fields in Ambari (in a Kerberized cluster) and had no luck. Is there supposed to be any logging somewhere other than /var/log/superset/superset.log that shows either issues or attempts to log in?
04-17-2019
05:47 PM
As the last step in my process, I need to check whether any more files exist in an HDFS directory. I tried FetchHDFS, which can take an incoming flow file (unlike ListHDFS, which won't accept one), but I discovered the hard way that FetchHDFS can't take wildcards, only an HDFS path and filename. I looked for, but can't find, anything on calling the existing Java HDFS methods from ExecuteScript and Groovy. I was hoping not to need to build a custom processor. The only option I've come up with so far is to write a small standalone Java app and call it using ExecuteStreamCommand, but that (presumably) loads a JVM every time. Any other ideas?
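For context, here is a minimal sketch of the plain Hadoop FileSystem calls I'd like to reach from ExecuteScript; the directory, wildcard pattern, and class name are placeholders I made up for illustration, and it assumes the Hadoop client jars and config are available to whatever runs it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsGlobCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // globStatus accepts wildcards; placeholder directory and pattern.
        FileStatus[] matches = fs.globStatus(new Path("/data/incoming/*.dat"));

        // globStatus returns null if the parent path does not exist.
        boolean moreFilesExist = matches != null && matches.length > 0;
        System.out.println("Files remaining: " + moreFilesExist);
    }
}

The same FileSystem/globStatus calls should be callable from a Groovy ExecuteScript body, assuming the Hadoop client jars are on the script's classpath.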
Labels:
- Apache NiFi
10-02-2018
12:40 PM
When I finally started building it last night, that idea dawned on me. Thanks for the confirmation. And the pretty colored boxes (labels) to segregate the flows visually make it a little less confusing.
10-01-2018
11:03 PM
Well, perhaps I should have read the quick start guide, which happily indicates that MiNiFi doesn't support process groups. So, where does that leave me? Only one process group to run a few unrelated applications in one massive, scary mess? I get that MiNiFi isn't supposed to be NiFi... but... now what?
10-01-2018
09:30 PM
It appears, from what I've looked at so far, that we can convert only one process group to a YAML file and have MiNiFi process it. We are planning to use MiNiFi to run a few different custom-built conversion applications that are unrelated to one another and then ship the data to NiFi. Is it possible to have multiple process groups processed by MiNiFi (or multiple MiNiFi instances running on the same server)? Or do I just bundle the process groups into one larger process group, convert that, and have it run on MiNiFi on Windows?
Labels:
- Apache MiNiFi
04-03-2018
06:32 PM
It's less a matter of ticket timeout and more about restricting the same user from creating lots of sessions. I'm looking into custom Ranger plugins, which look promising.
04-03-2018
06:19 PM
FYI, if there is a place to hook some Java code into HBase as a coprocessor or via some other mechanism, we're willing to take it on. Our customer stops short of letting us monkey with the actual HBase/Phoenix code base (we are using Phoenix on top of HBase).
04-03-2018
05:30 PM
We went down the route of using Phoenix/HBase as a DW but abandoned it as our data size grew, because we would have had to dramatically increase the number of servers to have enough region servers. The query times were horrible because we were almost always ignoring the row keys, other than dates, so we ended up doing a lot of large table scans. HBase is not designed to perform well when you ignore row keys, and that's what treating Phoenix like a data warehouse ultimately caused. We had to raise the timeout to 1 hour, which Ambari didn't like since the default is a 3-minute timeout. Better to stick with Hive, which is intended as a DW and with LLAP can cache data to improve speed.
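For anyone hitting the same wall, this is roughly how a longer Phoenix query timeout can be set on the client side. It's a sketch using the phoenix.query.timeoutMs and hbase.client.scanner.timeout.period properties with placeholder values and a made-up table and quorum; it is not the exact configuration we used back then (we set ours through Ambari), and the server-side HBase RPC timeouts generally have to be raised to match.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.Properties;

public class LongRunningPhoenixQuery {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Client-side Phoenix query timeout: 1 hour, in milliseconds.
        props.setProperty("phoenix.query.timeoutMs", "3600000");
        // Scanner timeout for long scans; placeholder value for illustration.
        props.setProperty("hbase.client.scanner.timeout.period", "3600000");

        // Placeholder ZooKeeper quorum and znode parent.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:zk-host:2181:/hbase-unsecure", props);
             ResultSet rs = conn.createStatement()
                     .executeQuery("SELECT COUNT(*) FROM SOME_BIG_TABLE")) {
            while (rs.next()) {
                System.out.println("Row count: " + rs.getLong(1));
            }
        }
    }
}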
04-03-2018
05:09 PM
Is there a way to limit the number of connections per user? We are having to respond to security guidelines for databases, and they want to cap the maximum number of connections for each user (we're using Kerberos and AD) to avoid DDoS attacks.
Tags:
- Data Processing
- HBase
Labels:
- Apache HBase
03-27-2018
07:04 PM
Valid question, though. Any idea when you're going to move up to 4.8, 4.9, 4.10, or 4.11?
11-03-2017
07:08 PM
1 Kudo
Something to consider is the downstream use of Hive as well. We used String initially (mainly because we weren't given data types for the data we were loading, so String let us ignore the issue). However, when we started using SAS, those String fields were all converted to varchar(32k) and caused headaches on that end. We converted to varchar(x).
08-07-2017
10:45 PM
Artem, can you tell me how or where the ambari-admin-password-reset command is located? I'm guessing it's an alias, but it's not set for root on the Ambari server. So where is it defined, since you're saying it's not a file of some sort?
07-04-2017
12:59 AM
I forgot I ripped the --jars part out of the spark-submit above because the text was too long. Here's that part. I'm seeing on some other gripes in StackOverflow about database drivers missing points to the driver not being included in the class path. As you can see in the --conf spark.driver.extraClassPath, --conf spark.executor.extraClassPath, and --jars, I tried to provide the /usr/hdp/current/phoenix-client/phoenix-client.jar driver in all contexts. spark-submit --jars /home/jwatson/sdr/bin/e2parser-1.0.jar,/home/jwatson/sdr/bin/f18parser-1.0.jar,/home/jwatson/sdr/bin/mdanparser-1.0.jar,/home/jwatson/sdr/bin/regimerecog-1.0.jar,/home/jwatson/sdr/bin/tsvparser-1.0.jar,/home/jwatson/sdr/bin/xmlparser-1.0.jar,/home/jwatson/sdr/bin/aws-java-sdk-1.11.40.jar,/home/jwatson/sdr/bin/aws-java-sdk-s3-1.11.40.jar,/home/jwatson/sdr/bin/jackson-annotations-2.6.5.jar,/home/jwatson/sdr/bin/jackson-core-2.6.5.jar,/home/jwatson/sdr/bin/jackson-databind-2.6.5.jar,/home/jwatson/sdr/bin/jackson-module-paranamer-2.6.5.jar,/home/jwatson/sdr/bin/jackson-module-scala_2.10-2.6.5.jar,/home/jwatson/sdr/bin/miglayout-swing-4.2.jar,/home/jwatson/sdr/bin/commons-configuration-1.6.jar,/home/jwatson/sdr/bin/xml-security-impl-1.0.jar,/home/jwatson/sdr/bin/metrics-core-2.2.0.jar,/home/jwatson/sdr/bin/jcommon-1.0.0.jar,/home/jwatson/sdr/bin/ojdbc6.jar,/home/jwatson/sdr/bin/jopt-simple-4.5.jar,/home/jwatson/sdr/bin/ucanaccess-3.0.1.jar,/home/jwatson/sdr/bin/httpcore-nio-4.4.5.jar,/home/jwatson/sdr/bin/nifi-site-to-site-client-1.0.0.jar,/home/jwatson/sdr/bin/nifi-spark-receiver-1.0.0.jar,/home/jwatson/sdr/bin/commons-compiler-2.7.8.jar,/home/jwatson/sdr/bin/janino-2.7.8.jar,/home/jwatson/sdr/bin/hsqldb-2.3.1.jar,/home/jwatson/sdr/bin/pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar,/home/jwatson/sdr/bin/slf4j-api-1.7.21.jar,/home/jwatson/sdr/bin/slf4j-log4j12-1.7.21.jar,/home/jwatson/sdr/bin/slf4j-simple-1.7.21.jar,/home/jwatson/sdr/bin/snappy-java-1.1.1.7.jar,/home/jwatson/sdr/bin/snakeyaml-1.7.jar,local://usr/hdp/current/hadoop-client/client/hadoop-common.jar,local://usr/hdp/current/hadoop-client/client/hadoop-mapreduce-client-core.jar,local://usr/hdp/current/hadoop-client/client/jetty-util.jar,local://usr/hdp/current/hadoop-client/client/netty-all-4.0.23.Final.jar,local://usr/hdp/current/hadoop-client/client/paranamer-2.3.jar,local://usr/hdp/current/hadoop-client/lib/commons-cli-1.2.jar,local://usr/hdp/current/hadoop-client/lib/httpclient-4.5.2.jar,local://usr/hdp/current/hadoop-client/lib/jetty-6.1.26.hwx.jar,local://usr/hdp/current/hadoop-client/lib/joda-time-2.8.1.jar,local://usr/hdp/current/hadoop-client/lib/log4j-1.2.17.jar,local://usr/hdp/current/hbase-client/lib/hbase-client.jar,local://usr/hdp/current/hbase-client/lib/hbase-common.jar,local://usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar,local://usr/hdp/current/hbase-client/lib/hbase-protocol.jar,local://usr/hdp/current/hbase-client/lib/hbase-server.jar,local://usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar,local://usr/hdp/current/hive-client/lib/antlr-runtime-3.4.jar,local://usr/hdp/current/hive-client/lib/commons-collections-3.2.2.jar,local://usr/hdp/current/hive-client/lib/commons-dbcp-1.4.jar,local://usr/hdp/current/hive-client/lib/commons-pool-1.5.4.jar,local://usr/hdp/current/hive-client/lib/datanucleus-api-jdo-4.2.1.jar,local://usr/hdp/current/hive-client/lib/datanucleus-core-4.1.6.jar,local://usr/hdp/current/hive-client/lib/datanucleus-rdbms-4.1.7.jar,local://usr/hdp/current/hive-client/lib/geronimo-jta_1.1_spec-1.1.1.jar,local://usr/hdp
/current/hive-client/lib/hive-exec.jar,local://usr/hdp/current/hive-client/lib/hive-jdbc.jar,local://usr/hdp/current/hive-client/lib/hive-metastore.jar,local://usr/hdp/current/hive-client/lib/jdo-api-3.0.1.jar,local://usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar,local://usr/hdp/current/phoenix-client/phoenix-client.jar,local://usr/hdp/current/spark-client/lib/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar
07-04-2017
12:48 AM
As you can see from the comments above, I have the libraries defined in the spark-submit command, although for kicks I added what you recommended in Ambari, but I got the same error. I'm writing in Java and calling newAPIHadoopRDD(), which is ultimately making a JDBC connection.
07-04-2017
12:44 AM
One final note: this code runs successfully in Eclipse using --master local[*]; it's just YARN cluster mode where things break down. Go figure.
07-04-2017
12:41 AM
One last point: the driver upstream from the above connects to Phoenix successfully during app startup. The code above is where we query Phoenix with the SQL query shown, to pull rows and kick off an RDD per row returned. It seems like we enter a different context in the call to sparkContext.newAPIHadoopRDD() and the foreach(rdd -> ...), and the stack trace gives me the impression we are (duh) somewhere between the driver and the executors that are trying to instantiate the Phoenix driver. In other parts of the code I had to add a Class.forName("org.apache.phoenix.jdbc.PhoenixDriver") to get rid of this exception, so I added that call before creating the Java Spark Context, before the call to newAPIHadoopRDD(), and at the start of the foreach(rdd -> ...), but to no avail.
07-04-2017
12:31 AM
Here is the slightly longer exception that is logged by the outer part of my application:
2017-06-28 21:28:25 ERROR AppDriver Fatal exception encountered. Job aborted.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, master.vm.local): java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:116)
... 12 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1421)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1420)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1420)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:801)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:801)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1642)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1601)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1590)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:622)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1856)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1869)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1882)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1953)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:919)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:917)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:323)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:917)
at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:332)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:46)
at mil.navy.navair.sdr.common.framework.ReloadInputReader.readInputInDriver(ReloadInputReader.java:137)
Caused by: java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
07-04-2017
12:28 AM
FYI, in other parts of the app I added code that forced the Phoenix JDBC driver to load, but it doesn't seem to work in this context. The call to ConnectionUtil.getInputConnection(configuration, props) is the code I tracked down in the stack trace below that builds the connection; I used it to verify I was getting the correct connection (it is the valid JDBC URL).

// Build the HBase/Phoenix configuration for the Phoenix input format.
final Configuration configuration = HBaseConfiguration.create();
configuration.set(HConstants.ZOOKEEPER_CLIENT_PORT, "2181");
configuration.set(HConstants.ZOOKEEPER_ZNODE_PARENT, quorumParentNode);
configuration.set(HConstants.ZOOKEEPER_QUORUM, quorum);

// Verify the connection the input format will build is the one we expect.
Properties props = new Properties();
Connection conn = ConnectionUtil.getInputConnection(configuration, props);
log.info("Connection: " + conn.getMetaData().getURL());
log.info("Ingest DBC: " + ingestDbConn);
log.info("driver host name: " + driverHost);
log.info("Zookeeper quorum: " + quorum);
log.info("Reload query: " + sqlQuery);

// Tell Phoenix which table, query, and writable class to use for input.
PhoenixConfigurationUtil.setPhysicalTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setInputTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setOutputTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setInputQuery(configuration, sqlQuery);
PhoenixConfigurationUtil.setInputClass(configuration, FileContentsWritable.class);
PhoenixConfigurationUtil.setUpsertColumnNames(configuration, FileContentsWritable.COLUMN_NAMES);

// Force the Phoenix driver to register in the driver JVM.
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

@SuppressWarnings("unchecked")
JavaPairRDD<NullWritable, FileContentsWritable> fileContentsRDD = sparkContext.newAPIHadoopRDD(configuration, PhoenixInputFormat.class, NullWritable.class, FileContentsWritable.class);
fileContentsRDD.foreach(rdd ->
{
    // Force the Phoenix driver to register in the executor JVM as well.
    Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
    FileContentsBean fileContentsBean = rdd._2.getFileContentsBean();
    // ... remainder of the per-row processing elided ...
});
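For debugging, here is a small helper I could call from both the driver and inside the foreach closure to see which JDBC drivers each JVM's DriverManager actually has registered; JdbcDriverDump is just a name I made up for this sketch, not something in the app.

import java.sql.Driver;
import java.sql.DriverManager;
import java.util.Enumeration;

public final class JdbcDriverDump {
    // Diagnostic only: list every driver registered with DriverManager in the
    // current JVM. Calling it in the driver and again inside the foreach
    // closure shows whether the executors ever see the Phoenix driver.
    public static void dumpRegisteredDrivers(String where) {
        System.out.println("JDBC drivers visible at " + where + ":");
        Enumeration<Driver> drivers = DriverManager.getDrivers();
        while (drivers.hasMoreElements()) {
            System.out.println("  " + drivers.nextElement().getClass().getName());
        }
    }
}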
07-04-2017
12:19 AM
Here is the output:
2017-06-28 21:28:13 INFO ReloadInputReader Connection: jdbc:phoenix:master:2181:/hbase-unsecure;
2017-06-28 21:28:13 INFO ReloadInputReader Ingest DBC: jdbc:phoenix:master:2181:/hbase-unsecure
2017-06-28 21:28:13 INFO ReloadInputReader driver host name: master.vm.local
2017-06-28 21:28:13 INFO ReloadInputReader Zookeeper quorum: master
2017-06-28 21:28:13 INFO ReloadInputReader Reload query: SELECT FILE_NAME, TM, DATASET, WORKER_NAME, FILE_CONTENTS FROM JOBS.FILE_CONTENTS WHERE FILE_NAME in (SELECT FILE_NAME FROM JOBS.file_loaded WHERE file_name='B162836D20090316T0854.AAD')
2017-06-28 21:28:13 INFO MemoryStore Block broadcast_1 stored as values in memory (estimated size 428.6 KB, free 465.7 KB)
2017-06-28 21:28:13 INFO MemoryStore Block broadcast_1_piece0 stored as bytes in memory (estimated size 34.9 KB, free 500.7 KB)
2017-06-28 21:28:13 INFO BlockManagerInfo Added broadcast_1_piece0 in memory on 192.168.56.2:51844 (size: 34.9 KB, free: 457.8 MB)
2017-06-28 21:28:13 INFO SparkContext Created broadcast 1 from newAPIHadoopRDD at ReloadInputReader.java:135
2017-06-28 21:28:14 INFO SparkContext Starting job: foreach at ReloadInputReader.java:137
2017-06-28 21:28:14 INFO DAGScheduler Got job 0 (foreach at ReloadInputReader.java:137) with 1 output partitions
2017-06-28 21:28:14 INFO DAGScheduler Final stage: ResultStage 0 (foreach at ReloadInputReader.java:137)
2017-06-28 21:28:14 INFO DAGScheduler Parents of final stage: List()
2017-06-28 21:28:14 INFO DAGScheduler Missing parents: List()
2017-06-28 21:28:14 INFO DAGScheduler Submitting ResultStage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ReloadInputReader.java:135), which has no missing parents
2017-06-28 21:28:14 INFO MemoryStore Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 503.6 KB)
2017-06-28 21:28:14 INFO MemoryStore Block broadcast_2_piece0 stored as bytes in memory (estimated size 1845.0 B, free 505.4 KB)
2017-06-28 21:28:14 INFO BlockManagerInfo Added broadcast_2_piece0 in memory on 192.168.56.2:51844 (size: 1845.0 B, free: 457.8 MB)
2017-06-28 21:28:14 INFO SparkContext Created broadcast 2 from broadcast at DAGScheduler.scala:1008
2017-06-28 21:28:14 INFO DAGScheduler Submitting 1 missing tasks from ResultStage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ReloadInputReader.java:135)
2017-06-28 21:28:14 INFO YarnClusterScheduler Adding task set 0.0 with 1 tasks
2017-06-28 21:28:14 INFO TaskSetManager Starting task 0.0 in stage 0.0 (TID 0, master.vm.local, partition 0,PROCESS_LOCAL, 2494 bytes)
2017-06-28 21:28:18 INFO BlockManagerInfo Added broadcast_2_piece0 in memory on master.vm.local:40246 (size: 1845.0 B, free: 511.1 MB)
2017-06-28 21:28:18 INFO BlockManagerInfo Added broadcast_1_piece0 in memory on master.vm.local:40246 (size: 34.9 KB, free: 511.1 MB)
2017-06-28 21:28:20 WARN TaskSetManager Lost task 0.0 in stage 0.0 (TID 0, master.vm.local): java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:116)
... 12 more
07-04-2017
12:13 AM
Here is the (wordy) spark-submit command (line breaks added for clarity):
spark-submit
--conf spark.driver.extraClassPath=__app__.jar:e2parser-1.0.jar:f18parser-1.0.jar:mdanparser-1.0.jar:regimerecog-1.0.jar:tsvparser-1.0.jar:xmlparser-1.0.jar:log4j.script.properties:common-1.0.jar:aws-java-sdk-1.11.40.jar:aws-java-sdk-s3-1.11.40.jar:jackson-annotations-2.6.5.jar:jackson-core-2.6.5.jar:jackson-databind-2.6.5.jar:jackson-module-paranamer-2.6.5.jar:jackson-module-scala_2.10-2.6.5.jar:miglayout-swing-4.2.jar:commons-configuration-1.6.jar:xml-security-impl-1.0.jar:metrics-core-2.2.0.jar:jcommon-1.0.0.jar:ojdbc6.jar:jopt-simple-4.5.jar:ucanaccess-3.0.1.jar:httpcore-nio-4.4.5.jar:nifi-site-to-site-client-1.0.0.jar:nifi-spark-receiver-1.0.0.jar:commons-compiler-2.7.8.jar:janino-2.7.8.jar:hsqldb-2.3.1.jar:pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar:slf4j-api-1.7.21.jar:slf4j-log4j12-1.7.21.jar:slf4j-simple-1.7.21.jar:snappy-java-1.1.1.7.jar:snakeyaml-1.7.jar:/usr/hdp/current/hadoop-client/client/hadoop-common.jar:/usr/hdp/current/hadoop-client/client/hadoop-mapreduce-client-core.jar:/usr/hdp/current/hadoop-client/client/jetty-util.jar:/usr/hdp/current/hadoop-client/client/netty-all-4.0.23.Final.jar:/usr/hdp/current/hadoop-client/client/paranamer-2.3.jar:/usr/hdp/current/hadoop-client/lib/commons-cli-1.2.jar:/usr/hdp/current/hadoop-client/lib/httpclient-4.5.2.jar:/usr/hdp/current/hadoop-client/lib/jetty-6.1.26.hwx.jar:/usr/hdp/current/hadoop-client/lib/joda-time-2.8.1.jar:/usr/hdp/current/hadoop-client/lib/log4j-1.2.17.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar:/usr/hdp/current/hive-client/lib/antlr-runtime-3.4.jar:/usr/hdp/current/hive-client/lib/commons-collections-3.2.2.jar:/usr/hdp/current/hive-client/lib/commons-dbcp-1.4.jar:/usr/hdp/current/hive-client/lib/commons-pool-1.5.4.jar:/usr/hdp/current/hive-client/lib/datanucleus-api-jdo-4.2.1.jar:/usr/hdp/current/hive-client/lib/datanucleus-core-4.1.6.jar:/usr/hdp/current/hive-client/lib/datanucleus-rdbms-4.1.7.jar:/usr/hdp/current/hive-client/lib/geronimo-jta_1.1_spec-1.1.1.jar:/usr/hdp/current/hive-client/lib/hive-exec.jar:/usr/hdp/current/hive-client/lib/hive-jdbc.jar:/usr/hdp/current/hive-client/lib/hive-metastore.jar:/usr/hdp/current/hive-client/lib/jdo-api-3.0.1.jar:/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar \ --conf spark.executor.extraClassPath=<same-as-above> --master yarn --deploy-mode cluster --class <app.class.name> <my-jar-file>
07-04-2017
12:04 AM
No joy. I checked, and the spark-submit job already contains those libraries. I'll post more above.
06-23-2017
07:21 PM
Well, now that's a relief. I think what threw me is that (a) we just rewrote our apps to use Spark instead of MapReduce, and (b) the package that contains PhoenixConfigurationUtil is org.apache.phoenix.mapreduce.util, so I thought we might be accidentally jumping backwards. Thanks for clarifying.
06-21-2017
06:45 PM
1 Kudo
Mark, were you at the HBase/Phoenix birds-of-a-feather at the San Jose summit last week? If so, I was sitting three seats away. This question, or something very similar, was asked there.
06-21-2017
06:36 PM
1 Kudo
I did get this working on Spark 1 (Spark 2 is tech preview). The issue was needing to use both --jars as a comma-separated list and --conf as a colon-separated list. However, I'm back to failing with the JDBC driver not found when using sparkContext.newAPIHadoopRDD(). The Phoenix driver is definitely in the --jars and --conf command-line args to spark-submit. I added Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"). This is a Java app.
06-19-2017
07:08 PM
1 Kudo
We are writing a Java app to ingest into Phoenix. This small side application needs to query the files we've loaded into Phoenix to identify which files our support team should reprocess if issues in our rules and custom binary file parsers are fixed and we need to reload data. The application uses Spark to run lots of parsers on the cluster in parallel. I was going to start with the PhoenixConfigurationUtil class, since we choose which files to reprocess using a SQL WHERE clause and PhoenixConfigurationUtil.setInputQuery lets us provide a SQL query. It's also one of the few examples of using Spark with Phoenix that's written in Java; all the other examples are in Scala. We are using HDP 2.5.3, Phoenix 4.7, and Spark 1.6. What I noticed in Spark 1.6, and it appears in Spark 2.0 as well, is that all the Scala variations mentioned on the Phoenix site related to Spark, which show calls to phoenixTableAsRDD and phoenixTableAsDataFrame, end up calling PhoenixConfigurationUtil, which is part of the org.apache.phoenix.mapreduce.util package. So my question is whether anyone recommends using PhoenixConfigurationUtil (which seems to be MapReduce-based) or one of the Scala Spark-based APIs, which at least right now call the same PhoenixConfigurationUtil. Are you expecting a pure Spark or Spark SQL based solution at some point in the future, so that phoenixTableAsRDD or phoenixTableAsDataFrame will no longer call the MapReduce Phoenix code? Or is it expected that those APIs will remain joined at the hip for the indefinite future?
Labels:
- Apache Hadoop
- Apache Phoenix
- Apache Spark
03-08-2017
10:41 PM
I've created a Spark streaming application (and I swear a month or two ago I had this working) and it runs fine in Eclipse. When I run the job using spark-submit and specify --jars, including my application jars and /usr/hdp/current/phoenix-client/phoenix-client.jar (or skip the symlink and use /usr/hdp/current/phoenix-4.7.0.2.5.3.0-37-client.jar), I get an error indicating classNotFound: org.apache.phoenix.jdbc.PhoenixDriver. In the YARN log output I can see the following entries in directory.info:

lrwxrwxrwx 1 yarn hadoop 70 Mar 7 15:37 phoenix-client.jar -> /hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar
3016594 100180 -r-x------ 1 yarn hadoop 102581542 Mar 7 15:37 ./phoenix-client.jar

In launch_container.sh I see the following:

ln -sf "/hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar" "phoenix-client.jar"

So it seems the right things are happening. I finally broke down and put the following in the driver to see what I got for class files:

ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader) cl).getURLs();
for (URL url : urls)
    System.out.println(url.getFile());

And it shows none of the jar files I added via the --jars option to spark-submit. What am I missing? As a corollary, should we build a fat jar instead and toss everything into that? What's the most efficient approach that avoids copying jar files that are already on the cluster servers (HDP 2.5.3)?
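One thing I want to rule out is that the jars passed with --jars end up on a child classloader rather than the system classloader, in which case the system loader's URL list wouldn't show them; that's my assumption, not something I've confirmed. Here is a variant of the diagnostic above that walks the whole classloader chain starting from the thread context classloader (ClassLoaderDump is just a throwaway name for the sketch).

import java.net.URL;
import java.net.URLClassLoader;

public final class ClassLoaderDump {
    // Walk the classloader chain from the thread context classloader upward,
    // printing the URLs of every URLClassLoader along the way.
    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        while (cl != null) {
            System.out.println("Classloader: " + cl);
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    System.out.println("  " + url.getFile());
                }
            }
            cl = cl.getParent();
        }
    }
}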
Labels:
- Apache Phoenix
- Apache Spark
01-31-2017
02:42 PM
Duh, looking in the Phoenix pom.xml, I find they are really using... commons-csv version 1.0. Ugh. But now I see that the Phoenix-4.7.0-HBase-1.1 pom.xml is actually for HBase version 1.1.3 and Hadoop version 2.5.1, not Hadoop version 2.5.3, which is what I'm running. I'll keep checking out releases. According to http://hortonworks.com/products/data-center/hdp/, HDP 2.5.3 should be using Phoenix 4.7.0 and HBase 1.1.2, so I'm a little lost on which version of Phoenix I should be grabbing.
01-31-2017
02:36 PM
OK, I just discovered one issue: I assumed that /usr/hdp/2.5.3.0-37/phoenix/commons-csv-1.0.jar was the same version as the org.apache.commons.csv.CSVParser bundled in phoenix-4.7.0.2.5.3.0-37-client.jar. However, the sizes of the class files are fairly different between the two jar files. That might explain why, when I step through the CSVParser code, I step right into the middle of some source code.
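A quick way to check which jar actually supplied CSVParser at runtime is a throwaway diagnostic like the one below; WhichJar is just a made-up name for the sketch.

public final class WhichJar {
    // Print the code source (jar) the running JVM loaded CSVParser from,
    // to compare against the commons-csv jar sitting on disk.
    public static void main(String[] args) {
        Class<?> clazz = org.apache.commons.csv.CSVParser.class;
        java.security.CodeSource src = clazz.getProtectionDomain().getCodeSource();
        System.out.println("CSVParser loaded from: "
                + (src != null ? src.getLocation() : "<no code source>"));
    }
}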