07-04-2017 12:28 AM
FYI, I added some code that, in other parts of the app, forces the Phoenix JDBC driver to load, but it doesn't seem to work in this context. The call to ConnectionUtil.getInputConnection(configuration, props) is the one I tracked down in the stack trace below as what builds the connection; I call it here to verify I'm getting the correct connection (and it does return the valid JDBC URL).

final Configuration configuration = HBaseConfiguration.create();
configuration.set(HConstants.ZOOKEEPER_CLIENT_PORT, "2181");
configuration.set(HConstants.ZOOKEEPER_ZNODE_PARENT, quorumParentNode);
configuration.set(HConstants.ZOOKEEPER_QUORUM, quorum);
Properties props = new Properties();
Connection conn = ConnectionUtil.getInputConnection(configuration, props);
log.info("Connection: " + conn.getMetaData().getURL());
log.info("Ingest DBC: " + ingestDbConn);
log.info("driver host name: " + driverHost);
log.info("Zookeeper quorum: " + quorum);
log.info("Reload query: " + sqlQuery);
PhoenixConfigurationUtil.setPhysicalTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setInputTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setOutputTableName(configuration, FileContentsWritable.TABLE_NAME);
PhoenixConfigurationUtil.setInputQuery(configuration, sqlQuery);
PhoenixConfigurationUtil.setInputClass(configuration, FileContentsWritable.class);
PhoenixConfigurationUtil.setUpsertColumnNames(configuration, FileContentsWritable.COLUMN_NAMES);
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
@SuppressWarnings("unchecked")
JavaPairRDD<NullWritable, FileContentsWritable> fileContentsRDD = sparkContext.newAPIHadoopRDD(configuration, PhoenixInputFormat.class, NullWritable.class, FileContentsWritable.class);
fileContentsRDD.foreach(rdd ->
{
Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
FileContentsBean fileContentsBean = rdd._2.getFileContentsBean();
:
:
});
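To rule out a classpath problem on the executor side (the getMetaData() call above only proves the driver JVM can load Phoenix), I could run a throwaway probe like the sketch below. It's not part of the job; it assumes a JavaSparkContext named sparkContext is in scope, and the JDBC URL is just the one from my logs used for illustration.

import java.sql.DriverManager;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.api.java.JavaSparkContext;

public final class PhoenixDriverProbe {
    // Sketch: run a tiny job so the checks execute inside the executor JVMs,
    // which is where PhoenixInputFormat actually needs the driver.
    public static List<String> probe(JavaSparkContext sparkContext) {
        return sparkContext.parallelize(Arrays.asList(1, 2, 3, 4), 4).map(i -> {
            try {
                // Is phoenix-client on this executor's classpath at all?
                Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
                // Is the driver registered with DriverManager in this JVM?
                DriverManager.getDriver("jdbc:phoenix:master:2181:/hbase-unsecure");
                return "Phoenix driver OK on executor";
            } catch (Throwable t) {
                return "Phoenix driver problem on executor: " + t;
            }
        }).collect();
    }
}

If Class.forName fails, the jar never made it onto the executors; if it succeeds but getDriver throws, it's a DriverManager registration problem rather than a missing jar.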
07-04-2017 12:19 AM
Here is the output:

2017-06-28 21:28:13 INFO ReloadInputReader Connection: jdbc:phoenix:master:2181:/hbase-unsecure;
2017-06-28 21:28:13 INFO ReloadInputReader Ingest DBC: jdbc:phoenix:master:2181:/hbase-unsecure
2017-06-28 21:28:13 INFO ReloadInputReader driver host name: master.vm.local
2017-06-28 21:28:13 INFO ReloadInputReader Zookeeper quorum: master
2017-06-28 21:28:13 INFO ReloadInputReader Reload query: SELECT FILE_NAME, TM, DATASET, WORKER_NAME, FILE_CONTENTS FROM JOBS.FILE_CONTENTS WHERE FILE_NAME in (SELECT FILE_NAME FROM JOBS.file_loaded WHERE file_name='B162836D20090316T0854.AAD')
2017-06-28 21:28:13 INFO MemoryStore Block broadcast_1 stored as values in memory (estimated size 428.6 KB, free 465.7 KB)
2017-06-28 21:28:13 INFO MemoryStore Block broadcast_1_piece0 stored as bytes in memory (estimated size 34.9 KB, free 500.7 KB)
2017-06-28 21:28:13 INFO BlockManagerInfo Added broadcast_1_piece0 in memory on 192.168.56.2:51844 (size: 34.9 KB, free: 457.8 MB)
2017-06-28 21:28:13 INFO SparkContext Created broadcast 1 from newAPIHadoopRDD at ReloadInputReader.java:135
2017-06-28 21:28:14 INFO SparkContext Starting job: foreach at ReloadInputReader.java:137
2017-06-28 21:28:14 INFO DAGScheduler Got job 0 (foreach at ReloadInputReader.java:137) with 1 output partitions
2017-06-28 21:28:14 INFO DAGScheduler Final stage: ResultStage 0 (foreach at ReloadInputReader.java:137)
2017-06-28 21:28:14 INFO DAGScheduler Parents of final stage: List()
2017-06-28 21:28:14 INFO DAGScheduler Missing parents: List()
2017-06-28 21:28:14 INFO DAGScheduler Submitting ResultStage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ReloadInputReader.java:135), which has no missing parents
2017-06-28 21:28:14 INFO MemoryStore Block broadcast_2 stored as values in memory (estimated size 2.9 KB, free 503.6 KB)
2017-06-28 21:28:14 INFO MemoryStore Block broadcast_2_piece0 stored as bytes in memory (estimated size 1845.0 B, free 505.4 KB)
2017-06-28 21:28:14 INFO BlockManagerInfo Added broadcast_2_piece0 in memory on 192.168.56.2:51844 (size: 1845.0 B, free: 457.8 MB)
2017-06-28 21:28:14 INFO SparkContext Created broadcast 2 from broadcast at DAGScheduler.scala:1008
2017-06-28 21:28:14 INFO DAGScheduler Submitting 1 missing tasks from ResultStage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at ReloadInputReader.java:135)
2017-06-28 21:28:14 INFO YarnClusterScheduler Adding task set 0.0 with 1 tasks
2017-06-28 21:28:14 INFO TaskSetManager Starting task 0.0 in stage 0.0 (TID 0, master.vm.local, partition 0,PROCESS_LOCAL, 2494 bytes)
2017-06-28 21:28:18 INFO BlockManagerInfo Added broadcast_2_piece0 in memory on master.vm.local:40246 (size: 1845.0 B, free: 511.1 MB)
2017-06-28 21:28:18 INFO BlockManagerInfo Added broadcast_1_piece0 in memory on master.vm.local:40246 (size: 34.9 KB, free: 511.1 MB)
2017-06-28 21:28:20 WARN TaskSetManager Lost task 0.0 in stage 0.0 (TID 0, master.vm.local): java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:134)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.createRecordReader(PhoenixInputFormat.java:71)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:156)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:64)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: No suitable driver found for jdbc:phoenix:master:2181:/hbase-unsecure;
at java.sql.DriverManager.getConnection(DriverManager.java:689)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.PhoenixInputFormat.getQueryPlan(PhoenixInputFormat.java:116)
... 12 more
07-04-2017 12:13 AM
Here is the (wordy) spark-submit command (line breaks added for clarity):

spark-submit \
  --conf spark.driver.extraClassPath=__app__.jar:e2parser-1.0.jar:f18parser-1.0.jar:mdanparser-1.0.jar:regimerecog-1.0.jar:tsvparser-1.0.jar:xmlparser-1.0.jar:log4j.script.properties:common-1.0.jar:aws-java-sdk-1.11.40.jar:aws-java-sdk-s3-1.11.40.jar:jackson-annotations-2.6.5.jar:jackson-core-2.6.5.jar:jackson-databind-2.6.5.jar:jackson-module-paranamer-2.6.5.jar:jackson-module-scala_2.10-2.6.5.jar:miglayout-swing-4.2.jar:commons-configuration-1.6.jar:xml-security-impl-1.0.jar:metrics-core-2.2.0.jar:jcommon-1.0.0.jar:ojdbc6.jar:jopt-simple-4.5.jar:ucanaccess-3.0.1.jar:httpcore-nio-4.4.5.jar:nifi-site-to-site-client-1.0.0.jar:nifi-spark-receiver-1.0.0.jar:commons-compiler-2.7.8.jar:janino-2.7.8.jar:hsqldb-2.3.1.jar:pentaho-aggdesigner-algorithm-5.1.5-jhyde.jar:slf4j-api-1.7.21.jar:slf4j-log4j12-1.7.21.jar:slf4j-simple-1.7.21.jar:snappy-java-1.1.1.7.jar:snakeyaml-1.7.jar:/usr/hdp/current/hadoop-client/client/hadoop-common.jar:/usr/hdp/current/hadoop-client/client/hadoop-mapreduce-client-core.jar:/usr/hdp/current/hadoop-client/client/jetty-util.jar:/usr/hdp/current/hadoop-client/client/netty-all-4.0.23.Final.jar:/usr/hdp/current/hadoop-client/client/paranamer-2.3.jar:/usr/hdp/current/hadoop-client/lib/commons-cli-1.2.jar:/usr/hdp/current/hadoop-client/lib/httpclient-4.5.2.jar:/usr/hdp/current/hadoop-client/lib/jetty-6.1.26.hwx.jar:/usr/hdp/current/hadoop-client/lib/joda-time-2.8.1.jar:/usr/hdp/current/hadoop-client/lib/log4j-1.2.17.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/protobuf-java-2.5.0.jar:/usr/hdp/current/hive-client/lib/antlr-runtime-3.4.jar:/usr/hdp/current/hive-client/lib/commons-collections-3.2.2.jar:/usr/hdp/current/hive-client/lib/commons-dbcp-1.4.jar:/usr/hdp/current/hive-client/lib/commons-pool-1.5.4.jar:/usr/hdp/current/hive-client/lib/datanucleus-api-jdo-4.2.1.jar:/usr/hdp/current/hive-client/lib/datanucleus-core-4.1.6.jar:/usr/hdp/current/hive-client/lib/datanucleus-rdbms-4.1.7.jar:/usr/hdp/current/hive-client/lib/geronimo-jta_1.1_spec-1.1.1.jar:/usr/hdp/current/hive-client/lib/hive-exec.jar:/usr/hdp/current/hive-client/lib/hive-jdbc.jar:/usr/hdp/current/hive-client/lib/hive-metastore.jar:/usr/hdp/current/hive-client/lib/jdo-api-3.0.1.jar:/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.2.2.5.3.0-37-hadoop2.7.3.2.5.3.0-37.jar \
  --conf spark.executor.extraClassPath=<same-as-above> \
  --master yarn \
  --deploy-mode cluster \
  --class <app.class.name> \
  <my-jar-file>
07-04-2017 12:04 AM
No joy; I checked, and the spark-submit job already contains those libraries. I'll post more above.
06-21-2017 06:45 PM
Mark, were you at the HBase/Phoenix Birds of a Feather session at the San Jose summit last week? If so, I was sitting three seats away. This question, or something very similar, was asked there.
06-21-2017 06:36 PM
I did get this working on Spark 1 (Spark 2 is a tech preview). The issue was that I needed both --jars with a comma-separated list and --conf extraClassPath settings with a colon-separated list. However, I'm back to failing with "JDBC driver not found" when using sparkContext.newAPIHadoopRDD. The Phoenix driver is definitely in both the --jars and --conf command-line arguments to spark-submit. I also added Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"). This is a Java app.
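One more thing on my list to try: registering the driver with DriverManager explicitly instead of relying on Class.forName and the ServiceLoader metadata in the jar. Rough sketch below; it assumes my Phoenix version exposes the PhoenixDriver.INSTANCE singleton that the class normally registers in its static initializer, and registration is per-JVM, so calling this in the Spark driver wouldn't by itself make the driver visible to the executors.

import java.sql.DriverManager;
import java.sql.SQLException;
import org.apache.phoenix.jdbc.PhoenixDriver;

public final class PhoenixDriverRegistration {
    // Sketch: explicit registration. Assumes PhoenixDriver.INSTANCE exists in this
    // Phoenix version; registration only affects the JVM this code runs in.
    public static void register() throws SQLException {
        DriverManager.registerDriver(PhoenixDriver.INSTANCE);
    }
}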
03-08-2017 10:41 PM
I've created a Spark Streaming application (and I swear a month or two ago I had this working), and it runs fine in Eclipse. When I run the job using spark-submit and specify --jars including my application jars and /usr/hdp/current/phoenix-client/phoenix-client.jar (or skip the symlink and use /usr/hdp/current/phoenix-4.7.0.2.5.3.0-37-client.jar), I get an error indicating ClassNotFound: org.apache.phoenix.jdbc.PhoenixDriver.

In the YARN log output, directory.info shows the following entries:

lrwxrwxrwx 1 yarn hadoop 70 Mar 7 15:37 phoenix-client.jar -> /hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar
3016594 100180 -r-x------ 1 yarn hadoop 102581542 Mar 7 15:37 ./phoenix-client.jar

and in launch_container.sh I see the following:

ln -sf "/hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar" "phoenix-client.jar"

So it seems the right things are happening. I finally broke down and put the following in the driver to see which jars were on the classpath:

ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader) cl).getURLs();
for (URL url : urls)
    System.out.println(url.getFile());

It shows none of the jar files I added via the --jars option to spark-submit. What am I missing? As a corollary, should we build a fat jar instead and toss everything in that? What's the most efficient approach that avoids copying jar files that are already on the cluster servers (HDP 2.5.3)?
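One thing worth noting: Spark normally puts --jars entries on its own URL classloader rather than on the JVM system classpath, so the system classloader dump above can come up empty even when the jars did ship. A variation that walks the thread context classloader chain (just a sketch) would look like this:

import java.net.URL;
import java.net.URLClassLoader;

public final class ClasspathDump {
    // Sketch: walk the context classloader chain; --jars entries usually show up on
    // one of Spark's URLClassLoaders rather than on the system classloader.
    public static void dump() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        while (cl != null) {
            System.out.println("Classloader: " + cl);
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    System.out.println("  " + url.getFile());
                }
            }
            cl = cl.getParent();
        }
    }
}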
Labels:
- Apache Phoenix
- Apache Spark
12-06-2016 04:19 PM
I finally tracked it down. The clue, which I didn't notice until I pasted the error message above, was that the error was a connection refused on the DistributedMapCacheClient. I had set the properties in the GetHBase processor's DistributedMapCacheClient, but hadn't created the DistributedMapCacheServer in the global settings for NiFi. Once I did that, I hit HBase errors (to make it more fun, HBase happened to be down while I was testing), but once I fixed that, everything started to work.
12-06-2016 02:57 AM
I'm trying to use the NiFi GetHBase processor, and so far all I've gotten is errors. I'm using the configuration settings out of /etc/hbase/conf/hive-site.xml, which has the zookeeper quorum (master.fulldomain.name,edge.fulldomain.name), port number 2181, and zookeeper znode parent of /hbase-unsecure all correctly specified. I noticed as I cut and pasted this that it's getting a connection refused. I'm on a 2-VM development system, so I made sure I could connect from the NiFi VM to the master node, which is running zookeeper and HBase/Phoenix. I've created the table in HBase using Phoenix as JOBS.JOB_REQUEST, and I'm referencing that table as JOBS:JOB_REQUEST in the GetHBase processor. Anyone have any thoughts as to what might be going on? It seems it would be a NiFi server to HBase connection issue. Also, is there any way to either (a) increase the logging of a particular processor or (b) debug a processor that's baked into NiFi (not that I really want to, but I would really like to get this particular processor working)?

Here's the log content from /var/log/nifi/nifi-app.log related to this error:

2016-12-05 18:45:39,660 ERROR [StandardProcessScheduler Thread-3] org.apache.nifi.hbase.GetHBase GetHBase[id=bf278170-0158-1000-6fb5-ae093b1db722] GetHBase[id=bf278170-0158-1000-6fb5-ae093b1db722] failed to invoke @OnScheduled method due to java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.; processor will not be scheduled to run for 30 seconds: java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.
2016-12-05 18:45:39,662 ERROR [StandardProcessScheduler Thread-1] org.apache.nifi.engine.FlowEngine A flow controller task execution stopped abnormally
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:1.8.0_77]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[na:1.8.0_77]
at org.apache.nifi.engine.FlowEngine.afterExecute(FlowEngine.java:100) ~[na:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1150) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
Caused by: java.lang.reflect.InvocationTargetException: null
at sun.reflect.GeneratedMethodAccessor437.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:137) ~[na:na]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:125) ~[na:na]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotations(ReflectionUtils.java:70) ~[na:na]
at org.apache.nifi.util.ReflectionUtils.invokeMethodsWithAnnotation(ReflectionUtils.java:47) ~[na:na]
at org.apache.nifi.controller.StandardProcessorNode$1$1.call(StandardProcessorNode.java:1244) ~[na:na]
at org.apache.nifi.controller.StandardProcessorNode$1$1.call(StandardProcessorNode.java:1240) ~[na:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
... 2 common frames omitted
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.Net.connect0(Native Method) ~[na:1.8.0_77]
at sun.nio.ch.Net.connect(Net.java:454) ~[na:1.8.0_77]
at sun.nio.ch.Net.connect(Net.java:446) ~[na:1.8.0_77]
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648) ~[na:1.8.0_77]
at java.nio.channels.SocketChannel.open(SocketChannel.java:189) ~[na:1.8.0_77]
at org.apache.nifi.distributed.cache.client.StandardCommsSession.<init>(StandardCommsSession.java:49) ~[na:na]
at org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService.createCommsSession(DistributedMapCacheClientService.java:234) ~[na:na]
at org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService.leaseCommsSession(DistributedMapCacheClientService.java:249) ~[na:na]
at org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService.withCommsSession(DistributedMapCacheClientService.java:303) ~[na:na]
at org.apache.nifi.distributed.cache.client.DistributedMapCacheClientService.get(DistributedMapCacheClientService.java:184) ~[na:na]
at sun.reflect.GeneratedMethodAccessor438.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
at org.apache.nifi.controller.service.StandardControllerServiceProvider$1.invoke(StandardControllerServiceProvider.java:177) ~[na:na]
at com.sun.proxy.$Proxy138.get(Unknown Source) ~[na:na]
at org.apache.nifi.hbase.GetHBase.getState(GetHBase.java:483) ~[na:na]
at org.apache.nifi.hbase.GetHBase.parseColumns(GetHBase.java:212) ~[na:na]
... 15 common frames omitted
Labels:
- Apache HBase
- Apache NiFi