<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Phoenix / HBase problem with HDP 2.3.4 and Java in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Phoenix-HBase-problem-with-HDP-2-3-4-and-Java/m-p/149844#M20313</link>
    <description>&lt;P&gt;Phoenix / HBase questions on HDP 2.3.4: reading and writing Spark DataFrames via phoenix-spark and JDBC from Java.&lt;/P&gt;</description>
    <pubDate>Fri, 19 Feb 2016 22:48:01 GMT</pubDate>
    <dc:creator>David_Tam</dc:creator>
    <dc:date>2016-02-19T22:48:01Z</dc:date>
    <item>
      <title>Phoenix / HBase problem with HDP 2.3.4 and Java</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Phoenix-HBase-problem-with-HDP-2-3-4-and-Java/m-p/149844#M20313</link>
      <description>&lt;P&gt;Hello, I have a couple of questions regarding phoenix-spark on HBase.&lt;/P&gt;&lt;P&gt;I am on HDP 2.3.4, therefore with Phoenix 4.4.0.2.3.4.0-3485, and Spark 1.5.2.&lt;/P&gt;&lt;P&gt;First, regarding reads: I am trying out this &lt;A href="https://community.hortonworks.com/answers/4025/view.html"&gt;very nice example here&lt;/A&gt;, but I am getting the following (from spark-shell, but I also got the same in Java):&lt;/P&gt;&lt;PRE&gt;org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 14, sandbox.hortonworks.com): java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericMutableRow cannot be cast to org.apache.spark.sql.Row
        at org.apache.spark.sql.SQLContext$$anonfun$7.apply(SQLContext.scala:445)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:215)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
&lt;/PRE&gt;&lt;P&gt;This seems to be an issue with this particular Spark + Phoenix combination on HDP 2.3.4 according to &lt;A href="https://issues.apache.org/jira/browse/PHOENIX-2287"&gt;PHOENIX-2287&lt;/A&gt;, and it is fixed in Phoenix 4.5.3+.&lt;/P&gt;&lt;P&gt;Is there any other way to get around this, or do I have to wait until Hortonworks does an upgrade?&lt;/P&gt;&lt;P&gt;Secondly, due to a decision made high up in my organization not to use Scala, I can only use Java, and it seems that &lt;A href="https://phoenix.apache.org/phoenix_spark.html"&gt;this example from Phoenix (in particular the saveToPhoenix method)&lt;/A&gt;:&lt;/P&gt;&lt;PRE&gt;sc.parallelize(dataSet)
  .saveToPhoenix(
    "OUTPUT_TEST_TABLE",
    Seq("ID","COL1","COL2"),
    zkUrl = Some("phoenix-server:2181")
  )
&lt;/PRE&gt;&lt;P&gt;is &lt;A href="http://stackoverflow.com/questions/30639659/apache-phoenix-4-3-1-and-4-4-0-hbase-0-98-on-spark-1-3-1-classnotfoundexceptio"&gt;not available from Java, according to this thread on SO&lt;/A&gt;. Is this true?&lt;/P&gt;&lt;P&gt;Anyway, I tried with Java by first creating this simple table in Phoenix:&lt;/P&gt;&lt;PRE&gt;CREATE TABLE EXAMPLE1 (id BIGINT NOT NULL PRIMARY KEY, COLUMN1 VARCHAR)&lt;/PRE&gt;&lt;P&gt;And then ran the following Java code to write the DataFrame:&lt;/P&gt;&lt;PRE&gt;DataFrame writeDF = df.withColumnRenamed("Key", "id")
        .withColumnRenamed("somecolumn", "COLUMN1")
        .selectExpr(new String[]{"id", "COLUMN1"})
// doesn't work even if I rename with the prefix "0." using any of the following:
//        .withColumnRenamed("COLUMN1", "0.COLUMN1")
//        .withColumnRenamed("COLUMN1", "`0.COLUMN1`")
;

df.write()
        .format("org.apache.phoenix.spark")
        .options( ImmutableMap.of("table" , "EXAMPLE1",
                "zkUrl", "sandbox:2181:/hbase-unsecure"))
        .mode(SaveMode.Overwrite)
        .save();
&lt;/PRE&gt;&lt;P&gt;But I am getting this:&lt;/P&gt;&lt;PRE&gt;org.apache.spark.sql.AnalysisException: cannot resolve '0.COLUMN1' given input columns id, 0.COLUMN1;
	at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:56)
	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:53)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:293)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
&lt;/PRE&gt;&lt;P&gt;In any case, is there a way (or an example) to read/write a DataFrame via Phoenix for these specific versions of HDP/Phoenix using Java?&lt;/P&gt;&lt;P&gt;Thank you in advance!&lt;/P&gt;</description>
      <pubDate>Fri, 19 Feb 2016 22:48:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Phoenix-HBase-problem-with-HDP-2-3-4-and-Java/m-p/149844#M20313</guid>
      <dc:creator>David_Tam</dc:creator>
      <dc:date>2016-02-19T22:48:01Z</dc:date>
    </item>
    <item>
      <title>Re: Phoenix / HBase problem with HDP 2.3.4 and Java</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Phoenix-HBase-problem-with-HDP-2-3-4-and-Java/m-p/149845#M20314</link>
      <description>&lt;P&gt;Have you seen this? &lt;A href="https://phoenix.apache.org/phoenix_spark.html" target="_blank"&gt;https://phoenix.apache.org/phoenix_spark.html&lt;/A&gt; There's a PySpark example, but alas no Java.&lt;/P&gt;</description>
      <pubDate>Sun, 21 Feb 2016 06:10:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Phoenix-HBase-problem-with-HDP-2-3-4-and-Java/m-p/149845#M20314</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-02-21T06:10:44Z</dc:date>
    </item>
    <item>
      <title>Re: Phoenix / HBase problem with HDP 2.3.4 and Java</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Phoenix-HBase-problem-with-HDP-2-3-4-and-Java/m-p/149846#M20315</link>
      <description>&lt;P&gt;OK, in the end I found a way to both read and write from Phoenix in a Java Spark app:&lt;/P&gt;&lt;PRE&gt;// read
// using jdbc - which isn't the best way of doing this, as there is no push-down optimization...
DataFrame dfFromHbase = SPARK_MANAGED_RESOURCE.getSparkSqlContext().read().format("jdbc")
        .options(ImmutableMap.of(
                "driver" , "org.apache.phoenix.jdbc.PhoenixDriver", "url",
                "jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure",
                "dbtable", tableName)).load();

// write
// no column family is specified - it uses whatever has been linked up in the phoenix table
dfICreated.write().format("org.apache.phoenix.spark")
        .mode(SaveMode.Overwrite)
        .options(ImmutableMap.of(
                "zkUrl", "sandbox:2181:/hbase-unsecure",
                "table", tableName)).save();
&lt;/PRE&gt;&lt;P&gt;These are for the 2.3.4 sandbox. I hope Hortonworks will upgrade to the latest Phoenix (4.6 or 4.7?) soon, as the phoenix-spark read would provide push-down queries, which I don't think the JDBC driver is doing at the moment...&lt;/P&gt;</description>
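Note that the read and the write above point at the same cluster through two different option keys: the JDBC "url" option and the phoenix-spark "zkUrl" option, which share a host:port:znodeParent quorum suffix (e.g. "sandbox:2181:/hbase-unsecure"). A minimal sketch of that relationship; the PhoenixUrl class is a hypothetical helper for illustration, not part of Phoenix or Spark:

```java
// Hypothetical helper (not part of Phoenix): relates the Phoenix JDBC URL used
// by the "url" option to the quorum string used by the "zkUrl" option.
public class PhoenixUrl {

    // Builds "jdbc:phoenix:<host>:<port>:<znodeParent>" from its parts.
    public static String jdbcUrl(String host, int port, String znodeParent) {
        return "jdbc:phoenix:" + host + ":" + port + ":" + znodeParent;
    }

    // Recovers the zkUrl (quorum) part from a full Phoenix JDBC URL.
    public static String zkUrl(String jdbcUrl) {
        String prefix = "jdbc:phoenix:";
        if (!jdbcUrl.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a Phoenix JDBC URL: " + jdbcUrl);
        }
        return jdbcUrl.substring(prefix.length());
    }
}
```

So "jdbc:phoenix:sandbox:2181:/hbase-unsecure" (read) and "sandbox:2181:/hbase-unsecure" (write) address the same HBase instance, with "/hbase-unsecure" being the znode parent HDP uses for unsecured clusters.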
      <pubDate>Mon, 29 Feb 2016 18:00:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Phoenix-HBase-problem-with-HDP-2-3-4-and-Java/m-p/149846#M20315</guid>
      <dc:creator>David_Tam</dc:creator>
      <dc:date>2016-02-29T18:00:09Z</dc:date>
    </item>
  </channel>
</rss>