I am facing an issue while using Spark with Phoenix, and after some searching I learned that applying certain patches to Phoenix can resolve it.
Can someone explain how to apply a patch, or what the procedure for doing so is?
That usually means you need to rebuild Phoenix from source. As an alternative, you can download more recent binaries from the Apache site. Keep in mind that once you upgrade to a newer version, rolling back to the old one can be problematic.
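A rough sketch of the rebuild-from-source route. The version tag and the patch file name (`PHOENIX-XXXX.patch`) are placeholders, not taken from the original thread; substitute the tag matching your installed Phoenix and the patch attached to the relevant JIRA issue.

```shell
# Placeholders: the tag and patch name below are examples, not the
# actual ones for this issue -- adjust for your environment.

# Check out the Phoenix source at the tag matching your installed version
git clone https://github.com/apache/phoenix.git
cd phoenix
git checkout v4.7.0-HBase-1.1

# Apply the patch downloaded from the JIRA issue
git apply /path/to/PHOENIX-XXXX.patch

# Rebuild; skipping tests saves considerable time
mvn clean package -DskipTests
```

The built jars end up under the various `*/target/` directories of the Maven modules.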
Replace phoenix-server.jar in the HBase lib directory with the new one. If you are using HDP, I would also recommend updating all the soft links to phoenix-sqlline, phoenix-client.jar, etc.
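For illustration, the jar swap and symlink update might look like the following. The paths and version numbers are assumptions based on a typical HDP layout; verify the actual locations on your cluster first.

```shell
# Hypothetical HDP-style paths; confirm these on your own cluster.
NEW_JAR=/tmp/phoenix-4.7.0-HBase-1.1-server.jar
HBASE_LIB=/usr/hdp/current/hbase-master/lib

# Back up the old server jar, then drop in the rebuilt one
cp "$HBASE_LIB"/phoenix-server.jar "$HBASE_LIB"/phoenix-server.jar.bak
cp "$NEW_JAR" "$HBASE_LIB"/phoenix-server.jar

# Repoint the client symlink (-sfn replaces an existing link in place)
ln -sfn /usr/hdp/current/phoenix-client/phoenix-4.7.0-HBase-1.1-client.jar \
        /usr/hdp/current/phoenix-client/phoenix-client.jar
```

Restart the HBase region servers afterwards so they pick up the new server jar.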
If you are using a vendor distribution, building phoenix.jar yourself is not recommended: you may not be able to build it against the exact versions of the other components (HBase/HDFS) used in that distribution.
A stable version of phoenix-spark with the above fix and many others will ship in a maintenance release of HDP 2.4.1+ and in HDP 2.5.
You can either wait for that release or ask your vendor for a hotfix.
As you're probably aware, writing DataFrames to Phoenix is not a problem; the issue is reading from Phoenix into a DataFrame.
@Randy Gelhausen, thanks for the workaround. But if I want to use the Spark history server, what should I do? My existing Spark installation is managed by Ambari; if I add a lower version of Spark manually, how would that work?
Spark is a client-side library, which means you can use it alongside your existing Ambari-managed Spark. See the comments here about how this works. Essentially, you can set up a custom Spark install in your home directory and use the included spark-shell/spark-submit scripts from there. I don't think it will send finished-job data to the Ambari-managed Spark history server, though.
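A minimal sketch of the side-by-side install described above. The Spark version and download URL are examples only; pick whichever build you actually need.

```shell
# Example only: spark-1.6.1-bin-hadoop2.6 is a placeholder version.
cd ~
wget https://archive.apache.org/dist/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
tar xzf spark-1.6.1-bin-hadoop2.6.tgz

# Point SPARK_HOME at the unpacked directory and launch its own
# spark-shell, so it does not clash with the Ambari-managed install.
export SPARK_HOME=~/spark-1.6.1-bin-hadoop2.6
"$SPARK_HOME"/bin/spark-shell --master yarn-client
```

Because nothing here touches `/usr/hdp`, the Ambari-managed Spark stays untouched and both can coexist on the same client node.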