Member since: 08-22-2016
Posts: 30
Kudos Received: 4
Solutions: 0
03-22-2017
04:27 AM
Hi Artem, I'm currently stuck on a particular use case where I'm trying to access Hive table data using spark.read.jdbc, as shown below:
export SPARK_MAJOR_VERSION=2
spark-shell
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
val connectionProperties = new java.util.Properties()
val hiveQuery = "(SELECT * from hive_table limit 10) tmp"
val hiveResult = spark.read.jdbc("jdbc:hive2://hiveServerHostname:10000/hiveDBName;user=hive;password=hive", hiveQuery, connectionProperties).collect()
But when I check the results in hiveResult, it's just empty. Could you please suggest what's going on here? I know we can access Hive tables using a HiveSession, and I've successfully tried that, but is it possible to run Hive queries and access Hive data using the above method?
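For comparison, below is a minimal sketch of the metastore-based route mentioned at the end of the post (reading through a Hive-enabled SparkSession rather than the HiveServer2 JDBC driver). The database and table names are taken from the snippet above; everything else is an assumption, not a confirmed fix for the empty result.
  import org.apache.spark.sql.SparkSession

  // In spark-shell this returns the existing session; in a standalone app it
  // builds one with Hive support so spark.sql() can see metastore tables.
  val spark = SparkSession.builder()
    .appName("HiveReadSketch")
    .enableHiveSupport()
    .getOrCreate()

  // hiveDBName and hive_table are the names used in the post above.
  val df = spark.sql("SELECT * FROM hiveDBName.hive_table LIMIT 10")
  df.show()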
02-14-2017
02:27 PM
Hi Kyle, this worked for me! But could you please shed some light on how you arrived at this solution and what the actual issue was? Thanks, Shikhar
02-14-2017
01:54 AM
Hi Prathneesh, thanks for your help. Please find my responses below:
1) Yes, I am able to query a normal external table created from an HDFS file.
2) Checked and killed all PXF processes on all servers; Ambari showed the PXF agents as down, then I restarted all the PXF agents via Ambari.
3) Done; the same error repeated.
4) Yes, a total of 11 PXF and 11 Hive clients are installed across the nodes.
5) pxf_service_address is set to hivenode:51200 and hcatalog_enable=true. Both properties are set under custom hawq-site.xml.
I've followed all the above steps but am not able to get past the error. Any suggestions or pointers are welcome. Thanks, Shikhar
02-13-2017
04:29 PM
Thanks Artem
02-13-2017
04:28 PM
I'm seeing the lines below in the segment logs under pg_log, on the host where Hive and the HAWQ segment are installed:
2017-02-13 21:56:39.098974 IST,,,p572743,th-427955904,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2017-02-13 21:56:39.099010 IST,,,p572743,th-427955904,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 10.131.137.16",,,,,,,0,,"network_utils.c",210,
I'm wondering why it is showing 127.0.0.1. Thoughts, anyone?
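As a hedged suggestion (not something from this thread), the 127.0.0.1 line usually reflects how the segment host resolves its own name, which can be checked directly on that host:
  # Show the name and address the host reports for itself; a hostname mapped
  # to 127.0.0.1 in /etc/hosts would explain the loopback line in the log.
  hostname -f
  hostname -i
  grep -v '^#' /etc/hosts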
02-13-2017
02:48 PM
Hi, I'm getting the error below when trying to read data from a Hive table in HAWQ using PXF:
PXF service could not be reached. PXF is not running in the tomcat container (libchurl.c:878)
Please note that I've installed pxf-service on all the nodes successfully and verified that the PXF service is running on the hosts on port 51200. Please suggest what the possible reason for this error could be.
gpadmin=# select count(*) from hcatalog.default.sample_07;
ERROR: remote component error (404) from '10.131.137.16:51200': PXF service could not be reached. PXF is not running in the tomcat container (libchurl.c:878)
LINE 1: select count(*) from hcatalog.default.sample_07;
gpadmin=# select * from hcatalog.default.sample_07 limit 1;
ERROR: remote component error (404) from '10.131.137.16:51200': PXF service could not be reached. PXF is not running in the tomcat container (libchurl.c:878)
LINE 1: select * from hcatalog.default.sample_07 limit 1;
I checked that the PXF service is running on all the hosts as below:
Last login: Mon Feb 13 18:08:53 2017 from in-cppy5z1.in.kronos.com
[root@kvs-in-hadoop04 ~]# netstat -anp | grep 51200
tcp6       0      0 :::51200       :::*       LISTEN      519018/java
[root@kvs-in-hadoop04 ~]# ps -ef | grep 519018
pxf       519018       1  0 15:57 ?  00:00:31 /usr/jdk64/jdk1.8.0_77/bin/java -Djava.util.logging.config.file=/var/pxf/pxf-service/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx512M -Xss256K -Djava.endorsed.dirs=/var/pxf/pxf-service/endorsed -classpath /var/pxf/pxf-service/bin/bootstrap.jar:/var/pxf/pxf-service/bin/tomcat-juli.jar -Dcatalina.base=/var/pxf/pxf-service -Dcatalina.home=/var/pxf/pxf-service -Djava.io.tmpdir=/var/pxf/pxf-service/temp org.apache.catalina.startup.Bootstrap start
root      662021  661908  0 20:41 pts/0  00:00:00 grep --color=auto 519018
I'm also attaching the segment log files from the host where Hive is installed: hawq-2017-02-13-170046.txt
@njayakumar @Greg Keys @Gagan Brahmi @Artem Ervits
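As one additional check (a hedged sketch, assuming the stock PXF REST layout on port 51200), probing the agent's REST endpoint from the HAWQ master separates "service not running" from "service running but webapp not deployed":
  # Run from the HAWQ master against the segment host named in the error.
  curl -v http://10.131.137.16:51200/pxf/ProtocolVersion
  # A healthy agent returns a small protocol-version payload; an HTTP 404 here
  # suggests Tomcat is listening but the PXF webapp itself is not deployed.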
Labels:
- Apache Hive
01-17-2017
04:55 AM
Hi Eyad, I'm trying to execute a Spark2 action using the Shell Action in Oozie. I've followed the exact same steps as above, but I'm stuck at the point below. The Oozie launcher just keeps printing this forever in its stdout logs:
>>> Invoking Shell command line now >>
Stdoutput Testing Shell Action
Heart beat
Heart beat
Heart beat
Heart beat
There is no error either. Please suggest what I'm doing wrong. Attachments: workflow.txt, job-properties.txt, echo.txt
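For reference, a minimal sketch of the kind of script such a shell action typically runs is shown below; echo.txt is only attached above, not quoted, so the paths, class, and jar here are assumptions:
  #!/bin/bash
  # Select the Spark2 client on HDP 2.5 so spark-submit resolves to Spark 2.x.
  export SPARK_MAJOR_VERSION=2
  echo "Testing Shell Action"

  # Submitting in yarn-cluster mode from inside an Oozie launcher means two YARN
  # applications need containers at once; on a small queue the launcher can then
  # sit printing "Heart beat" while the Spark application waits for resources.
  /usr/hdp/current/spark2-client/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    /usr/hdp/current/spark2-client/examples/jars/spark-examples_2.11-2.0.0.2.5.3.0-37.jar 10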
01-15-2017
09:14 AM
Hi Artem, yes, I created a new directory in HDFS and included it in the Oozie libpath as below:
oozie.libpath=/user/oozie/share/lib/spark2
I copied all the jars from the Spark2 installation directory /usr/hdp/2.5.3.0-37/spark2/jars into the above HDFS directory, but it still gives me this error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1484116726997_0144 finished with failed status
org.apache.spark.SparkException: Application application_1484116726997_0144 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1169)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:289)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:211)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:51)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:242)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Any ideas about what might be causing this error?
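For what it's worth, here is a minimal sketch of the job.properties entries usually involved when pointing an Oozie Spark action at a Spark2 jar directory like the one above; apart from the libpath already quoted, the values are placeholders for this cluster, not confirmed settings:
  # Placeholder cluster endpoints.
  nameNode=hdfs://namenode-host:8020
  jobTracker=resourcemanager-host:8050

  # Pull in the system sharelib plus the custom Spark2 jar directory from the post.
  oozie.use.system.libpath=true
  oozie.libpath=${nameNode}/user/oozie/share/lib/spark2

  # If a spark2 sharelib directory exists, this selects it for Spark actions.
  oozie.action.sharelib.for.spark=spark2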
01-13-2017
08:50 AM
1 Kudo
I'm trying to run the SparkPi example using the examples jar in Spark2, running it through Oozie. Attached are the Oozie configuration files: job-properties.txt, workflow.xml. I have the directory structure below on both the local FS and HDFS:
+-~/sparkAction/
  +-job.properties
  +-workflow.xml
  +-lib/
    +-spark-examples_2.11-2.0.0.2.5.3.0-37.jar
    +-spark-hdp-assembly.jar
When I run this with the command below as the yarn user:
oozie job -oozie http://kvs-in-merlin04.int.kronos.com:11000/oozie -config job.properties -run
I'm getting the error below:
java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$anon$2.run(ApplicationMaster.scala:559)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
The Oozie launcher successfully starts SparkPi on YARN, so there are no permission issues, but the Spark program is not finding the SparkSession class! Please help.
Labels:
- Apache Oozie
- Apache Spark
- Apache YARN
12-22-2016
10:28 AM
Thanks Sunile, this article is really informative. For now we are trying to run this Spark application as a single-run service instead of as a web service, and we are going to pass all the arguments through the command line. We have actually found a different entry point to this service and are now trying to run it from the command line.
Since I'm new to Scala and Spark, I'm finding it difficult to start this application on my Hortonworks Sandbox. I've already set up the required jar files and am able to submit the application to Spark2 on Hortonworks 2.5, but I'm stuck at the point where I have to run this on YARN in cluster mode. I'm getting the error below. I've kept all the required jars under the root directory from where I'm running this command. Please note that I'm able to execute this in local mode, but I don't know what the issue is with YARN cluster mode.
16/12/22 09:25:40 ERROR ApplicationMaster: User class threw exception: java.sql.SQLException: No suitable driver
java.sql.SQLException: No suitable driver
The command I'm using is given below:
su root --command "/usr/hdp/2.5.0.0-1245/spark2/bin/spark-submit --class com.kronos.research.svc.datascience.DataScienceApp --verbose --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 --jars ./dst-svc-reporting-assembly-0.0.1-SNAPSHOT-deps.jar --conf spark.scheduler.mode=FIFO --conf spark.cassandra.output.concurrent.writes=5 --conf spark.cassandra.output.batch.size.bytes=4096 --conf spark.cassandra.output.consistency.level=ALL --conf spark.cassandra.input.consistency.level=LOCAL_ONE --conf spark.executor.extraClassPath=sqljdbc4.jar:ojdbc6.jar --conf spark.executor.extraJavaOptions=\"-Duser.timezone=UTC \" --driver-class-path sqljdbc4.jar:ojdbc6.jar --driver-java-options \"-Dspark.ui.port=0 -Doracle.jdbc.timezoneAsRegion=false -Dspark.cassandra.connection.host=catalyst-1.int.kronos.com,catalyst-2.int.kronos.com,catalyst-3.int.kronos.com -Duser.timezone=UTC\" /root/dst-svc-reporting-assembly-0.0.1-SNAPSHOT.jar -job.type=pipeline -pipeline=usagemon -jdbc.type=tenant -jdbc.tenant=10013 -date.start=\"2013-02-01\" -date.end=\"2016-05-01\" -labor.level=4 -output.type=console -jdbc.throttle=15 -jdbc.cache.dir=hdfs://sandbox.hortonworks.com:8020/etl"
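One thing that often clears "No suitable driver" in yarn-cluster mode is naming the JDBC driver class explicitly when the DataFrame is created, so DriverManager discovery is not needed inside the YARN containers. The sketch below is an assumption based on the sqljdbc4.jar on the classpath above, not the application's actual code; the URL and credentials are placeholders.
  import java.util.Properties
  import org.apache.spark.sql.SparkSession

  // The deployed application already has a session; built here only so the
  // sketch is self-contained.
  val spark = SparkSession.builder().appName("JdbcDriverProbe").getOrCreate()

  // Placeholder SQL Server URL and credentials.
  val url = "jdbc:sqlserver://dbhost:1433;databaseName=tenantdb"
  val props = new Properties()
  props.setProperty("user", "dbuser")
  props.setProperty("password", "dbpassword")
  // Explicit driver class from sqljdbc4.jar, registered up front instead of
  // relying on DriverManager to discover it on the executors.
  props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

  val probe = spark.read.jdbc(url, "(SELECT 1 AS ok) t", props)
  probe.show()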