Member since: 08-22-2016
Posts: 30
Kudos Received: 4
Solutions: 0
03-22-2017
04:27 AM
Hi Artem, I'm currently stuck on a particular use case where I'm trying to access Hive table data using spark.read.jdbc, as shown below:
export SPARK_MAJOR_VERSION=2
spark-shell
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
val connectionProperties = new java.util.Properties()
val hiveQuery = "(SELECT * from hive_table limit 10) tmp"
val hiveResult = spark.read.jdbc("jdbc:hive2://hiveServerHostname:10000/hiveDBName;user=hive;password=hive", hiveQuery, connectionProperties).collect()
But when I check the results in hiveResult, it's just empty. Could you please suggest what's going on here? I know we can access Hive tables using a HiveSession, and I've successfully tried that, but is it possible to run Hive queries and access Hive data using the above method?
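For comparison, below is a minimal sketch of the metastore-based route mentioned at the end of the post (reading through a Hive-enabled SparkSession rather than the HiveServer2 JDBC driver). The database and table names are taken from the snippet above; everything else is an assumption, not a confirmed fix for the empty result.
  import org.apache.spark.sql.SparkSession

  // In spark-shell this returns the existing session; in a standalone app it
  // builds one with Hive support so spark.sql() can see metastore tables.
  val spark = SparkSession.builder()
    .appName("HiveReadSketch")
    .enableHiveSupport()
    .getOrCreate()

  // hiveDBName and hive_table are the names used in the post above.
  val df = spark.sql("SELECT * FROM hiveDBName.hive_table LIMIT 10")
  df.show()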
02-14-2017
02:27 PM
Hi Kyle, this worked for me! But could you please shed some light on how you arrived at this solution and what the actual issue was? Thanks, Shikhar
02-14-2017
01:54 AM
Hi Prathneesh, thanks for your help. Please find my responses below:
1) Yes, I am able to query a normal external table created from an HDFS file.
2) Checked and killed all PXF processes on all servers; Ambari showed the PXF agents as down, then I restarted all the PXF agents via Ambari.
3) Done; the same error repeated.
4) Yes, a total of 11 PXF and 11 Hive clients are installed across the nodes.
5) pxf_service_address is set to hivenode:51200 and hcatalog_enable=true. Both properties are set under custom hawq-site.xml.
I've followed all the above steps but am not able to get past the error. Any suggestions or pointers are welcome. Thanks, Shikhar
02-13-2017
04:29 PM
Thanks Artem
02-13-2017
04:28 PM
I'm seeing the lines below in the segment logs under pg_log, on the host where Hive and the HAWQ segment are installed:
2017-02-13 21:56:39.098974 IST,,,p572743,th-427955904,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 127.0.0.1",,,,,,,0,,"network_utils.c",210,
2017-02-13 21:56:39.099010 IST,,,p572743,th-427955904,,,,0,,,seg-10000,,,,,"LOG","00000","Resource manager discovered local host IPv4 address 10.131.137.16",,,,,,,0,,"network_utils.c",210,
I'm wondering why it is showing 127.0.0.1. Thoughts, anyone?
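As a hedged suggestion (not something from this thread), the 127.0.0.1 line usually reflects how the segment host resolves its own name, which can be checked directly on that host:
  # Show the name and address the host reports for itself; a hostname mapped
  # to 127.0.0.1 in /etc/hosts would explain the loopback line in the log.
  hostname -f
  hostname -i
  grep -v '^#' /etc/hosts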
02-13-2017
02:48 PM
Hi, I'm getting the error below when trying to read data from a Hive table in HAWQ using PXF:
PXF service could not be reached. PXF is not running in the tomcat container (libchurl.c:878)
Please note that I've installed pxf-service on all the nodes successfully and verified that the PXF service is running on the hosts on port 51200. Please suggest what the possible reason for this error could be.
gpadmin=# select count(*) from hcatalog.default.sample_07;
ERROR: remote component error (404) from '10.131.137.16:51200': PXF service could not be reached. PXF is not running in the tomcat container (libchurl.c:878)
LINE 1: select count(*) from hcatalog.default.sample_07;
gpadmin=# select * from hcatalog.default.sample_07 limit 1;
ERROR: remote component error (404) from '10.131.137.16:51200': PXF service could not be reached. PXF is not running in the tomcat container (libchurl.c:878)
LINE 1: select * from hcatalog.default.sample_07 limit 1;
I checked that the PXF service is running on all the hosts as below:
Last login: Mon Feb 13 18:08:53 2017 from in-cppy5z1.in.kronos.com
[root@kvs-in-hadoop04 ~]# netstat -anp | grep 51200
tcp6       0      0 :::51200       :::*       LISTEN      519018/java
[root@kvs-in-hadoop04 ~]# ps -ef | grep 519018
pxf       519018       1  0 15:57 ?  00:00:31 /usr/jdk64/jdk1.8.0_77/bin/java -Djava.util.logging.config.file=/var/pxf/pxf-service/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Xmx512M -Xss256K -Djava.endorsed.dirs=/var/pxf/pxf-service/endorsed -classpath /var/pxf/pxf-service/bin/bootstrap.jar:/var/pxf/pxf-service/bin/tomcat-juli.jar -Dcatalina.base=/var/pxf/pxf-service -Dcatalina.home=/var/pxf/pxf-service -Djava.io.tmpdir=/var/pxf/pxf-service/temp org.apache.catalina.startup.Bootstrap start
root      662021  661908  0 20:41 pts/0  00:00:00 grep --color=auto 519018
I'm also attaching the segment log files from the host where Hive is installed: hawq-2017-02-13-170046.txt
@njayakumar @Greg Keys @Gagan Brahmi @Artem Ervits
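As one additional check (a hedged sketch, assuming the stock PXF REST layout on port 51200), probing the agent's REST endpoint from the HAWQ master separates "service not running" from "service running but webapp not deployed":
  # Run from the HAWQ master against the segment host named in the error.
  curl -v http://10.131.137.16:51200/pxf/ProtocolVersion
  # A healthy agent returns a small protocol-version payload; an HTTP 404 here
  # suggests Tomcat is listening but the PXF webapp itself is not deployed.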
Labels:
- Apache Hive
01-17-2017
04:55 AM
Hi Eyad, I'm trying to execute a Spark2 action using the Shell Action in Oozie. I've followed the exact same steps as above, but I'm stuck at the point below. The Oozie launcher just keeps printing this forever in its stdout logs:
>>> Invoking Shell command line now >>
Stdoutput Testing Shell Action
Heart beat
Heart beat
Heart beat
Heart beat
There is no error either. Please suggest what I'm doing wrong. Attachments: workflow.txt, job-properties.txt, echo.txt
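For reference, a minimal sketch of the kind of script such a shell action typically runs is shown below; echo.txt is only attached above, not quoted, so the paths, class, and jar here are assumptions:
  #!/bin/bash
  # Select the Spark2 client on HDP 2.5 so spark-submit resolves to Spark 2.x.
  export SPARK_MAJOR_VERSION=2
  echo "Testing Shell Action"

  # Submitting in yarn-cluster mode from inside an Oozie launcher means two YARN
  # applications need containers at once; on a small queue the launcher can then
  # sit printing "Heart beat" while the Spark application waits for resources.
  /usr/hdp/current/spark2-client/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    /usr/hdp/current/spark2-client/examples/jars/spark-examples_2.11-2.0.0.2.5.3.0-37.jar 10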
01-15-2017
09:14 AM
Hi Artem, yes, I created a new directory in HDFS and included it in the Oozie libpath as below:
oozie.libpath=/user/oozie/share/lib/spark2
I copied all the jars from the Spark2 installation directory /usr/hdp/2.5.3.0-37/spark2/jars into the above HDFS directory, but it still gives me this error:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Application application_1484116726997_0144 finished with failed status
org.apache.spark.SparkException: Application application_1484116726997_0144 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1169)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:289)
at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:211)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:51)
at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:59)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:242)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Any ideas about what might be causing this error?
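For what it's worth, here is a minimal sketch of the job.properties entries usually involved when pointing an Oozie Spark action at a Spark2 jar directory like the one above; apart from the libpath already quoted, the values are placeholders for this cluster, not confirmed settings:
  # Placeholder cluster endpoints.
  nameNode=hdfs://namenode-host:8020
  jobTracker=resourcemanager-host:8050

  # Pull in the system sharelib plus the custom Spark2 jar directory from the post.
  oozie.use.system.libpath=true
  oozie.libpath=${nameNode}/user/oozie/share/lib/spark2

  # If a spark2 sharelib directory exists, this selects it for Spark actions.
  oozie.action.sharelib.for.spark=spark2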
01-13-2017
08:50 AM
1 Kudo
I'm trying to run the SparkPi example using the examples jar in Spark2, running it through Oozie. Attached are the Oozie configuration files: job-properties.txt, workflow.xml. I have the directory structure below on both the local FS and HDFS:
+-~/sparkAction/
  +-job.properties
  +-workflow.xml
  +-lib/
    +-spark-examples_2.11-2.0.0.2.5.3.0-37.jar
    +-spark-hdp-assembly.jar
When I run this with the command below as the yarn user:
oozie job -oozie http://kvs-in-merlin04.int.kronos.com:11000/oozie -config job.properties -run
I'm getting the error below:
java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:28)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.yarn.ApplicationMaster$anon$2.run(ApplicationMaster.scala:559)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
The Oozie launcher successfully starts SparkPi on YARN, so there are no permission issues, but the Spark program is not finding the SparkSession class! Please help.
Labels:
- Apache Oozie
- Apache Spark
- Apache YARN
12-22-2016
10:28 AM
Thanks Sunile, this article is really informative. For now we are trying to run this Spark application as a single-run service instead of as a web service, and we are going to pass all the arguments through the command line. We have actually found a different entry point to this service and are now trying to run it from the command line.
Since I'm new to Scala and Spark, I'm finding it difficult to start this application on my Hortonworks Sandbox. I've already set up the required jar files and am able to submit the application to Spark2 on Hortonworks 2.5, but I'm stuck at the point where I have to run this on YARN in cluster mode. I'm getting the error below. I've kept all the required jars under the root directory from where I'm running this command. Please note that I'm able to execute this in local mode, but I don't know what the issue is with YARN cluster mode.
16/12/22 09:25:40 ERROR ApplicationMaster: User class threw exception: java.sql.SQLException: No suitable driver
java.sql.SQLException: No suitable driver
The command I'm using is given below:
su root --command "/usr/hdp/2.5.0.0-1245/spark2/bin/spark-submit --class com.kronos.research.svc.datascience.DataScienceApp --verbose --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 --jars ./dst-svc-reporting-assembly-0.0.1-SNAPSHOT-deps.jar --conf spark.scheduler.mode=FIFO --conf spark.cassandra.output.concurrent.writes=5 --conf spark.cassandra.output.batch.size.bytes=4096 --conf spark.cassandra.output.consistency.level=ALL --conf spark.cassandra.input.consistency.level=LOCAL_ONE --conf spark.executor.extraClassPath=sqljdbc4.jar:ojdbc6.jar --conf spark.executor.extraJavaOptions=\"-Duser.timezone=UTC \" --driver-class-path sqljdbc4.jar:ojdbc6.jar --driver-java-options \"-Dspark.ui.port=0 -Doracle.jdbc.timezoneAsRegion=false -Dspark.cassandra.connection.host=catalyst-1.int.kronos.com,catalyst-2.int.kronos.com,catalyst-3.int.kronos.com -Duser.timezone=UTC\" /root/dst-svc-reporting-assembly-0.0.1-SNAPSHOT.jar -job.type=pipeline -pipeline=usagemon -jdbc.type=tenant -jdbc.tenant=10013 -date.start=\"2013-02-01\" -date.end=\"2016-05-01\" -labor.level=4 -output.type=console -jdbc.throttle=15 -jdbc.cache.dir=hdfs://sandbox.hortonworks.com:8020/etl"
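One thing that often clears "No suitable driver" in yarn-cluster mode is naming the JDBC driver class explicitly when the DataFrame is created, so DriverManager discovery is not needed inside the YARN containers. The sketch below is an assumption based on the sqljdbc4.jar on the classpath above, not the application's actual code; the URL and credentials are placeholders.
  import java.util.Properties
  import org.apache.spark.sql.SparkSession

  // The deployed application already has a session; built here only so the
  // sketch is self-contained.
  val spark = SparkSession.builder().appName("JdbcDriverProbe").getOrCreate()

  // Placeholder SQL Server URL and credentials.
  val url = "jdbc:sqlserver://dbhost:1433;databaseName=tenantdb"
  val props = new Properties()
  props.setProperty("user", "dbuser")
  props.setProperty("password", "dbpassword")
  // Explicit driver class from sqljdbc4.jar, registered up front instead of
  // relying on DriverManager to discover it on the executors.
  props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

  val probe = spark.read.jdbc(url, "(SELECT 1 AS ok) t", props)
  probe.show()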