Member since 12-06-2019 | 6 Posts | 0 Kudos Received | 0 Solutions
04-30-2020 10:02 PM
Hi @sappu,

I want to write Spark DataFrame data to a Hive table using the Hive Warehouse Connector. I am running the Spark application from spark-shell:

import com.hortonworks.hwc.HiveWarehouseSession
import org.apache.spark.sql.SaveMode
DF.write.format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR).mode(SaveMode.Overwrite).option("table", "Demo").save()

Sometimes it loads the data into the Hive table, and sometimes it throws the exception below:

Caused by: java.lang.SecurityException: class "org.codehaus.janino.JaninoRuntimeException"'s signer information does not match signer information of other classes in the same package

I have set the following classpath:

export CLASSPATH=/usr/hdp/3.0.1.0-187/spark2/jars/spark-sql_2.11-2.3.1.3.0.1.0-187.jar:/usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar

It still throws the error.
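One way to investigate a signer-mismatch SecurityException is to check which jar each class in the affected package is loaded from: the JVM raises this error when classes of one package (here org.codehaus.janino) come from jars with different signers. Below is a minimal diagnostic sketch in Java, assuming it is compiled and run against the same jars the Spark driver uses (for example `java -cp "$CLASSPATH:." ClassOrigin org.codehaus.janino.JaninoRuntimeException`). The names `ClassOrigin` and `locate` are hypothetical, and the sketch only reports where a class comes from; it does not resolve the conflict.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Reports where the named class is loaded from: the jar or directory URL,
     * a note for JDK classes, or the error raised while loading it.
     */
    public static String locate(String className) {
        try {
            // initialize=false: load and define the class without running static initializers.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes from the JDK's bootstrap loader usually have no code source.
                return "no code source (likely a JDK class)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found on classpath";
        } catch (SecurityException e) {
            // Raised when classes of one package come from jars with different signers.
            return "signer conflict: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the error; other class names can be passed as arguments.
        String[] names = args.length > 0
                ? args
                : new String[] {"org.codehaus.janino.JaninoRuntimeException"};
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If two classes from org.codehaus.janino print different jar locations, that duplicate is a likely candidate for the mismatch.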
02-21-2020 02:33 AM
Hi,
I want to process data from HDFS files using Spark Java code. While processing the files, I perform simple transformations, such as replacing newlines with spaces and finding patterns with a regex. I used the wholeTextFiles method to read the data from HDFS, but it took 2 hours to process only 4 MB of files. I tried increasing the executor memory to 15g with 4 executor instances, but it still took 2 hours.
I have 1 master node with 56 GiB memory and 8 cores, and 3 worker nodes with 28 GiB memory and 8 cores each.
How can I improve the performance of this Spark job with the above node configuration?
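The per-file transformation described above can be sketched as plain Java so it can be tested outside Spark. This is a minimal sketch under assumptions: the pattern `ID-\d+` is a hypothetical placeholder for the real regex, and `FileContentProcessor`, `flatten`, and `findMatches` are illustrative names, not the poster's code. Both patterns are compiled once in static fields instead of once per record; whether per-record compilation contributes to this particular slowdown is an assumption the post does not confirm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileContentProcessor {

    // Hypothetical target pattern; replace with the real one. Compiled once and reused.
    private static final Pattern TARGET = Pattern.compile("ID-\\d+");

    // Matches \r\n, \n, or \r so both Windows and Unix line endings become one space.
    private static final Pattern NEWLINES = Pattern.compile("\\r\\n|\\n|\\r");

    /** Replaces every line break in the file content with a single space. */
    public static String flatten(String content) {
        return NEWLINES.matcher(content).replaceAll(" ");
    }

    /** Returns every match of TARGET in the flattened content, in order. */
    public static List<String> findMatches(String content) {
        List<String> matches = new ArrayList<>();
        Matcher m = TARGET.matcher(flatten(content));
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sample = "first ID-1 line\nsecond line ID-22\r\nID-333";
        System.out.println(flatten(sample));
        System.out.println(findMatches(sample));
    }
}
```

Inside the Spark job, the same method could be applied to each (path, content) pair returned by `JavaSparkContext.wholeTextFiles`, for example `sc.wholeTextFiles(path).mapValues(FileContentProcessor::findMatches)`. Since `wholeTextFiles` produces one record per file, it may also be worth checking `getNumPartitions()` on the result and passing the optional `minPartitions` argument if only a few tasks are running.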
Thanks,
Tags: HDFS, performance, Spark
Labels: Apache Spark, HDFS
12-09-2019 09:56 PM
I had registered piggybank.jar in the Pig script from HDFS with REGISTER hdfs://sandbox-hdp.hrotonworks.com:8020/lib/piggybank.jar; and I changed this to the local path REGISTER /usr/hdp/3.0.1.0-78/pig/lib/piggybank.jar; after that, the issue was resolved.
12-06-2019 06:47 AM
Hi,
I am running a Pig action from an Oozie workflow, but the Pig action ends with the error below. I tried changing yarn.timeline-service.version from 2.0f to 1.5f, but it still throws the same error:
2019-12-06 13:54:21,074 [main] WARN org.apache.pig.PigServer - Error posting to ATS:
org.apache.hadoop.service.ServiceStateException: java.io.IOException: Timeline V1 client is not properly configured. Either timeline service is not enabled or version is not set to 1.x
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:173)
at org.apache.pig.backend.hadoop.PigATSClient.<init>(PigATSClient.java:68)
at org.apache.pig.backend.hadoop.PigATSClient.getInstance(PigATSClient.java:57)
at org.apache.pig.PigServer.<init>(PigServer.java:260)
at org.apache.pig.PigServer.<init>(PigServer.java:219)
at org.apache.pig.tools.grunt.Grunt.<init>(Grunt.java:46)
at org.apache.pig.Main.run(Main.java:495)
at org.apache.pig.PigRunner.run(PigRunner.java:49)
at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:273)
at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:216)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:78)
at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:231)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:465)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.io.IOException: Timeline V1 client is not properly configured. Either timeline service is not enabled or version is not set to 1.x
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:100)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
... 24 more
Then I set yarn.timeline-service.enabled=false. After setting this property, the error above was resolved, but the Pig action is stuck in the running state. Below is the Pig job log:
2019-12-06 14:15:42,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2019-12-06 14:15:42,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1575641509651_0005]
Heart beat
Heart beat
Heart beat
Heart beat
2019-12-06 14:17:43,613 [HiveClientCache-cleaner-0] INFO org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Closed a connection to metastore, current connections: 1
2019-12-06 14:17:43,613 [HiveClientCache-cleaner-0] INFO org.apache.hadoop.hive.metastore.HiveMetaStoreClient - Closed a connection to metastore, current connections: 0
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
End of LogType:stdout.This log file belongs to a running container (container_e24_1575641509651_0004_01_000002) and so may not be complete.
It seems the Pig job lost its connection to HiveMetaStoreClient. The Hive services are up and running, yet the job is still stuck in the running state.
Thanks,
Labels: Apache Pig, Apache YARN