Member since: 12-16-2015
Posts: 23
Kudos Received: 6
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6244 | 09-15-2016 08:19 AM |
04-03-2017
03:49 AM
mapred.child.java.opts seems to be deprecated. Below are the values from the cluster and the ones used in the driver code.

In code:

```
config.set("mapreduce.map.java.opts", "-Xmx8192m");
config.set("mapreduce.reduce.java.opts", "-Xmx8192m");
```

In cluster:

```
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx26214m</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx13107m</value>
</property>
```
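For reference, a minimal driver sketch of the replacement keys, assuming a standard `org.apache.hadoop.conf.Configuration`; the class and job names are hypothetical. The deprecated `mapred.child.java.opts` covered both task types and is superseded by the two per-task-type keys:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JavaOptsDriver {
    public static void main(String[] args) throws Exception {
        Configuration config = new Configuration();

        // Old, deprecated key (applied to both map and reduce child JVMs):
        // config.set("mapred.child.java.opts", "-Xmx8192m");

        // Current replacements, one per task type:
        config.set("mapreduce.map.java.opts", "-Xmx8192m");
        config.set("mapreduce.reduce.java.opts", "-Xmx8192m");

        // Client-side values override the cluster defaults from mapred-site.xml,
        // unless the cluster marks those properties as final.
        Job job = Job.getInstance(config, "java-opts-demo"); // hypothetical job name
        // ... set mapper/reducer/input/output here ...
    }
}
```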
03-31-2017
09:10 AM
```
2017-03-30 14:12:34,329 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.xerces.dom.DeferredDocumentImpl.getNodeObject(Unknown Source)
    at org.apache.xerces.dom.DeferredDocumentImpl.synchronizeChildren(Unknown Source)
    at org.apache.xerces.dom.DeferredElementImpl.synchronizeChildren(Unknown Source)
    at org.apache.xerces.dom.ElementImpl.normalize(Unknown Source)
    at org.apache.xerces.dom.ElementImpl.normalize(Unknown Source)
    at com.mbrdi.xdl.powertrain.MR_XDLogFileAnalysis_ProcessingLogFiles_mapper.map(MR_XDLogFileAnalysis_ProcessingLogFiles_mapper.java:249)
    at com.mbrdi.xdl.powertrain.MR_XDLogFileAnalysis_ProcessingLogFiles_mapper.map(MR_XDLogFileAnalysis_ProcessingLogFiles_mapper.java:46)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
```

I was getting this issue on one of the clusters while trying to read an XML file in a MapReduce job, so I set the properties below in the driver code:

```
config.set("mapreduce.map.memory.mb", "10240");
config.set("mapreduce.map.java.opts", "-Xmx8192m");
config.set("mapreduce.reduce.memory.mb", "10240");
config.set("mapreduce.reduce.java.opts", "-Xmx8192m");
config.set("mapreduce.task.io.sort.mb", "1792");
config.set("yarn.scheduler.minimum-allocation-mb", "10240");
config.set("yarn.scheduler.maximum-allocation-mb", "184320");
config.set("yarn.nodemanager.resource.memory-mb", "184320");
config.set("yarn.app.mapreduce.am.resource.mb", "10240");
config.set("yarn.app.mapreduce.am.command-opts", "-Xmx8192m");
```

With these properties, the code worked. On another cluster, I cannot fix the error the same way. Is there any property I am missing?
Labels:
- Apache Hadoop
- Apache YARN
09-15-2016
01:49 PM
Yes, I agree, but I am trying to run a Hive query inside the map method. While using HiveContext, I am getting an error that the class is not serializable.
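For reference, a common workaround for the serialization error is to open any non-serializable resource inside `mapPartitions`, so it is constructed on the executor rather than shipped in the closure. A minimal sketch against the Spark 1.x Java API (where `FlatMapFunction.call` returns an `Iterable`; in Spark 2.x+ it returns an `Iterator`), assuming a plain Hive JDBC connection; the URL, table, and column names are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;

// The connection is created inside mapPartitions, on the executor, so the
// non-serializable Connection object is never captured in the closure.
public final class PerPartitionHiveLookup {
    public static JavaRDD<String> lookup(JavaRDD<String> keys, final String jdbcUrl) {
        return keys.mapPartitions(new FlatMapFunction<Iterator<String>, String>() {
            @Override
            public Iterable<String> call(Iterator<String> it) throws Exception {
                List<String> out = new ArrayList<String>();
                Connection con = DriverManager.getConnection(jdbcUrl);
                try {
                    Statement st = con.createStatement();
                    while (it.hasNext()) {
                        // Hypothetical table/column names, for illustration only.
                        ResultSet rs = st.executeQuery(
                                "SELECT col1 FROM some_table WHERE key = '" + it.next() + "'");
                        while (rs.next()) {
                            out.add(rs.getString(1));
                        }
                        rs.close();
                    }
                    st.close();
                } finally {
                    con.close();
                }
                return out; // one connection per partition, not per record
            }
        });
    }
}
```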
09-15-2016
08:19 AM
OK, I have resolved the Kerberos ticket issue by adding the lines below to my Java code:

```
System.setProperty("java.security.auth.login.config", "gss-jaas.conf");
System.setProperty("sun.security.jgss.debug", "true");
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
System.setProperty("java.security.krb5.conf", "krb5.conf");
```

But I am getting a different error now:

```
WARN scheduler.TaskSetManager: Lost task 1.0 in stage 1.0 (TID 7, SGSCAI0068.inedc.corpintra.net): java.sql.SQLException: Could not open connection to jdbc:hive2://**********.inedc.corpintra.net:10001/default;principal=hive/******.inedc.corpintra.net@*****;transportMode=http;httpPath=cliservice: null
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:206)
    at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:178)
    at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
    at java.sql.DriverManager.getConnection(DriverManager.java:571)
    at java.sql.DriverManager.getConnection(DriverManager.java:233)
    at GetDataFromXMLStax.returnXmlTagMatchStatus(GetDataFromXMLStax.java:77)
    at SP_LogFileNotifier_FileParsingAndCreateEmailContent$2.call(SP_LogFileNotifier_FileParsingAndCreateEmailContent.java:200)
    at SP_LogFileNotifier_FileParsingAndCreateEmailContent$2.call(SP_LogFileNotifier_FileParsingAndCreateEmailContent.java:1)
    at org.apache.spark.api.java.JavaPairRDD$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1027)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$anon$11.next(Iterator.scala:328)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply$mcV$sp(PairRDDFunctions.scala:1109)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13$anonfun$apply$6.apply(PairRDDFunctions.scala:1108)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13.apply(PairRDDFunctions.scala:1116)
    at org.apache.spark.rdd.PairRDDFunctions$anonfun$saveAsHadoopDataset$1$anonfun$13.apply(PairRDDFunctions.scala:1095)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
    at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:258)
    at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
    at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:203)
    ... 22 more
```

It seems the connection string is not correct. I am calling this connection string from inside a Spark program:

```
Connection con_con1 = DriverManager.getConnection(
    "jdbc:hive2://**********.inedc.corpintra.net:10001/default;principal=hive/**********.inedc.corpintra.net@*****.NET;transportMode=http;httpPath=cliservice");
```

Please help me with the correct connection string.
09-14-2016
08:27 AM
The credentials are correct. I am using the same connection string from the edge node and it works fine there, but not from the Spark program.
09-14-2016
08:08 AM
1 Kudo
I am trying to connect to Hive from Spark inside the map function, like below:

```
String driver = "org.apache.hive.jdbc.HiveDriver";
Class.forName(driver);
Connection con_con1 = DriverManager.getConnection(
    "jdbc:hive2://server1.net:10001/default;principal=hive/server1.net@abc.xyz.NET;ssl=false;transportMode=http;httpPath=cliservice",
    "username", "password");
```

But I am getting an error:

```
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
```
Labels:
- Apache Hive
- Apache Spark
07-18-2016
02:30 PM
I understand that. What I am asking is: is there any other way I can read HBase inside Spark?
07-18-2016
02:20 PM
I am using HDP 2.3.2 with Spark 1.4.1. As per the link below, the Spark-HBase connector works with HDP 2.4.2 onwards. Can someone help me with how I can read HBase in Spark using HDP 2.3.2? http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_spark-guide/content/ch_introduction-spark.html
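For reference, one connector-free route on older stacks is Spark's generic `newAPIHadoopRDD` with HBase's `TableInputFormat`. A minimal sketch, assuming the HBase client jars and `hbase-site.xml` are on the classpath; the table name is a placeholder:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class HBaseScan {
    public static JavaPairRDD<ImmutableBytesWritable, Result> scan(
            JavaSparkContext sc, String table) {
        // Picks up hbase-site.xml from the classpath for ZooKeeper quorum etc.
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableInputFormat.INPUT_TABLE, table); // e.g. "my_table" (placeholder)

        // Each HBase region becomes one Spark partition.
        return sc.newAPIHadoopRDD(conf, TableInputFormat.class,
                ImmutableBytesWritable.class, Result.class);
    }
}
```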
Labels:
- Apache Hadoop
- Apache Spark
07-12-2016
04:56 PM
Hello, I had tried hive.auto.convert.join.noconditionaltask=false, but it didn't work. No table is bucketed.
07-12-2016
04:30 PM
I am getting the error below while trying to execute a query like "select * from a where a.col1 not in (select b.col1 from b)":

```
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2694)
    at java.lang.String.<init>(String.java:203)
    at java.lang.StringBuilder.toString(StringBuilder.java:405)
    at org.apache.hadoop.fs.Path.toString(Path.java:390)
    at org.apache.hadoop.hive.ql.optimizer.AbstractBucketJoinProc.getBucketFilePathsOfPartition(AbstractBucketJoinProc.java:87)
    at org.apache.hadoop.hive.ql.optimizer.metainfo.annotation.OpTraitsRulesProcFactory$TableScanRule.checkBucketedTable(OpTraitsRulesProcFactory.java:147)
    at org.apache.hadoop.hive.ql.optimizer.metainfo.annotation.OpTraitsRulesProcFactory$TableScanRule.process(OpTraitsRulesProcFactory.java:174)
    at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
    at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:56)
    at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
    at org.apache.hadoop.hive.ql.optimizer.metainfo.annotation.AnnotateWithOpTraits.transform(AnnotateWithOpTraits.java:91)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.runStatsAnnotation(TezCompiler.java:249)
    at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:122)
    at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10188)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:424)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1122)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1170)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
```

I tried increasing the values for the properties below, but it is not working. Hive is running on Tez.

- mapreduce.map.memory.mb
- mapreduce.reduce.memory.mb
- hive.tez.container.size
- hive.tez.java.opts
Labels:
- Apache Hive
- Apache Tez