Member since
12-14-2015
21
Posts
15
Kudos Received
3
Solutions
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1278 | 12-15-2016 04:05 PM |
| | 4248 | 07-13-2016 09:39 AM |
| | 14548 | 01-11-2016 05:07 PM |
12-30-2016
05:28 PM
@Huahua Wei this is a Spark-specific configuration and does not belong in hive-site.xml. Set it in your application or from Ambari.
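For example, a minimal sketch of setting such a property from the application itself (the property name below is only a placeholder for whichever setting is being discussed):
// placeholder property name; substitute the actual Spark configuration key
sqlContext.setConf("spark.sql.some.property", "value")
// or pass it on the command line: spark-submit --conf spark.sql.some.property=value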
12-15-2016
04:05 PM
2 Kudos
It looks like your ResourceManager doesn't have write access to the root znode, so it can't create /yarn-leader-election. Please make sure you have the proper ACLs on /.
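If it helps, here is a minimal sketch (not a definitive fix) of listing the ACLs on the root znode with the ZooKeeper Java client from Scala; the quorum address is a placeholder:
import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}
import org.apache.zookeeper.data.Stat
import scala.collection.JavaConverters._

// placeholder quorum address; use your own ZooKeeper ensemble
val zk = new ZooKeeper("zk-host:2181", 30000, new Watcher {
  override def process(event: WatchedEvent): Unit = ()
})
// the RM principal needs CREATE permission on / to be able to create /yarn-leader-election
zk.getACL("/", new Stat()).asScala.foreach(println)
zk.close()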
12-12-2016
10:36 PM
Thanks @bikas. shc has its own problems and I haven't found it useful yet; please let me know if you've tried it. Here are the problems I faced:
1. The shc tech preview is released with HDP 2.5, but the Spark versions are slightly different: HDP 2.5 ships Spark 1.6.2 while shc is built for 1.6.1. I actually tried to upgrade Spark and rebuild shc, but it failed in a few test cases.
2. Spark 1.6.2 is built against Scala 2.10, as you can see at http://repo.hortonworks.com/content/repositories/releases/org/apache/spark/spark-core_2.10, but shc for Spark 1.6.1 is built against Scala 2.11, which sounds weird: http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/shc-core/1.0.1-1.6-s_2.10/ The above link suggests Scala 2.10, but if you try it you will notice it was actually built and released against 2.11.
12-12-2016
10:24 PM
Thanks for the answer @Josh Elser. My problem is not whether this is the best practice for handling authentication; first I want my code to work. I'm trying to find a way to make newAPIHadoopRDD authenticate to AD on the executors, but no success yet.
12-12-2016
03:40 PM
1 Kudo
Hi, does anyone know how to let Spark talk to HBase on a secured cluster? I have a Kerberized Hadoop cluster (HDP 2.5) and want to scan HBase tables from Spark using newAPIHadoopRDD. A Spark application in local mode can easily authenticate to AD using a keytab and communicate with HBase. When I run it on YARN, the driver can authenticate with AD and get the Kerberos TGT in two ways: using --keytab and --principal on spark-submit, or with the help of UserGroupInformation.loginUserFromKeytabAndReturnUGI. But the executors fail and can't get the TGT, even though the keytab is available to Spark on all the nodes. The problem is that the executors are handled by newAPIHadoopRDD, and I can't find a way to make them use my user and its headless keytab. I then get the following famous exception on all the executors:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:611)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:156)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:737)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:734)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:734)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1199)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:32741)
at org.apache.hadoop.hbase.client.ClientSmallScanner$SmallScannerCallable.call(ClientSmallScanner.java:201)
at org.apache.hadoop.hbase.client.ClientSmallScanner$SmallScannerCallable.call(ClientSmallScanner.java:180)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:364)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:338)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:65)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122)
at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192)
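For reference, the driver-side login and scan look roughly like this (a minimal sketch; the principal, keytab path, and table name are placeholders):
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.{SparkConf, SparkContext}

object SecureHBaseScan {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("secure-hbase-scan"))

    // driver-side login works fine (placeholder principal and keytab path)
    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "myuser@EXAMPLE.COM", "/etc/security/keytabs/myuser.headless.keytab")

    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")

    // the executors created by newAPIHadoopRDD never perform the login above,
    // which is where the GSSException shown above is thrown
    val rdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    println(rdd.count())
  }
}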
07-13-2016
09:39 AM
2 Kudos
Now I know what is going on: there is a ticket in progress at the moment called "Enable OrcRelation even when connecting via spark thrift server", https://issues.apache.org/jira/browse/SPARK-12998 But you can find it as a release improvement in HDP 2.4.2: https://github.com/hortonworks/spark-release/blob/HDP-2.4.2.0-tag/HDP-CHANGES.txt To conclude, Spark in Hortonworks can be slightly different from upstream Spark. As a workaround, you can disable the improvement and read the schema directly from the Hive metastore:
sqlContext.setConf("spark.sql.hive.convertMetastoreOrc", "false")
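For example, after setting it, the failing query works again (a minimal sketch, assuming the same empty ORC table named tbl as in the related question):
sqlContext.sql("select * from tbl").show()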
07-11-2016
08:02 PM
I think this bug fix is also related to the issue https://issues.apache.org/jira/browse/SPARK-8501
07-11-2016
07:25 PM
There has also been an open JIRA ticket since March: https://issues.apache.org/jira/browse/SPARK-14286
07-11-2016
07:23 PM
3 Kudos
A simple scenario: create an empty ORC table using Hive, then try to query the table from Spark.
Hive:
create table tbl(name string) stored as orc;
Spark:
sqlContext.sql("select * from tbl") // even collect is not needed to see the error
Here is the error:
16/07/11 15:09:21 INFO ParseDriver: Parsing command: select * from tbl
16/07/11 15:09:22 INFO ParseDriver: Parse Completed
java.lang.IllegalArgumentException: orcFileOperator: path hdfs://dobbindata/apps/hive/warehouse/tbl does not have valid orc files matching the pattern
at org.apache.spark.sql.hive.orc.OrcFileOperator$.listOrcFiles(OrcFileOperator.scala:104)
at org.apache.spark.sql.hive.orc.OrcFileOperator$.getFileReader(OrcFileOperator.scala:69)
at org.apache.spark.sql.hive.orc.OrcFileOperator$.readSchema(OrcFileOperator.scala:77)
at org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$2.apply(OrcRelation.scala:185)
at org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$2.apply(OrcRelation.scala:185)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.orc.OrcRelation.<init>(OrcRelation.scala:184)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$20.apply(HiveMetastoreCatalog.scala:580)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$20.apply(HiveMetastoreCatalog.scala:578)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$convertToOrcRelation(HiveMetastoreCatalog.scala:578)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:647)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$$anonfun$apply$2.applyOrElse(HiveMetastoreCatalog.scala:643)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:332)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:321)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:332)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$.apply(HiveMetastoreCatalog.scala:643)
at org.apache.spark.sql.hive.HiveMetastoreCatalog$OrcConversions$.apply(HiveMetastoreCatalog.scala:637)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:83)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:80)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:80)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:72)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:72)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:36)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:36)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC.<init>(<console>:41)
at $iwC.<init>(<console>:43)
at <init>(<console>:45)
at .<init>(<console>:49)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
02-02-2016
06:24 PM
I see no sign of Kafka on Zeppelin, but apparently TwitterUtils is available to use!
01-29-2016
03:09 PM
Have you tried setting SPARK_HOME? I think that most likely solves your problem. For more info, take a look at this link and scroll down to the Configure section: https://github.com/apache/incubator-zeppelin
01-28-2016
10:13 PM
1 Kudo
Can you check what the SPARK_HOME value is in zeppelin-env.sh?
01-28-2016
12:48 AM
1 Kudo
Does anyone have any idea why I get the following error when I execute KafkaUtils.createStream on Zeppelin? It works fine in spark-shell with yarn-client.
error: bad symbolic reference. A signature in KafkaUtils.class refers to term kafka
in package <root> which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling KafkaUtils.class.
error: bad symbolic reference. A signature in KafkaUtils.class refers to term serializer
in value kafka which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling KafkaUtils.class.
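For context, the call is roughly the following (a minimal sketch; the ZooKeeper quorum, consumer group, and topic are placeholders):
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(sc, Seconds(10))
// fails with the "bad symbolic reference" error above when run in Zeppelin
val stream = KafkaUtils.createStream(ssc, "zk-host:2181", "zeppelin-group", Map("test-topic" -> 1))
stream.print()
ssc.start()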
01-20-2016
10:58 AM
2 Kudos
I think this is a bug, and the workaround proposed by @Benjamin Leonhardi is the only way to fix the issue so far. For the record, as you can see in the hiveserver2.log below, the MR/Tez execution completes and the ATSHook finishes successfully, but the Atlas HiveHook post hook then fails due to an incompatible operation name. The workaround is to remove "org.apache.atlas.hive.hook.HiveHook" from "hive.exec.post.hooks".
2016-01-12 11:00:21,188 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.STATS.Stage-2 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: ql.Driver (Driver.java:launchTask(1653)) - Starting task [Stage-2:STATS] in serial mode
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: exec.StatsTask (StatsTask.java:execute(86)) - Executing stats task
2016-01-12 11:00:21,189 INFO [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: get_table : db=default tbl=my_test
2016-01-12 11:00:21,190 INFO [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous ip=unknown-ip-addr cmd=get_table : db=default tbl=my_test
2016-01-12 11:00:21,203 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,203 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,209 INFO [HiveServer2-Background-Pool: Thread-180]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(747)) - 3: alter_table: db=default tbl=my_test newtbl=my_test
2016-01-12 11:00:21,209 INFO [HiveServer2-Background-Pool: Thread-180]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(372)) - ugi=anonymous ip=unknown-ip-addr cmd=alter_table: db=default tbl=my_test newtbl=my_test
2016-01-12 11:00:21,224 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,224 WARN [HiveServer2-Background-Pool: Thread-180]: security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1521)) - No groups available for user anonymous
2016-01-12 11:00:21,244 INFO [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(217)) - Updating table stats fast for my_test
2016-01-12 11:00:21,244 INFO [HiveServer2-Background-Pool: Thread-180]: hive.log (MetaStoreUtils.java:updateUnpartitionedTableStatsFast(219)) - Updated size of table my_test to 0
2016-01-12 11:00:21,253 INFO [HiveServer2-Background-Pool: Thread-180]: exec.Task (SessionState.java:printInfo(951)) - Table default.my_test stats: [numFiles=0, totalSize=0]
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=runTasks start=1452592809214 end=1452592821254 duration=12040 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: hooks.ATSHook (ATSHook.java:<init>(84)) - Created ATS Hook
2016-01-12 11:00:21,254 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=PostHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1452592821254 end=1452592821255 duration=1 from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=PostHook.org.apache.atlas.hive.hook.HiveHook from=org.apache.hadoop.hive.ql.Driver>
2016-01-12 11:00:21,255 ERROR [HiveServer2-Background-Pool: Thread-180]: ql.Driver (SessionState.java:printError(960)) - FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
at java.lang.Enum.valueOf(Enum.java:238)
at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1054)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:154)
at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:71)
at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:206)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2016-01-12 11:00:21,255 INFO [HiveServer2-Background-Pool: Thread-180]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=Driver.execute start=1452592809211 end=1452592821255 duration=12044 from=org.apache.hadoop.hive.ql.Driver>
01-11-2016
05:07 PM
1 Kudo
If you add your external files using "spark-submit --files", your files will be uploaded to this HDFS folder: hdfs://your-cluster/user/your-user/.sparkStaging/application_1449220589084_0508 (application_1449220589084_0508 is an example of a YARN application ID). In your Spark application, you can find the files in two ways:
1. Find the Spark staging directory with the call below (but you need to have the HDFS URI and your username):
System.getenv("SPARK_YARN_STAGING_DIR") --> .sparkStaging/application_1449220589084_0508
2. Find the complete comma-separated file paths using:
System.getenv("SPARK_YARN_CACHE_FILES") --> hdfs://yourcluster/user/hdfs/.sparkStaging/application_1449220589084_0508/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar#__spark__.jar,hdfs://yourcluster/user/hdfs/.sparkStaging/application_1449220589084_0508/your-spark-job.jar#__app__.jar,hdfs://yourcluster/user/hdfs/.sparkStaging/application_1449220589084_0508/test_file.txt#test_file.txt
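A minimal sketch of resolving one of the shipped files from these environment variables (assuming the file names shown above):
// comma-separated "hdfs://...#alias" entries, as shown above
val cacheFiles = System.getenv("SPARK_YARN_CACHE_FILES").split(",")
// pick the entry whose alias (the part after '#') matches the shipped file name
val testFile = cacheFiles.find(_.endsWith("#test_file.txt")).map(_.takeWhile(_ != '#'))
println(testFile)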
12-17-2015
07:19 AM
I had a quick look at the Atlas HiveHook; the bug shouldn't be there, because it simply takes the generated operation name string from the hook context via hookContext.getOperationName(): https://github.com/hortonworks/atlas-release/blob/HDP-2.3.2.0-tag/addons/hive-bridge/src/main/java/org/apache/atlas/hive/hook/HiveHook.java
12-14-2015
01:30 PM
1 Kudo
I have a Spark job that runs frequently and populates a Hive table backed by ORC files. Spark generates many small files, and even using coalesce doesn't help fill the large HDFS blocks efficiently.
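For context, the write looks roughly like this (a minimal sketch; df is a placeholder for the DataFrame being written):
// coalesce reduces the number of output files, but they still don't fill an HDFS block
df.coalesce(4)
  .write
  .mode("append")
  .insertInto("my_test")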
The best solution I found was to schedule a job that concatenates the Hive table periodically. The ALTER TABLE actually works fine, but it raises an exception, as you can see below:
CREATE TABLE my_test(id String) STORED AS ORC;
ALTER TABLE my_test CONCATENATE;
Loading data to table default.my_test
Table default.my_test stats: [numFiles=0, totalSize=0]
FAILED: Hive Internal Error: java.lang.IllegalArgumentException(No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE)
java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.hive.ql.plan.HiveOperation.ALTER_TABLE_MERGE
at java.lang.Enum.valueOf(Enum.java:238)
at org.apache.hadoop.hive.ql.plan.HiveOperation.valueOf(HiveOperation.java:23)
at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1522)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
I also don't understand two things:
1. What is the relation between running a simple Hive shell query and Atlas? (at org.apache.atlas.hive.hook.HiveHook.run(HiveHook.java:151))
2. Why is Hive trying to use the ALTER_TABLE_MERGE enum constant, which is actually implemented as ALTERTABLE_MERGEFILES? See https://github.com/hortonworks/hive-release/blob/HDP-2.3.2.0-tag/ql/src/java/org/apache/hadoop/hive/ql/plan/HiveOperation.java