Member since: 07-26-2016 · Posts: 5 · Kudos Received: 0 · Solutions: 0
01-09-2018 11:27 PM
Hi guys, we have just upgraded from CDH 5.4.4 to CDH 5.13.0, and we are facing an issue running a Spark job on the YARN cluster (submitted via Hue). It appears to be a jets3t version mismatch between the Spark and Hadoop dependencies on the cluster at runtime:

2018-01-09 15:18:36,944 [broadcast-hash-join-1] ERROR org.apache.hadoop.yarn.YarnUncaughtExceptionHandler - Thread Thread[broadcast-hash-join-1,5,main] threw an Error. Shutting down now...
java.lang.VerifyError: Bad type on operand stack
Exception Details:
Location:
org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore.copy(Ljava/lang/String;Ljava/lang/String;)V @152: invokevirtual
Reason:
Type 'org/jets3t/service/model/S3Object' (current frame, stack[4]) is not assignable to 'org/jets3t/service/model/StorageObject'
Current Frame:
bci: @152
flags: { }
locals: { 'org/apache/hadoop/fs/s3native/Jets3tNativeFileSystemStore', 'java/lang/String', 'java/lang/String', 'org/jets3t/service/model/S3Object' }
stack: { 'org/jets3t/service/S3Service', 'java/lang/String', 'java/lang/String', 'java/lang/String', 'org/jets3t/service/model/S3Object', integer }
Bytecode:
0000000: b200 3fb9 0065 0100 9900 36b2 003f bb00
0000010: 5759 b700 5812 66b6 0059 2bb6 0059 1267
0000020: b600 592c b600 5912 68b6 0059 2ab4 0021
0000030: b600 3ab6 0059 b600 5ab9 0069 0200 2ab4
0000040: 0010 9900 302a b400 0b2a b400 212b 0101
0000050: 0101 b600 6a4e 2ab4 001a 0994 9e00 162d
0000060: b600 6b2a b400 1a94 9e00 0a2a 2d2c b600
0000070: 6cb1 bb00 2859 2cb7 0029 4e2d 2ab4 001d
0000080: b600 2e2a b400 0b2a b400 21b6 003a 2b2a
0000090: b400 21b6 003a 2d03 b600 6d57 a700 0a4e
00000a0: 2a2d 2bb7 0033 b1
Exception Handler Table:
bci [0, 113] => handler: 159
bci [114, 156] => handler: 159
Stackmap Table:
same_frame(@62)
same_frame(@114)
same_locals_1_stack_item_frame(@159,Object[#214])
same_frame(@166)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.createDefaultStore(NativeS3FileSystem.java:342)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.initialize(NativeS3FileSystem.java:332)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2816)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2853)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2835)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:91)
at org.apache.spark.sql.execution.Exchange.prepareShuffleDependency(Exchange.scala:220)
at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:254)
at org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:248)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
at org.apache.spark.sql.execution.Exchange.doExecute(Exchange.scala:247)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:86)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate$$anonfun$doExecute$1.apply(TungstenAggregate.scala:80)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
at org.apache.spark.sql.execution.aggregate.TungstenAggregate.doExecute(TungstenAggregate.scala:80)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:85)
at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1$$anonfun$apply$1.apply(BroadcastHashOuterJoin.scala:82)
at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:90)
at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:82)
at org.apache.spark.sql.execution.joins.BroadcastHashOuterJoin$$anonfun$broadcastFuture$1.apply(BroadcastHashOuterJoin.scala:82)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

I have been stuck on this for some time now; any help or possible direction would be appreciated.
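For what it's worth, this particular VerifyError ("S3Object ... is not assignable to ... StorageObject") commonly indicates a pre-0.9 jets3t jar shadowing the 0.9.x jets3t that the CDH 5.13 Hadoop classes were compiled against. One quick way to confirm it is to scan the effective classpath for duplicate jets3t versions; the sketch below is only an illustration (the jar paths are hypothetical, and the regex only covers plain `artifact-x.y.z.jar` names):

```python
import re

def jar_versions(jar_paths, artifact):
    """Collect the distinct versions of `artifact` seen among jar file names."""
    pattern = re.compile(re.escape(artifact) + r"-(\d+(?:\.\d+)*)\.jar$")
    versions = set()
    for path in jar_paths:
        match = pattern.search(path)
        if match:
            versions.add(match.group(1))
    return versions

# Hypothetical classpath listing -- in practice, take it from the container's
# launch environment, the job's fat jar contents, or the parcel jars directory.
classpath = [
    "/opt/cloudera/parcels/CDH/jars/jets3t-0.9.0.jar",
    "/app/lib/jets3t-0.6.1.jar",
    "/opt/cloudera/parcels/CDH/jars/hadoop-common-2.6.0-cdh5.13.0.jar",
]

found = jar_versions(classpath, "jets3t")
if len(found) > 1:
    print("conflicting jets3t versions:", sorted(found))
```

If two versions show up, keeping only the CDH-shipped jets3t (and removing or relocating the old one from the job's assembly jar or any `spark.driver.extraClassPath` / `spark.executor.extraClassPath` entries) should let `Jets3tNativeFileSystemStore` verify again.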
07-26-2016 12:06 PM
Hi everyone, this is Ketan. I have around 7 years of experience with Big Data stacks; more can be found on my LinkedIn profile: MyProfile. I have joined the community to get in touch with like-minded people, to learn from them, and to share my knowledge in the best possible manner. As I have recently been working with Cloudera distributions, it will also help me stay in touch with current advancements and upcoming features.
07-26-2016 11:46 AM
Hi guys! We have some custom logic based on which we make decisions to update the Fair Scheduler resource allocations for queues on the Cloudera 'yarn' service. Currently we work with the entire JSON that represents all queue resource allocations. For reference:

    ApiConfig fsScheduledConfig = rootResource.getClustersResource().getServicesResource("cluster")
            .readServiceConfig("yarn", DataView.FULL).getConfigs().stream()
            .filter(config -> config.getName().equalsIgnoreCase("yarn_fs_scheduled_allocations"))
            .findFirst().get();

I update this configuration with the required allocations and then refresh the pools:

    rootResource.getClustersResource().poolsRefresh("cluster");

What I was looking at is the Dynamic Resource Pools (configuration) page in Cloudera Manager, where I can see all the queues (pools) along with their allocations; there I can edit them individually in the UI console, and it automatically updates the allocation properties and refreshes the cluster. I want to know whether there is a mechanism to access this Dynamic Resource Pools configuration via the CM API or REST API (probably v12); that would let me update the allocations directly with minimum fuss. Any help or insights would be appreciated.
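As far as I can tell, the Dynamic Resource Pools page edits this same `yarn_fs_scheduled_allocations` service config and then issues the cluster `poolsRefresh` command, so the REST equivalent would be a PUT on the YARN service config followed by a POST to the refresh command. Below is a minimal Python sketch of the request shapes only; the host, port, cluster name, and exact endpoint paths are assumptions to verify against the API docs for your CM version:

```python
import json

CM_BASE = "http://cm-host:7180/api/v12"   # hypothetical CM host, API v12
CLUSTER = "cluster"                        # cluster name used in the Java snippet

def yarn_config_url():
    # GET reads the service-wide config; PUT with the payload below updates it.
    return f"{CM_BASE}/clusters/{CLUSTER}/services/yarn/config"

def pools_refresh_url():
    # POST here should correspond to poolsRefresh() in the Java client.
    return f"{CM_BASE}/clusters/{CLUSTER}/commands/poolsRefresh"

def allocations_payload(allocations):
    """Wrap a scheduled-allocations dict as an ApiConfigList-style PUT body.

    The inner structure of the allocations JSON is elided here -- in practice,
    reuse the JSON string already read via readServiceConfig and modify it.
    """
    return {"items": [{"name": "yarn_fs_scheduled_allocations",
                       "value": json.dumps(allocations)}]}

payload = allocations_payload({"queues": []})   # illustrative placeholder only
```

With `requests`, that would be roughly `requests.put(yarn_config_url(), json=payload, auth=(user, pw))` followed by `requests.post(pools_refresh_url(), auth=(user, pw))` — again, please double-check the endpoint names against the CM REST API reference before relying on them.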
07-14-2016 03:59 PM
Hi Ryan, yes, I have the Fair Scheduler enabled, but how would enabling preemption help me out here?
07-14-2016 11:50 AM
Hi, I have been stuck on this issue for quite some time now. We have a cluster (with spot instances as workers in auto-scaling groups) managed by YARN, and we use the Fair Scheduler on it. Often we need to add more resources to particular queues based on our SLAs (which we maintain in our repository, built from AWS metrics, YARN metrics, etc.); these SLAs are usually time-bound but sometimes metric-dependent as well. The need is this: for a particular SLA (with a given allowed deviation), if we find that with the current resource allocation the queue will not be able to meet its SLA (based on application progress, etc.), we add more instances (resources/containers) to the cluster. What we want to control is that these newly added resources are taken up by one specific queue only, to speed up the applications in that queue so that we meet our SLAs. It would be really helpful if someone could point us to a solution for this.
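For context, the kind of steering we are after would look roughly like this in fair-scheduler.xml terms (a sketch only; queue names and resource figures are made up): cap the other queues with maxResources so they cannot absorb the new NodeManagers, and give the SLA-critical queue the weight and minResources headroom, with preemption enabled so it can also reclaim already-running containers:

```xml
<?xml version="1.0"?>
<!-- Sketch only: queue names and resource figures are hypothetical. -->
<allocations>
  <queue name="sla_critical">
    <!-- High weight + large minResources: new cluster capacity flows here -->
    <weight>10.0</weight>
    <minResources>40960 mb, 20 vcores</minResources>
  </queue>
  <queue name="batch">
    <!-- Hard cap so this queue cannot take up the newly added nodes -->
    <maxResources>20480 mb, 10 vcores</maxResources>
    <weight>1.0</weight>
  </queue>
  <!-- Preemption must also be enabled via yarn.scheduler.fair.preemption=true
       in yarn-site.xml for the timeout below to have any effect. -->
  <defaultFairSharePreemptionTimeout>60</defaultFairSharePreemptionTimeout>
</allocations>
```

This does not literally pin specific hosts to a queue (the Fair Scheduler in this Hadoop generation has no node-to-queue affinity; node labels are a CapacityScheduler feature), but the cap-plus-weight pattern has a similar effect: the marginal capacity from scaling up is usable only by the uncapped queue.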