Member since: 06-02-2020
Posts: 331
Kudos Received: 67
Solutions: 49
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2801 | 07-11-2024 01:55 AM |
| | 7860 | 07-09-2024 11:18 PM |
| | 6571 | 07-09-2024 04:26 AM |
| | 5905 | 07-09-2024 03:38 AM |
| | 5605 | 06-05-2024 02:03 AM |
04-09-2023
08:56 PM
Display query metrics of Analyzer/Optimizer Rules
To display query metrics for effective runs of Analyzer/Optimizer rules, we need to use the RuleExecutor object. RuleExecutor metrics help us identify which rules are taking the most time.
object RuleExecutor {
  protected val queryExecutionMeter = QueryExecutionMetering()

  /** Dump statistics about time spent running specific rules. */
  def dumpTimeSpent(): String = {
    queryExecutionMeter.dumpTimeSpent()
  }

  /** Resets statistics about time spent running specific rules */
  def resetMetrics(): Unit = {
    queryExecutionMeter.resetMetrics()
  }

  def getCurrentMetrics(): QueryExecutionMetrics = {
    queryExecutionMeter.getMetrics()
  }
}
Display the query metrics using Scala code:

import org.apache.spark.sql.catalyst.rules.RuleExecutor
import org.apache.spark.sql.functions.col

var df = spark.range(100).toDF()

// Do this first or your values will be cumulative
RuleExecutor.resetMetrics()

for (i <- 1 to 500) {
  df = df.withColumn("id_" + i, col("id") + i)
}

println(RuleExecutor.dumpTimeSpent())
Output:
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 84000
Total time: 3.88783617 seconds
Rule Effective Time / Total Time Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts 440769504 / 514930128 500 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 316457344 / 452531027 500 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator 0 / 225949844 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability 0 / 218911510 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer 0 / 164356553 0 / 1500
org.apache.spark.sql.catalyst.analysis.TimeWindowing 0 / 141921674 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 61465421 / 109227965 500 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates 0 / 98820566 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed 0 / 95189919 0 / 1500
org.apache.spark.sql.catalyst.analysis.DecimalPrecision 0 / 87679682 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion 0 / 84910897 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct 0 / 83477029 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame 0 / 80862398 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings 0 / 79603933 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations 0 / 77833267 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion 0 / 75004425 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion 0 / 70605659 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion 0 / 68207411 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality 0 / 63671046 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions 0 / 62710050 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery 0 / 60797560 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion 0 / 59701827 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division 0 / 58397231 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder 0 / 57996555 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions 0 / 55920753 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion 0 / 55915403 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion 0 / 55282298 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion 0 / 55127879 0 / 1500
org.apache.spark.sql.catalyst.analysis.CleanupAliases 25547573 / 48980732 500 / 1000
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables 0 / 40694858 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 0 / 29383949 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF 0 / 29069226 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance 0 / 28936950 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions 0 / 27205184 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases 0 / 26819109 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions 0 / 25447455 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate 0 / 19031939 0 / 1500
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile 0 / 18588169 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics 0 / 14494796 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables 0 / 13098274 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 0 / 13007322 0 / 1500
org.apache.spark.sql.execution.datasources.FindDataSourceTable 0 / 12953222 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin 0 / 12934635 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes 0 / 12923604 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot 0 / 12910359 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences 0 / 12652452 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases 0 / 12609578 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy 0 / 12577636 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions 0 / 12549282 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy 0 / 12509176 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions 0 / 12436875 0 / 1500
org.apache.spark.sql.hive.ResolveHiveSerdeTable 0 / 12354180 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation 0 / 12323360 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints 0 / 11305271 0 / 500
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints 0 / 5869056 0 / 500
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences 0 / 5663024 0 / 500
org.apache.spark.sql.catalyst.analysis.AliasViewChild 0 / 5435219 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic 0 / 5325264 0 / 500
org.apache.spark.sql.execution.datasources.DataSourceAnalysis 0 / 5177154 0 / 500
org.apache.spark.sql.execution.datasources.PreprocessTableCreation 0 / 5001442 0 / 500
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints 0 / 4902903 0 / 500
org.apache.spark.sql.hive.DetermineTableStats 0 / 4890450 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution 0 / 4862617 0 / 500
org.apache.spark.sql.hive.RelationConversions 0 / 4755488 0 / 500
org.apache.spark.sql.hive.HiveAnalysis 0 / 4703621 0 / 500
org.apache.spark.sql.catalyst.analysis.EliminateUnions 0 / 4532930 0 / 500
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion 0 / 4528303 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution 0 / 4486339 0 / 500
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals 0 / 4360278 0 / 500
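For programmatic access rather than the formatted dump, the getCurrentMetrics() method shown in the excerpt above can be used to snapshot the counters. A minimal sketch, assuming QueryExecutionMetrics is a case class whose toString prints its counters:

import org.apache.spark.sql.catalyst.rules.RuleExecutor
import org.apache.spark.sql.functions.col

// Reset so the snapshot reflects only the work below
RuleExecutor.resetMetrics()

// Creating the Dataset triggers analysis of the new plan
val df2 = spark.range(100).toDF().withColumn("id_plus_1", col("id") + 1)

// Snapshot of the rule metrics accumulated since the reset
println(RuleExecutor.getCurrentMetrics())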
Display the query metrics using Python code:

RuleExecutor = spark._jvm.org.apache.spark.sql.catalyst.rules.RuleExecutor

df = spark.range(100)

# Do this first or your values will be cumulative
RuleExecutor.resetMetrics()

for i in range(500):
    df = df.withColumn("id_" + str(i), df["id"] + i)

print(RuleExecutor.dumpTimeSpent())
Output:
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 84000
Total time: 4.083792217 seconds
Rule Effective Time / Total Time Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings 428130064 / 486834072 500 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts 386199112 / 471189375 500 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability 0 / 267973019 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator 0 / 234887998 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 0 / 221097152 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer 0 / 187387962 0 / 1500
org.apache.spark.sql.catalyst.analysis.TimeWindowing 0 / 169662648 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates 0 / 118567324 0 / 1500
org.apache.spark.sql.catalyst.analysis.DecimalPrecision 0 / 114827165 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 66654151 / 114117034 500 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed 0 / 94635230 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion 0 / 75385981 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion 0 / 74819084 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations 0 / 71308118 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion 0 / 66340413 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality 0 / 60474458 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions 0 / 60323978 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery 0 / 60093349 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion 0 / 59012355 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division 0 / 57692382 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion 0 / 57007739 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct 0 / 55960927 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion 0 / 55746694 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder 0 / 54499655 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions 0 / 54407217 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame 0 / 53640536 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion 0 / 53254689 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion 0 / 52972883 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance 0 / 40538255 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables 0 / 39820352 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions 0 / 39517598 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases 0 / 37882907 0 / 1500
org.apache.spark.sql.catalyst.analysis.CleanupAliases 17699843 / 33339362 500 / 1000
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 0 / 28576611 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions 0 / 26537381 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF 0 / 20261221 0 / 500
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables 0 / 14907118 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics 0 / 14088163 0 / 1500
org.apache.spark.sql.hive.ResolveHiveSerdeTable 0 / 13454797 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 0 / 13337544 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate 0 / 13324686 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions 0 / 13170550 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation 0 / 13104878 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions 0 / 13068603 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences 0 / 13012218 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot 0 / 12952997 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy 0 / 12928211 0 / 1500
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes 0 / 12859184 0 / 1500
org.apache.spark.sql.execution.datasources.FindDataSourceTable 0 / 12798249 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases 0 / 12657792 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin 0 / 12643004 0 / 1500
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy 0 / 12455529 0 / 1500
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile 0 / 12143355 0 / 1500
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints 0 / 12042103 0 / 500
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints 0 / 6474487 0 / 500
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences 0 / 5663492 0 / 500
org.apache.spark.sql.catalyst.analysis.AliasViewChild 0 / 5625356 0 / 500
org.apache.spark.sql.hive.RelationConversions 0 / 5533718 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic 0 / 5348453 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution 0 / 5247319 0 / 500
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution 0 / 5239071 0 / 500
org.apache.spark.sql.execution.datasources.PreprocessTableCreation 0 / 5216258 0 / 500
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints 0 / 5027277 0 / 500
org.apache.spark.sql.catalyst.analysis.EliminateUnions 0 / 5003055 0 / 500
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals 0 / 4939426 0 / 500
org.apache.spark.sql.execution.datasources.DataSourceAnalysis 0 / 4930642 0 / 500
org.apache.spark.sql.hive.HiveAnalysis 0 / 4709238 0 / 500
org.apache.spark.sql.hive.DetermineTableStats 0 / 4658002 0 / 500
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion 0 / 4634318 0 / 500
04-06-2023
01:19 PM
@pankshiv1809 Can you share the spark-submit configuration for the UPSS_PROMO_PROMOTIONS Spark job? You can also attach JConsole, which helps detect performance problems in the code, including java.lang.OutOfMemoryError. Depending on the available memory on your cluster, you can then re-adjust as suggested by @RangaReddy.
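For the JConsole part, one way is to open a JMX port on the driver JVM. A sketch only: the port number and the application file name are placeholders, and authentication/SSL are disabled purely for a quick test, not for production:

# Illustrative only: expose a JMX port on the driver so JConsole can attach
spark-submit \
  --conf spark.driver.extraJavaOptions="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false" \
  your_app.py

Then point JConsole at driver-host:9010 while the job is running.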
04-06-2023
12:58 PM
@pankshiv1809 Can you share a more detailed log and some background on your environment, Python versions, etc.? Geoffrey
04-05-2023
11:48 PM
Hi @gizelly Have you tried the above solution, and did it work? If yes, please accept it as a solution. It will be useful to other members.
04-02-2023
09:48 PM
@ComNic, has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
03-30-2023
06:41 AM
Hi @sat_046 As I mentioned in an earlier comment, unfortunately it is not possible to delay the tasks. You can find the Spark code that handles failed tasks here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L879C8-L1002 Please accept the solution if you liked my answer.
03-30-2023
04:38 AM
Use the following tool to generate the number of executors: https://rangareddy.github.io/SparkConfigurationGenerator/ To calculate the driver and executor memory, start with 1g, 2g, 4g, 8g, and so on; executor-cores can be set to 3-5, and the number of executors depends on how much data you are processing.
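For example, a hypothetical spark-submit starting point along those lines (the values and the application file name are placeholders to adjust, not recommendations):

# Hypothetical starting values; revisit after observing the job in the Spark UI
spark-submit \
  --driver-memory 2g \
  --executor-memory 4g \
  --executor-cores 4 \
  --num-executors 10 \
  your_app.py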
03-30-2023
04:35 AM
Hi @pankshiv1809 To run an application faster, we need to tune the resources, the Spark code, and the cluster. To solve any kind of performance issue, you need to go through the Spark UI and understand the jobs, stages, and executors. After that, tune the resources (driver and executor memory, number of executors) and use a separate queue to process the data. If you want to know other techniques, raise a Cloudera case and we will help you further.
03-30-2023
04:31 AM
Hi @quangbilly79 Cloudera supports the YARN and Kubernetes deployment modes; it does not support Standalone mode (in Standalone mode you can access the Spark Master using port 7077). To check which node the driver is launched on and which nodes the executors are launched on, go to the Spark UI or the Spark History Server UI for that application and open the Executors tab. You will see the list of executors; in the second table, find the executor ID. The entry whose executor ID is 'driver' is the driver node, and all the remaining entries are executors.
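As a programmatic cross-check, SparkContext.getExecutorMemoryStatus returns a map keyed by the block-manager address (host:port) of the driver and every executor. A minimal sketch; note it shows addresses, not the 'driver' executor ID you see in the UI:

// Prints the "host:port" block-manager address of the driver and each executor
spark.sparkContext.getExecutorMemoryStatus.keys.foreach(println)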