
I have written simple logic to find association rules, similar to collaborative filtering. The logic works fine, but we are facing runtime issues when executing the job



Read the DataFrame using the Spark session:

 val dframe = ss.read.option("inferSchema", value=true).option("delimiter", ",").csv("/home/balakumar/scala work files/matrimony.txt")


Create two DataFrames from the input DataFrame, renaming the second column in each:


val dfLeft = dframe.withColumnRenamed("_c1", "left_data")
val dfRight = dframe.withColumnRenamed("_c1", "right_data")

Join the two DataFrames on the first column and filter out rows where both sides hold the same value:


  // =!= is the inequality operator for columns; !== is deprecated since Spark 2.0
  val joined = dfLeft.join(dfRight, dfLeft.col("_c0") === dfRight.col("_c0")).filter(col("left_data") =!= col("right_data"))
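
To make the join logic concrete, here is a minimal plain-Scala sketch (no Spark) of the same pairing on made-up rows. The sample keys and IDs are invented for illustration, and the column meanings (a grouping key in _c0, a member ID in _c1) are assumptions, not taken from the actual file.

```scala
// Toy stand-in for the CSV: (key from _c0, ID from _c1). All values are made up.
val rows = Seq(
  ("g1", "a"), ("g1", "b"), ("g1", "c"),
  ("g2", "x"), ("g2", "y")
)

// Same idea as the self-join: pair every row with every row sharing its key,
// then drop pairs where both sides carry the same ID.
val pairs = for {
  (leftKey, leftId)   <- rows
  (rightKey, rightId) <- rows
  if leftKey == rightKey && leftId != rightId
} yield (leftId, rightId)

// A key with n distinct IDs contributes n * (n - 1) pairs.
val expected = rows.groupBy(_._1).values.map(g => g.size.toLong * (g.size - 1)).sum

println(pairs.size) // 8
println(expected)   // 8
```

The pair count grows quadratically per key: a single key with 10,000 distinct IDs would yield 99,990,000 pairs, so the distribution of rows over keys matters a lot for the size of the join output.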

Write the joined output as CSV:

 val result = joined.select(col("left_data"), col("right_data") as "similar_ids" )
 result.write.csv("/home/balakumar/scala work files/output")

When the above Spark job runs on a cluster with the following configuration, it is split into the following job IDs:

CLUSTER CONFIGURATION

3-node cluster:

NODE 1 - 64 GB, 16 cores
NODE 2 - 64 GB, 16 cores
NODE 3 - 64 GB, 16 cores
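
For reference, a session sized for a cluster like this might be configured roughly as sketched below. The post does not show how `ss` was created, so every value here is an assumption based on a common rule of thumb (leave a core and some memory per node for the OS, about five cores per executor), and `spark.executor.instances` assumes a resource manager such as YARN. The application name is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; none of them come from the original post.
val ss = SparkSession.builder()
  .appName("association-rules")                  // hypothetical name
  .config("spark.executor.instances", "9")       // 3 executors per node x 3 nodes
  .config("spark.executor.cores", "5")           // 15 of 16 cores used per node
  .config("spark.executor.memory", "16g")        // leaves headroom for overhead and the OS
  .config("spark.sql.shuffle.partitions", "400") // default is 200; more partitions mean smaller shuffle blocks
  .getOrCreate()
```

These numbers are a starting point to adjust against what the Spark UI reports, not a recommendation for this specific job.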

[Screenshot: 107895-works-untill-51-of-254.png]


At job ID 2, the job gets stuck at stage 51 of 254 and then starts using up disk space. I am not sure why this is happening, and my work is completely ruined. Could someone help me with this?
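
A stage that stalls while local disk fills up is often a sign of shuffle data being written or spilled to disk, which a self-join over keys with many rows can cause. Whether that is what happens here is a guess. A minimal sketch for checking it, assuming `dframe` is the DataFrame read earlier and `_c0` is the join key, is to count rows per key and estimate how many rows the join would emit:

```scala
import org.apache.spark.sql.functions.{col, desc, sum}

// Rows per join key in the input.
val keyCounts = dframe.groupBy("_c0").count()

// The keys with the most rows dominate the self-join.
keyCounts.orderBy(desc("count")).show(10)

// Upper bound on rows left after the filter: sum over keys of n * (n - 1).
keyCounts
  .select(sum(col("count") * (col("count") - 1)).as("estimated_pairs"))
  .show()
```

If a handful of keys account for most of the estimate, the tasks handling those keys would be expected to run far longer and write far more shuffle data than the rest.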

1 REPLY

Re: I have written simple logic to find association rules, similar to collaborative filtering. The logic works fine, but we are facing runtime issues when executing the job


The job is critical. Could someone help with this?
