Spark to Phoenix Upsert is slow



I am upserting into a Phoenix table from a Hive ORC table read through a Spark DataFrame. Loading 10 million rows takes around 40 minutes. At the start the job loads about 20K rows per second, but throughput gradually drops to about 500 rows per second once the load is roughly 50 to 60% complete.

Note: I have set phoenix.mutate.maxSize to 5000000 and phoenix.mutate.maxSizeBytes to 104857600000.

Any thoughts? The load is fast at the start, and it looks as though something is being buffered as the load goes on, in the WAL or somewhere else. However, WAL is not enabled on the cluster. It is a 6-node cluster (Ambari 2.7.3), and Spark runs with 20 cores (3 cores per executor) and 20 GB per executor. Below is the syntax used to load Phoenix:

 

df.write \
  .format("phoenix") \
  .mode("overwrite") \
  .option("table", "TABLE1") \
  .option("zkUrl", "phoenix-server:2181") \
  .save()
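
For reference, here is a rough back-of-the-envelope sketch of the figures quoted above, in plain Python with no Spark required. The function names are made up for illustration, and the task-slot estimate assumes every executor gets its full 3 cores, which may not match how YARN actually allocates them.

```python
# Figures taken from the question; nothing here is measured independently.
TOTAL_ROWS = 10_000_000
TOTAL_MINUTES = 40
INITIAL_RATE = 20_000      # rows/sec observed at the start of the load
DEGRADED_RATE = 500        # rows/sec observed past the 50-60% mark
TOTAL_CORES = 20
CORES_PER_EXECUTOR = 3


def average_rate(rows, minutes):
    """Average rows per second over the whole load."""
    return rows / (minutes * 60)


def seconds_at_rate(rows, rate):
    """Time in seconds to load `rows` if `rate` rows/sec were sustained."""
    return rows / rate


def task_slots(total_cores, cores_per_executor):
    """Concurrent write tasks, assuming only whole executors are started."""
    executors = total_cores // cores_per_executor
    return executors * cores_per_executor


if __name__ == "__main__":
    print("average rows/sec:", average_rate(TOTAL_ROWS, TOTAL_MINUTES))
    print("seconds at initial rate:", seconds_at_rate(TOTAL_ROWS, INITIAL_RATE))
    print("seconds at degraded rate:", seconds_at_rate(TOTAL_ROWS, DEGRADED_RATE))
    print("concurrent task slots:", task_slots(TOTAL_CORES, CORES_PER_EXECUTOR))
```

The overall average works out to a little over 4K rows per second, well below the initial 20K, so most of the 40 minutes is spent in the slow phase.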
 