Spark Phoenix Performance

New Contributor

Can someone help fine-tune the Spark Phoenix load below? It currently takes 17 minutes to load 40 million records from a MySQL database.

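import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.broadcast

def getmodems(sqlContext: SQLContext): DataFrame = {
  // Read the source table from MySQL over JDBC and mark it for broadcast.
  val modems_df = broadcast(
    sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:mysql://hostname:port/dbname")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "db.table_name")
      .option("user", "username")
      .option("password", "password")
      .load())
  modems_df
}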

dataframe.write
  .format(sparkDriver)
  .mode("overwrite")
  .option("table", phoenix_table)
  .option("zkUrl", "jdbc:phoenix:zkquoram:2181:/hbase-secure:user@HDP.DEV.COM")
  .save()

3 REPLIES

Re: Spark Phoenix Performance

That's ingesting records at a rate of roughly 40,000 rows per second, which seems pretty good to me.
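For reference, the rate follows directly from the numbers in the question (40 million records in 17 minutes). A quick sketch of the arithmetic; the object and function names here are just illustrative:

```scala
object IngestRate {
  /** Rows per second when `rows` records are loaded in `minutes` minutes. */
  def rowsPerSecond(rows: Long, minutes: Double): Double =
    rows / (minutes * 60.0)

  def main(args: Array[String]): Unit = {
    // 40,000,000 rows / (17 * 60 s) = 40,000,000 / 1,020 s
    val rate = rowsPerSecond(40000000L, 17.0)
    println(f"$rate%.0f rows/s") // a little over 39,000 rows/s
  }
}
```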

If you want more help, you must provide more information about your scenario:

  • Phoenix Table DDL
  • Available hardware
  • Hardware specifications

Re: Spark Phoenix Performance

New Contributor

Phoenix Table DDL:

db.phoenix_table (col0 VARCHAR NOT NULL PRIMARY KEY, col1 BIGINT, col2 VARCHAR, --- col33 DECIMAL) SALT_BUCKETS=95;

We have an 8-node cluster with 5 datanodes. CPU(s): 24, Thread(s) per core: 2, Core(s) per socket: 6, Socket(s): 2, Memory: 250G.

I need to load 2-3 million records per load, with all default settings in the cluster. Does the number of Spark task partitions need to equal the number of salt buckets on the Phoenix table? My table will not grow in size, since most of the operations will be overwrites, so the splits will also remain constant.

Can you suggest any other, more efficient ways of doing the ingestion?
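One thing that may be worth checking, offered as a sketch rather than a confirmed fix: a Spark JDBC read without partitioning options comes back as a single partition, so if the written DataFrame derives directly from that read, the load may run in very few tasks. The example below uses Spark's standard JDBC options `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`. The column name `id`, its bounds, and the choice of 95 partitions (tried here only because it matches the salt bucket count) are assumptions that do not come from the thread, and whether matching the salt buckets actually helps would need to be measured.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

object PartitionedLoad {

  /** Spark JDBC options that split the read into `numPartitions` range queries. */
  def partitionOptions(column: String, lower: Long, upper: Long,
                       numPartitions: Int): Map[String, String] =
    Map(
      "partitionColumn" -> column,
      "lowerBound"      -> lower.toString,
      "upperBound"      -> upper.toString,
      "numPartitions"   -> numPartitions.toString
    )

  def readModems(sqlContext: SQLContext): DataFrame =
    sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:mysql://hostname:port/dbname")
      .option("driver", "com.mysql.jdbc.Driver")
      .option("dbtable", "db.table_name")
      .option("user", "username")
      .option("password", "password")
      // ASSUMPTION: `id` is a hypothetical numeric column; the bounds and
      // the partition count are illustrative values only.
      .options(partitionOptions("id", 1L, 40000000L, 95))
      .load()

  // Same write call as in the question, with its values passed as parameters.
  def writeToPhoenix(df: DataFrame, sparkDriver: String,
                     phoenixTable: String, zkUrl: String): Unit =
    df.write
      .format(sparkDriver)
      .mode("overwrite")
      .option("table", phoenixTable)
      .option("zkUrl", zkUrl)
      .save()
}
```

Note that `lowerBound` and `upperBound` only determine how the range is split across partitions; they do not filter rows. Each partition also opens its own connection to MySQL, so a high `numPartitions` puts more concurrent load on the source database.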

Re: Spark Phoenix Performance

New Contributor

Hey Apoorva, your performance stats are pretty good. My cluster loads about 400,000 records per minute through Phoenix. I have made some config changes but haven't achieved what I need. Can you please help me out? I use a 3-node cluster with 64GB RAM and an 8-core processor.