Member since 02-08-2017 | 15 posts | 2 kudos received | 1 solution

My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1277 | 02-10-2017 01:06 PM |
03-24-2017 06:54 AM
Hi all, I have an issue fetching data from HBase/Phoenix and writing it to a file. I have increased the HFile block size, but I think that improves HBase data load performance while reducing HBase/Phoenix table read performance. Data block encoding and the ROW_COL bloom filter are also not helping. I am able to run aggregation operations on 100 million records in a few seconds, but whenever I try to write the data to a file it takes much longer.

Example: the Phoenix table "WEB_STAT2" has 200 million rows.

Scenario 1: Here I run a select and cache the result. It takes 3 seconds.
import org.apache.phoenix.spark._

val tblDF = sqlContext.phoenixTableAsDataFrame("WEB_STAT2",
  Seq("HOST", "DOMAIN", "FEATURE", "DATE", "CORE", "DB", "ACTIVE_VISITOR"),
  predicate = Some("\"HOST\" = 'EU' AND \"CORE\" <1000000"))
tblDF.cache()

Scenario 2: Here I run a select and write the data to a local file. It takes 10 seconds for 1 million records (70 seconds for 10 million records).

val tblDF = sqlContext.phoenixTableAsDataFrame("WEB_STAT2",
  Seq("HOST", "DOMAIN", "FEATURE", "DATE", "CORE", "DB", "ACTIVE_VISITOR"),
  predicate = Some("\"HOST\" = 'EU' AND \"CORE\" <100000"))
// write.parquet takes the output path directly; "header" is a CSV option, not a Parquet one
tblDF.write.parquet("/home/result_DIR")
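A likely factor behind the first question: in Spark, cache() is lazy. It only marks tblDF for caching; no rows are read from Phoenix until an action such as count() or a write runs. So the 3 seconds in scenario 1 probably does not include the actual table scan, while the timing in scenario 2 includes the scan plus Parquet encoding and file I/O. Timing tblDF.cache() followed by tblDF.count() would give a fairer comparison with the write. The sketch below is only an analogy in plain Scala (no Spark or Phoenix involved): a collection view, like a Spark transformation, does no work until a terminal operation forces it. The names LazyVsEager, pipeline and evaluated are hypothetical.

```scala
// Analogy only: Scala collection views, not Spark. Defining the view does no
// work, much like a Spark transformation or cache(); the work happens when a
// terminal operation (here sum, comparable to a Spark action) runs.
object LazyVsEager {
  // Counts how many elements were actually evaluated.
  var evaluated = 0

  // Builds a lazy pipeline: nothing is computed when this returns.
  def pipeline(n: Int) =
    (1 to n).view.map { x => evaluated += 1; x * 2 }

  def main(args: Array[String]): Unit = {
    val lazyRows = pipeline(1000)
    println(s"after defining the pipeline: $evaluated rows evaluated")
    val total = lazyRows.sum // forces evaluation of every element
    println(s"after sum: $evaluated rows evaluated, total = $total")
  }
}
```

Running main should report 0 evaluated rows after the pipeline is defined and 1000 after sum, which mirrors why a bare cache() returns quickly while the first action pays for the full read.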
My questions about these two scenarios are:
1) Why does it take more time when I write the data to a file? Is it due to serialization and deserialization, or is it that cache() does not actually load the data into the DataFrame, and the data is only loaded when I do the write operation?
2) What do I have to do if we need to query a Phoenix table with 2 trillion rows and write the query result (around 20 million rows) to a file? Ideally we need to do all of these operations in 2 to 3 seconds.
Thanks, Ashok
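The second question can be put in numbers using only the timings reported in scenario 2: 1 million rows written in 10 seconds and 10 million rows in 70 seconds, against a goal of about 20 million rows in 2 to 3 seconds. The small sketch below does that arithmetic. The object and method names are hypothetical, and it assumes throughput stays roughly constant as the result size grows, which is a simplification.

```scala
// Hypothetical helper: back-of-the-envelope rates from the timings in the post.
// Assumes rows/second stays roughly constant as the result size grows.
object ThroughputEstimate {
  def rowsPerSecond(rows: Double, seconds: Double): Double = rows / seconds

  def main(args: Array[String]): Unit = {
    val observed1M  = rowsPerSecond(1e6, 10) // scenario 2: 1M rows in 10 s = 100,000 rows/s
    val observed10M = rowsPerSecond(1e7, 70) // scenario 2: 10M rows in 70 s, about 142,857 rows/s
    val target3s    = rowsPerSecond(2e7, 3)  // goal: 20M rows in 3 s, about 6.7M rows/s
    val target2s    = rowsPerSecond(2e7, 2)  // goal: 20M rows in 2 s = 10M rows/s

    println(f"observed: $observed1M%.0f to $observed10M%.0f rows/s")
    println(f"target:   $target3s%.0f to $target2s%.0f rows/s")
    // Smallest speed-up: slowest target vs fastest observed rate;
    // largest: fastest target vs slowest observed rate.
    println(f"needed speed-up: ${target3s / observed10M}%.1fx to ${target2s / observed1M}%.1fx")
  }
}
```

Under that assumption, the write path would need to become roughly 47 to 100 times faster than the rates observed in scenario 2.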
02-10-2017 01:06 PM
Thanks, I am now able to do the custom filter operation. I removed the filterRow() override from my custom filter class, and after that it works as expected. Thanks, Ashok
02-08-2017 06:34 PM
Thanks Josh Elser. I copied the jar to the lib directory and restarted ZooKeeper, and now the custom filters are working fine. Thanks a lot.