Member since: 12-16-2015 | Posts: 7 | Kudos Received: 0 | Solutions: 0
02-02-2017
11:27 AM
Hey Iv@c - I think you are right. The table beneath has 700+ columns. The moment I created a VIEW selecting just 30 columns, the response time became the same as what I get querying the table directly (2-3 secs). In my case, though, I cannot select a fixed set of columns, because the VIEW needs to be queried across the application. Maybe when we get a chance to change the front-end queries, we will try to implement this: create different views for different reports, each selecting just the required columns from the underlying table (a sketch follows below).
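For anyone landing here later, a minimal sketch of the per-report views idea; the table, view, and column names are made up for illustration:

```sql
-- The wide table has 700+ columns, but each report only needs a handful.
-- Narrow views keep the column set Impala must analyze during planning small.
CREATE VIEW sales_report_view AS
SELECT order_id, order_date, customer_id, total_amount
FROM wide_fact_table;

CREATE VIEW inventory_report_view AS
SELECT item_id, warehouse_id, stock_level, last_updated
FROM wide_fact_table;

-- Each front-end report then queries its own narrow view:
SELECT customer_id, SUM(total_amount)
FROM sales_report_view
GROUP BY customer_id;
```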
01-31-2017
05:42 PM
Hey Gatsby - I'm using impalad version 2.3.0-cdh5.5.1
01-31-2017
05:38 PM
Hi Lars - No, every time I fire the query it takes this long. I haven't checked the logs yet. I went through the query profile and found that query planning is what's killing me. I will check the logs to see if I can find anything. Thanks for your response. Regards, Gaurang
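For reference, this is roughly how the profile numbers below were pulled in impala-shell (the view name is illustrative):

```sql
-- In impala-shell: run the statement first, then ask for its profile.
SELECT COUNT(*) FROM table_view;

-- Prints the full runtime profile of the most recent statement,
-- including the Planner Timeline figures posted below.
PROFILE;
```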
01-30-2017
01:32 PM
Hello - As we recompute our data every day, I need to remove the old data and load the new data daily. We create our Parquet data files through MapReduce. To achieve zero downtime while switching yesterday's data with today's data, I came up with the idea of having a fixed VIEW and then, after batch processing, issuing an ALTER VIEW statement to change the underlying table:

First time:
CREATE VIEW table_view AS SELECT * FROM table_0130;

Daily:
ALTER VIEW table_view AS SELECT * FROM table_0131;

Most of our queries worked well; the response time degraded slightly, but nothing alarming. For a few big JOIN queries, however, the response time went from 2-3 secs to 2-3 mins. On further digging into the query profile, I found that query planning alone is taking 2+ mins. Why would it take so much time? The VIEW is a simple one, just a SELECT *. Are there any Impala conf settings that can resolve this? I appreciate any help or pointers regarding this issue.

Querying the VIEW:

Planner Timeline: 2m17s
- Analysis finished: 2s588ms (2s588ms)
- Equivalence classes computed: 1m16s (1m13s)
- Single node plan created: 2m17s (1m1s)
- Distributed plan created: 2m17s (223.64ms)
- Lineage info computed: 2m17s (2.6ms)
- Planning finished: 2m17s (9.974ms)
Query Timeline: 2m31s
- Start execution: 53.597us (53.597us)
- Planning finished: 2m26s (2m26s)
- Ready to start remote fragments: 2m26s (63.364ms)
- Remote fragments started: 2m31s (4s442ms)
- Cancelled: 2m31s (5.567ms)
- Rows available: 2m31s (35.971ms)
- Unregister query: 2m31s (118.833us)

Querying the TABLE directly:

Planner Timeline: 55.334ms
- Analysis finished: 21.430ms (21.430ms)
- Equivalence classes computed: 22.938ms (1.507ms)
- Single node plan created: 47.813ms (24.875ms)
- Distributed plan created: 51.913ms (4.99ms)
- Lineage info computed: 52.394ms (481.757us)
- Planning finished: 55.334ms (2.939ms)
Query Timeline: 1s036ms
- Start execution: 45.736us (45.736us)
- Planning finished: 125.378ms (125.332ms)
- Ready to start remote fragments: 129.281ms (3.902ms)
- Remote fragments started: 478.56ms (348.775ms)
- Rows available: 882.741ms (404.685ms)
- First row fetched: 982.468ms (99.727ms)
- Unregister query: 998.825ms (16.356ms)
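For context, the daily swap looks roughly like this end to end. The table names follow the date-suffix convention above; the DROP cleanup step is my assumption about how we retire yesterday's data:

```sql
-- Day N: MapReduce has written Parquet files backing a fresh table, e.g. table_0131.
-- Repoint the fixed view at the new table; readers never see downtime.
ALTER VIEW table_view AS SELECT * FROM table_0131;

-- Once nothing references yesterday's table, drop it to reclaim space.
DROP TABLE IF EXISTS table_0130;
```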
Labels:
- Apache Impala
12-17-2015
09:15 AM
Hi Matheiu - You were right, it was related to garbage collection. The moment I loaded another big index, the JVM memory went above 70% and GC was initiated. I was just being prematurely concerned about this. Thanks again! Gaurang
12-16-2015
01:25 PM
Hi,

My use case is zero downtime while batch indexing. To achieve this, I plan to create a new index every day in batch using the MR indexer tool, while the live index serves queries. After the new index is built, I will use a collection alias to switch over, and then delete the stale index.

The concern I have: after I delete an index (a huge one, around 320 million documents, distributed in 64 shards spread over 4 Solr instances), the Solr JVM memory is not released. While the index is live, the Solr JVM shows around 10 GB occupied, and it remains the same even after I delete the index. If the Solr service is restarted, the memory is released. My understanding is that Solr uses its heap to store various caches for query speedup, but I need to figure out a way to clear this cache (for a particular index) when I delete the index, or else I will soon face OOM errors. Can someone please help?

Setup:
- 4 Solr instances
- Solr Java heap size: 20 GB
- Direct memory allocation: 4 GB

Below are the HdfsDirectoryFactory settings in solrconfig.xml:

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:org.apache.solr.core.HdfsDirectoryFactory}">
  <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
  <str name="solr.hdfs.confdir">${solr.hdfs.confdir:}</str>
  <str name="solr.hdfs.security.kerberos.enabled">${solr.hdfs.security.kerberos.enabled:false}</str>
  <str name="solr.hdfs.security.kerberos.keytabfile">${solr.hdfs.security.kerberos.keytabfile:}</str>
  <str name="solr.hdfs.security.kerberos.principal">${solr.hdfs.security.kerberos.principal:}</str>
  <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
  <str name="solr.hdfs.blockcache.global">${solr.hdfs.blockcache.global:true}</str>
  <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:true}</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
  <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
  <int name="solr.hdfs.blockcache.bufferstore.buffercount">${solr.hdfs.blockcache.bufferstore.buffercount:0}</int>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>

Regards,
Gaurang
Labels:
- Apache Solr
- HDFS
- Kerberos
- Security