Member since: 09-25-2016
Posts: 34
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7127 | 08-24-2017 09:36 AM
 | 3500 | 08-17-2017 08:57 AM
06-19-2019
10:23 AM
We do have a load balancer for Impala. In our case the issue turned out to be a cross-referenced VIP in a different data center, which was putting load on the metadata servers. Still, some capabilities around metadata status seem to be either missing, undocumented, or perhaps I'm simply unaware of them. SYNC_DDL makes DDL queries extremely slow: performance drops from something that runs in 3 seconds to 5 minutes. We have a 30-node cluster, and I'm hoping SYNC_DDL doesn't mean sequential execution of the DDL (even that doesn't add up). Is there a way to identify which node needs a metadata refresh? (Or which impalad has invalid metadata, i.e. the time when its last metadata refresh occurred?)
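For reference, a minimal sketch of how we enable it today (the table and partition names below are hypothetical); with the option set, a statement like this goes from roughly 3 seconds to 5 minutes for us:

```sql
-- Enable SYNC_DDL for the session so the DDL only returns once the metadata
-- change has propagated to all impalads (hypothetical table/partition names).
SET SYNC_DDL=1;
ALTER TABLE my_db.events ADD IF NOT EXISTS PARTITION (day='2019-06-19');
```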
06-11-2019
10:32 PM
Environment: CDH 5.15
Impala version: impalad 2.12.0-cdh5.15.0 RELEASE
OS: CentOS 6.10
Table size: 88 TB
Partitions: ~7K
Type: Parquet, file size compacted to 256 MB
We ingest data into the table's partitions every minute and run REFRESH on the table to load the new data. A separate compaction process runs every hour and merges smaller files into bigger ones. The setup worked fine for months, but recently we are running into a strange issue of inconsistent behavior between a few nodes. Randomly, some nodes appear to have inconsistent metadata: even though the REFRESH command ran successfully, some nodes still didn't see the correct files, so they referred to older files for those partitions. We tried invalidating metadata (followed by DESCRIBE on the table to fix the metadata), but it didn't help. Even re-running REFRESH doesn't help all the time. We need some help/pointers to figure out the issue.
* Is there a way to check whether any Impala nodes have stale metadata?
* How do we fix metadata for an individual node? Is there a command?
* Has anyone faced a similar issue? Can you share your experience and fix?
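For context, here is a minimal sketch of the statements involved, with hypothetical database and table names; the first is what the ingest pipeline runs every minute, and the rest is what we tried when some nodes kept serving the old file list:

```sql
-- Run every minute by the ingest pipeline after new Parquet files land
-- (database/table names are hypothetical).
REFRESH my_db.events;

-- What we tried on the nodes that still referred to the older files:
INVALIDATE METADATA my_db.events;
DESCRIBE my_db.events;  -- forces the metadata to be reloaded after the invalidate
```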
Labels:
- Apache Impala
07-13-2018
11:02 AM
We have a requirement for low-latency data availability, so there is pressure to run this even more frequently, not less. Would it help if we allocated more memory to the catalog service or the statestore service?
07-12-2018
09:16 PM
We have a streaming application that writes Parquet files to HDFS into a partitioned Impala table directory (partitioned by day plus one more custom integer customer id). We need to run REFRESH in order to make Impala aware of the new files. The files are generated every minute and we run the REFRESH command every 2 minutes. https://www.cloudera.com/documentation/enterprise/latest/topics/impala_refresh.html We have two options: 1) run "REFRESH <table name>", or 2) use "REFRESH <table name> PARTITION <partition spec>" (available from CDH 5.10/5.11 onwards), which refreshes a particular partition. In terms of total time taken, "REFRESH <table name>" is very efficient: it takes around 20 seconds, versus 5-7 seconds for each partition with "REFRESH <table name> <partition spec>". I'd like to ask the community, and especially the Impala team: what is recommended in a use case like ours? Running 30 individual partition refreshes every minute, or running one full-table refresh? Or is there a third option that we don't know about?
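To make the two options concrete, here is a hedged sketch with hypothetical table and partition names:

```sql
-- Option 1: one full-table refresh (~20 seconds total for us).
REFRESH my_db.events;

-- Option 2: per-partition refresh (CDH 5.10/5.11 onwards), ~5-7 seconds each,
-- repeated for every partition that received new files in the last interval.
REFRESH my_db.events PARTITION (day='2018-07-12', customer_id=42);
```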
Labels:
- Apache Impala
03-29-2018
10:17 PM
Thanks, but logs are not the issue here. I think the issue is time-series storage. The configuration attribute firehose_time_series_storage_bytes controls the disk usage, but the minimum value that can be set is 10 GB. Is there a way to override this value?
01-22-2018
07:32 PM
It looks like the disk space used by the Service Monitor and Host Monitor is huge by default: several GB. It shows up under the /var/lib/cloudera-service-monitor and /var/lib/cloudera-host-monitor directories, with most of the space taken by the ts and type folders in both cases. Is there a way to configure these two services to use less space in development and test environments?
Labels:
- Cloudera Manager
12-16-2017
10:25 PM
Oh... I might have missed something. Just wanted to make sure I am not missing anything:
1. Add --insert_inherit_permissions=true to the impalad safety valve.
2. Set the partition directory permission to 764 (or whatever is required?).
3. Insert into the partition directory.
4. Check the file permissions; they should be the same as the folder, i.e. 764?
12-06-2017
12:37 PM
Environment: CDH 5.12
When running an INSERT query on the table, all files are always owned by user impala with permission 744, i.e. full access for impala and read-only for everyone else. We have an externally running compaction process that needs to read/write/replace these files. Is there a way to change this default behavior so the file permissions are something other than 744? I'd prefer 764 (group read/write) so we can add the user who runs the compaction process to the same group as Impala. I tried changing the Impala Daemon Environment Advanced Configuration Snippet (Safety Valve) property and added --insert_inherit_permissions=true. The upstream directory was 774, but the files created were still 744, so other users cannot write/edit those files.
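To illustrate, a minimal sketch of the kind of INSERT whose output files come out as 744 (the table, column, and partition names are hypothetical):

```sql
-- The parent partition directory is 774 on HDFS, but the data files written by
-- this INSERT still end up 744 and owned by impala, even with
-- --insert_inherit_permissions=true set on the impalad command line.
INSERT INTO my_db.events PARTITION (day='2017-12-06')
SELECT id, payload FROM my_db.events_staging;
```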
Labels:
- Apache Impala
11-22-2017
12:22 PM
Environment: CDH 5.12
We're using Cloudera Manager to configure and manage HUE.
We use HUE to expose ad hoc querying on our cluster for occasional debugging against Impala and Hive. We would like to configure HUE to add more editors so we can access other data systems. The HUE documentation has guidance on how to add more editors, but we're having trouble figuring out how to do this using Cloudera Manager. How can we translate this to Cloudera Manager-managed HUE configs?
http://gethue.com/custom-sql-query-editors/
10-25-2017
12:04 PM
1. Create a Parquet Impala table temp with a column a.
2. Write the Parquet files using a streaming application / MapReduce job, which generates its own Parquet schema for that column.
3. In Impala, SELECT a FROM default.temp works and returns data.
4. In Hive, SELECT a FROM default.temp returns NULL, I think because it tries to resolve the column by the name in the Parquet schema and it doesn't match.
Is there a way to force Hive to read the column name from the metastore instead of the Parquet schema?
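A hedged sketch of the mismatch as I understand it, assuming the Parquet files name the column differently (say col_a) than the metastore does (a); col_a is purely illustrative:

```sql
-- Table defined in the metastore with column "a", while the Parquet files
-- written by our job carry a different field name (e.g. "col_a").
CREATE TABLE default.temp (a INT) STORED AS PARQUET;

-- Impala: returns data; it appears to resolve Parquet columns by position
-- by default, so the mismatched field name does not matter.
SELECT a FROM default.temp;

-- Hive: the same query returns NULL, presumably because it resolves Parquet
-- columns by name and "a" does not exist in the file schema.
SELECT a FROM default.temp;
```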
Labels:
- Apache Hive