Member since 09-25-2016
34 Posts
1 Kudos Received
2 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 8448 | 08-24-2017 09:36 AM |
 | 4578 | 08-17-2017 08:57 AM |
07-13-2018
11:02 AM
We have a requirement for low-latency data availability, so there is pressure to run this even more frequently, not less. Would it help if we allocated more memory to the catalog service or the statestore service?
07-12-2018
09:16 PM
We have a streaming application that writes Parquet files to HDFS, into the folder of a partitioned Impala table (partitioned by day and by one more custom integer column, customer id). We need to run a refresh in order to make Impala aware of the new files. The files are generated every minute and we run the refresh command every 2 minutes. https://www.cloudera.com/documentation/enterprise/latest/topics/impala_refresh.html

We have two options: 1) run "refresh <table name>", or 2) use "refresh <table name> partition (<partition spec>)" (available from CDH 5.11 / 5.10 onwards), which refreshes one particular partition.

In terms of total time taken, "refresh <table name>" is very efficient: it takes about 20 seconds for the whole table, versus 5-7 seconds for each partition using "refresh <table name> partition (<partition spec>)".

I'd like to ask the community, and especially the Impala team, what is recommended for a use case like ours: running 30 individual partition refreshes every minute, or running one table refresh? Or is there a third option that we don't know about?
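To make the trade-off above concrete, here is a minimal Python sketch that builds the per-partition refresh statements and compares the estimated cost with the timings quoted in the post. The table name `db.events`, the partition column names `day` and `customer_id`, and the integer day format are assumptions for illustration only; they do not come from the post.

```python
def refresh_statements(table, partitions):
    """Build one partition-level refresh statement per (day, customer_id) pair."""
    statements = []
    for day, customer_id in partitions:
        # Column names 'day' and 'customer_id' are hypothetical.
        statements.append(
            f"refresh {table} partition (day={int(day)}, customer_id={int(customer_id)})"
        )
    return statements


def estimated_seconds(partition_count, per_partition=(5, 7)):
    """Estimate total time from the 5-7 seconds per partition quoted in the post."""
    low, high = per_partition
    return partition_count * low, partition_count * high


# Hypothetical example: 30 customer partitions touched on one day.
touched = [(20180712, customer_id) for customer_id in range(1, 31)]
statements = refresh_statements("db.events", touched)

print(statements[0])
print(estimated_seconds(len(statements)))  # (150, 210)
```

With the numbers from the post, 30 sequential partition refreshes would take roughly 150 to 210 seconds, compared with about 20 seconds for a single table-level refresh, so the partition form would only pay off when few partitions change per cycle.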
Labels: Apache Impala
10-25-2017
12:04 PM
# Create a Parquet Impala table temp with a column a
# Write a Parquet file using a streaming application / MapReduce job; call parquet schema for that
# Impala: select a from default.temp works and returns data
# Hive: select a from default.temp returns null, because (I think) it tries to reference the column name from the Parquet schema and it doesn't match.

Is there a way to force Hive to read the column name from the metastore instead of the Parquet schema?
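The symptom described above is consistent with one engine matching Parquet columns by name and the other matching them by position. The following Python sketch is only a toy model of those two resolution strategies, not code from Hive or Impala; the file-side column name `b` is a hypothetical stand-in for whatever name the writer put into the Parquet schema.

```python
def read_by_name(table_columns, file_columns, row):
    """Resolve each table column by looking up the same name in the file schema."""
    by_name = dict(zip(file_columns, row))
    return [by_name.get(column) for column in table_columns]


def read_by_position(table_columns, file_columns, row):
    """Resolve each table column by its ordinal position in the file."""
    return [row[i] if i < len(row) else None for i in range(len(table_columns))]


table_columns = ["a"]  # column name stored in the metastore
file_columns = ["b"]   # hypothetical name written into the Parquet file schema
row = [42]             # one row of file data

print(read_by_name(table_columns, file_columns, row))      # [None]
print(read_by_position(table_columns, file_columns, row))  # [42]
```

In this model, name-based lookup finds no column called `a` in the file and yields a null, while position-based lookup returns the value regardless of the name.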
Labels: Apache Hive
09-15-2017
04:28 PM
Has anyone successfully installed or run impala-shell on a Mac? I've copied impala-shell from /usr/bin and the impala-shell directory from /usr/lib to my home directory. When I try to launch impala-shell I get the error below. I tried pip install sasl but it didn't solve the problem. I'm running Python 2.7.11 on my machine.

macofsunil:bin sparmar$ ./impala-shell
Traceback (most recent call last):
  File "/Users/sparmar/bin/../lib/impala-shell/impala_shell.py", line 34, in <module>
    from impala_client import (ImpalaClient, DisconnectedException, QueryStateException,
  File "/Users/sparmar/lib/impala-shell/lib/impala_client.py", line 16, in <module>
    import sasl
  File "/Users/sparmar/lib/impala-shell/ext-py/sasl-0.1.1-py2.6-linux-x86_64.egg/sasl/__init__.py", line 1, in <module>
  File "/Users/sparmar/lib/impala-shell/ext-py/sasl-0.1.1-py2.6-linux-x86_64.egg/sasl/saslwrapper.py", line 7, in <module>
  File "/Users/sparmar/lib/impala-shell/ext-py/sasl-0.1.1-py2.6-linux-x86_64.egg/_saslwrapper.py", line 7, in <module>
  File "/Users/sparmar/lib/impala-shell/ext-py/sasl-0.1.1-py2.6-linux-x86_64.egg/_saslwrapper.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/sparmar/.python-eggs/sasl-0.1.1-py2.6-linux-x86_64.egg-tmp/_saslwrapper.so, 2): no suitable image found. Did find:
  /Users/sparmar/.python-eggs/sasl-0.1.1-py2.6-linux-x86_64.egg-tmp/_saslwrapper.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
  /Users/sparmar/.python-eggs/sasl-0.1.1-py2.6-linux-x86_64.egg-tmp/_saslwrapper.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
macof
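The "first eight bytes" in the error above can be read directly: 0x7F 0x45 0x4C 0x46 is the ELF magic number, which matches the `linux-x86_64` in the egg name and suggests the bundled `_saslwrapper.so` is a Linux binary that macOS cannot load. A small Python 3 sketch of that check follows; the helper name `binary_format` is made up for illustration.

```python
# Magic numbers found at the start of common native binary formats.
ELF_MAGIC = b"\x7fELF"
MACHO_MAGICS = {
    b"\xcf\xfa\xed\xfe",  # 64-bit Mach-O, little-endian
    b"\xce\xfa\xed\xfe",  # 32-bit Mach-O, little-endian
    b"\xca\xfe\xba\xbe",  # universal ("fat") binary; Java class files share this magic
}


def binary_format(header: bytes) -> str:
    """Classify a binary from the first four bytes of its header."""
    magic = header[:4]
    if magic == ELF_MAGIC:
        return "ELF"
    if magic in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"


# The eight bytes reported by dlopen in the traceback.
reported = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])
print(binary_format(reported))  # ELF
```

To check a file on disk, the same function can be applied to `open(path, "rb").read(8)`.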
Labels: Apache Impala
09-15-2017
09:43 AM
Finding logs manually on the machine sounds very brute force; I was thinking more of an API or CLI option to find the logs. Anyway, the main issue we're trying to solve is giving all developers access to logs in the prod environment. Our node managers are locked down and not accessible (on any port or web UI) to developers, and that is unlikely to change. So we're trying to find a way to proxy the logs. I discovered that there is a jobhistory proxy for looking at the logs of completed jobs / YARN apps, but I couldn't get it working for a running app. Is there any trick / way to access a running app's logs like above? http://resourcemanager.xyz.com:19888/jobhistory/logs//dataNode.com:8041/container_id_000001/container_id_000001/root
09-11-2017
09:57 PM
Is there a YARN API or command to find the on-disk path of the YARN logs for a given container and application id? I also want to add: we don't have log aggregation working, and I'm particularly looking for the direct physical path to the file, not the web interface. Thanks, Sunil
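As a rough sketch of where such files usually sit, the Python below composes candidate paths from the NodeManager's `yarn.nodemanager.log-dirs` setting, assuming the common layout of `<log dir>/<application id>/<container id>/`. That layout, the directory values, and the ids are assumptions for illustration; the actual log directories should be read from the cluster's yarn-site.xml.

```python
import posixpath


def candidate_log_dirs(nm_log_dirs, application_id, container_id):
    """Return one candidate container log directory per configured NodeManager log dir.

    nm_log_dirs is the comma-separated value of yarn.nodemanager.log-dirs.
    """
    roots = [d.strip() for d in nm_log_dirs.split(",") if d.strip()]
    return [posixpath.join(root, application_id, container_id) for root in roots]


# Hypothetical values, not taken from the post.
paths = candidate_log_dirs(
    "/data1/yarn/logs, /data2/yarn/logs",
    "application_1505000000000_0001",
    "container_1505000000000_0001_01_000001",
)
for path in paths:
    print(path)
```

Only one of the candidate directories is expected to exist on the node that ran the container, and it typically holds files such as stdout, stderr and syslog.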
Labels: Apache YARN
08-24-2017
03:53 PM
I have a table with two columns, test and id; yyyymmdd and group_id are my partition columns. When I run the following query, it runs fast, as it only scans 1 partition.

select id, yyyymmdd, group_id, test from dwh.table where (id='1a' and yyyymmdd=20170815 and group_id=1)

But when I run the following query, it scans the entire table. Even explain shows that it is going to perform a full table scan. I think I'm missing something simple here, and I'd appreciate the community's help in finding out what.

select id, yyyymmdd, group_id, test from dwh.table where (id='1a' and yyyymmdd=20170815 and group_id=1) OR (id='2b' and yyyymmdd=20170811 and group_id=2)

How can I scan two rows in two different partitions?
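One possible workaround, sketched here in Python, is to rewrite the OR of lookups as a union all of single-partition queries, since each branch then has the same shape as the query the post reports as fast. This is an assumption rather than a confirmed fix; the plan for the generated query should be checked with explain. The helper name `union_all_query` is made up for illustration.

```python
def union_all_query(columns, table, lookups):
    """Build one single-partition select per (id, yyyymmdd, group_id) lookup,
    joined with union all."""
    select_list = ", ".join(columns)
    branches = []
    for row_id, yyyymmdd, group_id in lookups:
        # Partition values are coerced to int; the id is quoted by doubling
        # single quotes, which is only adequate for trusted input.
        safe_id = str(row_id).replace("'", "''")
        branches.append(
            f"select {select_list} from {table} "
            f"where id='{safe_id}' and yyyymmdd={int(yyyymmdd)} and group_id={int(group_id)}"
        )
    return "\nunion all\n".join(branches)


query = union_all_query(
    ["id", "yyyymmdd", "group_id", "test"],
    "dwh.table",
    [("1a", 20170815, 1), ("2b", 20170811, 2)],
)
print(query)
```

Each branch carries plain equality predicates on both partition columns, so it can be planned independently of the other branch.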
Labels: Apache Impala
08-24-2017
09:36 AM
Using Cloudera Manager, go to Sentry -> Configuration. Add users/groups to the following properties to allow them to create/show roles. Each display name as shown in CM is followed by the property name in the configuration file.

Admin Groups (sentry.service.admin.group)
Allowed Connecting Users (sentry.service.allow.connect)
08-18-2017
12:31 AM
I'm using the Sentry service through Cloudera Manager. I just realized that I can add other users / groups to the Sentry config in Cloudera Manager and allow them to run Grant / Create role commands.
08-17-2017
08:57 AM
Actually, I figured it out. I had to configure Impala to allow user ldaptest to impersonate user cloudera (the Hue login). I appended this to the Cloudera Manager property Proxy User Configuration (authorized_proxy_user_config):

hue=*;ldaptest=cloudera

So user hue can impersonate anyone, and user 'ldaptest' can impersonate 'cloudera'.
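The semantics of that value can be illustrated with a short Python sketch that parses the string and answers "may user X impersonate user Y?". It is a toy model of the rule format as described in the post (entries separated by semicolons, `*` meaning anyone), not Impala's own parsing code; splitting multiple delegated users on commas is an assumption about the format.

```python
def parse_proxy_config(value):
    """Parse 'proxy=user1,user2;other=*' into {proxy_user: set_of_allowed_users}."""
    rules = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        proxy_user, _, delegates = entry.partition("=")
        rules[proxy_user.strip()] = {d.strip() for d in delegates.split(",") if d.strip()}
    return rules


def can_impersonate(rules, proxy_user, target_user):
    """Return True if proxy_user is allowed to act as target_user."""
    allowed = rules.get(proxy_user, set())
    return "*" in allowed or target_user in allowed


rules = parse_proxy_config("hue=*;ldaptest=cloudera")
print(can_impersonate(rules, "hue", "anyone"))         # True
print(can_impersonate(rules, "ldaptest", "cloudera"))  # True
print(can_impersonate(rules, "ldaptest", "hue"))       # False
```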