About sunilosunil

sunilosunil · ‎07-13-2018

We have a requirement for low-latency data availability, so there is pressure to run this even more frequently, not less. Would it help if we allocated more memory to the catalog service or the statestore service?

sunilosunil · ‎07-12-2018

We have a streaming application that writes Parquet files to HDFS into the folder of a partitioned Impala table (partitioned by day and by one more custom integer column, customer id). We need to run a refresh in order to make Impala aware of the new files. The files are generated every minute and we run the refresh command every 2 minutes.

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_refresh.html

We have two options:
1) Run "refresh <table name>", or
2) Use "refresh <table name> partition (<partition spec>)" (available from CDH 5.11 / 5.10 onwards), which refreshes a particular partition.

In terms of total time taken, "refresh <table name>" is the more efficient: it takes roughly 20 seconds for the whole table, versus 5-7 seconds for each partition using "refresh <table name> partition (<partition spec>)".

I'd like to ask the community, and especially the Impala team, what is recommended for a use case like ours: running 30 individual partition refreshes every minute, or running one table refresh? Or is there a third option that we don't know about?
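To make the two options concrete, here is a minimal sketch that only builds the statement text for each approach. The table name `dwh.events` and the partition column names `day` and `customer_id` are hypothetical stand-ins for the real schema, and the sketch does not connect to Impala or measure anything.

```python
def refresh_statements(table, partitions=None):
    """Build Impala REFRESH statements as plain strings.

    With no partitions, return one whole-table refresh.
    Otherwise return one partition-level refresh per partition spec.
    Only integer partition values are handled in this sketch.
    """
    if not partitions:
        return ["REFRESH {0}".format(table)]
    statements = []
    for spec in partitions:
        # spec is a list of (column, integer value) pairs, in partition order
        cols = ", ".join("{0}={1:d}".format(col, val) for col, val in spec)
        statements.append("REFRESH {0} PARTITION ({1})".format(table, cols))
    return statements


# Hypothetical example: one table-level refresh vs. two partition refreshes
whole_table = refresh_statements("dwh.events")
per_partition = refresh_statements(
    "dwh.events",
    [
        [("day", 20180712), ("customer_id", 7)],
        [("day", 20180712), ("customer_id", 8)],
    ],
)
```

With about 30 partitions touched per cycle, the second form produces 30 statements, which is where the 30 x 5-7 seconds versus ~20 seconds comparison comes from.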

sunilosunil · ‎10-25-2017

# Create a Parquet Impala table temp with a column a
# Write a Parquet file using a streaming application / MapReduce job, with a different column name in the Parquet schema for that column
# Impala: select a from default.temp works and returns data
# Hive: select a from default.temp returns null, because (I think) it tries to reference the column name from the Parquet schema and it doesn't match

Is there a way to force Hive to read the column name from the metastore instead of the Parquet schema?

sunilosunil · ‎09-15-2017

Has anyone successfully run/installed impala-shell on a Mac? I've copied impala-shell from /usr/bin and the impala-shell directory from /usr/lib to my home directory. When I try to launch impala-shell I get the error below. I tried pip install sasl but it didn't solve the problem. I'm running Python 2.7.11 on my machine.

macofsunil:bin sparmar$ ./impala-shell
Traceback (most recent call last):
  File "/Users/sparmar/bin/../lib/impala-shell/impala_shell.py", line 34, in <module>
    from impala_client import (ImpalaClient, DisconnectedException, QueryStateException,
  File "/Users/sparmar/lib/impala-shell/lib/impala_client.py", line 16, in <module>
    import sasl
  File "/Users/sparmar/lib/impala-shell/ext-py/sasl-0.1.1-py2.6-linux-x86_64.egg/sasl/__init__.py", line 1, in <module>
  File "/Users/sparmar/lib/impala-shell/ext-py/sasl-0.1.1-py2.6-linux-x86_64.egg/sasl/saslwrapper.py", line 7, in <module>
  File "/Users/sparmar/lib/impala-shell/ext-py/sasl-0.1.1-py2.6-linux-x86_64.egg/_saslwrapper.py", line 7, in <module>
  File "/Users/sparmar/lib/impala-shell/ext-py/sasl-0.1.1-py2.6-linux-x86_64.egg/_saslwrapper.py", line 6, in __bootstrap__
ImportError: dlopen(/Users/sparmar/.python-eggs/sasl-0.1.1-py2.6-linux-x86_64.egg-tmp/_saslwrapper.so, 2): no suitable image found. Did find:
  /Users/sparmar/.python-eggs/sasl-0.1.1-py2.6-linux-x86_64.egg-tmp/_saslwrapper.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
  /Users/sparmar/.python-eggs/sasl-0.1.1-py2.6-linux-x86_64.egg-tmp/_saslwrapper.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
macof
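The "first eight bytes" in the error start with 0x7F 0x45 0x4C 0x46, which is the ELF magic number, so the bundled _saslwrapper.so (from the linux-x86_64 egg) is a Linux binary that the macOS loader cannot open. A small sketch for checking this on any file header follows; the function name `binary_format` is made up for illustration, and the Mach-O list covers only the common thin and fat magic values.

```python
ELF_MAGIC = b"\x7fELF"  # bytes 0x7F 0x45 0x4C 0x46

# Common Mach-O magic values as they appear on disk (32/64-bit, both byte orders, fat)
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",
    b"\xce\xfa\xed\xfe",
    b"\xfe\xed\xfa\xcf",
    b"\xcf\xfa\xed\xfe",
    b"\xca\xfe\xba\xbe",
}


def binary_format(header):
    """Classify a file by its first four bytes."""
    magic = bytes(header[:4])
    if magic == ELF_MAGIC:
        return "ELF"
    if magic in MACHO_MAGICS:
        return "Mach-O"
    return "unknown"


# The eight bytes reported in the ImportError above
reported = bytes([0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01, 0x01, 0x00])
result = binary_format(reported)
```

To check a file on disk, read its first four bytes with `open(path, "rb").read(4)` and pass them to the same function.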

sunilosunil · ‎09-15-2017

Finding logs manually on the machine sounds very brute force; I was thinking more of an API or CLI option to find the logs. Anyway, the main issue we're trying to solve is giving all developers access to logs in the prod environment. Our NodeManagers are locked down and not accessible (on any port or web UI) to developers, and that is unlikely to change. So we're trying to find a way to proxy the logs. I discovered that there is a jobhistory proxy for looking at completed jobs / YARN apps, but I couldn't get it working for a running app. Is there any trick / way to access a running app's logs like the above? http://resourcemanager.xyz.com:19888/jobhistory/logs//dataNode.com:8041/container_id_000001/container_id_000001/root

sunilosunil · ‎09-11-2017

Is there a YARN API or command to find the path to the YARN logs location on disk for a given container and application id? I also want to add that we don't have log aggregation working, and I'm particularly looking for the direct physical path to the file, not the web interface. Thanks, Sunil
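For reference, when log aggregation is off, a NodeManager normally keeps container logs under each directory listed in yarn.nodemanager.log-dirs, in an <application id>/<container id> subdirectory. The sketch below only composes those candidate paths; the directory list and the ids are hypothetical, and the layout is an assumption that should be checked against the cluster's own configuration.

```python
import posixpath


def container_log_dirs(log_dirs, application_id, container_id):
    """Return candidate on-disk log directories for one container.

    log_dirs is the comma-separated value of yarn.nodemanager.log-dirs.
    A container's logs live under only one of the returned directories.
    """
    return [
        posixpath.join(d.strip(), application_id, container_id)
        for d in log_dirs.split(",")
        if d.strip()
    ]


# Hypothetical values for illustration only
candidates = container_log_dirs(
    "/data1/yarn/container-logs,/data2/yarn/container-logs",
    "application_1500000000000_0001",
    "container_1500000000000_0001_01_000001",
)
```

Each candidate directory, where it exists on the node that ran the container, would hold files such as stdout, stderr and syslog.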

sunilosunil · ‎08-24-2017

I have a table with two columns, test and id; yyyymmdd and group_id are my partition columns. When I run the following query it runs fast, as it only scans 1 partition.

select id, yyyymmdd, group_id, test from dwh.table where (id='1a' and yyyymmdd=20170815 and group_id=1)

But when I run the following query, it scans the entire table. Even explain shows that it is going to perform a full table scan.

select id, yyyymmdd, group_id, test from dwh.table where (id='1a' and yyyymmdd=20170815 and group_id=1) OR (id='2b' and yyyymmdd=20170811 and group_id=2)

I think I'm missing something simple here, and I'd appreciate the community's help in finding out what it is. How do I scan two rows in two different partitions?
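One rewrite that is sometimes tried in this situation is to split the OR into one query per partition and combine them with UNION ALL, so that each branch carries only simple equality predicates on the partition columns. The sketch below just generates that query text from the predicates in the post; the helper name `union_all_query` is made up, and whether the planner actually prunes partitions for the result is an assumption to verify with explain.

```python
def union_all_query(select_list, table, branches):
    """Build one SELECT per predicate and join them with UNION ALL."""
    parts = [
        "select {0} from {1} where {2}".format(select_list, table, cond)
        for cond in branches
    ]
    return "\nunion all\n".join(parts)


query = union_all_query(
    "id, yyyymmdd, group_id, test",
    "dwh.table",
    [
        "id='1a' and yyyymmdd=20170815 and group_id=1",
        "id='2b' and yyyymmdd=20170811 and group_id=2",
    ],
)
```

UNION ALL keeps duplicate rows, which matches the OR form here only because the two branches select from different partitions.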

sunilosunil · ‎08-24-2017

Using Cloudera Manager, go to Sentry -> Configuration and add users/groups to the following properties to allow them to create/show roles. Each entry gives the display name of the property in CM, followed by the property name in the configuration file.

Admin Groups: sentry.service.admin.group
Allowed Connecting Users: sentry.service.allow.connect

sunilosunil · ‎08-18-2017

I'm using the Sentry service through Cloudera Manager. I just realized that I can add other users / groups to the Sentry config in Cloudera Manager and allow them to run Grant / Create role commands.

sunilosunil · ‎08-17-2017

Actually, I figured it out. I had to configure Impala to allow user ldaptest to impersonate user cloudera (the Hue login). I appended this to the Cloudera Manager property Proxy User Configuration (authorized_proxy_user_config): hue=*;ldaptest=cloudera So user hue can impersonate anyone and user 'ldaptest' can impersonate 'cloudera'.
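To spell out how a value like hue=*;ldaptest=cloudera reads, here is a small sketch that parses the setting and answers "may user X impersonate user Y?". It only mirrors the rule described above (entries separated by ";", a comma-separated user list or "*" after "="); the function names are made up for illustration and this is not Impala's own code.

```python
def parse_proxy_config(value):
    """Parse 'proxy=user1,user2;other=*' into {proxy: [allowed users]}."""
    rules = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        proxy, _, targets = entry.partition("=")
        rules[proxy.strip()] = [t.strip() for t in targets.split(",") if t.strip()]
    return rules


def can_impersonate(rules, proxy_user, target_user):
    """True if proxy_user is allowed to act as target_user."""
    allowed = rules.get(proxy_user, [])
    return "*" in allowed or target_user in allowed


rules = parse_proxy_config("hue=*;ldaptest=cloudera")
```

With this value, hue may act as any user, ldaptest may act only as cloudera, and any user not listed may not impersonate anyone.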

Member Since	‎09-25-2016 03:20 PM
Last Visited	‎07-03-2019 04:36 PM
Posts	34
Kudos received	1

Cloudera Community

Re: How to create a role admin user / priviledge

Re: HUE with IMPALA with LDAP, SENTRY enabled

Re: refresh table vs refresh table partition

refresh table vs refresh table partition

Hive select when column name different in parquet ...

impala-shell on mac

Re: yarn logs location on disk

yarn logs location on disk

Impala query to scan two records in different part...
