Member since: 09-02-2016
Posts: 523
Kudos Received: 89
Solutions: 42
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2309 | 08-28-2018 02:00 AM |
| | 2160 | 07-31-2018 06:55 AM |
| | 5070 | 07-26-2018 03:02 AM |
| | 2433 | 07-19-2018 02:30 AM |
| | 5863 | 05-21-2018 03:42 AM |
01-31-2017
12:24 PM
1 Kudo
@AnisurRehman
1. Please refer to the official Sqoop User Guide to learn more about Sqoop. Pick the version of the guide that matches your Sqoop version: https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html
2. Yes, bulk import is possible. See the "sqoop-import-all-tables" topic in the guide above.
3. About incremental loads: see the "incremental import" topic in the same guide.
4. About Impala for Sqoop:
a. Sqoop uses mappers from MapReduce (no reducers by default). It refers to the Hive database/table only to identify the target location, and it never uses the Hive or Impala engine to import the data. Specifying Impala or Hive therefore makes no difference, which is why Sqoop provides the hive-import option by default. The bottom line is that you can continue to use the Hive options in your Sqoop script.
b. After the data is imported, it is up to you whether to use Hive or Impala, depending on your requirements. As you mentioned, Impala suits certain situations, so use Impala only where it is necessary (for example, some priority tables).
Thanks
Kumar
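Points 2 to 4 above can be sketched as a small dry-run script. It only prints the Sqoop commands for review and does not run Sqoop. The JDBC URL, user name, table, Hive table, and check column are hypothetical placeholders, not values from this thread, and whether --hive-import can be combined with --incremental append should be confirmed in the user guide for your Sqoop version.

```shell
#!/bin/sh
# Dry-run sketch: prints Sqoop commands for review instead of executing them.
# All connection details and table/column names are hypothetical placeholders.
set -eu

CONNECT="jdbc:mysql://db.example.com/sales"   # hypothetical JDBC URL
DB_USER="etl_user"                            # hypothetical database user

# Bulk import of every table in the source database into Hive.
print_import_all_tables() {
  cat <<EOF
sqoop import-all-tables \\
  --connect $CONNECT \\
  --username $DB_USER -P \\
  --hive-import
EOF
}

# Incremental append import of one table.
# First argument: highest value of the check column that was already imported.
print_incremental_import() {
  cat <<EOF
sqoop import \\
  --connect $CONNECT \\
  --username $DB_USER -P \\
  --table orders \\
  --hive-import --hive-table orders \\
  --incremental append --check-column order_id --last-value $1
EOF
}

print_import_all_tables
print_incremental_import 0
```

Both commands mention only Hive options, in line with point 4a: the imported data can afterwards be queried from either Hive or Impala.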
01-30-2017
12:12 PM
@csguna It is authorized_key, which has nothing to do with HDFS here, so the ownership is user:linux group (instead of the hdfs group).
01-26-2017
08:16 AM
I've set hive.prewarm.enabled=true, but it did not improve the slow start-up and initialization of the executors. It still takes about 15 seconds to initialize things. Any ideas?
01-17-2017
11:50 AM
On the setting changes: stats, as stated, will help with counts because that information is precalculated and stored in the metadata. The CBO and stats also help a lot with joins. It is possible that the OS cache had more to do with the improvement if this was a subsequent run with little other activity.
You could look at Hive on Spark for better, more consistent performance: set hive.execution.engine=spark;
On the times: the big impact between job submission and start is the scheduler. That is a deep topic. It is best to read up on the schedulers, review your settings, and ask any specific questions that come up, preferably in a new topic.
The other factor, not captured in the job stats, is the time it takes to return the results to the client. This varies by client and there isn't much to do about it. In general, small result sets can be handled by the Hive CLI, and you can increase the client heap if needed. Otherwise use HS2 connections such as beeline or HUE.
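The settings discussed above could be tried in one session along the lines of this sketch. It prints a HiveQL script rather than running it. The table name web_logs is hypothetical, and Hive on Spark must already be configured on the cluster for the engine switch to take effect.

```shell
#!/bin/sh
set -eu

# Print a HiveQL script that gathers statistics and enables the optimizer
# settings mentioned above. First argument: table name (hypothetical below).
print_hive_tuning_script() {
  cat <<EOF
-- Gather table and column statistics so they are stored in the metastore.
ANALYZE TABLE $1 COMPUTE STATISTICS;
ANALYZE TABLE $1 COMPUTE STATISTICS FOR COLUMNS;
-- Let the cost-based optimizer use the statistics and answer simple counts from them.
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
-- Run queries on Spark instead of MapReduce.
SET hive.execution.engine=spark;
SELECT COUNT(*) FROM $1;
EOF
}

print_hive_tuning_script web_logs
```

The output can be saved to a file and run over an HS2 connection, for example with beeline -u followed by your own JDBC URL and -f followed by the file name.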
01-12-2017
03:16 PM
@saranvisa This health check result indicates that the NodeManager is not getting enough heap space for its workload. Typically, when the workload in the cluster grows and the Java daemon needs more heap, you need to give more heap to the role. You could:
1. Increase the heap given to the NodeManager through the NodeManager's configuration page ('Java Heap Size of NodeManager in Bytes').
2. Alternatively, though not recommended, tune the threshold you found so that it tolerates a higher GC ratio for the NodeManager.
I would recommend going to the specific NodeManager's role instance page in Cloudera Manager and browsing the charts available for it. The chart named 'JVM heap memory usage' shows the heap consumption of that particular NodeManager. That gives you a better idea of how much memory the role is using, so you can decide how far to raise its heap.
01-12-2017
12:38 PM
Since you mentioned the term "user role", I want to clarify this: you need to understand the difference between Group, User and Role.
Groups and users must be created both in Linux (as the root user) and in Hue (as the admin user), but roles are created only in Hue.
Example: log in to Linux as root and run the commands below.
Group:
groupadd hive; groupadd hue; groupadd impala; groupadd analyst; groupadd admin;
# In your case, the groups are supposed to be: Auditor, Read-Only, Limited Operator, Operator, Configurator, Cluster Administrator, BDR Administrator, Navigator Administrator, User Administrator, Key Administrator, Full Administrator
User:
useradd kumar;
# User belongs to Group
usermod -a -G hive,hue,impala,admin,analyst kumar;
passwd kumar;
Role (assigned to a Group):
Now log in to Hue -> Security (menu) -> Sentry Tables -> Add Roles (as the hive user).
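Because the role is assigned to a group, it can help to confirm the Linux group membership after running the commands above. A minimal sketch of such a check, using the example user and groups from this post; the helper name user_in_group is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
set -eu

# Return 0 if user $1 is a member of group $2, non-zero otherwise.
user_in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}

# Example with the user and groups from the post; skipped if the user does not exist.
if id kumar >/dev/null 2>&1; then
  for g in hive hue impala admin analyst; do
    if user_in_group kumar "$g"; then
      echo "kumar is in $g"
    else
      echo "kumar is NOT in $g"
    fi
  done
fi
```

This checks only the Linux side. The matching user and groups in Hue still have to be created separately, as described above.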
01-12-2017
09:13 AM
@cplusplus1 You can find the XML files in the path below, but I would not recommend updating them directly. Update your configuration through CM instead.
/var/run/cloudera-scm-agent/process/*-hive-HIVESERVER2
By default, Sentry requires configuration changes in Hive, Impala, YARN and Hue (you can add additional services as needed and change their configuration).
Example: you can follow this method.
CM -> Hive -> Configuration
Select Scope > HiveServer2.
Select Category > Main.
Uncheck the HiveServer2 Enable Impersonation checkbox.
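To inspect (not edit) the generated files, a sketch like the following could locate the most recent HiveServer2 process directory. It assumes that Cloudera Manager prefixes each process directory with a number that increases over time, so the highest prefix is the latest one. The helper name latest_hs2_dir is hypothetical.

```shell
#!/bin/sh
set -eu

# Print the HiveServer2 process directory with the highest numeric prefix
# under base directory $1; print nothing if there is none.
latest_hs2_dir() {
  name=$(ls -1 "$1" | grep -- '-hive-HIVESERVER2$' | sort -n | tail -n 1)
  if [ -n "$name" ]; then
    printf '%s/%s\n' "$1" "$name"
  fi
}

PROCESS_DIR=/var/run/cloudera-scm-agent/process
# Reading this directory usually requires root; skip quietly if it is not readable.
if [ -r "$PROCESS_DIR" ]; then
  latest_hs2_dir "$PROCESS_DIR"
fi
```

Treat the files found there as read-only output of CM, since direct edits are not recommended.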
01-06-2017
09:08 AM
FYI: everything was fine with kadmin.local, but kadmin was not working properly. The same issue was raised by someone else on Stack Overflow. I followed the instructions there and the issue is now fixed: http://stackoverflow.com/questions/23779468/kerberos-kadmin-not-working-properly
01-03-2017
10:35 PM
Hi, I created a user called "commonuser" and a group called "commonuser" in Hue and on the Linux machine. I created a role called "commonuser" in the Sentry app to access the databases and gave it the "select" privilege. Now, when I log in as commonuser in Hue, the databases are visible in the Hue Hive editor but not in the Hue Impala editor. In Impala, only the default database, without any tables, is visible, as shown in the screenshot below. Please advise me on this issue.