Member since
09-02-2016
523
Posts
89
Kudos Received
42
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1634 | 08-28-2018 02:00 AM
 | 1247 | 07-31-2018 06:55 AM
 | 3210 | 07-26-2018 03:02 AM
 | 1363 | 07-19-2018 02:30 AM
 | 3750 | 05-21-2018 03:42 AM
06-09-2018
01:46 AM
@AcharkiMed Basically this is a recommendation; whether it is treated as mandatory or optional depends on the environment you are using. Ex: *) For a Prod env - it is effectively mandatory, otherwise you will see a performance difference whenever there is a switch between the active and standby NN *) For a test/POC env - it is optional if you don't have a better choice
05-30-2018
03:40 AM
@gerasimos Option 3: I meant the Linux file system as local storage. Also, just to make sure: is the 10 TB you mentioned before replication or after replication?
05-24-2018
02:25 AM
1 Kudo
@gerasimos There are different approaches. First let us see the available options and their pros & cons, then you can choose (or, if possible, combine) them as needed.
1. Cloudera Manager -> Backup menu option (or) distcp option
Pros: a. Easy to take a backup
Cons: a. It works between two different clusters, so it may not be suitable for your requirement
2. Export/Import option
Step 1: Execute the below command and export the working db.table to an HDFS path, then move it to local as needed. It will export both data & metadata.
> hive -S -e "export table $schema_file1.$tbl_file1 to '$HDFS_DATA_PATH/$tbl_file1';"
Step 2: Run the below import command twice. The first import will throw an error because the table doesn't exist and will create it; the second import will import the data too.
> hive -S -e "import table $schema_file1.$tbl_file1 from '$HDFS_DATA_PATH/$tbl_file1';"
Note: You can hard-code the $ variables with the actual path/file/table names.
Pros: a. Export/Import takes care of both data & metadata, so you don't need to handle metadata separately
Cons: a. I've used it long back for non-partitioned tables; I am not sure how well it supports partitioned tables, please double check b. You need to apply export/import for each table (see the small loop sketch after this post)
3. Move HDFS data to local, local to tape, and take the metadata backup separately. Ex: MySQL - many links are available online about how to take a MySQL backup
Pros: a. A metadata backup is possible for the entire db
Cons: a. You may need to move the HDFS data file by file, depending on your local FS capacity
There could be other options too; please update below if you/anyone find something.
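Since option 2 has to be run per table, a minimal wrapper loop is sketched below. It assumes a hypothetical plain-text file tables.txt holding one "db.table" entry per line and an example HDFS export path; adjust both to your environment.
# export_tables.sh - loop the hive export over a list of tables (sketch only)
while read tbl; do
  hive -S -e "export table ${tbl} to '/backup/hive_export/${tbl}';"
done < tables.txt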
05-21-2018
05:08 AM
@kckrishna Please use the below link to get the latest update http://cloudera.github.io/hue/latest/sdk/sdk.html#new-application
05-21-2018
03:42 AM
1 Kudo
@sim6 I hope you have more than 3 data nodes. Generally, two types of "data missing" issues are possible, for many reasons:
a. ReplicaNotFoundException
b. BlockMissingException
If your issue is related to BlockMissingException and you have backup data in your DR environment, then you are good; otherwise it might be a problem. For ReplicaNotFoundException, please make sure all your datanodes are healthy and in the commissioned state. In fact, the namenode is supposed to handle this automatically whenever that data is accessed; if not, an HDFS rebalance (or) a NameNode restart may also fix the issue, but you don't need to try those options unless a user reports an issue on that particular data. In your case no one has reported it yet and you found it yourself, so you can ignore it for now.
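If you want to see exactly which files (if any) are affected before deciding, a quick check from the command line is below; the paths are just examples:
$ hdfs fsck / -list-corruptfileblocks                      # files that currently have missing/corrupt blocks
$ hdfs fsck /path/of/interest -files -blocks -locations    # block-to-datanode mapping for one path
$ hdfs dfsadmin -report                                    # confirm all datanodes are live and in service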
05-16-2018
10:49 AM
@Teradil Before we go into detail, I want to make sure that you have the Apache Sentry service configured in your cluster. If so, go to Hue and open Sentry from the top menu, then
1. add a role with write access to the required db and assign the role to your UID or to a group that you are part of (or)
2. identify who already has write permission, get their group id (the one which has write access to the db), and become part of that group
05-16-2018
07:37 AM
1 Kudo
@Teradil If you can log in with the keytab file then you are good on the Kerberos part. You also mentioned that you can read the table but have an issue with write access, so this should be controlled via Apache Sentry, which enforces precise levels of privileges on data. Either the Sentry admin has to give write access to your user id for that particular DB (or) you have to be part of a group which already has write access to that DB.
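For reference, the Sentry-admin side of that usually looks like the statements below, run from Hive/beeline as an admin; the role, group and database names are placeholders:
create role etl_writers;
grant all on database mydb to role etl_writers;
grant role etl_writers to group etl_team;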
05-14-2018
03:48 AM
@voruganti_vishw Yes, we can get some general ideas from various online links, but Cloudera recommends following the steps mentioned in this link. The link below refers to the CDH 5.6 version; you can pick any suitable version. In fact, I don't see any major difference in the details between versions. The difference between the other online links and the one below is that you need to download the spreadsheet template provided in the link and carefully fill in your current configuration, so that it gives you the recommended YARN configuration output based on your current configuration. If you add extra nodes or decommission any existing nodes, you can recalculate and update your configuration. https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cdh_ig_yarn_tuning.html
05-13-2018
10:14 AM
@Mobula Run the kinit command and give the required password. Also run the klist command and make sure you have a valid ticket. Run those commands before you log in to the hbase shell, then try again.
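A minimal sketch of that sequence (the principal name is a placeholder):
$ kinit myuser@EXAMPLE.COM    # enter the password when prompted
$ klist                       # verify a valid, non-expired ticket is shown
$ hbase shell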
05-09-2018
10:13 AM
@hendry There could be two possibilities for this scenario:
1. Maybe the Hive and Impala tables are referring to two different files. The chances of this are low unless there is a minor mistake in one of the tables (or) some other internal error. You can confirm it by running
> describe formatted db.tablename
from both Hive and Impala, then get the location from each and compare.
2. Your file has duplicate records. I mean some key values are the same but other columns have different values, so the query may return a different value when you filter. So check your data in detail.
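For example, to pull the Location line from both engines and compare (the database/table and the Impala daemon host are placeholders):
$ hive -e "describe formatted mydb.mytable;" | grep -i location
$ impala-shell -i <impalad-host> -q "describe formatted mydb.mytable;" | grep -i location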
05-08-2018
03:48 AM
@hendry Please apply INVALIDATE METADATA and try again: INVALIDATE METADATA [[db_name.]table_name]
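For example, from impala-shell (the host and db/table names are placeholders):
$ impala-shell -i <impalad-host> -q "INVALIDATE METADATA mydb.mytable;"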
05-08-2018
03:45 AM
1 Kudo
@krishnap It is based on the below configurations:
1. CM -> HDFS -> Configuration -> DataNode Block Count Thresholds -> the default value is 500,000
2. CM -> HDFS -> Configuration -> HDFS Block Size -> it may be 64 MB, 128 MB or 256 MB
As a solution, you can try the below:
1. CM -> HDFS -> Action -> Rebalance
2. Increase the "DataNode Block Count Thresholds" value based on your capacity. NOTE: it requires a service restart (or)
3. In some environments you can also ignore this warning unless you are really running out of space
The above block count threshold applies to datanodes; the file descriptor threshold is for other daemons like the namenode, secondary namenode, journal node, etc. I have never explored the file descriptor setting before, so I don't want to comment on it.
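To see where the count currently stands before touching the threshold, a quick example check is below; the per-DataNode figure is also visible on the NameNode web UI's Datanodes tab.
$ hdfs fsck / | grep -i 'Total blocks'    # cluster-wide validated block count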
05-02-2018
04:44 AM
@balajivsn You can stop the cluster using CM -> top-left cluster menu -> Stop. Yes, it will stop all the available services, e.g. Hue, Hive, Spark, Flume, YARN, HDFS, ZooKeeper, etc., and it won't disturb your hosts or the Cloudera Management Service. Note: you don't need to handle daemons like the namenode separately for this.
04-30-2018
12:31 PM
@balajivsn The link you are referring to belongs to 5.4.x; please refer to the below links (5.14.x) for a little more detail. There are two types of backup:
1. HDFS metadata backup https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_hdfs_metadata_backup.html - you need to follow all the steps, including "Stop the cluster. It is particularly important that the NameNode role process is not running so that you can make a consistent backup"
2. NameNode metadata backup https://www.cloudera.com/documentation/enterprise/5-14-x/topics/cm_mc_nn_metadata_backup.html - can be done using $ hdfs dfsadmin -fetchImage backup_dir
Now to answer your question: the first link says "Cloudera recommends backing up HDFS metadata before a major upgrade". So in a real production cluster we perform the HDFS metadata backup and the major upgrade during the same downtime, and the given steps are the recommended way to get a consistent backup. But if your situation is just a matter of backing up the namenode at a regular interval, then I believe you are correct: you can switch on safe mode, take a backup, and leave safe mode (or) you can try the option from the 2nd link. Note: please make sure to test it in lower environments before applying it in prod.
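A minimal sketch of that safe-mode variant, run as the HDFS superuser (the backup directory is a placeholder):
$ hdfs dfsadmin -safemode enter           # block new writes
$ hdfs dfsadmin -saveNamespace            # merge edits into a fresh fsimage
$ hdfs dfsadmin -fetchImage /backup/nn    # download the latest fsimage from the active NameNode
$ hdfs dfsadmin -safemode leave           # resume normal operation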
04-30-2018
06:05 AM
@Alan-H There could be different ways, but I tried the below steps and they are working for me.
Step 1: using a select clause with hardcoded values
create table default.mytest(col1 string, col2 int);
insert into default.mytest
select 'For testing single quote\'s', 1;
insert into default.mytest
select 'For testing double quote\"s', 2;
select * from default.mytest;
Step 2: using a select clause, passing the value in a parameter
set hivevar:col1 = 'For testing single quote\'s';
set hivevar:col2 = 3;
insert into default.mytest
select ${hivevar:col1}, ${hivevar:col2};
select * from default.mytest;
Step 3: using a select clause, passing the value in a parameter
set hivevar:col1 = 'For testing double quote\"s';
set hivevar:col2 = 4;
insert into default.mytest
select ${hivevar:col1}, ${hivevar:col2};
select * from default.mytest;
Step 4: cleanup
drop table default.mytest;
04-30-2018
04:10 AM
@bhaveshsharma03 In fact, there is no standard answer to this question, as it is purely based on your business model, cluster size, sqoop export/import frequency, data volume, hardware capacity, etc. I can give a few points based on my experience; hope they help you.
1. 75% of the sqoop scripts (non-priority) use the default number of mappers, for various reasons, as we don't want to use all the available resources for sqoop alone.
2. Also, we don't want to apply all the possible performance tuning methods to those non-priority jobs, as it may disturb the RDBMS (source/target) too.
3. Get in touch with the RDBMS owner to find their non-busy hours, identify the priority sqoop scripts (based on your business model), and apply the performance tuning methods to the priority scripts based on data volume (not only rows; hundreds of columns also matter). Repeat this if you have more than one database.
4. Regarding who is responsible: in most cases, if you have a small cluster used by very few teams, then developers and the admin can work together, but if you have a very large cluster used by many teams, then it is out of the admin's scope... again, it depends.
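For illustration, the mapper count is just a flag on the sqoop job itself; everything below (connection string, table, split column) is a placeholder:
$ sqoop import \
    --connect jdbc:mysql://<db-host>/<db> \
    --username <user> -P \
    --table orders \
    --split-by order_id \
    --num-mappers 8    # default is 4; raise it only for priority jobs during the RDBMS's quiet hours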
04-24-2018
02:25 AM
@ps40 The below link is for the enterprise edition; I believe it should be the same for other editions too. https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_vd.html
1. The first point: according to the above link, Ubuntu Xenial 16.04 is supported by CDH 5.12.2 or above. So if you have decided to upgrade Ubuntu, then you have to upgrade CDH/CM as well.
2. The second point: according to the below link, "If you are upgrading CDH or Cloudera Manager as well as the OS, upgrade the OS first" https://www.cloudera.com/documentation/enterprise/5-11-x/topics/cm_ag_upgrading_os.html
Hope this gives you some insight!
04-23-2018
03:00 AM
@s_l There are two possibilities for this issue:
1. Kerberos - but you are sure it is not related to Kerberos.
2. An environment variable set to a wrong path (or) to an old version - I can see from your code that you have used a few environment variables. Go to the below-mentioned paths and make sure the (binary) file you are referring to is actually available there; if you have upgraded any of your software, multiple versions may be kept side by side, so specify the correct one. I've included JAVA_HOME as well. 'SPARK_HOME' = "/cloudera/parcels/SPARK2/lib/spark2/"
'PYSPARK_PYTHON' = "./xxx/bin/python"
'PYSPARK_PYTHON_DRIVER' = "/home/xxx/python/xxx/bin/python"
'PYTHONPATH' = "/cloudera/parcels/SPARK2/lib/spark2/python/lib/py4j-0.10.4-src.zip:/hadoop/cloudera/parcels/SPARK2/lib/spark2/python/"
JAVA_HOME=/usr/java
04-18-2018
03:37 AM
@Apoorva06 Have you disabled SELinux? If not, please do it; it may help you.
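To check and disable it on a RHEL/CentOS style host (the permanent change needs a reboot):
$ getenforce            # shows Enforcing / Permissive / Disabled
$ sudo setenforce 0     # Permissive until the next reboot
# for a permanent change, set SELINUX=disabled in /etc/selinux/config and reboot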
04-17-2018
07:56 AM
@dpugazhe Below is the usual exercise we follow to reduce log history, but...
a. it purely depends on your client's business; if they are not demanding a longer log history, then you can try this
b. I've given a few samples below; you don't need to reduce the history for all the logs. Please do your own research to see which history files are taking more space, and act on those by reducing the max size and number of history files.
CM -> HDFS -> Configuration -> search for the below:
1. navigator.client.max_num_audit_log -> the default value is 10 - you can reduce it to 8 or 6 (it is recommended to keep more history in general)
2. navigator.audit_log_max_file_size -> the default value is 100 MB - you can reduce it to 80 MB or 50 MB
Note: you can try both --or-- any one
3. DataNode Max Log Size -> the default value is 200 MB - reduce as needed
4. DataNode Maximum Log File Backups -> the default value is 10 - reduce as needed
5. NameNode Max Log Size -> the default value is 200 MB - reduce as needed
6. NameNode Maximum Log File Backups -> the default value is 300 - reduce as needed
NOTE: I am repeating again, please consider points a & b before you take action.
04-17-2018
06:01 AM
1 Kudo
@ronnie10 The issue you are getting is not related to Kerberos. I think you don't have access to /user/root under the below path; please try to access your own home dir instead, it may help you: ' http://192.168.1.7:14000/webhdfs/v1/user/root/t?op=LISTSTATUS '
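For example, pointing the same request at your own home directory (the username is a placeholder; with simple authentication the user.name parameter applies, while a Kerberized HttpFS endpoint would need curl --negotiate -u : instead):
$ curl -i "http://192.168.1.7:14000/webhdfs/v1/user/<your-username>?op=LISTSTATUS&user.name=<your-username>"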
04-16-2018
05:21 AM
@ludof Yes, in general developers will not have access to create a keytab; you have to contact your admin for that (mostly the admin should have permission to create one for you, but some organizations have a dedicated security team to handle LDAP, AD, Kerberos, etc. It depends upon your organization, but you should start with your admin).
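For reference, on an MIT KDC the admin side of this typically looks like the below (the principal and paths are placeholders; Active Directory shops use different tooling such as ktpass):
$ kadmin -p admin/admin -q "xst -k /tmp/myuser.keytab myuser@EXAMPLE.COM"   # note: xst regenerates the principal's keys unless -norandkey is used
$ kinit -kt /tmp/myuser.keytab myuser@EXAMPLE.COM                           # verify the keytab works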
04-16-2018
05:12 AM
@Johnny_Bach As mentioned in this link https://stackoverflow.com/questions/6532273/unrecognized-ssl-message-plaintext-connection-exception please try swapping between the two; it may help you http://<url>:7180 https://<url>:7183
04-16-2018
05:00 AM
@ludof Please follow the input from the below link (read all the comments till the end by pressing "show more"); a similar issue has been discussed there and it may help you https://stackoverflow.com/questions/44376334/how-to-fix-delegation-token-can-be-issued-only-with-kerberos-or-web-authenticat
04-16-2018
04:05 AM
@null_pointer For some reason I cannot see the image you uploaded, but I got your point and will try to answer your question. We cannot always match/compare the memory usage from CM vs Linux, for various reasons:
1. Yes, as you said, CM only counts memory used by Hadoop components; it won't consider any other applications running on the local Linux host, as CM is designed to monitor only Hadoop and dependent services.
2. (I am not sure whether you are getting the CM report from the Host Monitor.) There are practical difficulties in getting the memory usage of every node in a single report. Ex: consider you have 100+ nodes and each node has a different memory capacity like 100 GB, 200 GB, 250 GB, 300 GB, etc.; it is difficult to generate a single report of memory usage per node.
Still, if the default report available in CM does not meet your requirement, you can try to build a custom chart from CM -> Chart (menu) -> your tsquery https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cluster_util_custom.html
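As a hedged example of such a tsquery (verify the metric names in the chart builder's metric list, as they can vary between CM versions):
SELECT physical_memory_used, physical_memory_total WHERE category = HOST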
04-15-2018
09:19 AM
1 Kudo
@Aedulla Here you go:
http://www.bayareabikeshare.com/open-data
https://grouplens.org/datasets/movielens/
https://www.nyse.com/market-data/historical
You can also use the free Hue demo below (login uid: demo, pwd: demo), where you can find some pre-existing data for Hive, Impala, HBase, etc. Note: if you get an exception after login, please try again after some time or raise a ticket so that someone from the Hue team can fix the issue.
http://demo.gethue.com
04-11-2018
05:11 AM
@nandakumar You can use the adquery commands, e.g.
adquery user <username>
adquery group <groupname>
etc.
04-11-2018
05:08 AM
@bukangarii As long as you have JDBC connectivity to your legacy system, it is possible to export the Parquet Hive table to it. Please check the Sqoop user guide to understand the supported data types.
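A sketch of such an export via HCatalog, which is the usual way Sqoop reads a Parquet-backed Hive table (connection details and names are placeholders, and the target table must already exist on the RDBMS side):
$ sqoop export \
    --connect jdbc:oracle:thin:@//<db-host>:1521/<service> \
    --username <user> -P \
    --table TARGET_TABLE \
    --hcatalog-database mydb \
    --hcatalog-table my_parquet_table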
04-10-2018
11:37 PM
@nandakumar It looks like a Sentry issue. Have you recently added/enabled the Sentry service? If so, you may have to grant the necessary access on your dbs to the user's group. This can be done via Hue, or you can log in to Hive as admin and try the below commands. Ex: consider your user belongs to <my_group>
## role creation:
create role <my_role>;
## grant access to my_role
grant all on database <my_db1> to role <my_role>;
grant select on database <my_db2> to role <my_role>;
## grant role to group
grant role <my_role> to group <my_group>;