Member since 09-02-2016
523 Posts
89 Kudos Received
42 Solutions
My Accepted Solutions
Views | Posted |
---|---|
2309 | 08-28-2018 02:00 AM |
2160 | 07-31-2018 06:55 AM |
5070 | 07-26-2018 03:02 AM |
2433 | 07-19-2018 02:30 AM |
5863 | 05-21-2018 03:42 AM |
07-23-2018
10:51 AM
1 Kudo
@martinbo To create users on multiple nodes, use a configuration management tool such as Puppet, Chef, or Ansible.
You asked only about creating a new user on each node, but in practice the requirement usually grows to include:
1. Create/modify the user on each node
2. Set up a temporary password if you don't have SSO
3. Create/modify the user groups on each node (admin group, developer group, tester group, analyst, etc.)
4. Assign each user to the corresponding user groups
5. Create a home directory for each user, and set up a quota if needed
6. Set the permissions and owner on each home directory (so that other users cannot access it)
7. etc.
These tools can do many other things, but I've listed only a few based on your requirement. Hope it helps.
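As a rough illustration of steps 1 to 6, here is a minimal Python sketch that only builds and prints the shell commands each node would need (a dry run, nothing is executed). The user names, group names, and host names are made up for the example, and it assumes a Linux node with the standard `groupadd`, `useradd`, `chmod`, and `chage` tools. In a real cluster you would express the same plan in Puppet, Chef, or Ansible rather than in a script like this.

```python
import shlex

# Hypothetical plan: user name -> groups. All names are illustrative only.
USERS = {"alice": ["developer"], "bob": ["tester", "analyst"]}
NODES = ["node1.example.com", "node2.example.com"]


def commands_for_user(user, groups):
    """Return the shell commands that would set up one user on one node."""
    # Create each group; -f makes groupadd succeed if the group already exists.
    cmds = [f"groupadd -f {shlex.quote(g)}" for g in groups]
    # Create the user with a home directory (-m) and supplementary groups (-G).
    cmds.append(f"useradd -m -G {shlex.quote(','.join(groups))} {shlex.quote(user)}")
    # Restrict the home directory to its owner.
    cmds.append(f"chmod 700 /home/{shlex.quote(user)}")
    # Force a password change at first login (the temporary-password case).
    cmds.append(f"chage -d 0 {shlex.quote(user)}")
    return cmds


def build_plan(nodes, users):
    """Map every node to the full list of commands for all users."""
    plan = {}
    for node in nodes:
        plan[node] = [
            cmd
            for user, groups in users.items()
            for cmd in commands_for_user(user, groups)
        ]
    return plan


# Dry run: print what would be executed on each node.
for node, cmds in build_plan(NODES, USERS).items():
    print(f"# {node}")
    for cmd in cmds:
        print(cmd)
```

Setting the temporary password itself and any disk quota are left out of the sketch, since how you do that depends on your environment.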
07-19-2018
02:30 AM
@scratch28 You can use Cloudera Navigator to generate this report.
Log in to CM as full admin, then go to Cloudera Management Service -> Navigator Metadata Server -> Cloudera Navigator (menu) -> search for 'impala' -> choose from the options on the left side.
You can choose a period of up to the last 365 days, or a custom period.
07-03-2018
04:05 AM
@Rod CDH 6.0 will support Spark 2.2.0, and Spark 2.2.0 supports Spark SQL, as per the link below:
https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#sql
I'm not sure this answers your question. If not, please give some more details.
06-14-2018
05:06 AM
@KeepCalmNCode You mentioned that there are a lot of small files, and that you set block.size to 445644800 (approx. 445 MB).
If block.size is larger than a small file, you will not see any difference, because the file still fits in a single block.
Ex: all of the below will give the same result
445 MB > 1 MB
400 MB > 1 MB
300 MB > 1 MB
200 MB > 1 MB
100 MB > 1 MB
10 MB > 1 MB
2 MB > 1 MB
You may see a difference only when you set block.size smaller than the small files.
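To make the arithmetic concrete, here is a small Python sketch of the reasoning above. It assumes the simplified model that a file occupies ceil(file size / block size) blocks; the function name and the 5 MB example file are made up for illustration and are not taken from the original question.

```python
MB = 1024 * 1024

# Block sizes from the example above (the first is the value from the question).
BLOCK_SIZES = [445644800, 400 * MB, 300 * MB, 200 * MB, 100 * MB, 10 * MB, 2 * MB]


def blocks_needed(file_size, block_size):
    """Number of blocks a file occupies: ceil(file_size / block_size)."""
    if file_size == 0:
        return 0
    # Integer ceiling division, avoids floating point rounding.
    return -(-file_size // block_size)


# A 1 MB file needs one block no matter which of these block sizes is used.
for block_size in BLOCK_SIZES:
    print(block_size, blocks_needed(1 * MB, block_size))

# Hypothetical contrast: the block count only changes once the block
# size drops below the file size, e.g. a 5 MB file with 2 MB blocks.
print(blocks_needed(5 * MB, 2 * MB))
```

So for files that are already smaller than the block size, raising block.size further changes nothing about how they are stored.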
06-13-2018
05:31 AM
@Sumit The link that you provided says "If you have selected IAM authentication, no additional steps are needed", and those additional steps include editing core-site.xml.
You have confirmed that you added "AWS Credentials" under "External Accounts" in the "Administration" menu, which is IAM role-based authentication, so you don't need to do both.
06-12-2018
05:46 AM
@Sumit
1. What is your Cloudera version? If you are using Cloudera 5.10 or above, you can follow the instructions in the link that I gave above, especially 'Adding AWS Credentials'. You didn't mention whether you have already tried this or not
2. I'm not sure whether you restarted your cluster after the configuration change
3. If you follow this option, it will apply to all users
4. I don't know which blog you are following or how old it is. If you are using Cloudera, use the Cloudera documentation
06-09-2018
01:51 AM
@Sumit Please refer to the link below. It explains:
*) S3 as storage for Impala tables
*) S3 as a source or destination for HDFS and Hive/Impala replication and for cluster storage
*) Where to update the credentials, etc.
https://www.cloudera.com/documentation/enterprise/5-12-x/topics/cm_auth_aws.html#concept_tmd_nsh_2y
06-09-2018
01:46 AM
@AcharkiMed Basically this is a recommendation. Whether it is treated as mandatory or optional depends on the environment that you are using.
Ex:
*) For a prod env, it is mandatory. Otherwise you will see a performance difference whenever there is a switch between the active and standby NN
*) For a test/POC env, it is optional if you have no other choice
05-21-2018
03:42 AM
1 Kudo
@sim6 I hope you have more than 3 data nodes.
Generally, two types of "data missing" issues are possible, for many reasons:
a. ReplicaNotFoundException
b. BlockMissingException
If your issue is related to BlockMissingException and you have backup data in your DR environment, then you are fine; otherwise it might be a problem.
For ReplicaNotFoundException, please make sure all your datanodes are healthy and in the commissioned state. In fact, the namenode is supposed to handle this automatically whenever that data is accessed. If it does not, an hdfs rebalance (or) an NN restart may fix the issue, but you don't need to try this unless some user reports an issue with that particular data. In your case no one has reported anything yet and you found it yourself, so you can ignore it for now.
05-09-2018
10:13 AM
@hendry There are two possibilities for this scenario:
1. The hive and impala tables may be referring to two different files. This is unlikely unless there is a minor mistake in the table definitions (or) some other internal error. You can confirm it with
> describe formatted db.tablename
Run this command from both hive and impala, get the location from each, and compare
2. Your file has duplicate records. I mean some key values are the same but the other columns have different values, so a filter may return different values. Check your data in detail
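For the second possibility, here is a minimal Python sketch of the kind of check meant above: it reports key values that appear more than once with different values in the other columns. The column names (`id`, `city`) and the sample rows are hypothetical, and it assumes the data has been exported as a list of records. On the cluster itself you would more likely run a GROUP BY query on the key column.

```python
from collections import defaultdict


def conflicting_keys(rows, key):
    """Return key values whose other columns differ between records."""
    variants = defaultdict(set)
    for row in rows:
        # Freeze the non-key columns so distinct combinations can be counted.
        rest = tuple(sorted((k, v) for k, v in row.items() if k != key))
        variants[row[key]].add(rest)
    # A key is a problem only if it maps to more than one combination.
    return sorted(k for k, seen in variants.items() if len(seen) > 1)


# Hypothetical sample data: id 2 appears twice with different cities.
sample = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Pune"},
    {"id": 2, "city": "Lima"},
    {"id": 3, "city": "Kyiv"},
    {"id": 3, "city": "Kyiv"},
]

print(conflicting_keys(sample, "id"))
```

Exact duplicates (like id 3 in the sample) are not reported, because they return the same values whichever copy a filter happens to pick.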