Member since: 09-02-2016
Posts: 523
Kudos Received: 89
Solutions: 42
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2692 | 08-28-2018 02:00 AM |
|  | 2670 | 07-31-2018 06:55 AM |
|  | 5634 | 07-26-2018 03:02 AM |
|  | 2948 | 07-19-2018 02:30 AM |
|  | 6418 | 05-21-2018 03:42 AM |
11-15-2017
07:38 AM
1 Kudo
@hparteaga The correct way is:
1. In Cloudera Manager -> Add Sentry Service, and make sure Hue is configured to use it.
2. Log in to Hue -> go to the Security menu -> it will have a submenu called either Sentry Tables (or) Hive Tables; the link below explains why it can be either. Use this option to set db-, table-, and column-level authorization.
http://community.cloudera.com/t5/Security-Apache-Sentry/Hive-Tables-instead-Sentry-Tables/m-p/48740#M190
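If you prefer to script the policies instead of clicking through Hue, the same grants can be issued over HiveServer2. A minimal sketch, assuming Sentry is enabled and using hypothetical host, table, role, and group names:

```scala
import java.sql.DriverManager

// Minimal sketch: create a Sentry role and grant table- and column-level
// access over HiveServer2 JDBC. The host, database/table, role, and group
// names are placeholders; add Kerberos options to the URL if it is enabled.
object SentryGrantSketch extends App {
  Class.forName("org.apache.hive.jdbc.HiveDriver")
  val conn = DriverManager.getConnection(
    "jdbc:hive2://hs2-host.example.com:10000/default", "admin", "")
  val stmt = conn.createStatement()
  stmt.execute("CREATE ROLE analyst_role")
  stmt.execute("GRANT SELECT ON TABLE db1.events TO ROLE analyst_role")           // table level
  stmt.execute("GRANT SELECT(event_id) ON TABLE db1.events TO ROLE analyst_role") // column level
  stmt.execute("GRANT ROLE analyst_role TO GROUP analysts")                       // role -> group
  stmt.close()
  conn.close()
}
```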
11-13-2017
07:58 AM
@cdhhadoop Try the below; it may help you:
CM -> YARN -> Configuration -> "Java Heap Size of NodeManager in Bytes": get the current value (1 GB, 2 GB, etc.) and increase it by one extra GB, e.g. if it is 1 GB, increase it to 2 GB.
(or)
CM -> YARN -> Configuration -> "Garbage Collection Duration Monitoring Period": increase it from 5 minutes to 10 minutes.
Restart YARN as needed.
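Before increasing the heap, it can help to confirm the NodeManager JVM is actually under memory pressure. A small diagnostic sketch, assuming a hypothetical NodeManager host on the default web port 8042:

```scala
import scala.io.Source

// Minimal sketch: query the NodeManager's JMX servlet for JVM heap usage.
// "nm-host.example.com" is a placeholder; 8042 is the default NM web port.
object NodeManagerHeapCheck extends App {
  val url = "http://nm-host.example.com:8042/jmx?qry=java.lang:type=Memory"
  val json = Source.fromURL(url).mkString
  // Compare HeapMemoryUsage "used" vs. "max" in the returned JSON; sustained
  // used-near-max readings suggest the heap increase is the right fix.
  println(json)
}
```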
11-07-2017
08:01 PM
@epowell The issue might be related to the JIRA below, which was opened long back and is still in open status:
https://issues.apache.org/jira/browse/HDFS-3447
As an alternate way to connect to HDFS, go to hdfs-site.xml, get dfs.nameservices, and try to connect using the nameservice as follows; it may help you:
hdfs://<ClusterName>-ns/<hdfs_path>
Note: I didn't get a chance to explore this, and I am also not sure how it will respond in older CDH versions.
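For reference, a minimal sketch of going through the nameservice URI with the Hadoop FileSystem API; "nameservice1" and the path are placeholders, and the actual name comes from dfs.nameservices in your hdfs-site.xml:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Minimal sketch: list a directory through the HA nameservice URI instead of
// a single NameNode host. Requires hdfs-site.xml/core-site.xml on the classpath
// so the client can resolve the nameservice to the active NameNode.
object NameserviceListing extends App {
  val conf = new Configuration()
  val fs = FileSystem.get(new URI("hdfs://nameservice1"), conf)
  fs.listStatus(new Path("/user")).foreach(status => println(status.getPath))
}
```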
11-06-2017
07:38 AM
@gaurav796 The difference is:
insertInto: inserts the contents of the DataFrame into an existing table (the table must already exist, with a matching schema).
mode comes with additional options, like:
mode("append"): append the contents of this DataFrame to the existing data.
mode("overwrite"): overwrite the existing data.
Note: I didn't get a chance to explore this before replying.
11-03-2017
08:16 AM
2 Kudos
@dubislv Please follow these steps:
1. Ex: Impala -> Instances -> Role Groups -> Create (as needed, choosing the existing group to copy from)
2. Ex: Impala -> Instances -> Role Groups -> click on the already existing group (in your case, Impala Daemon Default Group) -> select the host -> Actions for Selected -> Move to Different Role Group -> select the newly created group
11-02-2017
08:52 AM
@ganeshkumarj
a. mapred.map.tasks: the default number of map tasks per job is 2. Ignored when mapred.job.tracker is "local". You can modify it using set mapred.map.tasks = <value>
b. mapred.reduce.tasks: the default number of reduce tasks per job is 1. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local". You can modify it using set mapred.reduce.tasks = <value>
https://hadoop.apache.org/docs/r1.0.4/mapred-default.html
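If you are setting these from code rather than the Hive CLI, a minimal sketch using the Hadoop Configuration API; the values are illustrative only:

```scala
import org.apache.hadoop.conf.Configuration

// Minimal sketch: set the legacy task-count keys programmatically.
object TaskCountConfig extends App {
  val conf = new Configuration()
  conf.setInt("mapred.map.tasks", 4)      // a hint only; actual map count follows input splits
  conf.setInt("mapred.reduce.tasks", 10)  // newer key name: mapreduce.job.reduces
  println(conf.get("mapred.reduce.tasks"))
}
```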
10-18-2017
07:24 AM
@cdhhadoop As mentioned, you will get the warning if b > a.
10-17-2017
01:01 PM
@cdhhadoop Get the value of:
a. CM -> HDFS -> Configuration -> DataNode Block Count Thresholds
b. CM -> HDFS -> Web UI -> NameNode Web UI -> click on the DataNodes menu -> get the block count of your node
If b > a, then you will get the block count warning. Also, per Cloudera's advice, the "presence of many small files" can create this warning.
Action:
1. If it is not disturbing anything, you can ignore this warning, but keep an eye on the block pool usage percentage from 'b'.
2. You can increase the block count thresholds in 'a'.
3. You can clean up unwanted data; but if your trash folder retains old data (for ex: 24 hrs), you will see the result only after 24 hours.
4. You can add additional datanodes and rebalance, etc.
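If you would rather pull the per-datanode block counts for 'b' without clicking through the web UI, one option is the NameNode's JMX servlet. A minimal sketch, assuming a hypothetical NameNode host on the default CDH 5 web port 50070:

```scala
import scala.io.Source

// Minimal sketch: fetch the NameNodeInfo JMX bean. Its LiveNodes attribute is
// a JSON map of datanode -> stats (including a per-node block count) that can
// be compared against the DataNode Block Count Thresholds value from 'a'.
// "nn-host.example.com" is a placeholder; 50070 is the default NN web port.
object DataNodeBlockCounts extends App {
  val url = "http://nn-host.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
  println(Source.fromURL(url).mkString)
}
```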
10-06-2017
09:11 AM
@desind To add on to your point, the cluster-wide setting applies to all MapReduce jobs, so it may impact other jobs as well. In fact, I am not against setting a higher value at the cluster level, but you should do that based on how many jobs require the higher values, performance, etc.
10-04-2017
01:29 PM
@wchagas One common reason to disable the firewall is that, as we know, HDFS maintains replicas on different nodes/racks, and replication shouldn't take any extra time. A firewall (or SELinux) may disturb this or lead to performance issues, so the general recommendation is to disable it. That said, I believe in some cases users still run Hadoop with a firewall for security reasons (if the business really demands it). Regarding your question about security, you can follow the other recommended security measures like Kerberos, Sentry, etc. (depending on your needs).