Member since: 09-02-2016
Posts: 523
Kudos Received: 89
Solutions: 42
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2692 | 08-28-2018 02:00 AM |
|  | 2670 | 07-31-2018 06:55 AM |
|  | 5634 | 07-26-2018 03:02 AM |
|  | 2948 | 07-19-2018 02:30 AM |
|  | 6418 | 05-21-2018 03:42 AM |
11-15-2017
07:38 AM
1 Kudo
@hparteaga The correct way is:
1. In Cloudera Manager -> Add Sentry Service, and make sure Hue is configured to use it.
2. Log in to Hue -> go to the Security menu -> it will have a submenu called either Sentry Tables (or) Hive Tables; the link below explains why it can be either. Use this option to set db-, table-, and column-level authorization.
http://community.cloudera.com/t5/Security-Apache-Sentry/Hive-Tables-instead-Sentry-Tables/m-p/48740#M190
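If you prefer to script the policies instead of clicking through Hue, the same grants can be issued over HiveServer2. A minimal sketch, assuming Sentry is enabled and using hypothetical host, table, role, and group names:

```scala
import java.sql.DriverManager

// Minimal sketch: create a Sentry role and grant table- and column-level
// access over HiveServer2 JDBC. The host, database/table, role, and group
// names are placeholders; add Kerberos options to the URL if it is enabled.
object SentryGrantSketch extends App {
  Class.forName("org.apache.hive.jdbc.HiveDriver")
  val conn = DriverManager.getConnection(
    "jdbc:hive2://hs2-host.example.com:10000/default", "admin", "")
  val stmt = conn.createStatement()
  stmt.execute("CREATE ROLE analyst_role")
  stmt.execute("GRANT SELECT ON TABLE db1.events TO ROLE analyst_role")           // table level
  stmt.execute("GRANT SELECT(event_id) ON TABLE db1.events TO ROLE analyst_role") // column level
  stmt.execute("GRANT ROLE analyst_role TO GROUP analysts")                       // role -> group
  stmt.close()
  conn.close()
}
```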
11-13-2017
07:58 AM
@cdhhadoop Try the below; it may help you:
CM -> YARN -> Configuration -> "Java Heap Size of NodeManager in Bytes": get the current value (1 GB, 2 GB, etc.) and increase it by one extra GB, e.g. if it is 1 GB, increase it to 2 GB.
(or)
CM -> YARN -> Configuration -> "Garbage Collection Duration Monitoring Period": increase it from 5 minutes to 10 minutes.
Restart YARN as needed.
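Before increasing the heap, it can help to confirm the NodeManager JVM is actually under memory pressure. A small diagnostic sketch, assuming a hypothetical NodeManager host on the default web port 8042:

```scala
import scala.io.Source

// Minimal sketch: query the NodeManager's JMX servlet for JVM heap usage.
// "nm-host.example.com" is a placeholder; 8042 is the default NM web port.
object NodeManagerHeapCheck extends App {
  val url = "http://nm-host.example.com:8042/jmx?qry=java.lang:type=Memory"
  val json = Source.fromURL(url).mkString
  // Compare HeapMemoryUsage "used" vs. "max" in the returned JSON; sustained
  // used-near-max readings suggest the heap increase is the right fix.
  println(json)
}
```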
11-07-2017
08:01 PM
@epowell The issue might be related to the JIRA below, which was opened long back and is still in open status:
https://issues.apache.org/jira/browse/HDFS-3447
As an alternate way to connect to HDFS, go to hdfs-site.xml, get dfs.nameservices, and try to connect using the nameservice as follows; it may help you:
hdfs://<ClusterName>-ns/<hdfs_path>
Note: I didn't get a chance to explore this, and I am also not sure how it will respond in older CDH versions.
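For reference, a minimal sketch of going through the nameservice URI with the Hadoop FileSystem API; "nameservice1" and the path are placeholders, and the actual name comes from dfs.nameservices in your hdfs-site.xml:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Minimal sketch: list a directory through the HA nameservice URI instead of
// a single NameNode host. Requires hdfs-site.xml/core-site.xml on the classpath
// so the client can resolve the nameservice to the active NameNode.
object NameserviceListing extends App {
  val conf = new Configuration()
  val fs = FileSystem.get(new URI("hdfs://nameservice1"), conf)
  fs.listStatus(new Path("/user")).foreach(status => println(status.getPath))
}
```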
11-06-2017
07:38 AM
@gaurav796 The difference is:
insertInto: inserts the contents of the DataFrame into an existing table (the table must already exist, with a matching schema).
mode comes with additional options, like:
mode("append"): append the contents of this DataFrame to the existing data.
mode("overwrite"): overwrite the existing data.
Note: I didn't get a chance to explore this before replying.
11-03-2017
08:16 AM
2 Kudos
@dubislv Please follow these steps:
1. Ex: Impala -> Instances -> Role Groups -> Create (as needed, choosing the existing group to copy from)
2. Ex: Impala -> Instances -> Role Groups -> click on the already existing group (in your case, Impala Daemon Default Group) -> select the host -> Actions for Selected -> Move to Different Role Group -> select the newly created group
11-02-2017
08:52 AM
@ganeshkumarj
a. mapred.map.tasks: the default number of map tasks per job is 2. Ignored when mapred.job.tracker is "local". You can modify it using set mapred.map.tasks = <value>
b. mapred.reduce.tasks: the default number of reduce tasks per job is 1. Typically set to 99% of the cluster's reduce capacity, so that if a node fails the reduces can still be executed in a single wave. Ignored when mapred.job.tracker is "local". You can modify it using set mapred.reduce.tasks = <value>
https://hadoop.apache.org/docs/r1.0.4/mapred-default.html
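If you are setting these from code rather than the Hive CLI, a minimal sketch using the Hadoop Configuration API; the values are illustrative only:

```scala
import org.apache.hadoop.conf.Configuration

// Minimal sketch: set the legacy task-count keys programmatically.
object TaskCountConfig extends App {
  val conf = new Configuration()
  conf.setInt("mapred.map.tasks", 4)      // a hint only; actual map count follows input splits
  conf.setInt("mapred.reduce.tasks", 10)  // newer key name: mapreduce.job.reduces
  println(conf.get("mapred.reduce.tasks"))
}
```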
10-18-2017
07:24 AM
@cdhhadoop As mentioned, you will get the warning if b > a.
10-17-2017
01:01 PM
@cdhhadoop Get the value of:
a. CM -> HDFS -> Configuration -> DataNode Block Count Thresholds
b. CM -> HDFS -> Web UI -> NameNode Web UI -> click on the DataNodes menu -> get the block count of your node
If b > a, then you will get the block count warning. Also, per Cloudera's advice, the "presence of many small files" can create this warning.
Action:
1. If it is not disturbing anything, you can ignore this warning, but keep an eye on the block pool usage percentage from 'b'.
2. You can increase the block count thresholds in 'a'.
3. You can clean up unwanted data; but if your trash folder retains old data (for ex: 24 hrs), you will see the result only after 24 hours.
4. You can add additional datanodes and rebalance, etc.
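If you would rather pull the per-datanode block counts for 'b' without clicking through the web UI, one option is the NameNode's JMX servlet. A minimal sketch, assuming a hypothetical NameNode host on the default CDH 5 web port 50070:

```scala
import scala.io.Source

// Minimal sketch: fetch the NameNodeInfo JMX bean. Its LiveNodes attribute is
// a JSON map of datanode -> stats (including a per-node block count) that can
// be compared against the DataNode Block Count Thresholds value from 'a'.
// "nn-host.example.com" is a placeholder; 50070 is the default NN web port.
object DataNodeBlockCounts extends App {
  val url = "http://nn-host.example.com:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
  println(Source.fromURL(url).mkString)
}
```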
10-06-2017
09:11 AM
@desind To add on to your point, the cluster-wide setting applies to all MapReduce jobs, so it may impact other jobs as well. In fact, I am not against setting a higher value at the cluster level, but you should do that based on how many jobs require the higher values, performance, etc.
10-04-2017
01:29 PM
@wchagas One common reason to disable the firewall is that, as we know, HDFS maintains replicas on different nodes/racks, and replication shouldn't take any extra time. A firewall (or SELinux) may disturb this or lead to performance issues, so the general recommendation is to disable it. That said, I believe in some cases users still run Hadoop with a firewall for security reasons (if the business really demands it). Regarding your question about security, you can follow the other recommended security measures like Kerberos, Sentry, etc. (depending on your needs).