Member since: 09-23-2015
Posts: 88
Kudos Received: 109
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 7568 | 08-24-2016 09:13 PM |
11-02-2015
11:33 PM
1 Kudo
@Wes Floyd Since there are multiple questions here, I am going to answer each question individually.

> When a single drive fails on a worker node in HDFS, can this adversely affect performance of jobs running on this node?

It depends. If this node is running a job that is accessing blocks on the failed volume, then yes. It is also possible that the job would be treated as failed if dfs.datanode.failed.volumes.tolerated is not greater than 0. When the value is zero, HDFS treats the loss of a volume as catastrophic and marks the DataNode as failed. When it is set to a value greater than zero, the node keeps working until more volumes are lost than the configured tolerance.

> If this could cause a performance impact, how can our customers monitor for these drive failures in order to take corrective action?

This is a hard question to answer without further details. I am tempted to say that the performance benefit you would get from monitoring and relying on a human being to take corrective action is doubtful. YARN / MR, or whatever execution engine you are using, is probably going to be much more efficient at re-scheduling your jobs.

> Or does the DataNode process quickly mark the drive and its HDFS blocks as "unusable"?

The DataNode does mark the volume as failed, and the NameNode learns, via block reports, that all the blocks on that failed volume are no longer available on that DataNode. Once the NameNode learns that the DataNode has lost a replica of a block, it initiates the appropriate re-replication. Since the NameNode knows about the lost blocks, further jobs that need access to those blocks would most probably not be scheduled on that node, although this again depends on the scheduler and its policies.
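For reference, the tolerance setting discussed above lives in hdfs-site.xml. A minimal sketch (the value of 1 is only an example; size it to your number of data disks and your risk tolerance):

```xml
<!-- hdfs-site.xml (illustrative): allow each DataNode to keep running after
     losing one data volume. The default of 0 treats any volume failure as
     fatal and marks the whole DataNode as failed. -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```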
02-02-2016
05:23 PM
@Wes Floyd has this been resolved? Can you accept the best answer or provide your own solution?
05-10-2016
05:27 PM
2 Kudos
To add to this, as a best practice we recommend that customers on HDP 2.3.x or HDP 2.4.x configure their audits to go to both Solr and HDFS. The HDFS destination is for long-term audit storage, while Solr can be used for short-term audit queries from the Ranger UI. It is recommended to use a TTL (time to live) setting in Solr to ensure documents are deleted automatically after a certain period.
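For illustration, the TTL is usually enforced through Solr's document-expiration update processor. A minimal sketch for solrconfig.xml, assuming a 90-day retention and the common _ttl_ / _expire_at_ field names (check the solrconfig.xml that ships with your Ranger version for the exact chain):

```xml
<!-- solrconfig.xml (illustrative): stamp each audit document with a TTL and
     delete expired documents in a background sweep once a day. -->
<updateRequestProcessorChain name="audit-ttl">
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">_ttl_</str>
    <str name="value">+90DAYS</str>
  </processor>
  <processor class="solr.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">86400</int>
    <str name="ttlFieldName">_ttl_</str>
    <str name="expirationFieldName">_expire_at_</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```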
10-22-2015
04:29 PM
Keep in mind that you can broaden the scope of the search by changing the value of main.ldapRealm.userSearchBase, which may help with part 2 of the question. However, you need to be careful for two reasons: Performance - the larger the search base, the more entries must be searched for the matching userSearchAttributeName. Query limits - some LDAP implementations limit how many entries a given query can return, so too broad a userSearchBase may result in failed queries, causing authentication failures.
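For reference, here is a sketch of how this looks in a Knox topology; the base DN and attribute below are placeholder values, and a narrower base (for example an ou=people subtree) keeps the search cheaper:

```xml
<!-- Knox topology fragment (illustrative values only) -->
<param>
  <name>main.ldapRealm.userSearchBase</name>
  <value>ou=people,dc=example,dc=com</value>
</param>
<param>
  <name>main.ldapRealm.userSearchAttributeName</name>
  <value>uid</value>
</param>
```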
09-27-2016
02:01 PM
Sir, I don't think you have made any mistake while creating the view; just put double quotes (") around the view name when you query it:
select * from "weblog"; @Wes Floyd
02-04-2016
02:54 AM
@Wes Floyd I pinged sme-hive for an answer, and @gopal responded with the following statement: half of all interactive tuning will be replaced by LLAP, and Hive 2.0 is days away from being released at Apache.
09-21-2016
06:27 AM
1 Kudo
Hi, is there any way to trigger ListSFTP based on a status condition? That is, ListSFTP should pick up or transfer files based on some status: for example, if the status is "start" it should start fetching files, and if it is anything else it should stop.
10-12-2015
02:17 PM
2 Kudos
Hey Wes, here are some of my notes about HBase tuning.

- Validate your data model (key design, naming, number of versions, ...)
- Check configuration (RAM, block size, ...)
- Caching (see the config sketch at the end of this post):
  - http://www.n10k.com/blog/blockcache-101/
  - http://www.n10k.com/blog/blockcache-showdown/
- Compression (Gzip, Snappy, ...)
- How is data retrieved?
  - Random lookups => maybe a smaller block size makes more sense?
  - Sequential scans => maybe a higher block size makes more sense?
- Check whether you have hot spots in your HBase environment: http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/ ("Hot spotting is an HBase phenomenon where a region server is hosting the most sought-after data. This causes that region server to run really hot and potentially slow down and run in a degraded mode.")
- Metrics to check: BlockCache ratio and hit/miss, compaction queue, memstore size, flush queue, call queue, CPU load & WIO, memory usage, latency, IOPS, JVM metrics (GC, logs, ...)

Links:
- http://blog.sematext.com/2012/07/16/hbase-memstore-what-you-should-know/
- http://hortonworks.com/blog/introduction-to-hbase-mean-time-to-recover-mttr/
- http://hadoop-hbase.blogspot.de/2014/03/hbase-gc-tuning-observations.html
- http://hadoop-hbase.blogspot.de/2015/01/more-hbase-gc-tuning.html

Not a consolidated guide, but it might help anyway 🙂 Let me know if you need more information about metrics or specific tuning methods (RAM, block size, etc.). I might have some more documents on my drive. Jonas
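As an aside on the caching item above, here is a hedged hbase-site.xml fragment; 0.4 is only an example value, since BlockCache heap competes with memstore heap and the two should be sized together:

```xml
<!-- hbase-site.xml (illustrative): fraction of RegionServer heap used for
     the BlockCache. Read-heavy, random-lookup workloads tend to benefit
     from more cache; write-heavy workloads may want that memory elsewhere. -->
<property>
  <name>hfile.block.cache.size</name>
  <value>0.4</value>
</property>
```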
01-11-2016
08:40 PM
By default, NiFi will run as whatever user starts NiFi.
NiFi can, however, be configured to start as a specific user by setting the run.as property inside the bootstrap.conf file.
*** Note that the user configured in this property will need sudo privileges for the java executable on Linux-based deployments. This could interfere with some processors that depend on the java process being owned by a particular user, since it will be owned by root. *** Setting the run.as user allows you to set up NiFi as a service that can be configured to start as part of OS startup.
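As a minimal sketch of that property (the nifi account name is just an example; use whichever service account you intend):

```
# bootstrap.conf (illustrative): start the NiFi java process as this OS user
run.as=nifi
```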
10-08-2015
03:28 PM
Puppet, Vagrant, and home-grown goodness have all been used pretty extensively to ensure consistent configuration of servers and such. We need to keep doing more work, though, to separate the concerns of 'box configuration' from 'NiFi node' configuration. To the greatest extent possible, NiFi's clustering should handle the shared 'NiFi node' configuration needs, and the box/OS/security settings should be handled by systems like Puppet, or whatever the ops teams are most familiar with.