Member since: 09-18-2015
Posts: 191
Kudos Received: 81
Solutions: 40
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2045 | 08-04-2017 08:40 AM
 | 5421 | 05-02-2017 01:18 PM
 | 1109 | 04-24-2017 08:35 AM
 | 1116 | 04-24-2017 08:21 AM
 | 1334 | 06-01-2016 08:54 AM
05-13-2016
11:36 AM
1 Kudo
Hi @simran kaur, this may or may not help depending on your exact scenario, but I've done something similar before by using Falcon (which drives Oozie underneath) to do exactly this. Have a look at https://github.com/apache/falcon/tree/master/addons/hdfs-snapshot-mirroring
The reason this is nice is that it provides built-in functionality to:
* Create snapshots in the source directory
* Copy that directory between HDFS clusters
* Create a snapshot in the target directory
* Handle snapshot retention in the source and target directories
It's honestly going to be much easier than writing all of that yourself within Oozie. You also don't have to use it to mirror snapshots between clusters; you can use it within a single cluster. The sketch below shows roughly what it automates for you. Hope that helps!
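For context, this is roughly the sequence the addon automates, expressed with the standard HDFS snapshot and distcp commands. The paths and snapshot names below are hypothetical placeholders, and on a first run you'd do a plain `distcp -update` without `-diff`.

```python
import subprocess
from datetime import datetime

# Hypothetical paths; substitute your own source and target directories.
SRC = "hdfs://source-nn:8020/data/landing"
DST = "hdfs://target-nn:8020/data/landing"

def run(cmd):
    # Run a command and fail loudly if it returns non-zero.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

snap = "snap-" + datetime.now().strftime("%Y%m%d%H%M%S")

# 1. Snapshot the source directory (an admin must have run
#    'hdfs dfsadmin -allowSnapshot <dir>' on it once beforehand).
run(["hdfs", "dfs", "-createSnapshot", SRC, snap])

# 2. Copy only the changes since the previous run's snapshot to the target cluster.
#    'prev-snap' stands in for whatever snapshot name the last run used.
run(["hadoop", "distcp", "-update", "-diff", "prev-snap", snap, SRC, DST])

# 3. Snapshot the target directory so both sides share a consistent point in time.
run(["hdfs", "dfs", "-createSnapshot", DST, snap])

# 4. Retention (pruning old snapshots with 'hdfs dfs -deleteSnapshot') is the part
#    the Falcon addon also schedules and handles for you.
```

Falcon's hdfs-snapshot-mirroring extension wraps all of this, plus the retention policies, into a single scheduled job.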
05-12-2016
02:12 PM
Hi @ccasano, understood. I don't believe such a list exists right now, unless @lpapp knows differently or could generate one?
05-12-2016
10:39 AM
Hi @kavitha velaga, for this kind of monitoring I'd suggest using an external monitoring framework, something like Munin, Ganglia, or whatever framework you already use within your org. Most of these frameworks can handle recording Round Trip Times (RTT) from hosts to something like an S3 endpoint; a simple way to collect that measurement yourself is sketched below. Hope that helps.
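If you want to start recording those numbers before wiring them into one of those frameworks, here's a minimal sketch that times an HTTPS round trip to an S3 endpoint. The endpoint URL and timeout are just illustrative; use the regional endpoint your buckets live in.

```python
import time
import urllib.error
import urllib.request

# Illustrative endpoint; substitute the regional endpoint you actually use.
S3_ENDPOINT = "https://s3.amazonaws.com"

def measure_rtt(url, timeout=5):
    """Return the round-trip time of one HTTPS request, in milliseconds."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            pass
    except urllib.error.HTTPError:
        # Any HTTP response (even a 403) still means the endpoint answered,
        # which is all we need for a timing measurement.
        pass
    return (time.monotonic() - start) * 1000.0

if __name__ == "__main__":
    rtt_ms = measure_rtt(S3_ENDPOINT)
    print(f"RTT to {S3_ENDPOINT}: {rtt_ms:.1f} ms")
```

Munin, Ganglia and similar tools can then run a small script like this on a schedule and graph the values over time.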
05-12-2016
07:25 AM
Hi @Anandha L Ranganathan, all the steps for the downgrade are covered in the Ambari wizard, but you'll need to pay close attention to the various databases and other components that require backing up as you go through. Both rollback and full downgrade are possible. Make sure you have satisfied all of the prerequisites before beginning, and preferably read through the upgrade document several times to ensure you have accounted for everything. http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_upgrading_Ambari/content/_upgrading_HDP_prerequisites.html I've performed a number of upgrades myself, and while there is always the odd thing that crops up, it's usually something minor that is easily fixed.
Good luck, and I hope everything goes smoothly.
05-12-2016
07:19 AM
1 Kudo
Hi @ccasano, there was a post by @lpapp that covers the minimum privilege set you require for Cloudbreak on AWS: https://community.hortonworks.com/questions/30242/list-of-policies-required-by-cloudbreak-to-launch.html As for the intricacies of VPC management, unless anyone here knows, that might be a question better answered by Amazon.
05-10-2016
12:24 PM
@Fazil Aijaz I've also reached out to the training team asking them to respond to those comments, so thanks for drawing our attention to them!
05-10-2016
12:18 PM
Hi @Fazil Aijaz, yes, I believe those issues have been resolved. As you noted, a good, fast internet connection is certainly required, but I know a few people who have completed the exam successfully in the last week or so. I will verify with the team that everything is working as expected, but if you don't hear from me here in the next day or so then everything is good to go. Good luck with your exam!
05-10-2016
07:17 AM
1 Kudo
Hi @John Yawney. Answering your questions one at a time:

1) Atlas 0.6 (that version or later is expected in the next HDP release) currently supports hooks for Hive, Sqoop, Falcon and Storm, and therefore tracks governance information for data touched by those systems; Spark, NiFi and HBase hooks are expected around the end of the year. Anything that doesn't have a hook won't be tracked. There's no public information on the timeframe for the next release, but historically we have regularly announced new releases around the US Hadoop Summit timeframe. This gives a good idea of what is coming down the line for Atlas: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62695330

2) Atlas is designed to be open and extensible, so you could absolutely add third-party lineage information into the metastore (see the rough sketch after this post). Any work you do in that area would also be greatly appreciated if contributed upstream, with the added benefit that once accepted you won't need to support that code on your own.

3) Assuming you're talking about Hive here? Column-level lineage is expected around the end of the year.

4) You can, usually in combination with Ranger, but you'll need to be more specific about exactly what you mean by these rules.

Additional notes: for more detailed information, as I know the documentation around Atlas is pretty poor, I'd strongly advise watching the sessions from the recent European Hadoop Summit; search for sessions by Andrew Ahn (there are three!): http://www.hadoopsummit.org/dublin/agenda/
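On point 2, purely as an illustration of pushing third-party metadata into Atlas over its REST API: the endpoint path, type name and attributes below are placeholders, the payload shape differs between Atlas versions, and authentication is omitted, so treat this as a rough sketch and check the API documentation for your release.

```python
import json
import urllib.request

# Placeholder URL; adjust the host, port and path for your Atlas version.
ATLAS_URL = "http://atlas-host:21000/api/atlas/entities"

# A minimal, hypothetical entity describing a dataset produced by an external tool.
# In practice you would first register a custom type for it in Atlas.
entity = {
    "typeName": "my_external_dataset",
    "values": {
        "name": "nightly_export",
        "qualifiedName": "nightly_export@mycluster",
        "description": "Dataset produced by an external ETL tool",
    },
}

req = urllib.request.Request(
    ATLAS_URL,
    data=json.dumps(entity).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Send the entity to Atlas and print whatever the server returns.
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read().decode("utf-8"))
```

Once entities like this exist, lineage between them is expressed by registering process entities that reference their inputs and outputs, broadly the same pattern the built-in hooks use.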
05-09-2016
06:21 PM
So I'd start by looking at which log files are consuming space in /var/log; removing some of the older ones that have rolled over should be pretty safe. 4.9 GB in /usr seems a bit large too, so it's worth investigating what's consuming such a large percentage of your space in there as well. As usual, remove any unneeded packages at the OS level. 10 GB is honestly a bit small for a root partition; you might want to bump that up a bit, or at least spin up some extra storage to mount as /var and /usr to give yourself more room. Hadoop is very good at generating logs, so it's very easy to fill up a root partition if you're not careful and don't have them split off elsewhere. A quick way to find the biggest offenders is sketched below.
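Here's a quick sketch for spotting the biggest offenders; the path and the number of files reported are just defaults you can change.

```python
import os

def largest_files(root="/var/log", top_n=20):
    """Walk a directory tree and print the largest files it contains."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Skip files we can't stat (rotated away, permission denied, etc).
                continue
    for size, path in sorted(sizes, reverse=True)[:top_n]:
        print(f"{size / (1024 ** 2):8.1f} MB  {path}")

if __name__ == "__main__":
    largest_files()            # try largest_files("/usr") for the /usr question too
```

The same function pointed at /usr will show you where that 4.9 GB is going.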
05-09-2016
06:45 AM
Hi @Sunile Manjee. It's a bit more fully featured than that. Wandisco solutions generally operate at a level above the underlying service, so if one of the HBase clusters dies, users don't notice any change. The second area of significant importance is the solution's ability to handle significant distances between clusters, whereas standard HBase replication is usually confined to the same datacentre or only runs over very fast links, not full geo-replication. https://www.wandisco.com/product/fusion-active-active-hbase They also have a webinar on HBase; the final section is all about using Fusion Active HBase across multiple datacentres: http://www.wandisco.com/webinar/replay/hadoop-hbase-depth Hope that helps!