Member since: 05-02-2019
Posts: 319
Kudos Received: 145
Solutions: 59
My Accepted Solutions
Views | Posted
---|---
6995 | 06-03-2019 09:31 PM
1671 | 05-22-2019 02:38 AM
2121 | 05-22-2019 02:21 AM
1318 | 05-04-2019 08:17 PM
1625 | 04-14-2019 12:06 AM
03-08-2016
10:55 PM
3 Kudos
First, spot-on by letting the ZK processes write to their own disks. As for letting the active/passive NNs write to the same physical disks as the JNs, I think you are OK with that approach. I say that because the edits are what are being written to continuously, while the fsimage files are only being read/recreated at key points such as checkpointing and startup. I probably pitched a bit of overkill in a blog I did last year on this topic of filesystems, but feel free to check it out at https://martin.atlassian.net/wiki/x/EoC3Ag if you need some help going to sleep at night. 😉 If you do check it out, you'll notice my very clear advice is that you should still make backups of the fsimage/edits files (even w/HA enabled) to avoid a potential "bunker scene" of your own. Having seen firsthand what happens when you lose this information (it was a configuration screw-up, not a h/w failure), I know I simply don't want to be there again.
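If it helps, here's a minimal sketch of that backup idea (the backup directory and the scheduling are my assumptions, not anything prescribed in the blog); hdfs dfsadmin -fetchImage pulls the most recent fsimage from the NameNode down to a local directory.
# Hypothetical backup sketch; /backups/namenode is a made-up path
BACKUP_DIR=/backups/namenode/$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"
# -fetchImage downloads the latest fsimage from the active NameNode
hdfs dfsadmin -fetchImage "$BACKUP_DIR"
Run something like that on a schedule (and copy the result off-host) and you've got a fighting chance if the metadata directories ever get clobbered.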
03-05-2016
08:02 PM
1 Kudo
I know... I'm the FIFTH person to say create a View and secure permissions on it, and the backing table, appropriately. 😉 That said, I've got a simple little demo posted at https://github.com/HortonworksUniversity/Essentials/blob/master/demos/ranger/README.md along with a video recording of it linked there in case anyone might find that useful.
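To make the view approach concrete, here's a minimal sketch (the table, column, and view names are made up for illustration; they aren't from the demo): expose only the non-sensitive columns through a view, then point your Ranger policies at the view rather than the base table.
beeline -u jdbc:hive2://localhost:10000 -n hive -e "
CREATE VIEW customers_public AS
  SELECT id, name, city FROM customers;  -- sensitive columns (e.g. ssn) deliberately omitted
"
In Ranger you'd then grant SELECT on customers_public while leaving the backing customers table locked down.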
03-02-2016
11:35 PM
Based on my findings, I'm not sure we have anything to change on the documentation front. We could probably build something into the webapp itself so that splash screen provides some more details (or an error?) if you come in as http://localhost:8888.
03-02-2016
11:33 PM
I think I figured it out. I was able to verify that the two different ports (2222 and 22) are displayed based on whether you hit 127.0.0.1:8888 or localhost:8888 in your browser, as @Michael Häusler's side-by-side screenshot shows. I also double-checked the port forwarding in VirtualBox.
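If you'd rather check those NAT rules from a terminal than from the VirtualBox UI, something like this works (substitute whatever your VM is actually named):
VBoxManage showvminfo "Hortonworks Sandbox" | grep "Rule"
# Expect entries along the lines of:
# NIC 1 Rule(0):   name = ssh, protocol = tcp, host port = 2222, guest port = 22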
No matter which way you hit it via the browser, we are always going to port 8888 on the Sandbox itself. It seems the webapp has some logic that keys off which address you used to reach it. Thankfully, the on-screen instructions drive you to use http://127.0.0.1:8888/.
If you look closely, you'll see that all the hostnames on that splash screen switch between 127.0.0.1 and localhost based on the original URL you used. I believe the webapp does this to cover the case where you are actually running a windowing system on the Sandbox itself; most of us are just SSH'ing in and aren't using a windowing interface.
My hunch then was that where you are accessing it from makes a difference, and I validated that with some SSH testing. If you are coming in from your host OS, you can only SSH over port 2222, but it doesn't matter whether you say localhost or 127.0.0.1.
HW10653-2:~ lmartin$ ssh root@127.0.0.1 -p 22
ssh: connect to host 127.0.0.1 port 22: Connection refused
HW10653-2:~ lmartin$ ssh root@localhost -p 22
ssh: connect to host localhost port 22: Connection refused
HW10653-2:~ lmartin$ ssh root@localhost -p 2222
root@localhost's password:
Last login: Wed Mar 2 23:18:00 2016 from 10.0.2.2
[root@sandbox ~]# exit
logout
Connection to localhost closed.
HW10653-2:~ lmartin$ ssh root@127.0.0.1 -p 2222
root@127.0.0.1's password:
Last login: Wed Mar 2 23:19:13 2016 from 10.0.2.2
[root@sandbox ~]#
Conversely, once you are "in" the Sandbox (in the following case it doesn't matter whether you SSH'd in or were logged in via a windowing manager), you can only use port 22; again, it doesn't matter how you refer to the hostname.
[root@sandbox ~]# ssh root@127.0.0.1 -p 22
Last login: Wed Mar 2 23:26:13 2016 from 10.0.2.2
[root@sandbox ~]# ssh root@localhost -p 22
Last login: Wed Mar 2 23:27:10 2016 from 127.0.0.1
[root@sandbox ~]# ssh root@localhost -p 2222
ssh: connect to host localhost port 2222: Connection refused
[root@sandbox ~]# ssh root@127.0.0.1 -p 2222
ssh: connect to host 127.0.0.1 port 2222: Connection refused
[root@sandbox ~]#
I'd have to put my networking cap on tighter than it is right now to fully explain why this is the case (in short, VirtualBox's NAT rule forwards host port 2222 to guest port 22, so inside the guest sshd only ever listens on its native port 22), but with these limitations of port 22 vs. port 2222 in play, depending on whether you are currently logged into the Host OS or the Guest OS, we really do need the splash screen to be dynamic like it is now. That said, it is easy to see why one would get confused if they got creative and just started with http://localhost:8888 instead of http://127.0.0.1:8888 as the instructions said.
03-02-2016
06:10 PM
Interesting, my copy (I'm also using VirtualBox) shows 2222; I'm pasting below from the 8888 splash page. It doesn't make it painfully obvious that you should type "ssh root@127.0.0.1 -p 2222", but it does look correct. Can you paste a screenshot, @Michael Häusler, as well as the exact URL you used to pull down the Sandbox VM?
Secure Shell (SSH) Client
With an SSH client of your choice use the following credentials:
IP: 127.0.0.1
port: 2222
username: root
password: hadoop
02-19-2016
06:41 PM
2 Kudos
Yep, what @Joe Widen said, and sprinkle in a bit of "it depends" on top for good measure. If I put my consulting cap on again, I'd ask "what are you trying to do?" Meaning, is the job in question more about memory usage or about overall compute time to complete? There are factors at play that won't give you a perfect yes-or-no answer to the question above, so if you really need a more data-driven decision, I'd recommend picking an indicative use case of your data processing and running a small bake-off on your cluster based on what is most important to you. Chances are your decision will be based more on what you plan on developing going forward than on the answer to this question about memory usage.
02-09-2016
10:48 PM
1 Kudo
I'm working with @Rafael Coss to make sure the instructions are extremely crisp, as I think there are a few things that could easily trip up a novice, and novices are exactly who we are targeting with these tutorials.
02-09-2016
10:41 PM
1 Kudo
Yep, my History Server was down and had to be manually started.
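For anyone hitting the same thing: you can restart it through Ambari, or from a shell on the Sandbox with something like the following (the path assumes the standard HDP layout, so verify it on your version):
# Start the MapReduce JobHistory Server daemon as the mapred user
sudo -u mapred /usr/hdp/current/hadoop-mapreduce-historyserver/sbin/mr-jobhistory-daemon.sh start historyserver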
02-09-2016
10:39 PM
1 Kudo
It ~seems~ that the Ambari Views were adding about 30 seconds to the run times. Here are some of my notes around timings; notice the actual log-reported job times are pretty consistent between the CLI and View runs.

Ran From | Exec Eng | Job Time | Clock Time
---|---|---|---
Ambari View | MR | 64 sec | 103 sec
Ambari View | Tez | 25 sec | 63 sec
CLI | MR | 59 sec | 61 sec
CLI | Tez | 25 sec | 27 sec

Actual job times were consistent for each execution engine (Tez twice as fast), but the Ambari View ~seemed~ to add 30+ secs overall. I'm sure the extremely constrained HDP stack on a tiny little pseudo-cluster (aka the Sandbox) is a big factor in this (understandable).
02-09-2016
10:26 PM
2 Kudos
I just ran this tutorial on my 16GB i7 MBPro (gave the VM 8GB just as you did) and could get it to run in 100 secs with MR and about 65 secs using Tez. I then ran the same script from the CLI and got those times down to about 60 and 25 secs on MR and Tez, respectively. I'm using the 2.3.2 Sandbox, and the only thing I had to do was start the History Server, which was showing up red in Ambari.
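In case anyone wants to reproduce the CLI side of the comparison, it goes along these lines (I'm assuming the tutorial script is Pig here, and script.pig is a placeholder name):
# Run on MapReduce (the default execution engine)
pig -x mapreduce script.pig
# Run the same script on Tez for comparison
pig -x tez script.pig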