Member since: 10-22-2015
Posts: 83
Kudos Received: 84
Solutions: 13
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1441 | 01-31-2018 07:47 PM |
 | 3584 | 02-27-2017 07:26 PM |
 | 2856 | 12-16-2016 07:56 PM |
 | 9439 | 12-14-2016 07:26 PM |
 | 4245 | 12-13-2016 06:39 PM |
12-15-2016
10:40 PM
@Arsalan Siddiqi, sorry you're having difficulties. It's pretty hard to debug access problems remotely, as you know, but let's see what we can do. Let me clarify a couple of things.

Is it correct that you are able to connect over ssh in PuTTY with: ssh root@127.0.0.1 -p 2222 but when you use: scp -P 2222 mylocalfile.txt root@127.0.0.1:/root/ it rejects the connection?

Are you using all the parts in the above statement?
- The port: "-P 2222" (capital P, not lower case as with ssh)
- The root user id: "root@"
- The colon after the IP address: "127.0.0.1:"
- The destination directory (/root/ in the example above, but you can use any absolute directory path)

If you've been using "localhost" as in the tutorial, try "127.0.0.1" instead. If you've been cut-and-pasting the command line, try typing it instead. The "-" symbol often doesn't cut-and-paste correctly, because formatted text may use an em-dash or en-dash character instead of a plain hyphen. It is much safer to type it than to paste it.

You are using Windows, correct? I'm a little confused about how you're connecting through PuTTY: as I remember, PuTTY wants the connection info in a dialogue box before the connection, whereas on a Mac or Linux box the terminal application just opens a terminal on the box itself, and you then FURTHER connect by typing the ssh command. So, did you actually configure PuTTY with the "ssh" request, the port number, and the user and host info in a dialogue box, rather than typing an ssh command line? And did that work correctly?

Assuming the answer to the above is "yes" and "yes", the next question is: where are you typing the "scp" command? You can't type it into the PuTTY connection with the VM; that won't work. The scp command line is meant to be run on your native box. Does PuTTY have a file transfer dialogue box that can use the scp protocol? Is that what you're trying to use? Or have you downloaded the "pscp.exe" file from putty.org and are using that? See http://www.it.cornell.edu/services/managed_servers/howto/file_transfer/fileputty.cfm
The full docs for PuTTY PSCP are at https://the.earth.li/~sgtatham/putty/0.67/htmldoc/Chapter5.html#pscp-usage and show that pscp also takes a capital "-P" to specify the port number. Worst case, if you can't get any of these working, you've already established the VirtualBox connection for the VM. With some effort you can figure out how to configure a shared folder with your host machine, and use it to pass files back and forth.
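To summarize the commands discussed above in one place (the file name and destination directory are just placeholders; adjust them to your own case):

```
# Connect to the sandbox VM over ssh (lower-case -p for the port):
ssh root@127.0.0.1 -p 2222

# Copy a local file into the VM with scp from a Mac/Linux terminal
# (capital -P for the port; /root/ is only an example destination):
scp -P 2222 mylocalfile.txt root@127.0.0.1:/root/

# The same copy from a Windows command prompt using PuTTY's pscp.exe
# (assumes pscp.exe has been downloaded and is on your PATH):
pscp -P 2222 mylocalfile.txt root@127.0.0.1:/root/
```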
12-14-2016
07:26 PM
2 Kudos
@Arsalan Siddiqi, I'll try to answer the several pieces of your question.

First, I encourage you to go through the Sandbox Tutorial at http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/ It will help you understand a great deal about the Sandbox, what it is intended to do, and how to use it.

The sandbox comes with Ambari and the Stack pre-installed. You shouldn't need to change OS system settings in the Sandbox VM, which is intended to act like an "appliance". Also, the sandbox already has everything Ambari needs to run successfully over the HTTP Web interface; there is no need to configure graphics packages on the sandbox VM. Ambari provides quite a nice GUI for a large variety of things you might want to do with an HDP Stack, including viewing and modifying configurations, seeing the health and activity levels of HDP services, stopping and restarting component services, and even viewing the contents of HDFS files.

While you can view the component config files at (in most cases) /etc/<componentname>/conf/* in the sandbox VM's native file system, please DO NOT try to change configurations there. The configs are Ambari-managed, and like any Ambari installation, if you change the files yourself, the ambari-agent will just change them back! Instead, use the Ambari GUI to change any configurations you wish, press the Save button, and restart the affected services (Ambari will prompt you).

The data files for HDFS are stored as usual in the native filesystem location defined by the HDFS config parameter "dfs.datanode.data.dir". However, it won't do you much good to look there, because the block files stored there are not readily human-understandable. As you may know, HDFS layers its own filesystem on top of the native file system, storing each replica of each block as a file on a datanode. If you want to check the contents of HDFS directories, you're much better off using the HDFS file browser, as follows:

1. In Ambari, select the HDFS service view, and pull down the "Quick Links" menu at the top center. Select "Namenode UI".
2. In the Namenode UI, pull down the "Utilities" menu at the top right. Select "Browse the file system".
3. This takes you to the "Browse Directory" UI. You may click through the directory names at the right edge, or type an HDFS directory path into the text box at the top of the directory listing.
4. If you click on a file name, you will see info about the blocks of that file (short files only have Block 0), and you may download the file if you want to see its contents.

Note that HDFS files are effectively immutable: they may be appended to, but cannot be edited in place.

For copying files to the Sandbox VM, first make sure you can access the sandbox through 'ssh', as documented near the beginning of the Tutorial under "Explore the Sandbox"; then see http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/#send-data-btwn-sandbox-local-machine . Hope this helps.
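If you prefer a terminal to the Namenode UI, here is a rough sketch of the equivalent steps from the command line; the file name is a placeholder, and I'm assuming your HDFS home directory (/user/root for the root login) already exists on the sandbox:

```
# From your local machine, copy a file into the sandbox VM (sandbox ssh port 2222):
scp -P 2222 mylocalfile.txt root@127.0.0.1:/root/

# From an ssh session inside the VM, copy it into HDFS and look at it:
hdfs dfs -put /root/mylocalfile.txt /user/root/
hdfs dfs -ls /user/root
hdfs dfs -cat /user/root/mylocalfile.txt
```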
12-14-2016
06:19 PM
One way to see whether any Stack services use a hardwired "/user" prefix would be to use Ambari to install the whole Stack on a lab machine. Change "dfs.user.home.dir.prefix" to something other than "/user" during the Install Wizard, BEFORE letting Ambari do the actual installation, so that everything sees the non-default value from the beginning. Let it install, start all the services and let them run a few minutes, then see if anything got created under /user/* in HDFS. Sorry I don't have time to do the experiment right now, but if you do, please report the results back here as a comment for others to learn from. If you find services that apparently hardwire the "/user" prefix, I'll enter bugs against those components and try to get them fixed.
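As a sketch of the check I have in mind, assuming you picked "/home" as the non-default prefix (that value is just an example):

```
# After the services have run for a few minutes, compare the two locations.
hdfs dfs -ls /user    # should be empty or absent unless some service hardwires the prefix
hdfs dfs -ls /home    # home directories should be created here instead
```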
12-13-2016
06:48 PM
As usual, parameter changes only affect things going forward. If users have already been created in the default location, their home directories will not be magically re-created in the new location. This could cause problems, depending on whether processes use the parameter or a hardwired "/user" prefix. For your site-defined users, you can just move their home directories with dfs commands. For the pre-defined Stack service users, you'll need to experiment to see whether they want their home directories to stay in /user or be moved to the value of dfs.user.home.dir.prefix. I would start by leaving them in place, but that's just a guess.
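For the site-defined users, a minimal sketch of the move, assuming the new prefix is "/home" and the user is "alice" (both placeholders), run as the hdfs superuser:

```
# Create the new prefix directory and move the existing home directory under it.
hdfs dfs -mkdir -p /home
hdfs dfs -mv /user/alice /home/alice
hdfs dfs -ls /home
```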
12-13-2016
06:39 PM
2 Kudos
@Sean Roberts As noted above, this is controlled by the 'dfs.user.home.dir.prefix' parameter in hdfs-site.xml. However, since this is not a commonly changed parameter, it isn't in the default Ambari configs for HDFS. (It just defaults to the value from hdfs-default.xml.) To change this value in Ambari, do the following:

1. In Ambari, select the HDFS service, then select the "Configs" tab.
2. Within Configs, select the "Advanced" tab.
3. Open the "Advanced hdfs-site" section, and confirm that this parameter isn't already present there.
4. Open the "Custom hdfs-site" section, and click the "Add property" link.
5. A dialogue pops up, inviting you to type a "key=value" pair in the text field. Enter: dfs.user.home.dir.prefix=/user
6. Press the "Add" button, and the new entry will be converted into a standard parameter entry field. Now change the value of the field to whatever you want (no blank spaces in the path, please).
7. After changing configurations, of course, you have to press the "Save" button at the top of the window.

That should do what you need, I hope.
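After you Save and restart HDFS, you can sanity-check the result from any host with an HDFS client; a quick sketch (the grep path assumes the usual /etc/hadoop/conf location for the Ambari-managed client configs):

```
# Effective value as seen by the HDFS client:
hdfs getconf -confKey dfs.user.home.dir.prefix

# The property should also appear in the Ambari-pushed client config file:
grep -A 1 dfs.user.home.dir.prefix /etc/hadoop/conf/hdfs-site.xml
```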
06-24-2016
05:47 PM
1 Kudo
Hi @Yibing Liu When you created your local repo, did you follow the instructions at http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_Installing_HDP_AMB/content/_using_a_local_repository.html ? Did you use tarballs, or reposync? Did you provide an HTTP server able to serve the repo contents? Finally, did you adjust the "baseurl" configurations in your ambari.repo, HDP.repo, and HDP-UTILS.repo files to point to the local repo? (And the "gpgkey" configuration in ambari.repo, unless you've turned it off.) I don't really understand your statement "i also updated the version/stack to make my ambari connect to my local repo". Adjusting the .repo baseurls should be all that is needed. Is this a fresh install, or are you adding a new host to an existing cluster? Just to eliminate another common source of error, what version of Python do you have installed on the server, and is it in the PATH for the userid performing the install?
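For reference, here is roughly the shape of the baseurl adjustment I mean; the repo section name, web server host name, and paths below are placeholders for wherever you actually staged the repo, so adapt them to your setup:

```
# Example ambari.repo pointing at a local repo (all values are examples only):
cat > /etc/yum.repos.d/ambari.repo <<'EOF'
[Updates-ambari-2.2.2.0]
name=ambari-2.2.2.0 - Updates
baseurl=http://webserver.example.local/repos/ambari/centos6/2.2.2.0
gpgcheck=0
enabled=1
priority=1
EOF

# Confirm yum can actually see packages from the local repo:
yum clean all
yum repolist
yum list available ambari-server
```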
06-23-2016
08:35 PM
@sprakash
The fact that distcp works with some configurations indicates you probably have Security set up right, as well as giving you an obvious work-around. To try to answer your question, please provide some clarifying information:
- When you speak of mapred-client.xml, do you mean mapred-site.xml on the client machine?
- When you speak of changing the framework, do you mean the "mapreduce.framework.name" configuration parameter in mapred-site.xml? Do you change it only on the client machine, or throughout both clusters?
- The allowed values of that parameter are "local", "classic", and "yarn". When you change it to something other than "yarn", what do you set it to?
- Do you have "mapreduce.application.framework.path" set? If so, to what value?
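One more data point that might help while you gather that information: since distcp accepts the generic Hadoop options, you can override the framework for a single run instead of editing mapred-site.xml; a sketch, with placeholder namenode hosts and paths:

```
# Force the YARN framework just for this copy; nn1/nn2 and the paths are placeholders.
hadoop distcp \
  -Dmapreduce.framework.name=yarn \
  hdfs://nn1:8020/source/path \
  hdfs://nn2:8020/target/path
```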
04-19-2016
07:04 PM
2 Kudos
As Benjamin said, I strongly encourage you to establish your process with a small test cluster first. However, I do not expect problems with the data. Hadoop is written in Java, so the form of the data should be the same between operating systems, especially all Linux variants.

Warning: do not upgrade both the operating system and the HDP version all at once! Change one major variable at a time, and make sure the system is stable in between. So go ahead and change the OS, but keep the HDP version the same until you are done and satisfied with the state of the new OS.

The biggest potential gotcha is if you experience a ClusterID mismatch as a result of your backup and restore process. If you are backing up the data by distcp-ing it between clusters, then this won't be an issue; the namespaceID/clusterID/blockpoolID probably will change, but it won't matter since distcp actually creates new files. But if you are trying to use traditional file-based backup and restore, from tape or a SAN, then you may experience this: after you think you've fully restored and you try to start up HDFS, it will tell you that you need to format the file system, or the HDFS file system may simply appear empty despite the files all being back in place. If this happens, "ClusterID mismatch" is the first thing to check, starting with http://hortonworks.com/blog/hdfs-metadata-directories-explained/ for background. I won't say more, because you probably won't have the problem and it would be confusing to talk about in the abstract.
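If you do run into it, the quick way to confirm a ClusterID mismatch is to compare the VERSION files written by the namenode and the datanodes; a sketch, where the directory paths are placeholders for your actual dfs.namenode.name.dir and dfs.datanode.data.dir values:

```
# On the namenode host (path is a placeholder for dfs.namenode.name.dir):
grep clusterID /hadoop/hdfs/namenode/current/VERSION

# On a datanode host (path is a placeholder for dfs.datanode.data.dir):
grep clusterID /hadoop/hdfs/data/current/VERSION

# The clusterID values must match, or the datanodes will refuse to register.
```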
04-14-2016
07:07 PM
3 Kudos
And just a few more relevant facts:

1. The write pipeline for replication is parallelized in chunks, so the time to write an HDFS block with 3x replication is NOT 3x (write time on one datanode), but rather 1x (write time on one datanode) + 2x (delta), where "delta" is approximately the time to transmit and write one chunk. Where a block is 128 or 256 MB, a chunk is something like 64 KB, if I recall correctly. If your network between datanodes is at least 1 Gbps, then the time for delta is dominated by the disk write speed.

2. The last block of an HDFS file is typically a "short" block, since files aren't exact multiples of 128 MB. HDFS only takes up as much of the native file system storage as needed (quantized by the native file system block size, typically 8 KB), and does NOT take up the full 128 MB block size for the final block of each file.
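If you want to see the "short last block" effect for yourself, fsck will list a file's actual block sizes; a small sketch with placeholder paths, assuming the default 128 MB block size:

```
# 300 MB is not a multiple of 128 MB, so this file gets two full blocks plus one ~44 MB block.
dd if=/dev/urandom of=/tmp/example.dat bs=1M count=300
hdfs dfs -put /tmp/example.dat /tmp/example.dat
hdfs fsck /tmp/example.dat -files -blocks
```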
04-13-2016
07:12 PM
4 Kudos
Java 6, 7, and 8 were quite FORWARD-compatible, meaning that programs that ran in earlier versions generally would run successfully in later versions. At the same time, with Enterprise software one cannot just assume such compatibility; one must test and certify. On the other hand, each new Java version has added language features that are not BACKWARD-compatible, meaning that a program that uses new Java 8 language features will not be able to run under Java 7; it would require Java 8.

To date, Hadoop has only required Java 7. I looked at the Apache docs, and I think the work being done in branch-2.8 is only to make sure it is COMPATIBLE with Java 8, not REQUIRING Java 8. At some point in the future, however, there will be a branch designated to use more efficient new Java 8 language constructs, and therefore require Java 8 to be installed on the server. Once the Hadoop community accepts the requirement for Java 8, from that version forward Hadoop will no longer run successfully on Java 7. In the meantime they are making sure that from hadoop-2.8 forward, it is at least assured of being compatible with Java 8, for users who prefer that.

Looking back over the "JDK Requirements" in the install documentation for the various HDP versions (HDP-2.1, HDP-2.2, HDP-2.3, HDP-2.4), we see that Java 6 started to be deprecated in HDP-2.1, remained usable with HDP-2.2, but was not compatible with HDP-2.3. That's because in HDP-2.3 we started using versions of the Hadoop stack components that utilized Java 7 language features and therefore were no longer compatible with Java 6. Starting with HDP-2.3 we continued supporting Java 7, but also started certifying with Java 8. No Java 8 language features were used (so that we could still support Java 7), but we tested with both Java 7 and 8 and certified that HDP-2.3 and HDP-2.4 work with them both. That is the situation so far today.