Member since 03-09-2017 | 14 Posts | 0 Kudos Received | 0 Solutions
05-19-2017 02:31 PM
So after messing around, it seems the correct way to do this, or at least the way I figured out how to do it, is to obtain the codec via the CompressionCodecFactory by invoking its getCodecByClassName("org.apache.hadoop.io.compress.BZip2Codec") method.
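For reference, a minimal sketch of that approach might look like the following. This is not the original program from the thread; the class name BZip2Writer, the argument handling, and the 4 KB copy buffer are just placeholder choices.

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    // Sketch only: copy a local text file into HDFS, compressing it with bzip2 on the way.
    public class BZip2Writer {
        public static void main(String[] args) throws Exception {
            String localSrc = args[0];   // local .txt file to read
            String dst = args[1];        // HDFS destination path, e.g. /data/sample.txt.bz2

            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Look up the bzip2 codec by its fully qualified class name, as described above.
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            CompressionCodec codec =
                factory.getCodecByClassName("org.apache.hadoop.io.compress.BZip2Codec");

            InputStream in = new BufferedInputStream(new FileInputStream(localSrc));
            // Wrap the HDFS output stream with the codec so the bytes land compressed.
            OutputStream out = codec.createOutputStream(fs.create(new Path(dst)));
            try {
                IOUtils.copyBytes(in, out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
                IOUtils.closeStream(out);
            }
        }
    }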
05-17-2017 04:33 PM
I was reading that bzip2 is a good compression format to use since it is splittable, so I was trying to write a basic Java program that takes in a .txt file and writes it to HDFS compressed with bzip2. Here is my program: But I am getting this stack trace when I run it (the first arg is the location of the file, the second is where to put the compressed file in HDFS, and the last arg is a boolean saying whether to compress): I checked the io.compression.codecs property in core-site.xml and it doesn't seem to have bzip2 listed: I tried adding it via the configuration.set() method in my Java program, but that did not work. I also tried setting the io.native.lib.available property to false through configuration.set(), and that did not work either. Does the HDP Sandbox not come with bzip2? Thanks for the help.
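Purely for illustration, the configuration.set() attempts described here would look something like the fragment below. The codec list shown is the stock Hadoop set, and the Sandbox's actual core-site.xml contents may differ; the approach that ultimately worked is the getCodecByClassName() lookup described in the 05-19-2017 post above.

    import org.apache.hadoop.conf.Configuration;

    // Illustrative only: inspect and override the codec-related properties mentioned above.
    public class CodecConfigCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration();

            // Print whatever codec list the client-side configuration currently carries.
            System.out.println("io.compression.codecs = " + conf.get("io.compression.codecs"));

            // The two overrides the post describes trying (neither fixed the problem).
            conf.set("io.compression.codecs",
                     "org.apache.hadoop.io.compress.DefaultCodec,"
                   + "org.apache.hadoop.io.compress.GzipCodec,"
                   + "org.apache.hadoop.io.compress.BZip2Codec");
            conf.set("io.native.lib.available", "false");
        }
    }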
05-17-2017 04:17 PM
Okay, that makes sense. I just checked, and on my Mac I was actually running the command hadoop com/thomp/io/FileSystemCat /HadoopPrac/Data/sample.txt, which also worked. Thanks again!
05-17-2017 02:34 AM
So the above works if I do hadoop com/thomp/io/FileSystemCat hdfs://sandbox.hortonworks.com:8020/HadoopPrac/Data/sample.txt or hadoop com/thomp/io/FileSystemCat hdfs://172.17.0.2:8020/HadoopPrac/Data/sample.txt. Why is that? I set up the HDP 2.6 Sandbox on my Mac and it works when I just use localhost. Why wouldn't it be the same thing, since it is the same install? Here is a screenshot of the /etc/hosts file on the VM:
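For what it's worth, a small diagnostic like the sketch below (not from the original post; the class name WhichFileSystem is made up) shows how the client picks a NameNode: an explicit hdfs://host:port authority in the path argument wins, while a bare path falls back to fs.defaultFS from the client's core-site.xml, so both hostnames above presumably resolve to the same Sandbox NameNode.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // Sketch only: report which filesystem a given path argument actually resolves to.
    public class WhichFileSystem {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(args[0]), conf);
            System.out.println("Resolved filesystem URI: " + fs.getUri());
            System.out.println("fs.defaultFS           : " + conf.get("fs.defaultFS"));
        }
    }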
05-17-2017 12:15 AM
Hopefully someone can help me figure out what is going on. I downloaded HDP 2.6 for VirtualBox and followed the instructions to install and set it up. I am running it on Linux Mint 18. I am following along in the 'Hadoop: The Definitive Guide' book and trying to run one of the examples. Here is the code I wrote from the book: I set up my /etc/hosts like this, just following the instructions in the 'Learning the Ropes' tutorial. Following the book, I do the following, but I am getting a connection refused error. Is something not set up correctly? Any and all help would be GREATLY appreciated. I was up super late last night trying to figure this out with no luck.
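For context, the FileSystemCat example from 'Hadoop: The Definitive Guide' looks roughly like the sketch below; the poster's actual package and class layout (e.g. com/thomp/io) may differ.

    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Roughly the book's FileSystemCat: open a (possibly HDFS) URI and copy it to stdout.
    public class FileSystemCat {
        public static void main(String[] args) throws Exception {
            String uri = args[0];
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create(uri), conf);
            InputStream in = null;
            try {
                in = fs.open(new Path(uri));
                IOUtils.copyBytes(in, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }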
03-28-2017 10:43 PM
@Mike Riggs Thanks for the tip, Mike. I guess I'll just ignore the references to a VM and translate them to running on Docker locally. Thanks all. I'll take a look tomorrow or so and hopefully all is well.
03-28-2017 04:20 PM
@glupu, I think it is okay. I haven't done much with it since seeing the Ambari login screen, so I don't really know. I am actually not sure what to do next. I was hoping there was a tutorial that shows what to do from here so I can start working with HDP and learning it and Hadoop.
03-28-2017 01:19 PM
@Rafael Coss, yes, I turned off the pop-up blocker. I thought that was it too.

@Jay SenSharma, I pretty much just copied and pasted the docker run command, and I believe it exposed port 8080.

@glupu, yes, I am running the Docker sandbox from my Mac, which is using Docker for Mac. I pretty much just followed the instructions from the link I posted above. I ran the 'docker run ...' command and then did an ssh -p 2222 root@localhost. I changed my password from 'hadoop' to my own and then ran /etc/init.d/startup_script start, but after going to localhost:8888 the dashboard wouldn't launch. So yesterday I ran 'service ambari-server status' and it said it wasn't running, but gave a message about a 'Stale PID File at: /var/run/ambari-server/ambari-server.pid'. So I ran 'service ambari-server start', and again 'service ambari-server status' showed the same thing, even though the 'ambari-server start' command said it completed successfully. Finally I ran 'service ambari-server restart' and then it was running and I could get to the dashboard. Why was that? And am I supposed to run the Docker image of the sandbox this way? I noticed @Jay had something which said 'Hortonworks Docker Sandbox' listed in his VirtualBox. I thought having a Docker image was so that you didn't need a VM. So now I'm a bit confused about how I should be using the Docker image of the sandbox. Thanks, I appreciate the help!
03-27-2017 03:27 PM
I just followed the instructions at the link to start using the Docker image of the sandbox. When I go to localhost:8888 and click on the 'Launch Dashboard' button, this is what I get: So I get a tutorial page, but the whole 'localhost didn't send any data' thing seems wrong. Not sure if it is related, but when I ran the startup_script I got this WARNING for Starting Ranger-admin: Does this look okay? And lastly, just to make sure this is okay: when I ran the 'docker load < HDP_2.5_docker.tar' command, my output was different and smaller in size than what is shown in the link above. Here is what I get: And here is what the link has as output: Sorry for all the pics and info, but I just wanted to make sure I didn't miss anything that could help figure out why things seem to not be working. Thanks!
03-26-2017 02:31 AM
Thanks a ton, Jay, for all the great info! The difference between the Sandbox and full HDP makes sense now. I want to start hands-on learning of Hadoop and Spark in a more realistic, distributed-like environment, so I will try to set up a small multi-node cluster on my machine, though I am sure I will hit some bumps. I'll just reply to this thread with any questions or issues, if that is okay. Thanks for the help! I really appreciate it.
03-25-2017 02:34 PM
Thanks for the reply and the links, Jay. That looks interesting, but I didn't see any mention of Hortonworks or the Sandbox anywhere; maybe I missed it. Is this using the Hortonworks Sandbox or something else? Sorry for being dense. I'm pretty green.
03-25-2017 01:58 PM
Hello all, I am just a learner in the Hadoop and Spark ecosystem, but I am excited to start. To that end, I was hoping to begin with the Hortonworks Sandbox Docker image to get up and running fast and lightweight. Again, I am new and have not yet loaded the image, but I'm guessing this is a single-node install just to play around with. That is fine, but I was really hoping for more of a pseudo-cluster experience. Is there a way to link together multiple Sandbox Docker images to form a cluster on my machine? I did some poking around and really didn't find much help, at least not for a beginner like me. I was hoping someone has done this and could help me through it. Thanks!
03-10-2017 09:59 AM
Thanks for the reply and the link, but I guess I wasn't clear in my question: I am not sure how one actually writes the code for a Hadoop job on the host machine when all the jars and libraries needed are in the container. Do people write the code in the container, and if so, what do they use?
03-09-2017 06:59 AM
I am new to all this and want to learn more about Hadoop and Spark, along with the other supporting technologies. After a bit of research, using the quickstart Docker image seemed like a great option. The instructions for getting it up and running are pretty straightforward, but how does one actually write and compile code to be used in the image? I would like to use IntelliJ as my IDE, and I have that on my host machine, but my host has no idea about Hadoop or any other framework or jar needed to write and compile code. And the Docker image is a command-line thing, so I am not sure how to work within it using an IDE like IntelliJ. What is the workflow for writing code in an IDE that can be run in the quickstart image? I was really hoping for a simple 'Hello World' type example of how one can develop something to be run. Thanks for the help!