Member since: 07-12-2013

435 Posts | 117 Kudos Received | 82 Solutions

        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
|  | 2340 | 11-02-2016 11:02 AM |
|  | 3630 | 10-05-2016 01:58 PM |
|  | 8291 | 09-07-2016 08:32 AM |
|  | 8909 | 09-07-2016 08:27 AM |
|  | 2521 | 08-23-2016 08:35 AM |

12-14-2015 11:30 AM | 1 Kudo

So once you've started Cloudera Manager, it's only running the management services, not CDH (the full stack uses much more memory than most users have on their laptops, so it's better to have you start only what you know you'll need). Once you CAN connect to CM, you will need to start the services you want via the web UI or the CM API (there is also a command to start every service on the cluster).

Now as for why you can't connect: after CM starts, it does take a couple of minutes to open the port, because it runs a lot of checks before it will accept any user input. However, if 'service cloudera-scm-server status' has said it's up for several minutes, the next thing I'd check is which interface it's bound to. I'd expect it to bind to everything (including localhost), but also try 'quickstart.cloudera' (since you're in the container, that should resolve, and it may actually be a different IP address than localhost/127.0.0.1 depending on how the network interfaces are presented). You can also run 'sudo lsof -i | grep 7180', which should show you details of whatever is listening on that port. Failing all that, check the logs in /var/log/cloudera-scm-server and see if anything has gone wrong there.
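For reference, the checks above boil down to something like this (a sketch; the exact log file name under /var/log/cloudera-scm-server is an assumption and may differ between releases):

```
# Confirm the Cloudera Manager server process is up; it can take a few
# minutes after this reports "running" before port 7180 accepts connections.
sudo service cloudera-scm-server status

# See what, if anything, is listening on the CM port and which interface
# it is bound to.
sudo lsof -i | grep 7180

# If nothing is listening after several minutes, look for errors in the logs.
sudo tail -n 100 /var/log/cloudera-scm-server/cloudera-scm-server.log
```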
						
					
12-14-2015 11:05 AM

Port forwarding in Docker can be tricky; see the "Networking" section here: https://hub.docker.com/r/cloudera/quickstart/. You need to instruct Docker to forward any ports you want to use when you start the container (e.g. 8888 for Hue, 7180 for Cloudera Manager), and then you have to look up which port number on your host maps to that port number on the guest. So if you instruct Docker to launch your container with '-p 7180', from the guest's perspective it's listening on that port. However, on your host machine it will be assigned a different port (that way, many containers can run the same services without their ports conflicting). You would need to run 'docker port <container id> 7180', and it will show you the interface the port was bound to (usually 0.0.0.0, meaning it's listening on all interfaces / IP addresses) and the host port, which might be 31000 or something in that neighborhood. In which case, 31000 is actually the port you need to connect to.
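Putting that together, the flow looks roughly like this (a sketch; the full run command and its flags are documented on the page linked above, and <container id> is whatever 'docker ps' shows for your container):

```
# Publish the guest ports you care about when starting the container;
# Docker picks the host-side port numbers for you.
docker run --hostname=quickstart.cloudera --privileged=true -t -i -d \
    -p 8888 -p 7180 cloudera/quickstart /usr/bin/docker-quickstart

# Find the container's ID or name, then ask which host port maps to 7180.
docker ps
docker port <container id> 7180
# e.g. "0.0.0.0:31000" means Cloudera Manager is reachable on host port 31000.
```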
						
					
12-14-2015 09:25 AM

The difference is that Cloudera Manager runs the Hadoop services independently of Linux's service management (because it manages them with a cluster-wide context rather than the host-only context that Linux's service management has). So once you start Cloudera Manager, you will see that all of the Hadoop services are stopped according to Linux: they're being managed by Cloudera Manager, not Linux, anymore. The reason it's done in two steps is that most users of the VM don't have the need, or sufficient memory on their laptops, to run the entire stack on a single node including Cloudera Manager, so by default the image runs a CDH-only deployment with the services managed by Linux. For users who can and want to run Cloudera Manager, /home/cloudera/cloudera-manager will disable all of those Linux-managed services and enable CM instead.
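On the QuickStart VM that switch looks roughly like this (a sketch; hadoop-hdfs-namenode is just one example of the packaged Linux services, and any flags the script accepts vary by VM version):

```
# Stop and disable the Linux-managed CDH services, then bring up
# Cloudera Manager to manage the cluster instead.
sudo /home/cloudera/cloudera-manager

# Afterwards Linux reports the packaged services as stopped; that is
# expected, since Cloudera Manager now owns them.
sudo service hadoop-hdfs-namenode status
```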
						
					
12-09-2015 03:04 PM | 1 Kudo

You need to register for a new access code every time you deploy a cluster.
						
					
12-09-2015 09:29 AM

Sorry, it's not - I'm trying to get that updated.
						
					
12-04-2015 04:07 PM

Remember that schema and data are two separate things in Hadoop. The files in the directory are simply data files. For tables to show up in Hive or Impala, you have to import or define the schema for those tables in the Hive Metastore. I believe the reason you're not seeing the tables is that the logs you posted show Hive constantly struggling with garbage collection. My guess is that Sqoop tried to import the schema into Hive but timed out - but I can't know for sure unless you can post the output of the Sqoop command.

To be clear - are you running a QuickStart VM? I'm a little unclear on exactly what your environment is.
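To illustrate the schema/data split: files already sitting in HDFS only become a Hive table once a schema is registered in the Metastore, for example with an external table definition like the sketch below (the table name, columns, delimiter, and path are made-up placeholders, not taken from your job):

```
# Register a schema in the Hive Metastore over files that already exist in
# HDFS; nothing is copied, Hive just learns how to read what is there.
hive -e "
CREATE EXTERNAL TABLE categories (
  category_id   INT,
  category_name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/categories';
"
```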
						
					
12-04-2015 03:41 PM

To answer your other question though, I wouldn't expect a different data format to make a difference here. There's enough competition for memory on the system that Hive is constantly doing garbage collection, and that shouldn't have anything to do with what format Sqoop is using for the data.
						
					
12-04-2015 03:04 PM

Well, there are a lot of variables, so a simple "minimum requirement" is a tough number to give. The tutorial was originally written for a 4-node cluster with 16 GB of RAM per node, and that's a little bit small for the master node. The QuickStart VM has a version of the tutorial with a smaller dataset. You can get away with 4 GB (but this includes the graphical desktop, so let's say 3 GB for a server) if you don't use Cloudera Manager and manage everything yourself (note that this is pretty complex). If you use the "Cloudera Express" option for Cloudera Manager, 8 GB is the absolute minimum, and if you're going to try out "Cloudera Enterprise" you need at least 10 GB. But the number of nodes, exactly which services you're running, exactly what else is going on on the machines, etc. all affect this.
						
					
12-04-2015 12:05 PM

I was referring to the output of your Sqoop command - it is printed to the terminal, not written to a log file. However, the log snippets you did post do indicate a potential problem: if Hive was pausing too much for garbage collection, then Sqoop might have given up / timed out when doing the import. You may not have enough memory for the services to run well.
						
					
12-04-2015 10:43 AM

Can you post the output of your Sqoop job? I'm wondering if there were errors when it was doing the --hive-import part. There are two stages: writing the files in the new data format to HDFS, and then defining the schema for the tables in Hive's Metastore. It sounds like that second stage failed...
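For context, a minimal sketch of the kind of job in question (the connection string, credentials, and table name are placeholders, not your actual command):

```
# Stage 1: Sqoop writes the table's data files into HDFS.
# Stage 2 (--hive-import): Sqoop creates the matching table definition in
# the Hive Metastore. Errors in either stage show up in the terminal output.
sqoop import \
  --connect jdbc:mysql://quickstart.cloudera/retail_db \
  --username retail_dba --password cloudera \
  --table categories \
  --hive-import
```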
						
					