Member since: 07-12-2013
Posts: 435
Kudos Received: 117
Solutions: 82
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1227 | 11-02-2016 11:02 AM |
| | 1850 | 10-05-2016 01:58 PM |
| | 6720 | 09-07-2016 08:32 AM |
| | 6271 | 09-07-2016 08:27 AM |
| | 1223 | 08-23-2016 08:35 AM |
07-08-2015
08:05 AM
1 Kudo
So in the tutorial you used Sqoop to import data from MySQL, right? Sqoop also supports Oracle (and a number of other data sources, such as other relational databases and mainframes), and you can also use Sqoop to export data back to a relational database. I'd suggest having a look at Sqoop's documentation to see all the various options. Sqoop in CDH is currently based on Sqoop 1.4.5 (with some other fixes / improvements back-ported): http://sqoop.apache.org/docs/1.4.5/index.html. There's also "Sqoop 2", which is still being developed but is available in CDH; it uses a client-server model instead of just the CLI tool. It was Sqoop 1 that you would've seen in the tutorial, though.
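As a minimal sketch (the connection strings, table names, and user names below are placeholders, not from the tutorial), an import from MySQL and an export back to Oracle look roughly like this:

sqoop import --connect jdbc:mysql://mysql-host/retail_db --username retail_user -P \
  --table orders --target-dir /user/cloudera/orders
sqoop export --connect jdbc:oracle:thin:@oracle-host:1521:ORCL --username scott -P \
  --table ORDERS --export-dir /user/cloudera/orders

(-P prompts for the password instead of putting it on the command line.)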
07-06-2015
05:02 PM
When the deployment finished running, you should have received an email with a link to the resources and credentials. If you haven't seen it, check spam, etc. If you can't find it, send me a private message with the email address you used to sign up, and I'll see what I can do to help.
07-03-2015
12:19 PM
1 Kudo
DFS Master (HDFS NameNode) is port 8020. The YARN ResourceManager is port 8032. I'm not that familiar with the Hadoop plugin, but you should clarify whether you want to be using MapReduce from Hadoop 2.x, where YARN acts as the scheduler and you submit MapReduce jobs through YARN's ports. When they say "Map/Reduce Master", to me that sounds like MR1, when MapReduce ran its own daemons. If it's MR1 you want to be using, you would actually want the JobTracker port, which is 8021.

Even though MR1 is supported in CDH 5, we recommend Hadoop 2 / YARN for production, and MR1 is not running in the QuickStart VM by default. Some work would be required to shut down the YARN daemons and start the MR1 daemons: specifically, stopping the hadoop-yarn-resourcemanager and hadoop-yarn-nodemanager services, uninstalling the hadoop-conf-pseudo package, installing the hadoop-0.20-conf-pseudo package instead, and then starting the hadoop-0.20-mapreduce-jobtracker and hadoop-0.20-mapreduce-tasktracker services.

>> I also specify Host with ip of Clouder CDH5 VMware ip

Make sure that you can ping that IP from your host machine. By default, the VM uses "NAT", which means you can't connect from your host machine. You'll want to use a "bridged" network or something similar instead so that you can initiate connections from your host machine.
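For example, from your host machine you can quickly confirm the VM is reachable and the relevant ports are open (the IP below is a placeholder for your VM's bridged address; nc is just one way to test a TCP port):

ping -c 3 192.168.1.123
nc -vz 192.168.1.123 8020   # HDFS NameNode
nc -vz 192.168.1.123 8021   # MR1 JobTracker (only if you switch to MR1)
nc -vz 192.168.1.123 8032   # YARN ResourceManager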
07-03-2015
05:52 AM
As long as your laptop is running a supported operating system (RHEL or CentOS 5/6, SLES 11, Ubuntu 12.04 / 14.04, Debian 7), you can just follow this guide: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_qs.html Otherwise, you will first need to install one of these systems in a VM. You should be able to complete that Hadoop example in a VM with that much RAM, but be aware that you would have some trouble running the rest of the stack that is not core Hadoop: the QuickStart VM has everything pre-installed and configured, but requires 4GB of RAM for the VM itself.
06-08-2015
03:04 PM
4 Kudos
All the basic Hadoop services should be running when you start the VM. Port 8020 is for the hadoop-hdfs-namenode service, so my guess is that service has failed and just needs to be restarted. You can check the status of a service with 'service <service-name> status' and restart it with 'service <service-name> restart', so 'service hadoop-hdfs-namenode restart' may be all you need. Also check the hadoop-hdfs-datanode service, as it may need to be restarted too. The services should have been running, so if they're not, it means something went wrong. If you're curious or if you continue to have a problem, have a look at the NameNode logs in /var/log/hadoop-hdfs for anything that looks like a fatal error and post it back here.
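For example (run inside the VM; the exact log file names may differ slightly on your system):

sudo service hadoop-hdfs-namenode status
sudo service hadoop-hdfs-namenode restart
sudo service hadoop-hdfs-datanode status
sudo grep -i fatal /var/log/hadoop-hdfs/*namenode*.log   # look for fatal errors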
05-29-2015
01:17 PM
In tutorial #1 you copied some *.avsc files into HDFS. My guess is that step was skipped or it failed for some reason. I would suggest trying that step again.
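For example, you can check whether the schema files made it into HDFS (the path below is the one the tutorial uses; adjust it if yours differs):

hadoop fs -ls /user/examples/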
05-28-2015
08:26 AM
The original error has nothing to do with connecting to Hue. It was an error in the interaction between Impala and the other daemons it needs to complete the query. Just use the link in the email or on the guidance page to connect to Hue. The public IP should work just fine. If it doesn't, you are experiencing a separate issue and would need to provide more information.
05-19-2015
06:10 AM
Hi Mitali, I'd recommend you email support@gogrid.com with your account details and what you just posted here. They will be able to see what, if anything, went wrong and correct it.
05-18-2015
11:36 AM
1 Kudo
You are correct - the username and password should both be 'cloudera' in the QuickStart VM. (It is 'admin' in Cloudera Live clusters hosted on GoGrid - which is where the screenshot comes from). I'll get that fixed in the next release.
05-05-2015
01:40 PM
I see there's an error in the tutorial. These queries are intended to be run in the Beeline shell described above this step; the statements should be pasted into that shell, not into the Hive Query Editor app. I'll get that corrected... The "$s" in the format string is special syntax that Hue interprets as a parameter for the user to provide. To make Hue send the query to Hive as intended, you would need to escape the $ signs (e.g. %1$$s, etc.). However, you may also run into some permissions issues querying the dataset via Hue that were beyond the scope of the tutorial - hence using Beeline. Thanks for reporting this!
05-04-2015
03:45 PM
Thanks for letting us know - glad you got it working! Is there an article or blog post with more background on the bug you could share?
05-04-2015
09:12 AM
I've had another VMware Fusion user confirm that the latest (5.4.0-0) release works well for them, and they actually informed me that those lines we recently added to the .vmx file have been in use for a long time in Cloudera's courses without incident, so I don't think it's worth messing with those settings after all. My only other suggestion would be to see if there is any kind of error message you can get from Fusion about why the VM is failing to start. It would seem from the following article that you can enable some diagnostic information collection - perhaps the resulting files will contain a more useful error message? http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1003894
05-04-2015
06:15 AM
I'm going to look into this, but while I do, here's something you can try if you'd like. These lines were added to the .vmx configuration file recently, and that's the only recent change I can think of that might affect Fusion specifically. You could try removing them from the file and trying again. (They're only added as a convenience to hide some annoying reminders and prompts.)

tools.remindinstall="FALSE"
tools.upgrade.policy="manual"

Let me know whether or not that works for you if you can try it out; otherwise I'll post back as soon as I can with more information.
04-28-2015
07:12 PM
1 Kudo
The reason is that CDH is installed in the VM using Linux packages, not parcels (so that using Cloudera Manager to manage the services is optional). If you'd like to install the Kafka parcel, you'll first need to move CDH to a parcel-based install. The documentation to do this can be found here: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ig_migrating_packages_to_parcels.html.
03-24-2015
03:51 PM
Can you confirm in Cloudera Manager that the HDFS service is running and healthy? If the service is marked in any color other than green, there should be a little warning icon that you can click on to get any information about what may be wrong. If the service is healthy, can you tell me what happens when you run "hadoop fs -ls /user/examples/sqoop_import_order_items.avsc" from the command line on a machine in your cluster?
03-17-2015
11:35 AM
I'm afraid I'm not very familiar with R and running it against Hadoop. My first thought is that perhaps the program that creates the files and the program that looks for the files are running as different users? /user/cloudera is the default working directory for the cloudera user, but other users will default to other directories. e.g. if 'root' asks for a file called '0', unless there's an absolute path with it, it means /user/root/0. Is it possible these files exist under a different user's home directory?
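One quick way to check (user names here are just examples) is to list both home directories as the HDFS superuser:

sudo -u hdfs hadoop fs -ls /user/cloudera
sudo -u hdfs hadoop fs -ls /user/root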
03-13-2015
12:07 PM
1 Kudo
I believe this procedure should get you switched over from YARN / MR2 to MR1. After running it I was able to compute pi using MR1:

for service in mapreduce-historyserver yarn-nodemanager yarn-proxyserver yarn-resourcemanager; do
    sudo service hadoop-${service} stop
    sudo chkconfig hadoop-${service} off
done
sudo yum remove -y hadoop-conf-pseudo
sudo yum install -y hadoop-0.20-conf-pseudo
for service in 0.20-mapreduce-jobtracker 0.20-mapreduce-tasktracker; do
    sudo service hadoop-${service} start
    sudo chkconfig hadoop-${service} on
done

It stops and disables the MR2 / YARN services, swaps the configuration files, then starts and enables the MR1 services. Again, the tutorial is not written (or tested) to be used with MR1, so it's possible you'll run into some other issues. I can't think of any specific incompatibilities; I'm just recommending that if you want to walk through the tutorial, you do it with an environment as close to the original VM as possible, otherwise who knows what differences may be involved.
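For example, to verify MR1 is working after the switch, you can run the pi example from the MR1 examples jar (the jar path below is where the CDH packages normally put it; adjust if yours differs):

hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 2 10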
03-13-2015
11:15 AM
To answer Morgan's question, port 8020 is the HDFS NameNode, port 8021 is the JobTracker in MR1, which is where you would have submitted jobs in CDH 4. It can still be used in CDH 5, but as it is not the default, you'll need to switch around some configuration and services (and understand that the rest of the tutorial may not work exactly as expected because of the switch - I'd suggest perhaps starting with a fresh copy of the tutorial to be sure everything in the tutorial will work and not conflict with what you've been doing in R).
03-13-2015
11:13 AM
After reviewing the blog post, I noticed that it is written for the CDH 4.1.1 VM. I'm afraid there have been a number of changes since then that might be complicating things. The primary change, and the one that I think is complicating Sqoop for you, is that in CDH 4 we recommended MR1 for production, whereas in CDH 5 YARN has stabilized and we now recommend MR2 for production because of the superior resource management. I believe the following line is responsible for setting up your environment such that Sqoop is trying to use MR1 when it is not running:

ln -s /etc/default/hadoop-0.20-mapreduce /etc/profile.d/hadoop.sh

You could either try getting rid of that symlink and anything else that's telling the system to use MR1, or you could stop YARN / MR2 and use MR1 instead. I'll try to post some instructions for doing the latter shortly...
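For example, if you want to stay on MR2 / YARN, removing that symlink would look like this (assuming it was created exactly as in the blog post):

sudo rm /etc/profile.d/hadoop.sh

Then log out and back in (or start a new shell) so the old environment variables aren't still set.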
03-13-2015
09:59 AM
1 Kudo
>> In the very first tutorial on cloudera, it reads "You should first log in to the Master Node of your cluster using SSH - you can get the credentials using the instructions on Your Cloudera Cluster."

It's a little confusing whether you're running these commands on your host machine or on the VM. If you're reading the tutorial hosted on a website somewhere, it's written with a fully-distributed cluster in mind, where you SSH into the machine. There's a modified copy hosted on the VM itself (just go to localhost in the web browser in the VM, or on your host, as port-forwarding should work for VirtualBox) that, in my copy at least, just tells you to click on the terminal icon on the VM's desktop and enter commands there. Which version of the VM are you using, and where do you see that text?

It should be possible to SSH into the VM and even run these commands from your host machine, but doing so requires a lot of network configuration to be set up correctly - it won't be set up that way by default, and it can be complicated to get it working consistently on different hosts - which is why I recommend just using the terminal on the VM's desktop.

The root cause of your "connection refused" error appears to be that Sqoop is trying to use MR1. The VM is set up to use MR2 / YARN by default, so that is probably why MR1 is not running and you can't connect. Cloudera supports running both MR1 and MR2, but you can't have a machine configured as a client to both at the same time. When I run this on my copy of the VM (and in all recent versions), Sqoop is definitely using MR2 / YARN. Have you changed any other configuration before running Sqoop? Is it possible you've got Sqoop installed on your host machine and it's configured differently than Sqoop in the VM?
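If you want to double-check which framework your client configuration points at, something like this should print "yarn" on an unmodified VM (config path as packaged in CDH 5; adjust if your client config lives elsewhere):

grep -A1 mapreduce.framework.name /etc/hadoop/conf/mapred-site.xml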
02-13-2015
09:25 AM
I hear you - I prefer .tar.gz myself but we found that with most formats (.tar.gz included) the ability to extract large archives (>2GB) was very inconsistent between different tools and it caused a lot of confusion among users about what the problem actually was.
02-13-2015
09:03 AM
You beat me to it! I was just downloading the file to confirm its integrity. I downloaded cloudera-quickstart-vm-4.7.0-0-vmware.7z, the SHA-1 checksum matched, and I was able to extract it. If you're not doing so already, I recommend using a download manager for large files (I use DownThemAll! for Firefox) - it will deal with network failures more gracefully than the one built into most browsers, and you're less likely to end up with a corrupted or interrupted download.
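Once the download finishes, it's worth verifying it before extracting, e.g. (compare the output against the checksum published on the download page):

sha1sum cloudera-quickstart-vm-4.7.0-0-vmware.7z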
01-30-2015
06:54 AM
ZooKeeper is now required for some of the features that allow Solr to scale reliably ("SolrCloud"). You need to provide the address of your ZooKeeper ensemble with the --zk option, e.g. --zk <host1>:<port>,<host2>:<port> (the port is usually 2181).
01-13-2015
08:11 AM
Thanks again for reporting the issue and sharing your workaround with other users. Just wanted to let you know that as of noon PT yesterday, new deployments should not have this problem. You should be able to access MySQL via the public IP from any of the instances now, and the tutorial now gives you a working command for the Sqoop import and the Hive metadata.
01-06-2015
06:53 AM
1 Kudo
Thanks for letting us know about this - this is an error in a recent update to the tutorial. Those commands should be using the hostname rather than the IP address, so I'd suggest trying 'f6129-cldramaster-01' for the NameNode instead of 216.121.116.82. We'll also change the MySQL setup in future deployments to allow access via IP address, but it seems you found a work-around for that.

The reason for the second failure is that the command is trying to use the public interface instead of the private interface. GoGrid machines typically have two network interfaces - one that is publicly accessible, and one that is private (but has higher performance). 216.* points to one of the public IP addresses, but some of the services that are not intended to be accessed directly only listen on the private interface (Hue and CM listen on the public interfaces as well). So again, using the hostname there should work, or the IP address of the internal interface (eth1).

We'll get the tutorial content and MySQL config updated shortly... Please post back if you run into additional issues with the current tutorial and I'll try to provide workarounds...
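If you want the address of the internal interface rather than the hostname, running something like this on the master node should show it (eth1 being the private interface on these machines, as noted above):

ip addr show eth1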
01-06-2015
06:47 AM
The line that says to use the black "Terminal" icon is actually only relevant for the QuickStart VM. It's not supposed to show up in the "Cloudera Live" environment, so just ignore it (and I'll look into why it's showing up for you when it shouldn't). You should be able to log in with a terminal using SSH from Mac OS / Linux, or a tool like PuTTY from Windows.
12-20-2014
06:19 PM
2 Kudos
FWIW, this should basically be the procedure to install and run Accumulo from packages in the current QuickStart VM:

REPO=http://archive-primary.cloudera.com/accumulo-c5/redhat/6/x86_64/cdh/cloudera-accumulo.repo
(cd /etc/yum.repos.d && sudo wget ${REPO})   # install the yum repository
sudo yum install accumulo-*
sudo service accumulo-master init            # follow the prompts
for role in $(cd /etc/init.d && ls accumulo*); do
    sudo service ${role} start
done
12-18-2014
10:05 AM
5 Kudos
If you go to the configuration tab for the Hue service and use the search bar to search for properties with "wildcard" in the name, you should see a property to have the Hue server bind to a wildcard address instead of a specific NIC. If that isn't checked, check it and restart the service once it's saved. That property gets set when the service is created, so it's very unusual that it's not getting applied for you - I haven't seen that before. Thanks for reporting, though.
12-18-2014
07:58 AM
Hmm... The same server that hosts CM should also host Hue (on 8888, as you're trying) and a web server on port 80 with a "Guidance Page" with a link straight to Hue. Are you trying to connect to 8888 on the same IP as the one you can connect to Cloudera Manager on? If so, I'd be curious to know if when you look at the Hue service in CM and click on the Instances tab, if it shows the Hue server running on the host you're expecting.
12-18-2014
06:48 AM
Cloudera Manager is responsible for managing and monitoring Hue (among other things, of course). If you can log into Cloudera Manager, the first screen you see includes a list of all services. Hue's status will be shown with a red, yellow, or green dot. You can click on the service, go to the Processes tab, and see logs from stdout, stderr, and the actual "role log" (this is most likely what you want to see). What does it show as the status of Hue, and do you see any errors in that log?