Member since: 07-12-2013
Posts: 435
Kudos Received: 117
Solutions: 82
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2340 | 11-02-2016 11:02 AM |
| | 3632 | 10-05-2016 01:58 PM |
| | 8296 | 09-07-2016 08:32 AM |
| | 8924 | 09-07-2016 08:27 AM |
| | 2522 | 08-23-2016 08:35 AM |
10-29-2015
02:01 PM
Check in /user/cloudera. Unless you're the 'hdfs' user, HDFS treats /user/[username] as your home directory. Any path that doesn't start with / is resolved relative to that directory, so that's where your files end up. Paths that DO start with / are taken as-is and are not resolved relative to your home directory. If you just type 'hadoop fs -ls', it lists your home directory, and you should see the file there.
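For example, a quick sketch of the difference (assuming you're logged in as the 'cloudera' user; 'mydata.txt' is just a made-up file name):

# lists your HDFS home directory, i.e. /user/cloudera
hadoop fs -ls
# a relative path is resolved under your home directory,
# so these two commands look at the same thing
hadoop fs -ls mydata.txt
hadoop fs -ls /user/cloudera/mydata.txt
# an absolute path is taken literally
hadoop fs -ls /user/hive/warehouse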
10-29-2015
12:34 PM
So the issue with metadata changes not showing up in clients is probably just because Impala caches metadata to make requests faster. When there's a metadata change, you can simply issue the 'invalidate metadata;' command. I'm not sure what you mean when you say it worked on the server though. Maybe it worked in the Hive app in Hue? Hive doesn't cache metadata like that - it looks it up for every query. Yeah the Sqoop command can be modified to import a specific table instead of all of them. I can't do anything on your cluster for you - once you get the credentials, Cloudera doesn't keep the SSH keys to log in in the future. But you should be able to do anything in HDFS if you preface the command with 'sudo -u hdfs'. So 'sudo -u hdfs hadoop fs -rm -r /user/hive/warehouse/orders', for instance.
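For the table-specific import, a rough sketch of what the change looks like (the connection string, username, and table name here are placeholders, not the exact values from the tutorial):

# import-all-tables pulls in every table; to import just one, use 'import' with --table
sqoop import --connect jdbc:mysql://yourhost/yourdb --username youruser -P \
    --table orders --hive-import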
10-28-2015
09:10 AM
The timestamps (e.g. 1374735600000) appear to be in milliseconds, while that function expects timestamps in seconds. The documentation for the date and time functions is here: http://www.cloudera.com/content/www/en-us/documentation/archive/impala/2-x/2-1-x/topics/impala_datetime_functions.html. I'm not sure which function would work best: there are others that refer to UTC timestamps instead of UNIX timestamps, but I'm not sure of the details there. The Impala forum might be a better place to ask for pointers. Another option to investigate would be copying the data into another table and converting those values to seconds along the way.
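If the function in question is from_unixtime(), one workaround is dividing by 1000 first - a sketch, with made-up table and column names:

-- from_unixtime() expects seconds, so convert the millisecond values first
SELECT from_unixtime(cast(event_ts / 1000 AS bigint)) FROM events;

-- or materialize the converted values into a new table
CREATE TABLE events_in_seconds AS
SELECT cast(event_ts / 1000 AS bigint) AS event_ts_seconds FROM events;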
10-27-2015
03:27 PM
Glad it's working. You should make the rules as specific or as general as your needs dictate. I had forgotten about the rule that allowed all outbound traffic, simply so any request originating in the cluster would succeed (since the ephemeral ports for Linux are allowed inbound traffic). The default firewall is quite strict about incoming traffic...
10-27-2015
06:08 AM
All traffic is denied by default, so add a rule to the existing rules that is set to 'ALLOW', applies to 'All traffic' / all ports, and has the source / destination IP set to 0.0.0.0/0 (which matches all addresses). Do this on both the inbound rules tab and the outbound rules tab.
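If you'd rather script it than click through the console, the equivalent AWS CLI calls look roughly like this (the ACL id and rule number are placeholders):

# inbound: allow all protocols and ports from anywhere
aws ec2 create-network-acl-entry --network-acl-id acl-0123456789abcdef0 --ingress \
    --rule-number 100 --protocol all --rule-action allow --cidr-block 0.0.0.0/0
# outbound: the same rule on the egress side
aws ec2 create-network-acl-entry --network-acl-id acl-0123456789abcdef0 --egress \
    --rule-number 100 --protocol all --rule-action allow --cidr-block 0.0.0.0/0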
10-26-2015
03:24 PM
So Sqoop is trying to do two things that might seem surprising if you're new to Hadoop: first, it copies files containing the raw data to /user/hive/warehouse/, and second, it executes a CREATE TABLE (similar to what you may have used in SQL databases before) through Hive to recreate the metadata that goes with those files. In the output you showed me, it says the /user/hive/warehouse/ directory for the categories table already exists, and Sqoop doesn't expect it to. It seems a previous run failed for a different reason, and we should clean it up before trying again. To get rid of the raw data files, run: 'sudo -u hdfs hadoop fs -rm -r /user/hive/warehouse/\*' (I'm assuming you don't have any other data in the cluster you care about). To get rid of the metadata, start the Impala shell with 'impala-shell' (you don't need any other arguments because you're on the Quickstart VM and the defaults all happen to be correct). Run 'invalidate metadata;', and then 'show tables;'. For any tables you see, run 'drop table <tablename>;'. Then rerun the Sqoop job and it *should* succeed, but if it doesn't, the output should give us the root cause of the real problem...
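Putting that together, the whole cleanup looks roughly like this ('categories' stands in for whatever tables 'show tables;' lists on your VM):

# from the Linux shell: remove the raw data files as the hdfs superuser
sudo -u hdfs hadoop fs -rm -r /user/hive/warehouse/\*

-- then, inside impala-shell:
invalidate metadata;
show tables;
drop table categories;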
10-26-2015
02:14 PM
The Network ACL rules apply to the subnet as a whole (meaning the 5 machines in your cluster), and they should only apply to traffic going to or from your cluster, not to traffic between the machines themselves. Do also keep in mind that a Network ACL is a stateless firewall - it doesn't track the state of TCP connections, so return traffic isn't allowed automatically. That means you need to permit inbound traffic on port 21050, and also outbound traffic back to whatever port your client is using to receive responses in the TCP connection - the so-called "ephemeral ports". The network ACL should already allow traffic on the standard ephemeral port ranges for both Linux and Windows operating systems. To debug this, I would use something like Wireshark to see exactly what's happening on the network - but that does require a pretty detailed understanding of how TCP and related protocols work. If you don't really care about the security of your cluster, you can also try opening up the network ACL entirely to confirm everything else is working before trying to lock it down again.
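For illustration only, the two rules could look like this with the AWS CLI (the ACL id and rule numbers are placeholders, and 0.0.0.0/0 is as open as it gets - tighten the CIDR to your client's address if you care about security):

# inbound: let clients reach Impala's port
aws ec2 create-network-acl-entry --network-acl-id acl-0123456789abcdef0 --ingress \
    --rule-number 110 --protocol tcp --port-range From=21050,To=21050 \
    --rule-action allow --cidr-block 0.0.0.0/0
# outbound: let responses travel back to the client's ephemeral ports
aws ec2 create-network-acl-entry --network-acl-id acl-0123456789abcdef0 --egress \
    --rule-number 110 --protocol tcp --port-range From=1024,To=65535 \
    --rule-action allow --cidr-block 0.0.0.0/0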
10-26-2015
02:04 PM
Can you check the output of Sqoop and any CREATE TABLE commands (depending on the version of the tutorial you're working from) for errors? Seems to me one of them has failed, but there's no way to know what or why without more information.
10-26-2015
02:03 PM
1 Kudo
Check that Hive Server 2 is running: 'sudo service hive-server2 status'. If it's not, restart it with 'sudo service hive-server2 restart'. If you continue having issues, have a look at the Hive Server 2 logs in /var/log/hive for any errors.
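Roughly, that sequence is (the exact log file name under /var/log/hive may differ on your VM):

sudo service hive-server2 status
sudo service hive-server2 restart
# list the log files and scan the newest one for errors
sudo ls -lt /var/log/hive
sudo grep -i error /var/log/hive/hive-server2.log | tail -n 20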
10-23-2015
02:02 PM
The Cloudera software is running under the standard 60-day free trial. After 60 days, some Enterprise features in Cloudera Manager will stop working (such as Cloudera Navigator), but all of CDH and any feature of Cloudera Manager that is available in the free version will continue to work. I believe Tableau Desktop is running under a 14-day free trial.