
Help with Cloudera Live on AWS

Explorer

I am trying to install Cloudera Live on AWS with Tableau. The stack creation is complete.  I see 6 instances running on my account.  I did not receive any email with instructions on how to access Cloudera.  Can someone suggest how I can check if the installation is complete?

 

Mark

2 ACCEPTED SOLUTIONS

Guru
Glad it's working. You should make the rules as specific or as general as your needs dictate. I had forgotten about the rule that allowed all outbound traffic, simply so any request originating in the cluster would succeed (since the ephemeral ports for Linux are allowed inbound traffic). The default firewall is quite strict about incoming traffic...
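If you ever want to reproduce or tighten those rules from the command line, the equivalent AWS CLI calls would look roughly like this. This is only a sketch: the security group ID and port range below are placeholders, not values from your stack, and the CIDRs should be narrowed to whatever you actually need.

# allow all outbound traffic from the cluster's security group (placeholder group ID)
aws ec2 authorize-security-group-egress --group-id sg-0123456789abcdef0 \
    --protocol all --cidr 0.0.0.0/0

# allow inbound traffic on a typical Linux ephemeral port range, so replies to
# requests that originate inside the cluster can get back in
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 32768-61000 --cidr 0.0.0.0/0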


Explorer

Hi Sean,

 

Thanks for your suggestion.  I will create a new post.

 

Mark


51 REPLIES

Explorer

Hi Sean:

 

With PuTTY:

 

I am able to connect through PuTTY with ec2-user as the userid.  I ran the script and I get an error:

-bash: import-all-tables: command not found

 

Please explain why I need to change the userid, pwd, db_name, and MySQL driver location.  I am using Cloudera Live on AWS and I want to use the existing databases.

 

With Tableau:

 

I am able to connect using Impala but not through Hive. With Impala, when I connect, I don't see any schema.

 

I am using Hive Server 2 in ODBC configuration.  It doesn't connect to the server.  Please tell me the userid/pwd to connect through Hive Server 2.

 

Mark

 

 

Guru

You must have 'sqoop' before 'import-all-tables'. The full command in the tutorial is as follows:

 

sqoop import-all-tables \
    -m 3 \
    --connect jdbc:mysql://cloudera1:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import
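Once that finishes, you can sanity-check the import with something like the following (assuming the default warehouse path used above):

hadoop fs -ls /user/hive/warehouse         # should show one directory per imported table
impala-shell -q "INVALIDATE METADATA"      # make Impala pick up the new tables
impala-shell -q "SHOW TABLES"              # the retail_db tables should be listed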

>> Please explain why I need to change the userid, pwd, db_name, and MySQL driver location.  I am using Cloudera Live on AWS and I want to use the existing databases.

 

I'm not sure what steps in the tutorial you're referring to here.

 

>> With Impala, when I connect, I don't see any schema.

 

The Sqoop command is what imports the data. If you haven't run the import yet, it's expected that you won't see any schema in Impala.

 

>> Please tell me the userid/pwd to connect through Hive Server 2.

 

There isn't a password set up for Hive Server 2. You may find this thread helpful: http://community.cloudera.com/t5/Cloudera-Live-End-to-end/Cannot-connect-to-Hive-thru-JDBC-Connectio...
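If you want to rule out the cluster side, you can test HiveServer2 from the command line with Beeline. Something like the following should work - the hostname is a placeholder for whichever node runs HiveServer2, the username is arbitrary since no authentication is configured, and the password can be left blank:

beeline -u jdbc:hive2://<hiveserver2-host>:10000 -n cloudera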

Explorer

Thanks for the reply.

 

I tried with 'sqoop' in front and I get this error:

 

-bash: sqoop: command not found

 

It looks like I am not in the right environment or the master node I am connecting to doesn't have sqoop installed.

 

Please check and let me know.

 

Mark

Guru
When you run 'hostname' from the machine you're running sqoop on, what do you see? On that machine, /usr/bin/sqoop should be the executable, and it should ultimately be calling /opt/cloudera/parcels/CDH/bin/sqoop. Do you see either of those files? Sqoop is bundled with everything else that gets installed - it would be very surprising to me if you got this far and had something missing, so I suspect you're not on the right machine.
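For example, from your PuTTY session you could run:

hostname
which sqoop
ls -l /usr/bin/sqoop /opt/cloudera/parcels/CDH/bin/sqoop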

Explorer

Hi Sean,

 

I appreciate your feedback.  With regards to PuTTY: all I did was connect to ec2-52-91-172-186.compute-1.amazonaws.com using mykeypair.ppk, with ec2-user as the userid.  When I ran /opt/cloudera/parcels/CDH/bin/sqoop at the prompt, I got an error - no such file or directory.  I am not good at Unix, so I need your help with commands if you want me to check something.

 

With regards to Tableau, I tried to create an ODBC connection using the Cloudera ODBC driver for Hive.  I did not use a JDBC driver.  Can you confirm whether I should use ODBC or JDBC to connect to the manager node?

 

Mark

 

 

Guru
Oh ODBC - my mistake. I have not used ODBC much with Hive. Hopefully someone else can provide some insight there - I'd have to do some digging.

As for the Sqoop issue, there are 2 things I would check. First, can you log into Cloudera Manager (using the link and credentials from the final email you received)? The first screen should show the general health of the cluster in a box to the left. Most of the services should have a green circle, or a little black square. If a lot of the services are marked in yellow or red, then something may be wrong with the cluster in general. Next...

There should be 2 entries related to Sqoop. One will be called "Sqoop 2", marked with a little black box because it's stopped by default <- this is a service that is separate from the CLI tool as you're using it in the tutorial. The other should just be called "Sqoop" or "Sqoop Client" or something like that. It will be marked with a grey circle (since it's just a CLI tool, it doesn't have a status, per se). Do you see that? If not, click the button above this box to "Add a service", and select "Sqoop 1 Client". It'll ask you which hosts to deploy the tool on - just select all of them, and click through the menus to complete the deployment. Then try running Sqoop again. I can't imagine why Sqoop wouldn't be available on the command-line already, if everything else got set up right, but try this and see what happens. Maybe you'll find another clue along the way...
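After deploying the Sqoop 1 Client gateway, a quick check from the command line would be something like:

sqoop version    # should print the Sqoop version if the client was deployed to this host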


Explorer

Hi Sean,

 

With regards to Sqoop:

 

I added the new Sqoop 1 Client service and it seems to be running.  I went back to PuTTY and ran the script again.  I still get the same error.

 

-bash: Sqoop: Command not found

 

Is there any other way I can test if Sqoop is running?

 

Thanks for your help.

 

Mark

Explorer

Hi Sean,

 

I think I finally found out where the problem was. I was not connecting to the Manager node in PuTTY.  I just did that and the script is running.  I will let you know once it finishes.  I am hoping new tables will be created and I can query them through Hive or Impala.  Thanks for your help.

 

I still need to get Tableau working.  Please let me know if you find the right driver/connectivity parameters I should use.

 

Mark

Explorer

Hi Sean

 

I was able to get everything on the server to work and I finished all the tutorial exercises.

 

The only outstanding issue is connectivity from Tableau.  I am connecting from a Windows machine using Remote Desktop.  I am not sure why I even need an ODBC driver on my machine.  It looks like something in the connectivity parameters is not correct.  I would appreciate your help.

 

Mark

Guru

So the Impala ODBC driver is installed on the Windows server that hosts Tableau Desktop. The Hive ODBC driver is separate. You can download a Windows installer for it here: http://www.cloudera.com/content/www/en-us/downloads/connectors/hive/odbc/2-5-16.html.html.
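Tableau always talks to the cluster through an ODBC driver installed on the machine where Tableau itself runs, which is why you need the driver on that Windows server. For the DSN settings themselves, on a default Cloudera Live cluster they are typically along these lines - treat this as a rough sketch, since the host depends on which node runs HiveServer2/Impala and the ports assume the CDH defaults:

Host:            <public DNS of the node running HiveServer2 / Impala>
Port:            10000 for the Hive ODBC driver, 21050 for the Impala ODBC driver
Database:        default
Authentication:  User Name (no password), e.g. cloudera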

 

You can also find out more about Tableau and ODBC drivers here: http://kb.tableau.com/articles/knowledgebase/hadoop-hive-connection