Help with Cloudera Live on AWS

Explorer

I am trying to install Cloudera Live on AWS with Tableau. The stack creation is complete, and I see 6 instances running on my account. I did not receive any email with instructions on how to access Cloudera. Can someone suggest how I can check whether the installation is complete?

 

Mark

2 ACCEPTED SOLUTIONS

Master Collaborator
Glad it's working. You should make the rules as specific or as general as
your needs dictate. I had forgotten about the rule that allowed all
outbound traffic, simply so any request originating in the cluster would
succeed (since the ephemeral ports for Linux are allowed inbound traffic).
The default firewall is quite strict about incoming traffic...


Explorer

Hi Sean,

 

Thanks for your suggestion.  I will create a new post.

 

Mark



Explorer

Please close this issue

 

Mark

Community Manager

Did you solve the issue Mark? If so, please share the solution in case it can assist others. 🙂


Cy Jervis, Manager, Community Program

Master Collaborator
There was a bit of a delay on Friday as the system caught up after an issue
was fixed.

Explorer

Thanks for the reply. The Cloudera Live install went through fine. I could connect to all three environments (Hue, Manager, and Navigator). I have tried querying through Hive and Impala, and both work. Now I am trying to use Sqoop to transfer data from MySQL to Hadoop, and I need help with the Manager Node IP and port, user ID, and password. I will be using PuTTY to connect.

 
I am also looking to connect through Tableau and I don't seem to have the right drivers.
 
Can you help with these two issues?
 
Mark

 

Master Collaborator
The "Guidance Page" (linked to in the emails you received after the cluster
started) has a table with the IP addresses of all your nodes, including the
"Manager Node". If you're accessing the tutorial from the link on that
page, it should fill in the value of the IP addresses in the example
commands (such as the Sqoop command) for you.

The user ID to use for SSH is ec2-user and there is no password - it's the
EC2 key-pair you selected when deploying the CloudFormation template. The
first couple of pages in the tutorial have more detail. For the MySQL
database, the username is retail_dba and the password is cloudera (again -
this should be shown in the tutorial) - but MySQL will only accept
connections from the machines in your cluster.

Can you be more specific about why you think you're missing a driver? The
copy of Tableau Desktop hosted on the Windows instance should have built-in
support for connecting to Impala, etc. and other than being able to connect
to Remote Desktop, you should not need any other drivers.

Explorer

Hi Sean,

 

Thanks for your reply.  I used the IP address (54.172.147.35) and user ID (ec2-user) in PuTTY.  I get this error message:

 

"Disconnected: No supported authentication methods available (server sent: publickey,gssapi-keyex,gssapi-with-mic)"

 

Can you help me with this issue?

 

Regarding Tableau, I can access the tool and log in to the system.  Then I select "Cloudera Hadoop" as the server and enter 54.172.147.35 as the server with port 10000.

I select "HiveServer" for Type.  Authentication is greyed out, so I cannot enter a user ID or password.  I click OK and get a window with this error message:

 

"An error occurred while communicating with the Cloudera Hadoop data source '54.172.147.35'"

 

I would appreciate if you can let me know where I am making a mistake in the workflow.

 

Thanks,

 

Mark

 

 

Master Collaborator

Regarding PuTTY, have you read through EC2's documentation on connecting to Linux instances from Windows? http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-connect-to-instance-linux.html#using-putty. It seems you need to go through a process of converting the .pem file (the key you selected when deploying the CloudFormation template) to a PuTTY-specific .ppk format, and then configure your connection to use that file for authentication.
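As a rough sketch of the two ways in, the snippet below only prints the commands rather than running them. The key file name and the address 203.0.113.10 are placeholders, not values from this cluster, and the `puttygen` command line shown is the one shipped with the PuTTY tools on Linux or macOS (on Windows the same conversion is done in the PuTTYgen GUI).

```shell
# Placeholders (assumptions): mykeypair.pem and 203.0.113.10 stand in for
# your own EC2 key-pair file and your Manager Node's address.

# Print the OpenSSH command for logging in as ec2-user with a .pem key.
ssh_cmd() {
    printf 'ssh -i %s ec2-user@%s\n' "$1" "$2"
}

# Print the puttygen command that converts a .pem key to PuTTY's .ppk format.
ppk_cmd() {
    printf 'puttygen %s -O private -o %s.ppk\n' "$1" "${1%.pem}"
}

ssh_cmd mykeypair.pem 203.0.113.10
ppk_cmd mykeypair.pem
```

Once the .ppk file exists, it is selected in PuTTY's connection settings as the private key for authentication, as the AWS page describes.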

 

As for the issue connecting from Tableau, I would recommend you try using the Private IP for the Manager Node to connect to Hive. If you're using the public IP, a bunch of firewall rules get applied, and they will block access to Hive since the service is not secured by default in Live clusters. However, from inside the cluster, all access to private IPs should be open. Also note that Hive Server 2 is running on the Manager Node: this is distinct from Impala (which the Tableau tutorial in Cloudera Live has you connect to), which is running on all of the Worker Nodes instead.

 

Hope that helps!

Explorer

Hi Sean,

 

Thanks for your reply.  I tried using Putty and I can connect now. I still need to run the script to move tables from mysql to HDFS.

 

With Tableau, I am still getting the same error, shown below.  I am using Cloudera Hadoop as the server, with the private IP (10.0.0.81), and I left the port at 10000.

 

Please let me know if I need to make any other changes.

 

Mark

 

The drivers necessary to connect to this database are not properly installed.

To connect to this database, perform the following steps:

  • Click the following link to go to download drivers: Download Drivers
  • Follow the instructions
  • Attempt to connect to the database again.

Detailed Error Message:

  • [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified
  • Unable to connect to the server "10.0.0.81". Check that the server is running and that you have access privileges to the requested database.

Explorer

1. Please refer to https://www.cloudera.com/content/www/en-us/developers/get-started-with-hadoop-tutorial/exercise-1.ht... for ingest (MySQL to HDFS).

It is straightforward: you only need to change the DB name, user ID, password, and MySQL driver location, and you should be good.

 

 

2. For the ODBC driver: did you install it from the link below?

http://www.cloudera.com/content/www/en-us/downloads/connectors/hive/odbc/2-5-16.html.html

Explorer

Hi Sean:

 

With Putty:

 

I am able to connect through Putty with ec2-user as userid. I ran the script and I get an error:

-bash: import-all-tables: command not found

 

Please explain why I need to change userid,pwd,db_name and mysql driver location.  I am using cloudera live on AWS and I want to use the existing databases.

 

With Tableau:

 

I am able to connect using Impala but not through Hive. With Impala, when I connect, I don't see any schema.

 

I am using Hive Server 2 in ODBC configuration.  It doesn't connect to the server.  Please tell me the userid/pwd to connect through Hive Server 2.

 

Mark

 

 

Master Collaborator

You must have 'sqoop' before 'import-all-tables'. The full command in the tutorial is as follows:

 

sqoop import-all-tables \
    -m 3 \
    --connect jdbc:mysql://cloudera1:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import

>> Please explain why I need to change userid,pwd,db_name and mysql driver location.  I am using cloudera live on AWS and I want to use the existing databases.

 

I'm not sure what steps in the tutorial you're referring to here.

 

>> With Impala, when I connect, I don't see any schema.

 

The Sqoop command will import some data. If you haven't already imported data, you should not see any schema in Impala.
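One way to check whether the import has landed is to list the warehouse directory from a cluster node. This is a minimal sketch: the path is the `--warehouse-dir` value from the Sqoop command above, and the `list_warehouse` helper is a hypothetical name used only for illustration.

```shell
# Warehouse directory taken from --warehouse-dir in the Sqoop command above.
WAREHOUSE_DIR=/user/hive/warehouse

# List the imported table directories, or explain why we cannot.
# (list_warehouse is a made-up helper name for this sketch.)
list_warehouse() {
    if command -v hadoop >/dev/null 2>&1; then
        hadoop fs -ls "$WAREHOUSE_DIR"
    else
        echo "hadoop CLI not found; run this on a cluster node"
    fi
}

list_warehouse
```

If the table directories are listed but Impala still shows no tables, Impala may need its metadata refreshed (for example with `invalidate metadata;`), though that is an assumption about this particular setup.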

 

>> Please tell me the userid/pwd to connect through Hive Server 2.

 

There isn't a password set up for Hive Server 2. You may find this thread helpful: http://community.cloudera.com/t5/Cloudera-Live-End-to-end/Cannot-connect-to-Hive-thru-JDBC-Connectio...

Explorer

Thanks for the reply.

 

I tried with Sqoop in front and I get the error:

 

-bash: sqoop: command not found

 

It looks like I am not in the right environment or the master node I am connecting to doesn't have sqoop installed.

 

Please check and let me know.

 

Mark

Master Collaborator
When you run 'hostname' from the machine you're running sqoop on, what do you see? On that machine, /usr/bin/sqoop should be the executable, and it should ultimately be calling /opt/cloudera/parcels/CDH/bin/sqoop. Do you see either of those files? Sqoop is bundled with everything else that gets installed - it would be very surprising to me that you got this far and had something missing, so I suspect you're not on the right machine.
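The checks above can be run in one go from the SSH session. This is a small sketch; `check_tool` is a hypothetical helper name, and the two paths are the ones mentioned in this reply.

```shell
# Report whether a command is on PATH and, if so, where.
# (check_tool is a made-up helper name for this sketch.)
check_tool() {
    if path=$(command -v "$1" 2>/dev/null); then
        printf '%s found at %s\n' "$1" "$path"
    else
        printf '%s not found on PATH\n' "$1"
        return 1
    fi
}

# Which node is this session on?
hostname 2>/dev/null || uname -n

# Is sqoop reachable from this shell?
check_tool sqoop || true

# Do the files named in the reply exist on this machine?
for f in /usr/bin/sqoop /opt/cloudera/parcels/CDH/bin/sqoop; do
    if [ -e "$f" ]; then
        echo "exists: $f"
    else
        echo "missing: $f"
    fi
done
```

If both files are reported missing, the session is most likely on a different instance than the Manager Node.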

Explorer

Hi Sean,

 

I appreciate your feedback.  With regard to PuTTY, all I did was connect to ec2-52-91-172-186.compute-1.amazonaws.com using mykeypair.ppk, with ec2-user as the user ID.  When I entered /opt/cloudera/parcels/CDH/bin/sqoop at the prompt, I got an error: no such file or directory.  I am not good at Unix, so I need your help with the exact commands if you want me to check something.

 

With regard to Tableau, I tried to create an ODBC connection using the Cloudera ODBC driver for Hive.  I did not use a JDBC driver.  Can you confirm whether I should use ODBC or JDBC to connect to the Manager Node?

 

Mark

 

 

Master Collaborator
Oh ODBC - my mistake. I have not used ODBC much with Hive. Hopefully
someone else can provide some insight there - I'd have to do some digging.

As for the Sqoop issue, there are 2 things I would check. First, can you
log into Cloudera Manager (using the link and credentials from the final
email you received)? The first screen should show the general health of the
cluster in a box to the left. Most of the services should have a green
circle, or a little black square. If a lot of the services are marked in
yellow or red, then something may be wrong with the cluster in general.
Next...

There should be 2 entries related to Sqoop. One will be called "Sqoop 2"
marked with a little black box because it's stopped by default <- this is a
service that is separate from the CLI tool as you're using it in the
tutorial. The other should just be called "Sqoop" or "Sqoop Client" or
something like that. It will be marked with a grey circle (since it's just
a CLI tool, it doesn't have a status, per se). Do you see that? If not,
click the button above this box to "Add a service", and select "Sqoop 1
Client". It'll ask you which hosts to deploy the tool on - just select all
of them, and click through the menus to complete the deployment. Then try
running Sqoop again. I can't imagine why Sqoop wouldn't be available on the
command-line already, if everything else got set up right, but try this and
see what happens. Maybe you'll find another clue along the way...


Explorer

Hi Sean,

 

With regards to Sqoop:

 

I added the new Sqoop 1 Client service and it seems to be running.  I went back to PuTTY and ran the script again.  I still get the same error.

 

-bash: Sqoop: Command not found

 

Is there any other way I can test if Sqoop is running?

 

Thanks for your help.

 

Mark

Explorer

Hi Sean,

 

I think I finally found out where the problem was. I was not connecting to the Manager Node in PuTTY.  I just did that and the script is running.  I will let you know once it finishes. I am hoping new tables will be created and I can query them through Hive or Impala.  Thanks for your help.

 

I still need to work through Tableau.  Please let me know if you find out which driver and connectivity parameters I should use.

 

Mark

Explorer

Hi Sean

 

I could get everything on the server to work and I finished all tutorial exercises. 

 

The only outstanding issue is connectivity from Tableau.  I am connecting from a Windows machine using Remote Desktop.  I am not sure why I even need an ODBC driver on my machine.  It looks like something in the connectivity parameters is not correct.  I would appreciate your help.

 

Mark

Master Collaborator

So the Impala ODBC driver is installed on the Windows server that hosts Tableau Desktop. The Hive ODBC driver is separate. You can download a Windows installer for it here: http://www.cloudera.com/content/www/en-us/downloads/connectors/hive/odbc/2-5-16.html.html.

 

You can also find out more about Tableau and ODBC drivers here: http://kb.tableau.com/articles/knowledgebase/hadoop-hive-connection

Explorer

Hi Sean,

 

Thanks for the information.  The problem I had with Impala was that I could connect, but I did not see any of the tables that I could see through Hue.  Can you tell me which host IP, port, and user ID I should use?

 

I will install the ODBC driver for Hive and try connecting.

 

Please reply when you get a chance.

 

Mark