Hue on EMR - Hive warehouse connection error to s3

New Contributor

Hi,

 
I am trying to launch an EMR cluster (release 5.3.1) with Hive 2.1.1 and Hue 3.11.

I tried to follow the instructions on this page: http://gethue.com/introducing-s3-support-in-hue/

I am launching the cluster from a Python script using boto3, with the following configuration JSON:
 
 
[
     {
      "Classification": "core-site",
      "Properties": {
        "fs.s3a.awsAccessKeyId":"<aws key>",
        "fs.s3a.awsSecretAccessKey": "<aws secret key>"
        }
    },
    {
      "Classification": "hive-site",
      "Properties": {
        "hive.metastore.warehouse.dir":"s3://<bucket_name>/<hive-folder>",
        "javax.jdo.option.ConnectionURL": "jdbc:mysql://<rds-url>:3306/hivedb?createDatabaseIfNotExist=true",
        "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
        "javax.jdo.option.ConnectionUserName": "<db_user>",
        "javax.jdo.option.ConnectionPassword": "db_pass",
        "hive.exec.scratchdir":"/hive_temp/",
        "hive.exec.stagingdir" : "${hive.exec.scratchdir}/${user.name}/.staging",
        "hive.exec.dynamic.partition.mode":"nonstrict",
        "hive.exec.parallel":"true",
        "hive.exec.compress.intermediate":"true",
        "hive.optimize.index.filter":"true",
        "hive.optimize.index.groupby":"true",
        "hive.cluster.delegation.key.update-interval":"31536000000",
        "hive.cluster.delegation.token.renew-interval":"31536000000",
        "hive.cluster.delegation.token.max-lifetime":"31536000000"
        }
    },
    {
      "Classification": "hue-ini",
      "Properties": {},
      "Configurations": [
        {
          "Classification": "desktop",
          "Properties": {
            "user_access_history_size": "50",
            "time_zone": "Europe/Berlin"
          },
          "Configurations": [
            {
              "Classification": "database",
              "Properties": {
                "name": "hue_db",
                "user": "hue_user",
                "password": "hue_pass",
                "host": "<rds_host>",
                "port": "3306",
                "engine": "mysql"
              },
              "Configurations": []
            }
          ]
        },
        ## HUE AWS
        {
          "Classification": "aws",
          "Properties": {},
          "Configurations": [
            {
              "Classification": "aws_accounts",
              "Properties": {},
              "Configurations": [
                {
                  "Classification": "default",
                  "Properties": {
                    "allow_environment_credentials": "False",
                    "region": "eu-central-1"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
]
 
This gives me back an error:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the RunJobFlow operation: Classification 'aws_accounts' is not valid for parent classification 'aws'.

Did I nest the JSON incorrectly?
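
To compare the nesting with what the ValidationException reports, it can help to print every classification together with its parents. The following is only a sketch: `classification_paths` is a hypothetical helper (not part of boto3), and `hue_ini` is a trimmed-down copy of the Hue AWS part of the configuration. It shows how the structure is nested; it does not say which classifications EMR accepts.

```python
def classification_paths(configurations, parent=()):
    """Yield each classification as a tuple path from the top level down."""
    for config in configurations:
        path = parent + (config["Classification"],)
        yield path
        # Nested classifications live under the optional "Configurations" key.
        yield from classification_paths(config.get("Configurations", []), path)


# Hypothetical, trimmed-down copy of the hue-ini "aws" section from the post.
hue_ini = [
    {
        "Classification": "hue-ini",
        "Properties": {},
        "Configurations": [
            {
                "Classification": "aws",
                "Properties": {},
                "Configurations": [
                    {
                        "Classification": "aws_accounts",
                        "Properties": {},
                        "Configurations": [
                            {
                                "Classification": "default",
                                "Properties": {"region": "eu-central-1"},
                            }
                        ],
                    }
                ],
            }
        ],
    }
]

paths = [" > ".join(p) for p in classification_paths(hue_ini)]
for line in paths:
    print(line)
```

The path that ends in `aws > aws_accounts` is the parent/child pair named in the error message, so the structure is at least being read the way it was written.
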
In addition, if I remove the Hue AWS part, the cluster launches without errors, but after logging into Hue I see a misconfiguration error:
 
Hive  - Failed to access Hive warehouse: s3://<my_bucket>/<hive_directory>
 
Also, when I open the Query editor, there is an error: "Could not connect to <master-node-ip>:1000".
Another point to mention: Hue and Hive are on the same RDS MySQL server, and the Hue user has full access to the Hive DB.
 
I also tried adding the AWS access keys to the Hue AWS part of the JSON, but I get the same Hive misconfiguration error.
 
Thanks in advance for your help.
2 REPLIES

Re: Hue on EMR - Hive warehouse connection error to s3

The HiveServer2 port is usually 10000, not 1000.

In any case, we recommend checking that the Hue and HiveServer2 hosts are in the same network security group, or are otherwise allowed to connect to each other.
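
One quick way to check this from the machine running Hue is a plain TCP connection test. This is a minimal sketch using only the Python standard library; `can_connect` is a hypothetical helper name, and the host below is a placeholder for the master node address. A successful connection only shows the port is reachable, not that HiveServer2 is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host; replace with the master node address.
    # 10000 is the usual HiveServer2 port mentioned above.
    print(can_connect("127.0.0.1", 10000, timeout=1.0))
```

If this prints False from the Hue host while the same check succeeds on the master node itself, a security group or firewall rule is a likely suspect.
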

Re: Hue on EMR - Hive warehouse connection error to s3

New Contributor

I finally found the reason. It appears that launching the EMR cluster with the following two Hive settings caused HiveServer2 to fail on startup:

 

"hive.exec.scratchdir":"/hive_temp/",
"hive.exec.stagingdir" : "${hive.exec.scratchdir}/${user.name}/.staging"

 

So after removing them, Hue can connect to the remote database and I have no problems running queries.
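
For anyone who keeps the configuration as a Python list before handing it to boto3, the removal can be sketched like this. `drop_properties` and `PROBLEM_KEYS` are hypothetical names used for illustration, and the input is a trimmed-down copy of the hive-site block, not the full configuration.

```python
import copy

# The two hive-site properties removed above; everything else is left untouched.
PROBLEM_KEYS = {"hive.exec.scratchdir", "hive.exec.stagingdir"}


def drop_properties(configurations, classification, keys):
    """Return a deep copy of configurations with keys removed from one classification."""
    cleaned = copy.deepcopy(configurations)
    for config in cleaned:
        if config.get("Classification") == classification:
            for key in keys:
                # pop with a default so a missing key is not an error
                config.get("Properties", {}).pop(key, None)
    return cleaned


# Trimmed-down example input based on the hive-site block in the question.
configurations = [
    {
        "Classification": "hive-site",
        "Properties": {
            "hive.exec.scratchdir": "/hive_temp/",
            "hive.exec.stagingdir": "${hive.exec.scratchdir}/${user.name}/.staging",
            "hive.exec.parallel": "true",
        },
    }
]

cleaned = drop_properties(configurations, "hive-site", PROBLEM_KEYS)
print(cleaned[0]["Properties"])
```

The original list is left unchanged, so both variants can be kept around while testing which settings break HiveServer2.
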

 

However, I am still getting the misconfiguration error saying that Hue cannot access the S3 path I specified as my default warehouse in hive-site:

 

  "hive.metastore.warehouse.dir":"s3://<bucket_name>/<hive-folder>",

 

Any ideas how to fix it?

 

Thanks for your help