
Configure Hue on EMR with Hive meta store on S3

Hi,

I am trying to launch an EMR cluster release 5.3.1 with Hive 2.1.1 and Hue 3.11.

I tried to follow the instructions on this page: http://gethue.com/introducing-s3-support-in-hue/.

I am launching the cluster from a Python script using boto3, with the following configuration JSON:

[
  {
    "Classification": "core-site",
    "Properties": {
      "fs.s3a.awsAccessKeyId": "<aws key>",
      "fs.s3a.awsSecretAccessKey": "<aws secret key>"
    }
  },
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.metastore.warehouse.dir": "s3://<bucket_name>/<hive-folder>",
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://<rds-url>:3306/hivedb?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
      "javax.jdo.option.ConnectionUserName": "<db_user>",
      "javax.jdo.option.ConnectionPassword": "db_pass",
      "hive.exec.scratchdir": "/hive_temp/",
      "hive.exec.stagingdir": "${hive.exec.scratchdir}/${user.name}/.staging",
      "hive.exec.dynamic.partition.mode": "nonstrict",
      "hive.exec.parallel": "true",
      "hive.exec.compress.intermediate": "true",
      "hive.optimize.index.filter": "true",
      "hive.optimize.index.groupby": "true",
      "hive.cluster.delegation.key.update-interval": "31536000000",
      "hive.cluster.delegation.token.renew-interval": "31536000000",
      "hive.cluster.delegation.token.max-lifetime": "31536000000"
    }
  },
  {
    "Classification": "hue-ini",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "desktop",
        "Properties": {
          "user_access_history_size": "50",
          "time_zone": "Europe/Berlin"
        },
        "Configurations": [
          {
            "Classification": "database",
            "Properties": {
              "name": "hue_db",
              "user": "hue_user",
              "password": "hue_pass",
              "host": "<rds_host>",
              "port": "3306",
              "engine": "mysql"
            },
            "Configurations": []
          }
        ]
      },
      ## HUE AWS
      {
        "Classification": "aws",
        "Properties": {},
        "Configurations": [
          {
            "Classification": "aws_accounts",
            "Properties": {},
            "Configurations": [
              {
                "Classification": "default",
                "Properties": {
                  "allow_environment_credentials": "False",
                  "region": "eu-central-1"
                }
              }
            ]
          }
        ]
      }
    ]
  }
]
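
For context, a configuration list like the one above is normally handed to boto3's run_job_flow through its Configurations argument. The following is only a minimal sketch of such a call: the cluster name, instance types, instance count, and IAM role names are placeholder assumptions, not values from the actual script, and the helper function names are hypothetical.

```python
def build_run_job_flow_args(configurations):
    """Assemble keyword arguments for the EMR RunJobFlow call.

    Every concrete value here (name, instance types, count, roles) is an
    illustrative placeholder, not taken from the original script.
    """
    return {
        "Name": "hue-hive-cluster",  # hypothetical cluster name
        "ReleaseLabel": "emr-5.3.1",
        "Applications": [{"Name": "Hive"}, {"Name": "Hue"}],
        "Configurations": configurations,  # the nested list shown above
        "Instances": {
            "MasterInstanceType": "m4.large",  # assumed instance type
            "SlaveInstanceType": "m4.large",   # assumed instance type
            "InstanceCount": 3,                # assumed cluster size
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default EC2 role
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


def launch_cluster(configurations, region="eu-central-1"):
    """Start the cluster and return its job flow id."""
    import boto3  # imported lazily so the argument builder works without boto3

    emr = boto3.client("emr", region_name=region)
    response = emr.run_job_flow(**build_run_job_flow_args(configurations))
    return response["JobFlowId"]
```

Keeping the argument assembly separate from the API call makes it possible to inspect or dump the exact request before it is sent to EMR.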

This returns the following error:
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the RunJobFlow operation: Classification 'aws_accounts' is not valid for parent classification 'aws'.
Did I nest the JSON incorrectly?
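
One way to double-check the nesting before calling EMR is to print the parent chain of every classification. This is a minimal sketch: classification_paths is a hypothetical helper, not part of boto3, and hue_config is a trimmed copy of the Hue part of the configuration. It only shows how the entries are nested; it does not know which parent/child combinations EMR accepts.

```python
def classification_paths(configurations, parent=()):
    """Yield the classification path of every entry, outermost first."""
    for entry in configurations:
        path = parent + (entry["Classification"],)
        yield path
        # Recurse into nested configurations, if the entry has any.
        yield from classification_paths(entry.get("Configurations", []), path)


# Trimmed copy of the Hue section, for illustration only.
hue_config = [
    {
        "Classification": "hue-ini",
        "Properties": {},
        "Configurations": [
            {
                "Classification": "aws",
                "Properties": {},
                "Configurations": [
                    {
                        "Classification": "aws_accounts",
                        "Properties": {},
                        "Configurations": [],
                    },
                ],
            },
        ],
    },
]

for path in classification_paths(hue_config):
    print(" > ".join(path))
```

The last line printed for this input is "hue-ini > aws > aws_accounts", which is exactly the parent/child pair the ValidationException rejects.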
In addition, if I remove the Hue AWS part, the cluster launches without errors, but after logging into Hue I see a misconfiguration error:
Hive  - Failed to access Hive warehouse: s3://<my_bucket>/<hive_directory>

 

Also, when I open the Query editor, I get the error "Could not connect to <master-node-ip>:1000".
Note that Hue and Hive use the same RDS MySQL server, and the Hue user has full access to the Hive database.

I also tried adding the AWS access keys to the Hue AWS part of the JSON, but I get the same Hive misconfiguration error.

Thanks in advance for your help.