Hi, we want to access Hive data from a machine outside the cluster (CDH 6.1). The dataset is in a partitioned external Hive table. The Hive instance is secured with LDAP and uses a self-signed certificate. We are looking for the easiest way to query this data, for example from Python.
This looks a bit harder than we first expected. Apparently PyHive is a good candidate, but unfortunately it does not seem to support LDAP. There is a two-year-old extra package that can be installed for this, but we had no luck with it. So the question is: what is the most straightforward way to connect to Hive from outside the cluster without running into LDAP issues? Does anything simple exist in Python, or do we need to fall back to SQuirreL or DBeaver? Does anyone have recent instructions for LDAP?
Note: we were able to connect through Impala, but apparently Impala 3.1.0 on CDH 6.1 cannot query partitioned external Hive tables (we tried INVALIDATE METADATA, REFRESH, and MSCK REPAIR TABLE). Querying gives: Failed to load metadata for table: 'mydb.mytable' CAUSED BY: TableLoadingException: Unrecognized table type for table: mydb.mytable
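For the SQuirreL/DBeaver route mentioned above, this is roughly the connection string we would expect to need. It is only a minimal sketch: the helper name hive_jdbc_url, the host hive.example.com, the truststore path, and its password are placeholders we made up, and port 10000 is assumed to be the HiveServer2 port. The self-signed certificate is assumed to have been imported into a Java truststore first.

```python
def hive_jdbc_url(host, port=10000, database="default",
                  truststore_path=None, truststore_password=None):
    """Build a HiveServer2 JDBC URL with SSL enabled (hypothetical helper)."""
    # Base URL plus the SSL flag; extra session settings are ';'-separated.
    parts = [f"jdbc:hive2://{host}:{port}/{database}", "ssl=true"]
    # Point the driver at a truststore that contains the self-signed certificate.
    if truststore_path:
        parts.append(f"sslTrustStore={truststore_path}")
    if truststore_password:
        parts.append(f"trustStorePassword={truststore_password}")
    return ";".join(parts)


# Placeholder values, not our real cluster.
url = hive_jdbc_url(
    "hive.example.com",
    database="mydb",
    truststore_path="/path/to/truststore.jks",
    truststore_password="changeit",
)
print(url)
```

The LDAP user name and password are not part of this string; we assume they go into the client's separate user/password fields.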