How do I access and query a HDFS table from Python?
Labels: Apache Hadoop
Created ‎11-17-2018 06:17 PM
Hello,
I have looked all over the internet and couldn't arrive at a neat solution.
Scenario:
I have HiveServer2 running in my company's QA environment. I am able to SSH into the environment and query tables in HDFS using Hive. I inspected hive-site.xml in QA and found that it uses Kerberos authentication. I would now like to execute queries programmatically using Python from my local machine for experimentation. I have come across PyHive and Beeline.
My biggest hurdle is that I cannot find the JDBC URL or IP address of HiveServer2. Where can I get this information?
I used the IP address of my QA environment (from ifconfig) in my Python script, and it couldn't connect to HiveServer2.
Is there any other workaround for accessing the tables in HDFS from Python?
Created ‎11-18-2018 03:23 AM
Try PyHive.
Created ‎11-19-2018 02:54 AM
As mentioned in my question, I have already tried using PyHive.
Created ‎11-19-2018 05:04 PM
You can get the URL from the Summary page of the Hive service in Ambari. For a Kerberized cluster, you will need to run kinit to get a ticket before launching the Python program. The format of the connection string for Kerberos is something like the following. The actual user principal (and authentication) will be taken from the Kerberos ticket. Test the connection string with Beeline to make sure it works.
beeline -u "jdbc:hive2://myhs2.foo:10000/default;principal=hive/myhs2.foo@MYKERBREALM;auth=kerberos"
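For the Python side, here is a minimal sketch of how the same settings might map onto PyHive. It assumes the HiveServer2 principal has the usual service/host@REALM form and that its host part is the host you actually connect to. The helper names (split_principal, pyhive_kwargs, has_kerberos_ticket, connect) are made up for illustration; auth="KERBEROS" and kerberos_service_name are arguments of PyHive's hive.Connection. The klist -s check assumes the MIT Kerberos client tools are on your PATH.

```python
import subprocess


def split_principal(principal):
    """Split 'service/host@REALM' into (service, host, realm)."""
    service_host, realm = principal.rsplit("@", 1)
    service, host = service_host.split("/", 1)
    return service, host, realm


def pyhive_kwargs(principal, port=10000, database="default"):
    """Map a HiveServer2 service principal onto hive.Connection arguments."""
    service, host, _realm = split_principal(principal)
    return {
        "host": host,                      # assumption: principal host == HS2 host
        "port": port,
        "database": database,
        "auth": "KERBEROS",
        "kerberos_service_name": service,  # usually "hive"
    }


def has_kerberos_ticket():
    """Return True if `klist -s` finds a valid ticket cache (assumes MIT klist)."""
    try:
        return subprocess.run(["klist", "-s"]).returncode == 0
    except OSError:
        # klist is not installed or cannot be executed
        return False


def connect(principal):
    """Open a Kerberos-authenticated PyHive connection for the given principal."""
    # Imported lazily so the helpers above work without PyHive installed.
    from pyhive import hive

    if not has_kerberos_ticket():
        raise RuntimeError("No valid Kerberos ticket found; run kinit first")
    return hive.Connection(**pyhive_kwargs(principal))
```

After running kinit, something like conn = connect("hive/myhs2.foo@MYKERBREALM") should then behave like the Beeline test. Note that PyHive's Kerberos mode also needs SASL support installed in your Python environment, so treat this as a starting point rather than a verified setup.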
Created ‎02-12-2020 11:06 PM
Hi @vignesh_radhakr,
You can connect to Hive from Python with PyHive as follows:
from pyhive import hive
conn = hive.Connection(host="masterIP", port=10000, username="cdh123")
Note: pass the master node's IP as host, together with port 10000.
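To go from a connection to actual query results, a rough sketch follows. It first checks that the HiveServer2 port is reachable from your machine (a firewall between a laptop and a QA cluster is a common reason for a plain "could not connect") and then runs a statement through a cursor. The names port_is_open, run_query and main are hypothetical helpers, and "masterIP" and "cdh123" are just the placeholders from this post. The connection call uses plain username authentication, so on a Kerberized cluster it would most likely need Kerberos settings instead.

```python
import socket


def port_is_open(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def run_query(conn, sql):
    """Execute sql on an open DB-API connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


def main():
    # Third-party dependency, imported here so the helpers work without it.
    from pyhive import hive

    host = "masterIP"  # placeholder: replace with your HiveServer2 host
    if not port_is_open(host, 10000):
        raise SystemExit("Cannot reach %s:10000 (wrong host, or a firewall?)" % host)

    conn = hive.Connection(host=host, port=10000, username="cdh123")
    try:
        for row in run_query(conn, "SHOW TABLES"):
            print(row)
    finally:
        conn.close()
```

Call main() once the placeholders are filled in. If port_is_open returns False, the problem is network reachability rather than PyHive or authentication.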
Thanks
HadoopHelp
