Member since: 05-10-2016
Posts: 184
Kudos Received: 60
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4093 | 05-06-2017 10:21 PM
 | 4105 | 05-04-2017 08:02 PM
 | 5018 | 12-28-2016 04:49 PM
 | 1242 | 11-11-2016 08:09 PM
 | 3333 | 10-22-2016 03:03 AM
01-13-2017
05:47 AM
2 Kudos
Goal

The goal of this article is to get familiar with the most convenient method of accessing the HAWQ database: the psql native client.

Assumptions

- The HAWQ database is available and active
- The psql native client is available (for download @ https://network.pivotal.io)

psql (the binary command) is the ideal choice for connecting to a HAWQ database/warehouse because HAWQ itself is based on the Postgres database.

Steps to configure psql on the client/edge node:

1. Make a note of the public IP address of the client/edge node; this will be needed later.
2. Access to HAWQ is available by default from the HAWQ master host. Make a note of the HAWQ database name and the user you want to enable database access for.
   - To identify the database name, run "psql -l" from the gpadmin user prompt in Linux.
   - To identify available user names, run "psql -c \du".
3. Download the HAWQ installation binaries from https://network.pivotal.io/products/pivotal-hdb#/releases/3466/file_groups/520
4. Follow this guide to set up the HAWQ repository: http://hdb.docs.pivotal.io/211/hdb/install/setup-hdb-repos.html
5. Install the psql binary; it should be available from the repo we just set up:

[root@sandbox ~]# yum install hawq
Now that the client is installed, ensure that HAWQ accepts connections from this edge node. To do this, you need to identify the data directory of the HAWQ master; here is how. Log in to the HAWQ master host via SSH as the gpadmin user and execute the following command:

[root@sandbox ~]# ps aux | egrep silent
gpadmin 209268 0.1 3.5 591072 357720 ? Ss 03:02 0:00 /usr/local/hawq_2_1_0_0/bin/postgres -D /data/hawq/master -i -M master -p 5433 --silent-mode=true
gpadmin 209428 0.2 4.0 693584 407636 ? Ss 03:02 0:01 /usr/local/hawq_2_1_0_0/bin/postgres -D /data/hawq/segment -i -M segment -p 40000 --silent-mode=true
root 212230 0.0 0.0 8456 924 pts/1 S+ 03:12 0:00 egrep silent
NOTE: You might see output like the above because this demo setup was on a sandbox, so both the master and segment nodes are on the same host. We need the path for the "master" process, in this case "/data/hawq/master".
Change directory to the master data folder and make a copy of the file pg_hba.conf:

[root@sandbox ~]# cd /data/hawq/master
[root@sandbox master]# cp pg_hba.conf pg_hba.conf.orig
The file pg_hba.conf will already have entries like the following:

local all gpadmin ident
host all gpadmin 127.0.0.1/28 trust
host all gpadmin 172.16.105.183/32 trust
Here:

- "host" or "local" specifies whether the client connects locally (on the same host) or as a "host" from a different node (either within the same cluster or on the LAN).
- "all" allows access to all databases; this can be made specific, for instance "hdw", provided that is the name of your database.
- "gpadmin" is the default superuser for the HAWQ cluster; this user has access to the cluster (within the specified IP ranges).
- The IP address is either a single-host address (with a /32 mask) or can be opened up to a subnet.
- "trust" is the authentication method.

To understand the last entry in the above example: we are granting access over the network ("host") to any database ("all") only when the user "gpadmin" connects from the client with IP address "172.16.105.183", and the method is "trust", so no password is required. To understand pg_hba.conf better, see: https://www.postgresql.org/docs/8.3/static/auth-pg-hba-conf.html
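To make the matching rules concrete, here is a minimal Python sketch (not part of HAWQ or Postgres; the rule values come from the example entries above, and real pg_hba.conf evaluation has more cases than this) showing how a connection attempt is checked against a "host" rule:

```python
import ipaddress

def rule_matches(rule, db, user, client_ip):
    """Check a (type, database, user, cidr, method) rule against a connection.
    Simplified sketch: returns the auth method on a match, else None."""
    conn_type, rule_db, rule_user, cidr, method = rule
    if conn_type != "host":            # only network ("host") rules carry an address
        return None
    if rule_db != "all" and rule_db != db:
        return None
    if rule_user != "all" and rule_user != user:
        return None
    if ipaddress.ip_address(client_ip) not in ipaddress.ip_network(cidr):
        return None
    return method                      # e.g. "trust" means no password required

# the last entry from the example above
rule = ("host", "all", "gpadmin", "172.16.105.183/32", "trust")
print(rule_matches(rule, "test", "gpadmin", "172.16.105.183"))  # trust
print(rule_matches(rule, "test", "gpadmin", "172.26.78.183"))   # None
```

With a /32 mask only that exact client IP matches; widening the CIDR (e.g. /24) would admit the whole subnet.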
Now assume your edge node has the IP address "172.26.78.183", the database name is "test", the username is "myuser", and you want to use the password-based authentication method "md5". The entry would then look like this:

host test myuser 172.26.78.183/32 md5
Once the entry is made, we need to ensure that HAWQ is in fact listening for remote hosts. A "*" signifies that HAWQ listens on all addresses:

[gpadmin@sandbox ~]$ egrep -i --color listen_address /data/hawq/master/postgresql.conf
#listen_addresses = '*' # what IP address(es) to listen on;
Now have the HAWQ master process re-read pg_hba.conf so that new connections are checked against the updated rules (we need to be the gpadmin user):

[gpadmin@sandbox ~]$ /usr/local/hawq/bin/pg_ctl reload -D /data/hawq/master
server signaled
[gpadmin@sandbox ~]$
You should now be able to connect using psql:

[gpadmin@sandbox ~]$ psql -U myuser -h 172.16.105.183 -d test
Password for user myuser:
psql (8.2.15)
Type "help" for help.
test=>
01-11-2017
09:37 PM
@ed day You can also look at tez.task.launch.cmd-opts (Tez config) via Ambari to see the amount of heap set with -Xmx, and also ensure it matches the value in hive.tez.java.opts (Hive config).
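As a quick way to compare the two settings, here is a small Python sketch (the opts strings below are made-up examples of how the values might look in Ambari, not output from any real cluster) that pulls the -Xmx value out of each string:

```python
import re

def xmx_value(opts):
    """Return the -Xmx setting from a JVM opts string, or None if absent."""
    m = re.search(r"-Xmx(\S+)", opts)
    return m.group(1) if m else None

# hypothetical values as they might appear in Ambari
tez_opts = "-server -Xmx4096m -Djava.net.preferIPv4Stack=true"
hive_opts = "-server -Xmx4096m"
print(xmx_value(tez_opts))                         # 4096m
print(xmx_value(tez_opts) == xmx_value(hive_opts)) # True
```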
01-11-2017
07:44 PM
@suresh krish
Have you recently tried to alter the table/column you are querying? Also, how are you inserting data into your table?
01-04-2017
07:17 PM
@Rkg Grg It would be easy for you to test it out. If the table is managed, simply run "describe formatted" and check whether the time changes after 2 minutes without changing anything. Then do the same test after adding 1 row and see if that changes "transient_lastDdlTime".
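To make the comparison concrete, here is a small Python sketch (the sample lines are illustrative stand-ins for "describe formatted" output, not captures from a real table) that extracts transient_lastDdlTime from two captures and compares them:

```python
import re

def last_ddl_time(describe_output):
    """Extract the transient_lastDdlTime epoch value from 'describe formatted' text."""
    m = re.search(r"transient_lastDdlTime\s+(\d+)", describe_output)
    return int(m.group(1)) if m else None

# hypothetical captures taken before and after inserting a row
before = "Table Parameters:\n\ttransient_lastDdlTime\t1483474621"
after = "Table Parameters:\n\ttransient_lastDdlTime\t1483474745"
print(last_ddl_time(after) > last_ddl_time(before))  # True
```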
01-03-2017
07:33 PM
Your select statement looks right; if the statement is correct, then it must be your version that is missing the md5 function.
12-29-2016
08:57 PM
Thanks for the capture @Praveen PentaReddy. In one of the related posts here, https://community.hortonworks.com/questions/68012/hive-cant-get-the-md5-value.html, it appears that the version of Hive shipped before HDP 2.5 might not have that function available. That is the same sandbox I used to perform the test at my end. If you don't have that option, you can still write your own custom UDF to produce the same outcome. Have a look at this simple code which you can use for your custom UDF: https://gist.github.com/dataminelab/1050002
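Whichever route you take, the UDF just needs to reproduce the standard MD5 hex digest. A quick Python check with the stdlib (purely to illustrate the expected output for sample inputs "1" and "2"; the Hive UDF itself would be written in Java):

```python
import hashlib

def md5_hex(s):
    """Standard lowercase MD5 hex digest of a UTF-8 string."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

print(md5_hex("1"))  # c4ca4238a0b923820dcc509a6f75849b
print(md5_hex("2"))  # c81e728d9d4c2f636f067f89cc14862c
```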
12-29-2016
07:06 PM
Can you share the complete output? It works just fine for me:

0: jdbc:hive2://localhost:10000/default> select md5(col1) as md_location from test1 limit 5;
+-----------------------------------+--+
| md_location |
+-----------------------------------+--+
| c4ca4238a0b923820dcc509a6f75849b |
| c81e728d9d4c2f636f067f89cc14862c |
| eccbc87e4b5ce2fe28308fd9f2a7baf3 |
| a87ff679a2f3e71d9181a67b7542122c |
| e4da3b7fbbce2345d7772b0674a318d5 |
+-----------------------------------+--+
5 rows selected (0.115 seconds)
Which version of HDP are you using?
12-29-2016
05:54 PM
@kotesh banoth Can you help me with the output of "describe formatted <tablename>" for both external and managed tables?
12-28-2016
07:51 PM
Can you provide the output of "set hive.security.authorization.sqlstd.confwhitelist"? You should also ensure that the property is not present in "hive.conf.restricted.list", as it supersedes the whitelist declaration.