Member since: 05-10-2016
Posts: 184
Kudos Received: 60
Solutions: 6
07-08-2016
03:47 PM
https://community.hortonworks.com/questions/37412/cannot-obtain-block-length-for-locatedblock.html
06-30-2016
06:06 PM
Can you share the error message and the Ambari server log files from "/var/log/ambari-server/"?
06-30-2016
02:59 PM
Hey Abhijeet, not sure if you have tried this: "pig -printCmdDebug -x tez -useHCatalog". Let me know if this does not work for you.
06-21-2016
01:12 AM
Goals
Get familiar with the psql command line (CLI) interface for HAWQ
Understand how to access and list the options available for the command line interface
Access help from within the CLI
NOTES
The requirements for the above are a sandbox configured with HAWQ, and
an SSH interface to the sandbox, either via the host/remote OS X terminal or PuTTY when working with Windows
Connecting to the Guest Sandbox Machine
Depending upon the setup, i.e., VirtualBox or VMware Fusion for HAWQ & HDP, you can choose the connection method. For connecting via VMware Fusion, you need to be aware of the IP address allocated to your virtual machine.
Connecting via VMware Fusion
HW13382:ODBC srai$ ssh root@172.16.105.137 -p 22
root@172.16.105.137's password:
Last login: Mon Jun 20 15:31:20 2016 from 172.16.105.1
[root@sandbox ~]#
Connecting via VirtualBox
HW13382:ODBC srai$ ssh root@localhost -p 2222
root@localhost's password:
Last login: Mon Jun 20 15:31:29 2016 from 172.16.105.1
[root@sandbox ~]#
NOTE: HAWQ has a preconfigured user account associated with it, similar to Hive, known as "gpadmin". By default, this is the superuser for the cluster and has all privileges. For our exercise, we will be using this account.
Switch from the root user account to the "gpadmin" user:
[root@sandbox ~]# su - gpadmin
[gpadmin@sandbox ~]$
Ensure that you have the binaries set in the environment; usually this is taken care of automatically during the installation.
[gpadmin@sandbox ~]$ env | egrep GPHOME
GPHOME=/usr/local/hawq/.
[gpadmin@sandbox ~]$ which psql
/usr/local/hawq/bin/psql
[gpadmin@sandbox ~]$
Now that we can locate the "psql" utility, let's list the available options:
[gpadmin@sandbox ~]$ psql -?
This is psql 8.2.15, the PostgreSQL interactive terminal (Greenplum version).
Usage:
psql [OPTION]... [DBNAME [USERNAME]]
General options:
-c, --command=COMMAND run only single command (SQL or internal) and exit
-d, --dbname=DBNAME database name to connect to (default: "gpadmin")
-f, --file=FILENAME execute commands from file, then exit
-l, --list list available databases, then exit
-v, --set=, --variable=NAME=VALUE set psql variable NAME to VALUE
-X, --no-psqlrc do not read startup file (~/.psqlrc)
-1 ("one"), --single-transaction execute command file as a single transaction
--help show this help, then exit
--version output version information, then exit
Input and output options:
-a, --echo-all echo all input from script
-e, --echo-queries echo commands sent to server
-E, --echo-hidden display queries that internal commands generate
-L, --log-file=FILENAME send session log to file
-n, --no-readline disable enhanced command line editing (readline)
-o, --output=FILENAME send query results to file (or |pipe)
-q, --quiet run quietly (no messages, only query output)
-s, --single-step single-step mode (confirm each query)
-S, --single-line single-line mode (end of line terminates SQL command)
Output format options:
-A, --no-align unaligned table output mode
-F, --field-separator=STRING set field separator (default: "|")
-H, --html HTML table output mode
-P, --pset=VAR[=ARG] set printing option VAR to ARG (see \pset command)
-R, --record-separator=STRING
set record separator (default: newline)
-t, --tuples-only print rows only
-T, --table-attr=TEXT set HTML table tag attributes (e.g., width, border)
-x, --expanded turn on expanded table output
Connection options:
-h, --host=HOSTNAME database server host or socket directory (default: "sandbox.hortonworks.com")
-p, --port=PORT database server port (default: "10432")
-U, --username=USERNAME database user name (default: "gpadmin")
-w, --no-password never prompt for password
-W, --password force password prompt (should happen automatically)
For more information, type "\?" (for internal commands) or "\help" (for SQL
commands) from within psql, or consult the psql section in the PostgreSQL
documentation.
Report bugs to <pgsql-bugs@postgresql.org>.
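As an aside, several of these options combine nicely for scripting. Here is a minimal sketch (assuming the sandbox host and the mydemo database from this article) that uses -A and -t to produce unaligned, headerless output that is easy to consume from shell scripts:
[gpadmin@sandbox ~]$ psql -h sandbox.hortonworks.com -p 10432 -d mydemo -A -t -c "select version();"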
Some of the key environment variables required to connect are PGPORT (-p), PGHOST (-h), and PGDATABASE (-d). We can verify that all of these variables are set:
[gpadmin@sandbox ~]$ env | egrep 'PGHOST|PGPORT|PGDATABASE'
PGPORT=10432
PGDATABASE=mydemo
PGHOST=sandbox.hortonworks.com
NOTE: Having these variables configured is a matter of convenience; otherwise you would end up typing them with every connection. Here is how we would need to connect if these variables were not set:
[gpadmin@sandbox ~]$ psql -h 172.16.105.137 -p 10432 -d mydemo
psql (8.2.15)
Type "help" for help.
mydemo=#
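To make these settings persistent across logins, here is a minimal sketch of exporting them from the shell profile, assuming bash and the values shown above:
[gpadmin@sandbox ~]$ echo 'export PGHOST=sandbox.hortonworks.com' >> ~/.bashrc
[gpadmin@sandbox ~]$ echo 'export PGPORT=10432' >> ~/.bashrc
[gpadmin@sandbox ~]$ echo 'export PGDATABASE=mydemo' >> ~/.bashrc
[gpadmin@sandbox ~]$ source ~/.bashrc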
We can also execute commands directly from the shell without entering the psql prompt:
[gpadmin@sandbox ~]$ psql -h 172.16.105.137 -p 10432 -d mydb -c "select * from mytable limit 5"
col1 | col2 | col3
------+------+------
0 | 0 | 0
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
(5 rows)
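Combining -c with the output format switches turns psql into a quick export tool. Here is a sketch (the output path is an assumption) that writes the same rows as comma-separated values:
[gpadmin@sandbox ~]$ psql -h 172.16.105.137 -p 10432 -d mydb -A -t -F',' -o /tmp/mytable.csv -c "select * from mytable limit 5"
[gpadmin@sandbox ~]$ head -2 /tmp/mytable.csv
0,0,0
1,1,1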
NOTE: We can familiarize ourselves with the utilities and syntax later on, since the sandbox is meant for trying out and breaking things. SQL statements are terminated with a semicolon ";", similar to Hive.
Accessing help from within the psql prompt requires you to type "\?" i.e., a backslash followed by a question mark
[gpadmin@sandbox ~]$ psql
psql (8.2.15)
Type "help" for help.
mydemo=# \?
General
\copyright show PostgreSQL usage and distribution terms
\g [FILE] or ; execute query (and send results to file or |pipe)
\h [NAME] help on syntax of SQL commands, * for all commands
\q quit psql
Query Buffer
\e [FILE] edit the query buffer (or file) with external editor
\ef [FUNCNAME] edit function definition with external editor
\p show the contents of the query buffer
\r reset (clear) the query buffer
\s [FILE] display history or save it to file
\w FILE write query buffer to file
Input/Output
\copy ... perform SQL COPY with data stream to the client host
\echo [STRING] write string to standard output
\i FILE execute commands from file
\o [FILE] send all query results to file or |pipe
\qecho [STRING] write string to query output stream (see \o)
Informational
(options: S = show system objects, + = additional detail)
\d[S+] list tables, views, and sequences
\d[S+] NAME describe table, view, sequence, or index
\da[S] [PATTERN] list aggregates
......
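Of these, \copy is worth a quick demonstration since it moves data between a table and a file on the client side. Here is a minimal sketch against the mytable relation from earlier (the output path is an assumption):
mydb=# \copy mytable to '/tmp/mytable_backup.csv' csv
The reverse direction, \copy mytable from 'filename', loads data back into the table the same way.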
There are a number of switches that can help; we will look at the most common ones. Here is the shortcut to display user tables within the current schema search path:
mydb=# \dt
List of relations
Schema | Name | Type | Owner | Storage
--------+---------+-------+---------+-------------
public | mytable | table | gpadmin | append only
(1 row)
mydb=#
NOTE: To display tables from all the schemas within a database, we can use wildcard characters:
mydb=# \dt *.*
List of relations
Schema | Name | Type | Owner | Storage
--------------------+-------------------------------+-------+---------+-------------
information_schema | sql_features | table | gpadmin | heap
information_schema | sql_implementation_info | table | gpadmin | heap
information_schema | sql_languages | table | gpadmin | heap
information_schema | sql_packages | table | gpadmin | heap
information_schema | sql_parts | table | gpadmin | heap
information_schema | sql_sizing | table | gpadmin | heap
information_schema | sql_sizing_profiles | table | gpadmin | heap
To display schemas, better known as namespaces, we can use "\dn":
mydb=# \dn
List of schemas
Name | Owner
--------------------+---------
hawq_toolkit | gpadmin
information_schema | gpadmin
pg_aoseg | gpadmin
pg_bitmapindex | gpadmin
pg_catalog | gpadmin
pg_toast | gpadmin
public | gpadmin
(7 rows)
mydb=#
To display a list of all the databases within a cluster, we can use "\l"
mydb=# \l
List of databases
Name | Owner | Encoding | Access privileges
-----------+---------+----------+-------------------
mydb | gpadmin | UTF8 |
mydemo | gpadmin | UTF8 |
postgres | gpadmin | UTF8 |
template0 | gpadmin | UTF8 |
template1 | gpadmin | UTF8 |
(5 rows)
mydb=#
Viewing the list of users within a cluster: users are independent of databases, unlike in Oracle. Each user can have access to multiple databases, schemas, and objects. To display all the users and roles, we can use "\du":
mydb=# \du
List of roles
Role name | Attributes | Member of
-----------+-----------------------------------+-----------
gpadmin | Superuser, Create role, Create DB |
mydb=#
There are multiple SQL commands within the psql shell, like SELECT, CREATE, GRANT, and so on. To read the help for these commands, "\h", i.e., a backslash followed by "h", prints the details. Here is an example:
mydb=# \h create user
Command: CREATE USER
Description: define a new database role
Syntax:
CREATE USER name [ [ WITH ] option [ ... ] ]
where option can be:
SUPERUSER | NOSUPERUSER
| CREATEDB | NOCREATEDB
| CREATEROLE | NOCREATEROLE
| CREATEUSER | NOCREATEUSER
| INHERIT | NOINHERIT
| LOGIN | NOLOGIN
| CONNECTION LIMIT connlimit
| [ ENCRYPTED | UNENCRYPTED ] PASSWORD 'password'
| VALID UNTIL 'timestamp'
| IN ROLE rolename [, ...]
| IN GROUP rolename [, ...]
| ROLE rolename [, ...]
| ADMIN rolename [, ...]
| USER rolename [, ...]
| SYSID uid
| RESOURCE QUEUE queuename
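Putting this syntax to work, here is a minimal sketch of creating a role (the user name and password are hypothetical):
mydb=# CREATE USER finance_user WITH PASSWORD 'changeme' CREATEDB CONNECTION LIMIT 5;
CREATE ROLE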
mydb=# \h alter database
Command: ALTER DATABASE
Description: change a database
Syntax:
ALTER DATABASE name [ [ WITH ] option [ ... ] ]
where option can be:
CONNECTION LIMIT connlimit
ALTER DATABASE name SET parameter { TO | = } { value | DEFAULT }
ALTER DATABASE name RESET parameter
ALTER DATABASE name RENAME TO newname
ALTER DATABASE name OWNER TO new_owner
mydb=#
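And a matching sketch for ALTER DATABASE, setting a per-database default for the search_path parameter (the schema names are assumptions):
mydb=# ALTER DATABASE mydb SET search_path TO finance, public;
ALTER DATABASE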
NOTE: The total number of key catalog tables in HAWQ is close to 100. This gives users the option to write their own views to sort users, objects, and relations, rather than learning thousands of predefined views. Again, playing around with the utility within the sandbox can help users get familiar with the syntax.
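As an illustration of that idea, here is a hedged sketch of a custom view over the pg_catalog tables that lists user tables along with their schema and owner (the view name is hypothetical):
mydb=# CREATE VIEW my_user_tables AS
mydb-# SELECT n.nspname AS schema_name, c.relname AS table_name, r.rolname AS owner
mydb-# FROM pg_class c
mydb-# JOIN pg_namespace n ON n.oid = c.relnamespace
mydb-# JOIN pg_roles r ON r.oid = c.relowner
mydb-# WHERE c.relkind = 'r' AND n.nspname NOT IN ('pg_catalog', 'information_schema');
CREATE VIEW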
06-15-2016
07:20 PM
Goals
Set up HDB 2.0 on the HDP 2.4.0.0 Sandbox
Access HDB 2.0 via pgAdmin3 for interactive access
Notes
This effort is to get up and running with HDB 2.0 on the Hortonworks 2.4.0.0 tryout sandbox. The steps mentioned here are not intended for production usage and should merely be used as reference.
HDB, a.k.a. HAWQ, will eventually be integrated as a service similar to other add-ons like Hive, HBase, etcetera.
This setup was completed using the HDP 2.4.0.0 sandbox, which can be downloaded here. The article assumes that the sandbox is up and running on VMware Fusion or VirtualBox.
Reference for this article: https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install
For the purpose of convenience, we will use HAWQ as the term for HDB 2.0.
Installing HAWQ on HDP 2.4.0.0
HAWQ 2.0 is one of the latest releases from Pivotal and can be configured on other versions of HDP 2.x, as reflected in the articles here:
https://community.hortonworks.com/content/kbentry/20420/install-apache-hawq-on-hdp-234.html
https://community.hortonworks.com/content/kbentry/34193/install-hdb-hawq-via-ambari-and-use-zeppelin-for-v.html
Log in to the sandbox via Terminal if you are using OS X, or via PuTTY, and ensure that you have superuser privileges. Once set, create a directory for dependencies and other binaries that we will use throughout this article:
[root@sandbox ~]# mkdir -p /stage
Upgrade the sandbox to avoid any dependency issues:
[root@sandbox stage]# yum update
Log in to Pivotal's binary download portal via network.pivotal.io, download the following binaries, and copy them over to the Hortonworks Sandbox virtual machine:
hdb-ambari-plugin-2.0.0-448.tar.gz
hdb-2.0.0.0-22126.tar.gz
Uncompress and untar the directories and set up the repositories:
[root@sandbox stage]# ls -lrth
total 146M
-rw-r--r-- 1 root root 25K Jun 15 17:30 hdb-ambari-plugin-2.0.0-448.tar.gz
-rw-r--r-- 1 root root 146M Jun 15 17:30 hdb-2.0.0.0-22126.tar.gz
[root@sandbox stage]# tar -xzf hdb-2.0.0.0-22126.tar.gz
[root@sandbox stage]# tar -xzf hdb-ambari-plugin-2.0.0-448.tar.gz
[root@sandbox stage]# bash hdb-2.0.0.0/setup_repo.sh
HDB Repo file successfully created at /etc/yum.repos.d/HDB.repo.
Use http://sandbox.hortonworks.com/HDB to access the repository.
[root@sandbox stage]#
[root@sandbox stage]# bash hdb-ambari-plugin-2.0.0/setup_repo.sh
HDB-AMBARI-PLUGIN Repo file successfully created at /etc/yum.repos.d/HDB-AMBARI-PLUGIN.repo.
Use http://sandbox.hortonworks.com/HDB-AMBARI-PLUGIN to access the repository.
Verify that the setup is configured for HAWQ as well as the Ambari plugin:
[root@sandbox stage]# yum provides hdb\*
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
* base: mirrors.lga7.us.voxel.net
* epel: mirror.steadfast.net
* extras: pubmirrors.dal.corespace.com
* updates: mirrors.cmich.edu
hdb-ambari-plugin-2.0.0-448.noarch : hdb-ambari-plugin
Repo : HDB-AMBARI-PLUGIN
Matched from:
Other : hdb-ambari-plugin = 2.0.0-448
[root@sandbox stage]# yum provides hawq
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
* base: mirrors.lga7.us.voxel.net
* epel: mirror.steadfast.net
* extras: pubmirrors.dal.corespace.com
* updates: mirrors.cmich.edu
hawq-2.0.0.0-22126.x86_64 : Pivotal HDB, Hadoop Native SQL powered by Apache HAWQ (incubating)
Repo : HDB
Matched from:
Install the hdb-ambari-plugin for HAWQ:
[root@sandbox stage]# yum install hdb-ambari-plugin
Log in to the Ambari web portal and verify that HAWQ is available as a service, which can be added just like any other service.
Add this custom property to hdfs-site.xml via Ambari, with the value set to true:
dfs.allow.truncate
Restart the HDFS service via Ambari.
Proceed with adding HAWQ via Ambari as a new service. During the "Customize Services" phase, enter port number 10432, or anything beyond the Linux reserved ports, as 5432 is reserved by Ambari for storing its metadata in a postgres database. Proceed with configuration and deploy; the setup should complete, albeit with warnings.
NOTE: HAWQ tries to initialize the cluster with default/hardcoded parallel connections and shared buffers, which are 3000 and 4000 by default. Manually initialize HAWQ from the command line, reducing shared_buffers and max_connections, as the gpadmin user:
[root@sandbox stage]# su - gpadmin
The command below should bring up the cluster, which can then be tested and tried out:
[gpadmin@sandbox ~]$ hawq init cluster --max_connections 15 --shared_buffers 500
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-Prepare to do 'hawq init'
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-You can find log in:
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_init_20160615.log
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-GPHOME is set to:
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-/usr/local/hawq/.
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init hawq with args: ['init', 'cluster']
Continue with HAWQ init Yy|Nn (default=N):
> y
20160615:18:32:25:046085 hawq_init:sandbox:gpadmin-[INFO]:-No standby host configured, skip it
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-Check if hdfs path is available
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[WARNING]:-2016-06-15 18:32:26.024369, p46198, th140320952715232, WARNING the number of nodes in pipeline is 1 [sandbox.hortonworks.com(172.16.105.137)], is less than the expected number of replica 3 for block [block pool ID: BP-267552868-172.16.137.143-1457691099567 block ID 1073742404_1585] file /hawq_default/testFile
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-1 segment hosts defined
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-Set default_hash_table_bucket_number as: 6
20160615:18:32:31:046085 hawq_init:sandbox:gpadmin-[INFO]:-Start to init master node: 'sandbox.hortonworks.com'
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-20160615:18:32:39:046409 hawqinit.sh:sandbox:gpadmin-[INFO]:-Loading hawq_toolkit...
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Master init successfully
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init segments in list: ['sandbox.hortonworks.com']
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Total segment number is: 1
.........
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-1 of 1 segments init successfully
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-Segments init successfully on nodes '['sandbox.hortonworks.com']'
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init HAWQ cluster successfully
Verify HAWQ database access by creating a database and a table within it:
[gpadmin@sandbox ~]$ psql template1
psql (8.2.15)
Type "help" for help.
template1=# create database mydb;
CREATE DATABASE
template1=# \c mydb
You are now connected to database "mydb" as user "gpadmin".
mydb=#
mydb=#
mydb=# CREATE TABLE mytable (col1 int, col2 int, col3 int);
CREATE TABLE
mydb=# INSERT INTO mytable select i,i,i from generate_series(0,1000)i;
INSERT 0 1001
mydb=# SELECT count(*) from mytable;
count
-------
1001
(1 row)
mydb=#
At this point, the HAWQ cluster should be manageable via Ambari.
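As a quick sanity check from the command line, the hawq management utility can also report the cluster status as the gpadmin user; a minimal sketch:
[gpadmin@sandbox ~]$ hawq state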
06-15-2016
01:22 AM
2 Kudos
HAWQ is the Greenplum database/warehouse engine implemented on top of the Hadoop filesystem. It gives an edge to understand how relations are stored within a Postgres or Greenplum database (since Greenplum is based on Postgres, of course with certain optimizations to make it work in parallel and with a master/slave architecture).
Unlike Hive, a schema is an object within a database. In HAWQ's world, this is the setup:
HAWQ Cluster: a combination of the HAWQ master, standby master, and segment hosts. All of these machines are physical.
MASTER: a postgres master process (a Linux process) that can share a host with the NameNode, depending on the configuration and capacity of the NameNode.
STANDBY: a replica of the MASTER process, including the logs and metadata information, transferred via log shipping to ensure near-realtime replication. It can share a host with the standby/secondary NameNode.
SEGMENT: also known as compute nodes, these are physical hosts which can run one or more postgres processes, termed "instances", each allocated its own storage space, memory, and CPU. The segment instances can share hosts with DataNodes, depending on the workload.
Database
A database is a logical unit within the HAWQ cluster. One cluster can have more than one database; however, keeping the count low makes it easier to manage resources and maintenance. The orange container within the cluster diagram represents a database. This container segregates the data specific to one database within a cluster.
The container, or resource, located on the master and standby servers contains metadata information, which is crucial and used while generating optimized plans to execute a query. There is a logical container even at the segment servers, which helps segments segregate the database at the segment level.
Schema & Tables
A schema is a logical unit within each database, i.e., a schema is private to a database. A simple example to understand this:
a database is like a school and schemas are like the standards (grade levels) within it. In real-life scenarios, a schema can contain tables specific to a department, for instance Finance, Marketing, or Sales. The figure below describes the layout of how tables are located and pinned within each schema.
Here is an actual demonstration of how all of this looks together when accessing a HAWQ cluster using the CLI. PSQL, or rather "psql", is a tool similar to the mysql binary, used to access the HAWQ cluster from the command line.
Here is how you can log in to the HAWQ server:
[gpadmin@sandbox ~]$ psql -h 172.16.105.136 -p 10432 test
psql (8.2.15)
Type "help" for help.
test=#
Here, psql is the client. "-h" represents the host on which we have installed the cluster, "-p" represents the port on which the master is listening for clients/connections, and finally "test" is the name of the database. So in this case, test is our database. We can now check the schemas within the database; there are some metadata schemas which are present as soon as a database is created. Let's look at the schema list for test:
test=# \dn
List of schemas
Name | Owner
--------------------+---------
hawq_toolkit | gpadmin
information_schema | gpadmin
pg_aoseg | gpadmin
pg_bitmapindex | gpadmin
pg_catalog | gpadmin
pg_toast | gpadmin
public | gpadmin
(7 rows)
test=#
These seven schemas are metadata schemas and should not be used for user objects, with the exception of public. Creating a schema creates a logical separator for objects, which is crucial for maintaining data governance and separation. Here is an example:
test=# create schema finance;
CREATE SCHEMA
test=# \dn
List of schemas
Name | Owner
--------------------+---------
finance | gpadmin <<<<<<<<<<<<<<<
hawq_toolkit | gpadmin
information_schema | gpadmin
pg_aoseg | gpadmin
pg_bitmapindex | gpadmin
pg_catalog | gpadmin
pg_toast | gpadmin
public | gpadmin
(8 rows)
test=#
We created a schema "finance", which is now listed once we type "\dn", short for display namespace. We will now create a table within this schema and see how it is laid out:
test=# create table finance.testtable (col1 int, col2 int, col3 double precision, col4 text);
CREATE TABLE
test=# \dt finance.testtable
List of relations
Schema | Name | Type | Owner | Storage
---------+-----------+-------+---------+-------------
finance | testtable | table | gpadmin | append only
(1 row)
test=# \d finance.testtable
Append-Only Table "finance.testtable"
Column | Type | Modifiers
--------+------------------+-----------
col1 | integer |
col2 | integer |
col3 | double precision |
col4 | text |
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Distributed randomly
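The "Distributed randomly" line at the bottom reflects the table's distribution policy. As a hedged sketch, a table can instead be distributed on a hash key, which helps colocate joins on that key (the table name and columns are hypothetical):
test=# create table finance.facts (id int, amount double precision) distributed by (id);
CREATE TABLE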
06-03-2016
07:50 PM
I am able to reproduce this via the CLI as well. For a refresher: I am working on a sandbox downloaded from the Hortonworks site. When I manually specify HADOOP_CLASSPATH and HADOOP_CLIENT_OPTS via the command line, I get it to work. I got the classpath and hadoop client opts using "pig -printCmdDebug -x tez -useHCatalog"; there are indeed some additional jars included in the hadoop classpath and client opts.
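For anyone trying to reproduce the workaround, here is a rough sketch of the approach; the exported values below are placeholders, to be replaced with the actual classpath and opts printed by -printCmdDebug:
[root@sandbox ~]# pig -printCmdDebug -x tez -useHCatalog
[root@sandbox ~]# export HADOOP_CLASSPATH="<classpath printed above>"
[root@sandbox ~]# export HADOOP_CLIENT_OPTS="<client opts printed above>"
[root@sandbox ~]# pig -x tez -useHCatalog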
06-03-2016
04:09 PM
Tried that one, didn't work. I am trying to use a system DSN, by the way; however, I do not suppose it really matters. The error is the same; it looks like it didn't even try to look at the new file.