Created on 06-15-2016 01:22 AM - edited 08-17-2019 12:01 PM
HAWQ is the Greenplum Database/Warehouse engine implemented on top of Hadoop Filesystem. To know GreenplumIt gives an edge to understand how relations are stored within Postgres OR Greenplum database (since Greenplum is based on Postgres, of course with certain optimizations to make it work in parallel and have a master/slave architecture).
Unlike hive, a schema is an object within a database. In HAWQs world, this is the setup.
HAWQ Cluster - Combination of HAWQ Master, Standby Master and Segment Hosts. All these machines are physical
MASTER: Is a postgres master process, linux process, that can share the space with name node depending on the configuration and capacity of the Namenode.
STANDBY: Is a replica of MASTER process which includes the logs and metadata information, transferred via log shipping ensuring near realtime replication. Can share the space with Standby/Secondary Namenode host.
SEGMENT: Also known as compute nodes, are physical hosts which can host one or more processes termed as "instances" of postgres process each allocated with their own storage space, memory, CPU. The Segment instances can share the pace with data nodes, depending on the workload.
Is a logical unit within HAWQ Cluster. One cluster can have more than one database, however, keeping this within a lower limit makes it easier to manage resources and maintenance. The orange container within the cluster is representation of a database. This container segregates the data specific to one database within a cluster.
The container OR resource that is located on master and standby servers contain metadata information which are crucial and used while generating optimized plans to execute a query. There is a logical container located even at segment servers which helps segments segregate the database at the segment level.
Schema & Tables
Schema is a logical unit within each database i.e., Schema is private to a database. A simple example to understand this would be something like Database is like a school and Schema is like Standards. In real life scenarios, the schema can contain tables specific to a department, for instance, Finance, Marketing, Sales etcetera. This figure below describes the layout of how tables are located, pinned within each schema.
Here is an actual demonstration of how all of these look together when accessing a HAWQ cluster using CLI. PSQL or better "psql" is tool similar to mysql binary to access the HAWQ cluster using command line.
Here is how you can login to the HAWQ server:
[gpadmin@sandbox ~]$ psql -h 172.16.105.136 -p 10432 test psql (8.2.15) Type "help" for help. test=#
Here, psql is the client. "-h" represents the host on which we have installed the cluster. "-p" represents the port on which master is listening for clients/connections and finally "test" is the name of the database. So in this case, test is our database.
We can now check the schemas within a database, there are some metadata schemas which are present as soon as a database is created, lets look at the schema list for test.
test=# \dn List of schemas Name | Owner --------------------+--------- hawq_toolkit | gpadmin information_schema | gpadmin pg_aoseg | gpadmin pg_bitmapindex | gpadmin pg_catalog | gpadmin pg_toast | gpadmin public | gpadmin (7 rows) test=#
These seven schemas are metadata schemas, should not be used for user objects except public. Creating an schema would create a logical separator for objects which are crucial to maintaining data governance and separation. Here is an example.
test=# create schema finance; CREATE SCHEMA test=# \dn List of schemas Name | Owner --------------------+--------- finance | gpadmin <<<<<<<<<<<<<<< hawq_toolkit | gpadmin information_schema | gpadmin pg_aoseg | gpadmin pg_bitmapindex | gpadmin pg_catalog | gpadmin pg_toast | gpadmin public | gpadmin (8 rows) test=#
We create a schema "finance" which is now listed once we type "\dn" short for display namespace. We will now create table within this schema and see how its laid out.
test=# create table finance.testtable (col1 int, col2 int, col3 double precision, col4 text); CREATE TABLE test=# \dt finance.testtable List of relations Schema | Name | Type | Owner | Storage ---------+-----------+-------+---------+------------- finance | testtable | table | gpadmin | append only (1 row) test=# \d finance.testtable Append-Only Table "finance.testtable" Column | Type | Modifiers --------+------------------+----------- col1 | integer | col2 | integer | col3 | double precision | col4 | text | Compression Type: None Compression Level: 0 Block Size: 32768 Checksum: f Distributed randomly