Member since: 05-10-2016
Posts: 184
Kudos Received: 60
Solutions: 6
07-08-2016
03:47 PM
https://community.hortonworks.com/questions/37412/cannot-obtain-block-length-for-locatedblock.html
06-30-2016
06:06 PM
Can you share the error message and the Ambari server log files from "/var/log/ambari-server/"?
06-30-2016
02:59 PM
Hey Abhijeet, not sure if you have tried this: "pig -printCmdDebug -x tez -useHCatalog". Let me know if this does not work for you.
06-21-2016
01:12 AM
Goals
Get familiar with the psql command line (CLI) interface for HAWQ
Understand how to access and list the options available for the command line interface
Access help from within the CLI
NOTES
The requirements for the above are a sandbox configured with HAWQ, and
an SSH interface to the sandbox, either via the host/remote OS X terminal or PuTTY when working with Windows
Connecting to the Guest Sandbox Machine
Depending upon the setup, i.e., VirtualBox or VMware Fusion for HAWQ & HDP, you can choose the connection method. For connecting via VMware Fusion, you need to be aware of the IP address allocated to your virtual machine.
Connecting via VMware Fusion
HW13382:ODBC srai$ ssh root@172.16.105.137 -p 22
root@172.16.105.137's password:
Last login: Mon Jun 20 15:31:20 2016 from 172.16.105.1
[root@sandbox ~]#
Connecting via VirtualBox
HW13382:ODBC srai$ ssh root@localhost -p 2222
root@localhost's password:
Last login: Mon Jun 20 15:31:29 2016 from 172.16.105.1
[root@sandbox ~]#
NOTE: HAWQ has a preconfigured user account associated with it, similar to Hive, known as "gpadmin". By default, this is the superuser for the cluster and has all privileges. For our exercise, we will be using this account.
Switch from the root user account to the "gpadmin" user:
[root@sandbox ~]# su - gpadmin
[gpadmin@sandbox ~]$
Ensure that you have the binaries set in the environment; usually this is taken care of automatically during the installation.
[gpadmin@sandbox ~]$ env | egrep GPHOME
GPHOME=/usr/local/hawq/.
[gpadmin@sandbox ~]$ which psql
/usr/local/hawq/bin/psql
[gpadmin@sandbox ~]$
Now that we can locate the "psql" utility, let's list the available options:
[gpadmin@sandbox ~]$ psql -?
This is psql 8.2.15, the PostgreSQL interactive terminal (Greenplum version).
Usage:
psql [OPTION]... [DBNAME [USERNAME]]
General options:
-c, --command=COMMAND run only single command (SQL or internal) and exit
-d, --dbname=DBNAME database name to connect to (default: "gpadmin")
-f, --file=FILENAME execute commands from file, then exit
-l, --list list available databases, then exit
-v, --set=, --variable=NAME=VALUE set psql variable NAME to VALUE
-X, --no-psqlrc do not read startup file (~/.psqlrc)
-1 ("one"), --single-transaction execute command file as a single transaction
--help show this help, then exit
--version output version information, then exit
Input and output options:
-a, --echo-all echo all input from script
-e, --echo-queries echo commands sent to server
-E, --echo-hidden display queries that internal commands generate
-L, --log-file=FILENAME send session log to file
-n, --no-readline disable enhanced command line editing (readline)
-o, --output=FILENAME send query results to file (or |pipe)
-q, --quiet run quietly (no messages, only query output)
-s, --single-step single-step mode (confirm each query)
-S, --single-line single-line mode (end of line terminates SQL command)
Output format options:
-A, --no-align unaligned table output mode
-F, --field-separator=STRING set field separator (default: "|")
-H, --html HTML table output mode
-P, --pset=VAR[=ARG] set printing option VAR to ARG (see \pset command)
-R, --record-separator=STRING
set record separator (default: newline)
-t, --tuples-only print rows only
-T, --table-attr=TEXT set HTML table tag attributes (e.g., width, border)
-x, --expanded turn on expanded table output
Connection options:
-h, --host=HOSTNAME database server host or socket directory (default: "sandbox.hortonworks.com")
-p, --port=PORT database server port (default: "10432")
-U, --username=USERNAME database user name (default: "gpadmin")
-w, --no-password never prompt for password
-W, --password force password prompt (should happen automatically)
For more information, type "\?" (for internal commands) or "\help" (for SQL
commands) from within psql, or consult the psql section in the PostgreSQL
documentation.
Report bugs to <pgsql-bugs@postgresql.org>.
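As an aside, several of these options combine nicely for scripting. Here is a minimal sketch (assuming the sandbox host and the mydemo database from this article) that uses -A and -t to produce unaligned, headerless output that is easy to consume from shell scripts:
[gpadmin@sandbox ~]$ psql -h sandbox.hortonworks.com -p 10432 -d mydemo -A -t -c "select version();"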
Some of the key environment variables required to connect are PGPORT (-p), PGHOST (-h), and PGDATABASE (-d). We can verify that all of these variables are set:
[gpadmin@sandbox ~]$ env | egrep 'PGHOST|PGPORT|PGDATABASE'
PGPORT=10432
PGDATABASE=mydemo
PGHOST=sandbox.hortonworks.com
NOTE: Having these variables configured is a matter of convenience; otherwise you would end up typing them with every connection. Here is how we would need to connect if these variables were not set:
[gpadmin@sandbox ~]$ psql -h 172.16.105.137 -p 10432 -d mydemo
psql (8.2.15)
Type "help" for help.
mydemo=#
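To make these settings persistent across logins, here is a minimal sketch of exporting them from the shell profile, assuming bash and the values shown above:
[gpadmin@sandbox ~]$ echo 'export PGHOST=sandbox.hortonworks.com' >> ~/.bashrc
[gpadmin@sandbox ~]$ echo 'export PGPORT=10432' >> ~/.bashrc
[gpadmin@sandbox ~]$ echo 'export PGDATABASE=mydemo' >> ~/.bashrc
[gpadmin@sandbox ~]$ source ~/.bashrc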
We can also execute commands directly from the shell without entering the psql prompt:
[gpadmin@sandbox ~]$ psql -h 172.16.105.137 -p 10432 -d mydb -c "select * from mytable limit 5"
col1 | col2 | col3
------+------+------
0 | 0 | 0
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
(5 rows)
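Combining -c with the output format switches turns psql into a quick export tool. Here is a sketch (the output path is an assumption) that writes the same rows as comma-separated values:
[gpadmin@sandbox ~]$ psql -h 172.16.105.137 -p 10432 -d mydb -A -t -F',' -o /tmp/mytable.csv -c "select * from mytable limit 5"
[gpadmin@sandbox ~]$ head -2 /tmp/mytable.csv
0,0,0
1,1,1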
NOTE: We can familiarize ourselves with the utilities and syntax later on, since the sandbox is meant for trying out and breaking things. SQL statements are terminated with a semicolon ";", similar to Hive.
Accessing help from within the psql prompt requires you to type "\?" i.e., a backslash followed by a question mark
[gpadmin@sandbox ~]$ psql
psql (8.2.15)
Type "help" for help.
mydemo=# \?
General
\copyright show PostgreSQL usage and distribution terms
\g [FILE] or ; execute query (and send results to file or |pipe)
\h [NAME] help on syntax of SQL commands, * for all commands
\q quit psql
Query Buffer
\e [FILE] edit the query buffer (or file) with external editor
\ef [FUNCNAME] edit function definition with external editor
\p show the contents of the query buffer
\r reset (clear) the query buffer
\s [FILE] display history or save it to file
\w FILE write query buffer to file
Input/Output
\copy ... perform SQL COPY with data stream to the client host
\echo [STRING] write string to standard output
\i FILE execute commands from file
\o [FILE] send all query results to file or |pipe
\qecho [STRING] write string to query output stream (see \o)
Informational
(options: S = show system objects, + = additional detail)
\d[S+] list tables, views, and sequences
\d[S+] NAME describe table, view, sequence, or index
\da[S] [PATTERN] list aggregates
......
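Of these, \copy is worth a quick demonstration since it moves data between a table and a file on the client side. Here is a minimal sketch against the mytable relation from earlier (the output path is an assumption):
mydb=# \copy mytable to '/tmp/mytable_backup.csv' csv
The reverse direction, \copy mytable from 'filename', loads data back into the table the same way.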
There are a number of switches that can help; we will look at the most common ones. Here is the shortcut to display user tables within the current schema search path:
mydb=# \dt
List of relations
Schema | Name | Type | Owner | Storage
--------+---------+-------+---------+-------------
public | mytable | table | gpadmin | append only
(1 row)
mydb=#
NOTE: To display tables from all the schemas within a database, we can use wildcard characters:
mydb=# \dt *.*
List of relations
Schema | Name | Type | Owner | Storage
--------------------+-------------------------------+-------+---------+-------------
information_schema | sql_features | table | gpadmin | heap
information_schema | sql_implementation_info | table | gpadmin | heap
information_schema | sql_languages | table | gpadmin | heap
information_schema | sql_packages | table | gpadmin | heap
information_schema | sql_parts | table | gpadmin | heap
information_schema | sql_sizing | table | gpadmin | heap
information_schema | sql_sizing_profiles | table | gpadmin | heap
To display schemas, better known as namespaces, we can use "\dn":
mydb=# \dn
List of schemas
Name | Owner
--------------------+---------
hawq_toolkit | gpadmin
information_schema | gpadmin
pg_aoseg | gpadmin
pg_bitmapindex | gpadmin
pg_catalog | gpadmin
pg_toast | gpadmin
public | gpadmin
(7 rows)
mydb=#
To display a list of all the databases within a cluster, we can use "\l"
mydb=# \l
List of databases
Name | Owner | Encoding | Access privileges
-----------+---------+----------+-------------------
mydb | gpadmin | UTF8 |
mydemo | gpadmin | UTF8 |
postgres | gpadmin | UTF8 |
template0 | gpadmin | UTF8 |
template1 | gpadmin | UTF8 |
(5 rows)
mydb=#
Viewing the list of users within a cluster: users are independent of databases, unlike in Oracle. Each user can have access to multiple databases, schemas, and objects. To display all the users and roles, we can use "\du":
mydb=# \du
List of roles
Role name | Attributes | Member of
-----------+-----------------------------------+-----------
gpadmin | Superuser, Create role, Create DB |
mydb=#
There are multiple SQL commands within the psql shell, like SELECT, CREATE, GRANT, and so on. To read the help for these commands, "\h", i.e., a backslash followed by "h", prints the details. Here is an example:
mydb=# \h create user
Command: CREATE USER
Description: define a new database role
Syntax:
CREATE USER name [ [ WITH ] option [ ... ] ]
where option can be:
SUPERUSER | NOSUPERUSER
| CREATEDB | NOCREATEDB
| CREATEROLE | NOCREATEROLE
| CREATEUSER | NOCREATEUSER
| INHERIT | NOINHERIT
| LOGIN | NOLOGIN
| CONNECTION LIMIT connlimit
| [ ENCRYPTED | UNENCRYPTED ] PASSWORD 'password'
| VALID UNTIL 'timestamp'
| IN ROLE rolename [, ...]
| IN GROUP rolename [, ...]
| ROLE rolename [, ...]
| ADMIN rolename [, ...]
| USER rolename [, ...]
| SYSID uid
| RESOURCE QUEUE queuename
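Putting this syntax to work, here is a minimal sketch of creating a role (the user name and password are hypothetical):
mydb=# CREATE USER finance_user WITH PASSWORD 'changeme' CREATEDB CONNECTION LIMIT 5;
CREATE ROLE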
mydb=# \h alter database
Command: ALTER DATABASE
Description: change a database
Syntax:
ALTER DATABASE name [ [ WITH ] option [ ... ] ]
where option can be:
CONNECTION LIMIT connlimit
ALTER DATABASE name SET parameter { TO | = } { value | DEFAULT }
ALTER DATABASE name RESET parameter
ALTER DATABASE name RENAME TO newname
ALTER DATABASE name OWNER TO new_owner
mydb=#
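And a matching sketch for ALTER DATABASE, setting a per-database default for the search_path parameter (the schema names are assumptions):
mydb=# ALTER DATABASE mydb SET search_path TO finance, public;
ALTER DATABASE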
NOTE: The total number of key catalog tables in HAWQ is close to 100. This gives users the option to write their own views to sort users, objects, and relations, rather than learning thousands of predefined views. Again, playing around with the utility within the sandbox can help users get familiar with the syntax.
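As an illustration of that idea, here is a hedged sketch of a custom view over the pg_catalog tables that lists user tables along with their schema and owner (the view name is hypothetical):
mydb=# CREATE VIEW my_user_tables AS
mydb-# SELECT n.nspname AS schema_name, c.relname AS table_name, r.rolname AS owner
mydb-# FROM pg_class c
mydb-# JOIN pg_namespace n ON n.oid = c.relnamespace
mydb-# JOIN pg_roles r ON r.oid = c.relowner
mydb-# WHERE c.relkind = 'r' AND n.nspname NOT IN ('pg_catalog', 'information_schema');
CREATE VIEW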
06-15-2016
07:20 PM
Goals
Set up HDB 2.0 on the HDP 2.4.0.0 Sandbox
Access HDB 2.0 via pgAdmin3 for interactive access
Notes
This effort is to get up and running with HDB 2.0 on the Hortonworks 2.4.0.0 tryout sandbox. The steps mentioned here are not intended for production usage and should merely be used as reference.
HDB, a.k.a. HAWQ, will eventually be integrated as a service similar to other add-ons like Hive, HBase, etcetera.
This setup was completed using the HDP 2.4.0.0 sandbox, which can be downloaded here. The article assumes that the sandbox is up and running on VMware Fusion or VirtualBox.
Reference for this article: https://cwiki.apache.org/confluence/display/HAWQ/Build+and+Install
For the purpose of convenience, we will use HAWQ as the term for HDB 2.0.
Installing HAWQ on HDP 2.4.0.0
HAWQ 2.0 is one of the latest releases from Pivotal and can be configured on other versions of HDP 2.x, as reflected in the articles here:
https://community.hortonworks.com/content/kbentry/20420/install-apache-hawq-on-hdp-234.html
https://community.hortonworks.com/content/kbentry/34193/install-hdb-hawq-via-ambari-and-use-zeppelin-for-v.html
Log in to the sandbox via Terminal if you are using OS X, or via PuTTY, and ensure that you have superuser privileges. Once set, create a directory for dependencies and other binaries that we will use throughout this article:
[root@sandbox ~]# mkdir -p /stage
Upgrade the sandbox to avoid any dependency issues:
[root@sandbox stage]# yum update
Log in to Pivotal's binary download portal via network.pivotal.io, download the following binaries, and copy them over to the Hortonworks Sandbox virtual machine:
hdb-ambari-plugin-2.0.0-448.tar.gz
hdb-2.0.0.0-22126.tar.gz
Uncompress and untar the directories and set up the repositories:
[root@sandbox stage]# ls -lrth
total 146M
-rw-r--r-- 1 root root 25K Jun 15 17:30 hdb-ambari-plugin-2.0.0-448.tar.gz
-rw-r--r-- 1 root root 146M Jun 15 17:30 hdb-2.0.0.0-22126.tar.gz
[root@sandbox stage]# tar -xzf hdb-2.0.0.0-22126.tar.gz
[root@sandbox stage]# tar -xzf hdb-ambari-plugin-2.0.0-448.tar.gz
[root@sandbox stage]# bash hdb-2.0.0.0/setup_repo.sh
HDB Repo file successfully created at /etc/yum.repos.d/HDB.repo.
Use http://sandbox.hortonworks.com/HDB to access the repository.
[root@sandbox stage]#
[root@sandbox stage]# bash hdb-ambari-plugin-2.0.0/setup_repo.sh
HDB-AMBARI-PLUGIN Repo file successfully created at /etc/yum.repos.d/HDB-AMBARI-PLUGIN.repo.
Use http://sandbox.hortonworks.com/HDB-AMBARI-PLUGIN to access the repository.
Verify that the setup is configured for HAWQ as well as the Ambari plugin:
[root@sandbox stage]# yum provides hdb\*
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
* base: mirrors.lga7.us.voxel.net
* epel: mirror.steadfast.net
* extras: pubmirrors.dal.corespace.com
* updates: mirrors.cmich.edu
hdb-ambari-plugin-2.0.0-448.noarch : hdb-ambari-plugin
Repo : HDB-AMBARI-PLUGIN
Matched from:
Other : hdb-ambari-plugin = 2.0.0-448
[root@sandbox stage]# yum provides hawq
Loaded plugins: fastestmirror, priorities
Loading mirror speeds from cached hostfile
* base: mirrors.lga7.us.voxel.net
* epel: mirror.steadfast.net
* extras: pubmirrors.dal.corespace.com
* updates: mirrors.cmich.edu
hawq-2.0.0.0-22126.x86_64 : Pivotal HDB, Hadoop Native SQL powered by Apache HAWQ (incubating)
Repo : HDB
Matched from:
Install the hdb-ambari-plugin for HAWQ:
[root@sandbox stage]# yum install hdb-ambari-plugin
Log in to the Ambari web portal and verify that HAWQ is available as a service, which can be added just like any other service.
Add this custom property to hdfs-site.xml via Ambari, with the value set to true:
dfs.allow.truncate
Restart the HDFS service via Ambari.
Proceed with adding HAWQ via Ambari as a new service. During the "Customize Services" phase, enter port number 10432, or anything beyond the Linux reserved ports, as 5432 is reserved by Ambari for storing its metadata in a postgres database. Proceed with configuration and deploy; the setup should complete, albeit with warnings.
NOTE: HAWQ tries to initialize the cluster with default/hardcoded parallel connections and shared buffers, which are 3000 and 4000 by default. Manually initialize HAWQ from the command line, reducing shared_buffers and max_connections, as the gpadmin user:
[root@sandbox stage]# su - gpadmin
The command below should bring up the cluster, which can then be tested and tried out:
[gpadmin@sandbox ~]$ hawq init cluster --max_connections 15 --shared_buffers 500
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-Prepare to do 'hawq init'
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-You can find log in:
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-/home/gpadmin/hawqAdminLogs/hawq_init_20160615.log
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-GPHOME is set to:
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-/usr/local/hawq/.
20160615:18:32:24:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init hawq with args: ['init', 'cluster']
Continue with HAWQ init Yy|Nn (default=N):
> y
20160615:18:32:25:046085 hawq_init:sandbox:gpadmin-[INFO]:-No standby host configured, skip it
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-Check if hdfs path is available
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[WARNING]:-2016-06-15 18:32:26.024369, p46198, th140320952715232, WARNING the number of nodes in pipeline is 1 [sandbox.hortonworks.com(172.16.105.137)], is less than the expected number of replica 3 for block [block pool ID: BP-267552868-172.16.137.143-1457691099567 block ID 1073742404_1585] file /hawq_default/testFile
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-1 segment hosts defined
20160615:18:32:26:046085 hawq_init:sandbox:gpadmin-[INFO]:-Set default_hash_table_bucket_number as: 6
20160615:18:32:31:046085 hawq_init:sandbox:gpadmin-[INFO]:-Start to init master node: 'sandbox.hortonworks.com'
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-20160615:18:32:39:046409 hawqinit.sh:sandbox:gpadmin-[INFO]:-Loading hawq_toolkit...
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Master init successfully
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init segments in list: ['sandbox.hortonworks.com']
20160615:18:32:40:046085 hawq_init:sandbox:gpadmin-[INFO]:-Total segment number is: 1
.........
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-1 of 1 segments init successfully
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-Segments init successfully on nodes '['sandbox.hortonworks.com']'
20160615:18:32:49:046085 hawq_init:sandbox:gpadmin-[INFO]:-Init HAWQ cluster successfully
Verify HAWQ database access by creating a database and a table within it:
[gpadmin@sandbox ~]$ psql template1
psql (8.2.15)
Type "help" for help.
template1=# create database mydb;
CREATE DATABASE
template1=# \c mydb
You are now connected to database "mydb" as user "gpadmin".
mydb=#
mydb=#
mydb=# CREATE TABLE mytable (col1 int, col2 int, col3 int);
CREATE TABLE
mydb=# INSERT INTO mytable select i,i,i from generate_series(0,1000)i;
INSERT 0 1001
mydb=# SELECT count(*) from mytable;
count
-------
1001
(1 row)
mydb=#
At this point, the HAWQ cluster should be manageable via Ambari.
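As a quick sanity check from the command line, the hawq management utility can also report the cluster status as the gpadmin user; a minimal sketch:
[gpadmin@sandbox ~]$ hawq state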
06-15-2016
01:22 AM
2 Kudos
HAWQ is the Greenplum database/warehouse engine implemented on top of the Hadoop filesystem. It gives an edge to understand how relations are stored within a Postgres or Greenplum database (since Greenplum is based on Postgres, of course with certain optimizations to make it work in parallel and with a master/slave architecture).
Unlike Hive, a schema is an object within a database. In HAWQ's world, this is the setup:
HAWQ Cluster: a combination of the HAWQ master, standby master, and segment hosts. All of these machines are physical.
MASTER: a postgres master process (a Linux process) that can share a host with the NameNode, depending on the configuration and capacity of the NameNode.
STANDBY: a replica of the MASTER process, including the logs and metadata information, transferred via log shipping to ensure near-realtime replication. It can share a host with the standby/secondary NameNode.
SEGMENT: also known as compute nodes, these are physical hosts which can run one or more postgres processes, termed "instances", each allocated its own storage space, memory, and CPU. The segment instances can share hosts with DataNodes, depending on the workload.
Database
A database is a logical unit within the HAWQ cluster. One cluster can have more than one database; however, keeping the count low makes it easier to manage resources and maintenance. The orange container within the cluster diagram represents a database. This container segregates the data specific to one database within a cluster.
The container, or resource, located on the master and standby servers contains metadata information, which is crucial and used while generating optimized plans to execute a query. There is a logical container even at the segment servers, which helps segments segregate the database at the segment level.
Schema & Tables
A schema is a logical unit within each database, i.e., a schema is private to a database. A simple example to understand this:
a database is like a school and schemas are like the standards (grade levels) within it. In real-life scenarios, a schema can contain tables specific to a department, for instance Finance, Marketing, or Sales. The figure below describes the layout of how tables are located and pinned within each schema.
Here is an actual demonstration of how all of this looks together when accessing a HAWQ cluster using the CLI. PSQL, or rather "psql", is a tool similar to the mysql binary, used to access the HAWQ cluster from the command line.
Here is how you can log in to the HAWQ server:
[gpadmin@sandbox ~]$ psql -h 172.16.105.136 -p 10432 test
psql (8.2.15)
Type "help" for help.
test=#
Here, psql is the client. "-h" represents the host on which we have installed the cluster, "-p" represents the port on which the master is listening for clients/connections, and finally "test" is the name of the database. So in this case, test is our database. We can now check the schemas within the database; there are some metadata schemas which are present as soon as a database is created. Let's look at the schema list for test:
test=# \dn
List of schemas
Name | Owner
--------------------+---------
hawq_toolkit | gpadmin
information_schema | gpadmin
pg_aoseg | gpadmin
pg_bitmapindex | gpadmin
pg_catalog | gpadmin
pg_toast | gpadmin
public | gpadmin
(7 rows)
test=#
These seven schemas are metadata schemas and should not be used for user objects, with the exception of public. Creating a schema creates a logical separator for objects, which is crucial for maintaining data governance and separation. Here is an example:
test=# create schema finance;
CREATE SCHEMA
test=# \dn
List of schemas
Name | Owner
--------------------+---------
finance | gpadmin <<<<<<<<<<<<<<<
hawq_toolkit | gpadmin
information_schema | gpadmin
pg_aoseg | gpadmin
pg_bitmapindex | gpadmin
pg_catalog | gpadmin
pg_toast | gpadmin
public | gpadmin
(8 rows)
test=#
We created a schema "finance", which is now listed once we type "\dn", short for display namespace. We will now create a table within this schema and see how it is laid out:
test=# create table finance.testtable (col1 int, col2 int, col3 double precision, col4 text);
CREATE TABLE
test=# \dt finance.testtable
List of relations
Schema | Name | Type | Owner | Storage
---------+-----------+-------+---------+-------------
finance | testtable | table | gpadmin | append only
(1 row)
test=# \d finance.testtable
Append-Only Table "finance.testtable"
Column | Type | Modifiers
--------+------------------+-----------
col1 | integer |
col2 | integer |
col3 | double precision |
col4 | text |
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: f
Distributed randomly
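The "Distributed randomly" line at the bottom reflects the table's distribution policy. As a hedged sketch, a table can instead be distributed on a hash key, which helps colocate joins on that key (the table name and columns are hypothetical):
test=# create table finance.facts (id int, amount double precision) distributed by (id);
CREATE TABLE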
06-03-2016
07:50 PM
I am able to reproduce this via the CLI as well. For a refresher: I am working on a sandbox downloaded from the Hortonworks site. When I manually specify HADOOP_CLASSPATH and HADOOP_CLIENT_OPTS via the command line, I get it to work. I got the classpath and hadoop client opts using "pig -printCmdDebug -x tez -useHCatalog"; there are indeed some additional jars included in the hadoop classpath and client opts.
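For anyone trying to reproduce the workaround, here is a rough sketch of the approach; the exported values below are placeholders, to be replaced with the actual classpath and opts printed by -printCmdDebug:
[root@sandbox ~]# pig -printCmdDebug -x tez -useHCatalog
[root@sandbox ~]# export HADOOP_CLASSPATH="<classpath printed above>"
[root@sandbox ~]# export HADOOP_CLIENT_OPTS="<client opts printed above>"
[root@sandbox ~]# pig -x tez -useHCatalog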
06-03-2016
04:09 PM
Tried that one, didn't work. I am trying to use a system DSN, by the way; however, I do not suppose it really matters. The error is the same; it looks like it didn't even try to look at the new file.