Member since: 09-29-2015
Posts: 286
Kudos Received: 598
Solutions: 60
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 8293 | 03-21-2017 07:34 PM
 | 1424 | 11-16-2016 04:18 AM
 | 781 | 10-18-2016 03:57 PM
 | 2620 | 09-12-2016 03:36 PM
 | 2650 | 08-25-2016 09:01 PM
04-13-2018
03:08 PM
@Dominika Bialek Looks like the Cloudbreak 2.5 docs were removed.
... View more
03-21-2017
07:34 PM
4 Kudos
-- Count NULLs in columns c1..c5 for each value of date_column in t1.
-- The windowed SUMs compute the per-date NULL counts on every row;
-- row_number() is used to keep just one row per date in the outer query.
WITH t1nulltest AS (
  SELECT date_column
    ,SUM(IF(c1 IS NULL, 1, 0)) OVER (PARTITION BY date_column) AS c1null
    ,SUM(IF(c2 IS NULL, 1, 0)) OVER (PARTITION BY date_column) AS c2null
    ,SUM(IF(c3 IS NULL, 1, 0)) OVER (PARTITION BY date_column) AS c3null
    ,SUM(IF(c4 IS NULL, 1, 0)) OVER (PARTITION BY date_column) AS c4null
    ,SUM(IF(c5 IS NULL, 1, 0)) OVER (PARTITION BY date_column) AS c5null
    ,ROW_NUMBER() OVER (PARTITION BY date_column) AS rowno
  FROM t1
)
SELECT date_column, c1null, c2null, c3null, c4null, c5null
FROM t1nulltest
WHERE rowno = 1;
... View more
03-03-2017
07:33 PM
Regarding #3: in this case, Flume is connecting to HBase via Phoenix JDBC. So the question is whether we need to do anything to secure the JDBC connection with SSL.
... View more
02-10-2017
05:21 AM
This was awesome, Tim.
... View more
02-06-2017
09:58 PM
No, it doesn't. You have to run additional scripts to delete HDP.
... View more
02-04-2017
10:48 PM
@Sunile Manjee Can you clearly state what you entered for zeppelin.jdbc.principal? Is this the Hive principal or the Zeppelin principal with its keytab?
Also, what exactly is in the URL for the JDBC interpreter? Just jdbc:hive2://HiveHost:10000/default;principal=hive/_HOST@MY-REALM.COM? Finally, did you have to copy the Hive JDBC jars or create softlinks to Zeppelin's /usr/hdp/current/zeppelin-server/interpreter/jdbc directory?
... View more
01-19-2017
04:10 PM
3 Kudos
HDB 2.1.1 References:
http://hdb.docs.pivotal.io/211
http://hdb.docs.pivotal.io/211/hdb/releasenotes/HAWQ211ReleaseNotes.html
http://hdb.docs.pivotal.io/211/hdb/install/install-ambari.html
Download HDB from Hortonworks at http://hortonworks.com/downloads/ or directly from Pivotal at https://network.pivotal.io/products/pivotal-hdb (you need to create a Pivotal account).
What to look out for
If you use only 1 master node, you cannot have both a HAWQ Master and a HAWQ Standby.
If you install the HAWQ Master on the same node as Ambari, you need to change the PostgreSQL port from 5432.
Install Prep
Ensure that httpd is installed:
yum install httpd
sudo service httpd status
sudo service httpd start
Get and Install Repo
Log onto Pivotal and download hdb-2.1.1.0-7.tar /* On Ambari Node */
1. mkdir /staging
2. chmod a+rx /staging
3. scp -i <<your key>> -o 'StrictHostKeyChecking=no' hdb-2.1.1.0-7.tar root@<<ambarinode>>:~/staging
4. tar -zxvf hdb-2.1.1.0-7.tar
cd /staging/hdb-2.1.1.0
./setup_repo.sh
/* You should see the message “hdb-2.1.1.0 Repo file successfully created at /etc/yum.repos.d/hdb-2.1.1.0.repo. */
5. yum install -y hawq-ambari-plugin
6. cd /var/lib/hawq
7. ./add-hawq.py --user admin --password admin --stack HDP-2.5
/* if the repo is on the same node as Ambari; otherwise point to where the repo lives */
./add-hawq.py --user <admin-username> --password <admin-password> --stack HDP-2.5 --hawqrepo <hdb-2.1.x-url> --addonsrepo <hdb-add-ons-2.1.x-url>
8. ambari-server restart
Configurations during Install with Ambari
Set vm.overcommit_memory to 0 if you plan to also run Hive and/or LLAP on the same cluster; do not follow the Pivotal docs and set it to 2, or else your Hive processes will have memory issues.
Advanced hdfs-site
Property | Setting
---|---
dfs.allow.truncate | true
dfs.block.access.token.enable | false for an unsecured HDFS cluster, true for a secure cluster
dfs.block.local-path-access.user | gpadmin
dfs.client.read.shortcircuit | true
dfs.client.socket-timeout*** | 300000000
dfs.client.use.legacy.blockreader.local | false
dfs.datanode.handler.count | 60
dfs.datanode.socket.write.timeout*** | 7200000
dfs.namenode.handler.count | 600
dfs.support.append | true
Advanced core-site
Property | Setting
---|---
ipc.client.connection.maxidletime** | 3600000
ipc.client.connect.timeout** | 300000
ipc.server.listen.queue.size | 3300
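After Ambari pushes these properties and the affected services restart, a quick sanity check from a cluster node (a minimal sketch; it just spot-checks a few keys from the two tables above with the stock hdfs getconf tool):
# Verify a few of the hdfs-site values actually landed in the active config
hdfs getconf -confKey dfs.allow.truncate
hdfs getconf -confKey dfs.block.local-path-access.user
hdfs getconf -confKey dfs.client.read.shortcircuit
# And one of the core-site values
hdfs getconf -confKey ipc.server.listen.queue.size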
Some HAWQ Resources
Date Type Formatting Functions: https://www.postgresql.org/docs/8.2/static/functions-formatting.html
Date Time Functions: https://www.postgresql.org/docs/8.2/static/functions-datetime.html
HAWQ Date Functions: http://tapoueh.org/blog/2013/08/20-Window-Functions
HAWQ is better with dates; it can automatically handle '08/01/2016' and '01-Aug-2016'.
PostgreSQL Cheat Sheet Commands: http://www.postgresonline.com/downloads/special_feature/postgresql83_psql_cheatsheet.pdf
System Catalog Tables: http://hdb.docs.pivotal.io/131/docs-hawq-shared/ref_guide/system_catalogs/catalog_ref-tables.html
HAWQ Toolkit
Make sure to use the HAWQ Toolkit: http://hdb.docs.pivotal.io/211/hawq/reference/toolkit/hawq_toolkit.html
How to find the data files for specific tables: https://discuss.pivotal.io/hc/en-us/articles/204072646-Pivotal-HAWQ-find-data-files-for-specific-tables
Size of a table on disk:
select * from hawq_toolkit.hawq_size_of_table_disk;
Size of a database:
select sodddatname, sodddatsize/(1024*1024) from hawq_toolkit.hawq_size_of_database;
Size of partitioned tables:
select * from hawq_toolkit.hawq_size_of_partition_and_indexes_disk;
Tip to find how many segments hold a HAWQ table:
SELECT gp_segment_id, COUNT(*)
FROM <<table>>
GROUP BY gp_segment_id
ORDER BY 1;
Creating Tables <<TBD>>
Make sure that after you create a table you ANALYZE it. For example:
vacuum analyze device.priority_counter_hist_rand;
Loading Data to Tables <<TBD>>
Potential HAWQ Errors
Too many open files in system
To fix this, check the value of fs.file-max in /etc/sysctl.conf. If it is configured lower than the total number of open files on the system at a given point (lsof | wc -l), then it needs to be increased. To increase the value, follow the steps below.
Check open files:
lsof | wc -l
ulimit -a | grep open
Edit the following line in the /etc/sysctl.conf file:
fs.file-max = value   # value is the new file descriptor limit you want to set
Apply the change by running:
/sbin/sysctl -p
You can disable overcommit temporarily:
echo 0 > /proc/sys/vm/overcommit_memory
For a permanent solution, add vm.overcommit_memory = 0 to /etc/sysctl.conf. For example:
#fs.file-max=65536
fs.file-max=2900000
#Added for Hortonworks HDB
kernel.threads-max=798720
vm.overcommit_memory=0
... View more
- Find more articles tagged with:
- Design & Architecture
- hawq
- hdb
- How-ToTutorial
12-05-2016
08:51 PM
1 Kudo
I will test this also... As far as I know, via Ambari, the user must be gpadmin.
If you are installing manually, you can set the OS user to be anything you want. However, the postgres user is gpadmin.
I will get a Pivotal engineer to answer this also...hopefully in a day or so.
... View more
11-17-2016
02:42 PM
Maybe this should be a separate HCC question. However, if you are using HDF 2.x / NiFi 1.0 with Ambari, you cannot install it on the same platform using Ambari and HDP 2.5. It would have to be a separate Ambari install on another node, just for HDF.
You may need to contact us at Hortonworks so we can understand your use case further.
... View more
11-16-2016
04:18 AM
1 Kudo
Why are you using HDP 2.5? Also what version of Ambari are you using?
Please refer to http://hdb.docs.pivotal.io/201/hdb/releasenotes/HAWQ201ReleaseNotes.html
Please use HDP 2.4.2 and Ambari 2.4 or 2.4.1. Or you can go maverick and try deleting /var/run/ambari-server/stack-recommendations/1 manually yourself.
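If you do go the manual-cleanup route, a minimal sketch on the Ambari server node (the numbered subdirectory 1 comes from the path above; check which ones actually exist first):
ls /var/run/ambari-server/stack-recommendations/
rm -rf /var/run/ambari-server/stack-recommendations/1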
... View more
11-11-2016
06:18 PM
5 Kudos
Here are the requirements:
Total data size: 13.5 TB uncompressed; 2 TB compressed
A large virtual fact table: a view containing a UNION ALL of 3 large tables, 11 billion records in total
Another view takes that large virtual fact table and applies consecutive LEFT OUTER JOINs on 8 dimension tables, so that no matter what, 11 billion records is always the result
There is timestamp data that you can use to filter rows by.
Suppose you were given the above. How would you begin configuring Hortonworks for Hive? Would you focus on storage? How can we configure for compute?
Let's assume:
Platform: AWS
Data node instance: r3.4xlarge
Cores: 16
RAM: 122 GB
EBS storage: 2 x 1 TB disks
So where do we begin? First, some quick calculations:
Memory per core: 122 GB / 16 = 7.625 GB, approximately 8 GB per CPU core. This means our largest container size per node, per core, is 8 GB.
However, we should not reserve all 16 cores for Hadoop; some cores are needed for the OS and other processes. Let's assume 14 cores are reserved for YARN.
Memory allocated for all YARN containers on a node = number of virtual cores x memory per core
114688 MB = 14 * 8192 MB (8 * 1024)
Note also that at 8 GB we can run 14 tasks (mappers or reducers) in parallel, one per CPU, without wasting RAM. We can certainly run container sizes smaller than 8 GB if we wish. Since our optimal container size is 8 GB, our YARN minimum container size must be a factor of 8 GB to prevent memory wastage, that is: 1, 2, 4, or 8 GB. The Tez container size for Hive, in turn, must be a multiple of the YARN minimum container size.
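To make the arithmetic above concrete, here is a small sketch that reproduces the numbers for this node shape; the variable names are mine, and the property names printed at the end are the standard YARN/Hive settings these values usually feed into:
#!/bin/bash
# Node shape from the r3.4xlarge example above
YARN_CORES=14                 # 16 cores, minus 2 left for the OS and other processes
MEM_PER_CORE_MB=8192          # 122 GB / 16 cores ~= 7.6 GB, rounded to 8 GB

# Total memory YARN can allocate on one node: 14 * 8192 MB = 114688 MB
YARN_NODE_MEM_MB=$(( YARN_CORES * MEM_PER_CORE_MB ))

# The minimum container size must divide 8 GB evenly (1, 2, 4 or 8 GB)
YARN_MIN_CONTAINER_MB=$(( 4 * 1024 ))

# The Tez container size for Hive is a multiple of the YARN minimum (here 2 x 4 GB = 8 GB, one task per core)
TEZ_CONTAINER_MB=$(( 2 * YARN_MIN_CONTAINER_MB ))

echo "yarn.nodemanager.resource.memory-mb  = ${YARN_NODE_MEM_MB}"
echo "yarn.scheduler.minimum-allocation-mb = ${YARN_MIN_CONTAINER_MB}"
echo "hive.tez.container.size              = ${TEZ_CONTAINER_MB}"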
(Screenshots referenced in the original post: Memory Settings YARN, Hive TEZ, Running Application Error.)
... View more
- Find more articles tagged with:
- Design & Architecture
- FAQ
- Hive
- how-to-tutorial
- memory
- performance
- tez
10-18-2016
03:57 PM
2 Kudos
Scenario 1: The Ranger KMS DB is down but the node is up. The keys are cached for a time, so you can still read the data in the encrypted folder; HDFS has knowledge of the encryption zone key. I assume the Ranger KMS service is still up while the DB/metastore is down. If you know the database cannot be recovered and you don't have a backup of the keystore, you immediately begin to remove the encryption zone: log in as an authorized user (or hdfs), copy the files to an unencrypted area, and then remove the encrypted zone. I just tested this on my cluster.
Scenario 2: The entire node was down. This means BOTH the Ranger KMS DB and the Ranger KMS service are down.
The encryption zone key lives in the Ranger KMS DB (metastore), and you can also export it and save it to a file. You should back up the Ranger KMS DB and make it highly available. Once you export to a keystore file, back up that file. If the cluster node goes down, you restore the Ranger KMS DB from backup. If you cannot restore the Ranger KMS DB from backup, you create a completely new Ranger KMS DB, take the backed-up keystore file, and as a special user run a script to import the key back into the newly created database. You can then associate the encryption zone folder with the key again using HDFS commands. If you don't have BOTH the keystore file and the Ranger KMS DB to restore, then you don't have any option: the files remain encrypted.
See this article for script to export and import keys: https://community.hortonworks.com/articles/51909/how-to-copy-encrypted-data-between-two-hdp-cluster.html
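As a rough sketch of the Scenario 1 evacuation (the paths are placeholders of my own; run as an authorized user or hdfs while the cached key is still usable):
# List the existing encryption zones and their key names (requires the HDFS superuser)
hdfs crypto -listZones

# Copy the still-readable data out of the encryption zone to an unencrypted area
hdfs dfs -mkdir -p /data/unencrypted_copy
hdfs dfs -cp /data/encrypted_zone/* /data/unencrypted_copy/

# After verifying the copy, remove the encryption zone directory itself
hdfs dfs -rm -r -skipTrash /data/encrypted_zone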
... View more
09-12-2016
03:36 PM
1 Kudo
You can use LDAP in ADDITION to Kerberos. LDAP is the authentication authority. Kerberos is the ticketing system.
LDAP is like the DMV giving you your driver's license; Kerberos is your boarding pass to get on the plane.
Kerberos can be enabled with AD or FreeIPA as your LDAP in Hadoop.
Ambari, NiFi, and Ranger will authenticate against those LDAP providers.
The only exception is Hive: when Kerberos is enabled, it replaces LDAP authentication.
... View more
09-12-2016
03:05 PM
3 Kudos
Here is your answer:
You can easily spoof your identity to a Hadoop cluster just by changing a simple environment variable. See also https://community.hortonworks.com/questions/2982/kerberos-adldap-and-ranger.html
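The post does not name the variable; assuming it refers to the well-known HADOOP_USER_NAME trick on a cluster without Kerberos, a minimal illustration:
# On a non-Kerberized cluster the client simply asserts its identity
export HADOOP_USER_NAME=hdfs
hdfs dfs -ls /user        # now runs as the hdfs superuser, no credentials required
unset HADOOP_USER_NAME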
... View more
09-08-2016
07:34 PM
@Bryan Bende What do you mean by "So I went to nifi-https (https://localhost:8443/nifi) and went to the accounts section and approved the account for mycert.p12 and chose a role of "NiFi'."
Does that mean you added to the authorized_users file the DN associated with mycert.p12 and added a role of ROLE_Nifi?
... View more
08-25-2016
09:01 PM
4 Kudos
@Vineet @Pratheesh Nair
OK, here is the solution. Apparently, if you are installing on Amazon AWS EC2 (remember, you are only given access to ec2-user, not root), and you decided NOT to set up passwordless ssh with the default key name id_rsa for that ec2-user when installing Ambari and its agents, then when you try to install Hortonworks HDB (HAWQ) via Ambari, Ambari WILL NOT exchange the keys for the gpadmin user for you.
It will create the gpadmin user with the password you give it on the HAWQ config screen during install, but no keys are exchanged. NOTE: my nodes had an ssh key that did NOT use the default name id_rsa.
I do not know whether this is due to using a non-root user, or to the fact that ec2-user did not have its own passwordless ssh with the default key name id_rsa.
In any case, the gpadmin keys exist ONLY on the HAWQ Master.
If you try the following to exchange keys on the HAWQ Master, you still get an error: it will NOT even accept the gpadmin password you set, even though that password works. That was surprising.
su gpadmin
> source /usr/local/hawq/greenplum_path.sh
> hawq ssh-exkeys -f Hosts
>Enter password for existing user for node <......>
>Enter password for existing user for node <......>
>Enter password for existing user for node <......>
So in essence you have to go to each node manually and copy the authorized_keys file from the HAWQ Master into /home/gpadmin/.ssh/ on that node (chmod 600), so that you can at least ssh from the HAWQ Master node without a password.
Then you run hawq ssh-exkeys manually and it works.
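A rough sketch of that manual fix (the .pem path and the ec2-user login are placeholders; the Hosts file is the same host list passed to hawq ssh-exkeys):
# From the HAWQ Master (as root): push the master's gpadmin authorized_keys to every node
for node in $(cat Hosts); do
  scp -i /path/to/key.pem /home/gpadmin/.ssh/authorized_keys ec2-user@${node}:/tmp/gpadmin_keys
  ssh -i /path/to/key.pem ec2-user@${node} 'sudo mkdir -p /home/gpadmin/.ssh &&
    sudo mv /tmp/gpadmin_keys /home/gpadmin/.ssh/authorized_keys &&
    sudo chmod 700 /home/gpadmin/.ssh && sudo chmod 600 /home/gpadmin/.ssh/authorized_keys &&
    sudo chown -R gpadmin:gpadmin /home/gpadmin/.ssh'
done

# Then, interactively as gpadmin on the master, the key exchange should succeed
su - gpadmin
source /usr/local/hawq/greenplum_path.sh
hawq ssh-exkeys -f Hosts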
... View more
08-25-2016
05:48 PM
@Pratheesh Nair
OK, I did not run it from the HAWQ Master, because Ambari is trying to run it from the HAWQ Standby node.
So I ran it from the HAWQ Master and got the dreaded error: gpadmin-[ERROR]:-Permission denied (publickey,gssapi-keyex,gssapi-with-mic). So I am attempting to run, as gpadmin: hawq ssh-exkeys -f Hosts. It is asking for a password for each host.
I am attempting to set up passwordless ssh for the gpadmin user (since this is AWS, it was set up only for the ec2-user).
Hopefully that may solve it.
... View more
08-25-2016
05:42 PM
@Pratheesh Nair
... View more
08-25-2016
04:06 PM
@Pratheesh Nair There are no logs in /data/hawq/master on the HAWQ Standby Master. In fact, nothing is created there, unlike on the HAWQ Master node. Yes, the IPs are different for the masters. When you run the command from the command line, it immediately returns "This can be run only on master or standby host", even in verbose mode.
Yes, I can do passwordless ssh if I provide -i with the .pem file (as this is AWS), i.e. ssh -i <.pem> node.
I cannot ssh to a node directly without passing the -i option.
... View more
08-25-2016
02:49 PM
Looking at the source code does not help: https://github.com/apache/incubator-hawq/blob/master/tools/bin/hawq
... View more
08-25-2016
02:16 PM
2 Kudos
I am not able to install the HAWQ Standby Master on an AWS cluster running HDP 2.4.2 and Ambari 2.2.2.
Here is the error:
"This can be run only on master or standby host" Not sure what that means.
It is not being installed on a DN, with PFX installed.
It is not bing installed on the Ambari node.
I am using the Name Node (since I only have 3 HDP master nodes) to install the HAWQ Standby Master.
I attempted to remove the HAWQ Standby (that does not start) from the Name node and placed it on another node just to test. It gives the same error.
So right now I am just running without a standby master.
How do I begin troubleshooting this?
... View more
Labels: Apache Ambari
08-15-2016
01:16 AM
1 Kudo
It should point to https://network.pivotal.io/products/pivotal-hdb
... View more
08-12-2016
01:11 AM
Here is also a good article:
https://community.hortonworks.com/articles/22756/quickly-enable-ssl-encryption-for-hadoop-component.html
... View more
07-11-2016
03:37 PM
Double check your /etc/hosts file. Double check your DNS.
... View more
05-26-2016
03:40 PM
I assume also that Hive with PAM authentication will be a viable option on Azure. https://community.hortonworks.com/articles/591/using-hive-with-pam-authentication.html
... View more
05-24-2016
02:41 PM
1 Kudo
Using Centrify with AD handles this.
... View more
05-24-2016
01:50 PM
1 Kudo
Ranger and Knox are NOT LDAP servers. Use AD, OpenLDAP, or FreeIPA. Ranger is ONLY for authorization, NOT authentication. Here are your authentication options for Hive.
However, if you decide to enable Kerberos, then the Hive authentication option is no longer LDAP directly but Kerberos (and LDAP indirectly). The GitHub examples given above use Kerberos against FreeIPA or OpenLDAP (and are for one node).
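To illustrate the difference at the client (a hypothetical sketch; the host, realm, and credentials are placeholders, and beeline is just one example client):
# LDAP (username/password) authentication: HiveServer2 validates the credentials you pass in
beeline -u "jdbc:hive2://hivehost:10000/default" -n myuser -p mypassword

# Kerberos authentication: get a ticket first, then put the HiveServer2 principal in the URL
kinit myuser@MY-REALM.COM
beeline -u "jdbc:hive2://hivehost:10000/default;principal=hive/_HOST@MY-REALM.COM"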
... View more
05-23-2016
07:55 PM
@Brandon Wilson Thank you, this helps. Yes, please provide more detail if you can.
... View more
05-23-2016
07:30 PM
We are looking for reassurance that our data sitting in Hive right now is protected from the outside world. We set up SQL Workbench on our local PCs and connect to it as shown below. However, it doesn't matter what we put in for username and password, it still lets us connect to our Hive data, which concerns us for security reasons.
How can I be assured that if some hacker out in the world came across our external IP (xx.xx.xx.xxxx), they wouldn't be able to access our data in Hive? I cannot set up AD auth from Azure since I cannot access my corporate AD, so LDAP authentication is not possible, and I do not want to set up an MIT KDC. Should I leave Hive authentication set to None but apply SQL Standard authorization? (See
https://community.hortonworks.com/questions/22086/can-we-enable-hive-acls-without-ranger.html
and
https://cwiki.apache.org/confluence/display/Hive/SQL+Standard+Based+Hive+Authorization) Or should I set up Ranger instead of SQL Standard authorization? With either of the above, would this ensure that if someone logs in as hive/hive, the tables are still secured with the appropriate authorizations?
... View more
Labels: Apache Hive
05-20-2016
05:23 PM
See also https://github.com/steveloughran/kerberos_and_hadoop/blob/master/sections/errors.md
... View more