Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5165 | 08-12-2016 01:02 PM
 | 2144 | 08-08-2016 10:00 AM
 | 2517 | 08-03-2016 04:44 PM
 | 5334 | 08-03-2016 02:53 PM
 | 1364 | 08-01-2016 02:38 PM
11-11-2015
07:06 PM
1 Kudo
I suppose you can use haproxy, for example. However, if you have Kerberos and SPNEGO you would need to add the proxy tickets, similar to the Oozie HA setup described in this Cloudera doc (I would link to ours if we actually described it): http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_sg_oozie_ha_kerberos.html
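As a rough illustration only, a minimal haproxy front end for two Oozie servers could look like the sketch below. The host names are placeholders and the port/check URI assume Oozie defaults; the SPNEGO proxy principal setup from the linked doc still has to be done separately.

# hypothetical haproxy.cfg fragment; oozie-host1/oozie-host2 are placeholder names
frontend oozie_front
    mode http
    bind *:11000
    default_backend oozie_servers

backend oozie_servers
    mode http
    balance roundrobin
    # health check against the Oozie admin status endpoint
    option httpchk GET /oozie/v1/admin/status
    server oozie1 oozie-host1:11000 check
    server oozie2 oozie-host2:11000 check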
11-11-2015
04:20 PM
7 Kudos
Using Hive in Oozie can be challenging. There are two available actions, the Hive action and the Hive2 action. The Hive action uses the Hive client and needs a lot of libraries and connection settings; I ran into a lot of issues with it, especially related to security. Also, the logs are not available in the Hive server and Hive server settings are not honoured. The Hive2 action is a solution for this: it runs a beeline command and connects through JDBC to the Hive server. The example below assumes that you have configured LDAP or PAM security for your Hive server.
<action name="myhiveaction">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://server:10000/default</jdbc-url>
<password>${hivepassword}</password>
<script>/data/sql/dayly.sql</script>
<param>database=${database}</param>
<param>day=${day}</param>
</hive2>
<ok to="end"/>
<error to="kill"/>
</action>
The problem with this is that everybody who has access to the Oozie logs can see the password in the hivepassword parameter, which can be less than desirable. Luckily beeline provides an option to use a password file, i.e. a file containing the Hive password:
beeline -u jdbc:hive2://sandbox:10000/default -n user -w passfile
Here passfile is a file containing only your password, without any newline at the end. To use it in the action you can pass it as an argument. However, you still need to ship the passfile to the Oozie execution folder. This can be done in two ways: create a lib folder under your workflow directory and put it there, or use the file element.
<action name="myhiveaction">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://server:10000/default</jdbc-url>
<script>/data/sql/dayly.sql</script>
<param>database=${database}</param>
<param>day=${day}</param>
<argument>-wpassfile</argument>
<file>/user/myuser/passfile#passfile</file>
</hive2>
<ok to="end"/>
<error to="kill"/>
</action>
This will copy the password file to the job's temp directory and beeline will use it for authentication. Only the owner of the Oozie workflow needs access to that file; other people can still see the logs (but not the password). Note: the Hive2 action seems to be picky about parameters. It is important to use -wpassfile, not -w passfile: the space causes the action to fail because it becomes part of the filename. This is different from command-line beeline.
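For completeness, a small sketch of how the password file could be created and staged in HDFS; the password value, user name, and HDFS path are placeholders only:

# printf without a format newline avoids the trailing newline beeline chokes on
printf '%s' 'MyHivePassword' > passfile
chmod 600 passfile
# stage it where the workflow's <file> element expects it
hdfs dfs -put -f passfile /user/myuser/passfile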
11-09-2015
01:51 PM
2 Kudos
I think the answer depends much more on the number of queries per second than on RAM. 1 GB is not enough, but once you have 8-12 GB you should be fine outside of very specific use cases. The real problem is that HiveServer2 reaches its limits when you run 10-15 queries per second. It is better in 2.3, which has parallel planning, but it will not be able to do much more than 10-20 q/s in any case. Adding more RAM will not help; increasing the number of parallel server threads will, as will adding additional Hive servers. Obviously, in most situations HiveServer2 will not be the bottleneck when you reach these kinds of query rates.
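If you want to experiment with the server thread counts mentioned above, these HiveServer2 properties are the usual starting point; the values shown are only illustrative, not recommendations for your workload:

<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <!-- maximum Thrift worker threads; raise for more concurrent connections -->
  <value>500</value>
</property>
<property>
  <name>hive.server2.async.exec.threads</name>
  <!-- size of the asynchronous query execution pool -->
  <value>100</value>
</property>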
10-26-2015
04:52 AM
Thanks for the links, but I think we followed the instructions from the wiki (adding the URL to the Firefox settings). Do you think it's possible that the issue is having multiple Kerberos tickets on the Windows machine? In other words, does SPNEGO send all of them, or only the primary one (which would be the wrong one, the one the user gets directly from AD)?
10-23-2015
03:43 AM
We have a user at xxxx who wants to access the web UI but gets a 401 on his Windows machine. We have a valid ticket for the realm of the cluster, but also a ticket for a different realm (the primary realm of the machine). We have done the steps for preparing Firefox as specified in the Storm UI question, but it does not work. Any idea how to specify a principal? Also, a small addendum: we sometimes see a 302 in curl instead of a 200. We can also see this in the Ambari alerts, but Ambari seems to think it is OK (as in, the timeline server alert shows 302 and Oozie shows 200, yet I got a 302 when curling Oozie). What does this mean exactly?
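As an aside, a SPNEGO-protected endpoint can be tested from a machine with a valid ticket roughly like this; the host and port are placeholders, and curl must be built with GSS/Negotiate support:

# -u : tells curl to use the Kerberos credentials from the ticket cache
# -L follows a 302 redirect instead of just reporting it
curl --negotiate -u : -v -L http://oozie-host:11000/oozie/v1/admin/status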
Labels:
- Apache Ambari
- Apache Oozie
- Apache Storm
10-02-2015
04:34 AM
1 Kudo
That was actually the jar I was using. The hive-jdbc.jar is a link to the standalone jar, but I still had to add the two other jars; otherwise I got ClassNotFoundExceptions.
09-30-2015
06:38 AM
4 Kudos
In some situations Ranger is not an option, but a Hive authorization scheme is still advisable. It is possible to use SQLStdAuth for this; however, it comes with a couple of caveats (see the Hive wiki).

1. Configuring SQLStdAuth (HDP 2.3)
- In Ambari, select SQLStdAuth as the authorization. This will set all related configuration parameters like hive.enable.authorization and disable doAs.
- Add admin users to the admin role. If you have a user you want to be admin, add him to hive.users.in.admin.role=hive,hue,myadmin

2. Prepare HDFS
Since all queries now run as the hive user, that user needs read/write rights on all files in HDFS, including the load directories for external tables. Ideally change the owner of the warehouse folder to hive and set access rights to 700. I also added hive to an ETL group and made all load folders readable AND writable by this group.

3. Create roles
In Hive, as an admin user:
- Become admin: SET ROLE ADMIN;
- Create roles: CREATE ROLE BI; (should have read rights to all tables) CREATE ROLE ETL; (should have read/write rights to all tables)
- Add users to roles: GRANT ETL TO USER ETLUSER; GRANT BI TO USER BIUSER;
- Make the ETL role owner of the database so it can create tables in it: ALTER DATABASE DEFAULT SET OWNER ROLE ETL;
- Make a table readable by BI: GRANT SELECT ON MYTABLE TO BI;
- Make a table read- and writable by ETL: GRANT ALL ON MYTABLE TO ETL;
NOTE: I did not find a way to make a ROLE the owner of a table, so only the table owner or an admin can drop tables, but the ETL user can insert, drop partitions, etc.

4. Beeline parameters
SQLStdAuth restricts access to Hive config parameters to a whitelist. In older environments, Hive scripts were often parametrized with configuration parameters, e.g. -hiveconf day=20150201. This no longer works because those parameters are not on the whitelist. Use beeline --hivevar day=20150201 instead.
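To make the last point concrete, here is a rough sketch of how a --hivevar value can be referenced inside a script; the script name, table, and user are placeholders for illustration only:

beeline -u jdbc:hive2://server:10000/default -n etluser --hivevar database=default --hivevar day=20150201 -f daily.sql

-- daily.sql (hypothetical script); ${hivevar:...} is substituted before the query is parsed
USE ${hivevar:database};
SELECT count(*) FROM mytable WHERE day = '${hivevar:day}';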
09-30-2015
06:21 AM
6 Kudos
Clients normally want a development environment for SQL. They often have Eclipse-based SQL development tools already (Teradata SQL Editor, Eclipse Data Tools Platform), and Hue and the command line are not always an option. To connect to HDP 2.3 (should also work for HDP 2.2) with Eclipse:

1. Install the Eclipse Data Tools Platform
- Download Eclipse from eclipse.org (for example Luna)
- Select Help -> Install New Software
- Under "Work with" select the official update location for your release (Luna)
- Install all plugins under "Database Development" and restart Eclipse

2. Create a Hive JDBC driver
- Open the Data Source Explorer view (Window -> Show View -> Other -> type "Data Source Explorer")
- Under Database Connections select "New"
- Select "Generic JDBC Driver". You should see three tabs (Name, Jars, Properties)
- Create a new driver with the plus button and name it "Hive"
- On the Jars tab, add the following jars from your HDP installation: from HDP_INSTALLATION/hive/lib, hive-jdbc.jar and commons-loggingxxx.jar; from HDP_INSTALLATION/hadoop, hadoop-commonxxx.jar
- Under Properties (this tab sometimes stays blank; redo the first steps if so), set:
  Connection URL: jdbc:hive2://server:10000/default
  Database: default
  Driver Class: org.apache.hive.jdbc.HiveDriver
  User: optional

3. Develop SQL
- In the Data Source Explorer, create a connection
- Create a project and files with the extension .sql
- When opening a file, select your connection at the top
- You can write SQL and execute it via right-click: Execute All, Execute Highlighted, ...

4. Investigate results
Query results are shown in the SQL Results view:
- You have a list of executed queries
- Result sets (limited to 50000 rows, configurable)
- You can export result sets as CSV

In addition to the Data Development tools, you can also install in Eclipse:
Remote System Tools:
- Drag and drop files to your edge node out of Eclipse
- You can even have remote projects that are stored and compiled directly on the edge node
Scala IDE:
- Develop Spark applications and drag the jar files to your edge node
Java IDE:
- Write Hive/Pig UDFs and MapReduce jobs
XML Editor:
- Basic syntax highlighting and XML checking for Oozie workflows
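A trivial .sql file for step 3 might contain no more than the following; the table name is just a placeholder to verify the connection works:

-- test.sql
SHOW TABLES;
SELECT * FROM mytable LIMIT 10;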
09-30-2015
05:47 AM
8 Kudos
PAM authentication for Hive

In some situations an organization does not have an LDAP server and does not want to use Kerberos for authentication, since this complicates the integration with third-party software. A quick alternative is PAM authentication, which is available in Hive since HDP 2.2. It uses the OS users and passwords of the host running HiveServer2 for authentication. The examples below were done on HDP 2.2/2.3 and Red Hat.

1. Install JPam
- Download the JPam library and unzip it on the Hive host: JPam Libraries
- Copy the .so file into the library path of the Hive server. An easy way to find this path is to run ps -ef | grep hiveserver2 and look for the -Djava.library.path variable. For example, copy the .so file to /usr/hdp/2.2.4.2-2/hadoop/lib/native/Linux-amd64-64

2. Make the shadow file accessible to the hive user
/etc/shadow needs to be readable by hive:
chgrp hive /etc/shadow
chmod 550 /etc/shadow
The wiki entry also says to make /etc/login.defs accessible, but this did not seem to be necessary for the PAM modules I used. Wiki entry: Hive Security Wiki

3. Set PAM authentication
In Ambari, switch authentication to PAM:
hive.server2.authentication = PAM

4. Set PAM modules
There are different possibilities; what worked for me was login, sshd:
hive.server2.authentication.pam.services=login,sshd

5. Restart the Hive server
You should now be able to log in with the username and password of a HiveServer2 host user. No Kerberos problems, no LDAP connection problems.
Note: there are good reasons to use Kerberos or Knox, because they support encryption, and if you have an LDAP environment that is definitely also a good option. But for a quick authentication setup in an environment that reaches the server over secure networks, PAM is a good option.

6. Getting Hue to run with PAM authentication
Hue 2.6 does not officially support PAM; however, LDAP works the same way for Hue:
- Make a copy of your Hive configuration (for example into /etc/hue/hive-conf)
- Change the authentication setting in the copy to LDAP
- Point Hue to this configuration
- Enter the valid Hue user and password as the "LDAP user"
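A compact shell sketch of steps 1 and 2; the archive and folder names are placeholders for whatever the JPam download actually unpacks to, and the native library path should be the one found via -Djava.library.path above:

# placeholder archive/folder names; adjust to the actual JPam download
unzip JPam-Linux_amd64-1.1.zip
cp JPam-1.1/libjpam.so /usr/hdp/2.2.4.2-2/hadoop/lib/native/Linux-amd64-64/
# make /etc/shadow readable for the hive service user
chgrp hive /etc/shadow
chmod 550 /etc/shadow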
09-29-2015
02:17 AM
3 Kudos
NameNode HA: needs an installation of httpfs: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hadoop-ha/content/ha-nn-deploy-hue.html However, in my installation there was a bug where the httpfs install did not create the /etc/init.d/httpfs script, so I had to manually link it from the httpfs installation directory. After that it worked. ResourceManager HA is pretty straightforward; it is written in the Hue config. HiveServer2 HA: not sure, but I would suppose HiveServer2 over HTTP behind a load balancer like haproxy (that would also be the way for Oozie; see the answer to the Oozie HA configuration question).
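A sketch of how such a link could be created; the source path and service name are assumptions and depend on the HDP version, so locate the actual init script under the httpfs installation directory first:

# hypothetical source path; find the real script shipped with the httpfs package
ln -s /usr/hdp/current/hadoop-httpfs/etc/rc.d/init.d/hadoop-httpfs /etc/init.d/hadoop-httpfs
service hadoop-httpfs start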