Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5422 | 08-12-2016 01:02 PM |
| | 2204 | 08-08-2016 10:00 AM |
| | 2612 | 08-03-2016 04:44 PM |
| | 5505 | 08-03-2016 02:53 PM |
| | 1426 | 08-01-2016 02:38 PM |
02-02-2016
09:52 AM
2 Kudos
I think there are different options. You can enable HTTP authentication for the web UIs using SPNEGO and Kerberos: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html (the problem is that SPNEGO is a bit finicky, since the computers running the web browser need to be in the Kerberos realm). And you can disable administration of queues for specific users and groups, i.e. only allow a specific subgroup of users to kill applications: https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Configuration Is your cluster kerberized?
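For the queue ACLs, a minimal capacity-scheduler.xml sketch; the queue name "default" and the group "yarnadmins" are assumptions, adjust to your setup:

```xml
<!-- capacity-scheduler.xml sketch: only members of the "yarnadmins" group
     may administer (e.g. kill applications in) the queue. The ACL format is
     "user1,user2 group1,group2"; a leading space means "no users, groups only". -->
<property>
  <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
  <value> yarnadmins</value>
</property>
```

Note that yarn.acl.enable has to be set to true in yarn-site.xml for queue ACLs to take effect.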
02-02-2016
09:28 AM
1 Kudo
Yeah, I think they missed a couple of common JARs in hive-jdbc.jar. I had to add two more JARs to the classpath: from HDP_INSTALLATION/hive/lib, hive-jdbc.jar and commons-loggingxxx.jar, and from HDP_INSTALLATION/hadoop, hadoop-commonxxx.jar. https://community.hortonworks.com/articles/594/connecting-eclipse-to-hive.html
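With those on the classpath, a minimal connection test looks roughly like this (host, port, and credentials are placeholders):

```java
// Minimal Hive JDBC smoke test; host, port, and credentials are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class HiveJdbcTest {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // from hive-jdbc.jar
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://your-hive-host:10000/default", "hive", "")) {
            ResultSet rs = con.createStatement().executeQuery("SHOW TABLES");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```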
02-01-2016
05:59 PM
1 Kudo
Which it is: so most likely the Phoenix client does not take the hbase-site.xml from Ambari but uses default values.

hbase.rpc.timeout
Description: This is for the RPC layer to define how long HBase client applications take for a remote call to time out. It uses pings to check connections but will eventually throw a TimeoutException.
Default: 60000
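If that is the cause, a sketch of a client-side override would be an hbase-site.xml on the Phoenix client's classpath; the 90000 ms value is just an example, not a recommendation:

```xml
<!-- hbase-site.xml on the *client* classpath; 90000 ms is an example value. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>90000</value>
</property>
```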
02-01-2016
05:57 PM
But he said that one was 90s before and changing it didn't help. Ah well, let's see what he comes up with. The issue might be that he changed it on the server side but should have changed it on the client. Perhaps the default is 60s?
02-01-2016
05:45 PM
2 Kudos
Changing the log directories actually works, it is just a bit of work. But changing the installation directories of the components? Pretty sure that will end in tears. I would second Jonas with "Don't do it".
02-01-2016
05:42 PM
There are Phoenix parameters as well: https://phoenix.apache.org/tuning.html It's weird, I don't see an applicable timeout. The query timeout (phoenix.query.timeoutMs) would make sense, but that is ten minutes and not 1 minute.
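To rule it out, a sketch of overriding it for a single connection; the ZooKeeper host and the 60000 ms value are placeholders:

```java
// Sketch: pass phoenix.query.timeoutMs as a connection property;
// "your-zk-host" and the 60000 ms value are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class PhoenixTimeoutCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("phoenix.query.timeoutMs", "60000");
        try (Connection con = DriverManager.getConnection(
                "jdbc:phoenix:your-zk-host:2181", props)) {
            // ... run the slow query here and see if the new timeout applies ...
        }
    }
}
```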
02-01-2016
05:24 PM
It would be really convenient if the PigStorage serde existed as a Pig function as well. Then one could load each line as a string, check if it is valid with SPLIT, and then parse it into a tuple. Something like (PigStorage_valid and PigStorage_parse are made up):

A = LOAD 'myfile';
SPLIT A INTO GOODDATA IF PigStorage_valid($0), BADDATA OTHERWISE;
B = FOREACH GOODDATA GENERATE PigStorage_parse($0);

But since this doesn't exist, I think the only options are to write these functions yourself or, as Artem says, use a regex, a filter, ... to verify correctness, write the clean data out, and load it again with PigStorage.
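For the regex route, a minimal sketch; the pattern assumes rows with exactly three pipe-delimited fields, and the field types in the final LOAD are examples:

```pig
-- Keep only lines with exactly three pipe-delimited fields, write them out,
-- then re-load the clean file with PigStorage. Field count and types are
-- assumptions for illustration.
RAW  = LOAD '/my/input/folder' AS (line:chararray);
SPLIT RAW INTO GOOD IF (line MATCHES '[^|]*\\|[^|]*\\|[^|]*'), BAD OTHERWISE;
STORE GOOD INTO '/tmp/clean';
DATA = LOAD '/tmp/clean' USING PigStorage('|') AS (a:chararray, b:int, c:chararray);
```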
02-01-2016
05:09 PM
Unfortunately there is no real exception handling in Pig. The usual tip is to use UDFs. If the logic becomes too complicated for Artem's approach, you could create a valid function in Java and couple it with SPLIT. Adding a Java UDF is really simple in Pig:

DATA = LOAD '/my/input/folder';
SPLIT DATA INTO GOODDATA IF valid($0), BADDATA OTHERWISE;
STORE GOODDATA INTO '/tmp/good';
STORE BADDATA INTO '/tmp/bad';

The valid function would be a simple Java eval function similar to the example below. You could check if the data has the expected number of pipe symbols, the correct datatypes, etc. https://pig.apache.org/docs/r0.7.0/udf.html#How+to+Use+a+Simple+Eval+Function
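A minimal sketch of such a valid() UDF; the package name and the four-field check are just examples:

```java
// Sketch of a valid() EvalFunc: true if the line has the expected number
// of pipe-delimited fields (four is an assumption for illustration).
package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class VALID extends EvalFunc<Boolean> {
    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return false;
        }
        String line = input.get(0).toString();
        // split with limit -1 so trailing empty fields are counted too
        return line.split("\\|", -1).length == 4;
    }
}
```

Wire it up in the script with REGISTER myudfs.jar; and DEFINE valid myudfs.VALID(); before the SPLIT.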
02-01-2016
03:04 PM
1 Kudo
Just a bit of explanation: in MapReduce1 there was the "hadoop" command to administrate your cluster (run MapReduce programs etc.). However, YARN comes with its own command line tool for administration, "yarn". Under the covers, Pig uses "hadoop jar" to run its compiled MapReduce program, while HDP would like end users to use the newer "yarn jar". That is the warning. However, "hadoop jar" is perfectly fine, and if it were ever deprecated it would be updated in Pig as well. So yes, you can safely ignore this warning.
02-01-2016
01:17 PM
1 Kudo
I take the Sqoop part back. Someone is actually working on it, but it's only available in Sqoop2, not Sqoop1. Unfortunately HDP doesn't currently support Sqoop2, so you would have to install it manually. http://sqoop2.readthedocs.org/en/latest/Connectors.html#kafka-connector