Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5422 | 08-12-2016 01:02 PM |
| | 2204 | 08-08-2016 10:00 AM |
| | 2612 | 08-03-2016 04:44 PM |
| | 5505 | 08-03-2016 02:53 PM |
| | 1426 | 08-01-2016 02:38 PM |
02-02-2016
09:52 AM
2 Kudos
I think there are different options. You can enable HTTP authentication for the web UIs using SPNEGO and Kerberos: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html (the problem is that SPNEGO is a bit finicky, since the computers running the web browser need to be in the Kerberos realm). And you can disable administration of queues for specific users and groups, i.e. only allow a specific subgroup of users to kill applications: https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Configuration Is your cluster kerberized?
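For the queue ACLs, a minimal capacity-scheduler.xml sketch; the queue name "default" and the group "yarnadmins" are assumptions, adjust to your setup:

```xml
<!-- capacity-scheduler.xml sketch: only members of the "yarnadmins" group
     may administer (e.g. kill applications in) the queue. The ACL format is
     "user1,user2 group1,group2"; a leading space means "no users, groups only". -->
<property>
  <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
  <value> yarnadmins</value>
</property>
```

Note that yarn.acl.enable has to be set to true in yarn-site.xml for queue ACLs to take effect.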
02-02-2016
09:28 AM
1 Kudo
Yeah, I think they missed a couple of common JARs in hive-jdbc.jar. I had to add two more JARs to the classpath: from HDP_INSTALLATION/hive/lib, hive-jdbc.jar and commons-loggingxxx.jar, and from HDP_INSTALLATION/hadoop, hadoop-commonxxx.jar. https://community.hortonworks.com/articles/594/connecting-eclipse-to-hive.html
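With those on the classpath, a minimal connection test looks roughly like this (host, port, and credentials are placeholders):

```java
// Minimal Hive JDBC smoke test; host, port, and credentials are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class HiveJdbcTest {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver"); // from hive-jdbc.jar
        try (Connection con = DriverManager.getConnection(
                "jdbc:hive2://your-hive-host:10000/default", "hive", "")) {
            ResultSet rs = con.createStatement().executeQuery("SHOW TABLES");
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```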
02-01-2016
05:59 PM
1 Kudo
Which it is: so most likely the Phoenix client does not take the hbase-site.xml from Ambari but uses default values.

hbase.rpc.timeout
Description: This is for the RPC layer to define how long HBase client applications take for a remote call to time out. It uses pings to check connections but will eventually throw a TimeoutException.
Default: 60000
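If that is the cause, a sketch of a client-side override would be an hbase-site.xml on the Phoenix client's classpath; the 90000 ms value is just an example, not a recommendation:

```xml
<!-- hbase-site.xml on the *client* classpath; 90000 ms is an example value. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>90000</value>
</property>
```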
02-01-2016
05:57 PM
But he said that one was 90s before and changing it didn't help. Ah well, let's see what he comes up with. The issue might be that he changed it on the server side but should have changed it on the client. Perhaps the default is 60s?
02-01-2016
05:45 PM
2 Kudos
Changing the log directories actually works, it is just a bit of work. But changing the installation directories of the components? Pretty sure that will end in tears. I would second Jonas with "Don't do it".
02-01-2016
05:42 PM
There are Phoenix parameters as well: https://phoenix.apache.org/tuning.html It's weird, I don't see an applicable timeout. The query timeout (phoenix.query.timeoutMs) would make sense, but that is ten minutes and not 1 minute.
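To rule it out, a sketch of overriding it for a single connection; the ZooKeeper host and the 60000 ms value are placeholders:

```java
// Sketch: pass phoenix.query.timeoutMs as a connection property;
// "your-zk-host" and the 60000 ms value are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class PhoenixTimeoutCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("phoenix.query.timeoutMs", "60000");
        try (Connection con = DriverManager.getConnection(
                "jdbc:phoenix:your-zk-host:2181", props)) {
            // ... run the slow query here and see if the new timeout applies ...
        }
    }
}
```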
02-01-2016
05:24 PM
It would be really convenient if the PigStorage serde existed as a Pig function as well. Then one could load each line as a string, check if it is valid with SPLIT, and then parse it into a tuple. Something like (PigStorage_valid and PigStorage_parse are made up):

A = LOAD 'myfile';
SPLIT A INTO GOODDATA IF PigStorage_valid($0), BADDATA OTHERWISE;
B = FOREACH GOODDATA GENERATE PigStorage_parse($0);

But since this doesn't exist, I think the only options are to write these functions yourself or, as Artem says, use a regex, a filter, ... to verify correctness, write the clean data out, and load it again with PigStorage.
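For the regex route, a minimal sketch; the pattern assumes rows with exactly three pipe-delimited fields, and the field types in the final LOAD are examples:

```pig
-- Keep only lines with exactly three pipe-delimited fields, write them out,
-- then re-load the clean file with PigStorage. Field count and types are
-- assumptions for illustration.
RAW  = LOAD '/my/input/folder' AS (line:chararray);
SPLIT RAW INTO GOOD IF (line MATCHES '[^|]*\\|[^|]*\\|[^|]*'), BAD OTHERWISE;
STORE GOOD INTO '/tmp/clean';
DATA = LOAD '/tmp/clean' USING PigStorage('|') AS (a:chararray, b:int, c:chararray);
```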
02-01-2016
05:09 PM
Unfortunately there is no real exception handling in Pig. The usual tip is to use UDFs. If the logic becomes too complicated for Artem's approach, you could create a valid function in Java and couple it with SPLIT. Adding a Java UDF is really simple in Pig:

DATA = LOAD '/my/input/folder';
SPLIT DATA INTO GOODDATA IF valid($0), BADDATA OTHERWISE;
STORE GOODDATA INTO '/tmp/good';
STORE BADDATA INTO '/tmp/bad';

The valid function would be a simple Java eval function similar to the example below. You could check if the data has the expected number of pipe symbols, the correct datatypes, etc. https://pig.apache.org/docs/r0.7.0/udf.html#How+to+Use+a+Simple+Eval+Function
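A minimal sketch of such a valid() UDF; the package name and the four-field check are just examples:

```java
// Sketch of a valid() EvalFunc: true if the line has the expected number
// of pipe-delimited fields (four is an assumption for illustration).
package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class VALID extends EvalFunc<Boolean> {
    @Override
    public Boolean exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return false;
        }
        String line = input.get(0).toString();
        // split with limit -1 so trailing empty fields are counted too
        return line.split("\\|", -1).length == 4;
    }
}
```

Wire it up in the script with REGISTER myudfs.jar; and DEFINE valid myudfs.VALID(); before the SPLIT.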
02-01-2016
03:04 PM
1 Kudo
Just a bit of explanation: in MapReduce1 there was the "hadoop" command to administrate your cluster (run MapReduce programs etc.). However, YARN comes with its own command line tool for administration, "yarn". Under the covers, Pig uses "hadoop jar" to run its compiled MapReduce program, while HDP would like end users to use the newer "yarn jar". That is the warning. However, "hadoop jar" is perfectly fine, and if it were ever deprecated it would be updated in Pig as well. So yes, you can safely ignore this warning.
02-01-2016
01:17 PM
1 Kudo
I take the Sqoop part back. Someone is actually working on it, but it's only available in Sqoop2, not Sqoop1. Unfortunately HDP doesn't currently support Sqoop2, so you would have to install it manually. http://sqoop2.readthedocs.org/en/latest/Connectors.html#kafka-connector