Member since: 07-16-2015
Posts: 177
Kudos Received: 28
Solutions: 19
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 14053 | 11-14-2017 01:11 AM
 | 60526 | 11-03-2017 06:53 AM
 | 4298 | 11-03-2017 06:18 AM
 | 13507 | 09-12-2017 05:51 AM
 | 1982 | 09-08-2017 02:50 AM
01-27-2017
01:09 AM
1 Kudo
We also do not like this "magic number", but we find it useful. I think you should at least investigate your cluster when you get that warning, in order to check that you do not have the "too many small files" issue. Even if we are not satisfied with the configured threshold, it is still useful as a reminder (and should only be considered as such). Having too many small files can also hurt performance, since MapReduce instantiates one separate mapper per block to read (if you use that data in jobs). By the way, for investigating this I often use the "fsck" utility. When used with a path, it gives you the block count, the total size, and the average block size. This lets you find out whether a given part of your HDFS storage has too many small files. When you have 200,000 blocks under a path with an average block size of 2 MB, that is a good indicator of having too many small files.
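As a sketch of that check (the path is a placeholder and the exact summary wording varies by Hadoop version):

hdfs fsck /user/etl/output | grep -E 'Total size|Total blocks'
# A summary like "Total blocks (validated): 200000 (avg. block size 2097152 B)"
# would point to a small-files problem under that path.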
01-27-2017
12:42 AM
2 Kudos
Did you try to drop the partition using a Hive query? It should look like this:

ALTER TABLE <table_name> DROP PARTITION (<partition_col_name>='<value>');

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropPartitions

If it does not delete the data, you will need to delete the partition's directory (in HDFS) after dropping it with the Hive query.
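As a minimal sketch, assuming a "sales" table partitioned by "dt" and the default warehouse location (both are placeholders):

hive -e "ALTER TABLE sales DROP PARTITION (dt='2017-01-01');"
# For an EXTERNAL table the files stay behind; remove the partition
# directory manually (check the table's LOCATION first):
hadoop fs -rm -r /user/hive/warehouse/sales/dt=2017-01-01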
01-19-2017
03:23 AM
I don't think Impala has such a feature (but I could be wrong). If I were you, I would try to answer these questions:
- "Why do I need this kind of output?"
- "What do I use it for?"
- "Can't I achieve my goal with another output?"
Maybe you will find another approach better suited. By the way, I guess something like this would be better (but it will not make a huge difference):

SELECT a AS col FROM tmp
UNION ALL SELECT b AS col FROM tmp
UNION ALL SELECT c AS col FROM tmp
UNION ALL SELECT d AS col FROM tmp
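If you want to try the rewrite quickly, you can run it from impala-shell (the host is a placeholder; tmp and its columns come from your question):

impala-shell -i impala-host.example.com -q "SELECT a AS col FROM tmp UNION ALL SELECT b AS col FROM tmp;"
# Prefix the query with EXPLAIN inside the shell to compare the plans of both forms.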
01-17-2017
08:27 AM
Hive is not Oracle; you should not expect the same processing capabilities. Hive is designed to run long and heavy queries, whereas it performs poorly for small queries like the one you are trying to optimize. Also note that Hive runs on top of YARN, and by design YARN takes time to instantiate containers and the JVMs inside those containers. This should answer your question "why does it take so much time to start the query job". If you want a quick reply for a basic count(*) without any filter/condition, you might want to read about Hive statistics.
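As a minimal sketch of the statistics route ("mytable" is a placeholder; hive.compute.query.using.stats lets Hive answer a bare count(*) from metadata instead of launching a job):

hive -e "
SET hive.compute.query.using.stats=true;
ANALYZE TABLE mytable COMPUTE STATISTICS;
SELECT COUNT(*) FROM mytable;"
# Once the statistics are up to date, the count(*) returns almost instantly.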
01-10-2017
07:58 AM
What kind of web-based interface do you need from HiveServer2? If it is a user interface for querying Hive, then HiveServer2 does not provide one OOTB. But know that Hue uses HiveServer2 when you submit Hive queries inside the "Hive Editor".
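For completeness, HiveServer2 exposes a JDBC/Thrift endpoint rather than a UI; the standard command-line client is beeline (the hostname is a placeholder, 10000 is the usual default port):

beeline -u "jdbc:hive2://hs2-host.example.com:10000/default"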
12-19-2016
06:06 AM
Thx, that was interesting to know!
12-19-2016
05:51 AM
Hi, if it's not working only on the edge nodes, then there might be some configuration issue causing that. What difference do you make between "cluster nodes" and "edge nodes"? Meaning: what roles are assigned to your edge nodes?
- For example, did you assign the HDFS and YARN "gateway" roles to your edge nodes?
- If not, try doing it.
- If yes, try redeploying the client configuration (a quick sanity check is sketched below).
It might be something else.
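As a quick sanity check on an edge node (paths are CDH defaults and may differ on your cluster):

hadoop fs -ls /    # should list the HDFS root, not the local filesystem
grep -A1 fs.defaultFS /etc/hadoop/conf/core-site.xml    # should show the cluster's NameNode URI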
12-19-2016
05:30 AM
You are right, I just tested it and there is no need for additional settings (other than having initialized the Kerberos ticket). From what I read in your first post, it seems the same job does run successfully for users whose HDFS home folder is not encrypted (for the same Kerberos realm)? If that is the case, I would open an SR ticket in your shoes. It would be the quickest way to get feedback from Cloudera on the matter (whether there is an incompatibility or some particular setting for this specific use case).
12-19-2016
05:09 AM
2 Kudos
If you want to "drop" the categories table you should run an hive query like this : DROP TABLE categories; If you want to "delete" the content of the table only then try "TRUNCATE TABLE categories;". It should work or try deleting the table content in HDFS directly. As for your use of "hadoop fs", you should know that "hadoop fs -ls rm" does not exist. For deleting HDFS files or folders it is directly "hadoop fs -rm".
12-19-2016
03:33 AM
Hi, I don't know if the map/reduce job you are submitting is Kerberos compatible. That is the first check to do. Then, if the job is Kerberos compatible, it might need some settings, like supplying a JAAS configuration; the kinit of a ticket is sometimes not enough. For example, when running the map/reduce job "MapReduceIndexerTool", you need to supply a JAAS configuration:

HADOOP_OPTS="-Djava.security.auth.login.config=/home/user/jaas.conf" \
hadoop jar MapReduceIndexerTool

See: https://www.cloudera.com/documentation/enterprise/5-4-x/topics/cdh_sg_search_security.html
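For reference, a minimal jaas.conf sketch that reuses the kinit ticket cache (the principal is a placeholder; the "Client" section name is what the linked Cloudera Search documentation uses):

cat > /home/user/jaas.conf <<'EOF'
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=false
  useTicketCache=true
  principal="user@EXAMPLE.COM";
};
EOF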