Member since: 09-29-2015
Posts: 28
Kudos Received: 14
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 731 | 01-03-2017 10:36 PM
 | 2001 | 12-30-2016 12:05 AM
 | 4240 | 07-14-2016 06:51 PM
06-04-2018
04:37 PM
Have recently run into multiple issues where ORC files on Hive are not getting compacted. There are a couple of parameters required to enable concatenation on ORC:

SET hive.merge.tezfiles=true;
SET hive.execution.engine=tez;
SET hive.merge.mapredfiles=true;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=256000000;
SET hive.merge.mapfiles=true;
SET hive.merge.orcfile.stripe.level=true;
SET mapreduce.input.fileinputformat.split.minsize=256000000;
SET mapreduce.input.fileinputformat.split.maxsize=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;

ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='FALSE');
ALTER TABLE <table_name> PARTITION (file_date_partition='<partition_info>') CONCATENATE;
ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');

mapreduce.input.fileinputformat.split.minsize.per.node specifies the minimum number of bytes that each input split should contain within a data node. The default value is 0, meaning there is no minimum size.

mapreduce.input.fileinputformat.split.minsize.per.rack specifies the minimum number of bytes that each input split should contain within a single rack. The default value is 0, meaning there is no minimum size.

Make sure not to concatenate ORC files if they were generated by Spark, as there is a known issue, HIVE-17403, and concatenation is being disabled for this case in later versions. An example of this is a table/partition having two different files (part-m-00000_1417075294718 and part-m-00018_1417075294718). Although both are completely different files, Hive thinks they were generated by separate instances of the same task (because of failure or speculative execution), and Hive will end up removing one of the files.
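To confirm the concatenation actually reduced the number of small files, you can compare the file count of the partition directory before and after the ALTER TABLE ... CONCATENATE. A minimal sketch, assuming a hypothetical warehouse path (adjust the database, table, and partition to your environment):

hdfs dfs -count /apps/hive/warehouse/<db_name>.db/<table_name>/file_date_partition=<partition_info>
# Output columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
# Re-run after the concatenate; FILE_COUNT should drop as the small ORC files are merged into larger ones.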
05-25-2018
06:12 PM
3 Kudos
PROBLEM

Users are able to drop Hive tables even though they are not the table owners. Metastore server security needs to be enabled to start using storage-based authorization.

SOLUTION

To enable metastore security, set the following parameters:

hive.metastore.pre.event.listeners [This turns on metastore-side security.]
Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener

hive.security.metastore.authorization.manager [This tells Hive which metastore-side authorization provider to use. The default setting uses DefaultHiveMetastoreAuthorizationProvider, which implements the standard Hive grant/revoke model. To use an HDFS permission-based model (recommended) for authorization, use StorageBasedAuthorizationProvider as noted here.]
Set to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider

hive.security.metastore.authenticator.manager
Set to org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator

hive.security.metastore.authorization.auth.reads
When this is set to true, Hive metastore authorization also checks for read access. It is set to true by default. Read authorization checks were introduced in Hive 0.14.0.
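For reference, this is a minimal sketch of how the four settings above would look in hive-site.xml, using exactly the class names and values listed; whether you edit hive-site.xml directly or set these through Ambari depends on your environment:

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>
<property>
  <name>hive.security.metastore.authorization.auth.reads</name>
  <value>true</value>
</property>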
04-18-2017
06:38 PM
@sanket patel Intermittent ZooKeeper issues can lead to cleaner chores failing. https://issues.apache.org/jira/browse/HBASE-15234
03-28-2017
12:14 AM
I would recommend splitting up the file and then running your MR job on each of the resulting files.
02-10-2017
05:58 PM
@Subramanian Santhanam can you please add more details with screenshots and logs?
02-05-2017
05:57 PM
Can you provide the container logs?
01-17-2017
05:22 AM
2 Kudos
Repo Info
Github Repo URL: https://github.com/sarunsingla/admin_utilities/blob/master/regionsize_per_regionserver.py
Github account name: sarunsingla
Repo name: admin_utilities (script: regionsize_per_regionserver.py)
01-05-2017
03:23 PM
1 Kudo
Repo Description
You can use this script to automatically take jstacks. Copy the script and just execute it, e.g. ./script_name.pl. Please let me know if this helps. It works on the basis of jstack -F; for systems that are very busy, you can replace it with kill -3 <pid>.

[root@node1 ~]# ./jstack.pl
Which component are you looking to take a jstack for:
namenode
Process name is : namenode
Process id for namenode is: 15046
How many jstack required:
2
Sleep between each jstack
1
Process Id for Namenode: 15046
Taking a jstack now
jstack_output_1483629788
Process Id for Namenode: 15046
Taking a jstack now
jstack_output_1483629790

Repo Info
Github Repo URL: https://github.com/sarunsingla/admin_utilities/blob/master/auto-jstack.pl
Github account name: sarunsingla
Repo name: admin_utilities (script: auto-jstack.pl)
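A side note on the kill -3 alternative mentioned in the description: a minimal sketch, assuming the NameNode PID from the example run above. kill -3 sends SIGQUIT, and the JVM then prints the thread dump to its stdout rather than to a new file, so for HDP services you would typically look in the component's .out log:

kill -3 15046
# No separate output file is created; check the process's stdout log (e.g. the NameNode .out file) for the thread dump.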
01-05-2017
03:53 AM
Please check whether the heap configuration follows the recommendations here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html
01-03-2017
10:36 PM
@Mahen Jay Can you please elaborate more on the use case here? Do you already have 3 ZooKeeper nodes and are looking to add more at a later stage? If that is the case, then yes, you can always add more ZK nodes after the cluster is created. Or are you saying you want to skip the ZooKeeper nodes for now? If that is the case, then I do not think it would be possible, as ZooKeeper is a dependent service; you need to have ZK nodes at the time of cluster creation. You can always move the ZK nodes to other machines at a later stage. Please let me know if the use case is different.