Member since: 09-29-2015
Posts: 28
Kudos Received: 14
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 731 | 01-03-2017 10:36 PM
 | 2001 | 12-30-2016 12:05 AM
 | 4240 | 07-14-2016 06:51 PM
06-04-2018
04:37 PM
Have recently run into multiple issues where ORC files on Hive are not getting compacted. There are a couple of parameters required to enable concatenation on ORC:

SET hive.merge.tezfiles=true;
SET hive.execution.engine=tez;
SET hive.merge.mapredfiles=true;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=256000000;
SET hive.merge.mapfiles=true;
SET hive.merge.orcfile.stripe.level=true;
SET mapreduce.input.fileinputformat.split.minsize=256000000;
SET mapreduce.input.fileinputformat.split.maxsize=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;

ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='FALSE');
ALTER TABLE <table_name> PARTITION (file_date_partition='<partition_info>') CONCATENATE;
ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');

mapreduce.input.fileinputformat.split.minsize.per.node specifies the minimum number of bytes that each input split should contain within a data node. The default value is 0, meaning there is no minimum size.

mapreduce.input.fileinputformat.split.minsize.per.rack specifies the minimum number of bytes that each input split should contain within a single rack. The default value is 0, meaning there is no minimum size.

Make sure not to concatenate ORC files if they were generated by Spark, as there is a known issue, HIVE-17403, and concatenation is being disabled for this case in later versions. An example of this is a table/partition having two different files (part-m-00000_1417075294718 and part-m-00018_1417075294718). Although both are completely different files, Hive thinks they were generated by separate instances of the same task (because of failure or speculative execution), and Hive will end up removing one of the files.
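To confirm the concatenation actually reduced the number of small files, you can compare the file count of the partition directory before and after the ALTER TABLE ... CONCATENATE. A minimal sketch, assuming a hypothetical warehouse path (adjust the database, table, and partition to your environment):

hdfs dfs -count /apps/hive/warehouse/<db_name>.db/<table_name>/file_date_partition=<partition_info>
# Output columns: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATHNAME
# Re-run after the concatenate; FILE_COUNT should drop as the small ORC files are merged into larger ones.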
05-25-2018
06:12 PM
3 Kudos
PROBLEM

Users are able to drop Hive tables even though they are not the table owners. Metastore server security needs to be enabled to start using storage-based authorization.

SOLUTION

To enable metastore security, set the following parameters:

hive.metastore.pre.event.listeners [This turns on metastore-side security.]
Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener

hive.security.metastore.authorization.manager [This tells Hive which metastore-side authorization provider to use. The default setting uses DefaultHiveMetastoreAuthorizationProvider, which implements the standard Hive grant/revoke model. To use an HDFS permission-based model (recommended) for authorization, use StorageBasedAuthorizationProvider as noted here.]
Set to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider

hive.security.metastore.authenticator.manager
Set to org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator

hive.security.metastore.authorization.auth.reads
When this is set to true, Hive metastore authorization also checks for read access. It is set to true by default. Read authorization checks were introduced in Hive 0.14.0.
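For reference, this is a minimal sketch of how the four settings above would look in hive-site.xml, using exactly the class names and values listed; whether you edit hive-site.xml directly or set these through Ambari depends on your environment:

<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>
<property>
  <name>hive.security.metastore.authorization.auth.reads</name>
  <value>true</value>
</property>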
04-18-2017
06:38 PM
@sanket patel Intermittent ZooKeeper issues can lead to cleaner chores failing. https://issues.apache.org/jira/browse/HBASE-15234
03-28-2017
12:14 AM
I would recommend splitting up the file and then running your MR job on each of the resulting files.
02-10-2017
05:58 PM
@Subramanian Santhanam can you please add more details with screenshots and logs?
02-05-2017
05:57 PM
Can you provide the container logs?
01-17-2017
05:22 AM
2 Kudos
Repo Info
Github Repo URL: https://github.com/sarunsingla/admin_utilities/blob/master/regionsize_per_regionserver.py
Github account name: sarunsingla
Repo name: admin_utilities (script: regionsize_per_regionserver.py)
01-05-2017
03:23 PM
1 Kudo
Repo Description
You can use this script to automatically take jstacks. Copy the script and just execute it, e.g. ./script_name.pl. Please let me know if this helps. It works on the basis of jstack -F; for systems that are very busy, you can replace it with kill -3 <pid>.

[root@node1 ~]# ./jstack.pl
Which component are you looking to take a jstack for:
namenode
Process name is : namenode
Process id for namenode is: 15046
How many jstack required:
2
Sleep between each jstack
1
Process Id for Namenode: 15046
Taking a jstack now
jstack_output_1483629788
Process Id for Namenode: 15046
Taking a jstack now
jstack_output_1483629790

Repo Info
Github Repo URL: https://github.com/sarunsingla/admin_utilities/blob/master/auto-jstack.pl
Github account name: sarunsingla
Repo name: admin_utilities (script: auto-jstack.pl)
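A side note on the kill -3 alternative mentioned in the description: a minimal sketch, assuming the NameNode PID from the example run above. kill -3 sends SIGQUIT, and the JVM then prints the thread dump to its stdout rather than to a new file, so for HDP services you would typically look in the component's .out log:

kill -3 15046
# No separate output file is created; check the process's stdout log (e.g. the NameNode .out file) for the thread dump.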
01-05-2017
03:53 AM
Please check whether the heap configuration follows the recommendations here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html
01-03-2017
10:36 PM
@Mahen Jay Can you please elaborate more on the use case here? Do you already have 3 ZooKeeper nodes and are looking to add more at a later stage? If that is the case, then yes, you can always add more ZK nodes after the cluster is created. Or are you saying you want to skip the ZooKeeper nodes for now? If that is the case, then I do not think it would be possible, as ZooKeeper is a dependent service; you need to have ZK nodes at the time of cluster creation. You can always move the ZK nodes to other machines at a later stage. Please let me know if the use case is different.