Member since: 09-29-2015
Posts: 28
Kudos Received: 14
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1270 | 01-03-2017 10:36 PM |
| | 3235 | 12-30-2016 12:05 AM |
| | 5066 | 07-14-2016 06:51 PM |
06-04-2018
04:37 PM
Have recently run into multiple issues where ORC files on Hive are not getting compacted. A number of parameters are required to enable concatenation on ORC tables:

SET hive.merge.tezfiles=true;
SET hive.execution.engine=tez;
SET hive.merge.mapredfiles=true;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=256000000;
SET hive.merge.mapfiles=true;
SET hive.merge.orcfile.stripe.level=true;
SET mapreduce.input.fileinputformat.split.minsize=256000000;
SET mapreduce.input.fileinputformat.split.maxsize=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;

ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='FALSE');
ALTER TABLE <table_name> PARTITION (file_date_partition='<partition_info>') CONCATENATE;
ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');

mapreduce.input.fileinputformat.split.minsize.per.node specifies the minimum number of bytes that each input split should contain within a data node. The default value is 0, meaning there is no minimum size.

mapreduce.input.fileinputformat.split.minsize.per.rack specifies the minimum number of bytes that each input split should contain within a single rack. The default value is 0, meaning there is no minimum size.

Make sure not to concatenate ORC files if they were generated by Spark: there is a known issue, HIVE-17403, and concatenation is being disabled for such tables in later versions. An example is a table/partition containing two different files (part-m-00000_1417075294718 and part-m-00018_1417075294718). Although both are completely different files, Hive assumes they were generated by separate instances of the same task (because of a failure or speculative execution), so it will end up removing one of the files during concatenation, which leads to data loss.
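As a worked example, the whole sequence can be scripted. A minimal sketch follows, assuming a hypothetical external table demo.events partitioned by file_date_partition and a made-up warehouse path; the JDBC URL, database, table, partition value and path are all assumptions and need to be adjusted to your cluster:

```bash
# Hypothetical example: database/table name, partition value and HDFS path are assumptions.
PART_DIR=/apps/hive/warehouse/demo.db/events/file_date_partition=2018-06-01

# Count files in the partition before concatenation.
hdfs dfs -count "$PART_DIR"

# Run the merge; the settings and statements are the ones listed above.
cat > /tmp/concat_events.hql <<'EOF'
SET hive.execution.engine=tez;
SET hive.merge.tezfiles=true;
SET hive.merge.orcfile.stripe.level=true;
ALTER TABLE demo.events SET TBLPROPERTIES('EXTERNAL'='FALSE');
ALTER TABLE demo.events PARTITION (file_date_partition='2018-06-01') CONCATENATE;
ALTER TABLE demo.events SET TBLPROPERTIES('EXTERNAL'='TRUE');
EOF
beeline -u "jdbc:hive2://<hiveserver2-host>:10000/default" -f /tmp/concat_events.hql

# Count files again; the number should drop if small files were merged.
hdfs dfs -count "$PART_DIR"
```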
05-25-2018
06:12 PM
3 Kudos
PROBLEM
Users are able to drop tables in Hive even though they are not the table owners. Metastore server security needs to be enabled to start using storage-based authorization.

SOLUTION
To enable metastore security, set the following parameters:
- hive.metastore.pre.event.listeners [this turns on metastore-side security]: set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener
- hive.security.metastore.authorization.manager [this tells Hive which metastore-side authorization provider to use; the default, DefaultHiveMetastoreAuthorizationProvider, implements the standard Hive grant/revoke model, while the HDFS-permission-based model (recommended) uses StorageBasedAuthorizationProvider]: set to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider
- hive.security.metastore.authenticator.manager: set to org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator
- hive.security.metastore.authorization.auth.reads: when this is set to true, Hive metastore authorization also checks for read access. It is true by default; read authorization checks were introduced in Hive 0.14.0.
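For reference, a minimal hive-site.xml sketch with only the values listed above; where the properties are managed (for example as custom hive-site entries in Ambari) depends on your environment, and the metastore service needs a restart afterwards:

```xml
<!-- Metastore-side security settings described above (sketch; values from the post). -->
<property>
  <name>hive.metastore.pre.event.listeners</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener</value>
</property>
<property>
  <name>hive.security.metastore.authorization.manager</name>
  <value>org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider</value>
</property>
<property>
  <name>hive.security.metastore.authenticator.manager</name>
  <value>org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator</value>
</property>
<property>
  <name>hive.security.metastore.authorization.auth.reads</name>
  <value>true</value>
</property>
```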
06-02-2017
11:29 AM
It's a problem with your client Java configuration, not with the cluster instances (ResourceManager, NodeManager, NameNode and others). So you need to increase the Java heap for the Hadoop client: export HADOOP_OPTS="$HADOOP_OPTS -Xmx4G"
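A small sketch of how this is typically applied, assuming the out-of-memory error comes from a client-side command (for example a large recursive listing); the 4G figure is only an example and the command shown is hypothetical:

```bash
# Raise the heap for Hadoop client JVMs in this shell session only.
export HADOOP_OPTS="$HADOOP_OPTS -Xmx4G"

# Re-run the command that was failing on the client side, e.g. a large listing.
hdfs dfs -ls -R /data > /tmp/listing.txt
```

To make the change permanent for a user, the export can go into that user's shell profile instead of being typed per session.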
01-03-2017
10:22 PM
1 Kudo
User tries to decommission/recommission nodes from the Ambari UI; nothing happens on the UI and it seems like the operation did not go through.

Ambari-server logs:

WARN [C3P0PooledConnectionPoolManager[identityToken->2s8bny9j1mxgjkn9oj5d8|79679221]-HelperThread-#0] StatementUtils:223 - Statement close FAILED.com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'OPTION SQL_SELECT_LIMIT=DEFAULT' at line 1
at sun.reflect.GeneratedConstructorAccessor198.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)

ROOT CAUSE
The default JDBC driver installed with Ambari doesn't support MySQL 5.6.25.

NOTE BEFORE THE WORKAROUND CAN BE FOLLOWED
Make sure to delete the triggers from the Ambari DB before following the steps in the workaround section. Otherwise it might result in an outage if there are too many triggers waiting in the DB to fire once the connector version is fixed. Ambari DB tables to check:
qrtz_calendars, qrtz_fired_triggers, qrtz_job_details, qrtz_locks, qrtz_paused_trigger_grps, qrtz_scheduler_state, qrtz_simple_triggers, qrtz_simprop_triggers, qrtz_triggers

WORKAROUND
Update the MySQL connector from http://mvnrepository.com/artifact/mysql/mysql-connector-java. Once the connector is updated, run:

ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar

where --jdbc-driver is the path to the new driver.
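Putting the note and the workaround together, a sketch of the full sequence follows. It assumes a MySQL-backed Ambari database named ambari, an ambari DB user, and that the newer connector jar has already been placed at /usr/share/java/mysql-connector-java.jar; those names are assumptions, and only the ambari-server setup command itself comes from the post:

```bash
# 1. Check for pending Quartz triggers in the Ambari DB (DB name/user are assumptions).
mysql -u ambari -p ambari -e "SELECT COUNT(*) FROM qrtz_triggers; SELECT COUNT(*) FROM qrtz_fired_triggers;"

# 2. Back up the Ambari DB, then clear the trigger backlog so the queued jobs do not
#    all fire at once when the connector is replaced (delete from child tables such as
#    qrtz_simple_triggers and qrtz_fired_triggers before qrtz_triggers).
mysqldump -u ambari -p ambari > /tmp/ambari_backup_$(date +%F).sql

# 3. Register the updated connector and restart the Ambari server.
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
ambari-server restart
```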
01-03-2017
09:27 PM
1 Kudo
USE CASE

ACCESS MODEL [let's say, in this case, Teradata]:
- User U1 can only read all tables in databases D1 and D2.
- User U2 can read database D1, and can INSERT, UPDATE, DELETE and SELECT on all tables in database D2. User U2 cannot DROP or CREATE tables in database D2.
- User U3 can SELECT, INSERT, UPDATE, DELETE, DROP, CREATE (ALL ACCESS) on databases D1 and D2.

OBJECTIVE: We want the same model on Hadoop, with one improvement. We will have storage groups and ACLs grouping the tables of the same subject area. One database may have more than one storage group, say SG11, SG12, SG21 and SG22 (SG11 and SG12 are associated with database D1, and SG21 and SG22 with D2).
- User U1 should read all tables in D1 and D2.
- User U2 should only INSERT, UPDATE, DELETE and SELECT on tables covered by SG11 (in database D1); U2 will not be able to update tables in SG12 (in database D1) but can read them.
- User U3 can do all operations on SG11, SG12 (D1) and SG21, SG22 (D2), and is the OWNER of all objects in D1 and D2.

OUR TARGET
- U3 is the admin user and is the owner of the objects.
- U2 is a batch ID and can write (insert, update, delete, select) to the objects in its storage group. U2 can read all objects in all selected storage groups.
- U1 is a regular user and can read selected storage groups. (There is more to it, but we do not want to complicate things.)

CURRENT PLAN
- U1 gets "r" via SG1* and SG2*.
- U2 gets "rwx" via SG11 and "r" via SG12 (U2 could drop a table because of the SG11 ACL). We therefore grant U2 a Hive role that has UPDATE, DELETE, INSERT and SELECT but no DROP; the ACL allows these operations at the file level without being the owner. If U2 tries to drop a table in SG11, the Hive role/authorization does not allow it, while U2 can still update rows in tables of SG11.
SOLUTION
- Create users u1, u2 and u3.
- Create the user directories under /user/* on HDFS.
- Create 4 storage directories on HDFS: /data/sg11, /data/sg12, /data/sg21, /data/sg22.
- Grant ACLs on the storage directories:

[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u3:rwx /data/sg12
[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u3:rwx /data/sg21
[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u3:rwx /data/sg22

- Create 2 databases, d1 and d2.

hive> desc database d1;
OK
d1 hdfs://node1.example.com:8020/data/sg11 u3 USER
Time taken: 0.275 seconds, Fetched: 1 row(s)

hive> desc database d2;
OK
d2 hdfs://node1.example.com:8020/data/sg21 u3 USER
Time taken: 0.156 seconds, Fetched: 1 row(s)

Grant and verify U2's ACLs:

[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u2:rwx /data/sg11
[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u2:r-- /data/sg12
[hdfs@node1 root]$ hdfs dfs -getfacl /data/sg11
[hdfs@node1 root]$ hdfs dfs -getfacl /data/sg12

As user U3 you see the following (see the attached screenshot, file2.png, 267.4 kB).

To get the above working we need to have the following settings:
hive.users.in.admin.role = root,hive,u3
Choose Authorization = SQLAUTH
hive.server2.enable.doAs = true

So this works as expected.
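For reference, a hypothetical sketch of the Hive-side role grants described in the CURRENT PLAN above, run as the admin user u3 under SQL standard-based authorization; the role name batch_writer and the table names are illustrative assumptions, not from the original post:

```sql
-- Run as u3; role and table names below are hypothetical.
SET ROLE ADMIN;
CREATE ROLE batch_writer;

-- Allow DML on a table stored under SG11, but no DROP (ownership stays with u3).
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE d1.orders_sg11 TO ROLE batch_writer;

-- Read-only access to a table stored under SG12.
GRANT SELECT ON TABLE d1.ref_sg12 TO ROLE batch_writer;

GRANT ROLE batch_writer TO USER u2;
```

Because hive.server2.enable.doAs=true, the file operations still execute as u2, so the HDFS ACLs above remain the enforcement layer underneath the role grants.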
01-07-2017
05:05 PM
I understand now that all basic components have to be deployed on the available resources first, and later they can be moved to new nodes as and when those are added to the cluster. Thank you for your time.