Member since: 09-29-2015
Posts: 28
Kudos Received: 14
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 537 | 01-03-2017 10:36 PM |
| | 1515 | 12-30-2016 12:05 AM |
| | 3336 | 07-14-2016 06:51 PM |
06-04-2018
04:37 PM
Have recently run into multiple issues where ORC files on Hive are not getting compacted. The following parameters are required to enable concatenation on ORC:

SET hive.merge.tezfiles=true;
SET hive.execution.engine=tez;
SET hive.merge.mapredfiles=true;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=256000000;
SET hive.merge.mapfiles=true;
SET hive.merge.orcfile.stripe.level=true;
SET mapreduce.input.fileinputformat.split.minsize=256000000;
SET mapreduce.input.fileinputformat.split.maxsize=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
SET mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;

ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='FALSE');
ALTER TABLE <table_name> PARTITION (file_date_partition='<partition_info>') CONCATENATE;
ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');

mapreduce.input.fileinputformat.split.minsize.per.node specifies the minimum number of bytes that each input split should contain within a data node. The default value is 0, meaning there is no minimum size.

mapreduce.input.fileinputformat.split.minsize.per.rack specifies the minimum number of bytes that each input split should contain within a single rack. The default value is 0, meaning there is no minimum size.

Make sure not to concatenate ORC files if they were generated by Spark: there is a known issue, HIVE-17403, and concatenation of such files is disabled in later versions. An example is a table/partition having two different files (part-m-00000_1417075294718 and part-m-00018_1417075294718). Although both are completely different files, Hive thinks they were generated by separate instances of the same task (because of failure or speculative execution) and ends up removing one of them.
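To script this per partition from the shell, a minimal beeline wrapper might look like the sketch below. The JDBC URL, table name, and partition value are placeholders (not from the original post), and only a subset of the SET parameters above is repeated for brevity.

```bash
#!/usr/bin/env bash
# Hedged sketch only: run the ORC concatenation for one partition via beeline.
# <jdbc_url>, <table_name> and <partition_info> are placeholders you must supply.
beeline -u "<jdbc_url>" -e "
SET hive.execution.engine=tez;
SET hive.merge.tezfiles=true;
SET hive.merge.orcfile.stripe.level=true;
ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='FALSE');
ALTER TABLE <table_name> PARTITION (file_date_partition='<partition_info>') CONCATENATE;
ALTER TABLE <table_name> SET TBLPROPERTIES('EXTERNAL'='TRUE');
"
```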
05-25-2018
06:12 PM
3 Kudos
PROBLEM
Users are able to drop Hive tables even though they are not the table owners. Metastore server security needs to be enabled to start using storage-based authorization.

SOLUTION
To enable metastore security, set the following parameters:

hive.metastore.pre.event.listeners [turns on metastore-side security]
Set to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener

hive.security.metastore.authorization.manager [tells Hive which metastore-side authorization provider to use. The default, DefaultHiveMetastoreAuthorizationProvider, implements the standard Hive grant/revoke model. To use an HDFS permission-based model (recommended), use StorageBasedAuthorizationProvider as below]
Set to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider

hive.security.metastore.authenticator.manager
Set to org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator

hive.security.metastore.authorization.auth.reads
When this is set to true, Hive metastore authorization also checks for read access. It is true by default. Read authorization checks were introduced in Hive 0.14.0.
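After pushing the change, a quick sanity check is to grep hive-site.xml for the four properties. This is a hedged sketch, not part of the original article; the config path assumes a standard HDP layout, and it assumes <name> and <value> sit on consecutive lines as in Ambari-managed configs.

```bash
#!/usr/bin/env bash
# Hedged sketch: verify the metastore security properties exist in hive-site.xml.
# The path below is an assumption for a typical HDP install.
HIVE_SITE=/etc/hive/conf/hive-site.xml

for prop in \
  hive.metastore.pre.event.listeners \
  hive.security.metastore.authorization.manager \
  hive.security.metastore.authenticator.manager \
  hive.security.metastore.authorization.auth.reads; do
  # Print the <value> that follows each property name, or flag it as missing.
  grep -A1 "<name>${prop}</name>" "$HIVE_SITE" | grep "<value>" \
    || echo "MISSING: ${prop}"
done
```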
- Find more articles tagged with:
- Hive
- hiveserver2
- Issue Resolution
04-18-2017
06:38 PM
@sanket patel Intermittent ZooKeeper issues can lead to the cleaner chores failing. https://issues.apache.org/jira/browse/HBASE-15234
03-28-2017
12:14 AM
I would recommend splitting up the file and then running your MR job on each of the resulting files.
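A hedged illustration of that approach; the file names, chunk size, and job driver below are all hypothetical and not from this thread.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split a large local input file into 1 GB chunks,
# upload each chunk to HDFS, and run the MR job once per chunk.
split -b 1G big_input.txt chunk_
hdfs dfs -mkdir -p /data/input_chunks

for f in chunk_*; do
  hdfs dfs -put "$f" /data/input_chunks/"$f"
  # Placeholder driver, jar, and output path; substitute your own job here.
  hadoop jar my-job.jar MyDriver /data/input_chunks/"$f" /data/output/"$f"
done
```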
02-10-2017
05:58 PM
@Subramanian Santhanam can you please add more details with screenshots and logs?
02-05-2017
05:57 PM
Can you provide the container logs?
01-17-2017
05:22 AM
2 Kudos
Repo Info
Github Repo URL: https://github.com/sarunsingla/admin_utilities/blob/master/regionsize_per_regionserver.py
Github account name: sarunsingla
Repo name: regionsize_per_regionserver.py
- Find more articles tagged with:
- Hadoop Core
- HBase
- JMX
- region
- regionserver
- regionsize
- utilities
01-05-2017
03:23 PM
1 Kudo
Repo Description
You can use this script to automatically take jstacks. Copy the script and just execute it, e.g. <./script_name.pl>. It works on the basis of jstack -F; for systems that are very busy you can replace that with kill -3 <pid>. Please let me know if this helps.

Example run:

[root@node1 ~]# ./jstack.pl
Which component are you looking to take a jstack for:
namenode
Process name is : namenode
Process id for namenode is: 15046
How many jstack required:
2
Sleep between each jstack
1
Process Id for Namenode: 15046
Taking a jstack now
jstack_output_1483629788
Process Id for Namenode: 15046
Taking a jstack now
jstack_output_1483629790

Repo Info
Github Repo URL: https://github.com/sarunsingla/admin_utilities/blob/master/auto-jstack.pl
Github account name: sarunsingla
Repo name: auto-jstack.pl
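For reference, a simplified shell equivalent of what the Perl script in the repo above does; this sketch is an assumption, not the actual auto-jstack.pl code, and the argument names are hypothetical.

```bash
#!/usr/bin/env bash
# Hedged sketch: take COUNT jstacks of PID with SLEEP seconds between them.
# Run as the user that owns the JVM (or root).
PID=$1        # e.g. the NameNode pid
COUNT=${2:-2} # number of jstacks to take
SLEEP=${3:-1} # seconds between jstacks

for i in $(seq 1 "$COUNT"); do
  OUT="jstack_output_$(date +%s)_$i"
  echo "Taking jstack $i of $COUNT for pid $PID -> $OUT"
  jstack -F "$PID" > "$OUT" 2>&1   # on very busy JVMs, use: kill -3 "$PID" instead
  sleep "$SLEEP"
done
```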
- Find more articles tagged with:
- automate
- automation
- jstack
- solutions
- utilities
01-05-2017
03:53 AM
Please check whether the heap configuration follows the recommendations here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html
01-03-2017
10:36 PM
@Mahen Jay can you please elaborate more on the use case here? Do you already have 3 ZooKeeper nodes and are looking to add more at a later stage? If that is the case, then yes, you can always add more ZK nodes after the cluster is created. Or are you saying you want to skip the ZooKeeper nodes for now? If so, I do not think that is possible, as it is a dependent service: you need to have ZK nodes at the time of cluster creation. You can always move the ZK nodes to other machines at a later stage. Please let me know if the use case is different.
01-03-2017
10:24 PM
@David Sheard great that it worked. Can you please accept the answer 🙂
01-03-2017
10:22 PM
1 Kudo
PROBLEM
A user tries to decommission/recommission nodes from the Ambari UI, nothing happens on the UI, and it seems like the operation did not go through.

Ambari-server logs:

WARN [C3P0PooledConnectionPoolManager[identityToken->2s8bny9j1mxgjkn9oj5d8|79679221]-HelperThread-#0] StatementUtils:223 - Statement close FAILED. com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'OPTION SQL_SELECT_LIMIT=DEFAULT' at line 1
at sun.reflect.GeneratedConstructorAccessor198.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1052)

ROOT CAUSE
The default JDBC driver installed with Ambari doesn't support MySQL 5.6.25.

NOTE BEFORE THE WORKAROUND CAN BE FOLLOWED
Make sure to delete the triggers from the Ambari DB before following the steps in the workaround section. Otherwise it might result in an outage if there are too many triggers waiting in the DB to fire once the connector version is fixed. Ambari DB tables to check:

qrtz_calendars, qrtz_fired_triggers, qrtz_job_details, qrtz_locks, qrtz_paused_trigger_grps, qrtz_scheduler_state, qrtz_simple_triggers, qrtz_simprop_triggers, qrtz_triggers

WORKAROUND
Update the MySQL connector from http://mvnrepository.com/artifact/mysql/mysql-connector-java. Once it is updated, run ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar, where --jdbc-driver is the path to the new driver.
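A hedged sketch of the workaround steps; the connector version, jar file name, and the ambari database credentials below are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Hedged sketch of the workaround. Adjust versions, paths, and credentials
# for your environment.

# Optional: see how many quartz triggers are queued in the Ambari DB first.
mysql -u ambari -p ambari -e "SELECT COUNT(*) FROM qrtz_triggers;"

# 1. Drop in a connector that supports your MySQL version
#    (downloaded from http://mvnrepository.com/artifact/mysql/mysql-connector-java).
cp mysql-connector-java-5.1.40-bin.jar /usr/share/java/mysql-connector-java.jar

# 2. Point ambari-server at the new driver and restart.
ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
ambari-server restart
```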
- Find more articles tagged with:
- Ambari
- ambari-server
- Database
- Hadoop Core
- How-ToTutorial
- MySQL
01-03-2017
09:27 PM
1 Kudo
USE CASE

ACCESS MODEL [let's say, in this case, Teradata]

- User U1 can only read all tables in databases D1 and D2.
- User U2 can read database D1 and INSERT, UPDATE, DELETE, SELECT all tables in database D2. User U2 cannot DROP or CREATE TABLE in database D2.
- User U3 can SELECT, INSERT, UPDATE, DELETE, DROP, CREATE (all access) on databases D1 and D2.

OBJECTIVE
We want the same model on Hadoop with one improvement: we will have storage groups and ACLs grouping the tables of the same subject area. One database may have more than one storage group, say SG11, SG12 and SG21, SG22 (SG11 and SG12 are associated with database D1, SG21 and SG22 with D2).

- User U1 should read all tables in D1 and D2.
- User U2 should only INSERT, UPDATE, DELETE and SELECT tables covered by SG11 (in database D1). U2 will not be able to update tables in SG12 (in database D1) but can read them.
- User U3 can do all operations on SG11, SG12 (D1) and SG21, SG22 (D2) and is the OWNER of all objects in D1 and D2.

OUR TARGET
U3 is the admin user and is the owner of the objects. U2 is a batch ID and can write (insert, update, delete, select) to its storage group objects; U2 can read all objects in all selected storage groups. U1 is a regular user and can read selected storage groups. (There is more to it, but we do not want to complicate things.)

CURRENT PLAN
U1 gets "r" via SG1* and SG2*. U2 gets "rwx" via SG11 and "r" via SG12 (U2 could drop a table due to SG11). We grant U2 a Hive role that has UPDATE, DELETE, INSERT, SELECT but no DROP; the ACL allows these operations at the file level without being the owner. When U2 tries to drop a table in SG11, the Hive role/authorization does not allow it, but U2 can still update rows in tables of SG11.

SOLUTION
1) Create users u1, u2, u3 and their home directories under /user/* on HDFS.
2) Create 4 storage directories under HDFS: /data/[sg11,sg12,sg21,sg22].
3) Grant ACLs to the storage directories above:

[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u3:rwx /data/sg12
[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u3:rwx /data/sg21
[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u3:rwx /data/sg22

4) Create 2 databases, d1 and d2:

hive> desc database d1;
OK
d1 hdfs://node1.example.com:8020/data/sg11 u3 USER
Time taken: 0.275 seconds, Fetched: 1 row(s)

hive> desc database d2;
OK
d2 hdfs://node1.example.com:8020/data/sg21 u3 USER
Time taken: 0.156 seconds, Fetched: 1 row(s)

5) Grant U2 its ACLs and verify:

[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u2:rwx /data/sg11
[hdfs@node1 root]$ hdfs dfs -setfacl -m user:u2:r-- /data/sg12
[hdfs@node1 root]$ hdfs dfs -getfacl /data/sg11
[hdfs@node1 root]$ hdfs dfs -getfacl /data/sg12

As user U3 you see the following: (screenshot: file2.png, 267.4 kB)

To get the above working we need the following settings:

hive.users.in.admin.role = root,hive,u3
Choose Authorization = SQLAUTH
hive.server2.enable.doAs = true

With this in place, it works as expected.
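A condensed shell sketch of the HDFS side of this setup. The u3 ACL on sg11 and the read ACLs for u1 are assumptions extrapolated from the pattern above (they are not shown explicitly in the article); run the commands as the hdfs superuser.

```bash
#!/usr/bin/env bash
# Hedged sketch of the HDFS-side setup for the storage groups.

# Storage group directories
hdfs dfs -mkdir -p /data/sg11 /data/sg12 /data/sg21 /data/sg22

# u3 (admin/owner): full access to every storage group
for d in sg11 sg12 sg21 sg22; do
  hdfs dfs -setfacl -m user:u3:rwx /data/$d
done

# u2 (batch id): write access to sg11, read-only on sg12
hdfs dfs -setfacl -m user:u2:rwx /data/sg11
hdfs dfs -setfacl -m user:u2:r-- /data/sg12

# u1 (regular user): read on the selected storage groups
# (r-x rather than r-- so the directories can be listed; this is an assumption)
for d in sg11 sg12 sg21 sg22; do
  hdfs dfs -setfacl -m user:u1:r-x /data/$d
done

# Verify
hdfs dfs -getfacl /data/sg11
hdfs dfs -getfacl /data/sg12
```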
- Find more articles tagged with:
- beeline
- FAQ
- Hadoop Core
- HDFS
- hdfs-permissions
- Hive
- How-ToTutorial
- Permissions
12-30-2016
12:09 AM
try using https://community.hortonworks.com/questions/1021/how-to-remove-all-external-users-from-the-ranger-r.html
12-30-2016
12:05 AM
1 Kudo
Please try installing the ambari-metrics-collector package on the machine (yum install ambari-metrics-collector) and then restart Ambari Metrics.
12-30-2016
12:03 AM
It seems the blocks that are shown as missing were on the disks that went bad. Can you please provide the NameNode logs, and for one of the files that fsck reports as missing, check the following: hdfs fsck <path to the file> -files -blocks -locations -racks
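For context, a couple of illustrative fsck invocations; the file path is a placeholder, not a path from this thread.

```bash
# List all files with corrupt or missing blocks cluster-wide.
hdfs fsck / -list-corruptfileblocks

# Drill into one affected file to see its block IDs, replica locations, and racks.
hdfs fsck /path/to/affected/file -files -blocks -locations -racks
```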
08-07-2016
12:26 AM
1 Kudo
Could you please check whether you have the local and log folders under the yarn folder? If not, let's add these folders, give them yarn:hadoop ownership, and restart the NodeManager to see if that fixes the issue.
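A hedged sketch of what that looks like, assuming the NodeManager directories are /hadoop/yarn/local and /hadoop/yarn/log; check yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs in yarn-site.xml for the real paths.

```bash
# Assumed paths; confirm against yarn.nodemanager.local-dirs / log-dirs first.
mkdir -p /hadoop/yarn/local /hadoop/yarn/log
chown -R yarn:hadoop /hadoop/yarn/local /hadoop/yarn/log
chmod 755 /hadoop/yarn/local /hadoop/yarn/log
# Then restart the NodeManager (e.g. from Ambari).
```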
07-26-2016
08:21 PM
This issue is tracked on https://issues.apache.org/jira/browse/HBASE-16288
07-14-2016
06:51 PM
1 Kudo
1) For moving the files from 2nd April to another folder in HDFS:

for i in `hdfs dfs -ls /old_data/dataset/ | grep "2016-04-02" | awk '{print $8}'`; do echo ${i}; hdfs dfs -mv ${i} /old_data/dataset/TEST/; done

2) Once the above works, you can just set up a crontab for it (see the sketch below).

Please try this scenario out on a test folder in non-prod first.
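A hedged sketch of the cron part; the script name, schedule, and date handling are assumptions, not part of the original answer.

```bash
#!/usr/bin/env bash
# /home/hdfs/move_old_dataset.sh - hypothetical wrapper around the loop above.
# Moves files whose listing date matches TARGET_DATE into the TEST folder.
TARGET_DATE=$(date -d "yesterday" +%Y-%m-%d)   # or hard-code, e.g. 2016-04-02

for f in $(hdfs dfs -ls /old_data/dataset/ | grep "$TARGET_DATE" | awk '{print $8}'); do
  echo "moving $f"
  hdfs dfs -mv "$f" /old_data/dataset/TEST/
done

# Example crontab entry (crontab -e as the hdfs user), daily at 01:00:
# 0 1 * * * /home/hdfs/move_old_dataset.sh >> /var/log/move_old_dataset.log 2>&1
```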