Member since: 03-04-2019
Posts: 59
Kudos Received: 24
Solutions: 5
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 5264 | 07-26-2018 08:10 PM |
| | 5922 | 07-24-2018 09:49 PM |
| | 2862 | 10-08-2017 08:00 PM |
| | 2438 | 07-31-2017 03:17 PM |
| | 834 | 12-05-2016 11:24 PM |
07-19-2017
11:39 PM
1 Kudo
All the dependency libs appear to be present on the master as well as on all the RegionServer nodes, so I just added the parameter 'hbase.table.sanity.checks' to hbase-site.xml and set it to 'false' in Ambari. After restarting the HBase Master and all the RegionServers, Phoenix started working.
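For reference, the same property change can also be scripted; a minimal sketch using Ambari's bundled configs.sh helper, where the Ambari host, cluster name, and credentials are placeholders (the property can just as well be added under Custom hbase-site in the Ambari UI):

```bash
# Hedged sketch: push hbase.table.sanity.checks=false into the hbase-site config type
# via the configs.sh script shipped with Ambari (host, cluster, and credentials below
# are placeholders; restart HBase Master and all RegionServers afterwards).
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
  set ambari.example.com MyCluster hbase-site \
  "hbase.table.sanity.checks" "false"
```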
07-19-2017
10:54 PM
1 Kudo
I seem to be running into a connection issue between Phoenix and HBase; below is the error:

[root@dsun5 bin]# ./sqlline.py dsun0.field.hortonworks.com:2181:/hbase-unsecure
Setting property: [incremental, false]
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:dsun0.field.hortonworks.com:2181:/hbase-unsecure none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:dsun0.field.hortonworks.com:2181:/hbase-unsecure
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/phoenix/phoenix-4.7.0.2.6.1.0-129-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
17/07/19 22:46:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/19 22:46:11 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Error: org.apache.hadoop.hbase.DoNotRetryIOException: Class org.apache.phoenix.coprocessor.MetaDataEndpointImpl cannot be loaded Set hbase.table.sanity.checks to false at conf or table descriptor if you want to bypass sanity checks
at org.apache.hadoop.hbase.master.HMaster.warnOrThrowExceptionForFailure(HMaster.java:1878)
at org.apache.hadoop.hbase.master.HMaster.sanityCheckTableDescriptor(HMaster.java:1746)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1652)
at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:483)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:59846)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2141)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167) (state=08000,code=101)

Note: the Phoenix server jar (phoenixserver.jar) has been manually installed on the HBase Master.
Labels:
- Apache HBase
- Apache Phoenix
07-19-2017
02:40 PM
1 Kudo
@sysadmin CreditVidya There are several approaches I can think of that might help:
1. It appears the MR intermediate data is not being purged properly by Hadoop itself. You can manually delete the files/folders under the directories configured in mapreduce.cluster.local.dir after the MR jobs complete, say anything older than 3 days; a cron job works well for that (see the sketch below).
2. Make sure to implement the cleanup() method in each mapper/reducer class, which cleans up local resources and aggregates before the task exits.
3. Run the HDFS balancer regularly, normally weekly or bi-weekly, so that some nodes don't end up with much more HDFS data than the others (MR jobs always try to use the local copy of the data first), and always keep an eye on 'disk usage' for each host in Ambari.
Hope that helps.
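A minimal sketch of points 1 and 3 above; the local directory path is a placeholder, so substitute whatever your mapreduce.cluster.local.dir (or yarn.nodemanager.local-dirs) actually points to:

```bash
# Point 1: delete intermediate files older than 3 days under the local scratch
# directories on each worker node (put this in a daily cron job).
# /hadoop/yarn/local is a placeholder path.
find /hadoop/yarn/local -type f -mtime +3 -delete

# Point 3: run the HDFS balancer as the hdfs user; -threshold 10 allows per-node
# utilization to deviate up to 10% from the cluster average.
sudo -u hdfs hdfs balancer -threshold 10
```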
07-18-2017
09:31 PM
2 Kudos
One approach you can take is to disable Hive impersonation by setting 'hive.server2.enable.doAs=false' in the Hive configs. That way the Hive-related HDFS folders are accessible only to the 'hive' user, and other users can't reach the HDFS files directly. In your case, I assume you have doAs set to true; then the user running a Hive query needs permissions defined for both HDFS and Hive in Ranger, which becomes a problem if you have many tables, because all your tables are managed under the hive/warehouse directory rather than the users' home folders, and for each table you would need to grant the user access to the table location through an HDFS policy in Ranger. Even with 'doAs' set to false, you will still see the actual end user in the Ranger audit logs; it's just that the HDFS-related tasks run as the 'hive' user.
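As a quick way to confirm which mode you are actually in, you can check the rendered hive-site.xml on the HiveServer2 host; a minimal sketch, assuming the standard HDP config path:

```bash
# Show the current hive.server2.enable.doAs value as rendered by Ambari
# (/etc/hive/conf is the usual HDP location; adjust if your layout differs).
grep -A1 "hive.server2.enable.doAs" /etc/hive/conf/hive-site.xml
```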
07-18-2017
05:34 PM
Did you set up the proper Hive resource access policies for the users/groups in Ranger? Here is a good tutorial on how to set them up: https://hortonworks.com/blog/best-practices-for-hive-authorization-using-apache-ranger-in-hdp-2-2/
07-18-2017
03:10 PM
1 Kudo
You should set the permission of the Hive warehouse directory to 700 instead of 000, so that normal users are unable to access the secured tables directly, and let Ranger control the Hive policies. In addition, make sure 'hive.warehouse.subdir.inherit.perms=true', which ensures that newly created table directories inherit the 700 permission. Hope that helps.
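A minimal sketch of the permission change, assuming the default HDP warehouse location /apps/hive/warehouse (check hive.metastore.warehouse.dir for the actual path on your cluster):

```bash
# Restrict the warehouse directory to its owner (the hive user) so other users
# cannot read table data directly; Ranger then governs access through Hive.
# /apps/hive/warehouse is the HDP default path.
sudo -u hdfs hdfs dfs -chmod 700 /apps/hive/warehouse
```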
07-18-2017
01:51 PM
@sysadmin CreditVidya Assuming you are referring to 'Non DFS Used:' on the NameNode UI page: that figure is the total across the whole cluster, so it could be in the TBs depending on the size of your total storage. Also, the number means 'how much of the configured DFS capacity is occupied by non-DFS use'; here is a good article about it: https://stackoverflow.com/questions/18477983/what-exactly-non-dfs-used-means Hope that helps.
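For reference, as the linked article explains, the figure works out to roughly Non DFS Used = Configured Capacity - DFS Used - DFS Remaining, and those inputs can be pulled straight from an HDFS report:

```bash
# Cluster-wide capacity summary (Configured Capacity, DFS Used, DFS Remaining);
# the per-DataNode sections further down also list Non DFS Used for each node.
sudo -u hdfs hdfs dfsadmin -report | head -n 20
```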
07-17-2017
05:10 PM
1 Kudo
As you have heterogeneous worker nodes, I'd recommend setting up two separate host config groups first, then managing the HDFS settings for each group separately. Here is how to set up config groups in Ambari: https://docs.hortonworks.com/HDPDocuments/Ambari-2.5.1.0/bk_ambari-operations/content/using_host_config_groups.html For each host group, you can control the non-DFS headroom by setting a proper value for 'dfs.datanode.du.reserved' (in bytes per volume); normally it should be 20%-25% of the disk storage (see the sketch below). Also keep in mind that non-DFS usage can grow beyond what is reserved and eat into DFS storage, so regularly delete logs and other non-HDFS data that take up a lot of local space; I normally use a command like 'du -hsx * | sort -rh | head -10' to identify the top 10 largest folders.
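A minimal sketch of sizing the reservation, where /grid/0 is a placeholder mount point for one data volume; the result is a candidate dfs.datanode.du.reserved value (per volume) for that host config group:

```bash
# Print 20% of the volume's total size in bytes as a starting point for
# dfs.datanode.du.reserved; repeat per data volume / per host config group.
df -B1 /grid/0 | awk 'NR==2 {printf "%.0f\n", $2 * 0.20}'
```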