About pmj

RahulSoni · ‎03-05-2018

"i put a csv file into hdfs location and do an alter table to add that new location to the partition". Can you please explain this operation?

pmj · ‎02-14-2018

Never mind, it's gone automatically. Yesterday's deletion was probably still in progress after the files went to trash.

vperiasamy · ‎02-09-2018

Hi: Ranger usersync syncs users from various sources to make these users available during security policy authoring via the Ranger UI. At the time of resource access, enforcement of policies is performed by Ranger plugins, which depend on the actual service (for example, HiveServer2 in the case of the Hive plugin, the HDFS NameNode in the case of the HDFS plugin) to pass the identity of the user and the groups they belong to. To answer your question, the sync source used for Ranger usersync does not really affect the actual access enforcement. As long as the users in your text file are consistent with the real user source (LDAP/Unix or AD), Ranger policies will work fine. Hope this helps.

bandarusridhar1 · ‎02-09-2018

@PJ Yes, it's the same even for user IDs, but make sure that the user doesn't belong to any other groups. Even if he does, the first policy will get higher priority. Hope this helps.

SQLShaw · ‎02-06-2018

Hi @PJ, the honest truth is there is no good reason not to use the ORC format. You can use another format like Parquet, but it won't provide ACID, LLAP cache, or the same level of performance. I would say the decision is similar to not using indexes in a relational system or not running statistics. ORC is simply best practice for high-performance data warehousing in Hive. Keep in mind that LLAP will allow you to cache raw text files. This may be an option if you have some strict SLA preventing you from incurring the conversion delay of the text file to ORC.

msudarsanan · ‎08-22-2018

Is there a retention period we can set for these staging directories in Ambari? It seems they are not being cleaned up automatically.

b_rousseau · ‎01-15-2018

@PJ This might be due to an I/O issue on the JN host "Remote journal x.x.x.x:8485". Is it always the same JN that is lagging at failure? If so, you should check the I/O load on this machine, using iotop for instance. It can also be the result of a very large number of transactions. What is the value of dfs.namenode.accesstime.precision?

rtheron · ‎08-17-2018

See https://community.hortonworks.com/questions/212611/hivepartitionssmall-filesconcatenate.html

pmj · ‎12-22-2017

@bkosaraju Thanks a lot, the splitting part works, but I am still getting only the first match. How do I get all matches?

Shu_ashu · ‎10-25-2017

@PJ Yeah, that might be the case. If you have a large number of records, it will take a lot of time to convert the ORC data to CSV format. If you compare the two processes, executing a query with insert overwrite directory will perform much faster with no issues. We can also keep whatever delimiter we need, and we don't need to worry about the size of the data.
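
For illustration, here is a minimal sketch of what an insert overwrite directory export could look like. It is a small Python helper that only builds the HiveQL text; the table name `sales_orc`, the path `/tmp/sales_csv`, and the function name `build_csv_export` are hypothetical and not from the thread. It assumes a Hive version that accepts ROW FORMAT on INSERT OVERWRITE DIRECTORY, and the output is plain delimited text, so fields are not quoted the way a strict CSV writer would quote them.

```python
def build_csv_export(table, target_dir, delimiter=","):
    """Build a HiveQL statement that writes a table to a directory as delimited text.

    All names passed in are illustrative; nothing here talks to a cluster.
    """
    # This sketch does no escaping, so refuse values that would break the
    # single-quoted HiveQL string literals below.
    for value in (target_dir, delimiter):
        if "'" in value:
            raise ValueError("single quotes are not supported in this sketch")

    return "\n".join([
        f"INSERT OVERWRITE DIRECTORY '{target_dir}'",
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'",
        "STORED AS TEXTFILE",
        f"SELECT * FROM {table}",
    ])


if __name__ == "__main__":
    # Hypothetical table and path, for demonstration only.
    print(build_csv_export("sales_orc", "/tmp/sales_csv"))
```

The generated statement can then be run through whichever Hive client is in use; the delimiter argument is where a different separator such as `|` would go.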

Member Since	‎10-24-2015 06:29 PM
Last Visited	‎04-02-2020 02:53 PM
Posts	207
Kudos received	17

Cloudera Community

Re: alter table add partition took almost an hour

Re: Decommisiong status for a dead node

Re: Service check failing for YARN & Mapreduce...

Re: Install stops at Datanode through Ambari

Re: alter table add partition took almost an hour

Re: HDFS Pending Deletion Blocks Pending Deletion ...

Re: ranger user sync from text file

Re: restrict user access to queues

Re: when do you not use orc tables?

Re: hive staging files

Re: Urgent! one of the namenode shuts off in HA, n...

Re: orc small files Concatenate in Hive

Re: split column by regex and create a table

Re: convert orc table data into csv