Member since: 01-21-2016
Posts: 290
Kudos Received: 76
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3232 | 01-09-2017 11:00 AM |
| | 1306 | 12-15-2016 09:48 AM |
| | 5619 | 02-03-2016 07:00 AM |
05-13-2016
09:52 AM
Accepting your own answer is only valid if your own solution corrected your issue and nobody else provided the right answer. I unaccepted your answer, since it is a response to the other person, and moved it to a comment. The community is only as good as the value we provide; please respect the rules and give credit where it's due.
05-05-2016
07:26 AM
1 Kudo
You can just use distcp: https://community.hortonworks.com/questions/7165/how-to-copy-hdfs-file-to-aws-s3-bucket-hadoop-dist.html
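A minimal sketch of the invocation, assuming an `s3a://` connector is available and credentials are configured; the bucket name and paths are placeholders:

```shell
# Copy a directory from HDFS to S3 with DistCp.
# "my-bucket" and both paths are placeholders. The s3a:// scheme needs
# the hadoop-aws jar on the classpath and credentials configured
# (e.g. fs.s3a.access.key / fs.s3a.secret.key in core-site.xml,
# or another credential provider).
hadoop distcp \
  hdfs:///data/source_dir \
  s3a://my-bucket/target_dir
```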
05-02-2016
10:13 AM
2 Kudos
Spark DataFrames use the Catalyst optimiser under the hood. The Spark code is transformed into an abstract syntax tree, or logical plan, on which several optimisations are applied before code is generated from it. See the following paper for the full explanation: https://amplab.cs.berkeley.edu/wp-content/uploads/2015/03/SparkSQLSigmod2015.pdf. The reason Spark DataFrames are fast in all languages is that whether you use Python, Java, or Scala, the implementation under the hood is the Scala implementation of the DataFrame and Catalyst optimiser. Whichever language you use, the logical plans are passed on to the same Catalyst optimiser.
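The principle can be sketched language-agnostically: a query is first turned into a tree, rewrite rules optimise the tree, and only then is code generated. A toy illustration in Python (not Catalyst itself, just the same idea with a hypothetical `Add`/`Lit` expression tree and a constant-folding rule):

```python
from dataclasses import dataclass

# Toy logical-plan nodes (hypothetical names, not Catalyst's classes).
@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: object
    right: object

def constant_fold(node):
    """One optimiser rule: replace Add(Lit, Lit) with a single Lit."""
    if isinstance(node, Add):
        left = constant_fold(node.left)
        right = constant_fold(node.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return node

# The front-end language only builds the tree; the same rule optimises
# it no matter which language produced the plan.
plan = Add(Lit(1), Add(Lit(2), Lit(3)))
print(constant_fold(plan))  # Lit(value=6)
```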
04-20-2016
06:19 AM
Thanks @Robert Levas, I enabled DEBUG and found the issue: AES-256 was not enabled on the Linux machines. Ambari 2.1 doesn't seem to check this strictly, whereas Ambari 2.2 enforces it strictly. Thanks a lot for your help.
04-06-2016
07:05 AM
4 Kudos
Hello arunkumar. As a general rule it comes back to what you are trying to achieve and how you want to serve data. Remember that HBase's performance is directly derived from the rowkey, and hence from how you access data. HBase splits data into regions served by region servers, and at a lower level data is split by column family; a single entry, however, is served by the same region.

At a high level, the difference between tall-narrow and flat-wide comes down to scans vs. gets. HBase stores data ordered by rowkey, and full scans are costly. A tall-narrow approach uses a more complex rowkey that keeps similar elements adjacent, allowing focused scans over a logical group of entries. A flat-wide approach puts much more information in the entry itself: you "get" the entry through the rowkey, and the entry carries sufficient information to do your compute or answer your query. Hope this helps.
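The scan-vs-get trade-off can be illustrated with a plain sorted mapping standing in for HBase's rowkey-ordered storage (hypothetical data and key layout, not an HBase client):

```python
from bisect import bisect_left

# Tall-narrow: composite rowkey "<user>#<date>", one small cell per row.
# Keys are kept sorted, as HBase keeps rowkeys.
table = {
    "alice#20160401": {"m:reading": "42"},
    "alice#20160402": {"m:reading": "43"},
    "bob#20160401": {"m:reading": "17"},
}
rowkeys = sorted(table)

def prefix_scan(prefix):
    """Focused scan: adjacent rowkeys sharing a prefix, no full scan."""
    start = bisect_left(rowkeys, prefix)
    out = []
    for key in rowkeys[start:]:
        if not key.startswith(prefix):
            break
        out.append((key, table[key]))
    return out

# Tall-narrow access: scan all of alice's readings via the key prefix.
print(prefix_scan("alice#"))

# Flat-wide alternative: one row per user, many columns; a single
# "get" by rowkey returns everything needed.
wide = {"alice": {"m:20160401": "42", "m:20160402": "43"}}
print(wide["alice"])
```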
03-22-2016
06:48 PM
@ARUNKUMAR RAMASAMY - The Ranger UI will show audit data from either Solr or the DB. Since DB support will be deprecated in future releases, it is recommended to move to Solr. Audit to HDFS is for long-term storage, and this can be done in addition to Solr.
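A sketch of the relevant plugin audit properties, assuming a Ranger 0.5-era (HDP 2.3+) setup; verify the exact property names and URLs against your version's documentation:

```properties
# Send plugin audits to Solr (what the Ranger UI reads).
xasecure.audit.destination.solr=true
xasecure.audit.destination.solr.urls=http://solr_host:6083/solr/ranger_audits

# Additionally spool audits to HDFS for long-term storage.
xasecure.audit.destination.hdfs=true
xasecure.audit.destination.hdfs.dir=hdfs://namenode:8020/ranger/audit
```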
03-11-2016
03:30 PM
2 Kudos
@arunkumar There is no direct way to compare values case-insensitively in HBase. You need to write a custom filter and add the jar to all region servers and the client, or else write a custom coprocessor that checks the value and does not skip results when the upper-cased value matches. If you use Phoenix, you can run a query with a WHERE condition on UPPER(column_name) = 'XYZ'. It's that simple; Phoenix does a lot of things for us.
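What such a server-side filter has to do can be shown with a toy sketch (hypothetical data; this is not the HBase filter API, just the matching logic a custom filter or Phoenix's `UPPER(column) = 'XYZ'` predicate applies per row):

```python
# Toy rows: rowkey -> cell value (hypothetical data).
rows = {
    "r1": "xyz",
    "r2": "XyZ",
    "r3": "abc",
}

def case_insensitive_scan(rows, target):
    """Return rowkeys whose value matches target ignoring case,
    in rowkey order (as an HBase scan would emit them)."""
    target_upper = target.upper()
    return [k for k, v in sorted(rows.items()) if v.upper() == target_upper]

print(case_insensitive_scan(rows, "XYZ"))  # ['r1', 'r2']
```

Doing this on the server side (filter, coprocessor, or Phoenix) avoids shipping every row to the client just to discard most of them.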