Member since: 01-21-2016
Posts: 290
Kudos Received: 76
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3232 | 01-09-2017 11:00 AM |
| | 1306 | 12-15-2016 09:48 AM |
| | 5619 | 02-03-2016 07:00 AM |
05-13-2016
09:52 AM
Accepting your own answer is only valid if your own solution corrected your issue and nobody else provided the right answer. I unaccepted your answer, since it is a response to the other person, and moved it to a comment. The community is only as good as the value we provide; please respect the rules and give credit where it's due.
05-05-2016
07:26 AM
1 Kudo
You can just use distcp: https://community.hortonworks.com/questions/7165/how-to-copy-hdfs-file-to-aws-s3-bucket-hadoop-dist.html
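A minimal sketch of the invocation, assuming an `s3a://` connector is available and credentials are configured; the bucket name and paths are placeholders:

```shell
# Copy a directory from HDFS to S3 with DistCp.
# "my-bucket" and both paths are placeholders. The s3a:// scheme needs
# the hadoop-aws jar on the classpath and credentials configured
# (e.g. fs.s3a.access.key / fs.s3a.secret.key in core-site.xml,
# or another credential provider).
hadoop distcp \
  hdfs:///data/source_dir \
  s3a://my-bucket/target_dir
```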
05-02-2016
10:13 AM
2 Kudos
Spark DataFrames use the Catalyst optimiser under the hood. The Spark code is transformed into an abstract syntax tree, or logical plan, on which several optimisations are applied before code is generated from it. See the following paper for the full explanation: https://amplab.cs.berkeley.edu/wp-content/uploads/2015/03/SparkSQLSigmod2015.pdf. The reason Spark DataFrames are fast in all languages is that whether you use Python, Java, or Scala, the implementation under the hood is the Scala implementation of the DataFrame and Catalyst optimiser. Whichever language you use, the logical plans are passed on to the same Catalyst optimiser.
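The principle can be sketched language-agnostically: a query is first turned into a tree, rewrite rules optimise the tree, and only then is code generated. A toy illustration in Python (not Catalyst itself, just the same idea with a hypothetical `Add`/`Lit` expression tree and a constant-folding rule):

```python
from dataclasses import dataclass

# Toy logical-plan nodes (hypothetical names, not Catalyst's classes).
@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: object
    right: object

def constant_fold(node):
    """One optimiser rule: replace Add(Lit, Lit) with a single Lit."""
    if isinstance(node, Add):
        left = constant_fold(node.left)
        right = constant_fold(node.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return node

# The front-end language only builds the tree; the same rule optimises
# it no matter which language produced the plan.
plan = Add(Lit(1), Add(Lit(2), Lit(3)))
print(constant_fold(plan))  # Lit(value=6)
```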
04-20-2016
06:19 AM
Thanks @Robert Levas, I enabled DEBUG and found the issue: AES-256 was not enabled on the Linux machines. Ambari 2.1 doesn't seem to check this strictly, whereas Ambari 2.2 enforces it strictly. Thanks a lot for your help.
04-06-2016
07:05 AM
4 Kudos
Hello arunkumar. As a general rule it comes back to what you are trying to achieve and how you want to serve data. Remember that HBase's performance is directly derived from the rowkey, and hence from how you access data. HBase splits data into regions served by region servers, and at a lower level data is split by column family; a single entry, however, is served by the same region.

At a high level, the difference between tall-narrow and flat-wide comes down to scans vs. gets. HBase stores data ordered by rowkey, and full scans are costly. A tall-narrow approach uses a more complex rowkey that keeps similar elements adjacent, allowing focused scans over a logical group of entries. A flat-wide approach puts much more information in the entry itself: you "get" the entry through the rowkey, and the entry carries sufficient information to do your compute or answer your query. Hope this helps.
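The scan-vs-get trade-off can be illustrated with a plain sorted mapping standing in for HBase's rowkey-ordered storage (hypothetical data and key layout, not an HBase client):

```python
from bisect import bisect_left

# Tall-narrow: composite rowkey "<user>#<date>", one small cell per row.
# Keys are kept sorted, as HBase keeps rowkeys.
table = {
    "alice#20160401": {"m:reading": "42"},
    "alice#20160402": {"m:reading": "43"},
    "bob#20160401": {"m:reading": "17"},
}
rowkeys = sorted(table)

def prefix_scan(prefix):
    """Focused scan: adjacent rowkeys sharing a prefix, no full scan."""
    start = bisect_left(rowkeys, prefix)
    out = []
    for key in rowkeys[start:]:
        if not key.startswith(prefix):
            break
        out.append((key, table[key]))
    return out

# Tall-narrow access: scan all of alice's readings via the key prefix.
print(prefix_scan("alice#"))

# Flat-wide alternative: one row per user, many columns; a single
# "get" by rowkey returns everything needed.
wide = {"alice": {"m:20160401": "42", "m:20160402": "43"}}
print(wide["alice"])
```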
03-22-2016
06:48 PM
@ARUNKUMAR RAMASAMY - The Ranger UI will show audit data from either Solr or the DB. Since DB support will be deprecated in future releases, it is recommended to move to Solr. Audit to HDFS is for long-term storage, and this can be done in addition to Solr.
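A sketch of the relevant plugin audit properties, assuming a Ranger 0.5-era (HDP 2.3+) setup; verify the exact property names and URLs against your version's documentation:

```properties
# Send plugin audits to Solr (what the Ranger UI reads).
xasecure.audit.destination.solr=true
xasecure.audit.destination.solr.urls=http://solr_host:6083/solr/ranger_audits

# Additionally spool audits to HDFS for long-term storage.
xasecure.audit.destination.hdfs=true
xasecure.audit.destination.hdfs.dir=hdfs://namenode:8020/ranger/audit
```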
03-11-2016
03:30 PM
2 Kudos
@arunkumar There is no direct way to compare values case-insensitively in HBase. You need to write a custom filter and add the jar to all region servers and the client, or else write a custom coprocessor that checks the value and does not skip results when the upper-cased value matches. If you use Phoenix, you can run a query with a WHERE condition on UPPER(column_name) = 'XYZ'. It's that simple; Phoenix does a lot of things for us.
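What such a server-side filter has to do can be shown with a toy sketch (hypothetical data; this is not the HBase filter API, just the matching logic a custom filter or Phoenix's `UPPER(column) = 'XYZ'` predicate applies per row):

```python
# Toy rows: rowkey -> cell value (hypothetical data).
rows = {
    "r1": "xyz",
    "r2": "XyZ",
    "r3": "abc",
}

def case_insensitive_scan(rows, target):
    """Return rowkeys whose value matches target ignoring case,
    in rowkey order (as an HBase scan would emit them)."""
    target_upper = target.upper()
    return [k for k, v in sorted(rows.items()) if v.upper() == target_upper]

print(case_insensitive_scan(rows, "XYZ"))  # ['r1', 'r2']
```

Doing this on the server side (filter, coprocessor, or Phoenix) avoids shipping every row to the client just to discard most of them.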