Member since
06-09-2016
185
Posts
22
Kudos Received
5
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2164 | 04-21-2017 07:57 AM | |
1349 | 04-18-2017 07:07 AM | |
3193 | 02-27-2017 05:41 AM | |
891 | 12-09-2016 11:05 AM | |
1256 | 11-24-2016 11:20 AM |
02-13-2017
09:29 AM
4 Kudos
Hi @Avijeet Dash What @Jobin George suggested would help to share common static configurations across various parts of a NiFi flow. In addition to that, if you'd like to know how to Put/Get from the distributed cache, and how to enrich FlowFiles with cached values, this example might be helpful. The template file is available here: https://gist.github.com/ijokarumawak/8ba9a2a1b224603f877e960a942a6f2b Thanks, Koji
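To make the put/get-and-enrich idea concrete, here is a minimal plain-Python sketch. It is only an illustration of the pattern, not NiFi code: the dictionary stands in for the distributed cache, and the names `put_cache`, `enrich`, `customer.id` and `customer.name` are hypothetical.

```python
# Stand-in for a distributed map cache (assumption: a plain dict for illustration).
cache = {}

def put_cache(key, value):
    """Store a value under a key, as a cache 'put' step would."""
    cache[key] = value

def enrich(attributes, lookup_attr, target_attr):
    """Return a copy of the attributes with the cached value added, if found."""
    enriched = dict(attributes)
    key = attributes.get(lookup_attr)
    if key in cache:
        enriched[target_attr] = cache[key]
    return enriched

# Hypothetical usage: cache a customer name, then enrich a record by its id.
put_cache("c42", "Acme")
record = enrich({"customer.id": "c42"}, "customer.id", "customer.name")
```

On a cache miss the record is returned unchanged, which roughly mirrors routing such a FlowFile on without the extra attribute.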
02-03-2017
04:36 AM
That's great @Tibor Kiss - I am trying to run a Spark Streaming job. How do I specify that it should run in standalone cluster mode?
02-03-2017
04:29 PM
@Avijeet Dash I agree with you. It is much more reliable if, after your streaming job, your data lands in Kafka and is then written to HBase/HDFS. This decouples your streaming job from the writes. I wouldn't recommend using Flume. Go with the combination of NiFi and Kafka.
01-25-2017
02:31 PM
1 Kudo
In my opinion it is best to still regard Hive as an analytical DB. With the ACID (updates) and streaming features, the community is stretching the tool to do things it wasn't designed for. These features are not meant to be used at very large scale or under very heavy loads. ACID and streaming will put tremendous strain on the Hive metastore. In the end, the native storage model of Hive is still based on streaming through whole HDFS files, even with ORC. Without true indexes, Hive will never be a really good match for highly transactional workloads. Doing large analytical sweeps/scans through data is still at odds with high-speed random read/write/update/delete. But that is not bad; there are other components in HDP that do those other jobs well.
01-23-2017
06:37 AM
The only thing you can do is limit which IPs can access your cluster, basically by specifying security rules for inbound (and, if needed, outbound) traffic. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups
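As a rough illustration of what an inbound rule restricted by source address does, here is a small Python sketch using only the standard library. It does not call any AWS API; the CIDR ranges (documentation addresses) and the function name `is_allowed` are made up for the example.

```python
import ipaddress

# Hypothetical allow-list, analogous to the source ranges of inbound rules.
ALLOWED_CIDRS = ["203.0.113.0/24", "198.51.100.17/32"]

def is_allowed(source_ip, allowed_cidrs=ALLOWED_CIDRS):
    """Return True if source_ip falls inside any allowed CIDR block."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)
```

A `/32` range admits exactly one address, which is the usual way to allow a single workstation or gateway.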
01-11-2017
10:05 PM
@Avijeet Dash Here is a link for HBase sizing that you can use: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_Sys_Admin_Guides/content/ch_clust_capacity.html If you are using both HBase and SOLR, I am going to assume you will index HBase columns in SOLR. There are two concepts in SOLR when it comes to sizing: what you will be indexing and what you will be storing. You need to know what you'll be storing (all of the HBase columns? Probably not, but I am no one to say) and what you'll be indexing (definitely not everything, but whatever you index will be in addition to what you store).
Whether SOLR is better without HDFS is more a matter of opinion. I have seen clusters where SOLR Cloud runs just fine alongside HBase and HDFS. Here is what you should remember. ZooKeeper should have its own dedicated disk (please do not share ZooKeeper disks; I cannot overemphasize this). Size appropriately, meaning have the right amount of CPU and memory resources. If you give only 4 GB of heap space to SOLR, there will likely be problems (do not go to the other extreme either, as it will result in Java garbage collection pauses; an ideal heap to start with is 8-12 GB).
Another thing to remember is what kind of queries your end users will be running. If they start scanning the entire SOLR index, there is no doubt that you will run into issues.
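A back-of-the-envelope sketch of the sizing reasoning above, in Python. The per-document byte counts are inputs you would have to measure yourself; the function names and the simple additive model are assumptions for illustration, not a SOLR formula. The heap check only encodes the 8-12 GB starting range mentioned above.

```python
def estimate_index_gb(doc_count, stored_bytes_per_doc, indexed_bytes_per_doc):
    """Rough estimate: indexed data is counted in addition to stored data."""
    total_bytes = doc_count * (stored_bytes_per_doc + indexed_bytes_per_doc)
    return total_bytes / 1024**3

def heap_note(heap_gb):
    """Compare a proposed heap size with the 8-12 GB starting range."""
    if heap_gb < 8:
        return "below the suggested 8-12 GB starting range"
    if heap_gb > 12:
        return "above the suggested starting range; watch for GC pauses"
    return "within the suggested 8-12 GB starting range"
```

For example, `estimate_index_gb(1024**2, 512, 512)` treats about a million documents at 1 KiB each (stored plus indexed) as roughly 1 GiB, before replication or any other overhead.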
01-27-2017
09:08 AM
Thanks @mqureshi - that answers my question. However, a number of components, such as Atlas and Falcon, have started using HBase as a metadata store. How should these use cases be viewed?
01-04-2017
09:57 PM
1 Kudo
I was unable to find a way around this. The NameNode simply gives admin rights to the system user that started its process, which by default is the hdfs user. You can also give others superuser permissions with dfs.permissions.superusergroup and dfs.cluster.administrators. It seems Ranger doesn't block superusers except in the case of KMS encryption zones. For KMS, I can see there is a blacklist mechanism to disallow the superuser. I don't think there is a similar feature for Ranger itself.
01-02-2017
05:42 AM
@Divakar Annapureddy I checked the document. The requirement "Eliminates the root account and replaces it with a compliance administrator account that executes commands with sudo" doesn't seem to be supported by Ranger: the hdfs user can still access folders protected by Ranger.
12-13-2016
03:37 PM
4 Kudos
"I read that accumulo supports cell level security, and hbase doesn't. Is this true?"
Both systems support cell-level security; however, I would say that Accumulo's is a more "battle-hardened" implementation. I'm not aware of any case studies comparing the two implementations.
"and secondly accumulo supports multiple data sources ingestion better and hbase one source such as one web site. is it true? in what ways?"
No, I don't know in what way this would be the case. Both systems can ingest data from a variety of sources. This sounds like something that was taken out of context.
"can someone share any accumulo case studies?"
http://accumulo.apache.org/papers/ has some content; http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf, https://arxiv.org/abs/1406.4923, and http://accumulo.apache.org/papers/accumulo-benchmarking-2.1.pdf are each interesting. This talk from PHEMI by Russ Weeks is also particularly nice: http://accumulosummit.com/program/talks/preventing-bugs-how-phemi-put-accumulo-to-work-in-the-field/
"Can accumulo be used with full support and rest of hadoop ecosystem?"
In short, "yes", but this is subjective, depending on what you consider the "rest of hadoop ecosystem" and what degree of integration you're expecting. The same goes for HBase. As for HDP, yes, both HBase and Accumulo are fully supported, as Tim pointed out already. I would suggest you ask more pointed questions if you have specific concerns.