Member since 09-18-2015
Posts: 3274
Kudos received: 1159
Solutions: 426
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 45458 | 02-09-2016 06:13 PM |
01-31-2016
04:30 PM
2 Kudos
Original Article
Can I authorize access to Kafka over a non-secure channel via Ranger?
Yes. You can control access by client IP address.
Can I authorize access to Kafka over a non-secure channel by user or user group?
No. User- or group-based policies can't authorize Kafka access over a non-secure channel, because the client's identity can't be asserted over such a channel.
Why do we have to specify the public user group on all policy items created for authorizing Kafka access over a non-secure channel?
Kafka can't assert the identity of the client user over a non-secure channel, so it treats every such client as an anonymous user (a special user literally named ANONYMOUS). Ranger's public user group models all users, which of course includes this anonymous user (ANONYMOUS).
What specific things should I watch out for when setting up authorization for Kafka access over a non-secure channel?
Make sure that all broker IPs have Kafka admin access to all topics, i.e. *.
Make sure no publishers or consumers that need access control are running on broker nodes. Since broker IPs have open access, it isn't possible to control access on those nodes.
Please take time to read the original article.
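The reasoning above can be sketched as a toy decision function. This is a simplified model for illustration only, not Ranger's actual policy engine: the function name `kafka_authorize`, its argument order, and the example IPs are hypothetical. It shows why, over a non-secure channel, every client resolves to ANONYMOUS, so only a policy item granted to the public group and restricted by client IP can ever match.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical, NOT Ranger's real engine) of a Kafka
# authorization decision over secure vs non-secure channels.
#
# kafka_authorize CHANNEL CLAIMED_USER CLIENT_IP ALLOWED_USER ALLOWED_IPS...
#   CHANNEL      "secure" (e.g. Kerberos) or anything else (non-secure)
#   CLAIMED_USER user the client says it is
#   CLIENT_IP    source address of the connection
#   ALLOWED_USER user named in the policy item, or "public" for all users
#   ALLOWED_IPS  client IPs the policy item is restricted to
kafka_authorize() {
  local channel=$1 claimed_user=$2 client_ip=$3 allowed_user=$4
  shift 4
  local principal=$claimed_user
  # Over a non-secure channel the identity can't be asserted,
  # so every client is seen as ANONYMOUS.
  if [[ $channel != "secure" ]]; then
    principal="ANONYMOUS"
  fi
  # A user-specific item only matches that principal; "public" models
  # all users, including ANONYMOUS.
  if [[ $allowed_user != "public" && $allowed_user != "$principal" ]]; then
    echo "DENY"
    return 1
  fi
  # The item additionally restricts which client IPs may connect.
  local ip
  for ip in "$@"; do
    if [[ $ip == "$client_ip" ]]; then
      echo "ALLOW"
      return 0
    fi
  done
  echo "DENY"
  return 1
}
```

For example, `kafka_authorize plaintext demouser 10.0.0.5 demouser 10.0.0.5` prints DENY (demouser is seen as ANONYMOUS), while `kafka_authorize plaintext demouser 10.0.0.5 public 10.0.0.5` prints ALLOW.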
01-31-2016
02:12 PM
1 Kudo
Tools in use: HBase shell and Zeppelin
User demouser needs access to an HBase table called PRICES.
User zeppelin needs the same access to run a few queries.
You can run this demo using the Hortonworks Sandbox.
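If you want to reproduce this access with HBase's own `grant` command instead of a Ranger policy (an assumption: the post doesn't say which mechanism the demo used), a small helper can emit the HBase shell statements for both users. It assumes HBase authorization is enabled and that read-only ('R') access is enough for the queries; the function name `hbase_read_grants` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print HBase shell 'grant' statements that give each
# listed user read ('R') access to one table. Assumes HBase authorization
# is enabled on the cluster.
# Usage: hbase_read_grants TABLE USER...
hbase_read_grants() {
  local table=$1
  shift
  local user
  for user in "$@"; do
    # HBase shell syntax: grant <user>, <permissions>, <table>
    printf "grant '%s', 'R', '%s'\n" "$user" "$table"
  done
}
```

For example, `hbase_read_grants PRICES demouser zeppelin | hbase shell` pipes both statements into the HBase shell. Printing the statements first makes it easy to review them before applying.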
01-30-2016
07:28 PM
@Shaofeng Shi Thanks for sharing all the comments. I wonder if it's possible to post them as an article. Please?
01-29-2016
06:01 PM
@rgarcia Do you have information on the latest versions of Ambari and HDP that are compatible with Centrify?
01-23-2016
11:30 PM
3 Kudos
Node labels enable you to partition a cluster into sub-clusters so that jobs can run on nodes with specific characteristics. For example, you can use node labels to run memory-intensive jobs only on nodes with more RAM. Node labels can be assigned to cluster nodes and specified as exclusive or shareable (non-exclusive). You can then associate node labels with Capacity Scheduler queues. Each node can have only one node label.
Demo:
Use case: 2 node labels (node1 and node2) plus the default and spark queues.
Submit a job to node1.
Add the node labels:
yarn rmadmin -addToClusterNodeLabels "node1(exclusive=true),node2(exclusive=false)"
Assign the labels to nodes:
yarn rmadmin -replaceLabelsOnNode "phdns02.cloud.hortonworks.com=node2 phdns01.cloud.hortonworks.com=node1"
Job Submission:
Send a job to node1 only, in queue spark:
hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue spark -node_label_expression node1
Send a job to node2 only, in queue spark:
hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue spark -node_label_expression node2
Send a job to node1 only, in queue default:
hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -queue default -node_label_expression node1
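Because the three submissions above differ only in queue and label, a small wrapper keeps the flags consistent (in particular the space before `-queue`, which is easy to lose when copying). This is a sketch assuming the HDP jar path used above; the function name `submit_labeled` and the `DRY_RUN` switch are hypothetical. With `DRY_RUN=1` it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the distributed-shell submissions above.
# Usage: submit_labeled QUEUE LABEL [SHELL_COMMAND]
# Set DRY_RUN=1 to print the command instead of running it.
DSHELL_JAR=/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar

submit_labeled() {
  local queue=$1 label=$2 cmd=${3:-"sleep 100"}
  # Build the command as an array so arguments with spaces stay intact.
  local -a args=(
    hadoop jar "$DSHELL_JAR"
    -shell_command "$cmd"
    -jar "$DSHELL_JAR"
    -queue "$queue"
    -node_label_expression "$label"
  )
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${args[*]}"
  else
    "${args[@]}"
  fi
}
```

The three demo submissions then become `submit_labeled spark node1`, `submit_labeled spark node2`, and `submit_labeled default node1`.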
More details - Doc link
SlideShare
01-20-2016
03:10 PM
3 Kudos
Apache Ranger lets you control which users can submit jobs to YARN queues.
Demo 1) Default queue
User demouser can submit jobs to the default queue; this access is controlled from the Ranger UI.
Demo 2) Non-default queue
User demouser can submit jobs to the spark queue; this access is controlled from the Ranger UI.
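One way to exercise these policies from the command line is to submit a short distributed-shell job as demouser to each queue and compare the outcome before and after changing the Ranger policy. This is a sketch assuming `sudo` rights on a non-Kerberized cluster node (such as the Sandbox) and the HDP jar path used in the node labels demo; the function name `submit_as` and the `DRY_RUN` switch are hypothetical. With a matching policy the submission is expected to be accepted; without one, YARN is expected to reject it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit a short distributed-shell job to a queue as a
# given user, to check what a Ranger YARN policy allows.
# Usage: submit_as USER QUEUE   (set DRY_RUN=1 to only print the command)
DSHELL_JAR=/usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar

submit_as() {
  local user=$1 queue=$2
  local -a args=(
    sudo -u "$user"
    hadoop jar "$DSHELL_JAR"
    -shell_command "sleep 10"
    -jar "$DSHELL_JAR"
    -queue "$queue"
  )
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${args[*]}"
    return 0
  fi
  # A non-zero exit from the client means the job was rejected or failed.
  if "${args[@]}"; then
    echo "submission by $user to queue $queue succeeded"
  else
    echo "submission by $user to queue $queue failed (check the Ranger policy)" >&2
    return 1
  fi
}
```

For Demo 1 and Demo 2 respectively, run `submit_as demouser default` and `submit_as demouser spark`.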
Happy Hadooping!!
More information
01-19-2016
12:41 PM
4 Kudos
LinkedIn post
Have you heard of computing on traditional disk-based or flash-based technologies? We all use disk or flash storage in our laptops, desktops, and the servers sitting in the data center.
What is the Apache Ignite In-Memory Data Fabric? It's a high-performance, integrated, distributed in-memory platform for computing and transacting on large-scale data sets in real time, "orders of magnitude faster than possible with traditional disk-based or flash-based technologies."
Fabric: in information technology, fabric is a synonym for framework or platform. You can view Ignite as a collection of independent, well-integrated, in-memory components geared to improve the performance and scalability of your application. (Source)
How does Ignite fit with HDFS? (Ignite File System, IGFS)
IGFS works hand in hand with HDFS. Hadoop can run over IGFS in a plug-and-play fashion, which significantly reduces I/O and improves both latency and throughput.
Why do we need another layer on top of HDFS?
IGFS supports a dual mode. As you can see in the picture above, it can be deployed as the primary file system, or it can sit on top of HDFS as a caching layer with highly configurable read-through and write-through behavior. In this mode IGFS serves as an in-memory caching layer over disk-based HDFS.
Installation: scroll down to "In-Memory Hadoop Accelerator". The best part of the install is HADOOP_README.txt and its useful documentation.
Check out:
HDP and Ignite setup guide
Hive and Ignite
Spark and Ignite
Zeppelin and Ignite
Ignite and DataGrid
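The dual-mode idea (an in-memory layer in front of HDFS with read-through and write-through) can be illustrated with a toy model. This is not Ignite code or its API: two bash associative arrays stand in for the in-memory layer and for HDFS, and the function names are hypothetical. Reads are served from memory when possible and otherwise loaded from HDFS and cached (read-through); writes update memory and HDFS together (write-through).

```shell
#!/usr/bin/env bash
# Toy model of a read-through / write-through cache in front of HDFS.
# Requires bash >= 4 (associative arrays). NOT Ignite's API.
declare -A HDFS=()   # stands in for the disk-based file system
declare -A MEM=()    # stands in for the IGFS in-memory layer

# Write-through: update the in-memory copy and HDFS together.
igfs_write() {
  local path=$1 data=$2
  MEM[$path]=$data
  HDFS[$path]=$data
}

# Read-through: serve from memory; on a miss, load from HDFS and cache it.
# Sets REPLY to "hit:<data>" or "miss:<data>" (a variable rather than stdout,
# so the cache update isn't lost inside a $(...) subshell).
# Returns 1 if the path exists in neither layer.
igfs_read() {
  local path=$1
  if [[ -n ${MEM[$path]+set} ]]; then
    REPLY="hit:${MEM[$path]}"
  elif [[ -n ${HDFS[$path]+set} ]]; then
    MEM[$path]=${HDFS[$path]}
    REPLY="miss:${MEM[$path]}"
  else
    REPLY=""
    return 1
  fi
}
```

For a file that already exists only on "HDFS", the first `igfs_read` reports a miss and caches it, and the second reports a hit; a file written with `igfs_write` is immediately readable from memory and also present in HDFS.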
01-19-2016
12:36 PM
8 Kudos
Apache Drill is an open source, low-latency query engine for Hadoop that delivers secure, interactive SQL analytics at petabyte scale. With the ability to discover schemas on the fly, Drill is a pioneer in delivering self-service data exploration on data stored in multiple formats in files or NoSQL databases. By adhering to ANSI SQL standards, Drill does not require a learning curve and integrates seamlessly with visualization tools. (Source)
Tutorial:
Download the latest version of Drill (1.2.0 at the time of writing) and extract it:
wget http://getdrill.org/drill/download/apache-drill-1.2.0.tar.gz
tar xvfz apache-drill-1.2.0.tar.gz
In this example Drill is extracted to /root/drill/apache-drill-1.2.0.
[root@node1 apache-drill-1.2.0]# cd bin/
Start Drill in distributed mode (you can also start it in embedded mode):
[root@node1 bin]# ./drillbit.sh start
starting drillbit, logging to /root/drill/apache-drill-1.2.0/log/drillbit.out
Drill Web Console: http://host:8047
Enable storage plugins: click Storage, then enable hbase, hive, and mongo.
Modify the storage plugins for Hive and HBase to match your Hadoop cluster setup. For example, click Update for hive under Storage Plugins and modify hive.metastore.uris.
Hive test: launch the Hive shell and create a test table:
hive> create table drill_hive (info string);
hive> insert into table drill_hive values ('This is Hive and you are using Drill');
Then start the Drill shell and query the table:
[root@node1 bin]# ./drill-conf
apache drill 1.2.0
"start your sql engine"
0: jdbc:drill:> use hive;
0: jdbc:drill:> select info from drill_hive;
HBase: if HBase's zookeeper.znode.parent points to /hbase-unsecure, change the storage plugin properties by adding:
"zookeeper.znode.parent": "/hbase-unsecure"
Let's check the query plan and metrics: click Profiles. You will see your queries under completed queries. Click a query to see its execution stats.
What is the foreman? The drillbit that receives a query from the client becomes the foreman for that query and drives its entire execution.
Link
More information
For ODBC/JDBC setup
Happy Hadooping!!
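The Hive test query can also be sent to Drill's REST API, which is served on the same port as the Web Console (`POST /query.json` with a JSON body containing `queryType` and `query`). This is a sketch assuming curl is installed and no web authentication is configured; the function names and the `DRILL_URL` default are hypothetical, and the JSON escaping only handles backslashes and double quotes in single-line SQL.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for running SQL through Drill's REST API.
# Assumes a drillbit whose web console listens on port 8047 (no auth).
DRILL_URL=${DRILL_URL:-http://localhost:8047}

# Build the JSON body for POST /query.json.
# Escapes backslashes first, then double quotes; single-line SQL only.
drill_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"queryType":"SQL","query":"%s"}' "$escaped"
}

# Send one SQL statement and print Drill's JSON response.
drill_sql() {
  curl -s -X POST \
    -H "Content-Type: application/json" \
    -d "$(drill_payload "$1")" \
    "$DRILL_URL/query.json"
}
```

For example, `drill_sql 'select info from hive.drill_hive'`. A REST call doesn't inherit the `use hive;` from a sqlline session, so the table is qualified with the plugin name here.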