Member since: 03-24-2016
Posts: 184
Kudos Received: 239
Solutions: 39
My Accepted Solutions
Title | Views | Posted
---|---|---
| 2683 | 10-21-2017 08:24 PM
| 1643 | 09-24-2017 04:06 AM
| 5794 | 05-15-2017 08:44 PM
| 1780 | 01-25-2017 09:20 PM
| 5884 | 01-22-2017 11:51 PM
02-09-2023
07:46 AM
@jagan20, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
02-20-2020
09:15 AM
@BI_Gabor,
Yes, this thread is older and was marked 'Solved' in April of 2016; you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your question that could help others give a more accurate answer.
12-21-2017
08:53 AM
I've also tried spark-llap on HDP-2.6.2.0 with Spark 1.6.3 and http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/1.0.0.2.5.5.5-2/spark-llap-1.0.0.2.5.5.5-2-assembly.jar, but unfortunately, when I executed a simple "select count" query in beeline, I got the following error:

0: jdbc:hive2://node-05:10015/default> select count(*) from ods_order.cc_customer;
Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
TungstenAggregate(key=[], functions=[(count(1),mode=Final,isDistinct=false)], output=[_c0#56L])
+- TungstenExchange SinglePartition, None
+- TungstenAggregate(key=[], functions=[(count(1),mode=Partial,isDistinct=false)], output=[count#59L])
+- Scan LlapRelation(org.apache.spark.sql.hive.llap.LlapContext@690c5838,Map(table -> ods_order.cc_customer, url -> jdbc:hive2://node-01.hdp.wiseda.com.cn:10500))[] (state=,code=0)

The Thrift Server log messages are attached as "thriftserver-err-msg.txt".
11-01-2017
07:36 PM
@Vadim Vaks You are welcome and thanks for the minimized query parameters.
09-24-2017
04:06 AM
1 Kudo
@Amey Hegde I used your blueprint to create a cluster via Cloudbreak and I am able to enable Phoenix without any issues. Log into Ambari --> select the HBase service --> click the Configs tab (Settings, not Advanced) --> scroll to the bottom of the page to the section called Phoenix SQL --> click the switch called "Enable Phoenix" --> save the settings --> restart all affected services. This kicks off an install and configuration process that installs the Phoenix binaries and makes a few configuration tweaks to hbase-site. If you then SSH to a node and run /usr/hdp/current/phoenix-client/bin/sqlline.py, you can immediately start creating tables, and once you have data loaded you can issue queries from there as well.
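As a quick sanity check, a minimal sqlline session might look roughly like the following (the ZooKeeper host, znode, and table name here are placeholders, not from the original setup):

/usr/hdp/current/phoenix-client/bin/sqlline.py {zookeeper-host}:2181:/hbase-unsecure
CREATE TABLE IF NOT EXISTS phoenix_smoke_test (id BIGINT NOT NULL PRIMARY KEY, name VARCHAR);
UPSERT INTO phoenix_smoke_test VALUES (1, 'hello');
SELECT * FROM phoenix_smoke_test;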
05-11-2017
03:10 AM
@khorvath I dug into the Ambari logs as you suggested. There is nothing obvious, but Ambari is definitely returning success/complete as it begins to install services. I probably should have led with this, but I am using early bits from Ambari 2.5.1 (the version where HDP and HDF can be installed in the same cluster). There is probably some sort of disconnect between what Salt is looking for and what Ambari is actually returning. Perhaps this is already being addressed in CB 1.15.0. Thanks for your help.
12-29-2016
06:40 PM
6 Kudos
This tutorial is a follow-on to the Apache Spark Fine Grain Security with LLAP Test Drive tutorial. These two articles cover the entire range of security authorization capabilities available for Spark on the Hortonworks Data Platform.

Getting Started

Install an HDP 2.5.3 cluster via Ambari. Make sure the following components are installed:

- Hive
- Spark
- Spark Thrift Server
- HBase
- Ambari Infra
- Atlas
- Ranger

Enable LLAP

Navigate to the Hive configuration page and click Enable Interactive Query. Ambari will ask which host group to put the HiveServer2 service into; select the host group with the most available resources. With Interactive Query enabled, Ambari will display new configuration options. These options provide control of resource allocation for the LLAP service. LLAP is a set of long-lived daemons that facilitate interactive query response times and fine grain security for Spark. Since the goal of this tutorial is to test out fine grain security for Spark, LLAP only needs a minimal allocation of resources. However, if more resources are available, feel free to crank up the allocation and run some Hive queries against the Hive Interactive server to get a feel for how LLAP improves Hive's performance. Save the configuration, confirm and proceed, and restart all required services. Then navigate to the Hive Summary tab and ensure that HiveServer2 Interactive is started.

Download Spark-LLAP Assembly

From the command line as root:

wget -P /usr/hdp/current/spark-client/lib/ http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/1.0.0.2.5.3.0-37/spark-llap-1.0.0.2.5.3.0-37-assembly.jar

Copy the assembly to the same location on each host where Spark may start an executor. If queues are not enabled, this likely means all hosts running a NodeManager service. Make sure all users have read permissions on that location and the assembly file.

Configure Spark for LLAP

- In Ambari, navigate to the Spark service configuration tab.
- Find Custom spark-defaults, click Add Property, and add the following properties:
  - spark.sql.hive.hiveserver2.url=jdbc:hive2://{hiveserver-interactive-hostname}:10500
  - spark.jars=/usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
  - spark.hadoop.hive.zookeeper.quorum={some-or-all-zookeeper-hostnames}:2181
  - spark.hadoop.hive.llap.daemon.service.hosts=@llap0
- Find Custom spark-thrift-sparkconf, click Add Property, and add the same properties:
  - spark.sql.hive.hiveserver2.url=jdbc:hive2://{hiveserver-interactive-hostname}:10500
  - spark.jars=/usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
  - spark.hadoop.hive.zookeeper.quorum={some-or-all-zookeeper-hostnames}:2181
  - spark.hadoop.hive.llap.daemon.service.hosts=@llap0
- Find Advanced spark-env and set the spark_thrift_cmd_opts attribute to --jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
- Save all configuration changes.
- Restart all components of Spark.
- Make sure the Spark Thrift Server is started.

Enable Ranger for Hive

- Navigate to the Ranger service Configs tab.
- Click on the Ranger Plugin tab.
- Click the switch labeled "Enable Ranger Hive Plugin".
- Save configs.
- Restart all required services.

Create and Stage Sample Data in an External Hive Table

From the command line:

cd /tmp
wget https://www.dropbox.com/s/r70i8j1ujx4h7j8/data.zip
unzip data.zip
sudo -u hdfs hadoop fs -mkdir /tmp/FactSales
sudo -u hdfs hadoop fs -chmod 777 /tmp/FactSales
sudo -u hdfs hadoop fs -put /tmp/data/FactSales.csv /tmp/FactSales
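Before creating the table over that directory, you can optionally confirm the file landed where expected (a quick check; the exact listing output will vary):

sudo -u hdfs hadoop fs -ls /tmp/FactSales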
beeline -u jdbc:hive2://{hiveserver-host}:10000 -n hive -e "CREATE TABLE factsales_tmp (SalesKey int ,DateKey timestamp, channelKey int, StoreKey int, ProductKey int, PromotionKey int, CurrencyKey int, UnitCost float, UnitPrice float, SalesQuantity int, ReturnQuantity int, ReturnAmount float, DiscountQuantity int, DiscountAmount float, TotalCost float, SalesAmount float, ETLLoadID int,LoadDate timestamp, UpdateDate timestamp) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/tmp/FactSales'" Move data into Hive Tables - From Command line beeline -u jdbc:hive2://{hiveserver-host}:10000 -n hive -e "CREATE TABLE factsales (SalesKey int ,DateKey timestamp, channelKey int, StoreKey int, ProductKey int, PromotionKey int, CurrencyKey int, UnitCost float, UnitPrice float, SalesQuantity int, ReturnQuantity int, ReturnAmount float, DiscountQuantity int, DiscountAmount float, TotalCost float, SalesAmount float, ETLLoadID int, LoadDate timestamp, UpdateDate timestamp) clustered by (saleskey) into 7 buckets stored as ORC"
beeline -u jdbc:hive2://{hiveserver-host}:10000 -n hive -e "INSERT INTO factsales SELECT * FROM factsales_tmp" View Meta Data in Atlas - Navigate to the Atlas Service - Click on Quicklinks --> Atlas Dashboard - user: admin password: admin - Create a new Tag called "secure" - Click on Search --> Flip the Switch to "DSL" --> Select "hive_table" and submit the search - When we created the sample Hive tables earlier, the Hive Hook updated Atlas with meta data representing the newly created data sets - Click on Factsales to see details including lineage and schema information for Factsales Hive table - Scroll down and click on the Schema tab - Click on the Plus sign next to the Storekey column to add tag and add the "secure" tag we created earlier - The storekey column of the factsales hive table is now tagged as "secure". We can now configure Ranger to secure access to the storekey field based on meta data in Atlas. Configure Ranger Security Policies - Navigate to the Ranger Service - Click on Quicklinks --> Ranger Admin UI - user: admin password: admin - Click on Access Manager --> Tag Based Polices -Click the Plus Sign to add a new Tag service -Click Add New Policy, name and add the new service - The new tag service will show up as a link. Click the link to enter the tag service configuration screen. - Click Add New Policy - Name the policy and enter "secure" in the TAG field. This tag refers to the tag we created in Atlas. Once the policy is configured, The Ranger Tag-Synch service will look far notification from Atlas that the "secure" tag was added to an entity. When it sees that notification, it will update Authorization as described by the Tag based policies. - Scroll down and click on the link to expand the Deny Condition section - Set the User field to User hive and the component Permission section to Hive - Click Add to finalize and create the policy. Now Atlas will notify Ranger whenever an entity is tagged as "secure" or the "secure" tag is removed. The "secure" tag policy permissions will apply to any entity tagged with the "secure" tag. - Click on Access Manager and select Resource Based Policies - Next to the {clustername}_hive service link, click the edit icon (looks like a pen on paper). Make sure to click the icon and not the link. - Select the Tag service we created earlier from the drop down and click save. This step is important as this is how Ranger will associate the tag notifications coming from Atlas the Hive security service. - You should find yourself at Resource Based Policies screen again. This tim click on {clustername}_hive service link, under the Hive section - Several default Hive security policies should be visible. - User hive is allowed access to all tables and all columns - The cluster is now secured with Resource and Tag based policies. Let's test out how these work together using Spark. Test Fine Grain Security with Spark - Connect to Spark-Thrift server using beeline as hive User and verify sample tables are visible beeline -u jdbc:hive2://{spark-thrift-server-host}:10015 -n hive
Connecting to jdbc:hive2://{spark-thrift-server-host}:10015
Connected to: Spark SQL (version 1.6.2)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
0: jdbc:hive2://{spark-thrift-server-host}:10015> show tables;
+----------------+--------------+--+
| tableName | isTemporary |
+----------------+--------------+--+
| factsales | false |
| factsales_tmp | false |
+----------------+--------------+--+
2 rows selected (0.793 seconds)
Get the explain plan for a simple query:

0: jdbc:hive2://sparksecure01-195-1-0:10015> explain select storekey from factsales;
| == Physical Plan == |
| Scan LlapRelation(org.apache.spark.sql.hive.llap.LlapContext@44bfb65b,Map(table -> default.factsales, url -> jdbc:hive2://sparksecure01-195-1-0.field.hortonworks.com:10500))[storekey#66] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (1.744 seconds)
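For reference, the column-level checks described in the steps below look roughly like the following when issued from the same beeline prompt (only a sketch; the exact denial message comes from Ranger and will differ):

select saleskey from factsales limit 5;   -- expected to succeed: saleskey is not tagged "secure"
select storekey from factsales limit 5;   -- expected to be denied while storekey carries the "secure" tag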
- The explain plan should show that the table will be scanned using the LlapRelation class. This confirms that Spark is using LLAP to read from HDFS.
- Recall that the user hive should have complete access to all databases, tables, and columns per the Ranger resource based policy.
- Attempt to select storekey from factsales as the user hive (see the beeline sketch above).
- Even though the user hive should have full access to the factsales table, we were able to restrict access to the storekey column by designating it as "secure" using a tag in Atlas.
- Attempt to select saleskey from factsales as the user hive. The saleskey column is not designated as secure via a tag.
- Access to the saleskey field is allowed since the user hive has access and the field is not designated as secure.
- Return to the factsales page in Atlas and remove the "secure" tag from the storekey column.
- Wait 30-60 seconds for the notification from Atlas to be picked up, processed, and propagated.
- Attempt to select storekey from factsales as the user hive once again.
- This time access is allowed since the "secure" tag has been removed from the storekey column of the factsales table in Atlas.
- Back in the Ranger UI, click on Audit to see all of the access attempts that have been recorded by Ranger.
- Notice that the first access attempt was denied based on the tag [secure].

Ranger already provides extremely fine grain security for both Hive and Spark. However, in combination with Atlas, yet another level of security can be added. Tag based security for Spark provides additional flexibility in controlling access to datasets.
01-19-2017
02:30 PM
This is a great article. I have a question around the Thrift Server. The article description says "SparkSQL, Ranger, and LLAP via Spark Thrift Server...", but the implementation uses HiveServer2? So can Ranger work with the Spark Thrift Server? Is there a Ranger plugin for the Spark Thrift Server?