Member since
03-24-2016
184 Posts
239 Kudos Received
39 Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2683 | 10-21-2017 08:24 PM
 | 1643 | 09-24-2017 04:06 AM
 | 5794 | 05-15-2017 08:44 PM
 | 1780 | 01-25-2017 09:20 PM
 | 5884 | 01-22-2017 11:51 PM
01-22-2017
11:51 PM
1 Kudo
@Sankaraiah Narayanasamy You can't list Hbase tables using Spark SQL because Hbase tables do not have a schema. Each row can have a different number of columns, and each column is stored as a byte array rather than as a specific data type. HiveContext will only let you list tables in Hive, not Hbase. If you have Apache Phoenix installed on top of Hbase, it is possible to see a list of tables, but not through HiveContext. If you are trying to see the list of Hive tables that Spark SQL can access, then the command is "show tables", not "list". So your code should be: val listOfTables = hiveContext.sql("show tables") This will work as long as Spark is configured to point at the Hive Metastore.
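For reference, a minimal spark-shell sketch along those lines (assuming a Spark 1.x HiveContext and a hive-site.xml that points at the Hive Metastore):
// Minimal spark-shell sketch; sc is provided by the shell.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// "show tables" returns a DataFrame with tableName and isTemporary columns
val listOfTables = hiveContext.sql("show tables")
listOfTables.show()

// Collect just the table names into a plain Scala collection if needed
val names = listOfTables.collect().map(_.getString(0))
names.foreach(println)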
12-29-2016
06:40 PM
6 Kudos
This tutorial is a follow-on to the Apache Spark Fine Grain Security with LLAP Test Drive tutorial. Together, these two articles cover the full range of security authorization capabilities available for Spark on the Hortonworks Data Platform.

Getting Started
Install an HDP 2.5.3 cluster via Ambari. Make sure the following components are installed: Hive, Spark, Spark Thrift Server, Hbase, Ambari Infra, Atlas, Ranger

Enable LLAP
Navigate to the Hive configuration page and click Enable Interactive Query. Ambari will ask which host group to put the Hiveserver2 Interactive service into; select the host group with the most available resources. With Interactive Query enabled, Ambari will display new configuration options that control resource allocation for the LLAP service. LLAP is a set of long-lived daemons that provide interactive query response times and fine grain security for Spark. Since the goal of this tutorial is to test fine grain security for Spark, LLAP only needs a minimal allocation of resources. However, if more resources are available, feel free to crank up the allocation and run some Hive queries against the Hive Interactive server to get a feel for how LLAP improves Hive's performance. Save configurations, confirm, and proceed. Restart all required services. Navigate to the Hive Summary tab and ensure that Hiveserver2 Interactive is started.

Download Spark-LLAP Assembly
From the command line as root:
wget -P /usr/hdp/current/spark-client/lib/ http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/1.0.0.2.5.3.0-37/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
Copy the assembly to the same location on each host where Spark may start an executor. If queues are not enabled, this likely means all hosts running a NodeManager service. Make sure all users have read permissions to that location and to the assembly file.

Configure Spark for LLAP
- In Ambari, navigate to the Spark service configuration tab
- Find Custom spark-defaults, click Add Property, and add the following properties:
  - spark.sql.hive.hiveserver2.url=jdbc:hive2://{hiveserver-interactive-hostname}:10500
  - spark.jars=/usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
  - spark.hadoop.hive.zookeeper.quorum={some-or-all-zookeeper-hostnames}:2181
  - spark.hadoop.hive.llap.daemon.service.hosts=@llap0
- Find Custom spark-thrift-sparkconf, click Add Property, and add the same four properties:
  - spark.sql.hive.hiveserver2.url=jdbc:hive2://{hiveserver-interactive-hostname}:10500
  - spark.jars=/usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
  - spark.hadoop.hive.zookeeper.quorum={some-or-all-zookeeper-hostnames}:2181
  - spark.hadoop.hive.llap.daemon.service.hosts=@llap0
- Find Advanced spark-env and set the spark_thrift_cmd_opts attribute to --jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
- Save all configuration changes
- Restart all components of Spark
- Make sure the Spark Thrift Server is started

Enable Ranger for Hive
- Navigate to the Ranger service Configs tab
- Click on the Ranger Plugin tab
- Click the switch labeled "Enable Ranger Hive Plugin"
- Save configs
- Restart all required services

Create and Stage Sample Data in an External Hive Table
From the command line:
cd /tmp
wget https://www.dropbox.com/s/r70i8j1ujx4h7j8/data.zip
unzip data.zip
sudo -u hdfs hadoop fs -mkdir /tmp/FactSales
sudo -u hdfs hadoop fs -chmod 777 /tmp/FactSales
sudo -u hdfs hadoop fs -put /tmp/data/FactSales.csv /tmp/FactSales
beeline -u jdbc:hive2://{hiveserver-host}:10000 -n hive -e "CREATE TABLE factsales_tmp (SalesKey int, DateKey timestamp, channelKey int, StoreKey int, ProductKey int, PromotionKey int, CurrencyKey int, UnitCost float, UnitPrice float, SalesQuantity int, ReturnQuantity int, ReturnAmount float, DiscountQuantity int, DiscountAmount float, TotalCost float, SalesAmount float, ETLLoadID int, LoadDate timestamp, UpdateDate timestamp) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/tmp/FactSales'"

Move Data into Hive Tables
From the command line:
beeline -u jdbc:hive2://{hiveserver-host}:10000 -n hive -e "CREATE TABLE factsales (SalesKey int, DateKey timestamp, channelKey int, StoreKey int, ProductKey int, PromotionKey int, CurrencyKey int, UnitCost float, UnitPrice float, SalesQuantity int, ReturnQuantity int, ReturnAmount float, DiscountQuantity int, DiscountAmount float, TotalCost float, SalesAmount float, ETLLoadID int, LoadDate timestamp, UpdateDate timestamp) clustered by (saleskey) into 7 buckets stored as ORC"
beeline -u jdbc:hive2://{hiveserver-host}:10000 -n hive -e "INSERT INTO factsales SELECT * FROM factsales_tmp"

View Meta Data in Atlas
- Navigate to the Atlas service
- Click on Quick Links --> Atlas Dashboard (user: admin, password: admin)
- Create a new tag called "secure"
- Click on Search --> flip the switch to "DSL" --> select "hive_table" and submit the search
- When we created the sample Hive tables earlier, the Hive hook updated Atlas with meta data representing the newly created data sets
- Click on factsales to see details, including lineage and schema information, for the factsales Hive table
- Scroll down and click on the Schema tab
- Click the plus sign next to the storekey column and add the "secure" tag we created earlier
- The storekey column of the factsales Hive table is now tagged as "secure". We can now configure Ranger to secure access to the storekey field based on meta data in Atlas.

Configure Ranger Security Policies
- Navigate to the Ranger service
- Click on Quick Links --> Ranger Admin UI (user: admin, password: admin)
- Click on Access Manager --> Tag Based Policies
- Click the plus sign to add a new tag service
- Click Add New Policy, then name and add the new service
- The new tag service will show up as a link. Click the link to enter the tag service configuration screen.
- Click Add New Policy
- Name the policy and enter "secure" in the TAG field. This refers to the tag we created in Atlas. Once the policy is configured, the Ranger Tag-Sync service will look for notifications from Atlas that the "secure" tag was added to an entity. When it sees such a notification, it will update authorization as described by the tag based policies.
- Scroll down and click on the link to expand the Deny Conditions section
- Set the User field to the user hive and the Component Permissions section to Hive
- Click Add to finalize and create the policy. Now Atlas will notify Ranger whenever an entity is tagged as "secure" or the "secure" tag is removed. The "secure" tag policy permissions will apply to any entity tagged with the "secure" tag.
- Click on Access Manager and select Resource Based Policies
- Next to the {clustername}_hive service link, click the edit icon (looks like a pen on paper). Make sure to click the icon and not the link.
- Select the tag service we created earlier from the dropdown and click Save. This step is important because it is how Ranger associates the tag notifications coming from Atlas with the Hive security service.
- You should find yourself at the Resource Based Policies screen again. This time click on the {clustername}_hive service link under the Hive section.
- Several default Hive security policies should be visible
- User hive is allowed access to all tables and all columns
- The cluster is now secured with resource and tag based policies. Let's test how these work together using Spark.

Test Fine Grain Security with Spark
- Connect to the Spark Thrift Server using beeline as the hive user and verify that the sample tables are visible:
beeline -u jdbc:hive2://{spark-thrift-server-host}:10015 -n hive
Connecting to jdbc:hive2://{spark-thrift-server-host}:10015
Connected to: Spark SQL (version 1.6.2)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
0: jdbc:hive2://{spark-thrift-server-host}:10015> show tables;
+----------------+--------------+--+
| tableName | isTemporary |
+----------------+--------------+--+
| factsales | false |
| factsales_tmp | false |
+----------------+--------------+--+
2 rows selected (0.793 seconds)
- Get the explain plan for a simple query:
0: jdbc:hive2://sparksecure01-195-1-0:10015> explain select storekey from factsales;
| == Physical Plan == |
| Scan LlapRelation(org.apache.spark.sql.hive.llap.LlapContext@44bfb65b,Map(table -> default.factsales, url -> jdbc:hive2://sparksecure01-195-1-0.field.hortonworks.com:10500))[storekey#66] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (1.744 seconds)
- The explain plan should show that the table will be scanned using the LlapRelation class. This confirms that Spark is using LLAP to read from HDFS.
- Recall that the user hive should have complete access to all databases, tables, and columns per the Ranger resource based policy.
- Attempt to select storekey from factsales as the user hive.
- Even though the user hive should have full access to the factsales table, we were able to restrict access to the storekey column by designating it as "secure" using a tag in Atlas.
- Attempt to select saleskey from factsales as the user hive. The saleskey column is not designated as secure via a tag.
- Access to the saleskey field is allowed, since the user hive has access and the field is not designated as secure.
- Return to the factsales page in Atlas and remove the "secure" tag from the storekey column.
- Wait 30-60 seconds for the notification from Atlas to be picked up, processed, and propagated.
- Attempt to select storekey from factsales as the user hive once again.
- This time access is allowed, since the "secure" tag has been removed from the storekey column of the factsales table in Atlas.
- Back in the Ranger UI, click on Audit to see all of the access attempts that have been recorded by Ranger.
- Notice that the first access attempt was denied based on the tag [secure].
Ranger already provides extremely fine grain security for both Hive and Spark. However, in combination with Atlas, yet another level of security can be added. Tag based security for Spark provides additional flexibility in controlling access to datasets.
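As a programmatic footnote to the beeline steps above, here is a hypothetical Scala/JDBC sketch of the same check against the Spark Thrift Server (it assumes the Hive JDBC driver is on the classpath; the hostname is a placeholder):
import java.sql.{DriverManager, SQLException}

object TagPolicyTest {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(
      "jdbc:hive2://{spark-thrift-server-host}:10015", "hive", "")
    val stmt = conn.createStatement()

    // saleskey is not tagged "secure", so this query should return rows
    val rs = stmt.executeQuery("SELECT saleskey FROM factsales LIMIT 5")
    while (rs.next()) println(rs.getInt(1))

    // storekey is tagged "secure"; while the tag based deny policy is in place,
    // this query should fail with an authorization error
    try {
      stmt.executeQuery("SELECT storekey FROM factsales LIMIT 5")
    } catch {
      case e: SQLException => println("Access denied as expected: " + e.getMessage)
    }

    conn.close()
  }
}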
12-18-2016
03:15 PM
19 Kudos
This tutorial is a companion to this article: https://community.hortonworks.com/articles/72182/bring-spark-to-the-business-with-fine-grain-securi.html The article outlines the use cases and potential benefits to the business that Spark fine grain security with LLAP may yield. This article also has a second part, which covers how to apply tag based security for Spark using Ranger and Atlas in combination: Tag Based (Meta Data) Security for Apache Spark with LLAP, Atlas, and Ranger.

Getting Started
Install an HDP 2.5.3 cluster via Ambari.
Make sure the following components are installed:
Hive, Spark, Spark Thrift Server, Ambari Infra, Ranger
Enable LLAP
Navigate to the Hive configuration page and click Enable Interactive Query. Ambari will ask which host group to put the Hiveserver2 Interactive service into; select the host group with the most available resources. With Interactive Query enabled, Ambari will display new configuration options that control resource allocation for the LLAP service. LLAP is a set of long-lived daemons that provide interactive query response times and fine grain security for Spark. Since the goal of this tutorial is to test fine grain security for Spark, LLAP only needs a minimal allocation of resources. However, if more resources are available, feel free to crank up the allocation and run some Hive queries against the Hive Interactive server to get a feel for how LLAP improves Hive's performance. Save configurations, confirm, and proceed. Restart all required services. Navigate to the Hive Summary tab and ensure that Hiveserver2 Interactive is started.
Download Spark-LLAP Assembly
From the command line as root:
wget -P /usr/hdp/current/spark-client/lib/ http://repo.hortonworks.com/content/repositories/releases/com/hortonworks/spark-llap/1.0.0.2.5.3.0-37/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
Copy the assembly to the same location on each host where Spark may start an executor. If queues are not enabled, this likely means all hosts running a NodeManager service. Make sure all users have read permissions to that location and to the assembly file.
Configure Spark for LLAP
- In Ambari, navigate to the Spark service configuration tab
- Find Custom spark-defaults, click Add Property, and add the following properties:
  - spark.sql.hive.hiveserver2.url=jdbc:hive2://{hiveserver-interactive-hostname}:10500
  - spark.jars=/usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
  - spark.hadoop.hive.zookeeper.quorum={some-or-all-zookeeper-hostnames}:2181
  - spark.hadoop.hive.llap.daemon.service.hosts=@llap0
- Find Custom spark-thrift-sparkconf, click Add Property, and add the same four properties:
  - spark.sql.hive.hiveserver2.url=jdbc:hive2://{hiveserver-interactive-hostname}:10500
  - spark.jars=/usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
  - spark.hadoop.hive.zookeeper.quorum={some-or-all-zookeeper-hostnames}:2181
  - spark.hadoop.hive.llap.daemon.service.hosts=@llap0
- Find Advanced spark-env and set the spark_thrift_cmd_opts attribute to --jars /usr/hdp/current/spark-client/lib/spark-llap-1.0.0.2.5.3.0-37-assembly.jar
- Save all configuration changes
- Restart all components of Spark
- Make sure the Spark Thrift Server is started
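Once Spark has been restarted, the settings above can be sanity checked from a spark-shell session before continuing. This is a small optional sketch, not part of the original steps (Spark 1.6; sc is provided by the shell):
// Print the LLAP related properties added in the previous step.
val llapSettings = Seq(
  "spark.sql.hive.hiveserver2.url",
  "spark.jars",
  "spark.hadoop.hive.zookeeper.quorum",
  "spark.hadoop.hive.llap.daemon.service.hosts")

llapSettings.foreach { key =>
  // getOption avoids an exception if a property was missed
  println(key + " = " + sc.getConf.getOption(key).getOrElse("<not set>"))
}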
Enable Ranger for Hive
- Navigate to the Ranger service Configs tab
- Click on the Ranger Plugin tab
- Click the switch labeled "Enable Ranger Hive Plugin"
- Save configs
- Restart all required services
Create and Stage Sample Data in an External Hive Table
From the command line:
cd /tmp
wget https://www.dropbox.com/s/r70i8j1ujx4h7j8/data.zip
unzip data.zip
sudo -u hdfs hadoop fs -mkdir /tmp/FactSales
sudo -u hdfs hadoop fs -chmod 777 /tmp/FactSales
sudo -u hdfs hadoop fs -put /tmp/data/FactSales.csv /tmp/FactSales
beeline -u jdbc:hive2://sparksecure01-195-3-2:10000 -n hive -e "CREATE TABLE factsales_tmp (SalesKey int, DateKey timestamp, channelKey int, StoreKey int, ProductKey int, PromotionKey int, CurrencyKey int, UnitCost float, UnitPrice float, SalesQuantity int, ReturnQuantity int, ReturnAmount float, DiscountQuantity int, DiscountAmount float, TotalCost float, SalesAmount float, ETLLoadID int, LoadDate timestamp, UpdateDate timestamp) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/tmp/FactSales'"
Move Data into Hive Tables
From the command line:
beeline -u jdbc:hive2://sparksecure01-195-3-2:10000 -n hive -e "CREATE TABLE factsales (SalesKey int, DateKey timestamp, channelKey int, StoreKey int, ProductKey int, PromotionKey int, CurrencyKey int, UnitCost float, UnitPrice float, SalesQuantity int, ReturnQuantity int, ReturnAmount float, DiscountQuantity int, DiscountAmount float, TotalCost float, SalesAmount float, ETLLoadID int, LoadDate timestamp, UpdateDate timestamp) clustered by (saleskey) into 7 buckets stored as ORC"
beeline -u jdbc:hive2://sparksecure01-195-3-2:10000 -n hive -e "INSERT INTO factsales SELECT * FROM factsales_tmp"
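Before moving on to the Ranger policies, it can be worth confirming that the ORC table was populated. This is an optional check, not part of the original walkthrough, run from spark-shell (assuming Spark is built with Hive support so sqlContext sees the same metastore):
// Compare row counts of the staging table and the ORC table.
val staged = sqlContext.sql("SELECT COUNT(*) FROM factsales_tmp").collect()(0).getLong(0)
val loaded = sqlContext.sql("SELECT COUNT(*) FROM factsales").collect()(0).getLong(0)
println(s"factsales_tmp = $staged rows, factsales = $loaded rows")   // counts should match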
Configure Ranger Security Policies
- Navigate to the Ranger service
- Click on Quick Links --> Ranger Admin UI (user: admin, password: admin)
- Click on the {clustername}_hive service link under the Hive section
- Several Hive security policies should be visible
- Add a new column level policy for the user spark as shown in the screenshot below. Make sure that the storekey column is excluded from access.
- User hive is allowed access to all tables and all columns; user spark is restricted from accessing the storekey column in the factsales table
- Click on the Masking tab
- Add a new masking policy for the user spark to redact the salesamount column
- Click on the Row Level Filter tab and add a new row level filter policy for the user spark to only show productkey < 100
Test Fine Grain Security with Spark
- Connect to the Spark Thrift Server using beeline as the hive user and verify that the sample tables are visible:
beeline -u jdbc:hive2://sparksecure01-195-1-0:10015 -n hive
Connecting to jdbc:hive2://sparksecure01-195-1-0:10015
Connected to: Spark SQL (version 1.6.2)
Driver: Hive JDBC (version 1.2.1000.2.5.3.0-37)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.5.3.0-37 by Apache Hive
0: jdbc:hive2://sparksecure01-195-1-0:10015> show tables;
+----------------+--------------+--+
| tableName | isTemporary |
+----------------+--------------+--+
| factsales | false |
| factsales_tmp | false |
+----------------+--------------+--+
2 rows selected (0.793 seconds)
- Get the explain plan for a simple query:
0: jdbc:hive2://sparksecure01-195-1-0:10015> explain select storekey from factsales;
| == Physical Plan == |
| Scan LlapRelation(org.apache.spark.sql.hive.llap.LlapContext@44bfb65b,Map(table -> default.factsales, url -> jdbc:hive2://sparksecure01-195-1-0.field.hortonworks.com:10500))[storekey#66] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--+
2 rows selected (1.744 seconds)
- The explain plan should show that the table will be scanned using the LlapRelation class. This confirms that Spark is using LLAP to read from HDFS.
- Verify that the hive user is able to see the storekey column, an unredacted salesamount, and an unfiltered productkey in the factsales table, as specified by the policy.
- Hit Ctrl-C to exit beeline.
- Connect to the Spark Thrift Server using beeline as the user spark and run the exact same query the hive user just ran. An exception will be thrown by the authorization plugin because the user spark is not allowed to see the results of any query that includes the storekey column.
- Try the same query but omit the storekey column from the request. The response will show a filtered productkey column and a redacted salesamount column.
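For readers who prefer to script the comparison, here is a hypothetical Scala/JDBC sketch of the same check (Hive JDBC driver on the classpath; the hostname is a placeholder; passwords are empty in this sandbox-style setup):
import java.sql.DriverManager

object ComparePolicies {
  // Runs the same query as a given user so the masking and row filter
  // policies can be compared side by side. storekey is omitted because
  // the user spark is denied access to it.
  def runAs(user: String): Unit = {
    val conn = DriverManager.getConnection(
      "jdbc:hive2://{spark-thrift-server-host}:10015", user, "")
    val stmt = conn.createStatement()
    val rs = stmt.executeQuery("SELECT productkey, salesamount FROM factsales LIMIT 5")
    println(s"--- results as $user ---")
    while (rs.next())
      println("productkey=" + rs.getInt(1) + " salesamount=" + rs.getString(2))
    conn.close()
  }

  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    runAs("hive")   // unfiltered productkey, unredacted salesamount
    runAs("spark")  // only productkey < 100, salesamount redacted
  }
}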
View Audit Trail
- Navigate back to the Ranger Admin UI
- Navigate to Audit (link at the top of the screen)
- Ranger Audit registers both Allowed and Denied access events
Now access to data through the Spark Thrift Server is secured by the same granular security policies as Hive. Ranger provides the centralized policies, and LLAP ensures they are enforced. BI tools can now be pointed at Spark or Hive interchangeably.
12-16-2016
06:03 PM
4 Kudos
Apache Spark has ignited an explosion of data exploration on very large data sets. Spark played a big role in making general purpose distributed compute accessible: anyone with some level of skill in Python, Scala, Java, and now R can just sit down and start exploring data at scale. It also democratized data science by offering machine learning as a series of black boxes, so training models is now possible for those of us who do not have PhDs in statistics and mathematics. Now Spark SQL is also helping to bring data exploration to the business unit directly. In partnership with Apache Hive, Spark has been enabling users to explore very large data sets using SQL expressions.
However, in order to truly make Spark SQL available for ad-hoc access by business analysts using BI tools, fine grain security and governance are necessary. Spark provides strong authentication via Kerberos and wire encryption via SSL. However, up to this point, authorization was only possible via HDFS ACLs. That approach works relatively well when Spark is used as a general purpose compute framework, that is, when Java/Scala/Python is used to express logic that cannot be encapsulated in a SQL statement. However, when a structured schema with columns and rows is applied, fine grain security becomes a challenge. Data in the same table may belong to two different groups, each with their own regulatory requirements. Data may have regional restrictions, time based availability restrictions, departmental restrictions, etc. Currently, Spark does not have a built-in authorization subsystem. It tries to read the data set as instructed and either succeeds or fails based on file system permissions. There is no way to define a pluggable module that contains an instruction set for fine grain authorization. This means that authorization policy enforcement must be performed somewhere outside of Spark. In other words, some other system has to tell Spark that it is not allowed to read the data because it contains a restricted column.
At this point there are two likely solutions. The first is to create an authorization subsystem within Spark itself. The second is to configure Spark to read the file system through a daemon that is external to Spark. The second option is particularly attractive because it can provide benefits far beyond security. Thus, the community created LLAP (Live Long and Process). LLAP is a collection of long lived daemons that work in tandem with the HDFS DataNode service. LLAP is optional and modular, so it can be turned on or off. At the moment, Apache Hive has the most built-in integration with LLAP. However, the intent of LLAP is to provide benefits to applications running in Yarn generally. When enabled, LLAP provides numerous performance benefits:
- Processing Offload
- IO Optimization
- Caching
Since the focus of this article is security for Spark, refer to the LLAP page on the Apache Hive wiki for more details on LLAP:
https://cwiki.apache.org/confluence/display/Hive/LLAP
With LLAP enabled, Spark reads from HDFS go directly through LLAP. Besides conferring all of the aforementioned benefits on Spark, LLAP is also a natural place to enforce fine grain security policies. The only other capability required is a centralized authorization system. This need is met by Apache Ranger.
Apache Ranger provides centralized authorization and audit services for many components that run on Yarn or rely on data from HDFS. Ranger allows authoring of security policies for:
- HDFS
- Yarn
- Hive (Spark with LLAP)
- HBase
- Kafka
- Storm
- Solr
- Atlas
- Knox
Each of the above services integrates with Ranger via a plugin that pulls the latest security policies, caches them, and then applies them at run time. Now that we have defined how fine grain authorization and audit can be applied to Spark, let's review the overall architecture.
Spark receives the query statement and communicates with Hive to obtain the relevant schemas and query plan. The Ranger Hive plugin checks the cached security policy and tells Spark which columns the user is allowed to access. Spark does not have its own authorization system, so it begins attempting to read from the filesystem through LLAP. LLAP reviews the read request and realizes that it contains columns that the user making the request does not have permission to read. LLAP instantly stops processing the request and throws an authorization exception back to Spark. Notice that there was no need to create any kind of view abstraction over the data. The only action required for fine grain security enforcement is to configure a security policy in Ranger and enable LLAP.
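From the Spark side, the enforcement simply surfaces as a failed query. A hypothetical spark-shell sketch (Spark 1.6 with the spark-llap assembly configured; the table and column names are the ones from the companion tutorial and are illustrative only):
// Run as a user who is denied access to the storekey column.
try {
  sqlContext.sql("SELECT storekey FROM factsales").show()
} catch {
  case e: Exception =>
    println(s"Rejected by the authorization layer: ${e.getMessage}")
}

// The same session can still read columns the user is entitled to see.
sqlContext.sql("SELECT saleskey FROM factsales").show(5)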
Ranger also provides column masking and row filtering capabilities. A masking policy is similar to a column level policy; the main difference is that all columns are returned, but the restricted columns contain only asterisks or a hash of the original value. Ranger also provides the ability to apply row level security. Using a row level security policy, users can be prevented from seeing some of the rows in a table while still seeing all rows not restricted by policy. Consider a scenario where financial managers should only be able to see clients assigned to them. A row level policy from Ranger would instruct Hive to return a query plan that includes a predicate. That predicate filters out all customers not assigned to the financial manager trying to access the data. Spark receives the modified query plan and initiates processing, reading data through LLAP. LLAP ensures that the predicate is applied and that the restricted rows are not returned. With such an array of fine grain security capabilities, Spark can now be exposed directly to BI tools via a Thrift Server. Business analysts can now wield the power of Apache Spark. In general, LLAP integration has the potential to greatly enhance Spark from both a performance and a security perspective. Fine grain security will help to bring the benefits of Spark to the business. Such a development should help to fuel more investment, collection, and exploration of data. If you would like to test out this capability for yourself, check out the following tutorial: https://community.hortonworks.com/content/kbentry/72454/apache-spark-fine-grain-security-with-llap-test-dr.html
11-11-2016
01:43 PM
@Sunile Manjee Atlas creates two topics: ATLAS_ENTITIES and ATLAS_HOOK. When a hook fires, it packages the meta data passed to it into Atlas entities and sends them to Atlas via the ATLAS_HOOK topic. When Atlas successfully creates the new entities it received from a hook, it publishes the resulting entities to the ATLAS_ENTITIES topic. You can watch either topic to know that an entity or set of entities is being created, or that a request to create them has been sent. You can also go back and read the entire topic from the first available offset to see what entities or sets of entities have been created over that period. You can then calculate lineage using the same graph processing techniques used by Titan (the graph database used by Atlas). However, there is no actual lineage information on the topic, just the JSON that describes the entities being created and references to other entities. This is because Kafka is nothing more than a message bus; it buffers messages for asynchronous reads. It cannot do graph calculations, and even if it could, it only retains data for a limited period of time. Thus Atlas uses Titan to calculate lineage based on data stored in Hbase.
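For example, a rough Scala sketch of watching the ATLAS_ENTITIES topic with the plain Kafka consumer API (the broker address, port, and group id below are placeholders/assumptions; adjust to your cluster, and note that a kerberized cluster needs additional security properties):
// Minimal consumer that prints the entity JSON Atlas publishes after it creates entities.
// Assumes org.apache.kafka:kafka-clients is on the classpath.
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

object WatchAtlasEntities {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "{kafka-broker-host}:6667")   // placeholder broker
    props.put("group.id", "atlas-topic-watcher")                 // arbitrary group id
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("auto.offset.reset", "earliest")   // start from the first retained offset

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("ATLAS_ENTITIES"))

    while (true) {
      val records = consumer.poll(1000)          // each record value is a JSON notification
      val it = records.iterator()
      while (it.hasNext) println(it.next().value())
    }
  }
}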
11-01-2016
12:33 PM
@Anil Reddy Glad to hear that the install worked. Would you mind accepting this answer as correct? Without further investigation, I would guess that Hbase and Metrics are going down due to memory pressure. Running the whole application inside a VM puts a lot of pressure on your machine. Try stopping the Metrics, Falcon, and Flume services to save memory.
10-29-2016
11:03 PM
1 Kudo
@Anil Reddy Looks like HDF 2.0 breaks HDP Ambari. In hindsight this makes sense, since a lot of work was done in HDF 2.0 to make it work well with its own Ambari; the wrapper changed significantly. I put in some backward compatibility since Sandbox 2.4 already has a copy of HDF 1.2 available to install from Ambari. Give it another shot; it should work end to end now. One last thing: Storm 0.10.0 takes a while to fully load the topology. I have seen it take up to 5 minutes. Storm 1.0.1 is much more efficient and loads the same topology in a few seconds. Make sure that the Storm log shows that all bolts and spouts have fully loaded before you attempt to load the UI. This is because Storm is what initializes Hbase with the data required for the UI to start. You may see a stack trace related to Atlas; just ignore that. Let me know if that works.
10-28-2016
03:24 PM
@Anil Reddy Ok, the issue is that the installer did not get the latest Nifi. I hate to ask... could you create a fresh sandbox and run the install.sh script, grabbing the output as before? It should be grabbing the latest Nifi bits, and it looks like it did not. The API changed between Nifi 1.2 and Nifi 2.0, and that is why the flow did not instantiate. Try a fresh install, capture the logs like before, and respond to this comment.
10-28-2016
02:08 PM
@Anil Reddy You are actually very close. The UI is loading successfully; what you are seeing is an empty inbox. Picture Outlook if no emails had been received. You will see credit card transactions coming through after you start the simulator. You are correct that the reason the simulator cannot send messages is that there is nothing listening on 8082. Apache Nifi is supposed to have an instantiated flow that listens on 8082 for HTTP POSTs with JSON payloads. That is how the simulator feeds transactions to the application for processing. Looking at the install log, everything looks good except for when the process tried to instantiate the Nifi flow. Do the following: log into Nifi (http://sandbox.hortonworks.com:9090/nifi), capture a screenshot, and respond to this answer with a comment that includes that screenshot.
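If it helps while debugging, here is a rough Scala sketch that posts one test document to the listener the flow should expose on 8082 (the JSON fields are made up; the real simulator payload will differ):
// Posts a single JSON document to the HTTP listener the Nifi flow should provide on 8082.
// A connection refused error means nothing is listening; a 2xx response means the flow is running.
import java.net.{HttpURLConnection, URL}

object PostTestTransaction {
  def main(args: Array[String]): Unit = {
    val url = new URL("http://sandbox.hortonworks.com:8082/")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)

    val payload = """{"transactionId": "test-1", "amount": 1.0}"""   // made-up fields
    val out = conn.getOutputStream
    out.write(payload.getBytes("UTF-8"))
    out.close()

    println(s"HTTP response code: ${conn.getResponseCode}")
    conn.disconnect()
  }
}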
10-21-2016
01:33 PM
@Attila Kanto The blueprint is attached: industrydemoblueprint-v11.txt
Screenshot of ps aux | grep salt-api and netstat showing the process listening on 3080 (taken during blueprint deployment).
Screenshot of steps taken after completion of HDP cluster provisioning via the blueprint.
The Cloudbreak log continues to show the following:
cloudbreak_1 | 2016-10-21 13:19:07,602 [containerBootstrapBuilderExecutor-19] call:68 WARN c.s.c.o.OrchestratorBootstrapRunner - [owner:46c1f915-0f53-4567-b233-de54ed271274] [type:STACK] [id:18] [name:credit-fraud-demo] Orchestrator component SaltJobIdTracker failed to start, retrying [43/90] Elapsed time: 773 ms; Reason: Job: JobId{jobId='20161021131136120098'} is running currently, waiting for next polling attempt, additional info: SaltJobIdTracker{saltJobRunner=HighStateChecker{BaseSaltJobRunner{target=[10.0.189.38], jid=JobId{jobId='20161021131136120098'}, jobState=IN_PROGRESS}}}
Salt master log after moving /var/log/salt/master to /var/log/salt/master.bak and capturing the new /var/log/salt/master: master.txt
Cloudbreak log after 90 attempts and failures:
cloudbreak_1 | com.sequenceiq.cloudbreak.core.CloudbreakException: com.sequenceiq.cloudbreak.orchestrator.exception.CloudbreakOrchestratorFailedException: java.util.concurrent.ExecutionException: com.sequenceiq.cloudbreak.orchestrator.exception.CloudbreakOrchestratorFailedException: Job: JobId{jobId='20161021131136120098'} is running currently, waiting for next polling attempt
cloudbreak_1 | at com.sequenceiq.cloudbreak.service.cluster.flow.OrchestratorRecipeExecutor.postInstall(OrchestratorRecipeExecutor.java:80)
cloudbreak_1 | at com.sequenceiq.cloudbreak.service.cluster.flow.RecipeEngine.executePostInstall(RecipeEngine.java:94)
cloudbreak_1 | at com.sequenceiq.cloudbreak.service.cluster.flow.AmbariClusterConnector.buildAmbariCluster(AmbariClusterConnector.java:223)
cloudbreak_1 | at com.sequenceiq.cloudbreak.core.cluster.AmbariClusterCreationService.buildAmbariCluster(AmbariClusterCreationService.java:27)
cloudbreak_1 | at com.sequenceiq.cloudbreak.reactor.handler.cluster.InstallClusterHandler.accept(InstallClusterHandler.java:35)
cloudbreak_1 | at com.sequenceiq.cloudbreak.reactor.handler.cluster.InstallClusterHandler.accept(InstallClusterHandler.java:18)
cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:317)
cloudbreak_1 | at reactor.bus.EventBus$3.accept(EventBus.java:310)
cloudbreak_1 | at reactor.bus.routing.ConsumerFilteringRouter.route(ConsumerFilteringRouter.java:72)
cloudbreak_1 | at reactor.bus.routing.TraceableDelegatingRouter.route(TraceableDelegatingRouter.java:51)
cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:591)
cloudbreak_1 | at reactor.bus.EventBus.accept(EventBus.java:63)
cloudbreak_1 | at reactor.core.dispatch.AbstractLifecycleDispatcher.route(AbstractLifecycleDispatcher.java:160)
cloudbreak_1 | at reactor.core.dispatch.MultiThreadDispatcher$MultiThreadTask.run(MultiThreadDispatcher.java:74)
cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
cloudbreak_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
cloudbreak_1 | at java.lang.Thread.run(Thread.java:745)
cloudbreak_1 | Caused by: com.sequenceiq.cloudbreak.orchestrator.exception.CloudbreakOrchestratorFailedException: java.util.concurrent.ExecutionException: com.sequenceiq.cloudbreak.orchestrator.exception.CloudbreakOrchestratorFailedException: Job: JobId{jobId='20161021131136120098'} is running currently, waiting for next polling attempt
cloudbreak_1 | at com.sequenceiq.cloudbreak.orchestrator.salt.SaltOrchestrator.executeRecipes(SaltOrchestrator.java:301)
cloudbreak_1 | at com.sequenceiq.cloudbreak.orchestrator.salt.SaltOrchestrator.postInstallRecipes(SaltOrchestrator.java:233)
cloudbreak_1 | at com.sequenceiq.cloudbreak.service.cluster.flow.OrchestratorRecipeExecutor.postInstall(OrchestratorRecipeExecutor.java:77)
cloudbreak_1 | ... 16 common frames omitted
cloudbreak_1 | Caused by: java.util.concurrent.ExecutionException: com.sequenceiq.cloudbreak.orchestrator.exception.CloudbreakOrchestratorFailedException: Job: JobId{jobId='20161021131136120098'} is running currently, waiting for next polling attempt
cloudbreak_1 | at java.util.concurrent.FutureTask.report(FutureTask.java:122)
cloudbreak_1 | at java.util.concurrent.FutureTask.get(FutureTask.java:192)
cloudbreak_1 | at com.sequenceiq.cloudbreak.orchestrator.salt.SaltOrchestrator.runNewService(SaltOrchestrator.java:274)
cloudbreak_1 | at com.sequenceiq.cloudbreak.orchestrator.salt.SaltOrchestrator.executeRecipes(SaltOrchestrator.java:294)
cloudbreak_1 | ... 18 common frames omitted
cloudbreak_1 | Caused by: com.sequenceiq.cloudbreak.orchestrator.exception.CloudbreakOrchestratorFailedException: Job: JobId{jobId='20161021131136120098'} is running currently, waiting for next polling attempt
cloudbreak_1 | at com.sequenceiq.cloudbreak.orchestrator.salt.poller.SaltJobIdTracker.call(SaltJobIdTracker.java:54)
cloudbreak_1 | at com.sequenceiq.cloudbreak.orchestrator.OrchestratorBootstrapRunner.call(OrchestratorBootstrapRunner.java:60)
cloudbreak_1 | at com.sequenceiq.cloudbreak.orchestrator.OrchestratorBootstrapRunner.call(OrchestratorBootstrapRunner.java:14)
cloudbreak_1 | at java.util.concurrent.FutureTask.run(FutureTask.java:266)
cloudbreak_1 | ... 3 common frames omitted
cloudbreak_1 | 2016-10-21 13:27:36,531 [reactorDispatcher-98] accept:70 DEBUG c.s.c.c.f.F
As before, the Ambari cluster is up and fully usable but shows as failed in the Cloudbreak console.