Member since: 03-24-2016
Posts: 184
Kudos Received: 239
Solutions: 39
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3248 | 10-21-2017 08:24 PM |
| | 1964 | 09-24-2017 04:06 AM |
| | 6479 | 05-15-2017 08:44 PM |
| | 2206 | 01-25-2017 09:20 PM |
| | 7286 | 01-22-2017 11:51 PM |
06-17-2016
06:10 PM
@Manoj Dhake The tag is a marker that Ranger tag-based policies use. Before tags, you had to create a policy that specifically named the fields, tables, etc. that you wanted to secure. With tags, you place a marker on a field, table, or other supported entity, marking it as a target for the policy. Atlas stores all of the entities it manages as graph entities within a graph database component called Titan. Titan requires a backend data store, so you can choose tools like BerkeleyDB, HBase, etc. Titan also needs an indexing engine to support search; you can choose Solr or Elasticsearch. Ranger has plugins that allow it to interact with many HDP components: HDFS, YARN, Hive, HBase, Storm, Kafka, and Knox. Using the Hive plugin, Ranger can intercept a request against a Hive table based on the schema, apply any relevant security policies, and, when applicable, cause Hive to throw a security exception. Check out this link for more details: http://hortonworks.com/hadoop-tutorial/manage-security-policy-hive-hbase-knox-ranger/
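As a quick illustration, the tags (traits) are visible through the Atlas REST API. This is only a sketch against the legacy Atlas API on the sandbox; the <entity-guid> placeholder and the /traits path are assumptions and may differ between Atlas versions:
# List all registered types; traits/tags such as PII and Metric appear here too
curl -X GET http://localhost:21000/api/atlas/types
# List the tags (traits) currently attached to a specific entity
# <entity-guid> is a placeholder for a GUID returned by /api/atlas/entities?type=Table
curl -X GET http://localhost:21000/api/atlas/entities/<entity-guid>/traits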
06-17-2016
05:35 PM
@Manoj Dhake Would you mind asking that as a separate question? It will be easier for others to find. After you submit the question, reply to this thread and @ mention me.
06-12-2016
10:58 PM
2 Kudos
@Manoj Dhake The following series of REST calls shows how to figure out the names of the entities classified as Tables. You can then call out to the Lineage resource URI for each table to get its input/output graphs and schema.
curl -X GET http://localhost:21000/api/atlas/types
{"results":["Fact","Process","storm_topology_reference","event","View","ETL","DB","Dimension","nifi_flow","Infrastructure","JdbcAccess","StorageDesc","Column","DataSet","PII","Table","Metric","LoadProcess"],"count":18,"requestId":"qtp1320338790-13 - dcd2b32a-ad75-43e0-89ee-10b87bdbd1fd"}
curl -X GET http://localhost:21000/api/atlas/entities?type=Table
{"requestId":"qtp1320338790-79 - 4a8eb1d7-a694-4936-819e-07808f8383e2","typeName":"Table","results":["cd71b47d-2616-4494-a3de-ddd04bc569e8","4efdeae5-2a27-4b24-88bf-cf25c0c156d5","582650eb-72ca-4994-a1ec-3c833be0120e","f4e4a676-e19f-4d3b-bc07-af3ce166ea4e","34ac76b3-b38a-4a40-a48b-57a3270177ab","729d43dc-b8be-4cb5-9f9a-9c22de205c54"]
curl -X GET http://localhost:21000/api/atlas/entities/cd71b47d-2616-4494-a3de-ddd04bc569e8
{"requestId":"qtp1320338790-13 - e5f16b73-80c9-4fe9-bf5a-aadba2f2910e","GUID":"cd71b47d-2616-4494-a3de-ddd04bc569e8","definition":"{\n \"jsonClass\":\"org.apache.atlas.typesystem.json.InstanceSerialization$_Reference\",\n \"id\":{\n \"jsonClass\":\"org.apache.atlas.typesystem.json.InstanceSerialization$_Id\",\n \"id\":\"cd71b47d-2616-4494-a3de-ddd04bc569e8\",\n \"version\":0,\n \"typeName\":\"Table\"\n },\n \"typeName\":\"Table\",\n \"values\":{\n \"tableType\":\"Managed\",\n \"name\":\"sales_fact_monthly_mv\"
curl -X GET http://localhost:21000/api/atlas/lineage/hive/table/sales_fact_monthly_mv/inputs/graph
{"requestId":"qtp1320338790-84 - 7fd236b2-501b-4155-b94b-6e56b38c72c8","tableName":"sales_fact_monthly_mv","results":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__tempQueryResultStruct11","values":{"vertices":{"4efdeae5-2a27-4b24-88bf-cf25c0c156d5":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__tempQueryResultStruct10","values":{"vertexId":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__IdType","values":{"guid":"4efdeae5-2a27-4b24-88bf-cf25c0c156d5","typeName":"Table"}},"name":"sales_fact"}},"f4e4a676-e19f-4d3b-bc07-af3ce166ea4e":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__tempQueryResultStruct10","values":{"vertexId":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__IdType","values":{"guid":"f4e4a676-e19f-4d3b-bc07-af3ce166ea4e","typeName":"Table"}},"name":"time_dim"}},"729d43dc-b8be-4cb5-9f9a-9c22de205c54":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__tempQueryResultStruct10","values":{"vertexId":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__IdType","values":{"guid":"729d43dc-b8be-4cb5-9f9a-9c22de205c54","typeName":"Table"}},"name":"sales_fact_daily_mv"}},"cd71b47d-2616-4494-a3de-ddd04bc569e8":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__tempQueryResultStruct10","values":{"vertexId":{"jsonClass":"org.apache.atlas.typesystem.json.InstanceSerialization$_Struct","typeName":"__IdType","values":{"guid":"cd71b47d-2616-4494-a3de-ddd04bc569e8","typeName":"Table"}},"name":"sales_fact_monthly_mv"}}},"edges":{"81419739-9952-4c3e-a7ab-b2f47ae7a362":["729d43dc-b8be-4cb5-9f9a-9c22de205c54"],"729d43dc-b8be-4cb5-9f9a-9c22de205c54":["43b242f7-39ed-45b4-8861-ba7c4c50ff0c"],"cd71b47d-2616-4494-a3de-ddd04bc569e8":["81419739-9952-4c3e-a7ab-b2f47ae7a362"],"43b242f7-39ed-45b4-8861-ba7c4c50ff0c":["4efdeae5-2a27-4b24-88bf-cf25c0c156d5","f4e4a676-e19f-4d3b-bc07-af3ce166ea4e"]}}}}
curl -X GET http://localhost:21000/api/atlas/lineage/hive/table/sales_fact_monthly_mv/schema
{"requestId":"qtp1320338790-79 - a316d9c1-638a-4d26-b297-0772cbdde1fd","tableName":"sales_fact_monthly_mv","results":{"query":"Table where (name = \"sales_fact_monthly_mv\") columns","dataType":{"superTypes":[],"hierarchicalMetaTypeName":"org.apache.atlas.typesystem.types.ClassType","typeName":"Column","attributeDefinitions":[{"name":"name","dataTypeName":"string","multiplicity":{"lower":0,"upper":1,"isUnique":false},"isComposite":false,"isUnique":false,"isIndexable":true,"reverseAttributeName":null},{"name":"dataType","dataTypeName":"string","multiplicity":{"lower":0,"upper":1,"isUnique":false},"isComposite":false,"isUnique":false,"isIndexable":true,"reverseAttributeName":null},{"name":"comment","dataTypeName":"string","multiplicity":{"lower":0,"upper":1,"isUnique":false},"isComposite":false,"isUnique":false,"isIndexable":true,"reverseAttributeName":null}]},"rows":[{"$typeName$":"Column","$id$":{"id":"77448e1e-ef27-4ee3-a35c-f0124b613dd8","$typeName$":"Column","version":0},"comment":"product id","name":"product_id","dataType":"int"},{"$typeName$":"Column","$id$":{"id":"3e647f08-a319-4ca2-8595-17b7734e6d15","$typeName$":"Column","version":0},"comment":"customer id","name":"customer_id","dataType":"int","$traits$":{"PII":{"$typeName$":"PII"}}},{"$typeName$":"Column","$id$":{"id":"eb23c523-d450-44d8-b87b-f9adf75512d8","$typeName$":"Column","version":0},"comment":"product id","name":"sales","dataType":"double","$traits$":{"Metric":{"$typeName$":"Metric"}}},{"$typeName$":"Column","$id$":{"id":"66f7eff9-73f1-4944-af16-03099b201b4d","$typeName$":"Column","version":0},"comment":"time id","name":"time_id","dataType":"int"}]}}
06-12-2016
10:13 PM
2 Kudos
@Timothy Spann An in-memory data grid is much more than just a cache. Some key capabilities are:
- Very granular control over the data being stored
- Technology-agnostic serialization that enables access to cached data from several different tools (Java, C#, C++, etc.)
- Loading of data on cache miss from any backing store
- Write-through/write-behind to any backing store
- The ability to offload processing of instruction sets on individual cached entries or in map/reduce-style batches
- An eventing framework providing notification of changes to individual entries or of job execution
- Tiered caching (on-heap, off-heap, disk)
HBase is an excellent NoSQL columnar data store, but when it comes to dealing with data in memory, all it offers is an LRU caching and eviction scheme with very little control over what data gets and stays cached. In fact, the only control knob is how much memory is allocated for caching per region server. Given that HBase actually stores data durably, it is often a great choice for OLTP use cases. In fact, in-memory data grids are rarely used without a backing store like HBase. However, for application acceleration, processing, and functionality offload, an in-memory data grid can provide capabilities that HBase alone cannot.
05-17-2016
03:02 PM
1 Kudo
@Smart Solutions Check out this article and demo for a full explanation and working example: https://community.hortonworks.com/articles/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html https://community.hortonworks.com/repos/29883/sparksql-data-federation-demo.html
05-08-2016
02:01 AM
@Benjamin Leonhardi With the release of YARN.Next, containers will receive their own IP addresses and get registered in DNS. The FQDN will be available via a REST call to YARN. If the current YARN container dies, the Docker container will start in a different YARN container somewhere in the cluster. As long as all clients are pointing at the FQDN of the application, the outage will be nearly transparent. In the meantime, there are several options using only Slider, but they require some scripting or registration in ZooKeeper. If you run: slider lookup --id application_1462448051179_0002
2016-05-08 01:55:51,676 [main] INFO impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-05-08 01:55:53,847 [main] WARN shortcircuit.DomainSocketFactory - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2016-05-08 01:55:53,868 [main] INFO client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
{
"applicationId" : "application_1462448051179_0002",
"applicationAttemptId" : "appattempt_1462448051179_0002_000001",
"name" : "biologicsmanufacturingui",
"applicationType" : "org-apache-slider",
"user" : "root",
"queue" : "default",
"host" : "sandbox.hortonworks.com",
"rpcPort" : 1024,
"state" : "RUNNING",
"diagnostics" : "",
"url" : "http://sandbox.hortonworks.com:8088/proxy/application_1462448051179_0002/",
"startTime" : 1462454411514,
"finishTime" : 0,
"finalStatus" : "UNDEFINED",
"origTrackingUrl" : "http://sandbox.hortonworks.com:1025",
"progress" : 1.0
}
2016-05-08 01:55:54,542 [main] INFO util.ExitUtil - Exiting with status 0
You do get the host the container is currently bound to. Since the instructions bind the Docker container to the host IP, this allows URL discovery, but as I said, not out of the box. This article is merely a harbinger of YARN.Next, as that will integrate the PaaS capabilities into YARN itself, including application registration and discovery.
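As a rough sketch of the scripting mentioned above (assuming the lookup output keeps the JSON layout shown here, and using the 1025 port from origTrackingUrl), the host can be scraped and turned into a URL:
# Scrape the host the application master reports and build the application URL.
# The log lines in the lookup output are filtered out by the grep on "host".
APP_HOST=$(slider lookup --id application_1462448051179_0002 | grep '"host"' | sed 's/.*: "\(.*\)",/\1/')
echo "Application UI: http://${APP_HOST}:1025"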
05-03-2016
05:51 AM
@azeltov So I did a little more digging, and all you have to do is point the %hive interpreter at the hostname and port where the thrift server for the target SQLContext was started. So in the case you have above, you would set the Zeppelin Hive interpreter to point at sandbox.hortonworks.com:10002. Now when you issue a query from the %hive interpreter, it will connect to the Hive context you created at runtime, exposing both your temp tables and the permanent tables in the Hive metastore. The other option is to have both a Hive and a SQL context and register each temp table on both contexts. That way you can both expose them with thrift and access them with the %sql interpreter. However, I think the first option is the one you are looking for.
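A quick way to confirm the wiring (assuming beeline is on the path and the thrift server for the runtime Hive context is listening on port 10002 as above) is to connect to that port and list tables; the temp tables should show up alongside the metastore tables:
# Connect to the thrift server started for the runtime Hive context and list tables.
# The -n user is an assumption; adjust for your environment.
beeline -u jdbc:hive2://sandbox.hortonworks.com:10002 -n hive -e "show tables;"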
05-02-2016
11:19 PM
2 Kudos
@azeltov I took a closer look at this. I think the issue is that calling the registerTempTable method on a SQLContext registers the table in the application's SQL context. The %hive interpreter in Zeppelin only sees the tables registered in the Hive metastore, not the temp tables registered in the Spark application's SQL context. Beeline shows both temp tables and permanent tables registered in the Hive metastore because the Spark thrift server aggregates table metadata from Hive and from the Spark application's SQL context. This is similar to how the Zeppelin %sql interpreter is only able to see temp tables registered on the default SQLContext created by the Zeppelin session when it starts Spark. As we saw, if you create a second SQL context in the same Zeppelin session, any temp tables registered on that second context are not visible to the Zeppelin %sql interpreter.
05-02-2016
04:46 PM
@sarfarazkhan pathan You will need the following in your Maven POM: <dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.10</artifactId>
<version>VERSION</version>
</dependency>
Then create a stream as follows:
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka.KafkaUtils;

Map<String, Integer> kafkaTopics = new HashMap<String, Integer>();
kafkaTopics.put("TopicName", 1);   // topic name -> number of receiver threads
SparkConf sparkConf = new SparkConf();
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(batchSize)); // batchSize = batch interval in seconds
JavaPairReceiverInputDStream<String, String> kafkaStream =
    KafkaUtils.createStream(jssc, Constants.zkConnString, "spark-streaming-consumer-group", kafkaTopics); // zkConnString = ZooKeeper connect string
//kafkaStream.print();
JavaPairDStream<String, String> deviceStream = kafkaStream;
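For completeness, a hedged sketch of submitting the job to YARN; the class name, jar name, and connector version are placeholders, and the Kafka integration jar is pulled in via --packages rather than bundled:
# Submit the streaming job; adjust the placeholders for your build.
spark-submit --master yarn-client \
  --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.1 \
  --class com.example.KafkaStreamingJob \
  my-streaming-job.jar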
04-29-2016
01:03 AM
4 Kudos
Ambari is a great tool to manage the Hortonworks Data Platform. However, as the complexity of the applications that run on HDP grows and more and more components from the stack are required, it may become necessary to automate tasks like configuration changes. Using the Ambari REST interface, it is possible to install, control, interrogate, and even change the configuration of HDP services. This example demonstrates how to automate the install and configuration of Nifi; the same approach can be applied to any service. This example uses curl to make all REST API requests. Make sure that the bits required to actually install the service in Ambari are available on the host running the Ambari server. In the case of Nifi, the bits should be present in /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI.

1. Check Service Status:

curl -u admin:admin -H "X-Requested-By:ambari" -i -X GET http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/NIFI
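The -i flag prints the HTTP status line along with the body. If you only need the status code, here is a minimal sketch using nothing beyond bash and curl:
# Capture only the HTTP status code for the NIFI service resource.
# A 404 here means the service does not exist yet and can be created.
STATUS=$(curl -s -o /dev/null -w "%{http_code}" -u admin:admin -H "X-Requested-By:ambari" http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/NIFI)
echo $STATUS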
Otherwise, use regex or a JSON parsing library/tool to parse the full response. In this particular case you want a 404 status, as that means that the NIFI service does not yet exist.

2. Create Service:

curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/NIFI

A check of the Ambari UI should confirm that the NIFI service is visible in the left-hand pane.

3. Add Components to Service:

curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/NIFI/components/NIFI_MASTER
The service is merely the container that holds all of the processes that comprise it. When a request to start the service is issued, Ambari actually attempts to start each of the components defined for that service, which are the actual processes that provide functionality.

4. Configure the Components:

This is the tricky part. One of the great values of Ambari is that it lets administrators add and change configuration for HDP services from a single place. This means that when a service is being installed, the configurations and the files that contain them must be defined as well. Each component of each service has its own configuration files, and each file has a unique set of properties and formats. As a result, each configuration file that the Ambari-wrapped service expects must be defined and applied to each component of the target service prior to install. The simplest way to obtain the required configuration files is to install the service via the Ambari UI and then use the Ambari config utility located at /var/lib/ambari-server/resources/scripts/configs.sh. This utility uses the REST API to pull the configurations that Ambari requires from an existing service. In the case of an already installed NIFI service, it is possible to get the required configuration files as follows: In the Ambari UI, click on the NIFI service and then navigate to the configuration section. Take note of each of the configuration sections listed. Use the Ambari configs utility to export each of the configuration sections to a file as follows (not shown below: the script takes a username and password parameter, which should be the Ambari admin user):

/var/lib/ambari-server/resources/scripts/configs.sh get sandbox.hortonworks.com Sandbox nifi-ambari-config >> nifi-ambari-config.json
/var/lib/ambari-server/resources/scripts/configs.sh get sandbox.hortonworks.com Sandbox nifi-bootstrap-env >> nifi-bootstrap-env.json
/var/lib/ambari-server/resources/scripts/configs.sh get sandbox.hortonworks.com Sandbox nifi-flow-env >> nifi-flow-env.json
/var/lib/ambari-server/resources/scripts/configs.sh get sandbox.hortonworks.com Sandbox nifi-logback-env >> nifi-logback-env.json
/var/lib/ambari-server/resources/scripts/configs.sh get sandbox.hortonworks.com Sandbox nifi-properties-env >> nifi-properties-env.json

Make sure to strip out the header that gets added to these files (########## Performing 'GET' on (Site:nifi-ambari-config, Tag:version1461337652473983733)). These configuration definitions can now be used to complete the automation of the NIFI installation as follows:

/var/lib/ambari-server/resources/scripts/configs.sh set sandbox.hortonworks.com Sandbox nifi-ambari-config nifi-ambari-config.json
/var/lib/ambari-server/resources/scripts/configs.sh set sandbox.hortonworks.com Sandbox nifi-bootstrap-env nifi-bootstrap-env.json
/var/lib/ambari-server/resources/scripts/configs.sh set sandbox.hortonworks.com Sandbox nifi-flow-env nifi-flow-env.json
/var/lib/ambari-server/resources/scripts/configs.sh set sandbox.hortonworks.com Sandbox nifi-logback-env nifi-logback-env.json
/var/lib/ambari-server/resources/scripts/configs.sh set sandbox.hortonworks.com Sandbox nifi-properties-env nifi-properties-env.json

5. Add Role to Member Hosts:

Since HDP is a distributed platform, most services will be installed across multiple hosts. Each host may run different components, or all hosts may run the same components. Which components of the service run on which hosts must be defined prior to install. In this case, we are assigning the NIFI_MASTER role to the same host where the Ambari server is running.

curl -u admin:admin -H "X-Requested-By:ambari" -i -X POST http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/hosts/sandbox.hortonworks.com/host_components/NIFI_MASTER

6. Install the Service:

The NIFI service is now ready to be installed:

curl -u admin:admin -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context" :"Install Nifi"}, "Body": {"ServiceInfo": {"maintenance_state" : "OFF", "state": "INSTALLED"}}}' http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/NIFI

This request returns a task (request) id as one of the return parameters. The install may take a while and runs asynchronously, so it is necessary to get a handle on the task and periodically check its status, looping/sleeping until it completes (see the polling sketch at the end of this post). Check the task status as follows:

curl -u admin:admin -X GET http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/requests/$TASKID

Once the task comes back as COMPLETED, the NIFI service has been installed and is ready to be started.

7. Start the Service:

curl -u admin:admin -H "X-Requested-By:ambari" -i -X PUT -d '{"RequestInfo": {"context" :"Start NIFI"}, "Body": {"ServiceInfo": {"maintenance_state" : "OFF", "state": "STARTED"}}}' http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/NIFI

The NIFI service is now up and running and ready to build data flows. This same approach can be used to install, stop, start, and change configuration for any service in Ambari.
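As a rough sketch of the polling loop described in step 6 (assumptions: the request id was captured from the install response into TASKID, and Ambari reports a request_status of COMPLETED when done; a production script should also check for FAILED or ABORTED):
# Poll the install request until Ambari reports it as COMPLETED.
TASKID=42   # placeholder: take this from the id returned by the install call
until curl -s -u admin:admin http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/requests/$TASKID | grep -q '"request_status" *: *"COMPLETED"'; do
  echo "Waiting for request $TASKID to complete..."
  sleep 10
done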
Labels: