Member since: 06-05-2019
Posts: 128
Kudos Received: 133
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1792 | 12-17-2016 08:30 PM |
| | 1334 | 08-08-2016 07:20 PM |
| | 2375 | 08-08-2016 03:13 PM |
| | 2475 | 08-04-2016 02:49 PM |
| | 2280 | 08-03-2016 06:29 PM |
05-31-2016
09:24 PM
Hi @vishal patil, how much memory is allocated to your machine? I'm curious to see what kind of GC times you have in /var/log/kafka/kafkaServer-gc.log - what CMS (Concurrent Mark Sweep) pause times are you seeing?
05-31-2016
06:30 PM
I was wondering: since Ambari manages one cluster, why does the Ambari API require the cluster name? For example: localhost:8080/api/v1/clusters/{clusterName}/hosts
Is this because:
1) A future version of Ambari will manage more than one cluster at a time?
2) It follows an API standard where it is considered good practice to name the cluster?
Is it 1, 2, both, or something else? I'd think localhost:8080/api/v1/cluster/hosts/ makes more sense, because we are only dealing with one cluster at a time per Ambari instance. If a future Ambari manages more than one cluster, I'd release v2 of the API and use localhost:8080/api/v2/clusters/{clusterName}.
Labels:
- Apache Ambari
05-29-2016
03:22 PM
Hi @Manoj Dhake, thanks for accepting my answer.
1) We recommend using the Ambari "Hive View" when working with Hive. The Hive View uses HiveServer2, which runs the Hive hook for Atlas, so you can view your data lineage in Atlas.
2) Atlas/Ranger will work beyond that demo (they should work properly for all cases). What policy are you trying to add? What does the Ranger "Audit" tab show as your denial error?
05-27-2016
06:39 PM
10 Kudos
Before completing this tutorial, it is important to understand data lineage.
What is Data Lineage?
Data lineage describes a data set's life cycle: where the data originated and how it moves over time. In Apache Hive, if I create a table (TableA) and then insert data into it from another table (TableB), the data lineage will display TableA as the target and TableB as the source/origin. The two tables are linked together by a process ("insert into table ..."), allowing a user to understand the data life cycle. In the Hadoop ecosystem, Apache Atlas holds the data lineage for systems such as Apache Hive, Apache Falcon and Apache Sqoop.
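As a quick illustration of that TableA/TableB relationship, here is a minimal HiveQL sketch (the table and column names are illustrative only, not part of the tutorial below):
-- TableB is the source and already holds data.
create table tableb(id int, name string);
insert into tableb values (1, 'alice');
insert into tableb values (2, 'bob');
-- TableA is the target, populated from TableB.
create table tablea(id int, name string);
insert into tablea select id, name from tableb;
Atlas would record the final "insert ... select" statement as the process linking source tableb to target tablea - the same kind of lineage graph the steps below build with brancha, branchb and branch_intersect.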
What is Apache Atlas?
Apache Atlas is a centralized governance framework that serves as a metadata repository for the Hadoop ecosystem. To add metadata to Atlas, libraries called ‘hooks’ are enabled in the various systems; these hooks automatically capture metadata events in their respective systems and propagate them to Atlas. (More on Atlas' architecture.)
Prerequisites
Download Atlas-Ranger preview VM here
Once the Atlas-Ranger VM is running, you can log in through an SSH shell with user = root, password = hadoop
Atlas UI: http://localhost:21000 (used for data lineage), user = admin, password = admin
Ambari UI: http://localhost:8080 (used for the Hive View), user = admin, password = admin
Step 1 - Log in to Ambari and access the Hive View
Step 2 - Create table brancha (database = default)
create table brancha(full_name string, ssn string, location string);
Step 3 - Create table branchb (database = default)
create table branchb(full_name string, ssn string, location string);
Step 4 - Insert data into both tables
insert into brancha(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago');
insert into brancha(full_name,ssn,location) values ('brad', '444-555-666', 'minneapolis');
insert into brancha(full_name,ssn,location) values ('rupert', '000-000-000', 'chicago');
insert into brancha(full_name,ssn,location) values ('john', '555-111-555', 'boston');
insert into branchb(full_name,ssn,location) values ('jane', '666-777-888', 'dallas');
insert into branchb(full_name,ssn,location) values ('andrew', '999-999-999', 'tampa');
insert into branchb(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago');
insert into branchb(full_name,ssn,location) values ('brad', '444-555-666', 'minneapolis');
(When using the Atlas-Ranger preview, execute one insert statement at a time.)
Step 5 - In a web browser, access Atlas UI at http://localhost:21000 and search for default.brancha
Step 6 - In the Atlas UI, select the hyperlink under the column name "default.brancha@abc"
Step 7 - In the Atlas UI, there should be no lineage for brancha
Step 8 - Create table branch_intersect (database = default) as a join of brancha and branchb where the ssn values match
create table branch_intersect as select b1.full_name,b1.ssn,b1.location from brancha b1 inner join branchb b2 ON b1.ssn = b2.ssn;
Step 9 - In the Atlas UI, refresh the browser from Step 7
(Orange = the current table.) You can see that source brancha is connected by a "create table br..." process that populates the target branch_intersect table.
Step 10 - In the Atlas UI, search for default.branch_intersect
05-27-2016
06:13 PM
Hi @Manoj Dhake, great question. You are correct: the Hive CLI does not utilize HiveServer2; instead, it goes directly to HCatalog for metadata info. Atlas has a hook that runs with HiveServer2, so Atlas will not receive any updates from the Hive CLI. I would suggest disabling the Hive CLI and using Beeline and/or the Hive View (in Ambari), both of which interact with HiveServer2.
05-25-2016
02:46 PM
If I want to configure multiple clusters in public cloud offerings (AWS, Azure, GCP) using Cloudbreak, and my on-premise cluster is using Kerberos / Active Directory, do I need to keep my Kerberos / Active Directory servers on premise and have the cloud communicate back and forth? Will the Kerberos / Active Directory credentials be cached in the cloud? If so, which components will hold the cache - Cloudbreak or Ambari?
Labels:
- Hortonworks Cloudbreak
05-20-2016
09:13 PM
Hi All, in Hive I created a new database "dbtest" and a table "people". I noticed in Atlas that when I searched "hive_table", the new table I created did not show up. However, when I searched "hive_db", the new database "dbtest" was one of the results. When I created a new table in an existing database, searching "hive_table" returned the new table. Is there something that has to be configured in Atlas to pick up a new table within a new database? (I used Beeline.)
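For reference, a minimal sketch of the statements described above as run in Beeline (only the database and table names come from the question; the column definitions are assumptions for illustration):
-- New database and table that did not show up under the "hive_table" search in Atlas.
create database dbtest;
create table dbtest.people(full_name string, ssn string, location string);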
Labels:
- Apache Atlas
- Apache Hive
05-11-2016
04:43 PM
Is there any documentation on the storage layer of Cloudbreak Docker containers - how the elasticity/bursting works as containers are added or removed during scaling?
Labels:
- Apache Ambari
- Hortonworks Cloudbreak
05-11-2016
04:28 PM
2 Kudos
If I install Cloudbreak on my Linux server, does Cloudbreak install Ambari in a Docker container (if Ambari is not already installed on the server)?
Labels:
- Apache Ambari
- Hortonworks Cloudbreak