Member since: 06-05-2019
Posts: 128
Kudos Received: 133
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1792 | 12-17-2016 08:30 PM |
| | 1334 | 08-08-2016 07:20 PM |
| | 2375 | 08-08-2016 03:13 PM |
| | 2475 | 08-04-2016 02:49 PM |
| | 2280 | 08-03-2016 06:29 PM |
05-31-2016
09:24 PM
Hi @vishal patil, how much memory is allocated to your machine? I'm curious to see what kind of GC times you have in /var/log/kafka/kafkaServer-gc.log - what CMS (Concurrent Mark Sweep) pause times are you seeing?
05-31-2016
06:30 PM
I was wondering: since Ambari manages one cluster, why does the Ambari API require the cluster name? For example: localhost:8080/api/v1/clusters/{clusterName}/hosts
Is this because:
1) A future version of Ambari will manage more than one cluster at a time?
2) It follows an API standard where it is considered good practice to name the cluster?
Is it 1, 2, both, or something else? I'd think localhost:8080/api/v1/cluster/hosts/ makes more sense, because we are only dealing with one cluster at a time per Ambari instance. If a future Ambari manages more than one cluster, I'd release v2 of the API and use localhost:8080/api/v2/clusters/{clusterName}.
Labels:
- Apache Ambari
05-29-2016
03:22 PM
Hi @Manoj Dhake, thanks for accepting my answer.
1) We recommend using the Ambari "Hive View" when working with Hive. The Hive View uses HiveServer2, which runs the Hive hook for Atlas, so you can view your data lineage in Atlas.
2) Atlas/Ranger will work beyond that demo (they should work properly for all cases). What policy are you trying to add? What does the Ranger "Audit" tab show as your denial error?
05-27-2016
06:39 PM
10 Kudos
Before completing this tutorial, it is important to understand data lineage.
What is Data Lineage?
Data lineage describes a data set's life cycle: where the data originated and how it moves over time. In Apache Hive, if I create a table (TableA) and then insert data into it from another table (TableB), the data lineage will display TableA as the target and TableB as the source/origin. The two tables are linked together by a process ("insert into table ..."), allowing a user to understand the data life cycle. In the Hadoop ecosystem, Apache Atlas holds the data lineage for systems such as Apache Hive, Apache Falcon and Apache Sqoop.
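As a quick illustration of that TableA/TableB relationship, here is a minimal HiveQL sketch (the table and column names are illustrative only, not part of the tutorial below):
-- TableB is the source and already holds data.
create table tableb(id int, name string);
insert into tableb values (1, 'alice');
insert into tableb values (2, 'bob');
-- TableA is the target, populated from TableB.
create table tablea(id int, name string);
insert into tablea select id, name from tableb;
Atlas would record the final "insert ... select" statement as the process linking source tableb to target tablea - the same kind of lineage graph the steps below build with brancha, branchb and branch_intersect.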
What is Apache Atlas?
Apache Atlas is a centralized governance framework that serves as a metadata repository for the Hadoop ecosystem. To add metadata to Atlas, libraries called ‘hooks’ are enabled in the various systems; these hooks automatically capture metadata events in their respective systems and propagate them to Atlas. (More on Atlas' architecture.)
Prerequisites
Download Atlas-Ranger preview VM here
Once the Atlas-Ranger VM is running, you can log in through an SSH shell with user = root, password = hadoop
Atlas UI: http://localhost:21000 (used for data lineage), user = admin, password = admin
Ambari UI: http://localhost:8080 (used for the Hive View), user = admin, password = admin
Step 1 - Log in to Ambari and access the Hive View
Step 2 - Create table brancha (database = default)
create table brancha(full_name string, ssn string, location string);
Step 3 - Create table branchb (database = default)
create table branchb(full_name string, ssn string, location string);
Step 4 - Insert data into both tables
insert into brancha(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago');
insert into brancha(full_name,ssn,location) values ('brad', '444-555-666', 'minneapolis');
insert into brancha(full_name,ssn,location) values ('rupert', '000-000-000', 'chicago');
insert into brancha(full_name,ssn,location) values ('john', '555-111-555', 'boston');
insert into branchb(full_name,ssn,location) values ('jane', '666-777-888', 'dallas');
insert into branchb(full_name,ssn,location) values ('andrew', '999-999-999', 'tampa');
insert into branchb(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago');
insert into branchb(full_name,ssn,location) values ('brad', '444-555-666', 'minneapolis');
(When using the Atlas-Ranger preview, execute one insert statement at a time.)
Step 5 - In a web browser, access Atlas UI at http://localhost:21000 and search for default.brancha
Step 6 - In the Atlas UI, select the hyperlink under the column name "default.brancha@abc"
Step 7 - In the Atlas UI, there should be no lineage for brancha
Step 8 - Create table branch_intersect (database = default) as a join of brancha and branchb where the ssn values match
create table branch_intersect as select b1.full_name,b1.ssn,b1.location from brancha b1 inner join branchb b2 ON b1.ssn = b2.ssn;
Step 9 - In the Atlas UI, refresh the browser from Step 7
(Orange = the current table.) You can see that source brancha is connected by a "create table br..." process that populates the target branch_intersect table.
Step 10 - In the Atlas UI, search for default.branch_intersect
05-27-2016
06:13 PM
Hi @Manoj Dhake, great question. You are correct: the Hive CLI does not utilize HiveServer2; instead, it goes directly to HCatalog for metadata info. Atlas has a hook that runs with HiveServer2, so Atlas will not receive any updates from the Hive CLI. I would suggest disabling the Hive CLI and using Beeline and/or the Hive View (in Ambari), both of which interact with HiveServer2.
05-25-2016
02:46 PM
If I want to configure multiple clusters in public cloud offerings (AWS, Azure, GCP) using Cloudbreak, and my on-premise cluster is using Kerberos / Active Directory, do I need to keep my Kerberos / Active Directory servers on premise and have the cloud communicate back and forth? Will the Kerberos / Active Directory credentials be cached in the cloud? If so, which components will hold the cache - Cloudbreak or Ambari?
Labels:
- Hortonworks Cloudbreak
05-20-2016
09:13 PM
Hi All, in Hive I created a new database "dbtest" and a table "people". I noticed in Atlas that when I searched "hive_table", the new table I created did not show up. However, when I searched "hive_db", the new database "dbtest" was one of the results. When I created a new table in an existing database, searching "hive_table" returned the new table. Is there something that has to be configured in Atlas to pick up a new table within a new database? (I used Beeline.)
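For reference, a minimal sketch of the statements described above as run in Beeline (only the database and table names come from the question; the column definitions are assumptions for illustration):
-- New database and table that did not show up under the "hive_table" search in Atlas.
create database dbtest;
create table dbtest.people(full_name string, ssn string, location string);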
Labels:
- Apache Atlas
- Apache Hive
05-11-2016
04:43 PM
Is there any documentation on the storage layer of Cloudbreak Docker containers - how the elasticity/bursting works as containers are added or removed during scaling?
Labels:
- Apache Ambari
- Hortonworks Cloudbreak
05-11-2016
04:28 PM
2 Kudos
If I install Cloudbreak on my Linux server, does Cloudbreak install Ambari in a Docker container (if Ambari is not already installed on the server)?
Labels:
- Apache Ambari
- Hortonworks Cloudbreak