<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How Atlas is notified when there is a change in any data source e.g Hive? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/How-Atlas-is-notified-when-there-is-a-change-in-any-data/m-p/209623#M171577</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/17468/sababaig.html" nodeid="17468"&gt;@Saba Baig&lt;/A&gt; &lt;/P&gt;&lt;P&gt; &amp;gt;&amp;gt; How Hive notifies Atlas about any DML/DDL operation in Atlas against which Atlas generates lineage?&lt;/P&gt;&lt;P&gt;Whenever there is any metadata change events in Hive , HiveHook captures it and puts the details of created/updated hive entity to a kafka topic called ATLAS_HOOK. Atlas is the consumer of the ATLAS_HOOK. So Atlas gets the message from ATLAS_HOOK.&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; what is the information that Hive sends to Atlas?&lt;/P&gt;&lt;P&gt;Example :&lt;/P&gt;&lt;PRE&gt;hive &amp;gt; create table emp(id int,name string);&lt;/PRE&gt;&lt;P&gt;1.HiveHook composes a JSON message that contains information about table name , database , columns and other table properties and sends it to ATLAS_HOOK.&lt;/P&gt;&lt;P&gt;2. ATLAS_HOOK queues up the messages from HiveHook and Atlas consumes from it. Atlas consumes the JSON message about table emp and ingests it.&lt;/P&gt;&lt;PRE&gt;hive &amp;gt; create table t_emp as select * from emp;&lt;/PRE&gt;&lt;P&gt;1.HiveHook composes JSON message that contains t_emp details and also the source table name (emp) and sends to ATLAS_HOOK. &lt;/P&gt;&lt;P&gt;2.Atlas understands from the JSON message consumed from ATLAS_HOOK , that it is a CTAS table and it has a source table , ingests the table t_emp and constructs lineage for the tables emp and t_emp.&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt; is Hive DB going to notify Atlas Server on its own or HiveHook is going to check constantly in the Hive DB and pull the changes&lt;/P&gt;&lt;P&gt;HiveHook doesn't check hive constantly all time. 
Whenever there is any metadata event change ( like when user fires a hive query that involves creation/updation/drop ) , HiveHook notifies ATLAS_HOOK.&lt;/P&gt;&lt;P&gt;NOTE :&lt;/P&gt;&lt;P&gt;If you want to know more about the exact JSON content sent by HiveHook , you can create a table in hive and check the message that lands in  ATLAS_HOOK for that table.&lt;/P&gt;</description>
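The push-based flow described in the answer (hook publishes on each query, Atlas consumes and derives CTAS lineage) can be sketched as a small in-process simulation. This is a hypothetical Python model, not HiveHook or Atlas code: a `queue.Queue` stands in for the Kafka topic ATLAS_HOOK, and the function `hive_hook_on_query`, the class `AtlasConsumer`, and the message fields (`table`, `database`, `columns`, `source_tables`) are invented for illustration. The real HiveHook notification schema is different, and the `default` database name is an assumption.

```python
import json
import queue

# Hypothetical stand-in for the Kafka topic ATLAS_HOOK (a real setup uses Kafka).
atlas_hook = queue.Queue()


def hive_hook_on_query(table, database, columns, source_tables=None):
    """Hypothetical hook callback: fires only when a DDL/DML query runs (push, not poll)."""
    message = {
        "table": table,
        "database": database,
        "columns": columns,
        # For CTAS, the hook also records the table(s) the new table was built from.
        "source_tables": source_tables or [],
    }
    atlas_hook.put(json.dumps(message))


class AtlasConsumer:
    """Hypothetical consumer that ingests table entities and records lineage edges."""

    def __init__(self):
        self.tables = {}    # qualified table name mapped to its column list
        self.lineage = []   # (source_table, target_table) pairs

    def drain(self):
        # Consume every message currently queued on the simulated topic.
        while not atlas_hook.empty():
            msg = json.loads(atlas_hook.get())
            qualified = f"{msg['database']}.{msg['table']}"
            self.tables[qualified] = msg["columns"]
            # A non-empty source list marks a CTAS table: add one lineage edge per source.
            for src in msg["source_tables"]:
                self.lineage.append((f"{msg['database']}.{src}", qualified))


# create table emp(id int,name string);
hive_hook_on_query("emp", "default", ["id int", "name string"])
# create table t_emp as select * from emp;
hive_hook_on_query("t_emp", "default", ["id int", "name string"], source_tables=["emp"])

atlas = AtlasConsumer()
atlas.drain()
print(atlas.lineage)  # [('default.emp', 'default.t_emp')]
```

Running the sketch yields a single lineage edge from `default.emp` to `default.t_emp`, mirroring the CTAS example: the hook emits a message only when a query runs, and the consumer derives lineage from the source table list carried in the message rather than by querying Hive.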
    <pubDate>Thu, 15 Jun 2017 15:41:34 GMT</pubDate>
    <dc:creator>ssainath</dc:creator>
    <dc:date>2017-06-15T15:41:34Z</dc:date>
  </channel>
</rss>

