<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Unable to upload related entities separately in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-related-entities-separately/m-p/362322#M238754</link>
    <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I am working with Apache Atlas, using PyApacheAtlas against Azure Purview. I have already created custom types (table and column) and their relationship definition. My system has about 30k entities to upload, and when I try to push all of them in one batch I get a timeout.&lt;/P&gt;&lt;P&gt;I tried to apply the upload logic described in the Atlas Jira issue &lt;A href="https://issues.apache.org/jira/browse/ATLAS-4389" target="_self"&gt;https://issues.apache.org/jira/browse/ATLAS-4389&lt;/A&gt;: first upload all parents (tables in my case), then the columns related to those tables. The tables batch uploaded successfully, but when the columns batch upload started I received this error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;"errorCode":"ATLAS-404-00-00A","errorMessage":"Referenced entity -1001 is not found"&lt;/LI-CODE&gt;&lt;P&gt;-1001 is the GUID of the table, which has already been uploaded. I noticed that when the table and column are uploaded in one batch, everything works fine.&lt;/P&gt;&lt;P&gt;It looks like Atlas checks whether the relationship exists within the uploaded batch, not between the batch and already uploaded entities. Is there any way to upload related entities in separate batches, or must they be uploaded in one batch? Do you have another strategy to avoid timeouts during bulk upload?&lt;/P&gt;</description>
    <pubDate>Thu, 26 Jan 2023 08:40:24 GMT</pubDate>
    <dc:creator>nowy19</dc:creator>
    <dc:date>2023-01-26T08:40:24Z</dc:date>
    <item>
      <title>Unable to upload related entities separately</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-related-entities-separately/m-p/362322#M238754</link>
      <description>&lt;P&gt;Hi Team,&lt;/P&gt;&lt;P&gt;I am working with Apache Atlas, using PyApacheAtlas against Azure Purview. I have already created custom types (table and column) and their relationship definition. My system has about 30k entities to upload, and when I try to push all of them in one batch I get a timeout.&lt;/P&gt;&lt;P&gt;I tried to apply the upload logic described in the Atlas Jira issue &lt;A href="https://issues.apache.org/jira/browse/ATLAS-4389" target="_self"&gt;https://issues.apache.org/jira/browse/ATLAS-4389&lt;/A&gt;: first upload all parents (tables in my case), then the columns related to those tables. The tables batch uploaded successfully, but when the columns batch upload started I received this error:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;"errorCode":"ATLAS-404-00-00A","errorMessage":"Referenced entity -1001 is not found"&lt;/LI-CODE&gt;&lt;P&gt;-1001 is the GUID of the table, which has already been uploaded. I noticed that when the table and column are uploaded in one batch, everything works fine.&lt;/P&gt;&lt;P&gt;It looks like Atlas checks whether the relationship exists within the uploaded batch, not between the batch and already uploaded entities. Is there any way to upload related entities in separate batches, or must they be uploaded in one batch? Do you have another strategy to avoid timeouts during bulk upload?&lt;/P&gt;</description>
      <pubDate>Thu, 26 Jan 2023 08:40:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-related-entities-separately/m-p/362322#M238754</guid>
      <dc:creator>nowy19</dc:creator>
      <dc:date>2023-01-26T08:40:24Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to upload related entities separately</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-related-entities-separately/m-p/404886#M252384</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/103123"&gt;@nowy19&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Thanks for posting your query. Here is my detailed answer.&lt;/P&gt;&lt;P&gt;It seems that Atlas checks whether the relationship exists within the uploaded batch, rather than between the batch and already uploaded entities. There are a few approaches you could consider to avoid timeouts during bulk uploads:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Uploading Related Entities in Separate Batches&lt;/STRONG&gt;: It is possible to upload related entities in separate batches, as long as dependencies between batches are respected. Upload the referenced entities (the tables) first, then the entities that point to them (the columns). A negative GUID such as -1001 is a temporary placeholder that Atlas resolves only within the request that defines it, so a later batch needs to reference an already uploaded table by its server-assigned GUID, or by its type name and unique attributes (for example qualifiedName).&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Batch Size Management&lt;/STRONG&gt;: If timeouts are an issue, consider reducing the batch size for uploads. Smaller batches reduce the load on the system and help avoid timeouts. This means splitting larger datasets into smaller, more manageable chunks.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Optimize Atlas Configuration&lt;/STRONG&gt;: Adjusting some configurations in Atlas, such as increasing the batch size limit or optimizing the database (e.g., using indexing), might help it handle larger uploads efficiently.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Asynchronous Upload Strategy&lt;/STRONG&gt;: If possible, consider uploading entities asynchronously to prevent long-running operations that can lead to timeouts. This allows the system to handle multiple requests in parallel without overwhelming it.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Increase Timeout Settings&lt;/STRONG&gt;: If you're encountering timeouts during bulk uploads, you could also look into adjusting timeout settings for the upload process, either at the Atlas server or API level, if that's a feasible option.&lt;/P&gt;&lt;P&gt;Overall, breaking the upload into smaller, logical steps while maintaining the required relationships is usually the most effective way to avoid timeouts.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Mar 2025 20:51:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Unable-to-upload-related-entities-separately/m-p/404886#M252384</guid>
      <dc:creator>vats</dc:creator>
      <dc:date>2025-03-26T20:51:11Z</dc:date>
    </item>
  </channel>
</rss>

