NiFi - How to read values from XML and use in putFile flow

avatar
Contributor

I am trying to build a very basic flow. I have an input directory and an XML config file that contains the path of the output directory. I would like to use NiFi to take files from the input directory and move them to the output directory defined in the XML config file. Could you suggest a way to do this with NiFi?

ex: input path: /var/tmp/input/file1.txt

config file (config.xml)

<config verbose="false" debugMode="false">
<path>/var/tmp/output/</path>
</config>

Based on the input path and config.xml, I would like to move file1.txt into /var/tmp/output/

avatar

Hi @Pavan Challa,

I'd recommend using the EvaluateXPath processor:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache...

You can use the following XPath expression:

/config/path

Extract the value, store it as an attribute of your flow file, and then use it however you want in the following steps.

Hope this helps.
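
As a rough illustration of what the XPath /config/path selects, here is a minimal Python sketch that runs outside NiFi. It uses the standard library's xml.etree.ElementTree as a stand-in for the processor, and the function name read_output_path is hypothetical. ElementTree only supports a subset of XPath evaluated relative to the root element, so the sketch checks the root tag and then looks up the child path, which is equivalent for this document.

```python
import xml.etree.ElementTree as ET

# The config.xml content from the question.
CONFIG_XML = """<config verbose="false" debugMode="false">
<path>/var/tmp/output/</path>
</config>"""


def read_output_path(xml_text):
    """Return the text of /config/path (hypothetical helper, not a NiFi API)."""
    root = ET.fromstring(xml_text)
    if root.tag != "config":
        raise ValueError("expected <config> as the root element")
    value = root.findtext("path")  # relative to <config>, i.e. /config/path
    if value is None:
        raise ValueError("no <path> element under <config>")
    return value.strip()


print(read_output_path(CONFIG_XML))  # prints /var/tmp/output/
```

In the flow, the extracted value would be stored as a flow file attribute rather than printed.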

avatar
Contributor

@Pierre Villard, many thanks for the quick response. I have tried the flow below without any luck. I'm not sure what is wrong in the configuration, but I am getting a NullPointerException at the PutFile processor. Any thoughts?

16512-screen-shot-2017-06-20-at-150745.png

1. GetFile

16507-screen-shot-2017-06-20-at-150841.png

2. FetchFile (for config file)

16509-screen-shot-2017-06-20-at-154121.png

3. EvaluateXPath

16510-screen-shot-2017-06-20-at-154140.png

4. PutFile

16511-screen-shot-2017-06-20-at-154155.png

avatar

First of all, you don't need to use both GetFile and FetchFile. GetFile is fine, but if you want to use FetchFile, it must be used in combination with ListFile. See the article about the List/Fetch pattern.

Then you want to send the path in the flow file attributes, not in the content. And there is a slash missing at the beginning of your XPath expression.

And now I realized that I misunderstood what you are trying to achieve. I didn't understand that you have two different files with one containing the destination path. I thought it was one single file.

So... basically, what I suggested is not going to be OK. But just in case, here is a template with what I had in mind: xpath.xml

Now let's focus on your use case. 🙂

You want to use a Lookup controller service that points to your configuration file. You can then reference that controller service from a LookupAttribute processor, which extracts the value from your configuration file and sets it as an attribute of your flow file. The flow then becomes: ListFile, FetchFile, LookupAttribute, PutFile. Here is a template that should fulfill your requirements (just change the paths as needed).

Don't forget that controller services are defined at process group level.

Also note that, if I'm correct, this template requires the latest version of NiFi to work. Template: xmllookup.xml

avatar
Contributor

@Pierre Villard, many thanks. I was able to use the given xmllookup.xml to complete my use case. I understand that I should set the config file path on the controller service so that the path can be extracted and passed to PutFile. To extend my current use case: I will have 3 input files (emp.txt, dept.txt, sal.txt) at 3 different locations (like /tmp/emp/emp.txt, /tmp/sal/sal.txt, etc ...) and 3 config files (emp-config.xml, sal-config.xml, etc), and the PutFile paths will be defined in the config files. What would be the recommended approach to achieve this? In my real use case, I will have hundreds of input feeds (with different file name patterns) and hundreds of XML files that define where the input files should be stored (on HDFS).

Many thanks in advance.

avatar
Contributor

@Pierre Villard, I can combine all 3 config files into a single XML file as below.

<configuration verbose="false" debugMode="false">
  <dataFlows>
        <dataFlow configurationId="7c888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="dept" enabled="true">
            <properties>
                <property name="country" value="it" />
                <property name="datasource" value="dept" />
            </properties>
            <to>
                <path>/var/tmp/${country}/${datasource}/csv</path>
            </to>
        </dataFlow>
        <dataFlow configurationId="8a888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="emp" enabled="true">
            <properties>
                <property name="country" value="it" />
                <property name="datasource" value="emp" />
            </properties>
            <to>
                <path>/var/tmp/${country}/${datasource}/csv</path>
            </to>
        </dataFlow>
        <dataFlow configurationId="9b888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="sal" enabled="true">
            <properties>
                <property name="country" value="it" />
                <property name="datasource" value="sal" />
            </properties>
            <to>
                <path>/var/tmp/${country}/${datasource}/csv</path>
            </to>
        </dataFlow>
    </dataFlows>
</configuration>

Could you advise how I can look up the path for all 3 feeds and ingest the files to a local / HDFS path (with PutFile)?
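
To make the lookup concrete, here is a minimal Python sketch (outside NiFi) that picks the `<path>` of one feed by its name attribute from a trimmed copy of the combined file. The helper names feed_names and path_for_feed are hypothetical, and ElementTree is only a stand-in for whatever lookup mechanism the flow ends up using. Note that the returned value is the raw text of the element, so the ${country} and ${datasource} placeholders are not resolved here.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the combined configuration above (ids and properties omitted).
COMBINED_XML = """<configuration verbose="false" debugMode="false">
  <dataFlows>
    <dataFlow name="dept" enabled="true">
      <to><path>/var/tmp/${country}/${datasource}/csv</path></to>
    </dataFlow>
    <dataFlow name="emp" enabled="true">
      <to><path>/var/tmp/${country}/${datasource}/csv</path></to>
    </dataFlow>
    <dataFlow name="sal" enabled="true">
      <to><path>/var/tmp/${country}/${datasource}/csv</path></to>
    </dataFlow>
  </dataFlows>
</configuration>"""


def feed_names(xml_text):
    """List the name attribute of every <dataFlow> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [flow.get("name") for flow in root.iterfind("dataFlows/dataFlow")]


def path_for_feed(xml_text, feed_name):
    """Return the raw <to>/<path> text for the feed with the given name."""
    root = ET.fromstring(xml_text)
    flow = root.find(f"dataFlows/dataFlow[@name='{feed_name}']")
    if flow is None:
        raise KeyError(feed_name)
    return flow.findtext("to/path")


print(feed_names(COMBINED_XML))
print(path_for_feed(COMBINED_XML, "emp"))
```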

avatar
Contributor

@Pierre Villard, sorry for the delay in replying. I was focused on solving other parts and it took me a while to get back to you. I think I jumped ahead too much, so let me take one thing at a time. For now, I will focus on a single file. I have the XML file below.

<configuration verbose="false" debugMode="false">
  <dataFlows>
        <dataFlow configurationId="7c888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="dept" enabled="true">
            <properties>
                <property name="country" value="it" />
                <property name="datasource" value="dept" />
            </properties>
            <from>
                <filePattern>department_*.csv.gz</filePattern>
            </from>
            <to>
                <path>/var/tmp/${country}/${datasource}/csv</path>
            </to>
        </dataFlow>
    </dataFlows>
</configuration>

When I try to access the path from LookupAttribute, PutFile reads the path as /var/tmp/${country}/${datasource}/csv instead of /var/tmp/it/dept/csv. Do you know an easy way to solve this before I ask for further support? Many thanks.
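
To spell out the substitution I am expecting, here is a minimal Python sketch (outside NiFi, so it illustrates the desired result rather than a NiFi fix). It reads the `<property>` elements of the dataFlow into a dictionary and fills the ${...} placeholders in `<path>` using string.Template from the standard library. The function name resolve_path is hypothetical.

```python
import string
import xml.etree.ElementTree as ET

# The single-feed configuration from above.
SINGLE_XML = """<configuration verbose="false" debugMode="false">
  <dataFlows>
    <dataFlow configurationId="7c888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="dept" enabled="true">
      <properties>
        <property name="country" value="it" />
        <property name="datasource" value="dept" />
      </properties>
      <from>
        <filePattern>department_*.csv.gz</filePattern>
      </from>
      <to>
        <path>/var/tmp/${country}/${datasource}/csv</path>
      </to>
    </dataFlow>
  </dataFlows>
</configuration>"""


def resolve_path(flow):
    """Fill ${name} placeholders in <to>/<path> from the flow's <property> values."""
    props = {
        prop.get("name"): prop.get("value")
        for prop in flow.iterfind("properties/property")
    }
    template = flow.findtext("to/path")
    # substitute() raises KeyError if a placeholder has no matching property.
    return string.Template(template).substitute(props)


flow = ET.fromstring(SINGLE_XML).find("dataFlows/dataFlow")
print(resolve_path(flow))  # prints /var/tmp/it/dept/csv
```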

avatar

The XML path must follow the requirements described here:

http://commons.apache.org/proper/commons-configuration/userguide/howto_hierarchical.html

I think that's doable. I'm not sure this is the best approach if you have hundreds of input directories, though. If you have one input directory per output directory, is there a way to compute the destination directory from the path of the input directory? It could be easier to use Expression Language on the input directory to define the output one.
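
A minimal Python sketch of that idea, under the assumption that the name of the input file's parent directory (for example emp in /tmp/emp/emp.txt) is the datasource and that the output layout is /var/tmp/<country>/<datasource>/csv as in the config above. The function name destination_for and the default country value are hypothetical. In NiFi the same rule would be written in Expression Language on the flow file's path attributes; this only shows the computation.

```python
from pathlib import PurePosixPath


def destination_for(input_file, country="it", base="/var/tmp"):
    """Derive the output directory from the input file's parent directory name."""
    datasource = PurePosixPath(input_file).parent.name
    if not datasource:
        raise ValueError(f"cannot derive a datasource from {input_file!r}")
    return str(PurePosixPath(base) / country / datasource / "csv")


print(destination_for("/tmp/emp/emp.txt"))  # prints /var/tmp/it/emp/csv
print(destination_for("/tmp/sal/sal.txt"))  # prints /var/tmp/it/sal/csv
```

With a rule like this there is no per-feed config file to maintain, at the cost of requiring a consistent directory naming convention.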

avatar
New Contributor

Hi, @Shu,

Can you please help me extract a value from an XML file? Below is the XML response I get after InvokeHTTP, and I want to retrieve the href value of the last `<link>` element (the one with rel="next"). My requirement is to get the next page URL and invoke it to fetch new data; this new URL appears at the end of the XML flow file. I want to keep looping over the next URL until there is no more data in the source.


<feed xmlns="http://www.w3.org/2005/Atom" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xml:base="https://xxx/">
  <id>...</id>
  <title type="text">PartnerContactCollection</title>
  <updated>2019-05-17T04:28:07Z</updated>
  <author>...</author>
  <link href="PartnerContactCollection" rel="self" title="PartnerContactCollection"/>
  <entry m:etag="W/&quot;datetimeoffset'2019-04-04T17%3A54%3A50.1540090Z'&quot;">
    <id>https://xxx('00163E07D')</id>
    <title type="text">PartnerContactCollection('00163E07D')</title>
    <updated>2019-05-17T04:28:07Z</updated>
    <category term="c4codata.PartnerContact" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
    <link href="PartnerContactCollection('00163E07D')" rel="edit" title="PartnerContact"/>
    <link href="PartnerContactCollection('00163E07D')/PartnerContactSalesResponsibility" rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/PartnerContactSalesResponsibility" type="application/atom+xml;type=feed" title="PartnerContactSalesResponsibility"/>
    <link href="PartnerContactCollection('00163E07D')/PartnerContactBusinessRoleAssignment" rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/PartnerContactBusinessRoleAssignment" type="application/atom+xml;type=feed" title="PartnerContactBusinessRoleAssignment"/>
    <content type="application/xml">
      <m:properties>
        <d:UserID/>
        <d:CreateUser>false</d:CreateUser>
        <d:BusinessPartnerID>xxxx</d:BusinessPartnerID>
        <d:Z_CMAT_ID>xxxx</d:Z_CMAT_ID>
      </m:properties>
    </content>
  </entry>
  <link rel="next" href="https://my2456.crm.ondemand.com/abc/c4c/xyz/v1/c4mnopqi/PthisChatCollect?$skiptoken=1001%20"/>
</feed>
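
To show the selection I am after, here is a minimal Python sketch (outside NiFi) that returns the href of the feed-level `<link rel="next">` element, or None when there is no next page, which is the stop condition for the loop. The feed uses the default Atom namespace, so the lookup has to be namespace-aware. The function name next_page_url and the shortened sample feed are hypothetical. A namespace-agnostic XPath such as `//*[local-name()='link'][@rel='next']/@href` expresses the same selection, though whether it behaves identically in EvaluateXPath is not verified here.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for the namespace-aware lookup.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Shortened stand-in for the response above (entries omitted).
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:base="https://xxx/">
  <title type="text">PartnerContactCollection</title>
  <link href="PartnerContactCollection" rel="self" title="PartnerContactCollection"/>
  <link rel="next" href="https://my2456.crm.ondemand.com/abc/c4c/xyz/v1/c4mnopqi/PthisChatCollect?$skiptoken=1001%20"/>
</feed>"""


def next_page_url(feed_xml):
    """Return the href of the feed-level rel="next" link, or None on the last page."""
    root = ET.fromstring(feed_xml)
    link = root.find("atom:link[@rel='next']", ATOM_NS)
    return link.get("href") if link is not None else None


print(next_page_url(SAMPLE_FEED))
```

A paging loop would fetch the returned URL, call next_page_url on the new response, and stop once it returns None.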