Created 06-20-2017 01:39 PM
I am trying to build a very basic flow. I have an input directory and an XML config file that contains the path of the output directory. I would like to use NiFi to take files from the input directory and move them to the output directory defined in the XML config file. Could you suggest a way to do this with NiFi?
ex: input path: /var/tmp/input/file1.txt
config file (config.xml)
<config verbose="false" debugMode="false">
<path>/var/tmp/output/</path>
</config>
Based on the input path and config.xml, I would like to move file1.txt into /var/tmp/output/
Created 06-20-2017 01:57 PM
Hi @Pavan Challa,
I'd recommend using the EvaluateXPath processor.
You can use the following XPath expression:
/config/path
Extract the value, store it as an attribute of your flow file, and then use that attribute however you need in the following steps.
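To make the extraction concrete outside NiFi, here is a minimal Python sketch of what evaluating /config/path against the config.xml from the question yields. The function name extract_output_path is hypothetical, and this only illustrates the XPath lookup, not how EvaluateXPath is implemented.

```python
import xml.etree.ElementTree as ET

# The config.xml content from the question.
CONFIG_XML = """<config verbose="false" debugMode="false">
<path>/var/tmp/output/</path>
</config>"""


def extract_output_path(xml_text: str) -> str:
    """Return the text of /config/path (hypothetical helper, for illustration)."""
    root = ET.fromstring(xml_text)  # root is the <config> element
    # ElementTree paths are relative to the root element,
    # so the absolute XPath /config/path becomes "path" here.
    node = root.find("path")
    if node is None or node.text is None:
        raise ValueError("no <path> element found under <config>")
    return node.text.strip()


if __name__ == "__main__":
    print(extract_output_path(CONFIG_XML))
```

The returned string is the kind of value you would store in a flow file attribute and reference later in the flow.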
Hope this helps.
Created on 06-20-2017 02:59 PM - edited 08-17-2019 07:04 PM
@Pierre Villard, Many thanks for the quick response. I have tried the flow below without any luck. I'm not sure what is wrong in the config, but I am getting a NullPointerException at the PutFile processor. Any thoughts?
1. GetFile
2. FetchFile (for config file)
3. EvaluateXPath
4. PutFile
Created 06-20-2017 03:39 PM
First of all, you don't need to use both GetFile and FetchFile. GetFile is fine, but if you want to use FetchFile, it must be used in combination with ListFile. See the article about the List/Fetch pattern.
Then you want to send the path in the flow file attributes, not in the content. And there is a slash missing at the beginning of your XPath expression.
And now I realize that I misunderstood what you are trying to achieve. I didn't understand that you have two different files, with one containing the destination path. I thought it was one single file.
So... basically, what I suggested is not going to work. But just in case, here is a template with what I had in mind. xpath.xml
Now let's focus on your use case. 🙂
You want to use a Lookup controller service that points to your configuration file. You can then reference that controller service from a LookupAttribute processor, which will extract the value from your configuration file and set it as an attribute of your flow file. The flow then becomes: ListFile, FetchFile, LookupAttribute, PutFile. Here is a template that should fulfill your requirements (just change the paths as needed).
Don't forget that controller services are defined at process group level.
Also note that, if I'm correct, this template requires the latest version of NiFi to work. xmllookup.xml
Created 06-20-2017 04:23 PM
@Pierre Villard, Many thanks. I was able to use the given xmllookup.xml to complete my use case. I understand that I should give the config file path to the controller service so that it extracts the path used by PutFile. To extend my current use case: I will have 3 input files (emp.txt, dept.txt, sal.txt) at 3 different locations (like /tmp/emp/emp.txt, /tmp/sal/sal.txt, etc.) and 3 config files (emp-config.xml, sal-config.xml, etc.), and the PutFile paths will be defined in the config files. What would be the recommended approach to achieve this? In my real use case, I will have hundreds of input feeds (with different file name patterns) and hundreds of XML files that define where the input files should be stored (on HDFS).
Many thanks in advance.
Created 06-20-2017 04:54 PM
@Pierre Villard, I can combine all 3 config files into a single xml file as below.
<configuration verbose="false" debugMode="false">
  <dataFlows>
    <dataFlow configurationId="7c888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="dept" enabled="true">
      <properties>
        <property name="country" value="it" />
        <property name="datasource" value="dept" />
      </properties>
      <to>
        <path>/var/tmp/${country}/${datasource}/csv</path>
      </to>
    </dataFlow>
    <dataFlow configurationId="8a888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="emp" enabled="true">
      <properties>
        <property name="country" value="it" />
        <property name="datasource" value="emp" />
      </properties>
      <to>
        <path>/var/tmp/${country}/${datasource}/csv</path>
      </to>
    </dataFlow>
    <dataFlow configurationId="9b888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="sal" enabled="true">
      <properties>
        <property name="country" value="it" />
        <property name="datasource" value="sal" />
      </properties>
      <to>
        <path>/var/tmp/${country}/${datasource}/csv</path>
      </to>
    </dataFlow>
  </dataFlows>
</configuration>
Could you advise how I can look up the path for each of the 3 feeds and ingest the files to a local / HDFS path (with PutFile)?
Created 06-22-2017 02:06 PM
@Pierre Villard Sorry for the delay in replying. I was focused on solving other parts and it took time to get back to you. I think I jumped ahead too quickly. Let me take one thing at a time. For now, I will focus on a single file. I have the XML file below.
<configuration verbose="false" debugMode="false">
  <dataFlows>
    <dataFlow configurationId="7c888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="dept" enabled="true">
      <properties>
        <property name="country" value="it" />
        <property name="datasource" value="dept" />
      </properties>
      <from>
        <filePattern>department_*.csv.gz</filePattern>
      </from>
      <to>
        <path>/var/tmp/${country}/${datasource}/csv</path>
      </to>
    </dataFlow>
  </dataFlows>
</configuration>
When I try to access the path from LookupAttribute, PutFile reads the path as /var/tmp/${country}/${datasource}/csv instead of /var/tmp/it/dept/csv. Do you know an easy way to solve this before I ask for further support? Many thanks.
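For reference, the substitution being asked for can be sketched in plain Python: read the property elements of one dataFlow and replace each ${name} placeholder in its path. This runs outside NiFi and only illustrates the expected result; the helper name resolve_path and the choice to leave unknown placeholders untouched are assumptions, not NiFi behaviour.

```python
import re
import xml.etree.ElementTree as ET

# The single-feed configuration from the post above.
CONFIG_XML = """<configuration verbose="false" debugMode="false">
  <dataFlows>
    <dataFlow configurationId="7c888be1-04ac-11e7-8ab3-3c970ee9aa0b" name="dept" enabled="true">
      <properties>
        <property name="country" value="it" />
        <property name="datasource" value="dept" />
      </properties>
      <from>
        <filePattern>department_*.csv.gz</filePattern>
      </from>
      <to>
        <path>/var/tmp/${country}/${datasource}/csv</path>
      </to>
    </dataFlow>
  </dataFlows>
</configuration>"""

# Matches ${name} and captures the name between the braces.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_path(data_flow: ET.Element) -> str:
    """Fill ${...} placeholders in <to><path> from the flow's <property> elements.

    Hypothetical helper: placeholders without a matching property are kept as-is.
    """
    props = {
        p.get("name"): p.get("value")
        for p in data_flow.findall("properties/property")
    }
    raw = data_flow.findtext("to/path", default="").strip()
    return PLACEHOLDER.sub(lambda m: props.get(m.group(1)) or m.group(0), raw)


if __name__ == "__main__":
    root = ET.fromstring(CONFIG_XML)
    dept_flow = root.find("dataFlows/dataFlow[@name='dept']")
    print(resolve_path(dept_flow))
```

Selecting the dataFlow by its name attribute means the same function would also work with the combined three-feed file.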
Created 06-20-2017 08:17 PM
The XML path must meet the following requirements:
http://commons.apache.org/proper/commons-configuration/userguide/howto_hierarchical.html
I think that's doable. I'm not sure this is the best approach if you have hundreds of input directories, though. If you have one input directory per output directory, is there a way to compute the destination directory from the path of the input directory? It could be easier to use Expression Language on the input directory to define the output one.
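As a rough illustration of computing the destination from the input path, the Python sketch below derives the output directory from the name of the input file's parent directory. The output root /var/tmp/output and the function name destination_for are hypothetical; inside NiFi the same idea would be expressed with Expression Language on the flow file's path attributes rather than with code like this.

```python
from pathlib import PurePosixPath


def destination_for(input_file: str, output_root: str = "/var/tmp/output") -> str:
    """Map an input file to an output directory named after its parent directory.

    Hypothetical convention: /tmp/emp/emp.txt is stored under <output_root>/emp.
    """
    src = PurePosixPath(input_file)
    feed = src.parent.name  # "emp" for /tmp/emp/emp.txt
    return str(PurePosixPath(output_root) / feed)


if __name__ == "__main__":
    for path in ("/tmp/emp/emp.txt", "/tmp/sal/sal.txt"):
        print(path, "->", destination_for(path))
```

With a convention like this, adding a feed means adding an input directory, with no per-feed configuration file to maintain.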
Created 05-20-2019 12:42 AM
Hi, @Shu,
Can you please help me extract a value from an XML file? Below is the XML response I get after InvokeHTTP, and I want to retrieve the href value of the link with rel="next" near the end. My requirement is to get the next page URL and invoke it to fetch new data; the new URL appears at the end of the XML flow file. I want to keep looping over the next URL until there is no more data in the source.
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xml:base="https://xxx/">
<id>...</id>
<title type="text">PartnerContactCollection</title>
<updated>2019-05-17T04:28:07Z</updated>
<author>...</author>
<link href="PartnerContactCollection" rel="self" title="PartnerContactCollection"/>
<entry m:etag="W/&quot;datetimeoffset'2019-04-04T17%3A54%3A50.1540090Z'&quot;">
<id>
</id>
<title type="text">
PartnerContactCollection('00163E07D')
</title>
<updated>2019-05-17T04:28:07Z</updated>
<category term="c4codata.PartnerContact" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/>
<link href="PartnerContactCollection('00163E07D')" rel="edit" title="PartnerContact"/>
<link href="PartnerContactCollection('00163E07D')/PartnerContactSalesResponsibility" rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/PartnerContactSalesResponsibility" type="application/atom+xml;type=feed" title="PartnerContactSalesResponsibility"/>
<link href="PartnerContactCollection('00163E07D')/PartnerContactBusinessRoleAssignment" rel="http://schemas.microsoft.com/ado/2007/08/dataservices/related/PartnerContactBusinessRoleAssignment" type="application/atom+xml;type=feed" title="PartnerContactBusinessRoleAssignment"/>
<content type="application/xml">
<m:properties>
<d:UserID/>
<d:CreateUser>false</d:CreateUser>
<d:BusinessPartnerID>xxxx</d:BusinessPartnerID>
<d:Z_CMAT_ID>xxxx</d:Z_CMAT_ID>
</m:properties>
</content>
</entry>
<link rel="next" href="https://my2456.crm.ondemand.com/abc/c4c/xyz/v1/c4mnopqi/PthisChatCollect?$skiptoken=1001%20
"/>
</feed>
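Outside NiFi, the lookup being asked for can be sketched in Python as below: find the feed-level link whose rel is "next" and return its href, or None when there is no further page. The shortened feed and the example.invalid URL are placeholders, not the real response, and next_page_url is a hypothetical helper. Because the feed declares a default Atom namespace, the element has to be matched with that namespace; in an XPath 1.0 engine, a namespace-agnostic expression along the lines of //*[local-name()='link'][@rel='next']/@href may be a starting point, though that is an untested assumption for this flow.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Prefix-to-URI mapping for the Atom namespace declared on <feed>.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Shortened stand-in for the response above; the URL is a placeholder.
FEED_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">PartnerContactCollection</title>
<link href="PartnerContactCollection" rel="self" title="PartnerContactCollection"/>
<entry>
<link href="PartnerContactCollection('00163E07D')" rel="edit" title="PartnerContact"/>
</entry>
<link rel="next" href="https://example.invalid/PartnerContactCollection?$skiptoken=1001
"/>
</feed>"""


def next_page_url(feed_text: str) -> Optional[str]:
    """Return the href of the feed-level <link rel="next">, or None on the last page."""
    root = ET.fromstring(feed_text)
    # Only direct children of <feed> are searched, so links inside <entry> are ignored.
    link = root.find("atom:link[@rel='next']", ATOM_NS)
    if link is None:
        return None
    # Strip whitespace left by a line break inside the attribute value.
    href = (link.get("href") or "").strip()
    return href or None


if __name__ == "__main__":
    print(next_page_url(FEED_XML))
```

A paging loop would call the returned URL, parse the new response the same way, and stop once the function returns None.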