Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11263 | 04-15-2020 05:01 PM |
| | 7165 | 10-15-2019 08:12 PM |
| | 3149 | 10-12-2019 08:29 PM |
| | 11601 | 09-21-2019 10:04 AM |
| | 4381 | 09-19-2019 07:11 AM |
05-27-2018
09:04 PM
@neeraj sharma The issue is that you are using the processor id instead of the parent process group id when creating the template from the snippetId, so NiFi cannot find the parent process group referenced in your curl API call. Make sure you use the parent process group id: in your case the InvokeHttp processor sits on the NiFi canvas, so click on the canvas itself (not on any processor) to get the process group id, then use that id in your API call: curl -i -H 'Content-Type: application/json' -X POST -d '{"name":"Template_24_05_01","description":"","snippetId":"<snippet_id>"}' http://localhost:8080/nifi-api/process-groups/<parent_processor_groupId>/templates A sketch of looking up the canvas (root) process group id via the REST API is shown below. I would also suggest opening Chrome/Firefox developer tools and creating a template through the NiFi UI once; watching the API calls the UI makes gives you a much clearer picture of how templates are created, and you can then reproduce them with the NiFi REST API (curl). Let us know if you are still facing issues!
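If you are unsure which process group id to use, a minimal sketch for looking it up is below, assuming an unsecured NiFi at localhost:8080, that the processor sits on the root canvas, and that python is available to parse the JSON response:

```bash
#!/usr/bin/env bash
# Look up the id of the root process group (the canvas itself).
# Assumes an unsecured NiFi instance at localhost:8080.
PARENT_PG_ID=$(curl -s http://localhost:8080/nifi-api/process-groups/root \
  | python -c 'import json,sys; print(json.load(sys.stdin)["id"])')
echo "Root process group id: ${PARENT_PG_ID}"

# Create the template from an existing snippet id (replace <snippet_id>).
curl -i -H 'Content-Type: application/json' -X POST \
  -d '{"name":"Template_24_05_01","description":"","snippetId":"<snippet_id>"}' \
  "http://localhost:8080/nifi-api/process-groups/${PARENT_PG_ID}/templates"
```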
05-26-2018
07:14 PM
@aman mittal Yes, it's possible. Take a look at the sample flow below.

Flow overview:
1. SelectHiveQL // lists the tables from a specific database in Avro format; HiveQL Select Query: show tables from default // lists all tables from the default database (a quick way to preview this list outside NiFi is sketched after this post)
2. ConvertAvroToJSON // converts the list of tables from Avro to JSON
3. SplitJson // splits each table into an individual flowfile
4. EvaluateJsonPath // extracts the tab_name value and keeps it as an attribute on the flowfile
5. RemoteProcessGroup // since you are doing this for 3k tables it is better to use an RPG to distribute the work; if you don't want to use an RPG, skip processors 5 and 6 and feed the success relationship from 4 to 7
6. InputPort // receives the RPG flowfiles
7. SelectHiveQL // pulls the data from the Hive tables
8. EncryptContent
9. RouteOnAttribute // SelectHiveQL writes the query.input.tables attribute, so add two properties to this processor based on that attribute and NiFi Expression Language. Example:
   azure: ${query.input.tables:startsWith("a")} // only table names starting with a
   gcloud: ${query.input.tables:startsWith("e"):or(${query.input.tables:startsWith("a")})} // routes table names starting with e or a to gcloud

Feed the gcloud relationship to a PutGCSObject processor and the azure relationship to a PutAzureBlobStorage processor. Refer to this link for NiFi Expression Language and build an expression that routes only the required tables to Azure and GCS. In this example I used a single database to list all the tables; if your 3k tables come from different databases, use a GenerateFlowFile processor with the list of databases, extract each database name as an attribute, and feed the success relationship to the first SelectHiveQL processor. Refer to this link for dynamically passing the database attribute to the first SelectHiveQL processor. Reference flow.xml: load-hivetables-to-azure-gcs195751.xml

If the answer helped to resolve your issue, click the Accept button below to accept it; that helps other community users find solutions to these kinds of issues quickly.
05-24-2018
01:59 PM
@neeraj sharma I think the issue is the time elapsed between the snippet and template API calls. The NiFi REST API docs state that a snippet is discarded if it is not used in a subsequent request within 1 minute. Create the snippet, grab the snippet id, and make the template curl call with that snippet id within 1 minute; a sketch of chaining the two calls is shown below. Note: every create-snippet API call creates a new snippet id, and you have to use the newly created snippet id in the template-creation curl call. Refer to this link for more information about the NiFi REST API. Let us know if you are still facing issues!
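A minimal sketch of chaining the two calls so they stay inside the 1-minute window, assuming an unsecured NiFi at localhost:8080, that snippet.json already holds a valid snippet payload for your selection, and that python is available to parse the response:

```bash
#!/usr/bin/env bash
# Create the snippet, grab its id from the response, and immediately use it
# to create the template so the snippet is not discarded after 1 minute.
# snippet.json and <parent_process_group_id> are placeholders for your own
# snippet payload and the canvas process group id.
SNIPPET_ID=$(curl -s -H 'Content-Type: application/json' -X POST \
  -d @snippet.json http://localhost:8080/nifi-api/snippets \
  | python -c 'import json,sys; print(json.load(sys.stdin)["snippet"]["id"])')

curl -i -H 'Content-Type: application/json' -X POST \
  -d "{\"name\":\"Template_24_05_01\",\"description\":\"\",\"snippetId\":\"${SNIPPET_ID}\"}" \
  "http://localhost:8080/nifi-api/process-groups/<parent_process_group_id>/templates"
```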
05-24-2018
11:22 AM
@Shailesh Bhaskar Yes, by using UpdateAttribute processors we can change the filenames. Flow: as shown in the screenshot above, place two UpdateAttribute processors before the PutFile processor. In the UpdateAttribute processor fed by the 2_csv relationship, add a new property named filename with the value 2_csv; in the one fed by the 3_csv relationship, add filename with the value 3_csv. Changing the filenames this way means the filenames are the same on every run. The first run causes no issues because each file has a different filename, but for the second run, if you don't care about the files already stored in the directory, set Conflict Resolution Strategy to Replace. If you want to store all the files without any conflicts, set the filename property values in the UpdateAttribute processors to 2_csv_${UUID()} and 3_csv_${UUID()}; UUID() generates a unique value, so this expression produces a unique filename every time and there will be no conflicts. Reference flow.xml: queryrecord-filenames-191697.xml
05-23-2018
11:44 PM
@Vaibhav Kumar Make sure you are using the correct MySQL JDBC jar and check the connectivity between the Dockerized NiFi instance and the local MySQL; a quick connectivity check from inside the container is sketched below. Please refer to the links below, which cover solutions for this exact issue: https://stackoverflow.com/questions/6865538/solving-a-communications-link-failure-with-jdbc-and-mysql?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa https://serverfault.com/questions/89955/unable-to-connect-to-mysql-through-jdbc-connector-through-tomcat-or-externally?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
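As a first check, it can help to confirm the NiFi container can reach MySQL at all before digging into the JDBC driver. A minimal sketch, assuming the container is named nifi, that nc is available inside the image, and that host.docker.internal resolves to the host (Docker Desktop; on Linux use the host's IP instead):

```bash
# Test raw TCP connectivity from the NiFi container to MySQL on the host.
# "nifi" (container name) and host.docker.internal are assumptions; replace
# them with your container name and your host's address on Linux.
docker exec -it nifi bash -c "nc -vz host.docker.internal 3306"

# If the port is reachable, point the DBCPConnectionPool JDBC URL at the
# same address, e.g. jdbc:mysql://host.docker.internal:3306/<database>
```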
05-23-2018
10:27 PM
@Winnie Philip The QueryRecord processor's dynamic properties support Expression Language, so we can use a DistributedMapCache for this case. Flow 1, load the ids into the cache: use a PutDistributedMapCache processor to load all your required ids into the cache. Flow 2, fetch the ids from the cache: in the original flow, use a FetchDistributedMapCache processor to fetch the cached data and keep it as an attribute, then use that attribute name in your QueryRecord processor. The QueryRecord dynamic-property query would be: SELECT * FROM FLOWFILE WHERE VBEOID IN (${fetch_distributed_cache_attribute_name}) You may have to adjust the Max Cache Entry Size property in PutDistributedMapCache and the Max Length To Put In Attribute property in FetchDistributedMapCache according to the size of the data being cached and retrieved. Sample flow: refer to this link for configuring and using the Put/FetchDistributedMapCache processors. With this method we are not hard-coding the values in the QueryRecord processor; the query runs dynamically based on the attribute value from FetchDistributedMapCache.
05-23-2018
11:03 AM
@Shantanu kumar If you are running Hive joins/queries to populate tables/directories, Hive does not create empty files in the HDFS directories.
05-22-2018
02:31 PM
@Shailesh Bhaskar Sure, I used a GenerateFlowFile processor to create the data, then a QueryRecord processor with the two dynamic properties added. Reference template: query-record-191697.xml Let me know if you are facing any issues.
05-22-2018
11:55 AM
@Shailesh Bhaskar Use the QueryRecord processor, configuring the Record Reader as a CSVReader and the Record Writer as a CSVRecordSetWriter controller service. In the CSVRecordSetWriter controller service, set the Include Header Line property to False. Add dynamic properties to the QueryRecord processor:
2_csv: select * from FLOWFILE where id < 3
3_csv: select * from FLOWFILE where id > 2
Then use the 2_csv and 3_csv relationships from the QueryRecord processor.

Input:
id,name,age,location
1,Shailesh,35,Bangalore
2,Ajay,25,Goa
3,Sanjay,30,Chennai
4,Raman,32,Hyderabad

Output from the QueryRecord processor:
2_csv relation:
1,Shailesh,35,Bangalore
2,Ajay,25,Goa
3_csv relation:
3,Sanjay,30,Chennai
4,Raman,32,Hyderabad

Please refer to this and this links for configuring and using the QueryRecord processor.
05-21-2018
01:12 AM
@Saikrishna Tarapareddy Try this approach: in this flow we fork once the file is pulled. On the right side we keep all the content without the header; on the left side we do the equivalent of head -1 on the flowfile content to get only the header, then use ReplaceText to replace the special characters. In both UpdateAttribute processors we add a group identifier and an order attribute, and those attributes are then used in the EnforceOrder processor. The EnforceOrder processor waits for the header flowfile (left side) to arrive first and only then processes the without-header flowfile (right side). Set the prioritizer on the EnforceOrder processor's success queue to FirstInFirstOutPrioritizer. Finally, a MergeContent processor merges the header back with the flowfile content. A shell equivalent of this logic is sketched below for intuition.
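For intuition only, a shell equivalent of what this flow does to a single file might look like the sketch below; the characters stripped from the header (#, $, %) are placeholders for whatever special characters you actually need to replace:

```bash
# Shell equivalent of the flow, for illustration only.
head -1 input.csv | tr -d '#$%' > header_clean.csv   # header only, special characters removed
tail -n +2 input.csv > body.csv                       # content without the header
cat header_clean.csv body.csv > output.csv            # merge the header back with the content
```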