Member since: 06-08-2017
Posts: 1049
Kudos Received: 510
Solutions: 312
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5979 | 04-15-2020 05:01 PM
 | 3220 | 10-15-2019 08:12 PM
 | 1216 | 10-12-2019 08:29 PM
 | 6301 | 09-21-2019 10:04 AM
 | 2064 | 09-19-2019 07:11 AM
04-15-2020
05:01 PM
Hi @ChineduLB , You can use `.groupBy` with `concat_ws(",",collect_list(...))` to build the department list, and the `row_number` window function to generate `ID`:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val df = Seq(("1","User1","Admin"),("2","User1","Accounts"),("3","User2","Finance"),("4","User3","Sales"),("5","User3","Finance")).toDF("ID","USER","DEPT")
val w = Window.orderBy("USER")

df.groupBy("USER").
  agg(concat_ws(",",collect_list("DEPT")).alias("DEPARTMENT")).
  withColumn("ID",row_number().over(w)).
  select("ID","USER","DEPARTMENT").
  show()
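If you prefer Spark SQL, the same logic can be written as below; this is a sketch that assumes the DataFrame has been registered as a temp view named users (e.g. df.createOrReplaceTempView("users")):
-- same aggregation in Spark SQL; "users" is an assumed temp view name
SELECT row_number() OVER (ORDER BY `USER`) AS ID,
       `USER`,
       concat_ws(',', collect_list(DEPT)) AS DEPARTMENT
FROM users
GROUP BY `USER`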
... View more
10-18-2019
07:24 PM
We can store the state in DistributedMapCache/HDFS/Hive/HBase, pull the stored state back, and use it in an ExecuteSQL processor to incrementally pull the data; try the approach mentioned in this link, and see the sketch below. - Alternatively, in the QueryDatabaseTable processor we need to configure the Maximum-value Columns property so that NiFi stores the state in the QueryDatabaseTable processor. On the next run, QueryDatabaseTable pulls only the latest changes from the table.
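A minimal sketch of the incremental query for ExecuteSQL, assuming the stored state has already been pulled into a flowfile attribute named stored.max.ts and the table has a last_updated_ts column (both names are hypothetical):
-- stored.max.ts is a hypothetical attribute holding the last processed timestamp;
-- ExecuteSQL evaluates NiFi expression language in the query property
SELECT *
FROM source_table
WHERE last_updated_ts > '${stored.max.ts}'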
... View more
10-15-2019
08:25 PM
Change the transaction manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager, then try to run your update query. Run these commands in the Hive shell:
--create a hive table with transactions enabled
CREATE TABLE table_name (
id int,
name string
)
CLUSTERED BY (id) INTO 2 BUCKETS STORED AS ORC
TBLPROPERTIES ("transactional"="true");
--changing transaction manager
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
--insert data into table
insert into table_name values(1,"a"),(2,"b");
--update data in the table
update table_name set name="c" where id =1;
--delete specific id from table
delete from table_name where id=1;

Refer to this link for more details about transactional Hive tables.
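Depending on your cluster defaults, concurrency support may also need to be enabled in the same session; this is a sketch assuming a default configuration, so skip it if the setting is already applied cluster-wide:
--assumed session setting for ACID operations; may already be set at the cluster level
set hive.support.concurrency=true;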
... View more
10-15-2019
08:12 PM
1 Kudo
In the EvaluateJsonPath processor, add a new property to extract the event_id value from the flowfile. If the flowfile does not have event_id, NiFi adds an empty value to the attribute. EvaluateJsonPath configs: Then by using the RouteOnAttribute processor we can check the attribute value and route the flowfile accordingly. RouteOnAttribute configs:
not null value    ${event_id:isEmpty():not()}
null value    ${event_id:isEmpty()}
Then use the "null value" and "not null value" relationships for further processing!
... View more
10-12-2019
08:29 PM
1 Kudo
@Gerva
select count(*) from <table>;
The query launches a map-reduce job and the output is displayed on the console. - If you want to store the output in a file, use:
insert overwrite directory '<directory_name>' select count(*) from scenariox;
Now the output of the map-reduce job is stored in the given HDFS directory and you will find a 000000_0 file in that directory.
... View more
10-12-2019
07:17 PM
@sureshpathipati One way would be to use the RouteText processor to get the first field value up to the first space, and then apply the NiFi expression-language substring function on the extracted attribute value to check/route the line based on that value. Regex to capture the first field value up to the first space: ([^ ]+)(.*) Another method: since you have a fixed-width file, you can use the ReplaceText processor to create a delimited file and then, in a QueryRecord processor, add a new SQL query that checks substring(firstfield_name,start,endposition)='00'. Add two new queries in the QueryRecord processor to check the substring value and route the records dynamically based on the matched criteria; see the sketch below. - If the answer helped to resolve your issue, click the Accept button below to accept the answer. That would be a great help to community users looking for a quick solution to this kind of issue.
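A minimal sketch of the two QueryRecord queries, assuming the first column is named firstfield_name in the record schema (a hypothetical name, adjust to your schema):
-- route records whose first two characters are "00" (firstfield_name is a hypothetical field name)
SELECT * FROM FLOWFILE WHERE SUBSTRING(firstfield_name, 1, 2) = '00'
-- route everything else
SELECT * FROM FLOWFILE WHERE SUBSTRING(firstfield_name, 1, 2) <> '00'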
... View more
09-21-2019
10:04 AM
@budati For this case define your Avro schema (with one field) to read the incoming flowfile with some delimiter that doesn't exist in the flowfile, so that the whole row is read as a single string. Then we can filter out the records by using NOT LIKE (or a regex operator) in Apache Calcite: Select * from FLOWFILE where col1 not like '%SKIP%' Now the output flowfile will not have any records that contain SKIP, and this solution works dynamically for any number of columns.
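If you prefer the regex-style operator mentioned above, Calcite also supports SIMILAR TO; a sketch with the same assumed single-field schema named col1:
-- SIMILAR TO uses SQL regular-expression syntax; % and _ behave as in LIKE
SELECT * FROM FLOWFILE WHERE col1 NOT SIMILAR TO '%SKIP%'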
... View more
09-20-2019
09:42 AM
@budati Define the Avro schema for the record reader as col1, col2, ...etc., set the Treat First Line as Header property value to false, and add a new query in the QueryRecord processor such as select * from FLOWFILE where col1 != 'SKIP' (or) select * from FLOWFILE where col1 <> 'SKIP' **NOTE** assuming col1 has "SKIP" in it (Calcite treats single quotes as string literals and double quotes as identifiers). For the record writer define an Avro schema with your actual field names. Now QueryRecord will exclude all the records that have "SKIP" in them and write the flowfile with the actual field names in the mentioned format.
... View more
09-19-2019
07:11 AM
1 Kudo
@budati I don't think there is a way to combine all 3 processors into one. We still need to use ExecuteSQL -> ConvertAvroToJson -> EvaluateJsonPath to extract the values from the flowfile. If the answer was helpful to resolve your issue, accept the answer to close the thread 🙂
... View more
09-17-2019
09:40 PM
@budati Have you tried using the NiFi DBCPConnectionLookup service? With it we can make a dynamic lookup against the RDBMS. - Please refer to this link for more details regarding the lookup service.
... View more
09-17-2019
09:31 PM
@budati You can use the QueryRecord processor and add a new SQL query to select only the records that don't have the value "SKIP" for the field, using the Apache Calcite SQL parser; see the sketch below. - For more reference regarding the QueryRecord processor refer to this link.
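A minimal sketch of such a query, assuming the field is named col1 in your record schema (a hypothetical name):
-- keep only records whose col1 value is not SKIP (col1 is a hypothetical field name)
SELECT * FROM FLOWFILE WHERE col1 <> 'SKIP'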
... View more
09-11-2019
09:44 AM
@VijayM Try running msck repair table: hive> msck repair table <db_name>.<table_name>; then run your select and filter queries on the table. For more details regarding msck repair table please refer to this link.
... View more
09-09-2019
09:43 PM
@budati In the MergeContent processor we can set the Correlation Attribute Name property and use Merge Format = ZIP to zip the files, then use PutS3Object to store the files in S3.
... View more
09-09-2019
09:37 PM
@Rohit1981 Try using the ValidateRecord processor as described in this article with Strict Type Checking set to "true"; to parse a fixed-width file, refer to this link for a detailed explanation of the parsing.
... View more
09-09-2019
09:14 PM
@ANMAR Try this regex in the ExtractText processor: (?:"x":.\w+?)(\d+) This regex extracts only the digits in the "x" key; that value is then added as the "y" key in the ReplaceText processor.
... View more
09-09-2019
09:07 PM
1 Kudo
@ANMAR Try this regex: (?:"b"\s*:\s*)"(.*?)", This will extract only the 24 - ny value from the given event.
... View more
09-09-2019
08:01 PM
1 Kudo
@ANMAR Try with this regex (?:\"key2\"\\s*:\\s*)(.*?), This will extract only the value of "key2" key i.e "value2" - if you don't need quotes to be extracted then use this regex (?:\"key2\"\\s*:\\s*)"(.*?)", This will extract only the value of "key2" key i.e value2
... View more
09-09-2019
07:54 PM
@ANMAR Try this regex in the ExtractText processor: (?:"x":.\w+?)(\d+) This regex extracts only the digits in the "x" key; that value is then added as the "y" key in the ReplaceText processor.
... View more
09-08-2019
08:14 PM
1 Kudo
@ANMAR You need to use the ExtractText processor with a matching regex to extract only the integer value. Add a new property in the ExtractText processor:
val    (\d+)
Then use a ReplaceText processor with the below configs:
Search Value: }
Replacement Value: ,"y":"${val}"}
Character Set: UTF-8
Maximum Buffer Size: 1 MB
Replacement Strategy: Literal Replace
Evaluation Mode: Entire text
With the ReplaceText processor we take the extracted value and add a "y" key with that value.
Input data: {"x":"avc123.abc.com"}
Output: {"x":"avc123.abc.com","y":"123"}
... View more
09-06-2019
11:16 PM
@RandomT You can check compression on .avro files using avro-tools:
bash$ avro-tools getmeta <file_path>
For more details refer to this link. - sqlContext.setConf sets a global config, so every write will be Snappy-compressed; if you are writing all your data Snappy-compressed then you should use this method. - If you are compressing only selected data, then use exampleDF.write.option("compression", "snappy").avro("output path") for better control over the compression.
... View more
08-12-2019
01:26 AM
@Raymond Cui Try adding a new attribute in the UpdateAttribute processor: epochtime    ${file.creationTime:toDate("yyyy-MM-dd'T'HH:mm:ss+0000"):toNumber()}
Then NiFi will match the format and convert it to epoch time.
... View more
08-06-2019
02:11 AM
@Imad Anis I don't think this is possible, as the GetSNMP processor doesn't accept any incoming connections, so we cannot use one processor for multiple hosts. The only way I can see right now is to use multiple GetSNMP processors and keep the hostnames hard-coded in them.
... View more
08-06-2019
02:07 AM
@Satish Karuturi This is expected behaviour from the ExecuteStreamCommand processor, and the best practice is to place the shell script file on all nodes of the NiFi cluster. Since you have a 2-node NiFi cluster, we cannot control which node becomes the primary node, and the ExecuteStreamCommand processor runs only on the primary node. If the primary node changes, NiFi will pick up the shell script from the new active primary node and continue to execute the script without any issues. In addition you can also use the ExecuteProcess processor to execute a shell script in NiFi.
... View more
08-02-2019
12:59 AM
@Joseph Patrick Try the below regex:
Search Value: (\\{\"categoryid\")(.*?)(\"\\},)
... View more
07-31-2019
01:40 PM
@Ash C You can use the NiFi REST API to get the state of the processor. Sample REST API call:
curl -i -X GET http://<host>:<port>/nifi-api/processors/<processor-id>/state
... View more
07-31-2019
01:33 PM
@Rohini Mathur Please check this and this link to get the location of the Hive table.
... View more
07-31-2019
02:53 AM
@Rohini Mathur Using a shell script: one way of doing this would be to use a shell script; get all tables from the database with show tables from <db_name>; then store all the tables in a variable, loop through each one, and execute the show create table <table_name>; command. Using Spark: another way would be to use spark.catalog.listTables("<db_name>") to list all the tables in the database, then filter out only the managed tables and execute show create table on that list of managed tables. Using the Hive metastore DB: Hive stores all the table information in a metastore database (MySQL, etc.), so you can also get information about the tables from the metastore directly; see the sketch below.
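A minimal sketch of a metastore query for listing the managed tables, assuming the default Hive metastore schema (table and column names can differ between metastore versions):
-- run against the metastore RDBMS (e.g. MySQL); default metastore schema assumed
SELECT d.NAME AS db_name, t.TBL_NAME
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE t.TBL_TYPE = 'MANAGED_TABLE'
  AND d.NAME = '<db_name>';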
... View more
07-31-2019
02:39 AM
@Erkan ŞİRİN, Try specifying the defaultFS and resourcemanager address:
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("yarn")
  .config("spark.hadoop.fs.defaultFS","<name_node_address>")
  .config("spark.hadoop.yarn.resourcemanager.address","<resourcemanager_address>")
  .appName("<job_name>")
  .enableHiveSupport()
  .getOrCreate()
Then add the spark-yarn_x.x jar to your Maven dependencies and try to run again.
... View more
07-29-2019
11:34 PM
@Rohit Sinha I don't think we can specify a location or bucketing in Hive-HBase integration. Try this create table statement:
CREATE external TABLE user_db.test_hbase( col_1 String, col_2 String )
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,test_hbase:col_2")
TBLPROPERTIES("hbase.table.name" = "user_db:test_hbase");
... View more
07-29-2019
08:46 PM
@Noah Brace Use the ConvertExcelToCSVProcessor to convert the file to CSV, then use the SplitRecord processor to write only the required column with Records Per Split set to 1. Use the ExtractText processor to extract the content into a flowfile attribute, then pass that attribute name to the FetchFile processor. Flow: 1.ConvertExcelToCSV
2.SplitRecord //configure to read the csv and write only the path column & with records per split as 1
3.ExtractText //add new property as fn with value as (.*) & now we will have attribute named fn to flowfile.
4.FetchFile/FetchSFTP/FetchFTP //keep filename property value as ${fn}
5.other processors.
... View more