Member since
06-08-2017
1049
Posts
517
Kudos Received
312
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 9882 | 04-15-2020 05:01 PM |
 | 5928 | 10-15-2019 08:12 PM |
 | 2410 | 10-12-2019 08:29 PM |
 | 9556 | 09-21-2019 10:04 AM |
 | 3501 | 09-19-2019 07:11 AM |
04-15-2020
05:01 PM
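Hi @ChineduLB, you can use `.groupBy` with `concat_ws(",", collect_list(...))` to combine the departments per user, and the `row_number` window function to generate the `ID`.

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val df = Seq(("1","User1","Admin"),("2","User1","Accounts"),("3","User2","Finance"),("4","User3","Sales"),("5","User3","Finance")).toDF("ID","USER","DEPT")

    // Window spec used by row_number; ordering by USER is an assumption
    val w = Window.orderBy("USER")

    df.groupBy("USER")
      .agg(concat_ws(",", collect_list("DEPT")).alias("DEPARTMENT"))  // one comma-separated string per user
      .withColumn("ID", row_number().over(w))                         // regenerate a sequential ID
      .select("ID", "USER", "DEPARTMENT")
      .show()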
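To make the intended result concrete, here is a minimal plain-Python sketch of the same three steps (group by `USER`, join the `DEPT` values with a comma, number the groups). It is not Spark code; the helper name `combine_departments` is hypothetical, and it assumes the new ID should follow the alphabetical order of `USER`. Spark's `collect_list` does not guarantee element order after a shuffle, whereas this sketch simply keeps input order.

```python
# Same sample rows as in the answer: (ID, USER, DEPT)
rows = [
    ("1", "User1", "Admin"),
    ("2", "User1", "Accounts"),
    ("3", "User2", "Finance"),
    ("4", "User3", "Sales"),
    ("5", "User3", "Finance"),
]


def combine_departments(rows):
    """Group DEPT values per USER and assign a new sequential ID."""
    grouped = {}
    for _old_id, user, dept in rows:
        # Collect departments per user, keeping input order
        grouped.setdefault(user, []).append(dept)
    # Number the groups from 1, ordered by USER (assumed ordering)
    return [
        (new_id, user, ",".join(depts))
        for new_id, (user, depts) in enumerate(sorted(grouped.items()), start=1)
    ]


result = combine_departments(rows)
for row in result:
    print(row)
```

Each output tuple has the shape `(ID, USER, DEPARTMENT)`, which mirrors the columns selected at the end of the Spark query.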
10-15-2019
08:12 PM
1 Kudo
In the EvaluateJsonPath processor, add a new property to extract the `event_id` value from the flowfile. If the flowfile does not have an `event_id`, NiFi sets the attribute to an empty value.

Then use the RouteOnAttribute processor to check the attribute value and route the flowfile accordingly.

RouteOnAttribute Configs:
- not null value: `${event_id:isEmpty():not()}`
- null value: `${event_id:isEmpty()}`

Finally, use the `null value` and `not null value` relationships for further processing.
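The routing decision can be illustrated outside NiFi with a small Python sketch. It only emulates the logic and is not NiFi code: the function name `route_on_event_id` is hypothetical, and it assumes the flowfile content is a JSON object with `event_id` at the top level. It follows the documented behaviour of `isEmpty()`, which is true for a missing, empty, or whitespace-only value.

```python
import json


def route_on_event_id(flowfile_content: str) -> str:
    """Return the relationship name a flowfile would be routed to."""
    record = json.loads(flowfile_content)
    value = record.get("event_id")
    # A missing field ends up as an empty attribute value
    event_id = "" if value is None else str(value)
    # Emulates ${event_id:isEmpty()}: empty or whitespace-only counts as empty
    if event_id.strip() == "":
        return "null value"
    return "not null value"


print(route_on_event_id('{"event_id": "42", "type": "click"}'))
print(route_on_event_id('{"type": "click"}'))
```

The two return values correspond to the two RouteOnAttribute properties, so every flowfile ends up in exactly one of the relationships.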
10-12-2019
08:29 PM
1 Kudo
@Gerva `select count(*) from <table>;` launches a MapReduce job, and the output is displayed on the console. - If you want to store the output in a file, use `insert overwrite directory '<directory_name>' select count(*) from scenariox;` The output of the MapReduce job is then stored in the given HDFS directory, where you will find a `000000_0` file.
09-21-2019
10:04 AM
@budati For this case, define your Avro schema with a single field and read the incoming flowfile with a delimiter that does not occur in it, so that each whole row is read as one string. Then filter out the records using `not like` (or) a regex operator in Apache Calcite: `select * from FLOWFILE where col1 not like '%SKIP%'` The output flowfile will then not contain any records that have SKIP in them, and this solution works dynamically for any number of columns.
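The effect of the `LIKE` pattern can be checked outside NiFi. The sketch below uses Python's built-in SQLite as a stand-in for Calcite, which is an assumption made purely for illustration: the `%` wildcard semantics are standard SQL, but SQLite's `LIKE` is case-insensitive for ASCII, which may differ from Calcite. The helper `filter_rows` and the sample lines are hypothetical. It shows that a pattern without `%` only excludes rows that are exactly `SKIP`, while `'%SKIP%'` excludes any row containing it.

```python
import sqlite3


def filter_rows(lines, pattern):
    """Keep the lines whose single column does NOT match the LIKE pattern."""
    conn = sqlite3.connect(":memory:")
    try:
        # One string column per row, like the single-field schema above
        conn.execute("CREATE TABLE flowfile (col1 TEXT)")
        conn.executemany(
            "INSERT INTO flowfile (col1) VALUES (?)", [(line,) for line in lines]
        )
        cursor = conn.execute(
            "SELECT col1 FROM flowfile WHERE col1 NOT LIKE ? ORDER BY rowid",
            (pattern,),
        )
        return [row[0] for row in cursor.fetchall()]
    finally:
        conn.close()


lines = ["1,alice,ok", "2,SKIP,bad", "SKIP", "3,bob,ok"]
kept_contains = filter_rows(lines, "%SKIP%")  # drops every row containing SKIP
kept_exact = filter_rows(lines, "SKIP")       # drops only the row equal to SKIP
print(kept_contains)
print(kept_exact)
```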
09-20-2019
09:42 AM
@budati Define the Avro schema for the record reader as col1, col2, etc. Set the Treat First Line as Header property to false. Add a new query in the QueryRecord processor: `select * from FLOWFILE where col1 != 'SKIP'` (or) `select * from FLOWFILE where col1 <> 'SKIP'` **NOTE** this assumes col1 has "SKIP" in it. For the record writer, define the Avro schema with your actual field names. QueryRecord will then exclude all the records that have "SKIP" in them and write the flowfile with the actual field names in the configured format.
09-19-2019
07:11 AM
1 Kudo
@budati I don't think there is a way to combine all 3 processors into one. We still need to use ExecuteSQL -> ConvertAvroToJSON -> EvaluateJsonPath to extract the values from the flowfile. If the answer helped resolve your issue, accept the answer to close the thread 🙂
09-17-2019
09:40 PM
@budati Did you try using the NiFi DBCPConnectionLookup service? With it we can make dynamic lookups from an RDBMS. - Please refer to this link for more details regarding LookupService.
09-17-2019
09:31 PM
@budati You can use the QueryRecord processor and add a new SQL query that selects only the records that don't have the value "SKIP" in the field, using the Apache Calcite SQL parser. - For more details regarding the QueryRecord processor, refer to this link.
09-11-2019
09:44 AM
@VijayM Try running msck repair table: `hive> msck repair table <db_name>.<table_name>;` Then run your select and filter queries on the table. For more details regarding msck repair table, please refer to this link.
09-09-2019
09:07 PM
1 Kudo
@ANMAR Try this regex: `(?:"b"\s*:\s*)"(.*?)",` It extracts only the `24 - ny` value from the given event.
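A quick way to try the pattern is with Python's `re` module. The original event is not shown in this thread, so `sample_event` below is a made-up input that merely contains a `"b"` key with the value `24 - ny`. Capture group 1 holds the extracted value. Note that the trailing comma in the pattern means the `"b"` field must be followed by another field for the match to succeed.

```python
import re

# Regex from the answer: skip the "b": prefix, capture the quoted value
PATTERN = re.compile(r'(?:"b"\s*:\s*)"(.*?)",')

# Hypothetical event; the real event from the question is not available here
sample_event = '{"a": "1", "b": "24 - ny", "c": "x"}'

match = PATTERN.search(sample_event)
value = match.group(1) if match else None
print(value)
```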