Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11220 | 04-15-2020 05:01 PM |
| | 7124 | 10-15-2019 08:12 PM |
| | 3107 | 10-12-2019 08:29 PM |
| | 11475 | 09-21-2019 10:04 AM |
| | 4336 | 09-19-2019 07:11 AM |
04-28-2019 02:37 AM
1 Kudo
@Rohit Bhattacharya Try \[EUtranCellRelation\].*\.csv or \[EUtranCellFDD\].*\.csv as the File Filter Regex in the GetSFTP processor, or use a RouteOnAttribute processor and filter out the files by matching filenames.

Flow:
1. GetSFTP processor
2. RouteOnAttribute // match files with NiFi Expression Language

In RouteOnAttribute, add two properties using NiFi Expression Language, for example ${filename:startsWith('[EUtranCellRelation]')} or ${filename:contains('[EUtranCellRelation]')} (these functions compare literal strings, so the brackets need no escaping there). Matching files are then routed to the relationships named after those properties. For illustration, a minimal sketch of the regex matching follows. Refer to this and this for more details.
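Here is a minimal Scala sketch (the filenames are hypothetical) of which names that File Filter Regex accepts:

```scala
// Test the GetSFTP File Filter Regex against some made-up filenames.
object FilterRegexDemo extends App {
  val pattern = """\[EUtranCellRelation\].*\.csv""".r

  val filenames = Seq(
    "[EUtranCellRelation]_20190428.csv", // matches
    "[EUtranCellFDD]_20190428.csv",      // different prefix: no match
    "[EUtranCellRelation]_notes.txt"     // wrong extension: no match
  )

  for (name <- filenames) {
    val matched = pattern.pattern.matcher(name).matches()
    println(s"$name -> $matched")
  }
}
```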
04-26-2019 09:53 PM
@Bala S It seems the NiFi user does not have access to run the commands; make sure you have granted execute permissions to the NiFi user (e.g. chmod +x on the script, with the script readable by the user that runs NiFi). Refer to this and this for similar issues.
04-26-2019 01:34 AM
@Denis Sokol Here are my thoughts on the options from Hortonworks.

Using Hive transactional tables:
1. If you are getting a full dump every time, you can try the Hive MERGE functionality (Hortonworks only), which makes the data available in under a minute (depending on how much data is scanned, cluster resources, etc.). A hedged sketch of the MERGE pattern is at the end of this answer.

Using HBase:
2. If you only care about the latest version of each record, HBase can handle all the updates (but scanning on a non-row-key column will not perform well); use Phoenix on top of HBase to get SQL over the NoSQL table.

Both approaches will serve for updating existing data while keeping only the latest version of each record. Refer to this and this for more details about these approaches.

Using Druid: refer to this link.

It would be great if you could comment on which approach performed better, or which one you chose for this case 🙂
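As a hedged sketch: on HDP 2.6+ the Hive MERGE pattern looks roughly like the statement below. The table and column names are hypothetical, and the target must be a transactional (ACID) ORC table; the statement would be submitted via beeline/Hive, and is only assembled as a string here:

```scala
// Hypothetical Hive MERGE: upsert a staging dump into the target table.
val mergeSql =
  """MERGE INTO dim_customer AS t
    |USING staging_customer AS s
    |ON t.customer_id = s.customer_id
    |WHEN MATCHED THEN UPDATE SET name = s.name, email = s.email
    |WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.name, s.email)""".stripMargin
println(mergeSql)
```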
04-20-2019 09:04 PM
@James Fowler We need to use a ReplaceText processor after the GenerateTableFetch processor and replace select * with the column names, adding an alias for the field with the special character: select col1, UCR_COST_IN_$ as UCR_COST_IN from table
04-19-2019 01:58 PM
@James Fowler In the ExecuteSQL processor, set the Normalize Table/Column Names property to true, or in your select query add an alias without the special character to each field name that has one (e.g. select UCR_COST_IN_$ as UCR_COST_IN from table).
04-19-2019 01:56 AM
@Barath Natarajan Check how many executors and how much memory the spark-sql CLI was initialized with (it seems to be running in local mode with one executor). To debug the query, run an explain plan on it. Also check how many files each table has in its HDFS directory; if there are too many, consolidate them into a smaller number of files. Another approach would be:
-> Run spark-shell (or) pyspark in local/yarn-client mode with more executors and more memory
-> Load the tables into DataFrames and register them with registerTempTable (Spark 1.x) / createOrReplaceTempView (Spark 2)
-> Run your join using spark.sql("<join query>")
-> Check the performance of the query (a sketch of this flow follows).
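A minimal sketch of that approach (Spark 2.x syntax; the database, table, and column names are hypothetical). In spark-shell the `spark` session already exists, so the builder lines can be skipped there:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("join-debug")
  .enableHiveSupport()
  .getOrCreate()

// Load the Hive tables and register temp views
// (registerTempTable in Spark 1.x, createOrReplaceTempView in Spark 2).
spark.table("db.orders").createOrReplaceTempView("orders")
spark.table("db.customers").createOrReplaceTempView("customers")

val joined = spark.sql(
  """SELECT o.*, c.name
    |FROM orders o
    |JOIN customers c ON o.customer_id = c.customer_id""".stripMargin)

joined.explain(true) // inspect the plan before judging performance
joined.show(10)
```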
04-19-2019 01:28 AM
@Jeff Watson Could you try using the GetHDFSFileInfo processor? It accepts incoming connections and takes regexes to match only the required directories/files and to exclude files.
04-19-2019 12:53 AM
1 Kudo
@Mahendiran Palani Samy Try .option instead of hc.setConf. Example (Scala):
dataframe.write
  .format("parquet")
  .option("compression", "snappy")
  .saveAsTable("<table_name>")
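Note on the difference: hc.setConf sets a session-wide default (e.g. spark.sql.parquet.compression.codec), whereas .option("compression", "snappy") applies only to that particular write, which is usually what you want here.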
04-16-2019 12:42 AM
@Karthik Gullapalli You can use an ExtractText processor to extract the elements of the array, then create the final JSON using UpdateAttribute and ReplaceText processors.

Flow:
1. ExtractText // add a new property with a regex to extract a,1
2. ReplaceText // use Always Replace as the Replacement Strategy and NiFi Expression Language to prepare the JSON

Or use the approach in section 2.2 of this article to iterate through the array elements and build the output flowfile as JSON with a ReplaceText processor and NiFi Expression Language. A small sketch of the idea follows.
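To make the idea concrete, here is a plain Scala sketch (the sample input and regex are hypothetical) of what the ExtractText regex plus the ReplaceText JSON assembly accomplish:

```scala
// Pull "a,1"-style elements out of an array string and emit them as JSON.
val input = """["a,1","b,2","c,3"]"""

// A regex comparable to an ExtractText property: capture key and value.
val element = """"([a-z]+),(\d+)"""".r

val pairs = element.findAllMatchIn(input).map { m =>
  s""""${m.group(1)}": ${m.group(2)}"""
}.mkString(", ")

println(s"{ $pairs }") // => { "a": 1, "b": 2, "c": 3 }
```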
04-09-2019 03:16 AM
1 Kudo
@Kevin Lahey Not sure whether you are running a NiFi cluster or not, but could you try running the ListS3 processor on the Primary Node only? Per the documentation, this processor is intended to run only on the primary node.