Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11220 | 04-15-2020 05:01 PM |
| | 7124 | 10-15-2019 08:12 PM |
| | 3108 | 10-12-2019 08:29 PM |
| | 11475 | 09-21-2019 10:04 AM |
| | 4336 | 09-19-2019 07:11 AM |
04-05-2019
01:57 AM
@Nera Majer I tried this on my local instance and everything works as expected.
- If the .gz file is in the local filesystem, try fetching it with ListFile + FetchFile (instead of from HDFS) and check whether you can fetch the whole file without any issues.
- Move the local file to HDFS using hadoop fs -put <local_File_path> <hdfs_path>, then check whether the file size shows as 371 KB in HDFS (a sketch of these commands follows below).
- If yes, run the ListHDFS + FetchHDFS processors to fetch the newly moved file from the HDFS directory.
- Some threads related to similar issues: https://community.hortonworks.com/questions/106925/error-when-sending-data-to-api-from-nifi.html https://issues.apache.org/jira/browse/NIFI-5879
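A minimal shell sketch of the move-and-verify step above (both paths are placeholders):

```bash
# Copy the local .gz file into HDFS (paths are made up for illustration).
hadoop fs -put /tmp/data.csv.gz /user/nifi/input/

# Verify that the size HDFS reports matches the local file (~371 KB expected here).
hadoop fs -ls /user/nifi/input/data.csv.gz
```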
04-03-2019
11:37 PM
@Nikhil Belure You can use either: NiFi for this case, using List + Fetch File [SFTP] processors followed by a PutHDFS processor, (or) hadoop distcp to copy local files into HDFS, as described in this thread, (or) if your directory has a lot of files in it, it is much faster to tar (or) zip the files first and then run the copyFromLocal command, as sketched below.
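A hedged shell sketch of the archive-then-copy approach (all paths are placeholders):

```bash
# Bundle the directory into one archive first; copying one large file
# into HDFS is much faster than copying thousands of small ones.
tar -czf /tmp/mydata.tar.gz -C /data/mydir .

# Copy the single archive into HDFS and unpack it downstream as needed.
hadoop fs -copyFromLocal /tmp/mydata.tar.gz /user/hadoop/landing/
```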
04-03-2019
11:26 PM
@Nera Majer Make sure your fetched file has the .gz extension in its filename. If it does, check whether the .gz file is valid by uncompressing it in the shell using the gunzip command.
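A quick way to test validity without actually unpacking the file (the filename is a placeholder):

```bash
# -t tests the archive's integrity without writing any uncompressed output.
gunzip -t mydata.csv.gz && echo "valid gzip archive"
```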
03-22-2019
01:09 AM
@sri chaturvedi Instead of using the UpdateAttribute processor's state, use a DistributedMapCache so you can fetch the stored value across the cluster. Use the PutDistributedMapCache processor to store the most recently assigned value, then use the FetchDistributedMapCache processor to fetch the stored value, apply your logic (increment, etc.) to assign the new value, and overwrite the already-stored value in the DistributedMapCache with another PutDistributedMapCache processor; a sketch of that loop is below. Use this and this links as references for configuring the DistributedMapCache processors/controller services.
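A rough sketch of the fetch-increment-store loop, assuming a cache key named counter and an attribute named stored.value (both names are hypothetical):

```
FetchDistributedMapCache   # Cache Entry Identifier: counter
                           # Put Cache Value In Attribute: stored.value
UpdateAttribute            # stored.value = ${stored.value:plus(1)}
ReplaceText                # set the flowfile content to ${stored.value},
                           # since PutDistributedMapCache caches the *content*
PutDistributedMapCache     # Cache Entry Identifier: counter (overwrites old value)
```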
03-03-2019
02:39 AM
@Bala S Use the QueryRecord processor with Record Reader and Record Writer controller services; QueryRecord will route out the records that match your query. Use Apache Calcite SQL syntax to filter the required records from the flowfile content, e.g. the query sketched below. Some useful links: link1, link2
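For example, a dynamic property added to QueryRecord (the property name becomes an outgoing relationship) could hold a query like this; the incoming flowfile is exposed as the table FLOWFILE, and the column name here is made up for illustration:

```sql
-- Route only the records whose (hypothetical) status column equals 'active'.
SELECT *
FROM FLOWFILE
WHERE status = 'active'
```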
02-14-2019
02:58 AM
@ujvala reddy The reason is that the first week of the year is the first week with 4 or more days in the new year; the first day of the week is Monday and the last day of the week is Sunday. For example, 2019-01-01 fell on a Tuesday, so the week of Dec 31 - Jan 6 contained six days of 2019 and counted as week 1; had Jan 1 fallen on a Friday, that week would have held only three days of the new year and would have belonged to the last week of the previous year. Refer to this thread for more details regarding this week-of-year behavior.
01-23-2019
01:52 AM
@john y We can directly access the file size with ${fileSize}; this attribute expression returns the actual size of the flowfile in bytes.
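For instance, it can drive routing decisions in a RouteOnAttribute processor; the 1 MiB threshold below is arbitrary:

```
${fileSize:gt(1048576)}
```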
01-21-2019
02:31 PM
1 Kudo
@Satya G AFAIK yes: using the Databricks spark-xml package, we can parse the XML file and create a DataFrame on top of the XML data. Once we have the DataFrame, we can analyze the data using the DataFrame API functions; a minimal sketch is below. Refer to this and this link for more details regarding the usage/source code of the Spark XML package.
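A minimal PySpark sketch, assuming spark-xml is on the classpath (the package version, file path, row tag, and column name are all placeholders):

```python
from pyspark.sql import SparkSession

# Assumes the job was started with the spark-xml package available, e.g.:
#   spark-submit --packages com.databricks:spark-xml_2.11:0.5.0 parse_xml.py
spark = SparkSession.builder.appName("xml-example").getOrCreate()

df = (spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "record")   # the repeating XML element to treat as a row
      .load("/data/input.xml"))

df.printSchema()
df.groupBy("someColumn").count().show()  # analyze with DataFrame API functions
```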
01-18-2019
02:43 AM
@Manish Parab
Sure. In NiFi, a processor that triggers the flow (scheduled to run on a cron) needs to run on the primary node only; running it on all nodes means triggering the same processor n times, once on each node. Each NiFi node works only with the data it receives, so a GetMongo processor (the flow trigger in this case) running on all nodes will pull the same data on every node. Run GetMongo (the source processor) on the primary node only, then distribute the load across the cluster using Remote Process Groups (or) connection load balancing.
01-18-2019
01:03 AM
@Bharath Good articles regarding Hive performance tuning: Hive_performance_tune, Tez_Performance_Tune, ExplainPlan. This is too broad a question to answer fully, but here are my thoughts:
1. Check whether your Hive job has actually started running in the Resource Manager (rather than sitting in the queue waiting for resources, i.e. in the ACCEPTED state, etc.).
2. Check how many files exist in the table's directory in HDFS; too many small files will result in poor performance.
3. Try running the Hive console in debug mode to see where the job is spending its time.
4. Check whether there are any skews in the data, and create the table declaring all the skewed columns in the table properties, as sketched below.
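A hedged HiveQL sketch of declaring skewed columns at table-creation time (the table, columns, and skewed values are made up for illustration):

```sql
-- Listing the heavily repeated values lets Hive store them separately
-- instead of funneling them through one skewed partition.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
SKEWED BY (user_id) ON (0, -1)
STORED AS ORC;
```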