Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11220 | 04-15-2020 05:01 PM |
| | 7124 | 10-15-2019 08:12 PM |
| | 3108 | 10-12-2019 08:29 PM |
| | 11475 | 09-21-2019 10:04 AM |
| | 4336 | 09-19-2019 07:11 AM |
04-05-2019
01:57 AM
@Nera Majer I tried this on my local instance and everything works as expected.
- If the .gz file is in the local filesystem, try fetching it with ListFile + FetchFile (instead of from HDFS) and check whether you can fetch the whole file without any issues.
- Move the local file to HDFS using hadoop fs -put <local_File_path> <hdfs_path>, then check whether the file size shows as 371 KB in HDFS (a sketch of these commands follows below).
- If yes, run the ListHDFS + FetchHDFS processors to fetch the newly moved file from the HDFS directory.
- Some threads related to similar issues: https://community.hortonworks.com/questions/106925/error-when-sending-data-to-api-from-nifi.html https://issues.apache.org/jira/browse/NIFI-5879
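A minimal shell sketch of the move-and-verify step above (both paths are placeholders):

```bash
# Copy the local .gz file into HDFS (paths are made up for illustration).
hadoop fs -put /tmp/data.csv.gz /user/nifi/input/

# Verify that the size HDFS reports matches the local file (~371 KB expected here).
hadoop fs -ls /user/nifi/input/data.csv.gz
```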
04-03-2019
11:37 PM
@Nikhil Belure You can use either: NiFi for this case, using List + Fetch File [SFTP] processors followed by a PutHDFS processor, (or) hadoop distcp to copy local files into HDFS, as described in this thread, (or) if your directory has a lot of files in it, it is much faster to tar (or) zip the files first and then run the copyFromLocal command, as sketched below.
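A hedged shell sketch of the archive-then-copy approach (all paths are placeholders):

```bash
# Bundle the directory into one archive first; copying one large file
# into HDFS is much faster than copying thousands of small ones.
tar -czf /tmp/mydata.tar.gz -C /data/mydir .

# Copy the single archive into HDFS and unpack it downstream as needed.
hadoop fs -copyFromLocal /tmp/mydata.tar.gz /user/hadoop/landing/
```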
04-03-2019
11:26 PM
@Nera Majer Make sure your fetched file has the .gz extension in its filename. If it does, check whether the .gz file is valid by uncompressing it in the shell using the gunzip command.
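A quick way to test validity without actually unpacking the file (the filename is a placeholder):

```bash
# -t tests the archive's integrity without writing any uncompressed output.
gunzip -t mydata.csv.gz && echo "valid gzip archive"
```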
03-22-2019
01:09 AM
@sri chaturvedi Instead of using the UpdateAttribute processor's state, use a DistributedMapCache so you can fetch the stored value across the cluster. Use the PutDistributedMapCache processor to store the most recently assigned value, then use the FetchDistributedMapCache processor to fetch the stored value, apply your logic (increment, etc.) to assign the new value, and overwrite the already-stored value in the DistributedMapCache with another PutDistributedMapCache processor; a sketch of that loop is below. Use this and this links as references for configuring the DistributedMapCache processors/controller services.
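A rough sketch of the fetch-increment-store loop, assuming a cache key named counter and an attribute named stored.value (both names are hypothetical):

```
FetchDistributedMapCache   # Cache Entry Identifier: counter
                           # Put Cache Value In Attribute: stored.value
UpdateAttribute            # stored.value = ${stored.value:plus(1)}
ReplaceText                # set the flowfile content to ${stored.value},
                           # since PutDistributedMapCache caches the *content*
PutDistributedMapCache     # Cache Entry Identifier: counter (overwrites old value)
```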
03-03-2019
02:39 AM
@Bala S Use the QueryRecord processor with Record Reader and Record Writer controller services; QueryRecord will route out the records that match your query. Use Apache Calcite SQL syntax to filter the required records from the flowfile content, e.g. the query sketched below. Some useful links: link1, link2
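For example, a dynamic property added to QueryRecord (the property name becomes an outgoing relationship) could hold a query like this; the incoming flowfile is exposed as the table FLOWFILE, and the column name here is made up for illustration:

```sql
-- Route only the records whose (hypothetical) status column equals 'active'.
SELECT *
FROM FLOWFILE
WHERE status = 'active'
```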
02-14-2019
02:58 AM
@ujvala reddy The reason is that the first week of the year is the first week with 4 or more days in the new year; the first day of the week is Monday and the last day of the week is Sunday. For example, 2019-01-01 fell on a Tuesday, so the week of Dec 31 - Jan 6 contained six days of 2019 and counted as week 1; had Jan 1 fallen on a Friday, that week would have held only three days of the new year and would have belonged to the last week of the previous year. Refer to this thread for more details regarding this week-of-year behavior.
01-23-2019
01:52 AM
@john y We can directly access the file size with ${fileSize}; this attribute expression returns the actual size of the flowfile in bytes.
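For instance, it can drive routing decisions in a RouteOnAttribute processor; the 1 MiB threshold below is arbitrary:

```
${fileSize:gt(1048576)}
```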
01-21-2019
02:31 PM
1 Kudo
@Satya G AFAIK yes: using the Databricks spark-xml package, we can parse the XML file and create a DataFrame on top of the XML data. Once we have the DataFrame, we can analyze the data using the DataFrame API functions; a minimal sketch is below. Refer to this and this link for more details regarding the usage/source code of the Spark XML package.
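A minimal PySpark sketch, assuming spark-xml is on the classpath (the package version, file path, row tag, and column name are all placeholders):

```python
from pyspark.sql import SparkSession

# Assumes the job was started with the spark-xml package available, e.g.:
#   spark-submit --packages com.databricks:spark-xml_2.11:0.5.0 parse_xml.py
spark = SparkSession.builder.appName("xml-example").getOrCreate()

df = (spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "record")   # the repeating XML element to treat as a row
      .load("/data/input.xml"))

df.printSchema()
df.groupBy("someColumn").count().show()  # analyze with DataFrame API functions
```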
01-18-2019
02:43 AM
@Manish Parab
Sure. In NiFi, a processor that triggers the flow (scheduled to run on a cron) needs to run on the primary node only; running it on all nodes means triggering the same processor n times, once on each node. Each NiFi node works only with the data it receives, so a GetMongo processor (the flow trigger in this case) running on all nodes will pull the same data on every node. Run GetMongo (the source processor) on the primary node only, then distribute the load across the cluster using Remote Process Groups (or) connection load balancing.
01-18-2019
01:03 AM
@Bharath Good articles regarding Hive performance tuning: Hive_performance_tune, Tez_Performance_Tune, ExplainPlan. This is too broad a question to answer fully, but here are my thoughts:
1. Check whether your Hive job has actually started running in the Resource Manager (rather than sitting in the queue waiting for resources, i.e. in the ACCEPTED state, etc.).
2. Check how many files exist in the table's directory in HDFS; too many small files will result in poor performance.
3. Try running the Hive console in debug mode to see where the job is spending its time.
4. Check whether there are any skews in the data, and create the table declaring all the skewed columns in the table properties, as sketched below.
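A hedged HiveQL sketch of declaring skewed columns at table-creation time (the table, columns, and skewed values are made up for illustration):

```sql
-- Listing the heavily repeated values lets Hive store them separately
-- instead of funneling them through one skewed partition.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
SKEWED BY (user_id) ON (0, -1)
STORED AS ORC;
```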