Member since: 09-24-2020
Posts: 23
Kudos Received: 1
Solutions: 0
03-22-2021
12:22 AM
Thank you. I found the same, but I wanted to know whether I can use the ControlRate processor in my scenario. I want to reduce the bandwidth coming from upstream into PutHDFS. Can we limit the bandwidth in NiFi?
03-21-2021
06:50 AM
Hello, could you please explain how the ControlRate processor works? Can we use the ControlRate processor while writing a large file (80 GB) into HDFS (using PutHDFS) to reduce the network bandwidth used? The data rate should be 300 Mbps, as we are currently using all of the network bandwidth while writing data into HDFS. Thank you.
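For reference, a minimal ControlRate sketch, assuming the target of 300 Mbps, which is roughly 37.5 MB/s (300 / 8). The property names below are from the standard ControlRate processor; the values are only illustrative:

    ControlRate
      Rate Control Criteria: data rate
      Maximum Rate: 37.5 MB
      Time Duration: 1 sec

Placed on the connection feeding PutHDFS, this throttles the FlowFile data passed downstream to about 37.5 MB per second. Note it limits flow throughput inside NiFi; it does not shape the raw network traffic of the HDFS write itself.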
Labels:
- Apache NiFi
12-18-2020
01:25 AM
I have a set of 20 tables, and I need to fetch the last loaded date and the row count of each table:

    INSERT OVERWRITE TABLE dbo.table PARTITION (last_load_date)
    select 'table' as tablename, current_date, count(*) as count, last_load_date
    from table1
    where last_load_date in (select max(last_load_date) from table1)
    group by last_load_date
    union
    select 'table' as tablename, current_date, count(*) as count, last_load_date
    from table2
    where last_load_date in (select max(last_load_date) from table2)
    group by last_load_date
    ...
    (a UNION of 20 tables in total)

I configured the script as below and ran it with the command sh table.sh.

table.sh:

    hive --hiveconf tez.queue.name=Last_date --hiveconf hive.session.id=data_xxx -f /mypath/union_query.sql

I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDDLFromFieldSchema(MetaStoreUtils.java:876)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:1091)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getPartitionMetadata(MetaStoreUtils.java:890)
at org.apache.hadoop.hive.ql.metadata.Partition.getMetadataFromPartitionSchema(Partition.java:263)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.<init>(PartitionDesc.java:87)
at org.apache.hadoop.hive.ql.exec.Utilities.getPartitionDesc(Utilities.java:1373)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setMapWork(GenMapRedUtils.java:684)
at org.apache.hadoop.hive.ql.parse.GenTezUtils.setupMapWork(GenTezUtils.java:212)
at org.apache.hadoop.hive.ql.parse.GenTezUtils.createMapWork(GenTezUtils.java:195)
at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:131)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:205)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10598)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:474)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)

Could you please let me know how to optimize the query to avoid the above error, given that I am only doing a UNION of tables? I tried to find a solution but couldn't find one for my case. Thank you in advance for your reply.
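The stack trace shows the OutOfMemoryError occurring in the Hive client JVM during query compilation (TezCompiler / SemanticAnalyzer), before any task runs, so one hedged option is to raise the client heap. A sketch, assuming the hive CLI honors HADOOP_CLIENT_OPTS as in a standard install (the heap size is illustrative):

    export HADOOP_CLIENT_OPTS="-Xmx4g"   # larger heap for the Hive client JVM only
    hive --hiveconf tez.queue.name=Last_date --hiveconf hive.session.id=data_xxx -f /mypath/union_query.sql

Another option is to split the single 20-way UNION into one small statement per source table, so each compiles a small plan, e.g.:

    INSERT INTO TABLE dbo.table PARTITION (last_load_date)
    select 'table' as tablename, current_date, count(*) as count, last_load_date
    from table1
    where last_load_date in (select max(last_load_date) from table1)
    group by last_load_date;
    -- ...repeat for table2 through table20

Note that INSERT INTO appends rather than overwrites, so the target partition may need to be cleared first if the original INSERT OVERWRITE semantics are required.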
Labels:
- Apache Hive
11-18-2020
02:44 AM
Thank you for the reply @Kezia. I was able to filter the duplicates using the DetectDuplicate processor. This is the error I'm getting when the GetSFTP processor is scheduled on the primary node:

GetFTP[id=xxxx] Unable to fetch listing from remote server due to java.net.ConnectException: Connection timed out (Connection timed out): Connection timed out (Connection timed out)
11-13-2020
05:43 AM
I have a GetSFTP processor which runs on 3 nodes. GetSFTP was previously scheduled on the primary node only, but as that node was not working properly I had to change the schedule to run on "all nodes", and as a result I am receiving duplicate files.
Could you please let me know how to filter these so that only one of the two files (both are the same file) is loaded into HDFS? That is, I have to put only one file out of the two duplicates into the data lake.
Thank you
@PVVK @Kezia
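For completeness, a minimal DetectDuplicate sketch, assuming the filename alone identifies a file. The property and relationship names are from the standard processor; the cache service and age-off value are illustrative:

    DetectDuplicate
      Cache Entry Identifier: ${filename}
      Distributed Cache Service: DistributedMapCacheClientService
      Age Off Duration: 1 hour

Routing non-duplicate to PutHDFS and auto-terminating duplicate means only the first copy fetched by any node reaches the data lake.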
Labels:
- Apache NiFi
- HDFS
11-06-2020
12:07 AM
In NiFi I have a flow which fetches files from a remote server and puts the data into HDFS. The PutHDFS relationship was set to both success and failure, so the files which failed to load remain in the same processor. Could you please let me know how to view the list of failed files other than through data provenance?
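One hedged workaround, rather than looping failure back into PutHDFS: route the failure relationship to a LogAttribute processor (or a separate holding queue) so failed files are visible as a list. A sketch with illustrative values:

    PutHDFS (failure) -> LogAttribute
      Log Level: warn
      Attributes to Log: filename, path

Each failed FlowFile's name then appears in nifi-app.log, and the connection in front of LogAttribute can be inspected in the UI (right-click the connection, List queue).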
Labels:
- Apache NiFi
- HDFS
10-02-2020
01:23 AM
Hello, suddenly a few processors in NiFi are not recording data provenance information for their runs. Could you please let me know what checks should be done to investigate this issue? Thank you in advance.
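A few hedged places to look, assuming a default install. The property names below are from a standard nifi.properties; the values are only illustrative:

    nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
    nifi.provenance.repository.max.storage.time=24 hours
    nifi.provenance.repository.max.storage.size=1 GB

If the provenance repository is undersized, older events age off quickly and can appear to be missing. Checking nifi-app.log around the time of the runs for provenance repository errors may also help.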
09-30-2020
08:55 AM
Thank you @PVVK. My NiFi version is 1.5.0. If I set FetchSFTP to execute on all nodes, I sometimes get duplicate files fetched by different nodes.
09-30-2020
02:08 AM
Thank you @PVVK for your solution. I am unable to see the load-balance strategy option on the queue before the MergeContent processor, so I did this instead: previously FetchSFTP was set to execute on all nodes, and I changed the option to execute on the primary node only. As a result I am now getting a single file. Please correct me if I am wrong.
Also, there is a delay while data is loading into HDFS using the PutHDFS processor. After compression, when the size grows from MB to GB, the data is loaded only after 1 hour. Please find the screenshot below for your reference.
09-29-2020
02:11 AM
Hello all, I can see error messages being displayed every minute in the top corner of the process groups. The errors are about the FetchSFTP processor trying to fetch an old file which no longer exists at the source, and I don't want that error to be displayed. Could you please let me know which settings should be changed? Thank you in advance.
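For context, one hedged pattern that produces exactly this symptom: if FetchSFTP's not.found relationship is routed back toward the processor, the same stale FlowFile is retried on every run and raises a fresh bulletin each time. A sketch of the routing, using the processor's actual relationship names:

    FetchSFTP
      success           -> downstream flow
      not.found         -> auto-terminate (or LogAttribute, to keep a record)
      comms.failure, permission.denied -> retry/failure handling

Clearing the stale FlowFile from the incoming queue (right-click the connection, Empty queue) stops the recurring error immediately.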
Labels:
- Apache NiFi