Member since: 09-24-2020
Posts: 23
Kudos Received: 1
Solutions: 0
03-22-2021
12:22 AM
Thank you. I found the same, but I wanted to know whether I can use the ControlRate processor in my scenario. I want to reduce the bandwidth coming from upstream into PutHDFS. Can we limit the bandwidth in NiFi?
03-21-2021
06:50 AM
Hello, could you please explain how the ControlRate processor works? Can we use the ControlRate processor while writing a large file (80 GB) into HDFS (using PutHDFS) to reduce the network bandwidth used? The data rate should be 300 Mbps, as we are currently using all of the network bandwidth while writing data into HDFS. Thank you.
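For reference, a minimal ControlRate sketch, assuming the target of 300 Mbps, which is roughly 37.5 MB/s (300 / 8). The property names below are from the standard ControlRate processor; the values are only illustrative:

    ControlRate
      Rate Control Criteria: data rate
      Maximum Rate: 37.5 MB
      Time Duration: 1 sec

Placed on the connection feeding PutHDFS, this throttles the FlowFile data passed downstream to about 37.5 MB per second. Note it limits flow throughput inside NiFi; it does not shape the raw network traffic of the HDFS write itself.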
Labels:
- Apache NiFi
12-18-2020
01:25 AM
I have a set of 20 tables, and I need to fetch the last loaded date and the row count of each table:

    INSERT OVERWRITE TABLE dbo.table PARTITION (last_load_date)
    select 'table' as tablename, current_date, count(*) as count, last_load_date
    from table1
    where last_load_date in (select max(last_load_date) from table1)
    group by last_load_date
    union
    select 'table' as tablename, current_date, count(*) as count, last_load_date
    from table2
    where last_load_date in (select max(last_load_date) from table2)
    group by last_load_date
    ...
    (a UNION of 20 tables in total)

I configured the script as below and ran it with the command sh table.sh.

table.sh:

    hive --hiveconf tez.queue.name=Last_date --hiveconf hive.session.id=data_xxx -f /mypath/union_query.sql

I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDDLFromFieldSchema(MetaStoreUtils.java:876)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:1091)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getPartitionMetadata(MetaStoreUtils.java:890)
at org.apache.hadoop.hive.ql.metadata.Partition.getMetadataFromPartitionSchema(Partition.java:263)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.<init>(PartitionDesc.java:87)
at org.apache.hadoop.hive.ql.exec.Utilities.getPartitionDesc(Utilities.java:1373)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setMapWork(GenMapRedUtils.java:684)
at org.apache.hadoop.hive.ql.parse.GenTezUtils.setupMapWork(GenTezUtils.java:212)
at org.apache.hadoop.hive.ql.parse.GenTezUtils.createMapWork(GenTezUtils.java:195)
at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:131)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:205)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10598)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:474)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)

Could you please let me know how to optimize the query to avoid the above error, given that I am only doing a UNION of tables? I tried to find a solution but couldn't find one for my case. Thank you in advance for your reply.
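The stack trace shows the OutOfMemoryError occurring in the Hive client JVM during query compilation (TezCompiler / SemanticAnalyzer), before any task runs, so one hedged option is to raise the client heap. A sketch, assuming the hive CLI honors HADOOP_CLIENT_OPTS as in a standard install (the heap size is illustrative):

    export HADOOP_CLIENT_OPTS="-Xmx4g"   # larger heap for the Hive client JVM only
    hive --hiveconf tez.queue.name=Last_date --hiveconf hive.session.id=data_xxx -f /mypath/union_query.sql

Another option is to split the single 20-way UNION into one small statement per source table, so each compiles a small plan, e.g.:

    INSERT INTO TABLE dbo.table PARTITION (last_load_date)
    select 'table' as tablename, current_date, count(*) as count, last_load_date
    from table1
    where last_load_date in (select max(last_load_date) from table1)
    group by last_load_date;
    -- ...repeat for table2 through table20

Note that INSERT INTO appends rather than overwrites, so the target partition may need to be cleared first if the original INSERT OVERWRITE semantics are required.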
Labels:
- Apache Hive
11-18-2020
02:44 AM
Thank you for the reply @Kezia. I was able to filter the duplicates using the DetectDuplicate processor. This is the error I'm getting when the GetSFTP processor is scheduled on the primary node:

GetFTP[id=xxxx] Unable to fetch listing from remote server due to java.net.ConnectException: Connection timed out (Connection timed out): Connection timed out (Connection timed out)
11-13-2020
05:43 AM
I have a GetSFTP processor which runs on 3 nodes. GetSFTP was previously scheduled on the primary node only, but as that node was not working properly I had to change the schedule to run on "all nodes", and as a result I am receiving duplicate files.
Could you please let me know how to filter these so that only one of the two files (both are the same file) is loaded into HDFS? That is, I have to put only one file out of the two duplicates into the data lake.
Thank you
@PVVK @Kezia
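For completeness, a minimal DetectDuplicate sketch, assuming the filename alone identifies a file. The property and relationship names are from the standard processor; the cache service and age-off value are illustrative:

    DetectDuplicate
      Cache Entry Identifier: ${filename}
      Distributed Cache Service: DistributedMapCacheClientService
      Age Off Duration: 1 hour

Routing non-duplicate to PutHDFS and auto-terminating duplicate means only the first copy fetched by any node reaches the data lake.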
Labels:
- Apache NiFi
- HDFS
11-06-2020
12:07 AM
In NiFi I have a flow which fetches files from a remote server and puts the data into HDFS. The PutHDFS relationship was set to both success and failure, so the files which failed to load remain in the same processor. Could you please let me know how to view the list of failed files other than through data provenance?
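One hedged workaround, rather than looping failure back into PutHDFS: route the failure relationship to a LogAttribute processor (or a separate holding queue) so failed files are visible as a list. A sketch with illustrative values:

    PutHDFS (failure) -> LogAttribute
      Log Level: warn
      Attributes to Log: filename, path

Each failed FlowFile's name then appears in nifi-app.log, and the connection in front of LogAttribute can be inspected in the UI (right-click the connection, List queue).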
Labels:
- Apache NiFi
- HDFS
10-02-2020
01:23 AM
Hello, suddenly a few processors in NiFi are not recording data provenance information for their runs. Could you please let me know what checks should be done to investigate this issue? Thank you in advance.
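A few hedged places to look, assuming a default install. The property names below are from a standard nifi.properties; the values are only illustrative:

    nifi.provenance.repository.implementation=org.apache.nifi.provenance.PersistentProvenanceRepository
    nifi.provenance.repository.max.storage.time=24 hours
    nifi.provenance.repository.max.storage.size=1 GB

If the provenance repository is undersized, older events age off quickly and can appear to be missing. Checking nifi-app.log around the time of the runs for provenance repository errors may also help.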
09-30-2020
08:55 AM
Thank you @PVVK. My NiFi version is 1.5.0. If I set FetchSFTP to execute on all nodes, I sometimes get duplicate files fetched by different nodes.
09-30-2020
02:08 AM
Thank you @PVVK for your solution. I am unable to see the load-balance strategy option on the queue before the MergeContent processor, so I did this instead: previously FetchSFTP was set to execute on all nodes, and I changed the option to execute on the primary node only. As a result I am now getting a single file. Please correct me if I am wrong.
Also, there is a delay while data is loading into HDFS using the PutHDFS processor. After compression, when the size grows from MB to GB, the data is loaded only after 1 hour. Please find the screenshot below for your reference.
09-29-2020
02:11 AM
Hello all, I can see error messages being displayed every minute in the top corner of the process groups. The errors are about the FetchSFTP processor trying to fetch an old file which no longer exists at the source, and I don't want that error to be displayed. Could you please let me know which settings should be changed? Thank you in advance.
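For context, one hedged pattern that produces exactly this symptom: if FetchSFTP's not.found relationship is routed back toward the processor, the same stale FlowFile is retried on every run and raises a fresh bulletin each time. A sketch of the routing, using the processor's actual relationship names:

    FetchSFTP
      success           -> downstream flow
      not.found         -> auto-terminate (or LogAttribute, to keep a record)
      comms.failure, permission.denied -> retry/failure handling

Clearing the stale FlowFile from the incoming queue (right-click the connection, Empty queue) stops the recurring error immediately.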
Labels:
- Apache NiFi