Member since: 09-24-2020
Posts: 23
Kudos Received: 1
Solutions: 0
12-18-2020 01:25 AM
I have a script covering a set of 20 tables that fetches each table's last loaded date and row count:

INSERT OVERWRITE TABLE dbo.table PARTITION (last_load_date)
select 'table' as tablename, current_date, count(*) as count, last_load_date
from table1
where last_load_date in (select max(last_load_date) from table1)
group by last_load_date
union
select 'table' as tablename, current_date, count(*) as count, last_load_date
from table2
where last_load_date in (select max(last_load_date) from table2)
group by last_load_date
...
This is a UNION across 20 tables. I configured the script as below and ran it with the command sh table.sh.

table.sh:
hive --hiveconf tez.queue.name=Last_date --hiveconf hive.session.id=data_xxx -f /mypath/union_query.sql

The run fails with the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at java.lang.String.valueOf(String.java:2994)
at java.lang.StringBuilder.append(StringBuilder.java:131)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDDLFromFieldSchema(MetaStoreUtils.java:876)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getSchema(MetaStoreUtils.java:1091)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getPartitionMetadata(MetaStoreUtils.java:890)
at org.apache.hadoop.hive.ql.metadata.Partition.getMetadataFromPartitionSchema(Partition.java:263)
at org.apache.hadoop.hive.ql.plan.PartitionDesc.<init>(PartitionDesc.java:87)
at org.apache.hadoop.hive.ql.exec.Utilities.getPartitionDesc(Utilities.java:1373)
at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setMapWork(GenMapRedUtils.java:684)
at org.apache.hadoop.hive.ql.parse.GenTezUtils.setupMapWork(GenTezUtils.java:212)
at org.apache.hadoop.hive.ql.parse.GenTezUtils.createMapWork(GenTezUtils.java:195)
at org.apache.hadoop.hive.ql.parse.GenTezWork.process(GenTezWork.java:131)
at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:90)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.walk(GenTezWorkWalker.java:109)
at org.apache.hadoop.hive.ql.parse.GenTezWorkWalker.startWalking(GenTezWorkWalker.java:72)
at org.apache.hadoop.hive.ql.parse.TezCompiler.generateTaskTree(TezCompiler.java:368)
at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:205)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10598)
at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:219)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:474)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:330)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1233)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1274)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1170)

Could you please let me know how to optimize the query to avoid the above error, given that I only use a UNION of tables? I tried to find a solution but could not find one for my case. Thank you in advance for your reply.
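For context, the repeated per-table blocks in the query file can be generated rather than maintained by hand. A minimal sketch (the table names below are placeholders, and OUT stands in for the post's /mypath/union_query.sql):

```shell
#!/bin/sh
# Sketch only: build the N-way UNION query from a table list instead
# of editing 20 near-identical blocks by hand.
OUT=union_query.sql
TABLES="table1 table2"          # ...extend with the remaining tables

{
  echo "INSERT OVERWRITE TABLE dbo.table PARTITION (last_load_date)"
  SEP=""
  for t in $TABLES; do
    # Print "union" between blocks but not before the first one.
    printf '%s' "$SEP"
    echo "select '$t' as tablename, current_date, count(*) as count, last_load_date"
    echo "from $t"
    echo "where last_load_date in (select max(last_load_date) from $t)"
    echo "group by last_load_date"
    SEP="union
"
  done
} > "$OUT"
```

Running hive -f against the generated file matches the original invocation; the same loop could instead emit one standalone INSERT per table, which keeps each compiled plan small (one possible way to reduce client-side compile memory, assuming separate per-table statements are acceptable).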
Labels:
- Apache Hive
11-06-2020 12:07 AM
In NiFi I have a flow that fetches files from a remote server and posts the data into HDFS. The PutHDFS relationships are routed to both success and failure, so failed files remain at the same processor. Could you please let me know how to view the list of failed files other than through data provenance?
Labels:
- Apache NiFi
- HDFS
09-30-2020 08:55 AM
Thank you @PVVK. My NiFi version is 1.5.0. If I set FetchSFTP to execute on all nodes, I sometimes get duplicate files fetched by different nodes.
09-30-2020 02:08 AM
Thank you @PVVK for your solution. I am unable to see the load-balance strategy option on the queue before the MergeContent processor, so I did this instead: FetchSFTP was previously set to execute on all nodes, and I changed it to execute on the primary node only. As a result I now get a single file. Please correct me if I am wrong. Also, there is a delay while data is loading into HDFS through the PutHDFS processor: after compression, when the size grows from MB to GB, the data is loaded only after 1 hour. Please find the screenshot below for your reference.
09-29-2020 02:11 AM
Hello All, I can see error messages displayed every minute in the top corner of the processor groups. The errors concern the FetchSFTP processor trying to fetch an old file that no longer exists on the source. I don't want that error to be displayed. Could you please let me know which settings should be changed? Thank you in advance.
Labels:
- Apache NiFi
09-28-2020 07:05 AM
Hi, I created a post and got a few replies, but suddenly my post was marked as spam. I tried to create a new post, but all my questions are still being marked as spam.
Labels:
- Apache NiFi
09-28-2020 06:43 AM
1 Kudo
I'm using the GetDateandServer processor to fetch file names and decompress the files, then remove the header with ExecuteStreamCommand:
- Command arguments: 1d
- Command path: sed
- Ignore STDIN: False

MergeContent processor to merge the files:
- Merge strategy: Bin-Packing algorithm
- Merge format: Bin concatenation
- Merge data strategy: Do not merge uncommon metadata
- Min no. of entries: 180
- Max no. of entries: 1000
- Minimum group size: 60 GB
- Max bin age: 5 min
- Max no. of bins: 1

After that, an UpdateAttribute processor creates the file name, the files are compressed, and PutHDFS puts the data into HDFS.

My problem: after the ExecuteStreamCommand processor is triggered, I can see two positions with the same value in the queue. For every half hour I should get 1 output file, but I see 2. NiFi is running on 3 nodes, and the listed queue shows files on 2 nodes: the files running on node 1 are merged into one file, and the files running on node 2 become a second file, so I get 2 files. Could you please let me know how to get one file? Thank you in advance for your help.

Note: I posted the same question previously but could not see it again, so I am posting it again. @Nifi
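For reference, the header-removal step configured above (Command path sed with argument 1d, reading the flowfile content from stdin) simply deletes the first line of each file; a minimal sketch with sample data:

```shell
# Demo of the configured header removal: sed's 1d deletes line 1
# of its input, leaving only the data rows.
printf 'header\nrow1\nrow2\n' > sample.txt
sed 1d sample.txt > no_header.txt
cat no_header.txt
# row1
# row2
```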
Labels:
- Apache NiFi