Hi All, we need some help with an error that we hit very rarely in a Hive job run through Beeline. Please suggest the reason for the failure and how we can overcome it.

Error message:

"Container is running beyond physical memory limits. Current usage: 2.0 GB of 2 GB physical memory used; 3.6 GB of 4.2 GB virtual memory used. Killing container."

When the job fails and we rerun it, the load completes successfully. We are loading around 26 million records after consolidating 751,721 delta records with 25,774,647 full records. In both the successful and the failed runs, the number of records processed is the same.

The query being run is below:

SET mapreduce.reduce.memory.mb=4096;
SET mapreduce.reduce.java.opts=-Xmx3072M;
set hive.auto.convert.join=false;

drop table if exists A;
create external table A (column_names <DATATYPES>); ---- Total 180 columns.

insert into table A
select * from (
  select c.*,
         row_number() over (partition by c.<primary_key_columns>
                            order by cast(c.SEQUENCE_NUMBER as bigint) desc) as rnk
  from (
    select * from delta_table
    union all
    select a.* from full_table a where year >= '2017'
  ) c
) d
where d.rnk = 1;

SEQUENCE_NUMBER is unique for each record and comes from the source. So, to avoid multiple records with the same primary-key combination and to keep only the latest update, we identify the latest record per key as the one with the highest SEQUENCE_NUMBER.
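One thing worth noting: the error reports "2.0 GB of 2 GB physical memory used", i.e. the killed container had a 2 GB limit, while the SET statements above only raise reducer memory to 4 GB. A plausible reading (an assumption, since the error does not say which task attempt was killed) is that a map-side container, still at the default size, hit its limit. A hedged sketch of raising both map and reduce container sizes, keeping the JVM heap at roughly 80% of the container:

```sql
-- Assumption: the killed container was a mapper still at the 2 GB default.
set mapreduce.map.memory.mb=4096;        -- map container size
set mapreduce.map.java.opts=-Xmx3276m;   -- heap ~80% of the container
set mapreduce.reduce.memory.mb=4096;     -- reduce container size (as in the query)
set mapreduce.reduce.java.opts=-Xmx3276m;
```

These property names are standard MapReduce settings; the exact values you can request are bounded by your cluster's yarn.scheduler.maximum-allocation-mb.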
Ensure that you are using the following SerDe: org.apache.hive.hcatalog.data.JsonSerDe. No mapping is required: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RowFormats&SerDe

You should also consider testing with the Hive built-in UDF get_json_object: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object
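As a quick illustration of the built-in UDF mentioned above (the table name events and column payload are hypothetical, for the example only):

```sql
-- get_json_object extracts a value from a JSON string column
-- using a JSONPath-style expression; it returns a string,
-- or NULL when the path does not match.
SELECT get_json_object(payload, '$.user.id')   AS user_id,
       get_json_object(payload, '$.timestamp') AS event_ts
FROM events;
```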