Why is LazySimpleSerDe faster than JsonSerDe?

I have a heavily nested JSON file stored in an S3 bucket that I want to parse and expose as an external table in Hive.

I tried two SerDes:

1. org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

2. org.openx.data.jsonserde.JsonSerDe

The first one wraps everything in a string, and I parse it with LATERAL VIEW every time I query the data.

The second is pretty well known and parses the data down to the lowest level.
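Since I cannot post the real data, here is a rough Python analogy of the two setups. It is only a sketch: the sample rows and every function name are made up for illustration, and it says nothing about how either SerDe works internally or why one is faster. It just shows that both routes should return the same rows, with the parsing done at different moments.

```python
import json

# Hypothetical sample rows; the real file is confidential and far more nested.
ROWS = [
    '{"id": 1, "user": {"name": "a", "tags": ["x", "y"]}}',
    '{"id": 2, "user": {"name": "b", "tags": []}}',
]


def read_as_strings(rows):
    """Analogue of table 1: each record is kept as a single raw string column."""
    return [{"line": r} for r in rows]


def read_fully_parsed(rows):
    """Analogue of table 2: each record is decoded into nested columns on read."""
    return [json.loads(r) for r in rows]


def tags_from_strings(table):
    """Parse at query time, similar in spirit to LATERAL VIEW over a string column."""
    out = []
    for row in table:
        doc = json.loads(row["line"])  # parsing happens inside the query
        for tag in doc["user"]["tags"]:
            out.append((doc["id"], tag))
    return out


def tags_from_parsed(table):
    """Query columns that were already parsed when the row was read."""
    out = []
    for doc in table:
        for tag in doc["user"]["tags"]:
            out.append((doc["id"], tag))
    return out


if __name__ == "__main__":
    print(tags_from_strings(read_as_strings(ROWS)))
    print(tags_from_parsed(read_fully_parsed(ROWS)))
```

Both calls produce one output row per tag, so the difference I am asking about is purely in how long each route takes, not in the result.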

Both work, but I have noticed that the table using LazySimpleSerDe returns data faster than the one using JsonSerDe. It took only half the time.

My intuition is that LazySimpleSerDe should be slower, since it needs a LATERAL VIEW in each query, while JsonSerDe has already normalized everything when the table is created.

Can somebody explain? Or has anybody encountered a similar situation?

Or is my intuition simply wrong? LOL...

Due to confidentiality I cannot share the content of the file. Also, if you saw it, I bet there is a 99% chance you would lose interest in answering my question, lol. It is super nested.

Anyway, if there is an explanation, I would deeply appreciate it!

P.S. I have a working solution. This post is not about troubleshooting but about optimization. Thanks.
