This is only a partial answer, but in general I do not think REGEXP_REPLACE truncates large strings. It will be hard to investigate this in more detail unless you can share a reproducible example.
Here is what I tested just now:
1. Create a table that contains a string of 60000+ characters (lorem ipsum)
2. Create a new table by selecting the regex replace of that string (I replaced every a with b)
3. Count the length of the field in the new table
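For anyone who wants to sanity-check the same idea outside Hive first, here is a minimal local sketch of those three steps. Note the assumptions: it uses Python's `re` module, which is not the regex engine Hive uses, and the helper names `build_long_text` and `replace_and_measure` are made up for illustration. It only shows that a one-character-for-one-character replacement should leave the length unchanged; it says nothing about how Hive itself behaves.

```python
import re


def build_long_text(min_length: int = 60000) -> str:
    """Build a lorem-ipsum style string of at least min_length characters."""
    sentence = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    repeats = min_length // len(sentence) + 1
    return sentence * repeats


def replace_and_measure(text: str, pattern: str, replacement: str):
    """Apply the regex replacement and return (length before, length after)."""
    replaced = re.sub(pattern, replacement, text)
    return len(text), len(replaced)


if __name__ == "__main__":
    text = build_long_text()
    before, after = replace_and_measure(text, "a", "b")
    # With a 1:1 character replacement, both lengths should be identical.
    print("length before:", before, "length after:", after)
```

If the lengths match locally but differ in Hive for your real string and pattern, that difference is exactly what a minimal reproducible example should capture.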
---
As said, it may well be that your specific string and regex together trigger this problem, so it would be interesting to see whether it can be reduced to a minimal example. Also keep in mind that, although regex engines are very similar, they differ in how they parse a pattern; perhaps the test you did uses a slightly different implementation than the one in Hive.
- Dennis Jaheruddin