Member since 09-28-2015
Posts: 3
Kudos Received: 3
Solutions: 0
05-09-2016 10:00 PM
Thanks, Benjamin. Assume the data is sqooped in from an EDW and we don't have the flexibility to add a timestamp/date column to the source table in the EDW. Would comparing column hashes be the most performant way to figure out which records changed?
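A minimal sketch of the hash-comparison idea, written in plain Python rather than HiveQL so the logic is easy to check. It models what a daily query would do: match each row in the new partition to the previous snapshot by its composite key, then compare a SHA-256 digest of the remaining columns. The column names, `KEY_COLS`, and the helper functions are hypothetical and not from the original post; translating this into a Hive join is left as an assumption.

```python
import hashlib
import json

# Hypothetical composite key; replace with the columns that identify a record.
KEY_COLS = ("customer_id", "region")


def row_key(row):
    """Composite key: tuple of the key-column values."""
    return tuple(row[c] for c in KEY_COLS)


def row_digest(row):
    """SHA-256 over the non-key columns, serialized unambiguously.

    JSON-encoding [column, value] pairs keeps ("ab", "c") distinct from
    ("a", "bc") and keeps None distinct from the string "None", which naive
    string concatenation would not.
    """
    payload = [[c, row[c]] for c in sorted(row) if c not in KEY_COLS]
    encoded = json.dumps(payload, separators=(",", ":"), default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def diff_snapshots(previous, current):
    """Return (added_keys, changed_keys) between two snapshots (lists of dicts)."""
    prev_digests = {row_key(r): row_digest(r) for r in previous}
    added, changed = [], []
    for row in current:
        key = row_key(row)
        if key not in prev_digests:
            added.append(key)          # key never seen before: new record
        elif prev_digests[key] != row_digest(row):
            changed.append(key)        # same key, different column values
    return added, changed


if __name__ == "__main__":
    yesterday = [
        {"customer_id": 1, "region": "EU", "name": "Ann", "balance": 10},
        {"customer_id": 2, "region": "US", "name": "Bob", "balance": 20},
    ]
    today = [
        {"customer_id": 1, "region": "EU", "name": "Ann", "balance": 10},  # unchanged
        {"customer_id": 2, "region": "US", "name": "Bob", "balance": 25},  # changed
        {"customer_id": 3, "region": "EU", "name": "Cy", "balance": 5},    # added
    ]
    print(diff_snapshots(yesterday, today))  # → ([(3, 'EU')], [(2, 'US')])
```

Storing only the key and digest from the previous day keeps the comparison narrow, which is the main performance argument for hashing. Deleted records would need the reverse check (keys in the previous snapshot but not the current one).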
05-09-2016 05:04 PM
1 Kudo
I have a Hive table to which new partitions are added (say, daily).
I want to write a daily Hive query that tells me which records changed or were added that day. A unique record is identified by a combination of multiple columns. Would using Hive's hash or SHA-2 (with 256 bits) UDF be the best and most performant way to write such a query? And is a 256-bit hash good enough to prevent collisions?
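The collision question can be sized with the standard birthday-bound approximation, P ≈ 1 − exp(−n(n−1) / 2^(b+1)) for n distinct rows and a b-bit hash, which assumes the hash outputs are uniformly distributed. The sketch below compares a 32-bit hash (Hive's `hash()` returns an `int`) with a 256-bit digest; the row counts are illustrative, not taken from the post.

```python
import math


def collision_probability(n_rows, bits):
    """Birthday-bound estimate of P(at least one collision among n_rows values).

    Uses -expm1(-x) instead of 1 - exp(-x) so very small probabilities
    (as with 256-bit digests) do not round to zero.
    """
    space = 2.0 ** bits
    x = n_rows * (n_rows - 1) / (2.0 * space)
    return -math.expm1(-x)


if __name__ == "__main__":
    # Illustrative sizes only: a modest table for 32 bits, a huge one for 256 bits.
    for bits, n in [(32, 100_000), (256, 10**12)]:
        print(f"{bits}-bit hash, {n:,} rows: {collision_probability(n, bits):.3g}")
```

With these inputs the 32-bit case comes out at roughly 0.69, while the 256-bit case is on the order of 10^-54. For change detection the relevant risk is smaller still: a change is missed only if a row's new values hash to the same digest as its old values for that key, about 2^-256 per comparison under the same uniformity assumption.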
Labels:
- Apache Hive
09-28-2015 10:19 PM
2 Kudos
Is distcp the best (or only) mechanism for moving files into an encryption zone? The documentation says so, but would a plain HDFS mv or cp also work just fine?
Labels: