03-20-2018 07:24 AM
I am preparing for CCP-DE575. I am following the objectives provided at https://www.cloudera.com/more/training/certification/ccp-data-engineer.html; however, there are a few items that confuse me.
1) Import and export data between an external RDBMS and your cluster, including the ability to import specific subsets, change the delimiter and file format of imported data during ingest, and alter the data access pattern or privileges [the bold part is not clear to me]
2) Deduplicate and merge data (what is meant by this?)
3) Tune data for optimal query performance [Do we have to apply DML in Hive?] (what falls within the scope of this item?)
Thanks in advance.