<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Record count and Duplicate check - using Spark in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Record-count-and-Duplicate-check-using-Spark/m-p/231113#M192957</link>
    <description>&lt;P&gt;Hi Sandeep Nemuri,&lt;/P&gt;&lt;P&gt;Thanks for the answer; I will try this. But should the data be in an RDD, or can I use text files in HDFS? Can I do the record count and duplicate check using files, or only with data frames, instead of having both RDDs and data frames? We are going to have a huge number of files and a huge data volume, so performance is very important. Can you comment on that, please?&lt;/P&gt;</description>
    <pubDate>Fri, 20 Oct 2017 20:52:56 GMT</pubDate>
    <dc:creator>kpk_ds</dc:creator>
    <dc:date>2017-10-20T20:52:56Z</dc:date>
  </channel>
</rss>

