question Re: Record count and Duplicate check - using Spark in Support Questions

https://community.cloudera.com/t5/Support-Questions/Record-count-and-Duplicate-check-using-Spark/m-p/231114#M192958
<A rel="user" href="https://community.cloudera.com/users/15722/kpkds.html" nodeid="15722">@karthick baskaran</A><P>Here is the command to get the number of lines in a file. sc.textFile loads the text file as an RDD of strings, one element per line, so count() returns the number of lines.</P><PRE>spark-shell (spark 1.6.x)
scala> val textFile = sc.textFile("README.md")
scala> textFile.count() // Number of items in this RDD</PRE>
Fri, 20 Oct 2017 22:16:05 GMT sandyy006 2017-10-20T22:16:05Z
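The reply above covers the record count only. For the duplicate check named in the thread title, one possible approach with the same RDD API is sketched below. It assumes each line of the file is one record and that two records are duplicates when their lines are identical, which the original question may define differently. It is meant to be pasted into spark-shell, where `sc` is already defined; the names `totalCount`, `distinctCount` and `duplicates` are illustrative only.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is predefined.
// "README.md" is the example path from the post; substitute your own file.
val textFile = sc.textFile("README.md")

// Total number of lines, and number of distinct lines.
val totalCount = textFile.count()
val distinctCount = textFile.distinct().count()

// Assumption: a "duplicate" is a line whose full text occurs more than once.
// Pair each line with 1, sum per line, keep lines seen more than once.
val duplicates = textFile
  .map(line => (line, 1L))
  .reduceByKey(_ + _)
  .filter { case (_, n) => n > 1 }

println(s"total=$totalCount distinct=$distinctCount")

// Show a small sample of duplicated lines with their occurrence counts.
duplicates.take(10).foreach { case (line, n) => println(s"$n x $line") }
```

If `totalCount` and `distinctCount` are equal, the file has no duplicate lines under this definition. To compare on a key column rather than the whole line, the `map` step would need to extract that key first.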