Member since
02-01-2019
650
Posts
143
Kudos Received
117
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2612 | 04-01-2019 09:53 AM |
| | 1376 | 04-01-2019 09:34 AM |
| | 6474 | 01-28-2019 03:50 PM |
| | 1484 | 11-08-2018 09:26 AM |
| | 3610 | 11-08-2018 08:55 AM |
12-18-2017
06:39 PM
@venkateswara reddy bukkasamudram Please refer to: https://community.hortonworks.com/questions/33690/hdpcd-exam-network-issues.html
12-18-2017
06:29 PM
@Ashnee Sharma: Do we know the data size of this table? `select * from rasdb.dim_account` pulls the complete result set back to the driver, so we need to make sure the table data fits into driver memory.
12-15-2017
01:17 PM
@Ashnee Sharma What is your driver memory? `java.lang.OutOfMemoryError: GC overhead limit exceeded` usually means the driver is spending most of its time in garbage collection. Try increasing the driver memory according to the data size.
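A minimal sketch of raising driver memory at submit time; the `4g` value, class name, and jar name are placeholders, so size the heap to the data you actually collect to the driver:

```shell
# Hypothetical submit command: --driver-memory raises the driver heap.
# Adjust 4g to your data size; MyApp/myapp.jar are placeholder names.
spark-submit --driver-memory 4g --class com.example.MyApp myapp.jar
```

The same flag works for an interactive session: `spark-shell --driver-memory 4g`.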
11-12-2017
05:57 PM
Try updating your local /etc/hosts with the sandbox hostname and IP. @Aditya Srivastava
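For reference, a hedged example of such an entry; the IP address here is a placeholder and must be replaced with whatever address your sandbox VM actually reports:

```shell
# Append the sandbox hostname to /etc/hosts (needs sudo).
# 192.168.56.101 is a placeholder -- use your sandbox VM's real IP.
echo '192.168.56.101  sandbox.hortonworks.com sandbox' | sudo tee -a /etc/hosts
```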
11-12-2017
05:48 PM
@Swaapnika Guntaka: 9092 is not the default Kafka port in HDP; it is 6667. Could you please check the port and re-run the producer?
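A sketch of re-running the console producer against the HDP default port; the broker host and topic name are assumptions, substitute your own:

```shell
# Hypothetical re-run on the HDP default broker port 6667.
# Replace the host and topic with your actual broker and topic names.
kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic test
```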
10-20-2017
03:18 PM
@Guilherme Colla Click on each host and start the datanode service.
10-20-2017
03:16 PM
1 Kudo
@karthick baskaran Here is the command to get the number of lines in a file. Spark will internally load your text file and keep it in an RDD/DataFrame/Dataset. spark-shell (Spark 1.6.x):
scala> val textFile = sc.textFile("README.md")
scala> textFile.count() // Number of items in this RDD
10-20-2017
01:53 PM
1 Kudo
@Guilherme Colla: Looks like you don't have any live DataNodes. Can you check the status of the DataNodes and start them if they are down or haven't started?
10-20-2017
01:48 PM
1 Kudo
@karthick baskaran For Part 1 (record counts): a simple `rdd.count()` or `df.count()` should give you the record count. For Part 2 (duplicate check): you could load the data into a DataFrame and run `distinct()` against it, or use `dropDuplicates` [https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/DataFrame.html#dropDuplicates()]
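The same duplicate check can be illustrated on a plain Scala collection as a local stand-in for the DataFrame case; the sample records below are made up:

```scala
// Local stand-in for df.distinct() / dropDuplicates: compare the size
// of the collection before and after removing duplicates.
val records = Seq("acct-1", "acct-2", "acct-1", "acct-3")
val deduped = records.distinct                 // keeps first occurrence of each
val hasDuplicates = deduped.size != records.size
println(s"duplicates present: $hasDuplicates") // true for this sample
```

At DataFrame scale the comparison is the same idea: `df.count()` versus `df.dropDuplicates().count()`.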