Member since: 02-01-2019
Posts: 650
Kudos Received: 143
Solutions: 117
My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 2390 | 04-01-2019 09:53 AM |
 | 1262 | 04-01-2019 09:34 AM |
 | 5835 | 01-28-2019 03:50 PM |
 | 1347 | 11-08-2018 09:26 AM |
 | 3355 | 11-08-2018 08:55 AM |
11-26-2016
06:08 PM
@Marco Chou: Use the IP allocated to the box (the address shown by ifconfig) instead of 127.0.0.1 to connect through the browser.
11-25-2016
08:30 PM
2 Kudos
@Oliver Meyn This is the correct JIRA: https://issues.apache.org/jira/browse/SPARK-12177. And yes, SASL_SSL is only available from Spark 2.0; it is not in HDP 2.4.2, which ships Spark 1.6.1.
11-22-2016
05:51 PM
1 Kudo
@Fernando Lopez Bello You need a HiveContext to access Hive tables:
from pyspark.sql import HiveContext
sqlCtx = HiveContext(sc1)
11-20-2016
02:34 PM
@Jayanna TM You can use the ignorePattern property to skip .tmp files in a spool directory (https://flume.apache.org/FlumeUserGuide.html), e.g. ignorePattern=^.*\.tmp$ (anchored, since the pattern is applied to the file name as a whole).
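As a quick sanity check of that pattern (a Python sketch, not Flume itself; it assumes the pattern is matched against the whole file name, which is why the anchored form ^.*\.tmp$ is used):

```python
import re

# Pattern mirroring the Flume ignorePattern above; applied as a full match
# against each file name, as the anchored form assumes.
IGNORE = re.compile(r"^.*\.tmp$")

def spoolable(filenames):
    """Return the files a spool source would pick up, skipping .tmp files."""
    return [f for f in filenames if not IGNORE.fullmatch(f)]

print(spoolable(["events.log", "upload.tmp", "data.csv", "part-0001.tmp"]))
# prints ['events.log', 'data.csv']
```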
10-25-2016
11:54 AM
@bigdata.neophyte You would need to use this API to fetch the job status: https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/JobStatus.html. If you want a simple solution, you could try something like this:
1) Set a unique job name (e.g. a date or time) using -Dmapred.job.name=testdist01
2) Get the app status using:
yarn application -list -appStates ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING,FINISHED,FAILED,KILLED | grep -i "distcp: testdist01" | awk '{print $7,$8}'
FINISHED SUCCEEDED
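The grep/awk step above can be sketched in Python as well; the sample yarn application -list row and its column layout are illustrative assumptions, not output captured from a real cluster:

```python
def distcp_status(yarn_output, job_name):
    """Return (state, final_status) for a distcp job found by name.

    Mimics the grep -i / awk '{print $7,$8}' pipeline: locate the line
    mentioning "distcp: <job_name>", split on whitespace, and take the
    7th and 8th fields (YARN's State and Final-State columns).
    """
    needle = f"distcp: {job_name}".lower()
    for line in yarn_output.splitlines():
        if needle in line.lower():
            fields = line.split()
            return fields[6], fields[7]
    return None

# Illustrative sample row (assumed column layout, not real cluster output).
sample = ("application_1474000000000_0007 distcp: testdist01 MAPREDUCE "
          "hdfs default FINISHED SUCCEEDED 100% N/A")
print(distcp_status(sample, "testdist01"))  # prints ('FINISHED', 'SUCCEEDED')
```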
09-27-2016
09:35 AM
@Mourad Chahri Can you check whether you have enough disk space available on the node?
09-27-2016
09:26 AM
@Muthukumar S: You need to either add the AWS keys to the hadoop command or set them permanently in core-site.xml. Are you able to do hadoop fs -ls s3a://${BUCKET_NAME}/ [feel free to add keys accordingly]? (This is to isolate authentication and connectivity issues.)
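For the permanent option, a minimal core-site.xml sketch might look like this; the property names are the standard s3a credential keys, and the values are placeholders to replace with real keys:

```xml
<!-- s3a credentials in core-site.xml; values below are placeholders -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_KEY</value>
</property>
```

The one-off alternative is passing the same two properties with -D flags on the hadoop command line instead of editing core-site.xml.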
09-21-2016
03:37 PM
@Mats Johansson By Spark on R, I mean running Spark on R server. Which one is recommended: Spark on R or SparkR? I would also like to know how the two compare in performance.
08-29-2016
01:22 PM
1 Kudo
@Ryan Spring
Please use the dependencies mentioned here: https://community.hortonworks.com/questions/27966/kafkaspout-fails-with-zookeeper-socket-issues-in-k.html. The securityProtocol property is available in the Hortonworks repo.
08-26-2016
09:12 AM
@Roberto Sancho: From the trace it looks like the connection is timing out. Can you check? Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar (java.net.ConnectException: Connection timed out)