Member since: 02-01-2019
Posts: 650
Kudos Received: 143
Solutions: 117
My Accepted Solutions

Title | Views | Posted |
---|---|---|
 | 2390 | 04-01-2019 09:53 AM |
 | 1262 | 04-01-2019 09:34 AM |
 | 5835 | 01-28-2019 03:50 PM |
 | 1347 | 11-08-2018 09:26 AM |
 | 3355 | 11-08-2018 08:55 AM |
11-26-2016
06:08 PM
@Marco Chou: Use the IP allocated to the box (the address shown by ifconfig) instead of 127.0.0.1 to connect through the browser.
11-25-2016
08:30 PM
2 Kudos
@Oliver Meyn This is the correct JIRA: https://issues.apache.org/jira/browse/SPARK-12177. And yes, SASL_SSL is only available from Spark 2.0; it is not in HDP 2.4.2, which ships Spark 1.6.1.
11-22-2016
05:51 PM
1 Kudo
@Fernando Lopez Bello You need a HiveContext to access Hive tables:
from pyspark.sql import HiveContext
sqlCtx = HiveContext(sc1)
11-20-2016
02:34 PM
@Jayanna TM You can use the ignorePattern property to skip .tmp files in a spool directory (https://flume.apache.org/FlumeUserGuide.html), e.g. ignorePattern=^.*\.tmp$ (anchored, since the pattern is applied to the file name as a whole).
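As a quick sanity check of that pattern (a Python sketch, not Flume itself; it assumes the pattern is matched against the whole file name, which is why the anchored form ^.*\.tmp$ is used):

```python
import re

# Pattern mirroring the Flume ignorePattern above; applied as a full match
# against each file name, as the anchored form assumes.
IGNORE = re.compile(r"^.*\.tmp$")

def spoolable(filenames):
    """Return the files a spool source would pick up, skipping .tmp files."""
    return [f for f in filenames if not IGNORE.fullmatch(f)]

print(spoolable(["events.log", "upload.tmp", "data.csv", "part-0001.tmp"]))
# prints ['events.log', 'data.csv']
```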
10-25-2016
11:54 AM
@bigdata.neophyte You would need to use this API to fetch the job status: https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/JobStatus.html. If you want a simple solution, you could try something like this:
1) Set a unique job name (e.g. a date or time) using -Dmapred.job.name=testdist01
2) Get the app status using:
yarn application -list -appStates ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING,FINISHED,FAILED,KILLED | grep -i "distcp: testdist01" | awk '{print $7,$8}'
FINISHED SUCCEEDED
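The grep/awk step above can be sketched in Python as well; the sample yarn application -list row and its column layout are illustrative assumptions, not output captured from a real cluster:

```python
def distcp_status(yarn_output, job_name):
    """Return (state, final_status) for a distcp job found by name.

    Mimics the grep -i / awk '{print $7,$8}' pipeline: locate the line
    mentioning "distcp: <job_name>", split on whitespace, and take the
    7th and 8th fields (YARN's State and Final-State columns).
    """
    needle = f"distcp: {job_name}".lower()
    for line in yarn_output.splitlines():
        if needle in line.lower():
            fields = line.split()
            return fields[6], fields[7]
    return None

# Illustrative sample row (assumed column layout, not real cluster output).
sample = ("application_1474000000000_0007 distcp: testdist01 MAPREDUCE "
          "hdfs default FINISHED SUCCEEDED 100% N/A")
print(distcp_status(sample, "testdist01"))  # prints ('FINISHED', 'SUCCEEDED')
```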
09-27-2016
09:35 AM
@Mourad Chahri Can you check whether you have enough disk space available on the node?
09-27-2016
09:26 AM
@Muthukumar S: You need to either add the AWS keys to the hadoop command or set them permanently in core-site.xml. Are you able to do hadoop fs -ls s3a://${BUCKET_NAME}/ [feel free to add keys accordingly]? (This is to isolate authentication and connectivity issues.)
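For the permanent option, a minimal core-site.xml sketch might look like this; the property names are the standard s3a credential keys, and the values are placeholders to replace with real keys:

```xml
<!-- s3a credentials in core-site.xml; values below are placeholders -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_KEY</value>
</property>
```

The one-off alternative is passing the same two properties with -D flags on the hadoop command line instead of editing core-site.xml.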
09-21-2016
03:37 PM
@Mats Johansson By Spark on R, I mean running Spark on R server. Which one is recommended: Spark on R or SparkR? I would also like to know how the two compare in performance.
08-29-2016
01:22 PM
1 Kudo
@Ryan Spring
Please use the dependencies mentioned here: https://community.hortonworks.com/questions/27966/kafkaspout-fails-with-zookeeper-socket-issues-in-k.html. The securityProtocol property is available in the Hortonworks repo.
08-26-2016
09:12 AM
@Roberto Sancho: From the trace it looks like the connection is timing out. Can you check? Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar (java.net.ConnectException: Connection timed out)