Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3374 | 05-03-2017 05:13 PM |
| | 2802 | 05-02-2017 08:38 AM |
| | 3082 | 05-02-2017 08:13 AM |
| | 3012 | 04-10-2017 10:51 PM |
| | 1528 | 03-28-2017 02:27 AM |
10-12-2016 01:07 PM
I doubt it is that easy. I'd let the OS handle that.
10-11-2016 04:31 PM
@Amal Babu this is my take; I'm sure there are better ways. In a Zeppelin %spark paragraph:
import sqlContext.implicits._
val data = sc.wholeTextFiles("hdfs://sandbox.hortonworks.com:8020/user/guest/")
val dataDF = data.toDF()
dataDF.select("_1").show()
Result:
import sqlContext.implicits._
data: org.apache.spark.rdd.RDD[(String, String)] = hdfs://sandbox.hortonworks.com:8020/user/guest/ MapPartitionsRDD[64] at wholeTextFiles at <console>:68
dataDF: org.apache.spark.sql.DataFrame = [_1: string, _2: string]
+--------------------+
| _1|
+--------------------+
|hdfs://sandbox.ho...|
|hdfs://sandbox.ho...|
|hdfs://sandbox.ho...|
+--------------------+
As long as you use wholeTextFiles you should be able to retain the filenames. From the documentation: SparkContext.wholeTextFiles lets you read a directory containing multiple small text files and returns each of them as (filename, content) pairs. This is in contrast with textFile, which would return one record per line in each file.
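A minimal follow-up sketch, assuming the same data RDD from the snippet above: it strips the directory prefix so each row carries just the base filename next to the file's content (the column names filename and content are only illustrative):
%spark
// Keep only the base filename from each (path, content) pair.
val named = data.map { case (path, content) => (path.split("/").last, content) }
// toDF on an RDD of tuples works because sqlContext.implicits._ is already imported.
val namedDF = named.toDF("filename", "content")
namedDF.show()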
10-10-2016 11:37 PM
UIDs and GIDs get assigned in order of availability; it probably means you added some services before adding the new hosts, so the new hosts ended up with mismatched IDs. Ambari does not guarantee that the IDs will match across nodes. One way to keep them consistent is to create the service users and groups manually on every node beforehand, with the same fixed UID/GID on each host, rather than letting Ambari default them.
10-09-2016 12:52 AM
Simply put: any time the use case is not covered by the 170+ built-in processors and you have something really specific to process. Additionally, something that cannot be handled by the ExecuteScript processor, which is hard to imagine since it's already quite powerful. If you're more comfortable with Java than with Groovy, Jython, Lua, or JavaScript, go with building a custom processor, but always make sure you've exhausted the built-in options first.
10-09-2016 12:44 AM
It depends on which HDP release you're using. For HDP 2.5, the steps to install the Kafka Ranger plugin are here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_security/content/kafka_plugin.html The Kafka version is 0.10.0.1 and Ranger is 0.6: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.html
10-07-2016 12:53 PM
@Marc Schriever run ls on the directory. I believe it already exists; you just need to loosen its permissions rather than try to create it.
10-06-2016 03:15 PM
@Marc Schriever can you check the permissions on the /tmp directory?