Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3374 | 05-03-2017 05:13 PM |
| | 2802 | 05-02-2017 08:38 AM |
| | 3082 | 05-02-2017 08:13 AM |
| | 3012 | 04-10-2017 10:51 PM |
| | 1528 | 03-28-2017 02:27 AM |
10-12-2016 01:07 PM
I doubt it is that easy. I'd let the OS handle that.
10-11-2016 04:31 PM
@Amal Babu this is my take; I'm sure there are better ways. In a Zeppelin %spark paragraph:
import sqlContext.implicits._
val data = sc.wholeTextFiles("hdfs://sandbox.hortonworks.com:8020/user/guest/")
val dataDF = data.toDF()
dataDF.select("_1").show()
Result:
import sqlContext.implicits._
data: org.apache.spark.rdd.RDD[(String, String)] = hdfs://sandbox.hortonworks.com:8020/user/guest/ MapPartitionsRDD[64] at wholeTextFiles at <console>:68
dataDF: org.apache.spark.sql.DataFrame = [_1: string, _2: string]
+--------------------+
| _1|
+--------------------+
|hdfs://sandbox.ho...|
|hdfs://sandbox.ho...|
|hdfs://sandbox.ho...|
+--------------------+
As long as you use wholeTextFiles you should be able to retain the filenames. From the documentation: SparkContext.wholeTextFiles lets you read a directory containing multiple small text files and returns each of them as (filename, content) pairs. This is in contrast with textFile, which would return one record per line in each file.
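A minimal follow-up sketch, assuming the same data RDD from the snippet above: it strips the directory prefix so each row carries just the base filename next to the file's content (the column names filename and content are only illustrative):
%spark
// Keep only the base filename from each (path, content) pair.
val named = data.map { case (path, content) => (path.split("/").last, content) }
// toDF on an RDD of tuples works because sqlContext.implicits._ is already imported.
val namedDF = named.toDF("filename", "content")
namedDF.show()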
10-10-2016 11:37 PM
UIDs and GIDs get assigned in order of availability; it probably means you added some services before adding the new hosts, so the new hosts ended up with mismatched IDs. Ambari does not guarantee that the IDs will match across nodes. One way to keep them consistent is to create the service users and groups manually on every node beforehand, with the same fixed UID/GID on each host, rather than letting Ambari default them.
10-09-2016 12:52 AM
Simply put: any time the use case is not covered by the 170+ built-in processors and you have something really specific to process. Additionally, something that cannot be handled by the ExecuteScript processor, which is hard to imagine since it's already quite powerful. If you're more comfortable with Java than with Groovy, Jython, Lua, or JavaScript, go with building a custom processor, but always make sure you've exhausted the built-in options first.
10-09-2016 12:44 AM
It depends on which HDP release you're using. For HDP 2.5, the steps to install the Kafka Ranger plugin are here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_security/content/kafka_plugin.html The Kafka version is 0.10.0.1 and Ranger is 0.6: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch_relnotes_v250.html
10-07-2016 12:53 PM
@Marc Schriever run ls on the directory. I believe it already exists; you just need to loosen its permissions rather than try to create it.
10-06-2016 03:15 PM
@Marc Schriever can you check the permissions on the /tmp directory?