Member since: 03-04-2014
Posts: 22
Kudos Received: 4
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 13665 | 05-26-2016 12:58 PM |
| | 16549 | 02-26-2015 11:52 PM |
| | 2865 | 01-07-2015 11:59 PM |
07-11-2018 11:07 PM
When reading a Parquet file, how do I convert a Parquet DECIMAL column to a String?
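As a rough illustration of what such a conversion involves: Parquet's DECIMAL logical type stores an unscaled integer together with a scale, so the textual form can be rebuilt from those two numbers. The sketch below uses only the Python standard library; the function name `decimal_to_string` is hypothetical and this is not a Spark or Parquet API. In Spark itself the usual route would more likely be a cast of the column to a string type, which is not shown here.

```python
from decimal import Decimal


def decimal_to_string(unscaled: int, scale: int) -> str:
    """Render a DECIMAL given as (unscaled integer, scale) as plain text.

    Hypothetical helper for illustration only; not part of any library.
    """
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a (sign, digits, exponent) tuple is exact, so values
    # with many digits are not rounded to the context precision.
    value = Decimal((sign, digits, -scale))
    # The "f" format avoids scientific notation in the output.
    return format(value, "f")


print(decimal_to_string(12345, 2))
print(decimal_to_string(-5, 3))
```

Building the value from a tuple rather than dividing by a power of ten keeps the conversion exact, which matters when the declared precision is large.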
05-26-2016 12:58 PM
1 Kudo
Thanks @hubbarja. I spent the afternoon trying this out on the CDH 5.7.0 QuickStart VM, with a Kerberos-enabled cluster and Cloudera Kafka 2.0.0. I don't think I phrased my question clearly: what I was trying to ask was whether the spark-streaming-kafka client supports consuming from a Kafka cluster that requires SSL client authentication.

For anyone else who tries this, the summary is that it won't work, due to upstream Spark issue [SPARK-12177], which covers support for the new Kafka 0.9 consumer / producer API. SSL, SASL_PLAINTEXT and SASL_SSL connections to Kafka all require the new API. In fact, this issue is listed in the known issues released with CDH 5.7.0; I just didn't spot it in time.

There's a pull request on GitHub which appears to support SSL (but no form of Kerberos client authentication), if anyone feels brave. Judging by the comments on the Spark ticket, this feature won't be merged until after the Spark 2.0.0 release at the earliest, and probably not until 2.1.0. Back to the drawing board for me!
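For context, here is a minimal sketch of the kind of client settings an SSL connection through the new consumer API would need. The property keys are standard Kafka client configuration names; the helper names, broker address, file paths and passwords are placeholders invented for illustration, and nothing here makes the old spark-streaming-kafka client work with SSL.

```python
def ssl_consumer_properties(bootstrap_servers, truststore, truststore_password,
                            keystore, keystore_password):
    """Build consumer properties for an SSL connection with client authentication.

    Hypothetical helper; only the property keys are real Kafka settings.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        # Selects the SSL listener; SASL_PLAINTEXT and SASL_SSL are the other
        # secured protocols mentioned in the post.
        "security.protocol": "SSL",
        # Truststore: lets the client verify the brokers' certificates.
        "ssl.truststore.location": truststore,
        "ssl.truststore.password": truststore_password,
        # Keystore: the client's own certificate, needed when brokers
        # require client authentication.
        "ssl.keystore.location": keystore,
        "ssl.keystore.password": keystore_password,
    }


def to_properties_text(props):
    """Render the settings in key=value form, one per line, sorted by key."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


# Placeholder values, assumed for this example only.
props = ssl_consumer_properties(
    "broker1.example.com:9093",
    "/path/to/client.truststore.jks", "changeit",
    "/path/to/client.keystore.jks", "changeit",
)
print(to_properties_text(props))
```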
11-21-2015 05:38 AM
Hi, I am trying to schedule a Spark job using cron. I have written a shell script, and it runs fine when I execute it from the terminal. However, when cron runs the same script, it fails with an "insufficient memory to start JVM thread" error. The problem only occurs under cron, never from the terminal. Could you suggest what might be wrong?
01-07-2015 11:59 PM
Ah, this got me on the right track. I've switched the 'hostname' command for 'hostname -f', redeployed the CSD jar, and it works now.

I think I'm still a bit confused about how this works, though. CM deploys a fresh config and supplies the principal name to this initialisation script (presumably it has the right one, as CM is aware of the keytabs it's distributing). A regex is then applied to that principal to remove the hostname and replace it with the template value _HOST. Accumulo starts and replaces _HOST with the hostname again. I guess the problem here is that 'hostname' returns an unqualified hostname, whereas CM knows the server running the service by its FQDN.

Maybe I'm missing something, but it seems like all the switching between hostnames and template values is what's causing the problem here. Thanks for pointing me in the right direction.
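The mismatch described above can be sketched in a few lines. This is only an illustration of the idea, not the actual CSD initialisation script: the function name `to_host_template`, the principal and the host names are all made up, and the real script's regex may differ.

```python
import re


def to_host_template(principal: str, hostname: str) -> str:
    """Replace the host part of a service/host@REALM principal with _HOST.

    Hypothetical stand-in for the substitution the script performs.
    """
    # Match the host only when it is the complete instance component,
    # that is, everything between "/" and "@".
    pattern = "/" + re.escape(hostname) + "@"
    return re.sub(pattern, "/_HOST@", principal, count=1)


# Assumed example principal, as CM might supply it (using the FQDN).
principal = "accumulo/node1.example.com@EXAMPLE.COM"

# With the unqualified name (what plain 'hostname' might return),
# nothing matches and the principal is left untouched.
short_result = to_host_template(principal, "node1")

# With the fully qualified name (what 'hostname -f' returns),
# the substitution succeeds.
fqdn_result = to_host_template(principal, "node1.example.com")

print(short_result)
print(fqdn_result)
```

Under these assumptions, the short name leaves the principal unchanged while the FQDN yields the `_HOST` template, which is consistent with the fix of switching to 'hostname -f'.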