Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1669 | 09-11-2019 10:19 AM |
| | 9197 | 11-26-2018 07:04 PM |
| | 2387 | 11-14-2018 12:10 PM |
| | 5089 | 11-14-2018 12:09 PM |
| | 3051 | 11-12-2018 01:19 PM |
08-10-2018
12:58 PM
@Girish Khole How did you install the Spark client on the node that is not part of the cluster? There are a few considerations when the node is not managed by Ambari:
1. The Spark client version should be the same as the one in the cluster.
2. You need to make sure all the configuration files for HDFS/YARN/Hive are copied from the cluster.
3. When you launch the client against a Spark standalone master, the application does not run on the cluster; it runs in standalone mode. To test against the cluster you need to use --master yarn (which can be used with either client or cluster deploy mode).
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
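As an illustration (not from the original thread), here is a minimal Scala sketch, assuming HADOOP_CONF_DIR/YARN_CONF_DIR on the external node point at the configuration files copied from the cluster. It builds a session with master "yarn" in client deploy mode and runs a trivial job to confirm the external client really talks to the cluster:
import org.apache.spark.sql.SparkSession
// Minimal smoke test for an external Spark client, assuming the cluster's
// HDFS/YARN configuration files have been copied and HADOOP_CONF_DIR points at them.
val spark = SparkSession.builder()
  .appName("ExternalClientSmokeTest")
  .master("yarn")  // client deploy mode; cluster deploy mode requires spark-submit
  .getOrCreate()
// If this count returns, executors were actually launched on the cluster's NodeManagers.
println(spark.range(0L, 1000L).count())
spark.stop()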
08-10-2018
12:15 PM
@Mark Sure, here is the link to the PySpark network word count example: https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py
HTH
08-09-2018
08:15 PM
@Harald Berghoff I checked the docker-deploy script for HDP 2.6.5, and we do a docker pull of hortonworks/sandbox-hdp from Docker Hub. However, the deploy script does more than just that. Having said that, you might want to wait until the sandbox for 3.0 is added to the Hortonworks portal along with the corresponding scripts & instructions.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
08-09-2018
08:03 PM
1 Kudo
@Matt Krueger you should look at the Spark history server / Spark UI to see the actual environment settings being used. Setting executor cores to 3 means each executor will run 3 concurrent task threads. AFAIK this might not be the same as the YARN vcore concept.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
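To illustrate, here is a minimal sketch (not the asker's exact job; the executor count is hypothetical) that sets the value explicitly and prints what actually took effect. You can cross-check the same value in the Spark UI / history server under the Environment tab:
import org.apache.spark.sql.SparkSession
// Each executor JVM runs up to spark.executor.cores concurrent task threads.
val spark = SparkSession.builder()
  .appName("ExecutorCoresCheck")
  .config("spark.executor.cores", "3")      // 3 concurrent tasks per executor
  .config("spark.executor.instances", "4")  // hypothetical executor count
  .getOrCreate()
// Print the effective setting; it should match the Environment tab in the Spark UI.
println(spark.conf.get("spark.executor.cores"))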
08-09-2018
06:16 PM
1 Kudo
@Harun Zengin
By default Livy will launch an application on YARN, and usually the default master is set to yarn-cluster. This means an authenticated user could push code that could potentially run on any cluster worker node that has a running NodeManager. These containers are launched by YARN, and the container process is always owned by the calling user (in this case the user that made the request to Livy), so the container process runs as the caller and only has access to that user's authorized resources. There is no way such a user could read a keytab from the /etc/security/keytab directory. The same applies to HDFS data: unless the user has permissions on the files, they won't be able to access them. This is also true without Livy, since a user could use the hdfs/webhdfs clients to read data directly. At the same time, there are other ways to push application code that are not limited to Livy, such as spark-submit/spark-shell, which work in a similar fashion, except those tend to be used from edge nodes that only a few users have access to. Having said all that, if you would like to restrict access to Livy and not rely only on authentication, look at the Knox, Livy and Ranger integration. That way you can reduce the number of users that can use Livy's REST API by authorizing only specific groups/users.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
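As a quick illustration of the point about container ownership (a minimal sketch you could run inside a Livy session; not part of the original answer), the Hadoop UserGroupInformation API reports which user the session is actually running as:
import org.apache.hadoop.security.UserGroupInformation
// The YARN containers backing this Livy session run as the caller, so HDFS
// permission checks apply to that user, not to a privileged service account.
val effectiveUser = UserGroupInformation.getCurrentUser.getShortUserName
println(s"This Livy session is running as: $effectiveUser")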
08-09-2018
05:44 PM
@Takefumi Oide No, but you can have multiple HiveServer2 processes configured with different authentication mechanisms. Let's say you need all the auth mechanisms listed above: you add one HiveServer2 process and configure it with SIMPLE+LDAP, and then add another HiveServer2 process and configure it with LDAP+Kerberos. With Ambari this can be done using config groups.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
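For illustration only, a minimal Scala JDBC sketch with hypothetical hostnames (hs2-ldap / hs2-krb), assuming the Hive JDBC driver is on the classpath: clients simply connect to whichever HiveServer2 instance is configured with the mechanism they need.
import java.sql.DriverManager
// Hypothetical LDAP-authenticated HiveServer2 instance: user name and password
// are passed on the connection call.
val ldapConn = DriverManager.getConnection(
  "jdbc:hive2://hs2-ldap.example.com:10000/default", "ldapUser", "ldapPassword")
// Hypothetical Kerberos-authenticated instance: the server principal goes in the
// URL and the client credentials come from a prior kinit.
val krbConn = DriverManager.getConnection(
  "jdbc:hive2://hs2-krb.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM")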
08-09-2018
05:41 PM
@Sudharsan Ganeshkumar if my answer has helped you, please remember to login and mark it as accepted.
08-09-2018
01:51 PM
@Is Ta
null means the conversion failed. I think this is because your initial creationDate is actually a timestamp, not a date. The following code is Scala Spark, as I'm not as used to Java Spark; hopefully you can adapt it to Java:
// dataframe is the original DataFrame containing the creationDate column
import org.apache.spark.sql.functions.{to_timestamp, date_format}
import spark.implicits._  // provides the $"column" syntax (auto-imported in spark-shell)
val ds = dataframe.withColumn("timestamp", to_timestamp($"creationDate", "dd/MM/yyyy HH:mm:ss"))
val result = ds.withColumn("date_formatted", date_format($"timestamp", "dd/MM/yyyy HH:mm:ss"))
result.show()
This is an example of the output:
+-------------------+-------------------+-------------------+
|         input_date|          timestamp|     date_formatted|
+-------------------+-------------------+-------------------+
|15/06/2018 09:15:28|2018-06-15 09:15:28|15/06/2018 09:15:28|
|03/06/1982 09:15:28|1982-06-03 09:15:28|03/06/1982 09:15:28|
+-------------------+-------------------+-------------------+
This is also saved correctly when you write to a file, since the date_formatted column is actually a string.
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
08-08-2018
03:34 PM
@harish you can use WebHDFS to save the necessary files to HDFS, then use the Oozie REST API over Knox to run your Oozie workflows: https://oozie.apache.org/docs/4.0.1/WebServicesAPI.html
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
08-07-2018
01:13 PM
@Sudharsan Ganeshkumar
Out of the box Spark provides fileStream. You can read more here: https://spark.apache.org/docs/latest/streaming-programming-guide.html
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
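For example, here is a minimal sketch using the text flavour of the file stream (the directory path and batch interval are placeholders): Spark Streaming picks up any new files that land in the monitored directory after the stream starts.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Watch an HDFS directory and count words in files created after the stream starts.
val conf = new SparkConf().setAppName("FileStreamWordCount")
val ssc = new StreamingContext(conf, Seconds(30))
val lines = ssc.textFileStream("hdfs:///tmp/incoming")  // placeholder directory
val counts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
counts.print()
ssc.start()
ssc.awaitTermination()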