Member since
09-08-2025
4
Posts
1
Kudos Received
0
Solutions
09-08-2025
09:01 PM
1 Kudo
Thank you for the response.
09-08-2025
09:00 PM
I have a PySpark job that reads an HDFS path in its native format (text, ORC, or Parquet) and writes it in Parquet format to an Ozone path. The data is huge. 1) How do I size resources for the PySpark job, i.e., the number of cores, number of executors, and memory allocation? 2) Is there a way to read the data from HDFS dynamically, adjusting to its file type? 3) What is the optimal approach for this data movement, and what input mappings should the approach use?
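A minimal sketch of points 1) and 2). The extension-to-format mapping and the sizing rule of thumb (~128 MB per task, 5 cores per executor, ~4 GB per core) are assumptions, not anything stated in the post; tune them for your cluster.

```python
def detect_format(path):
    """Map a file extension to a Spark reader format (assumption:
    format is inferred from the extension)."""
    ext = path.rsplit(".", 1)[-1].lower()
    return {"orc": "orc", "parquet": "parquet",
            "txt": "text", "csv": "csv"}.get(ext, "text")

def size_executors(data_gb, cores_per_executor=5, partition_mb=128):
    """Rough sizing: one task per ~128 MB partition, two task waves
    per core, ~4 GB of executor memory per core (rule of thumb)."""
    partitions = max(1, (data_gb * 1024) // partition_mb)
    executors = max(2, int(partitions // (cores_per_executor * 2)))
    return {"num_executors": executors,
            "executor_cores": cores_per_executor,
            "executor_memory_gb": cores_per_executor * 4}

# Reading and writing (illustration only; requires a live SparkSession):
# df = spark.read.format(detect_format(hdfs_path)).load(hdfs_path)
# df.write.mode("overwrite").parquet(ozone_path)
```

With these assumptions, a 100 GB input would come out at roughly 80 executors with 5 cores and 20 GB each; enabling `spark.dynamicAllocation.enabled` lets Spark scale that number at runtime instead.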
Labels:
09-08-2025
05:04 AM
How can I run spark df.write inside a UDF called from rdd.foreach or rdd.foreachPartition, i.e., use the SparkSession object inside an executor?
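The short answer is that this cannot work: the SparkSession (and df.write) exists only on the driver, so calling it from a UDF or inside rdd.foreachPartition fails on the executors. A common workaround is a plain-Python per-partition writer. The sketch below writes each partition to local JSON files; in practice you would swap in an HDFS/Ozone client, and the function and directory names here are hypothetical.

```python
import json
import os
import tempfile

def write_partition(rows, out_dir):
    """Write one partition's rows as JSON lines. Runs on the executor,
    so it must not touch the SparkSession or any DataFrame API."""
    os.makedirs(out_dir, exist_ok=True)
    # One file per partition; pid keeps names unique within a worker.
    path = os.path.join(out_dir, "part-%d.json" % os.getpid())
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path

# On the driver (illustration only):
# rdd.foreachPartition(lambda it: write_partition(it, "/mnt/out"))
# ...or, if the output can be a DataFrame, keep the write on the
# driver side entirely: df.write.parquet(path)
```

If the goal is simply to persist the data, restructuring the job so that `df.write` stays on the driver is usually simpler and faster than per-partition custom I/O.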
Labels:
- Apache Spark