Member since 05-18-2022 | 11 Posts | 0 Kudos Received | 0 Solutions
12-12-2022
05:06 AM
@wbivp If you are using Kerberos authentication, you also need to provide "kerberos_service_name". Try setting kerberos_service_name: impala
08-31-2022
10:53 PM
Hi @Yosieam, please avoid calling read_file_log.collect(). It brings the whole dataset to the driver, so the driver needs enough memory to hold all of it. Please check the modified code:

import re

# flatMap (not map) so that each "time=" fragment becomes its own string record
move_to_rdd = sc.textFile("datalog2.log").flatMap(lambda row: row.split("time=")).filter(lambda x: x != "")
# Replace tabs and newlines with spaces, then collapse repeated spaces
ReSymbol = move_to_rdd.map(lambda x: re.sub(r'\t', ' ', x)).map(lambda x: re.sub(r'\n', ' ', x)).map(lambda x: re.sub(r' +', ' ', x))
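To see what the split and cleanup steps do to each record, here is a minimal plain-Python sketch that imitates the RDD pipeline without Spark. The sample lines and the helper name clean are made up for illustration and are not taken from the original log. In Spark, map with split would yield one list per input line, while flatMap yields the individual fragments, which is what the regex substitutions need.

```python
import re
from itertools import chain

# Hypothetical sample lines standing in for datalog2.log
lines = [
    "time=10:00\tGET  /a time=10:01\tPOST /b",
    "time=10:02 GET   /c",
]

# map-style result: one list of fragments per input line
mapped = [line.split("time=") for line in lines]

# flatMap-style result: fragments flattened into one sequence,
# with the empty strings produced by a leading "time=" dropped
flat = [frag for frag in chain.from_iterable(mapped) if frag != ""]

def clean(fragment):
    """Replace tabs/newlines with spaces, then collapse repeated spaces."""
    fragment = re.sub(r"[\t\n]", " ", fragment)
    return re.sub(r" +", " ", fragment)

cleaned = [clean(frag) for frag in flat]

for record in cleaned:
    print(repr(record))
```

Because every step works on one fragment at a time, the same logic can run on the executors and nothing has to be collected on the driver.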
08-31-2022
09:20 PM
Hi @Yosieam, thanks for sharing the code. You forgot to share the spark-submit/pyspark command. Please check how much executor and driver memory is passed to spark-submit. Could you also confirm whether the file is on the local file system or in HDFS?