Member since: 08-05-2018
Posts: 73
Kudos Received: 0
Solutions: 0
01-02-2024
03:09 AM
To save a DataFrame as a text file in PySpark, either use the DataFrame writer or convert the DataFrame to an RDD first.

Using the DataFrame writer (the text format expects a DataFrame with a single string column):

df.write.format("text").save("path_to_output_directory")

Converting to an RDD and then using saveAsTextFile (str(row) writes each record in its Row(...) representation):

rdd = df.rdd.map(lambda row: str(row))
rdd.saveAsTextFile("path_to_output_directory")
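If plain delimited lines are preferred over the Row(...) representation, the RDD approach can be adapted along these lines. This is only a sketch: row_to_line and save_rows_as_text are hypothetical helper names, and the tab separator and the handling of None as an empty field are assumptions, not something stated above.

```python
def row_to_line(row, sep="\t"):
    """Join one row's fields into a single delimited line; None becomes an empty field."""
    return sep.join("" if value is None else str(value) for value in row)


def save_rows_as_text(df, path, sep="\t"):
    """Write each row of a PySpark DataFrame as one delimited line under `path`."""
    # Row objects are tuples, so row_to_line can iterate over their fields.
    df.rdd.map(lambda row: row_to_line(row, sep)).saveAsTextFile(path)


# The formatter can be checked without a Spark cluster:
print(row_to_line((1, "alice", None)))
```

With an existing DataFrame, the call would be save_rows_as_text(df, "path_to_output_directory"). Note that saveAsTextFile writes a directory of part files and fails if the path already exists.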
06-15-2020
12:07 AM
@shubh As this is an older post that was marked solved in 2018, you would have a better chance of receiving a resolution by starting a new thread. A new thread will also give you the opportunity to provide details specific to your environment, which could help others give a more accurate answer to your question.
10-21-2018
06:28 AM
Hi Shu, thanks for responding. The solution you provided appears a little difficult for something that I thought would be relatively simple. I will try your solution and let you know how I get on. In the meantime, have you seen the solution provided here: https://forums.databricks.com/questions/2848/how-do-i-create-a-single-csv-file-from-multiple-pa.html?childToView=12091
10-19-2018
10:13 AM
Hi guys, I'm sorry if the question seems a little confusing. Basically, I would just like to be able to save to a single file and the file to be overwritten each time it is saved. Thanks
08-13-2018
07:26 PM
@Carlton Patterson It looks like you have accepted another comment. I've made this reply a comment, and this should be the correct one to accept, as it helped resolve your issue. 🙂
02-02-2018
09:58 PM
I assume this returns a limited result set, though, for large tables?
06-14-2018
07:26 AM
This could be because actual data is being parsed in place of the header, assuming your first row is the header and the data starts on the second row. In that case the data (int, string) can't be parsed as a header (string). Try changing the property to ("skip.header.line.count"="1"); Hope this helps.
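As a rough illustration of what skipping one header line does, here is a plain Python analogy (it is not Hive code). The sample data and the read_rows helper are made up for the example.

```python
import csv
import io
from itertools import islice


def read_rows(text, skip_header_lines=0):
    """Parse CSV text and drop the first `skip_header_lines` lines."""
    reader = csv.reader(io.StringIO(text))
    return list(islice(reader, skip_header_lines, None))


# Hypothetical file contents: one header line followed by data.
sample = "id,name\n1,alice\n2,bob\n"

# With the header skipped, the first column converts cleanly to int.
rows = read_rows(sample, skip_header_lines=1)
ids = [int(row[0]) for row in rows]
print(ids)
```

Calling read_rows(sample) without skipping anything returns the header as the first row, and int("id") would then raise a ValueError.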
02-01-2018
04:26 PM
Hi rtrivedi, I added the additional code as suggested, but I get the following error: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 2:0 cannot recognize input near 'set' 'hive' '.' in statement
01-28-2018
03:41 AM
Take a look at this guide: https://cwiki.apache.org/confluence/display/hive/languagemanual+dml#LanguageManualDML-Loadingfilesintotables

You should try either

INSERT INTO TABLE ${hiveconf:inputtable} SELECT * FROM datafactory7 LIMIT 14;

or

LOAD DATA INPATH '<HDFS PATH WHERE FILES LOCATED>' INTO TABLE ${hiveconf:inputtable};

Note that the table name should not be wrapped in single quotes.
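Both statements rely on ${hiveconf:inputtable} being given a value when the script is launched. A minimal sketch of building such an invocation is below; it assumes the hive command-line client is available, and the script name load_table.hql and table name my_table are hypothetical. It only assembles and prints the command, it does not run it.

```python
import shlex


def build_hive_command(script_path, variables):
    """Build a hive CLI invocation that passes each variable via --hiveconf."""
    command = ["hive"]
    for name, value in variables.items():
        command += ["--hiveconf", f"{name}={value}"]
    command += ["-f", script_path]
    return command


# Hypothetical script and table names, for illustration only.
command = build_hive_command("load_table.hql", {"inputtable": "my_table"})
print(shlex.join(command))
```

Inside load_table.hql, every ${hiveconf:inputtable} would then be replaced with my_table before the statement is compiled.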
02-01-2018
10:11 PM
Hi Jay, can you please let me know why I'm suddenly not able to access the Sandbox on port 2222? I was able before, but now I can't.