About barlow

barlow · ‎08-13-2018

The syntax error is with 'currentdate'

barlow · ‎08-13-2018

Sandeep,

Thanks for reaching out. I'm getting the following error from the import statement:

      File "<ipython-input-7-3dab170099f6>", line 3
        import datetime currentdate = datetime.datetime.now().strftime("%Y-%m-%d")
                                  ^
    SyntaxError: invalid syntax
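The traceback appears to show `import datetime` and the `currentdate` assignment on the same source line. If that is the cause (an assumption based only on the traceback text), Python rejects the line because the `import` statement has to end before the next statement starts. A minimal sketch; the helper `is_valid_python` is hypothetical and exists only to demonstrate the difference:

```python
import datetime

# The line as it appears in the traceback: two statements run together.
one_line = 'import datetime currentdate = datetime.datetime.now().strftime("%Y-%m-%d")'
# The same two statements, each on its own line.
two_lines = 'import datetime\ncurrentdate = datetime.datetime.now().strftime("%Y-%m-%d")'


def is_valid_python(source):
    """Return True if the source text compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Written as separate statements, the assignment works as intended.
currentdate = datetime.datetime.now().strftime("%Y-%m-%d")
```

Separating the two statements with a newline (or a semicolon) should be enough to clear the error.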

barlow · ‎08-13-2018

Hello community,

I have created the following pyspark query:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName('ops').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/HumanResources_vEmployeeDepartment.csv',inferSchema=True,header=True)
    df.createOrReplaceTempView('HumanResources_vEmployeeDepartment')
    counts = spark.sql("""SELECT FirstName
                          ,LastName
                          ,JobTitle
                          FROM HumanResources_vEmployeeDepartment
                          ORDER BY FirstName, LastName DESC""")
    counts.coalesce(1).write.csv("/home/packt/Downloads/myresults3.csv")

I would like to add the current date and time to the file called myresults3. I think the code would look something like the following:

    counts.coalesce(1).write.csvCONCAT("/home/packt/Downloads/'myresults3'-CURRENTDATE.csv")

I'm sure I'm way off the mark with the above attempt, but I'm sure you can see what I'm trying to achieve. Any help will be appreciated.

Cheers
Carlton
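One way to get a date and time into the output name is to build the path as an ordinary Python string before handing it to `write.csv`. The sketch below is only an illustration: the helper `dated_path`, its parameters, and the timestamp layout are assumptions, not anything from the thread. Colons are left out of the timestamp because they can be awkward in file paths.

```python
import datetime


def dated_path(directory, stem, now=None, suffix=".csv"):
    """Build '<directory>/<stem>-<timestamp><suffix>' (hypothetical helper)."""
    now = now or datetime.datetime.now()
    # Date and time, using '-' and '_' only so the name stays path-friendly.
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return "{}/{}-{}{}".format(directory, stem, stamp, suffix)


output_path = dated_path("/home/packt/Downloads", "myresults3")

# With the DataFrame from the post, the write would then be:
# counts.coalesce(1).write.csv(output_path)
```

Passing a fixed `now` value makes the helper easy to check without depending on the clock.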

barlow · ‎08-06-2018

Is there a way to get the results with the header info?
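For reference, the DataFrame CSV writer in PySpark accepts a `header` argument, so a header row can be requested at write time. A minimal sketch, assuming `df` is a Spark DataFrame such as `myresults`; the wrapper name `save_with_header` is hypothetical:

```python
def save_with_header(df, path):
    """Write a Spark DataFrame as CSV with the column names as the first row."""
    # coalesce(1) mirrors the posts above: it produces a single part file.
    df.coalesce(1).write.csv(path, header=True)


# Usage with the DataFrame from the thread (requires a running Spark session):
# save_with_header(myresults, "/home/packt/Downloads/myresults_with_header")
```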

barlow · ‎08-06-2018

Felix, thank you so much. It worked like a dream

barlow · ‎08-06-2018

Hello community,

The pyspark query below produces the following output: (output not preserved)

The pyspark query is as follows:

    #%%
    import findspark
    findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName('ops').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/HumanResources_vEmployeeDepartment.csv',inferSchema=True,header=True)
    df.createOrReplaceTempView('HumanResources_vEmployeeDepartment')
    myresults = spark.sql("""SELECT FirstName
                             ,LastName
                             ,JobTitle
                             FROM HumanResources_vEmployeeDepartment
                             ORDER BY FirstName, LastName DESC""")
    myresults.show()

Can someone show me how to save the results to a text / csv file (or any file, please)?

Thanks
Carlton
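A point worth knowing when saving with `DataFrame.write.csv(path)`: Spark treats `path` as a directory and writes the data into one or more `part-*` files inside it, alongside marker files such as `_SUCCESS`. The sketch below shows one way to locate the data files afterwards on a local filesystem; the helper name `find_part_files` and the example directory are assumptions.

```python
import glob
import os


def find_part_files(output_dir):
    """Return the sorted data files Spark wrote under output_dir."""
    # Marker files (_SUCCESS) and hidden checksum files do not match 'part-*'.
    return sorted(glob.glob(os.path.join(output_dir, "part-*")))


# After a write such as:
# myresults.coalesce(1).write.csv("/home/packt/Downloads/myresults")
# the single CSV file could be located with:
# find_part_files("/home/packt/Downloads/myresults")
```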

barlow · ‎08-05-2018

OK, as I'm not getting much assistance with my original question, I thought I would try to figure out the problem myself. So I rewrote the pyspark.sql query as follows:

    #%%
    import findspark
    findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName('ops').getOrCreate()
    df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/Person_Person.csv',inferSchema=True,header=True)
    df.createOrReplaceTempView('Person_Person')
    myresults = spark.sql("""SELECT PersonType
                             ,COUNT(PersonType) AS `Person Count`
                             FROM Person_Person
                             GROUP BY PersonType""")
    myresults.collect()
    result = myresults.collect()
    result
    result.saveAsTextFile("test")

However, I'm now getting the following error message:

    AttributeError: 'list' object has no attribute 'saveAsTextFile'

I think this could be an easier situation to help resolve. So, if someone could help resolve this issue, that would be most appreciated.

Thanks
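The error arises because `collect()` returns a plain Python list of rows on the driver, and a list has no Spark save methods. One option at that point is to write the rows out with the standard `csv` module. This is a sketch under assumptions: Python 3, a result small enough to fit in driver memory, and a hypothetical helper named `write_rows`.

```python
import csv


def write_rows(rows, columns, path):
    """Write already-collected rows to a local CSV file with a header line."""
    # newline="" lets the csv module control line endings (Python 3).
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(columns)
        writer.writerows(rows)


# Usage with the DataFrame from the post (Row objects behave like tuples):
# write_rows(myresults.collect(), myresults.columns, "test.csv")
```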

barlow · ‎08-05-2018

Hello community,

My first post here, so please let me know if I'm not following protocol. I have written a pyspark.sql query as shown below. I would like the query results to be sent to a text file, but I get the error:

    AttributeError: 'DataFrame' object has no attribute 'saveAsTextFile'

Can someone take a look at the code and let me know where I'm going wrong:

    #%%
    import findspark
    findspark.init('/home/packt/spark-2.1.0-bin-hadoop2.7')
    from pyspark.sql import SparkSession
    def main():
        spark = SparkSession.builder.appName('aggs').getOrCreate()
        df = spark.read.csv('/home/packt/Downloads/Spark_DataFrames/sales_info.csv',inferSchema=True,header=True)
        df.createOrReplaceTempView('sales_info')
        example8 = spark.sql("""SELECT * FROM sales_info ORDER BY Sales DESC""")
        example8.saveAsTextFile("juyfd")
    main()

Any help would be appreciated.

Carlton
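For context on this error: `saveAsTextFile` is a method of Spark RDDs rather than DataFrames, and a DataFrame exposes its underlying RDD through the `rdd` attribute. A minimal sketch of going through that attribute; the helper names `format_row` and `save_as_text` are hypothetical, and the comma-joined line format is just one possible choice:

```python
def format_row(row):
    """Turn one row (a tuple-like Row) into a comma-separated text line."""
    return ",".join(str(value) for value in row)


def save_as_text(df, path):
    """Save a Spark DataFrame as text files via its underlying RDD."""
    # df.rdd is an RDD of Row objects; saveAsTextFile is defined on RDDs.
    df.rdd.map(format_row).saveAsTextFile(path)


# Usage with the DataFrame from the post (requires a running Spark session):
# save_as_text(example8, "juyfd")
```

Simple string joining does not quote values that contain commas, so the DataFrame's own `write.csv` is usually the safer choice when the goal is a CSV file.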

barlow · ‎02-02-2018

Sankaru, I just realised that ..

barlow · ‎02-01-2018

Hi Jay, can you please let me know why I'm suddenly not able to access the Sandbox on port 2222? I was able before, but now I can't.

Last Visited	‎08-14-2018 02:05 PM

Member Since	‎08-05-2018 02:01 AM
Posts	73

Cloudera Community

Re: How to concatenate a date to a filename in pys...

How to concatenate a date to a filename in pyspark

Re: How to save all the output of pyspark sql quer...

How to save all the output of pyspark sql query in...

Re: Pyspark issue AttributeError: 'DataFrame' obje...

Pyspark issue AttributeError: 'DataFrame' object h...

Re: How to store Query Results to Local Drive

Re: How to access HDFS and Hive database.