
Error while submitting PySpark Application through Livy REST API

Explorer

Hi,

I want to submit a PySpark application to Livy through the REST API to invoke the Hive Warehouse Connector. Based on this answer in the community:

https://community.cloudera.com/t5/Community-Articles/How-to-Submit-Spark-Application-through-Livy-RE...

I created test1.json as follows:

{
  "jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
  "pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
  "files": ["test1.py"]
}

 

I use GetFile to pick up the JSON file and InvokeHTTP to POST the request, but I get the following error in the HTTP response:

invokehttp.response.body:
"requirement failed: File is required."
 
I am not sure whether the "files" field should list the PySpark script, or whether the script should go under "pyFiles". I tried the latter as well, but I get the same error. Please let me know what I am missing.
 
Thanks
Ganesh

 


Explorer

I found that there is a "file" option in the REST API and changed the field to "file":

 

{
  "jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
  "pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
  "file": ["test1.py"]
}

Now I get the error:

"Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 224] (through reference chain: org.apache.livy.server.batch.CreateBatchRequest[\"file\"

 

I think the REST API does not expect a .py file for the 'file' field. Can you let me know how this can be fixed?
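Looking at the error again, it seems CreateBatchRequest declares "file" as a plain string, so the array value is what breaks deserialization; presumably it wants something like:

{
  "file": "test1.py"
}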

 

Master Mentor

@TVGanesh 

Indeed, the problem is with the input values you have for the zip and the .py file. Can you change it as below and try? It should work.

 

{
  "jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
  "pyFiles": ["test1.py"],
  "archives": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"]
}

You sort of interchanged them. Below is a comparison between the spark-submit command and its equivalent in the Livy REST JSON protocol.

[Image: livy2.PNG, comparing spark-submit options to Livy REST JSON fields]
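
In short, the mapping is roughly:

spark-submit option      Livy JSON field
<application file>       file
--class                  className
--jars                   jars
--py-files               pyFiles
--files                  files
--archives               archives
--driver-memory          driverMemory
--executor-memory        executorMemory
--num-executors          numExecutors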

Please  revert


Explorer

@Shelton No, that does not work. I created a simple PySpark file which does not require the Hive Warehouse Connector and tried test1.py in "pyFiles" as you suggested:

{
  "pyFiles": ["test1.py"]
}

 

But I get the error "requirement failed: File is required." with HTTP response 400: Bad Request.

Could you let me know what I am missing? I have been stuck here for days.

 

Thanks

Ganesh

 

 

 



Master Mentor

@TVGanesh 

Isn't the PySpark file expected in HDFS if using YARN instead of local? What is the configuration of your livy.conf? If you don't have it in place, do the following.

{
  "pyFiles": ["/user/tvganesh/test1.py"]
}

Copy the template file and rename it by stripping off the .template suffix from livy.conf.template. Then make sure the following configurations are present in it, and that a trailing forward slash is present at the end of the whitelist path. For the .py file, add test1.py to HDFS and point to the HDFS location rather than the local file system, since the local copy won't be visible to Livy.

Go to the Livy conf directory and copy the template:

cd /usr/hdp/3.1.0.0-78/etc/livy2/conf.dist/conf
cp livy.conf.template livy.conf

Check the parameters below and set them accordingly:

# What spark master Livy sessions should use.
livy.spark.master = local

# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode =

# Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected
# on user request and then livy server classpath automatically.
livy.repl.enable-hive-context =

# List of local directories from where files are allowed to be added to user sessions. By
# default it's empty, meaning users can only reference remote URIs when starting their
# sessions.
livy.file.local-dir-whitelist =

 

For local execution:

livy.spark.master = local
livy.file.local-dir-whitelist = /home/tvganesh/   (local path)

For YARN execution:

livy.spark.master = yarn
livy.file.local-dir-whitelist = /user/tvganesh/   (HDFS path)
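
Since in YARN mode Livy resolves the script in HDFS, upload it there first, for example (destination matching the whitelist above):

hdfs dfs -put test1.py /user/tvganesh/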

 

Please do that and revert

 

 

Explorer

@Shelton Thanks for your response.

 

I found that for basic Hive access the following works:

{
  "file": "hdfs-path/test1.py"
}

 

For Hive LLAP, the JSON that works is:

{
  "jars": ["<path-to-jar>/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
  "pyFiles": ["<path-to-zip>/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
  "file": "<path-to-file>/test3.py"
}

Interestingly, when I put the zip in the "archives" field it gave an error; it works under "pyFiles", though.
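
For anyone else who lands here, a minimal test3.py along these lines is what I used (a sketch; it assumes the HWC properties such as spark.sql.hive.hiveserver2.jdbc.url are already set in the Spark/Livy configuration):

from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

# Plain Spark session; the HWC jar and zip come in via the Livy JSON above
spark = SparkSession.builder.appName("hwc-test").getOrCreate()

# Build a Hive Warehouse Connector session for LLAP access
hive = HiveWarehouseSession.session(spark).build()

# Smoke test: list databases through HWC
hive.showDatabases().show()

spark.stop()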

 

Master Mentor

@TVGanesh 

Great, it worked out for you. So if you think my answer helped resolve the issue, then accept it to close the thread.

 

Happy hadooping.