Hi,
I want to submit a PySpark application to Livy through the REST API to invoke the Hive Warehouse Connector. Based on this answer in the community, I created a test1.json as follows:
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"files": ["test1.py"]
}
I use GetFile to read the JSON file and call InvokeHTTP to POST the request, but I get an error in the HTTP response.
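For reference, the equivalent POST outside NiFi, as a minimal Python sketch (livy-host is a placeholder hostname; 8998 is Livy's default port, and batch submissions go to /batches):

import json
import requests  # assumes the requests library is available

# Livy batch submissions are a POST to /batches with a JSON body
livy_url = "http://livy-host:8998/batches"  # placeholder host, default port
headers = {"Content-Type": "application/json"}

# Load the same payload that NiFi would send
with open("test1.json") as f:
    payload = json.load(f)

resp = requests.post(livy_url, headers=headers, data=json.dumps(payload))
print(resp.status_code, resp.text)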
Created 01-10-2020 08:00 PM
I found that there is a 'file' option in the REST API and changed the key to 'file':
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"file": ["test1.py"]
}
Now I get the error "Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 224] (through reference chain: org.apache.livy.server.batch.CreateBatchRequest[\"file\"])".
I think the REST API does not expect an array for the 'file' field. Can you let me know how this can be fixed?
Created 01-11-2020 05:15 AM
Indeed, the problem is with the input values you have for the zip and the .py file. Change it as below and try; it should work.
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["test1.py"]
"archives": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"]
}
You sort of interchanged them. Below is a comparison between the spark-submit options and their equivalents in the Livy REST JSON protocol.
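spark-submit option              Livy batch JSON field
application file (positional)    file
--class                          className
--jars                           jars
--py-files                       pyFiles
--files                          files
--archives                       archives
--driver-memory                  driverMemory
--executor-memory                executorMemory
--num-executors                  numExecutors
--queue                          queue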
Please revert
Created 01-13-2020 10:32 PM
@Shelton No, that does not work. I created a simple PySpark file which does not require the Hive Warehouse Connector, and I tried test1.py in pyFiles as you suggested:
{
"pyFiles": ["test1.py"]
}
But I get the error "requirement failed: File is required." with HTTP response 400: Bad Request.
Could you let me know what I am missing? I have been stuck here for days.
Thanks
Ganesh
Created on 01-14-2020 01:11 AM - edited 01-14-2020 01:42 AM
Isn't the PySpark file expected in HDFS when using YARN instead of local? What is the configuration of your livy.conf? If you don't have it in place, do the following:
{
"pyFiles": ["/user/tvganesh/test1.py"]
}
Copy the template file and rename it by stripping off .template from livy.conf.template. Then make sure the following configurations are present in it, and that a forward slash is present at the end of the whitelist path. For .py files, add test1.py to HDFS and point to the HDFS location rather than the local file system, since the local file won't be visible to Livy.
Go to the Livy conf directory and copy the template:
cd /usr/hdp/3.1.0.0-78/etc/livy2/conf.dist/conf
cp livy.conf.template livy.conf
Check the below parameters and set them accordingly
# What spark master Livy sessions should use.
livy.spark.master = local
# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode =
# Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected
# on user request and then livy server classpath automatically.
livy.repl.enable-hive-context =
# List of local directories from where files are allowed to be added to user sessions. By
# default it's empty, meaning users can only reference remote URIs when starting their
# sessions.
livy.file.local-dir-whitelist =
For local execution:
livy.spark.master = local
livy.file.local-dir-whitelist = /home/tvganesh/    (local path)
For YARN execution:
livy.spark.master = yarn
livy.file.local-dir-whitelist = /user/tvganesh/    (HDFS path)
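For the YARN case, the script has to be staged in HDFS first; a quick sketch with the paths used above (adjust to your own user directory):

hdfs dfs -mkdir -p /user/tvganesh
hdfs dfs -put test1.py /user/tvganesh/test1.py
hdfs dfs -ls /user/tvganesh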
Please do that and revert
Created 01-14-2020 07:35 AM
@Shelton Thanks for your response.
I found that for basic Hive access, the following works:
{
"file":"hdfs-path/test1.py"
}
For Hive LLAP, the JSON that works is:
{
"jars": ["<path-to-jar>/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["<path-to-zip>/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"file": "<path-to-file>/test3.py"
}
Interestingly, when I put the zip in the "archives" field it gave an error; it works in "pyFiles", though.
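For completeness, a minimal sketch of what a test3.py driving the connector can look like (the table name is a placeholder, and this assumes the pyspark_llap module shipped in the zip above, plus HWC connection settings already configured in Spark/Livy):

from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession  # provided by pyspark_hwc-*.zip

# Build the Spark session; HWC settings such as the Hive JDBC URL are
# assumed to come from spark-defaults / the Livy session conf.
spark = SparkSession.builder.appName("hwc-test").getOrCreate()

# Open a Hive Warehouse session on top of the Spark session
hive = HiveWarehouseSession.session(spark).build()

# Run a query through LLAP; "sample_table" is a placeholder table name
hive.executeQuery("SELECT * FROM sample_table LIMIT 10").show()

spark.stop()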
Created 01-14-2020 07:42 AM
Great, it worked out for you. If you think my answer helped resolve the issue, then please accept it to close the thread.
Happy hadooping.