Member since
11-15-2019
14
Posts
0
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 2684 | 03-04-2020 06:11 AM
| 2319 | 11-15-2019 05:35 AM
03-04-2020
06:11 AM
MattWho, thanks for your comments. After reading your mail I spent a lot of time thinking about it. I saw that 8 threads were created, but there was no performance improvement because all the threads were doing the same thing: each executed the script on all of the files that were unpacked. After your comment that the threads actually operate on the flowfiles, I changed the code so that it accepts a flowfile as one of its inputs and processes it, so that multiple threads work on different flowfiles. This improved the performance significantly: the time taken dropped from 13 minutes to 3-4 minutes. Many thanks for your comments; I now understand how to use concurrent tasks.
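To illustrate the pattern described above (this is an editor's sketch, not the actual script): ExecuteStreamCommand pipes each flowfile's content to the command's stdin and captures stdout as the outgoing flowfile content, so a script that handles exactly one flowfile per invocation lets NiFi's concurrent tasks run several invocations in parallel. The body of `process` here is a hypothetical placeholder for the real per-file work.

```python
import sys


def process(data: bytes) -> bytes:
    """Placeholder for the real per-flowfile work.

    Hypothetical example: uppercase the content.
    """
    return data.upper()


def main() -> None:
    # ExecuteStreamCommand streams one flowfile's content to stdin and
    # takes stdout as the new flowfile content, so this script handles
    # exactly one flowfile per invocation; NiFi's concurrent tasks can
    # then run several invocations in parallel on different flowfiles.
    data = sys.stdin.buffer.read()
    sys.stdout.buffer.write(process(data))


if __name__ == "__main__":
    main()
```

With this shape, raising the processor's concurrent tasks actually divides the work, because each task invokes the script on a different flowfile instead of every task re-scanning all files.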
03-03-2020
05:44 AM
Hi,
I have an ExecuteStreamCommand processor which executes a Python script. This takes a long time to execute, about 5 minutes. So I increased the number of concurrent tasks from 1 to 4 and then 8, but this had no impact on the performance. I have an 8-core Intel i9 Mac with 32 GB RAM. I read that the number of concurrent tasks is typically set to roughly 2 to 4 times the number of cores. Could you let me know why there is no improvement? How can I improve the performance?
Thanks
Ganesh
Labels:
- Apache NiFi
01-14-2020
07:35 AM
@Shelton Thanks for your response. I found that for basic Hive access the following works:
{ "file": "hdfs-path/test1.py" }
For Hive LLAP, the JSON that works is:
{
"jars": ["<path-to-jar>/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["<path-to-zip>/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"file": "<path-to-file>/test3.py"
}
Interestingly, when I put the zip in the "archives" field it gave an error; it works under "pyFiles" though.
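For reference, a sketch of how a batch definition like the one above could be POSTed to Livy's /batches endpoint with Python's standard library; this assumes Livy is reachable at localhost:8998 (an assumption, adjust to your cluster), and the `<path-to-...>` placeholders from the post must be filled in with real paths before submitting.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998/batches"  # assumed Livy host/port

# Payload shape from the post above; the <path-to-...> placeholders
# must be replaced with real paths before an actual submission.
payload = {
    "jars": ["<path-to-jar>/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
    "pyFiles": ["<path-to-zip>/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
    "file": "<path-to-file>/test3.py",
}


def build_batch_request(url: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for Livy's /batches endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_batch_request(LIVY_URL, payload)
# urllib.request.urlopen(req)  # uncomment to actually submit the batch
```

This mirrors what the GetFile → InvokeHTTP flow in the thread does: InvokeHTTP simply POSTs the same JSON body with a Content-Type of application/json.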
01-13-2020
10:32 PM
@Shelton No, that does not work. I created a simple PySpark file which does not require the HiveWarehouse connector and I tried test1.py in pyFiles as you suggested:
{ "pyFiles": ["test1.py"] }
But I get the error "requirement failed: File is required." with HTTP response 400: Bad Request. Could you let me know what I am missing? I have been stuck here for days.
Thanks
Ganesh

@Shelton wrote:
@TVGanesh Indeed the problem is with the input values you have for the zip and .py file. Can you change it like below and try? It should work.
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["test1.py"],
"archives": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"]
}
You sort of interchanged them. Below is a comparison between the spark-submit command and its equivalent in the Livy REST JSON protocol. Please revert.
01-10-2020
08:00 PM
I found that there is a 'file' option in the REST API and changed the option to 'file':
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"file": ["test1.py"]
}
Now I get the error "Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 224] (through reference chain: org.apache.livy.server.batch.CreateBatchRequest[\"file\"". I think the REST API does not expect a .py file for the 'file' field. Can you let me know how this can be fixed?
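An editor's note on the error itself: "Cannot deserialize instance of `java.lang.String` out of START_ARRAY token" suggests that Livy's CreateBatchRequest expects "file" to be a single string while an array was sent (the later post in this thread confirms a plain string works). A minimal sketch of the corrected payload, reusing the jar and zip names from this post:

```python
import json

# "jars", "pyFiles", and "archives" take arrays, but "file" (the main
# application file) appears to be a plain string in Livy's batch request.
payload = {
    "jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
    "pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
    "file": "test1.py",  # a string, not ["test1.py"]
}

body = json.dumps(payload)  # JSON body for InvokeHTTP to POST
```

The START_ARRAY token in the stack trace is the `[` of `["test1.py"]`; dropping the brackets gives the deserializer the String it is asking for.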
01-10-2020
07:24 AM
Hi,
I want to submit a Pyspark application to Livy through REST API to invoke HiveWarehouse Connector. Based on this answer in the community
https://community.cloudera.com/t5/Community-Articles/How-to-Submit-Spark-Application-through-Livy-REST-API/ta-p/247502
I created a test1.json as follows
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"files": ["test1.py"]
}
I use GetFile to pick up the JSON file and call InvokeHTTP to POST the request, and I get the following error in the HTTP response (invokehttp.response.body):
"requirement failed: File is required."
I am not sure whether the "files" field should contain the PySpark script or whether the script should be listed under "pyFiles". I tried that as well but get the same error. Please let me know what I am missing.
Thanks
Ganesh
Labels:
- Apache Hive
- Apache Spark
11-15-2019
05:59 AM
@cjervis - The popup would occur after I logged in and went to any Cloudera tutorial, and then it would keep popping up every 20 seconds. I solved it by deleting my history and going to the tutorials again. Then I saw the popup that required me to put in my details; after that it went away 🙂
11-15-2019
05:35 AM
I was able to fix this by removing cookies and getting the popup where I had to fill in my details, like my name and phone number. Then it went away.
11-15-2019
03:10 AM
Hi, I registered with Cloudera and posted 2 questions earlier. Now whenever I log in to Cloudera I get the following annoying popup: "Your form submission has failed. This may have been caused by one of the following: your request timed out, or a plugin/browser extension blocked the submission. If you have an ad blocking plugin please disable it and close this message to reload the page." I removed all my extensions in Chrome, but it still kept coming. Then I uninstalled and reinstalled Chrome. No success. I installed Firefox and I still get the popup. I don't know what I should do. Any suggestions?
Thanks
Ganesh