Member since
11-15-2019
14
Posts
0
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 2684 | 03-04-2020 06:11 AM
| 2319 | 11-15-2019 05:35 AM
03-04-2020
06:11 AM
MattWho, thanks for your comments. After reading your mail I spent a lot of time thinking about it. I saw that 8 threads were created, but there was no performance improvement because all the threads were doing the same thing: each executed the script on all of the files that were unpacked. After your comment that the threads actually operate on the flowfiles, I changed the code so that it accepts a flowfile as one of its inputs and processes it, so that multiple threads work on different flowfiles. This improved the performance significantly: the time taken dropped from 13 minutes to 3-4 minutes. Many thanks for your comments; I now understand how to use concurrent tasks.
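To illustrate the pattern described above (this is an editor's sketch, not the actual script): ExecuteStreamCommand pipes each flowfile's content to the command's stdin and captures stdout as the outgoing flowfile content, so a script that handles exactly one flowfile per invocation lets NiFi's concurrent tasks run several invocations in parallel. The body of `process` here is a hypothetical placeholder for the real per-file work.

```python
import sys


def process(data: bytes) -> bytes:
    """Placeholder for the real per-flowfile work.

    Hypothetical example: uppercase the content.
    """
    return data.upper()


def main() -> None:
    # ExecuteStreamCommand streams one flowfile's content to stdin and
    # takes stdout as the new flowfile content, so this script handles
    # exactly one flowfile per invocation; NiFi's concurrent tasks can
    # then run several invocations in parallel on different flowfiles.
    data = sys.stdin.buffer.read()
    sys.stdout.buffer.write(process(data))


if __name__ == "__main__":
    main()
```

With this shape, raising the processor's concurrent tasks actually divides the work, because each task invokes the script on a different flowfile instead of every task re-scanning all files.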
03-03-2020
05:44 AM
Hi,
I have an ExecuteStreamCommand processor which executes a Python script. This takes a long time to execute, about 5 minutes. So I increased the number of concurrent tasks from 1 to 4 and then 8, but this had no impact on the performance. I have an 8-core Intel i9 Mac with 32 GB RAM. I read that the number of concurrent tasks is typically set to roughly 2 to 4 times the number of cores. Could you let me know why there is no improvement? How can I improve the performance?
Thanks
Ganesh
Labels:
- Apache NiFi
01-14-2020
07:35 AM
@Shelton Thanks for your response. I found that for basic Hive access the following works:
{ "file": "hdfs-path/test1.py" }
For Hive LLAP, the JSON that works is:
{
"jars": ["<path-to-jar>/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["<path-to-zip>/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"file": "<path-to-file>/test3.py"
}
Interestingly, when I put the zip in the "archives" field it gave an error; it works under "pyFiles" though.
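For reference, a sketch of how a batch definition like the one above could be POSTed to Livy's /batches endpoint with Python's standard library; this assumes Livy is reachable at localhost:8998 (an assumption, adjust to your cluster), and the `<path-to-...>` placeholders from the post must be filled in with real paths before submitting.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998/batches"  # assumed Livy host/port

# Payload shape from the post above; the <path-to-...> placeholders
# must be replaced with real paths before an actual submission.
payload = {
    "jars": ["<path-to-jar>/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
    "pyFiles": ["<path-to-zip>/hive_warehouse_connector/pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
    "file": "<path-to-file>/test3.py",
}


def build_batch_request(url: str, payload: dict) -> urllib.request.Request:
    """Build the POST request for Livy's /batches endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_batch_request(LIVY_URL, payload)
# urllib.request.urlopen(req)  # uncomment to actually submit the batch
```

This mirrors what the GetFile → InvokeHTTP flow in the thread does: InvokeHTTP simply POSTs the same JSON body with a Content-Type of application/json.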
01-13-2020
10:32 PM
@Shelton No, that does not work. I created a simple PySpark file which does not require the HiveWarehouse connector and I tried test1.py in pyFiles as you suggested:
{ "pyFiles": ["test1.py"] }
But I get the error "requirement failed: File is required." with HTTP response 400: Bad Request. Could you let me know what I am missing? I have been stuck here for days.
Thanks
Ganesh

@Shelton wrote:
@TVGanesh Indeed the problem is with the input values you have for the zip and .py file. Can you change it like below and try? It should work.
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["test1.py"],
"archives": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"]
}
You sort of interchanged them. Below is a comparison between the spark-submit command and its equivalent in the Livy REST JSON protocol. Please revert.
01-10-2020
08:00 PM
I found that there is a 'file' option in the REST API and changed the option to 'file':
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"file": ["test1.py"]
}
Now I get the error "Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 224] (through reference chain: org.apache.livy.server.batch.CreateBatchRequest[\"file\"". I think the REST API does not expect a .py file for the 'file' field. Can you let me know how this can be fixed?
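An editor's note on the error itself: "Cannot deserialize instance of `java.lang.String` out of START_ARRAY token" suggests that Livy's CreateBatchRequest expects "file" to be a single string while an array was sent (the later post in this thread confirms a plain string works). A minimal sketch of the corrected payload, reusing the jar and zip names from this post:

```python
import json

# "jars", "pyFiles", and "archives" take arrays, but "file" (the main
# application file) appears to be a plain string in Livy's batch request.
payload = {
    "jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
    "pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
    "file": "test1.py",  # a string, not ["test1.py"]
}

body = json.dumps(payload)  # JSON body for InvokeHTTP to POST
```

The START_ARRAY token in the stack trace is the `[` of `["test1.py"]`; dropping the brackets gives the deserializer the String it is asking for.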
01-10-2020
07:24 AM
Hi,
I want to submit a Pyspark application to Livy through REST API to invoke HiveWarehouse Connector. Based on this answer in the community
https://community.cloudera.com/t5/Community-Articles/How-to-Submit-Spark-Application-through-Livy-REST-API/ta-p/247502
I created a test1.json as follows
{
"jars": ["hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar"],
"pyFiles": ["pyspark_hwc-1.0.0.3.1.0.0-78.zip"],
"files": ["test1.py"]
}
I use GetFile to pick up the JSON file and call InvokeHTTP to POST the request, and I get the following error in the HTTP response (invokehttp.response.body):
"requirement failed: File is required."
I am not sure whether the "files" field should contain the PySpark script or whether the script should be listed under "pyFiles". I tried that as well but get the same error. Please let me know what I am missing.
Thanks
Ganesh
Labels:
- Apache Hive
- Apache Spark
11-15-2019
05:59 AM
@cjervis - The popup would occur after I logged in and went to any Cloudera tutorial, and then it would keep popping up every 20 seconds. I solved it by deleting my history and going to the tutorials again. Then I saw the popup that required me to put in my details; after that it went away 🙂
11-15-2019
05:35 AM
I was able to fix this by removing cookies and getting the popup where I had to fill in my details, like my name and phone number. Then it went away.
11-15-2019
03:10 AM
Hi, I registered with Cloudera and posted 2 questions earlier. Now whenever I log in to Cloudera I get the following annoying popup: "Your form submission has failed. This may have been caused by one of the following: your request timed out, or a plugin/browser extension blocked the submission. If you have an ad blocking plugin please disable it and close this message to reload the page." I removed all my extensions in Chrome, but it still kept coming. Then I uninstalled and reinstalled Chrome. No success. I installed Firefox and I still get the popup. I don't know what I should do. Any suggestions?
Thanks
Ganesh