08-22-2018 10:21 AM
I've noticed that some of the query statements returned by the get_impala_queries() function in one of my Python scripts are cut off after 10,000 characters. I don't want to post the actual query statements since they belong to my company, but I was curious whether there is a size limit on the JSON for these apiImpalaQuery objects. If so, is there any way to work around it?
Steps to re-create situation:
1. call get_impala_queries() in Python script
2. access queries to obtain apiImpalaQuery objects
3. within each query, access the statement property
4. any query statement longer than 10,000 characters is cut off, and ends with an added "..." to signify that it isn't the end of the query
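The steps above can be sketched as a small check that flags statements that look cut off. This is only an illustration: the 10,000-character limit and the "..." suffix are taken from the behavior described in this post, the helper names are hypothetical, and FakeQuery is a stand-in for the apiImpalaQuery objects (assumed to expose queryId and statement) so the snippet runs without a cluster.

```python
from collections import namedtuple

# Assumptions taken from the behavior described above, not from documentation.
ASSUMED_LIMIT = 10000
TRUNCATION_MARKER = "..."


def looks_truncated(statement, limit=ASSUMED_LIMIT, marker=TRUNCATION_MARKER):
    """Return True if a statement appears to have been cut off."""
    if not statement:
        return False
    return statement.endswith(marker) and len(statement) >= limit


def truncated_query_ids(queries):
    """Collect the queryId of every query whose statement looks truncated."""
    return [q.queryId for q in queries if looks_truncated(q.statement)]


# Stand-in for apiImpalaQuery objects (hypothetical, for demonstration only).
FakeQuery = namedtuple("FakeQuery", ["queryId", "statement"])
sample = [
    FakeQuery("q1", "SELECT 1"),
    FakeQuery("q2", "SELECT " + "x" * 9993 + "..."),
]
print(truncated_query_ids(sample))
```

With a real cluster, the list passed to truncated_query_ids() would be the queries attribute of the response from get_impala_queries().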
08-23-2018 01:56 AM
Hi, I had a similar problem, although it was related to the query list rather than the query size. Some queries that were executed on the cluster were not returned by the API. I found out that I need to use the time filter and the offset to page through all of the paginated data.
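A rough sketch of that offset-based paging, in case it helps. The function iter_all_pages and its fetch_page argument are hypothetical names of mine; fetch_page(offset, limit) is whatever function you supply to return one page of results as a list.

```python
def iter_all_pages(fetch_page, page_size=100):
    """Yield every item from a paginated source.

    fetch_page(offset, limit) is a caller-supplied function that returns
    a list containing at most `limit` items, starting at `offset`.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        for item in page:
            yield item
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += len(page)


# Example with an in-memory list standing in for the API.
data = list(range(250))
all_items = list(iter_all_pages(lambda offset, limit: data[offset:offset + limit]))
print(len(all_items))
```

With the Python client, fetch_page would presumably wrap get_impala_queries(start_time, end_time, filter_str, limit, offset) on the Impala service object and return the queries list of the response, keeping the same time window for every page.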
08-23-2018 10:39 AM
09-05-2018 11:53 AM
I'd really appreciate a response on this. I understand if there's a hard limit on the size of the JSON files I'm extracting, but I'm wondering if it's something on my end. We would really like to access these queries for measuring performance.
09-05-2018 03:12 PM
I checked the code and it seems that you are correct and this truncation was by design. The API calls you are making are also used internally to return search results for Impala Queries in Cloudera Manager's UI.
When the queries got long, this hurt Cloudera Manager performance, so the decision was made to truncate the query string at 10000 characters. You can get the full query string by looking at that query itself.
I think the idea for the Python API is that you would return the list of queries and then call get_query_details(self, query_id, format='text') to get the details if you want to see the full query.
If you want to, you can try adjusting an internal configuration value in Cloudera Manager to verify that the limit you are hitting is indeed the one I've described.
Back up your /etc/default/cloudera-scm-server file
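Edit that file and add the following to the CMF_JAVA_OPTS environment variable:
-Dcom.cloudera.server.web.cmf.impala.components.ImpalaDao.IMPALA_SQL_STATEMENT_MAX_DISPLAY_LENGTH=20000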
If you have not made any previous edits to the file, the result will look like this:
export CMF_JAVA_OPTS="-Xmx2G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -Dcom.cloudera.server.web.cmf.impala.components.ImpalaDao.IMPALA_SQL_STATEMENT_MAX_DISPLAY_LENGTH=20000"
Restart Cloudera Manager with:
# service cloudera-scm-server restart
If you now see truncation at 20000 characters, it seems we have our cause.
NOTE: this will also impact performance inside of Cloudera Manager's Impala Queries UI. Just be aware that this is not dangerous but a bit experimental, so use due caution if working on a production system.
Another note is that you can adjust the max display length as you desire, but it only takes integer values (which should be plenty big enough for most queries).
09-07-2018 01:32 PM
Thanks for the detailed response!
I'm nearing the end of my internship, but my team is going to look into this. We'll be experimenting with it outside of production at first to be safe, but overall it looks like a relatively simple fix. I'll mark your response as the solution if it works.
10-17-2018 04:50 PM
This didn't work. I tested a query with 10,250 characters after making the change and restarting the service, but the query was still truncated at the exact same spot. Are there any alternatives or other ways to fix this?