Created on 03-05-2018 08:26 AM - edited 09-16-2022 05:56 AM
I have seen this error reported both in the context of the Cloudera ecosystem and in the general Python context. But it seems a little strange that this error pops up even when I am not doing any string comparison; I thought that basic output of unicode was supported.
This error is coming from a basic select query:
Query: select my_column from my_table limit 1000000
Unknown Exception : 'ascii' codec can't encode character u'\xe8' in position 20483: ordinal not in range(128)
Has this been addressed in later versions? I happen to be on a rather old version:
$ impala-shell -v
Impala Shell v2.1.0-cdh4 (11a45f8) built on Thu Dec 18 07:45:47 PST 2014
Also, please note that this does not appear related to my terminal, as I can print that character otherwise:
$ python -c 'print(u"\xe8");'
è
Created on 03-09-2018 07:14 AM - edited 03-09-2018 07:18 AM
Hi Lars,
this is exactly the issue, at least for CDH5.13.2.
I have applied the patch to /opt/cloudera/parcels/CDH/lib/impala-shell/lib/shell_output.py
   map(self.prettytable.add_row, rows)
-  return self.prettytable.get_string()
+  return self.prettytable.get_string().encode('utf-8')
 except Exception, e:
and now the command runs as appropriate.
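A minimal sketch of why the added `.encode('utf-8')` helps, written for Python 3 even though the shell itself runs on Python 2. The ASCII-only `TextIOWrapper` is only a stand-in for a redirected stdout, and `table` is a made-up sample string; neither comes from impala-shell.

```python
import io

# Hypothetical sample output containing the character from the error message.
table = u"caf\xe8"

# Stand-in for stdout redirected to a file or pipe under Python 2:
# a text stream that can only encode ASCII.
raw = io.BytesIO()
redirected = io.TextIOWrapper(raw, encoding="ascii")

# Unpatched behaviour: passing unicode text forces an implicit ASCII encode.
try:
    redirected.write(table)
    redirected.flush()
    unpatched_error = None
except UnicodeEncodeError as exc:
    unpatched_error = str(exc)

# Patched behaviour: the text is encoded to UTF-8 bytes up front,
# so the stream never has to pick a codec itself.
raw_patched = io.BytesIO()
raw_patched.write(table.encode("utf-8"))
patched_bytes = raw_patched.getvalue()

print(unpatched_error)
print(patched_bytes)
```

The first case fails with the same kind of "ordinal not in range(128)" message as the query above, while the pre-encoded bytes are written without any codec lookup.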
One thing is not clear to me: I understand why this patch affects the command when an output file is specified with "-o file", but I do not understand why it matters when output is redirected with "> file". It fails even if you only want to parse the output, e.g. "| grep ...."
Why is impala-shell affected when output is other than stdout?
To be honest, for me there is normally almost no reason to parse (or store) the results without "-B". On the other hand, this is @epowell's issue.
In my opinion, you should definitely add this patch, as everything should work with utf-8 encoding.
Created 03-12-2018 09:45 AM
Thank you @GeKas and @Lars Volker.
This has been incredibly helpful to know that it is a documented bug and, further, that patching the bug will resolve the issue.
As @GeKas pointed out, I too find it confusing that redirecting standard output triggers the Python bug. At the point that stdout is being redirected, isn't Python out of the picture? Yet it triggers the same bug as if Python had been used to write to the file.
In my case, I am trying to pipe the result to a Hadoop process. What is strange is that I am almost certain that the script is using the -B flag, in which case I shouldn't be affected by this bug at all. I am going to investigate today.
Thanks again for all your help. This has been a really, really good experience on the Cloudera forum 🙂
Created 03-12-2018 12:18 PM
For Python it makes a difference whether output gets printed to the terminal (which in this case likely supports unicode) or output is redirected to a file or a pipe (in which case Python 2 cannot detect an encoding for stdout and falls back to ASCII).
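The difference can be sketched roughly as follows. This is a simplified model of the Python 2 behaviour, not code from impala-shell; `effective_encoding` and `try_print` are hypothetical helper names, and the UTF-8 terminal encoding is an assumption about the locale.

```python
def effective_encoding(is_tty, terminal_encoding="UTF-8"):
    """Roughly mimic how Python 2 picks a codec when printing unicode to stdout."""
    # On a terminal the encoding is taken from the locale (assumed UTF-8 here).
    # When stdout is a file or pipe, no encoding is detected and ASCII is used.
    return terminal_encoding if is_tty else "ascii"


def try_print(text, is_tty):
    """Return the encoded bytes, or the exception that printing would raise."""
    try:
        return text.encode(effective_encoding(is_tty))
    except UnicodeEncodeError as exc:
        return exc


print(repr(try_print(u"\xe8", is_tty=True)))   # terminal case
print(repr(try_print(u"\xe8", is_tty=False)))  # "> file" or "| grep" case
```

Under this model the same character encodes fine for a terminal and raises `UnicodeEncodeError` once stdout is redirected, which matches the behaviour seen with "> file" and "| grep".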
This post on StackOverflow seems to describe the issue well. I linked the post in the JIRA for future reference.
Cheers, Lars
Created 03-15-2018 10:23 AM