'ascii' codec can't encode character u'\xe8' in position 326681: ordinal not in range(128)

Expert Contributor

I have seen this error reported both in the context of the Cloudera ecosystem and in general Python contexts. But it seems a little strange that the error pops up even when I am not doing any string comparison; I thought that basic output of Unicode was supported.

 

This error is coming from a basic select query:

Query: select my_column from my_table limit 1000000
Unknown Exception : 'ascii' codec can't encode character u'\xe8' in position 20483: ordinal not in range(128)

 

Has this been addressed in later versions?  I happen to be on a rather old version:

$ impala-shell -v
Impala Shell v2.1.0-cdh4 (11a45f8) built on Thu Dec 18 07:45:47 PST 2014

Also, please note that this does not appear related to my terminal, as I can print that character otherwise:

$ python -c 'print(u"\xe8");'
è
1 ACCEPTED SOLUTION

Super Collaborator

Hi Lars,

This is exactly the issue, at least for CDH5.13.2.

I have applied the patch to /opt/cloudera/parcels/CDH/lib/impala-shell/lib/shell_output.py

       map(self.prettytable.add_row, rows)
-      return self.prettytable.get_string()
+      return self.prettytable.get_string().encode('utf-8')
     except Exception, e:

and now the command runs as expected.
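To illustrate why encoding the string first helps, here is a minimal sketch. It is not the impala-shell code: `table_text`, `write_ascii_stream` and `write_pre_encoded` are hypothetical names, and the ASCII-only stream is an assumed stand-in for what Python 2 does with redirected output. It is written so that it also runs under Python 3.

```python
import io

# Hypothetical stand-in for the text that prettytable's get_string() returns.
table_text = u"| my_column |\n| caf\xe8 |"


def write_ascii_stream(text):
    """Write text through a stream that only accepts ASCII (assumed to
    mimic Python 2's fallback for redirected output)."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii")
    stream.write(text)  # raises UnicodeEncodeError for u'\xe8'
    stream.flush()
    return raw.getvalue()


def write_pre_encoded(text):
    """Encode to UTF-8 first and write raw bytes, as the patch does."""
    raw = io.BytesIO()
    raw.write(text.encode("utf-8"))
    return raw.getvalue()


try:
    write_ascii_stream(table_text)
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ASCII stream failed:", exc)

encoded = write_pre_encoded(table_text)
print(encoded)
```

Once the string is already UTF-8 bytes, nothing downstream has to pick an encoding, so the implicit ASCII step never happens.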

One thing is not clear to me: I understand that this patch affects the command when an output file is specified with "-o file", but I do not understand why it matters when output is redirected with "> file". It fails even if you only want to parse the output, e.g. "| grep ....".

Why is impala-shell affected when output is other than stdout?

To be honest, I normally have almost no reason to parse (or store) the results without "-B". On the other hand, this is @epowell's issue.

 

In my opinion, you should definitely add this patch, as everything should work with utf-8 encoding.


12 REPLIES

Expert Contributor

Thank you @GeKas and @Lars Volker.

 

It has been incredibly helpful to know that this is a documented bug and, further, that patching it will resolve the issue.

 

As @GeKas pointed out, I too find it confusing that redirecting standard output triggers the Python bug. At the point that stdout is being redirected, isn't Python out of the picture? Yet it triggers the same bug as if Python had been used to write to the file.

 

In my case, I am trying to pipe the result to a hadoop process.  What is strange is that I am almost certain that the script is using the -B flag, in which case I shouldn't be affected by this bug at all.  I am going to investigate today. 

 

Thanks again for all your help.  This has been a really, really good experience on the Cloudera forum 🙂

Super Collaborator

For Python it makes a difference whether output gets printed to the terminal (which in this case likely supports Unicode) or is redirected to a file or a pipe. In the latter case Python 2 cannot detect an output encoding and falls back to ASCII, so any non-ASCII character fails to encode.
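The difference can be modelled with a small sketch. It assumes the Python 2 rule that `sys.stdout.encoding` is set from the locale on a terminal and is `None` when stdout is a file or pipe, in which case printing Unicode uses the default ASCII codec. The helper names `python2_print_encoding` and `simulate_print` are hypothetical and exist only for this illustration, which also runs under Python 3.

```python
def python2_print_encoding(stdout_encoding):
    """Model of Python 2: use the stream's encoding when it has one,
    otherwise fall back to the default codec, which is ASCII."""
    return stdout_encoding or "ascii"


def simulate_print(text, stdout_encoding):
    """Return the bytes that would be written for the given stream encoding."""
    return text.encode(python2_print_encoding(stdout_encoding))


# Terminal case: stdout is a TTY and reports a UTF-8 locale.
terminal_bytes = simulate_print(u"\xe8", "UTF-8")
print(terminal_bytes)

# Redirect or pipe case: no encoding detected, so ASCII is used and fails.
redirect_error = None
try:
    simulate_print(u"\xe8", None)
except UnicodeEncodeError as exc:
    redirect_error = str(exc)
    print(redirect_error)
```

This is why the same query can succeed interactively and fail as soon as "> file" or "| grep" is added: the Python process is still the one writing to stdout, only the destination has changed.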

 

This post on StackOverflow seems to describe the issue well. I linked the post in the JIRA for future reference.

 

Cheers, Lars

Expert Contributor

Thank you, @Lars Volker.

 

Definitely learned something new there!