'ascii' codec can't encode character u'\xe8' in position 326681: ordinal not in range(128)

Expert Contributor

I have seen this error reported both in the context of the Cloudera ecosystem and in general Python usage. But it seems a little strange that the error appears even though I am not doing any string comparison; I thought that basic output of Unicode was supported.

 

This error is coming from a basic select query:

Query: select my_column from my_table limit 1000000
Unknown Exception : 'ascii' codec can't encode character u'\xe8' in position 20483: ordinal not in range(128)

 

Has this been addressed in later versions?  I happen to be on a rather old version:

$ impala-shell -v
Impala Shell v2.1.0-cdh4 (11a45f8) built on Thu Dec 18 07:45:47 PST 2014

Also, please note that this does not appear related to my terminal, as I can print that character otherwise:

$ python -c 'print(u"\xe8");'
è
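
For readers wondering what the message itself means, here is a minimal sketch (written in Python 3 syntax; it is not impala-shell code). It assumes the failure comes from a unicode string being encoded with the ascii codec, which does not depend on what the terminal can display. The helper name `try_encode` and the sample string are made up for illustration.

```python
# Sample value: the same character used in the terminal test above.
text = u"\xe8"

def try_encode(value, codec):
    """Return the encoded bytes, or the error text if the codec cannot represent value."""
    try:
        return value.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)

ascii_result = try_encode(text, "ascii")   # error text: ascii stops at ordinal 127
utf8_result = try_encode(text, "utf-8")    # two bytes for the single character

print(ascii_result)
print(repr(utf8_result))
```

The "position" in the message is the index of the offending character within the string being encoded, so a large value such as 326681 suggests the whole result set was being encoded in one go.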
12 REPLIES

Super Collaborator

This is supposed to have been fixed since Impala 1.4:

https://issues.apache.org/jira/browse/IMPALA-489

You can also check https://issues.cloudera.org/browse/IMPALA-607; it may help you deal with it.

 

Are you really using cdh4?

Expert Contributor

The fact that this was fixed in v1.4 is what is confusing to me, especially because I appear to be on version 2.1.

 

Also, I am able to run the command that was failing in that forum thread.  That is also a bit confusing.

 

$ impala-shell -i 10.0.0.1 -d myDB -q "select 'Ѳ'"
Starting Impala Shell without Kerberos authentication
Connected to 10.0.0.1:21000
Server version: impalad version 2.1.0-cdh4 RELEASE (build 11a45f84eb1f0d441ebad72cf9d65262f6cc2391)
Query: use `myDB`
Query: select 'Ѳ'
Query submitted at: 2018-03-06 16:52:51 (Coordinator: None)
Query progress can be monitored at: None/query_plan?query_id=5040fadc08d0c83e:64b2514f9d270fb5
+-----+
| 'ѳ' |
+-----+
| Ѳ   |
+-----+

Yes, I am really using CDH4.  Do you think that I wouldn't be facing this issue on CDH5?

 

Thanks a bunch for your help!

Super Collaborator

CDH4 is 4 years old (the last update was in 2014, I think).

So even if Impala supports it, HDFS and Hive are too old and you may encounter various issues.

If you have the option, upgrade to CDH5.14.

Expert Contributor

Thank you for your reply and your advice, @GeKas.

 

The issue has developed a bit more since my last reply, and has become even more puzzling.  Maybe this is just an old bug that I'm tripping on.

 

It now appears that the issue only occurs when I write the results to a file AND do not include the -B parameter.  I did not detect this pattern until now but it appears consistent with my prior accounts.

 

Here is everything working when output to the console:

$ impala-shell --print_header -i 10.0.0.1 -d myDB -q 'select phenolist from myTable where startpos = 225609789'
Starting Impala Shell without Kerberos authentication
Connected to 10.0.0.1:21000
Server version: impalad version 2.1.0-cdh4 RELEASE (build 11a45f84eb1f0d441ebad72cf9d65262f6cc2391)
Query: use `myDB`
Query: select phenolist from myTable where startpos = 225609789
+---------------------+
| phenolist           |
+---------------------+
| Pelger-Huët anomaly |
+---------------------+
Fetched 1 row(s) in 0.35s

If I direct results to a file, I get the error:

$ impala-shell ....  > tmp
Unknown Exception : 'ascii' codec can't encode character u'\xeb' in position 83: ordinal not in range(128)

And finally, if I add the -B parameter, things work again:

$ impala-shell -B ...  > tmp
Fetched 1 row(s) in 0.35s
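
A possible explanation for this pattern, offered as an assumption rather than a confirmed diagnosis: in Python 2, `sys.stdout.encoding` is taken from the locale when stdout is a terminal, but it is `None` when stdout is a pipe or a file, and unicode text is then encoded with the ascii default. The sketch below simulates that situation in a child interpreter by forcing the stdout encoding with the `PYTHONIOENCODING` environment variable. The helper `run_piped` and the child snippet are hypothetical; nothing here calls impala-shell.

```python
import os
import subprocess
import sys

# Child program: prints one value containing a non-ASCII character.
CHILD = "print(u'Pelger-Hu\\xebt anomaly')"

def run_piped(stdout_encoding):
    """Run the child with stdout attached to a pipe and a forced stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=stdout_encoding)
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode, proc.stdout, proc.stderr

# ascii stands in for what Python 2 falls back to when stdout is not a terminal.
ascii_rc, ascii_out, ascii_err = run_piped("ascii")
utf8_rc, utf8_out, utf8_err = run_piped("utf-8")

print("ascii stdout: exit code", ascii_rc)
print("utf-8 stdout: exit code", utf8_rc)
```

Whether exporting `PYTHONIOENCODING=utf-8` before running impala-shell avoids the error with "> tmp" has not been verified here, and it would not be expected to change how a file opened by the shell itself is written.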

Super Collaborator

Just quickly tested your query.

The same error appears on impalad version 2.10.0-cdh5.13.1 RELEASE (build 1e4b23c4eb52dac95c5be6316f49685c41783c51)

The same error appears if you use:

$ impala-shell ....  -o tmp

which is the same as redirecting output to the file.

You can always use -B, or if you want headers as well, you can use:

$ impala-shell ..... --print_header -B -o tmp

You can also specify the delimiter with "--output_delimiter".
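
If the goal is to process the results afterwards, delimited output is straightforward to read back. A minimal sketch, assuming the default delimiter is a tab and that the file was written as UTF-8; the sample text, the extra `startpos` column, and the helper `parse_delimited` are made up for illustration and are not real impala-shell output.

```python
import csv
import io

# Assumed shape of `--print_header -B` output: header line, then one row per line.
sample_output = u"phenolist\tstartpos\nPelger-Hu\xebt anomaly\t225609789\n"

def parse_delimited(text, delimiter="\t"):
    """Parse delimited text with a header line into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

rows = parse_delimited(sample_output)
print(len(rows), "row(s) parsed")
```

For a real file, `io.open("tmp", encoding="utf-8")` could be passed to `csv.DictReader` in place of the `StringIO` object; if a different "--output_delimiter" was used, pass the same character as `delimiter`.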

Super Collaborator

This looks like IMPALA-2717 to me. The Jira has a patch attached to it, but no-one ever seems to have pushed a code review for this. Unfortunately there's no targeted release for this issue. Contributions are always welcome, let me know if you want to give it a shot.

 

Cheers, Lars

Super Collaborator

Hi Lars,

this is exactly the issue, at least for CDH5.13.2.

I have applied the patch to /opt/cloudera/parcels/CDH/lib/impala-shell/lib/shell_output.py

       map(self.prettytable.add_row, rows)
-      return self.prettytable.get_string()
+      return self.prettytable.get_string().encode('utf-8')
     except Exception, e:

and now the command runs as appropriate.

One thing is not clear to me. I understand why this patch affects the command when the output file is specified with "-o file", but I do not understand why it matters when output is redirected with "> file". The command fails even if you only want to parse the output, e.g. "| grep ....".

Why is impala-shell affected when output is other than stdout?

To be honest, for me there is normally almost no reason to parse (or store) the results without "-B". On the other hand, this is @epowell's issue.

 

In my opinion, you should definitely add this patch, as everything should work with utf-8 encoding.
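
To illustrate what the added `.encode('utf-8')` does, here is a minimal sketch in Python 3 syntax. It does not use prettytable or any impala-shell code: `render_table` is a hypothetical stand-in for `get_string()`, and the temporary file stands in for the "-o" target. The idea is that the table is turned into bytes explicitly, so the write no longer depends on an implicit ascii conversion.

```python
import io
import os
import tempfile

def render_table(header, rows):
    """Hypothetical stand-in for a pretty-printed, single-column table."""
    width = max(len(cell) for cell in [header] + rows)
    border = u"+" + u"-" * (width + 2) + u"+"
    lines = [border, u"| " + header.ljust(width) + u" |", border]
    lines += [u"| " + row.ljust(width) + u" |" for row in rows]
    lines.append(border)
    return u"\n".join(lines)

table = render_table(u"phenolist", [u"Pelger-Hu\xebt anomaly"])
encoded = table.encode("utf-8")  # the step the patch adds: text to UTF-8 bytes

# Write the bytes as-is; no implicit codec is involved at this point.
path = os.path.join(tempfile.mkdtemp(), "tmp")
with io.open(path, "wb") as handle:
    handle.write(encoded + b"\n")

# Reading the file back as UTF-8 recovers the original text.
with io.open(path, "r", encoding="utf-8") as handle:
    round_trip = handle.read()

print(round_trip)
```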

Super Collaborator

Hi GeKas,

 

I'm not sure I understood your question. In general, writing to stdout should respect the local language settings of your shell:

 

$ echo $LANG
en_US.UTF-8

 

Writing to a file, however, does not need to respect these, so its behavior may be different.
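
To compare the two situations on a given machine, a small diagnostic along these lines may help. It is only a sketch: the function name `describe_output_encoding` is made up, and the report is written to stderr so that it stays visible when stdout is redirected.

```python
import locale
import os
import sys

def describe_output_encoding(stream=None):
    """Collect the settings that influence how text written to the stream is encoded."""
    if stream is None:
        stream = sys.stdout
    return {
        "isatty": stream.isatty(),
        "stream_encoding": getattr(stream, "encoding", None),
        "locale_encoding": locale.getpreferredencoding(False),
        "LANG": os.environ.get("LANG"),
    }

info = describe_output_encoding()
for key in sorted(info):
    sys.stderr.write("%s=%r\n" % (key, info[key]))
```

Running it once directly in the terminal and once with "> tmp" shows whether the stream encoding changes when stdout is no longer a terminal; the exact values depend on the Python version and the locale.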

Super Collaborator

Hi Lars,

I totally agree with you on that. I was not clear, so let me rephrase it.

Without the fix, the command below runs successfully:

 

$ impala-shell -i localhost:21000 -q "select 'Ѳ'"

The command below, however, fails:

 

 

$ impala-shell -i localhost:21000 -q "select 'Ѳ'" -o tmp
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
Server version: impalad version 2.10.0-cdh5.13.2 RELEASE (build dc867db57915f55b697ef8cd9e00404a9385231a)
Query: select 'Ѳ'
Query submitted at: 2018-03-12 10:20:07 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=7c476a1defbdad29:de8ff88700000000
Unknown Exception : 'ascii' codec can't encode character u'\u0473' in position 11: ordinal not in range(128)
Could not execute command: select 'Ѳ'

But when I use the first command with a redirect, it also fails, which is not expected:

$ impala-shell -i localhost:21000 -q "select 'Ѳ'" > tmp
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
Server version: impalad version 2.10.0-cdh5.13.2 RELEASE (build dc867db57915f55b697ef8cd9e00404a9385231a)
Query: select 'Ѳ'
Query submitted at: 2018-03-12 10:22:12 (Coordinator: http://localhost:25000)
Query progress can be monitored at: http://localhost:25000/query_plan?query_id=f74a6fde13e4760a:a323131d00000000
Unknown Exception : 'ascii' codec can't encode character u'\u0473' in position 11: ordinal not in range(128)
Could not execute command: select 'Ѳ'

But, as I mentioned before, this is @epowell's post. I just expressed my concern.