This has been incredibly helpful to know that it is a documented bug and, further, that patching the bug will resolve the issue.
As @GeKas pointed out, I too find it confusing that redirecting standard output triggers the python bug. At the point that stdout is being directed, isn't python out of the picture? Yet it triggers the same bug as if python had been used to write to the file.
In my case, I am trying to pipe the result to a hadoop process. What is strange is that I am almost certain that the script is using the -B flag, in which case I shouldn't be affected by this bug at all. I am going to investigate today.
Thanks again for all your help. This has been a really, really good experience on the Cloudera forum :)
For Python it makes a difference whether output gets printed to the terminal (which in this case likely supports unicode) or output is redirected to a file (which means it needs to be encoded in ASCII).
This post on StackOverflow seems to describe the issue well. I linked the post in the JIRA for future reference.