<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Screwing up Python compatibility: unicode(), str(), and bytes()</title>
	<atom:link href="http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/feed" rel="self" type="application/rss+xml" />
	<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes</link>
	<description>by Gustavo Niemeyer</description>
	<lastBuildDate>Wed, 10 Mar 2010 15:15:34 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Gustavo Niemeyer</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67726</link>
		<dc:creator>Gustavo Niemeyer</dc:creator>
		<pubDate>Sun, 05 Jul 2009 15:25:27 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67726</guid>
		<description>Serge,

Indeed, introducing bytes in a good way in 2.6 would take some work, but I disagree that the main points I&#039;m bringing up here were not simple.  Renaming &lt;i&gt;unicode&lt;/i&gt; to &lt;i&gt;str&lt;/i&gt; actually requires &lt;i&gt;more&lt;/i&gt; work than just keeping &lt;i&gt;unicode&lt;/i&gt; as-is.  If the goal of the &lt;i&gt;bytes&lt;/i&gt; type in 2.6 was simply to serve as a marker, and no one is willing to pay the bill for porting &lt;i&gt;bytes&lt;/i&gt; properly, then make it more obvious that it is a marker, as I suggested above, rather than introducing a built-in with the same name as the 3.0 type.

Michael,

&quot;all strings are unicode in Python 3.0 so the string type is called str. bytes literals are allowed in Python 2.6 but are just an alias for bytestrings and useful for conversion of code by 2to3&quot;

No, they&#039;re not all &lt;i&gt;unicode&lt;/i&gt;, just as they were not all &lt;i&gt;unicode&lt;/i&gt; in 2.6. &quot;abc&quot; wasn&#039;t &lt;i&gt;unicode&lt;/i&gt; in 2.6; b&quot;abc&quot; is not unicode in 3.0.  The renaming from &lt;i&gt;unicode&lt;/i&gt; to &lt;i&gt;str&lt;/i&gt; is unjustified, and so is having a marker with the name of a new 3.0 type.  If you want me to admit being wrong, stop ignoring the facts and look for some reasonable argument for this mess. 

&quot;In Python 3 your facetious “Remove str. Add bytes. Go home.” is exactly what happened. Removing str from Python 2 is obviously not possible (do you honestly not see that?).&quot;

No, that&#039;s not what happened either. Read the post again if you seriously don&#039;t know what happened.</description>
		<content:encoded><![CDATA[<p>Serge,</p>
<p>Indeed, introducing bytes in a good way in 2.6 would take some work, but I disagree that the main points I&#8217;m bringing up here were not simple.  Renaming <i>unicode</i> to <i>str</i> actually requires <i>more</i> work than just keeping <i>unicode</i> as-is.  If the goal of the <i>bytes</i> type in 2.6 was simply to serve as a marker, and no one is willing to pay the bill for porting <i>bytes</i> properly, then make it more obvious that it is a marker, as I suggested above, rather than introducing a built-in with the same name as the 3.0 type.</p>
<p>Michael,</p>
<p>&#8220;all strings are unicode in Python 3.0 so the string type is called str. bytes literals are allowed in Python 2.6 but are just an alias for bytestrings and useful for conversion of code by 2to3&#8221;</p>
<p>No, they&#8217;re not all <i>unicode</i>, just as they were not all <i>unicode</i> in 2.6. &#8220;abc&#8221; wasn&#8217;t <i>unicode</i> in 2.6; b&#8220;abc&#8221; is not unicode in 3.0.  The renaming from <i>unicode</i> to <i>str</i> is unjustified, and so is having a marker with the name of a new 3.0 type.  If you want me to admit being wrong, stop ignoring the facts and look for some reasonable argument for this mess. </p>
<p>&#8220;In Python 3 your facetious “Remove str. Add bytes. Go home.” is exactly what happened. Removing str from Python 2 is obviously not possible (do you honestly not see that?).&#8221;</p>
<p>No, that&#8217;s not what happened either. Read the post again if you seriously don&#8217;t know what happened.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Serge</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67722</link>
		<dc:creator>Serge</dc:creator>
		<pubDate>Sun, 05 Jul 2009 11:40:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67722</guid>
		<description>While your plan looks simple, in reality it&#039;s pretty complex. It&#039;s not enough to just introduce a bytes type into 2.6; you would have to make it accepted everywhere if you want it to be useful beyond toy applications. Then what about functions that return str in 2.6: would you keep them all unchanged, or somehow make them return bytes? In the first case you would end up with a frustrating, messy half-str/half-bytes world. In the second case, I&#039;m not even sure how you would do it in all cases. I could go on, but the point is that it&#039;s really not simple.</description>
		<content:encoded><![CDATA[<p>While your plan looks simple, in reality it&#8217;s pretty complex. It&#8217;s not enough to just introduce a bytes type into 2.6; you would have to make it accepted everywhere if you want it to be useful beyond toy applications. Then what about functions that return str in 2.6: would you keep them all unchanged, or somehow make them return bytes? In the first case you would end up with a frustrating, messy half-str/half-bytes world. In the second case, I&#8217;m not even sure how you would do it in all cases. I could go on, but the point is that it&#8217;s really not simple.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Foord</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67717</link>
		<dc:creator>Michael Foord</dc:creator>
		<pubDate>Sun, 05 Jul 2009 09:58:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67717</guid>
		<description>What is so hard about:

&quot;all strings are unicode in Python 3.0 so the string type is called str. bytes literals are allowed in Python 2.6 but are just an alias for bytestrings and useful for conversion of code by 2to3&quot;

*Anything* can be made confusing if you deliberately belabour a point rather than admit you were wrong...

In Python 3 your facetious &quot;Remove str. Add bytes. Go home.&quot; is exactly what happened. Removing str from Python 2 is obviously not possible (do you honestly not see that?).</description>
		<content:encoded><![CDATA[<p>What is so hard about:</p>
<p>&#8220;all strings are unicode in Python 3.0 so the string type is called str. bytes literals are allowed in Python 2.6 but are just an alias for bytestrings and useful for conversion of code by 2to3&#8221;</p>
<p>*Anything* can be made confusing if you deliberately belabour a point rather than admit you were wrong&#8230;</p>
<p>In Python 3 your facetious &#8220;Remove str. Add bytes. Go home.&#8221; is exactly what happened. Removing str from Python 2 is obviously not possible (do you honestly not see that?).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gustavo Niemeyer</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67702</link>
		<dc:creator>Gustavo Niemeyer</dc:creator>
		<pubDate>Sun, 05 Jul 2009 01:09:09 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67702</guid>
		<description>Yes, I understand perfectly what Christian nicely pointed out.  I just think it&#039;s a bad idea.

&lt;i&gt;bytes&lt;/i&gt; is not a marker.  It&#039;s the name of a new type in 3.0.  If you want a marker, do something like &quot;from __future__ import strtobytes&quot;, or &quot;from 2to3 import bytesmarker&quot; or whatever else.  

I&#039;m puzzled to see so many smart people saying that it&#039;s totally fine that we&#039;ll have to explain to people &lt;i&gt;&quot;Oh, yeah, unicode is actually str.. no, I mean, unicode is still unicode in 3.0, but it&#039;s named str, and str in 2.6 is actually what used to be bytes, but bytes was really str, because there was that 2to3 migration thing.&quot;&lt;/i&gt;

Sorry, but that mess &lt;i&gt;really&lt;/i&gt; wasn&#039;t necessary.  Remove &lt;i&gt;str&lt;/i&gt;. Add &lt;i&gt;bytes&lt;/i&gt;. Go home.</description>
		<content:encoded><![CDATA[<p>Yes, I understand perfectly what Christian nicely pointed out.  I just think it&#8217;s a bad idea.</p>
<p><i>bytes</i> is not a marker.  It&#8217;s the name of a new type in 3.0.  If you want a marker, do something like &#8220;from __future__ import strtobytes&#8221;, or &#8220;from 2to3 import bytesmarker&#8221; or whatever else.  </p>
<p>I&#8217;m puzzled to see so many smart people saying that it&#8217;s totally fine that we&#8217;ll have to explain to people <i>&#8220;Oh, yeah, unicode is actually str.. no, I mean, unicode is still unicode in 3.0, but it&#8217;s named str, and str in 2.6 is actually what used to be bytes, but bytes was really str, because there was that 2to3 migration thing.&#8221;</i></p>
<p>Sorry, but that mess <i>really</i> wasn&#8217;t necessary.  Remove <i>str</i>. Add <i>bytes</i>. Go home.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Michael Foord</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67701</link>
		<dc:creator>Michael Foord</dc:creator>
		<pubDate>Sun, 05 Jul 2009 00:25:49 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67701</guid>
		<description>I&#039;m afraid Christian is right when he says that you don&#039;t &#039;get&#039; the intent of the bytes literal alias in Python 2.6.

As you have pointed out yourself, there are important semantic differences between the bytes type in Python 3.0 and the bytestring in Python 2; indexing, iteration and the in operator are among them.

If the bytes type were to be fully ported, then every builtin function and builtin type would need to be modified to support it. Worse, the standard library would also need to be modified, with a case-by-case decision made as to whether and how to support bytes.

If it were not done fully, and only the basic type were backported, then you wouldn&#039;t be able to use the bytes type in Python 2.X code the way you do in Python 3. This means that 2to3 could no longer reliably convert code using bytes, as you would have to special-case it in your Python 2 code.

As the *purpose* of the alias is to be a hint for 2to3, this would negate that purpose entirely and be a much worse change...</description>
		<content:encoded><![CDATA[<p>I&#8217;m afraid Christian is right when he says that you don&#8217;t &#8216;get&#8217; the intent of the bytes literal alias in Python 2.6.</p>
<p>As you have pointed out yourself, there are important semantic differences between the bytes type in Python 3.0 and the bytestring in Python 2; indexing, iteration and the in operator are among them.</p>
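<p>A minimal sketch of those differences, assuming a Python 3 interpreter. Under 2.6, where <i>bytes</i> is only an alias for <i>str</i>, indexing and iterating <code>b"asd"</code> yield one-character strings and <code>115 in b"asd"</code> raises a <code>TypeError</code>; under Python 3 the same operations behave as follows.</p>

```python
# Python 3 semantics of the bytes type: its elements are ints, not 1-char strings.
data = b"asd"

first = data[0]          # indexing yields an int: 97, not b"a"
items = list(data)       # iterating yields ints: [97, 115, 100]
has_int = 115 in data    # "in" accepts an integer byte value...
has_sub = b"sd" in data  # ...or a bytes subsequence
one_byte = data[0:1]     # slicing is how to get a length-1 bytes object: b"a"

print(first, items, has_int, has_sub, one_byte)
# prints: 97 [97, 115, 100] True True b'a'
```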
<p>If the bytes type were to be fully ported, then every builtin function and builtin type would need to be modified to support it. Worse, the standard library would also need to be modified, with a case-by-case decision made as to whether and how to support bytes.</p>
<p>If it were not done fully, and only the basic type were backported, then you wouldn&#8217;t be able to use the bytes type in Python 2.X code the way you do in Python 3. This means that 2to3 could no longer reliably convert code using bytes, as you would have to special-case it in your Python 2 code.</p>
<p>As the *purpose* of the alias is to be a hint for 2to3, this would negate that purpose entirely and be a much worse change&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gustavo Niemeyer</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67679</link>
		<dc:creator>Gustavo Niemeyer</dc:creator>
		<pubDate>Sat, 04 Jul 2009 11:31:39 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67679</guid>
		<description>Robin,

There are actually some important differences in the semantics of the &quot;real&quot; &lt;i&gt;bytes&lt;/i&gt; when compared to &lt;i&gt;str&lt;/i&gt;.  Here is a hint:
&lt;code&gt;
&gt;&gt;&gt; list(b&quot;asd&quot;)
[97, 115, 100]
&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>Robin,</p>
<p>There are actually some important differences in the semantics of the &#8220;real&#8221; <i>bytes</i> when compared to <i>str</i>.  Here is a hint:<br />
<code><br />
>>> list(b"asd")<br />
[97, 115, 100]<br />
</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Robin Munn</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67670</link>
		<dc:creator>Robin Munn</dc:creator>
		<pubDate>Sat, 04 Jul 2009 02:57:43 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67670</guid>
		<description>Here&#039;s what I don&#039;t understand about your post:

&quot;This means that if you write code to support &lt;i&gt;bytes&lt;/i&gt; in 2.6, you are actually not writing code which is compatible with Python 3.0.&quot;

Huh? Explain this to me, because I&#039;m not seeing it.

If you write code that uses &lt;i&gt;bytes&lt;/i&gt; in 2.6, you&#039;re clearly not intending to treat it as equivalent to &lt;i&gt;str&lt;/i&gt; (i.e., character data), or you&#039;d just use &lt;i&gt;str&lt;/i&gt;. Instead, you&#039;re intending to treat it as an 8-bit string, which is the same way it will behave in 3.0.

Now, if your code looks like:

b = bytes()
if isinstance(b, str): print &quot;Let&#039;s mess up our forward compatibility! It&#039;ll be fun!&quot;

then yes, you&#039;ll have problems. But I cannot come up with a sane use case for this kind of check.

So the thing that you&#039;re decrying as a major problem, I cannot see why it would be a problem. Yes, the type that &lt;i&gt;bytes&lt;/i&gt; is an alias for will change, but it retains its semantics: 8-bit strings not intended to be used as character data. So why, exactly, is this change a problem?

P.S. Since tone of voice is hard to communicate, I should state that I&#039;m not being ironic or sarcastic. I&#039;m truly puzzled by why you think this is a problem.</description>
		<content:encoded><![CDATA[<p>Here&#8217;s what I don&#8217;t understand about your post:</p>
<p>&#8220;This means that if you write code to support <i>bytes</i> in 2.6, you are actually not writing code which is compatible with Python 3.0.&#8221;</p>
<p>Huh? Explain this to me, because I&#8217;m not seeing it.</p>
<p>If you write code that uses <i>bytes</i> in 2.6, you&#8217;re clearly not intending to treat it as equivalent to <i>str</i> (i.e., character data), or you&#8217;d just use <i>str</i>. Instead, you&#8217;re intending to treat it as an 8-bit string, which is the same way it will behave in 3.0.</p>
<p>Now, if your code looks like:</p>
<p>b = bytes()<br />
if isinstance(b, str): print &#8220;Let&#8217;s mess up our forward compatibility! It&#8217;ll be fun!&#8221;</p>
<p>then yes, you&#8217;ll have problems. But I cannot come up with a sane use case for this kind of check.</p>
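<p>A quick check of that snippet, assuming a Python 3 interpreter: there <i>bytes</i> is its own type rather than an alias, so the <code>isinstance(b, str)</code> test is false, whereas under 2.6, where <i>bytes</i> is <i>str</i>, it is true.</p>

```python
# Under Python 3, bytes and str are distinct types; under 2.6, bytes is str.
b = bytes()

is_str = isinstance(b, str)      # False on Python 3 (True on 2.6)
is_bytes = isinstance(b, bytes)  # True on both
same_type = bytes is str         # False on Python 3 (True on 2.6)

print(is_str, is_bytes, same_type)
# prints: False True False
```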
<p>So the thing that you&#8217;re decrying as a major problem, I cannot see why it would be a problem. Yes, the type that <i>bytes</i> is an alias for will change, but it retains its semantics: 8-bit strings not intended to be used as character data. So why, exactly, is this change a problem?</p>
<p>P.S. Since tone of voice is hard to communicate, I should state that I&#8217;m not being ironic or sarcastic. I&#8217;m truly puzzled by why you think this is a problem.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gustavo Niemeyer</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67654</link>
		<dc:creator>Gustavo Niemeyer</dc:creator>
		<pubDate>Fri, 03 Jul 2009 19:55:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67654</guid>
		<description>As you know, I&#039;ve been involved in Python for a while, and I hope you see that this post you commented on is an attempt to bring things to the attention of Python developers and influence the process.  Since you&#039;re here (in an anonymous attempt, arguably), I guess it&#039;s working.</description>
		<content:encoded><![CDATA[<p>As you know, I&#8217;ve been involved in Python for a while, and I hope you see that this post you commented on is an attempt to bring things to the attention of Python developers and influence the process.  Since you&#8217;re here (in an anonymous attempt, arguably), I guess it&#8217;s working.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: right</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67653</link>
		<dc:creator>right</dc:creator>
		<pubDate>Fri, 03 Jul 2009 19:41:32 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67653</guid>
		<description>And where and when did it occur to you to try and bring your superior ideas to the attention of Python developers and attempt to influence the process?</description>
		<content:encoded><![CDATA[<p>And where and when did it occur to you to try and bring your superior ideas to the attention of Python developers and attempt to influence the process?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Allen Short</title>
		<link>http://blog.labix.org/2009/07/02/screwing-up-python-compatibility-unicode-str-bytes/comment-page-1#comment-67645</link>
		<dc:creator>Allen Short</dc:creator>
		<pubDate>Fri, 03 Jul 2009 16:41:09 +0000</pubDate>
		<guid isPermaLink="false">http://blog.labix.org/?p=133#comment-67645</guid>
		<description>&gt; There was an obvious chance to make the migration smoother and straightforward without the help of a code migration tool, and it was dropped in favor of a convoluted choice which breaks the language backwards compatibility in an awkward way gratuitously.

While this is true, they&#039;d already made this decision for other areas of the standard library. So doing it again for bytes/unicode doesn&#039;t make things (much) worse.</description>
		<content:encoded><![CDATA[<p>&gt; There was an obvious chance to make the migration smoother and straightforward without the help of a code migration tool, and it was dropped in favor of a convoluted choice which breaks the language backwards compatibility in an awkward way gratuitously.</p>
<p>While this is true, they&#8217;d already made this decision for other areas of the standard library. So doing it again for bytes/unicode doesn&#8217;t make things (much) worse.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
