About this blog…

I am employed by Netnod as head of engineering, research and development and am, among other things, chair of the Security and Stability Advisory Committee at ICANN. You can find my CV and photos of me on this page.

As I wear so many hats, I find it necessary to express my personal views on things somewhere. This is the place where that happens. Postings on this blog, or on Facebook, Twitter etc, fall under this policy.

The views expressed on this post are mine and do not necessarily reflect the views of Netnod or any other of the organisations I have connections to.

Third example of Bidi issues

I have written earlier about Bidi issues here and here. Here are two more examples, after which I hope people understand that this is a real issue.

In the first example, you see that although 2est.com is a legitimate domain name that can be delegated, if one delegates from that zone a domain name that ends with a character with right-to-left directionality, it will change place with the number.
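To see why the number is free to move, it helps to look at the bidirectional class of each character. The following is a minimal sketch using Python's standard `unicodedata` module. The name is made up for illustration (a two-letter Hebrew label under 2est.com), and the helper `bidi_classes` is my own, not part of any library.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs for every character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hypothetical name: a two-letter Hebrew label delegated under 2est.com.
name = "\u05d0\u05d1.2est.com"

for ch, cls in bidi_classes(name):
    print(f"U+{ord(ch):04X} {cls}")
```

The Hebrew letters come out as R (strong right-to-left), the period as CS and the digit as EN. Both CS and EN are weak types, so their resolved direction depends on their neighbours, which is what allows the digit to end up on the other side of the label boundary.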

In the second example, you see two different strings (different logical order) that are rendered the same way. So after rendering, you cannot know which of the two logical strings was the original.

[Figure: ambiguity]

The reason we have all of these problems is that the separator between the labels (‘.’) is not immutable. If it were, characters would not “jump” across the boundary as they do.

What is interesting is that the boundary in the DNS protocol is not a period at all, as the labels are stored completely independently as length-prefixed strings. Further, in operating systems the labels are in many cases also separated from each other, for example with scroll bars, dialog boxes etc.
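As an illustration of the length-prefixed form, here is a minimal sketch of how a dotted ASCII name can be turned into wire-format labels. The function name `encode_name` is my own, and the sketch ignores name compression, the bare root name and non-ASCII labels.

```python
def encode_name(name):
    """Encode a dotted ASCII domain name as DNS wire-format labels."""
    wire = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"label length out of range: {label!r}")
        wire.append(len(raw))  # one length octet ...
        wire += raw            # ... followed by the octets of the label
    wire.append(0)             # the zero-length root label ends the name
    return bytes(wire)

print(encode_name("www.example.com"))
# prints b'\x03www\x07example\x03com\x00'
```

No period appears anywhere in the encoded name. The period only exists in the textual presentation.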

So what to do? And why does this happen? We have to go back to the Unicode Bidirectional Algorithm. If we look closely, it even recommends including an explicit directional mark if a separator like the period is really to be a separator. In this case, adding U+200E LEFT-TO-RIGHT MARK before the period does the trick.
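A minimal sketch of that idea, assuming we already know that the string is a domain name. The helper names `mark_separators` and `strip_marks` are made up for this post. I also assume that the marked string is used for display only, and that the mark is removed again before the name is handed to anything that does a lookup.

```python
LRM = "\u200e"  # U+200E LEFT-TO-RIGHT MARK

def mark_separators(name):
    """Return a display form with an LRM inserted before every period."""
    return name.replace(".", LRM + ".")

def strip_marks(display):
    """Remove the marks again to recover the name used for lookup."""
    return display.replace(LRM, "")

# Hypothetical name: a two-letter Hebrew label delegated under 2est.com.
name = "\u05d0\u05d1.2est.com"
display = mark_separators(name)

print([f"U+{ord(ch):04X}" for ch in display[:4]])
```

With the mark in place, the period and the digit that follows it have a strong left-to-right character immediately before them, so the Hebrew label is reordered on its own.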

But to do this as part of the rendering algorithm, one has to know that it is a domain name that is being displayed, and that is far from always the case, in free-flowing text for example. How many people add the angle brackets (< and >) around URLs in email?

And on top of this, we have the question of the overall order of labels. Should the domain name www.example.com in a right-to-left context be written as com.example.www or www.example.com?

All hard questions, and while resolving them, I think we can change the speed of light as well.

Thanks specifically to Stuart Cheshire for the discussions leading to this specific post.