Latest Vulnerability Suggests Compilers Should Learn Unicode
Written by Mike James   
Wednesday, 10 November 2021

There is a fuss at the moment about a security problem that could allow a Trojan to enter the code of any language. It is the "any language" part that seems to be the scary bit. What is going on? In fact, it's all very simple.


Unicode is great, with the possible exception of homoglyphs, that is, distinct characters which look the same to a human, and of characters which can be written either as a single code point or as a composite of a base character plus combining marks. This causes no end of problems for simple tasks such as comparing two strings. It is also a potential security problem, but more on this in a moment.
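To make the comparison problem concrete, here is a minimal Python sketch using the standard unicodedata module. The sample strings are chosen purely for illustration. It shows that a precomposed character and its combining-sequence equivalent can be reconciled by normalization, while a Latin/Cyrillic look-alike pair cannot.

```python
import unicodedata

# Two spellings of "café": precomposed é (U+00E9) versus e + combining acute (U+0301).
precomposed = "caf\u00e9"
combining = "cafe\u0301"

# A look-alike pair: Latin "a" (U+0061) versus Cyrillic "а" (U+0430).
latin = "admin"
cyrillic = "\u0430dmin"


def same_after_nfc(left, right):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)


print(precomposed == combining)                 # raw code points differ
print(same_after_nfc(precomposed, combining))   # normalization reconciles them
print(same_after_nfc(latin, cyrillic))          # look-alikes remain different
```

Normalization only helps with equivalent encodings of the same character; spotting look-alikes from different scripts needs a separate confusables check.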

The current security concern isn't about homoglyphs, but about being able to change the order of text using directionality overrides. We tend to think that text runs left to right, but of course some scripts run right to left. Unicode supports setting the direction of text, but it also allows you to embed codes which modify the direction locally. For example, LRI (Left-to-Right Isolate) and RLI (Right-to-Left Isolate) set the direction for the following characters until a PDI (Pop Directional Isolate) is encountered. So for example:

RLI LRI a b c PDI LRI d e f PDI PDI

displays

d e f a b c

You can also use LRO and RLO (Left-to-Right Override and Right-to-Left Override) to force the direction of all the text following.
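As a rough illustration, the following Python sketch builds the isolate example with the real code points and lists the bidirectional class of each character in stored (logical) order. It assumes the spaces in the example are only notation, and the helper name describe is made up for this sketch. It does not compute the display order, which is the job of the Unicode Bidirectional Algorithm in the text renderer.

```python
import unicodedata

# Code points of the directional formatting characters mentioned in the text.
BIDI_CONTROLS = {
    "LRO": "\u202d",
    "RLO": "\u202e",
    "LRI": "\u2066",
    "RLI": "\u2067",
    "PDI": "\u2069",
}


def describe(text):
    """Return the bidirectional class of every character, in stored order."""
    return [unicodedata.bidirectional(ch) for ch in text]


c = BIDI_CONTROLS
# RLI LRI a b c PDI LRI d e f PDI PDI, without the notational spaces.
example = c["RLI"] + c["LRI"] + "abc" + c["PDI"] + c["LRI"] + "def" + c["PDI"] + c["PDI"]

print(describe(example))
for label, ch in BIDI_CONTROLS.items():
    print(label, "U+%04X" % ord(ch), unicodedata.name(ch))
```

The stored string is twelve characters long even though only six of them are visible, which is exactly what makes the trick hard to spot in an editor.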

You can see that careful use of the direction codes could allow you to swap the order of words. If you are clever enough, or have the time to think deeply about how to use this to good effect, you could invent the following:

/*RLO } LRIif (isAdmin)PDI LRI begin admins only */
printf("you are an admin.\n");
/* end admin only RLO { LRI */

which, when you take the directions into account, displays as:

/* begin admins only */ if (isAdmin) {
printf("you are an admin.\n");
/* end admin only */ }

This looks perfectly good for restricting access to just admins and, if this were in a pull request, you might well let it go. However, most compilers do nothing special with the control codes and simply process the text in the order in which it is stored, so the code that the compiler sees is:

/* } if (isAdmin) begin admins only */
printf("you are an admin.\n");
/* end admin only { */

What the compiler sees is code that has no if statement at all and so admits everyone to the admin section of the program.
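One way to approximate the compiler's view is to drop the control characters and look at what remains in stored order. The Python sketch below does this for the first line of the C example. The function name logical_view and the exact spacing of the sample line are assumptions made for illustration.

```python
# Embedding, override, and isolate controls (U+202A..U+202E, U+2066..U+2069).
BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
RLO, LRI, PDI = "\u202e", "\u2066", "\u2069"


def logical_view(source):
    """Drop bidi controls, leaving the text in the order a compiler reads it."""
    return "".join(ch for ch in source if ch not in BIDI_CONTROLS)


# First line of the C example, written with real control characters.
first_line = (
    "/*" + RLO + " } " + LRI + "if (isAdmin)" + PDI + " " + LRI + " begin admins only */"
)

print(logical_view(first_line))
```

With the controls removed, the if statement is visibly inside the comment, so nothing guards the printf call.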

Once you have seen this sort of thing, it is relatively easy to think up other uses of direction codes and, while the example is in C, you can easily find similar mechanisms in other languages. Here is one in JavaScript:

if(acesslevel != "userRLO LRI// check if adminPDI LRI"){
}

which the user reads as:

if(acesslevel != "user"){// check if admin
}

but the compiler reads as:

if(acesslevel != "user// check if admin"){
}

i.e. the comparison is made against a string that no access level will ever match, so the != test is always true and every user is treated as an admin.
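The effect of the stretched string literal can be sketched in a few lines of Python. The function name enters_admin_branch is hypothetical, and the sketch assumes the body of the if is the admin-only branch.

```python
RLO, LRI, PDI = "\u202e", "\u2066", "\u2069"

# The literal as stored in the file: the apparent comment is part of the string.
literal = "user" + RLO + " " + LRI + "// check if admin" + PDI + " " + LRI


def enters_admin_branch(access_level):
    """Mirror the JavaScript test: the branch runs when the level differs from the literal."""
    return access_level != literal


print(enters_admin_branch("user"))   # an ordinary user is let through
print(enters_admin_branch("admin"))  # and so is an admin
print(len(literal), len("user"))     # the literal is far longer than it looks
```

Only an access level containing the same invisible characters would make the test fail, which no ordinary caller will supply.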

In fact there probably isn't a language which isn't vulnerable to such tricks. Or is it the maintainer who is vulnerable? Or is it the compiler?

For my money it's the compiler that is the problem. Since when did compilers decide not to enter the 21st century and recognize that Unicode not only exists but can modify the meaning of a program? Compilers and other language tools either have to reject dangerous Unicode or they have to read it like a human would.
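Until compilers catch up, the "reject dangerous Unicode" option can be approximated with a simple scan of the source text, for instance as a pre-merge check. The Python sketch below is one possible approach, not the mitigation any particular compiler has adopted. The function name find_bidi_controls and the sample input are assumptions.

```python
import unicodedata

# Bidirectional classes of the embedding, override, and isolate controls.
SUSPECT_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(source):
    """Return (line, column, character name) for each bidi control in source."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if unicodedata.bidirectional(ch) in SUSPECT_CLASSES:
                findings.append((line_no, col_no, unicodedata.name(ch)))
    return findings


# Hypothetical input: one clean line and one line carrying hidden controls.
sample = "int ok = 1;\n/*\u202e } \u2066if (isAdmin)\u2069 */\n"

for finding in find_bidi_controls(sample):
    print(finding)
```

A check like this flags every use of the controls, including legitimate ones in right-to-left string literals, so a real tool would probably need an allow-list or a rule that only rejects controls left unterminated within a comment or string.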

To quote from the paper by Nicholas Boucher and Ross Anderson from the University of Cambridge disclosing the problem:

"About half of the compiler maintainers we contacted during the disclosure period are working on patches or have committed to do so. As the others are dragging their feet, it is prudent to deploy other controls in the meantime where this is quick and cheap, or relevant and needful."

Not so much an exploit, more an oversight.


More Information

Trojan Source: Invisible Vulnerabilities

Nicholas Boucher, Ross Anderson

Related Articles

Open Source Insights Into The Software Supply Chain

New Spectre-Like Vulnerability - Is The Era Of Fast Clever Computers Over?

How Spectre Works

How Meltdown Works

ROP Mitigations Bypassed

Rowhammer - Changing Memory Without Accessing It

ShellShock - Yet Another Code Injection Vulnerability

Heartbleed - The Programmer's View


Last Updated ( Wednesday, 10 November 2021 )