Getting Started With jQuery - Advanced Ajax Characters & Encoding
Written by Ian Elliot   
Tuesday, 20 June 2017

Round Trips

The fact that the data that goes to the server is always UTF-8 has an effect on the data that the server sends back to the browser. This is something we have already examined in terms of using get to retrieve a file, but in the case of a post there is an additional consideration. The data transfer is two-way and there are two sources of data that can be sent back to the client: the data that the client sent and the data that the server retrieves or generates.

Data generated by the server varies widely. It might come from a file, from a database or from a program written in a language such as PHP. In the case of PHP the system is complicated, but very flexible when used with Apache. In PHP a string is a sequence of bytes and PHP makes no attempt to change its encoding. If you use a multi-byte encoding then each byte is treated as a character.

There is a set of functions, such as mb_convert_encoding, that work with multi-byte characters and encodings, and these can be used to programmatically generate output to the browser in any of the supported encodings. However, if you don't make any effort to generate a particular encoding, PHP returns any data it receives in the encoding it was received in. It also sends any string literal in the program using the encoding of the file the program is saved in.

For example, suppose the file containing the program is saved in ISO-8859-2 encoding, so that this is the encoding used for string literals. You can do this in Notepad++ by selecting encoding Ansi and character set Eastern European. If you then have the instruction:

echo("Ł");

Then 0xA3 is sent to the browser and no attempt is made to change its encoding. The browser interprets this byte as UTF-8 and shows a replacement character because, as we have seen many times, a lone 0xA3 is not a legal UTF-8 sequence.
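You can reproduce what the browser does with this byte without involving a server. The following is a minimal sketch, assuming an environment that provides the standard TextDecoder API (current browsers and Node.js); the variable names are only for illustration:

```javascript
// The single byte PHP emits for "Ł" when the source file is saved as ISO-8859-2.
const latin2Bytes = new Uint8Array([0xA3]);

// Decode it the way a browser does when it assumes UTF-8.
const asUtf8 = new TextDecoder("utf-8").decode(latin2Bytes);
console.log(asUtf8 === "\uFFFD"); // true: U+FFFD is the replacement character

// With fatal: true the decoder reports the error instead of substituting.
let rejected = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(latin2Bytes);
} catch (e) {
  rejected = true; // 0xA3 is a continuation byte with no lead byte before it
}
console.log(rejected); // true
```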

If you add a header that defines the charset correctly to the data sent to the browser (remember that meta tags are ignored), you get a different result:

header('Content-Type: text/html; charset=ISO-8859-2');
echo("Ł");

With this in place, the browser interprets the 0xA3 as an ISO-8859-2 character and converts it to Unicode U+0141, which is the correct character.
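The effect of the charset in the header can be imitated by choosing the decoder explicitly. This is only a sketch of the decoding step, assuming a TextDecoder implementation that supports the "iso-8859-2" label (browsers do, as does Node.js when built with full ICU data):

```javascript
// The same single byte as before, this time decoded as ISO-8859-2,
// which is what the charset in the Content-Type header asks the browser to do.
const body = new Uint8Array([0xA3]);
const text = new TextDecoder("iso-8859-2").decode(body);

console.log(text === "\u0141");                // true: the character is Ł
console.log(text.codePointAt(0).toString(16)); // "141"
```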

Now it looks as if everything works as long as we include an appropriate header in the response, but there is yet another twist. 

Consider the data that the server received as part of the post or get. If we assume that the data in the earlier example: 

var sendData={test:"Ł"};

is sent to the server then:

echo($_POST["test"]);

will display the correct character in the web page without the header. The reason is simply that the data sent to the server is UTF-8 encoded, which means test contains 0xC5 0x81, i.e. as far as PHP is concerned it is a two-byte string. When you send this string back to the browser it is interpreted as UTF-8 and hence the browser displays the correct character.
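The two bytes are easy to check from the client side. This sketch uses the standard TextEncoder, which always produces UTF-8, and encodeURIComponent to show the percent-encoded form in which those same bytes travel in form-encoded data; the variable names wire, hex and echoed are just for illustration:

```javascript
const sendData = { test: "Ł" };

// The UTF-8 bytes that represent the value on the wire.
const wire = new TextEncoder().encode(sendData.test);
const hex = Array.from(wire, (b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0"));
console.log(hex.join(" "));                 // 0xC5 0x81
console.log(sendData.test.length);          // 1 (one UTF-16 code unit in JavaScript)
console.log(wire.length);                   // 2 (two bytes in UTF-8)

// The same two bytes, percent-encoded as they appear in form-encoded data.
console.log(encodeURIComponent(sendData.test)); // %C5%81

// Echoed back unchanged and read as UTF-8, the character survives the round trip.
const echoed = new TextDecoder("utf-8").decode(wire);
console.log(echoed === sendData.test);      // true
```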

However, if you add the header defining the charset as ISO-8859-2, then things go wrong. The data sent to the browser is 0xC5 0x81 as this is what was received. The browser thinks that this is ISO-8859-2 encoded, in which 0xC5 is Ĺ (an L with an acute accent) and 0x81 has no printable character and typically displays as an open square.
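The misreading can be reproduced by decoding the echoed UTF-8 bytes with the wrong decoder. As before, this is a sketch that assumes TextDecoder supports the "iso-8859-2" label:

```javascript
// The UTF-8 bytes for "Ł", echoed back by the server unchanged...
const echoedBytes = new Uint8Array([0xC5, 0x81]);

// ...but decoded as ISO-8859-2 because that is what the header claims.
const misread = new TextDecoder("iso-8859-2").decode(echoedBytes);

console.log(misread.length);                      // 2: each byte became a character
console.log(misread.codePointAt(0).toString(16)); // "139", which is Ĺ
// The second character comes from 0x81, which has no printable glyph.
```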

Thus, if you don't include a header, data that is sent to the server is correctly sent back to the client, but data from the server might not be. If you do include the header, the data from the server is correctly sent to the client, but any data originating from the client isn't.

There is nothing you can do to stop the browser sending UTF-8 and encoding to UTF-8 anything it receives.

Trying to work with AJAX with anything other than UTF-8 seems like fighting nature. It can be done, but you will have to do it in code. You can treat UTF-8 as a "transport" encoding and write code on the client or the server to convert to the encoding that you want to work with.

For example, if you want to send the data from the server in ISO-8859-2 encoding, assuming you are sending a correct Content-Type header, and you want to echo data back to the client, you need to use:

$test=$_POST["test"];
$test=mb_convert_encoding($test,"ISO-8859-2","UTF-8");
echo($test);

This converts the UTF-8 string into ISO-8859-2, which is now echoed back to the client correctly as long as there is an appropriate Content-Type header.
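To see what the conversion does to the bytes, here is a rough JavaScript imitation of that step. It is not how PHP implements mb_convert_encoding: utf8ToLatin2 and LATIN2_SAMPLE are hypothetical names, and the lookup table deliberately covers only two characters, with "?" used in this sketch for anything it cannot map:

```javascript
// What the browser posts: the UTF-8 bytes for "Ł" (0xC5 0x81).
const posted = new TextEncoder().encode("Ł");

// Hypothetical, deliberately tiny table of ISO-8859-2 byte values.
const LATIN2_SAMPLE = new Map([["Ł", 0xA3], ["ł", 0xB3]]);

// Stand-in for the server-side conversion from UTF-8 to ISO-8859-2.
function utf8ToLatin2(utf8Bytes) {
  const text = new TextDecoder("utf-8").decode(utf8Bytes);
  const out = Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    if (cp < 0x80) return cp; // ASCII is the same in both encodings
    return LATIN2_SAMPLE.has(ch) ? LATIN2_SAMPLE.get(ch) : 0x3F; // 0x3F is "?"
  });
  return Uint8Array.from(out);
}

const reply = utf8ToLatin2(posted);
console.log(posted.length);        // 2
console.log(reply.length);         // 1
console.log(reply[0] === 0xA3);    // true: the same byte the string literal produced
```

The two-byte UTF-8 sequence becomes the single byte 0xA3, so echoed data and string literals now agree with the ISO-8859-2 charset declared in the header.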

You have to be aware at all times what encoding is being used and make sure you use just one encoding in a single page.

Conclusion

The whole subject of encodings and web pages is huge and well beyond the limits of a single chapter. Even a book would fail to cover every possibility.

If you can, opt to work with nothing but UTF-8. This is the only easy route.

Make sure all files are stored in UTF-8 and that all servers, web and database, default to UTF-8. If you do this then things are as simple as they can be. If you have to use legacy encodings, then consider converting to UTF-8 before spending a lot of time trying to work with them as they are encoded. There are too many ways that things can go wrong when you change encodings on the fly.

 

Summary

  • The original ASCII code only defines the basic 128 characters, codes 0 to 127, including some control codes.

  • To cater for more languages, ISO 8859-n and similar encodings were developed. These typically use ASCII for values up to 127 and then custom characters from 128 to 255.

  • To make sense of an ISO 8859-n encoding you need to know which one is in use to get the right character set.

  • Unicode is a list of characters indexed by code point, a number in the range 0 to 0x10FFFF that fits in a 32-bit value. It has enough code points to represent all the world's languages.

  • UTF-8 is a Unicode encoding commonly used on the web. It is a multi-byte encoding that uses from one to four bytes per character.

  • JavaScript uses the UTF-16 encoding, which is variable in length and uses one or two 16-bit words per character. However, it only handles single-word characters properly and two-word characters need special treatment.

  • You can enter a Unicode character using hex escape sequences. Use \xHH for characters that have codes up to xFF, i.e. 0 to 255, and \uHHHH for characters that have codes up to xFFFF.

  • When JavaScript sends UTF-16 to a browser it is automatically converted to UTF-8 and vice versa.

  • You can set the character encoding to be used via an HTTP header or an HTML meta tag. This doesn't perform any encoding, but simply tells the browser what encoding has been used in the file used to store the web page or however the data was generated. 

  • If it supports the encoding, the browser will correctly convert the encoding of a web page, as it is loaded, to UTF-8. This means that the JavaScript always interacts with the same UTF-16 codes, irrespective of the indicated encoding.  

  • For files that are downloaded by AJAX, only Chrome takes any notice of an included HTML meta tag. To ensure that the encoding is converted to UTF-8 correctly you have to use an HTTP header.

  • For data that is sent to the server, the page encoding is correctly converted by the browser, when it was loaded, to UTF-8. The ajax call then simply sends this UTF-8 to the server. 

  • The fact that the data received by the server is always in UTF-8 can cause a problem if it then wants to send data back to the browser in some other encoding. You either have to convert the UTF-8 to the encoding or the encoding to UTF-8. 

  • It is much simpler to use UTF-8 for everything.
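Several of the points above about UTF-8 lengths, UTF-16 words and escape sequences can be checked directly in JavaScript. This is a short sketch; utf8Length is a helper defined here for illustration and the sample characters are arbitrary choices:

```javascript
// UTF-8 uses between one and four bytes per character.
const utf8Length = (s) => new TextEncoder().encode(s).length;
console.log(utf8Length("A"));       // 1
console.log(utf8Length("\u0141"));  // 2 (Ł)
console.log(utf8Length("\u20AC"));  // 3 (the euro sign)

// A character outside the 16-bit range needs two 16-bit words in UTF-16.
const face = String.fromCodePoint(0x1F600);
console.log(utf8Length(face));                 // 4
console.log(face.length);                      // 2: length counts 16-bit words
console.log(Array.from(face).length);          // 1: iteration works by code point
console.log(face.codePointAt(0).toString(16)); // "1f600"

// Hex escape sequences.
console.log("\xA3" === "£");                   // true: \xHH covers 0 to 255
console.log("\u0141" === "Ł");                 // true: \uHHHH covers up to xFFFF
console.log("\uD83D\uDE00" === face);          // true: two-word characters need two escapes
```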

 

 

 


Last Updated ( Thursday, 05 May 2022 )