Have fun with URL Encoding!
Friday, Mar 27, 2009
I had some trouble while playing with the Google Language API regarding the number of characters that can be sent in a GET request: although the HTTP specification does not set any maximum length, URLs over 2,000 characters will not work in the most popular web browser.
Sounds okay to me, but we also have to remember that query strings have to be URL-encoded. I have large texts that need to be split into several parts so that each part can be sent in a GET request, so I need to work out where to cut the text to keep the encoded result under 2,000 characters.
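Here is a minimal sketch of that cutting step, written in JavaScript. The helper name `splitForGet` and its `maxEncodedLength` parameter are made up for the illustration, and it measures with `encodeURIComponent()`, which is only an assumption about what the sending side will do. The budget has to leave room for the rest of the URL (host, path, other parameters), so it should be smaller than the 2,000 limit itself.

```javascript
// Hypothetical helper: split `text` into chunks whose percent-encoded
// length never exceeds `maxEncodedLength`.
function splitForGet(text, maxEncodedLength) {
  const chunks = [];
  let current = "";
  let currentLength = 0;

  // for...of walks code points, so a surrogate pair is never cut in half
  // (encodeURIComponent throws a URIError on a lone surrogate).
  for (const ch of text) {
    const encodedLength = encodeURIComponent(ch).length;
    if (encodedLength > maxEncodedLength) {
      throw new RangeError("maxEncodedLength is too small for: " + ch);
    }
    if (currentLength + encodedLength > maxEncodedLength) {
      chunks.push(current);
      current = "";
      currentLength = 0;
    }
    current += ch;
    currentLength += encodedLength;
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// Usage: every chunk fits in a 40-character encoded budget.
const sample = "déjà vu ".repeat(50);
const parts = splitForGet(sample, 40);
console.log(parts.every((p) => encodeURIComponent(p).length <= 40));
```

This version cuts wherever the budget runs out, even in the middle of a word. Backing up to the last space before pushing a chunk would be a natural refinement for text that is going to be translated.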
FLEX/AS3
I wanted to use the encodeURI() function to calculate the size of my URL-encoded string so I could split it wisely. The documentation says:
| Characters not encoded |
|---|
| 0 1 2 3 4 5 6 7 8 9 |
| a b c d e f g h i j k l m n o p q r s t u v w x y z |
| A B C D E F G H I J K L M N O P Q R S T U V W X Y Z |
| ; / ? : @ & = + $ , # |
| - _ . ! ~ * ' ( ) |
Why not…
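To see what the table means in practice, here is a small JavaScript check (the variable names are mine). It compares `encodeURI()` with its sibling `encodeURIComponent()` on the last two rows of the table. `encodeURI()` is meant for whole URLs, so it leaves the reserved characters alone, while `encodeURIComponent()` is meant for a single parameter value and escapes them.

```javascript
const reserved = ";/?:@&=+$,#"; // second-to-last row of the table
const marks = "-_.!~*'()";      // last row of the table

console.log(encodeURI(reserved));          // ;/?:@&=+$,#
console.log(encodeURIComponent(reserved)); // %3B%2F%3F%3A%40%26%3D%2B%24%2C%23
console.log(encodeURI(marks));             // -_.!~*'()
console.log(encodeURIComponent(marks));    // -_.!~*'()
```

So a text containing `&` or `=` grows when it is encoded as a parameter value, but `encodeURI()` will not count that growth.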
With HTTPService, the parameters are URL-encoded unless you set the contentType to application/xml, which makes sense, of course.
Now the fun part:
Source string (59 chars):
Phrase_choc - j'adore quand un plan se déroule sans accroc!
encodeURI() (82 chars):
Phrase_choc%20-%20j'adore%20quand%20un%20plan%20se%20d%C3%A9roule%20sans%20accroc!
HTTPService parameter (90 chars):
Phrase%5Fchoc%20%2D%20j%27adore%20quand%20un%20plan%20se%20d%C3%A9roule%20sans%20accroc%21
So the encoding done by HTTPService differs from what the encodeURI() function produces?! HTTPService also encodes the characters from the last two rows of the table above… is that “normal”? More fun?
If you send the encodeURI() output through HTTPService, you get a nice 112-character string:
Phrase%5Fchoc%2520%2D%2520j%27adore%2520quand%2520un%2520plan
%2520se%2520d%25C3%25A9roule%2520sans%2520accroc%21
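The four lengths above can be reproduced in JavaScript. This is only a sketch: `strictEncode` is a hypothetical stand-in that I assume behaves like the stricter encoder on this sample. It is not the actual HTTPService code. It escapes everything `encodeURIComponent()` escapes, plus the marks from the last row of the table.

```javascript
// Hypothetical stand-in for the stricter encoder: escapes everything
// encodeURIComponent() escapes, plus the marks  - _ . ! ~ * ' ( )
function strictEncode(text) {
  return encodeURIComponent(text).replace(
    /[-_.!~*'()]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

const source = "Phrase_choc - j'adore quand un plan se déroule sans accroc!";
const viaEncodeURI = encodeURI(source);     // encoded once, loosely
const viaStrict = strictEncode(source);     // encoded once, strictly
const doubled = strictEncode(viaEncodeURI); // encoded twice: every % becomes %25

console.log(source.length);       // 59
console.log(viaEncodeURI.length); // 82
console.log(viaStrict.length);    // 90
console.log(doubled.length);      // 112
```

The practical consequence: measure the length with the strict encoder, but hand the raw, unencoded text to the service. Passing an already encoded string gets it encoded a second time.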
Let’s compare with other languages.
COLDFUSION
Using URLEncodedFormat() I got the following encoded string:
Phrase%5Fchoc%20%2D%20j%27adore%20quand%20un%20plan%20se%20d%C3%A9roule%20sans%20accroc%21
Same as the one from HTTPService.
JAVASCRIPT
Using the encodeURI() function I got the following encoded string:
Phrase_choc%20-%20j'adore%20quand%20un%20plan%20se%20d%C3%A9roule%20sans%20accroc!
Same as the one from AS3 encodeURI().
Well, I find this a bit disturbing, and the W3C is not really helping:
The same encoding method may be used for encoding characters whose use, although technically allowed in a URL, would be unwise due to problems of corruption by imperfect gateways or misrepresentation due to the use of variant character sets, or which would simply be awkward in a given environment. Because a % sign always indicates an encoded character, a URL may be made safer simply by encoding any characters considered unsafe, while leaving already encoded characters still encoded. Similarly, in cases where a larger set of characters is acceptable, % signs can be selectively and reversibly expanded.
The reserved characters shall however never be arbitrarily encoded and decoded.
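One way to read "encoding any characters considered unsafe, while leaving already encoded characters still encoded" is sketched below in JavaScript. The function name `encodePreservingEscapes` is invented for the example, and it assumes that anything shaped like `%XX` in the input is an existing escape. A literal "%20" typed by a user would therefore be left as it is, which may or may not be what you want.

```javascript
// Hypothetical helper: percent-encode a string but keep sequences that
// already look like %XX escapes untouched, so encoding twice is harmless.
function encodePreservingEscapes(input) {
  return input
    .split(/(%[0-9A-Fa-f]{2})/) // odd indexes hold the captured %XX escapes
    .map((part, i) => (i % 2 === 1 ? part : encodeURIComponent(part)))
    .join("");
}

const once = encodePreservingEscapes("100% sûr");
console.log(once);                                   // 100%25%20s%C3%BBr
console.log(encodePreservingEscapes(once) === once); // true
```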
Any thoughts?
March 27th, 2009 at 4:29 pm
My blog post about URLEncoding weirdness – Flex/CF/javascript/W3C http://bit.ly/12GZG
March 27th, 2009 at 6:00 pm
RT @Fitzchev: My blog post about URLEncoding weirdness – Flex/CF/javascript/W3C http://bit.ly/12GZG
March 27th, 2009 at 6:27 pm
nilBlog : Have fun with URL Encoding! http://tinyurl.com/cdfxrb
March 27th, 2009 at 9:04 pm
@ryanstewart Hi Ryan, can you have a quick look at this: http://bit.ly/12GZG ? Your thoughts? Sorry in advance for the bad English
April 29th, 2009 at 10:14 am
Have done a blog post about this problem before but got no reaction http://bit.ly/12GZG So if you think I'm right, vote http://bit.ly/Uelv
April 29th, 2009 at 10:15 am
RT @fitzchev: … blog post about this problem before but got no reaction http://bit.ly/12GZG … vote http://bit.ly/Uelv