Archive for August 14th, 2007

Unicode Illuminated

Tuesday, August 14th, 2007

I generated a memory dump containing plenty of Unicode and ASCII “Hello World!” strings to see how they look in a picture. I assume you know the difference between Unicode (UTF-16) and ASCII encodings: each wide character in the former occupies two bytes:

0:000> db 008c7420 l20
008c7420  48 00 65 00 6c 00 6c 00-6f 00 20 00 57 00 6f 00  H.e.l.l.o. .W.o.
008c7430  72 00 6c 00 64 00 21 00-00 00 00 00 00 00 00 00  r.l.d.!.........

and characters from the latter occupy one byte of memory:

0:000> db 008c72b4 l10
008c72b4  48 65 6c 6c 6f 20 57 6f-72 6c 64 21 00 00 00 00  Hello World!....
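The two dumps above can be reproduced without a debugger by encoding the string in Python and printing the bytes in a `db`-like layout. This is only a sketch: the `db` helper below is hypothetical and merely approximates WinDbg's formatting, the addresses are copied from the dumps purely as labels, and the trailing zero bytes are padding added to match the dump lengths. "Unicode" here means little-endian UTF-16 (`utf-16-le`), which is how Windows stores wide strings in memory.

```python
def db(data: bytes, base: int, width: int = 16) -> list[str]:
    """Format bytes roughly like WinDbg's `db`: address, hex bytes with a
    '-' between the 8th and 9th byte, then printable ASCII (others as '.')."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexes = [f"{b:02x}" for b in chunk]
        left, right = " ".join(hexes[:8]), " ".join(hexes[8:])
        hex_part = left + ("-" + right if right else "")
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{base + off:08x}  {hex_part}  {text}")
    return lines

# 12 characters * 2 bytes = 24 bytes, padded with zeros to 0x20 bytes (l20)
utf16 = "Hello World!".encode("utf-16-le") + b"\x00" * 8
# 12 characters * 1 byte = 12 bytes, padded with zeros to 0x10 bytes (l10)
ascii_ = "Hello World!".encode("ascii") + b"\x00" * 4

for line in db(utf16, 0x008C7420) + db(ascii_, 0x008C72B4):
    print(line)
```

The output matches the debugger listings line for line, and `utf16[1::2]` (every second byte) is all zeros, which is the pattern discussed next.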

You can see that the second byte of each Unicode English character is zero: these characters have code points below 0x80, and Windows stores UTF-16 in little-endian order, low-order byte first. I converted that memory dump into an 8 bits-per-pixel bitmap using Dump2Picture, and after zooming in with Vista Photo Viewer until the pixels became squares, I got the following picture that illustrates the difference between Unicode and ASCII strings:

Incidentally, the same memory dump converted to a 32 bits-per-pixel bitmap shows the Unicode “Hello World!” strings in shades of green :-)
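The 8 bits-per-pixel conversion mentioned above can be approximated by wrapping raw bytes in a BMP header so that each byte becomes one pixel. This is not Dump2Picture's code: the function name `bytes_to_bmp8`, the grayscale palette (byte value = brightness, so 0x00 is black) and the chosen row width are assumptions made for illustration. Under that palette every second pixel of an English UTF-16 string is black, so it should look striped next to the solid run of pixels produced by an ASCII string.

```python
import struct

def bytes_to_bmp8(data: bytes, width: int) -> bytes:
    """Hypothetical helper: wrap raw bytes as an 8 bits-per-pixel BMP with a
    grayscale palette. One byte becomes one pixel (0x00 black, 0xff white)."""
    height = (len(data) + width - 1) // width
    row_size = (width + 3) & ~3                      # BMP rows are padded to 4 bytes
    # 256 RGBQUAD palette entries: blue, green, red, reserved
    palette = b"".join(bytes((i, i, i, 0)) for i in range(256))
    pixels = bytearray()
    # BMP stores rows bottom-up, so emit the last row of the data first.
    for row in range(height - 1, -1, -1):
        chunk = data[row * width:(row + 1) * width].ljust(width, b"\x00")
        pixels += chunk + b"\x00" * (row_size - width)
    offset = 14 + 40 + len(palette)                  # file header + info header + palette
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    info_header = struct.pack("<IiiHHIIiiII",
                              40, width, height,     # header size, width, height
                              1, 8, 0,               # planes, bits per pixel, BI_RGB
                              len(pixels), 2835, 2835,  # image size, ~72 DPI
                              256, 0)                # colors used, colors important
    return file_header + info_header + palette + bytes(pixels)
```

For example, `open("hello.bmp", "wb").write(bytes_to_bmp8(data, 16))` writes a file that any image viewer can open; a real dump would use a much larger width so that the picture is not absurdly tall.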

- Dmitry Vostokov @ DumpAnalysis.org -