The Smallest Program
Can the smallest program that crashes be smaller than the smallest program that doesn’t? Depends on a platform and a compiler/linker set. I’ve chosen x64 and MASM64 for my experiments. The smallest working program I came up first was this:
; ml64 /Zi TheSmallestProgram64.asm /link ; /entry:main /SUBSYSTEM:CONSOLE _text SEGMENT main PROC ret main ENDP _text ENDS END
It compiles and links to an executable with only one byte instruction in its main function:
0:000> u main TheSmallestProgram64!main: 00000000`00401010 c3 ret 00000000`00401011 cc int 3 00000000`00401012 cc int 3 00000000`00401013 cc int 3 00000000`00401014 cc int 3 00000000`00401015 cc int 3 00000000`00401016 cc int 3 00000000`00401017 cc int 3
Then I thought about removing ret instruction and supposed that if we compile and link and try to execute the program with 0 bytes we get straight to int 3 instruction and in my case (I have NTSD set as a default postmortem debugger) a dump will be saved. So I did that but I found that unfortunately compiler inserts ret instruction if the procedure body is empty. So I cheated them by putting nop instruction (which is also one byte) and got my dump!
; ml64 /Zi TheSmallestProgramWithBug64.asm /link ; /entry:main /SUBSYSTEM:CONSOLE _text SEGMENT main PROC nop main ENDP _text ENDS END Loading Dump File [new_2006-10-25_12-40-06-500_076C.dmp] … 0:000> kL TheSmallestProgramWithBug64!main+0×1 kernel32!BaseProcessStart+0×29 0:000> u main TheSmallestProgramWithBug64!main: 00000000`00401010 90 nop 00000000`00401011 cc int 3 00000000`00401012 cc int 3 00000000`00401013 cc int 3 00000000`00401014 cc int 3 00000000`00401015 cc int 3 00000000`00401016 cc int 3 00000000`00401017 cc int 3
So one answer to my question: The smallest working program and the smallest crashing program have the same size unless we use some binary editors
Then I tried MS Visual C++ (this time 32-bit project) and came up with the following C or C++ program without any prolog and epilog code:
__declspec(naked) void Main ()
{
}
I changed entry point from standard main function to my own capitalized Main function and compiler/link options:
Compiler: /Od /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /D "_AFXDLL" /FD /MD /GS- /Fo"Release\" /Fd"Releasevc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt Linker: /OUT:"SmallestProgram.exe" /INCREMENTAL:NO /NOLOGO /MANIFEST:NO /NODEFAULTLIB /DEBUG /PDB:"SmallestProgram.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /ENTRY:"Main" /ERRORREPORT:PROMPT
The program crashes immediately because the body is empty:
Loading Dump File [new_2006-10-25_15-18-03-109_13B0.dmp] 0:000> u Main SmallestProgram!Main: 00401000 cc int 3 00401001 0000 add byte ptr [eax],al 00401003 0000 add byte ptr [eax],al 00401005 0000 add byte ptr [eax],al 00401007 0000 add byte ptr [eax],al 00401009 0000 add byte ptr [eax],al 0040100b 0000 add byte ptr [eax],al 0040100d 0000 add byte ptr [eax],al 0:000> kL ChildEBP RetAddr 002cfff0 00000000 SmallestProgram!Main 0:000> dds esp 002cffc4 7d4e992a kernel32!BaseProcessStart+0x28 002cffc8 00000000 002cffcc 00000000 002cffd0 7efdf000 002cffd4 80000003 002cffd8 002cffc8 002cffdc 002cfbbc 002cffe0 ffffffff 002cffe4 7d4d8998 kernel32!_except_handler3 002cffe8 7d4e9938 kernel32!`string'+0x28 002cffec 00000000 002cfff0 00000000 002cfff4 00000000 002cfff8 00401000 SmallestProgram!Main 002cfffc 00000000
So another answer to my question: The smallest crashing program can be less than the smallest working program and is actually 0 bytes
- Dmitry Vostokov -
October 27th, 2006 at 5:20 am
Cool. So here’s a challenge: write a program that is in multiple languages - that is, it will successfully compile under two or more different language compilers, and will run without causing an error. Ideally, when run, it should do something smart like print the name of the language it was compiled as.
I have seen this done before - the same piece of code compiled under three different language compilers.
There’s a pint offered for the best solution
October 27th, 2006 at 5:48 am
Cool challenge! Reminds me Obfuscated C Code contests. I devised this algorithm in pseudo code (might require a supercomputer if we assess its complexity for a non trivial program that prints a language it is written in ):
main()
{
input max_length;
test(”", 0, max_length);
}
bool test(string prg, int level, int max)
{
if (level == max) return false;
for (char in [allowed symbols for both languages])
{
prg += char;
if (both compilers compile successfully) { print prg; return true; }
if (test(prg, level+1, max) == true) return true;
prg -= char;
}
return false;
}
I’ll try to implement it in C and let you know Although you have to wait for about 2*10,000,000,000 compilations in the worst case for simple languages (10 letters in alphabet and the length of the program is 10 symbols) because the complexity is O(N^M) where N is alphabet cardinality and M is the maximum program length to try
October 27th, 2006 at 9:57 am
Here is the program written in Visual C++ and C# and compiles under both compilers. When compiled as C# by C# compiler it prints “Written in C#”. When compiled under Visual C++ compiler it prints nothing because there is not need to say that C++ is the best language ever
#if !__cplusplus
class Program
{
static void Main()
{
System.Console.WriteLine(”Written in C#”);
}
}
#else
void main() {}
#endif
October 30th, 2006 at 6:50 am
I saw it done many years ago with Fortran IV, AlgolW and BCPL. It was extremely devious and involved lines such as
COMMENT = 3;
which in Fortran is a comment (capital C in column 6) and is a variable assignment in the other two. There was generally much abuse of the comment facility to hide things and everything was in capitals. I think I’m showing my age
September 28th, 2009 at 4:22 pm
[…] The earliest opcodism binary was created on October 25th, 2006 that I now call Nothingness and Crash: The Smallest Program. […]
January 18th, 2010 at 6:37 am
Actually they are of the same size: 0 bytes. The problem is that they depend on what’s already in the memory. In your case it’s a bunch of zeoes which cause a crash. But this value depends on the OS. Vanilla DOS doesn’t really clear memory so if the segment where the program is loaded already contains a RET at 100h then it’ll work.
The RET command actually also depends on the environment. A correct program would be 2 bytes long, containing “INT 20h” (EXIT in DOS 1, but also supported by later versios, altough using INT 21h with the “exit” value is prefferred). Your current implementation depends on the correct value being in the stack, which IIRC is not always the case.