If you're ever curious as to how your C# code turns into executable code, sharplab.io is a very good place to start.
The website, created by Andrey Shchekin, follows the pattern of other online code compilers like Tim Jones' Shader Playground, which handles HLSL, GLSL, and others, or Matt Godbolt's Compiler Explorer, with multiple languages and compilers (C and C++ for sure, but also CUDA, Go, Python, Rust, Swift, and a few others).
There's a pane where you can select a language and write or paste your code (you can see Roslyn-generated errors and warnings like unused using directives, which is nice), and then another pane where you can select the output you want to see.
The outputs at the moment of this writing are the following.
Run mode to look at assigned values.
While you may not think that compiling C# and then decompiling it back into C# is a very useful thing, it's interesting to see what transformations are done by the compiler itself.
A classic example is the state machine for iterators. For example, this bit of code:
using System.Collections.Generic;
public class C {
    public IEnumerable<int> f() {
        yield return 1;
        yield return 2;
    }
}
will produce this output:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Security;
using System.Security.Permissions;
[assembly: CompilationRelaxations(8)]
[assembly: RuntimeCompatibility(WrapNonExceptionThrows = true)]
[assembly: Debuggable(
    DebuggableAttribute.DebuggingModes.Default |
    DebuggableAttribute.DebuggingModes.DisableOptimizations |
    DebuggableAttribute.DebuggingModes.IgnoreSymbolStoreSequencePoints |
    DebuggableAttribute.DebuggingModes.EnableEditAndContinue)]
[assembly: SecurityPermission(SecurityAction.RequestMinimum, SkipVerification = true)]
[assembly: AssemblyVersion("0.0.0.0")]
[module: UnverifiableCode]
public class C
{
    [CompilerGenerated]
    private sealed class <f>d__0 : IEnumerable<int>, IEnumerable, IEnumerator<int>, IDisposable, IEnumerator
    {
        private int <>1__state;
        private int <>2__current;
        private int <>l__initialThreadId;
        public C <>4__this;
        int IEnumerator<int>.Current
        {
            [DebuggerHidden]
            get
            {
                return <>2__current;
            }
        }
        object IEnumerator.Current
        {
            [DebuggerHidden]
            get
            {
                return <>2__current;
            }
        }
        [DebuggerHidden]
        public <f>d__0(int <>1__state)
        {
            this.<>1__state = <>1__state;
            <>l__initialThreadId = Environment.CurrentManagedThreadId;
        }
        [DebuggerHidden]
        void IDisposable.Dispose()
        {
        }
        private bool MoveNext()
        {
            switch (<>1__state)
            {
                default:
                    return false;
                case 0:
                    <>1__state = -1;
                    <>2__current = 1;
                    <>1__state = 1;
                    return true;
                case 1:
                    <>1__state = -1;
                    <>2__current = 2;
                    <>1__state = 2;
                    return true;
                case 2:
                    <>1__state = -1;
                    return false;
            }
        }
        bool IEnumerator.MoveNext()
        {
            //ILSpy generated this explicit interface implementation from .override directive in MoveNext
            return this.MoveNext();
        }
        [DebuggerHidden]
        void IEnumerator.Reset()
        {
            throw new NotSupportedException();
        }
        [DebuggerHidden]
        IEnumerator<int> IEnumerable<int>.GetEnumerator()
        {
            <f>d__0 <f>d__;
            if (<>1__state == -2 && <>l__initialThreadId == Environment.CurrentManagedThreadId)
            {
                <>1__state = 0;
                <f>d__ = this;
            }
            else
            {
                <f>d__ = new <f>d__0(0);
                <f>d__.<>4__this = <>4__this;
            }
            return <f>d__;
        }
        [DebuggerHidden]
        IEnumerator IEnumerable.GetEnumerator()
        {
            return ((IEnumerable<int>)this).GetEnumerator();
        }
    }
    [IteratorStateMachine(typeof(<f>d__0))]
    public IEnumerable<int> f()
    {
        <f>d__0 <f>d__ = new <f>d__0(-2);
        <f>d__.<>4__this = this;
        return <f>d__;
    }
}
So, there are a bunch of directives that have to do with the assembly itself - those are attributes like CompilationRelaxations for example.
The next thing to look at is the very last method, however. f now creates a new instance of a compiler-generated type and returns it. You can see that the implementation keeps a <>1__state field: it starts at -2 when the enumerable is created, is set to 0 when an enumerator is created or initialized, and then MoveNext updates both the state and the current value as the sequence is iterated.
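To make the generated code a bit easier to follow, here is a rough hand-written counterpart that uses legal C# identifiers. It's only a sketch: the names (HandWrittenIterator, state, current, Drain) are made up for illustration, and it leaves out the thread id check and the trick where the same object serves as both enumerable and enumerator.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written counterpart of the compiler-generated <f>d__0 class.
// Simplification: this is only an enumerator, not also an enumerable.
public sealed class HandWrittenIterator : IEnumerator<int>
{
    private int state;    // plays the role of <>1__state
    private int current;  // plays the role of <>2__current

    public int Current => current;
    object IEnumerator.Current => current;

    public bool MoveNext()
    {
        switch (state)
        {
            case 0:   // nothing yielded yet: corresponds to "yield return 1"
                current = 1;
                state = 1;
                return true;
            case 1:   // resumed after the first yield: "yield return 2"
                current = 2;
                state = 2;
                return true;
            default:  // ran off the end of the method, or already finished
                state = -1;
                return false;
        }
    }

    public void Reset() => throw new NotSupportedException();

    public void Dispose() { }
}

public static class Program
{
    // Collects everything the iterator yields, to compare with the original f().
    public static List<int> Drain()
    {
        var results = new List<int>();
        using (var it = new HandWrittenIterator())
        {
            while (it.MoveNext())
            {
                results.Add(it.Current);
            }
        }
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Drain())); // prints "1, 2"
    }
}
```

Each yield return becomes one case in the switch, and the state field is what remembers where to pick up on the next MoveNext call.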
OK, we're going to look at three different ways of iterating through an array of strings and printing out its contents. Make sure you set the output pane to Release and not Debug.
using System;
public static class C {
    public static void IterEval(string[] values) {
        for (int i = 0; i < values.Length; i++) {
            Console.WriteLine(values[i]);
        }
    }
    public static void IterForeach(string[] values) {
        foreach (var i in values) {
            Console.WriteLine(i);
        }
    }
    public static void IterAssigned(string[] values) {
        int l = values.Length;
        for (int i = 0; i < l; i++) {
            Console.WriteLine(values[i]);
        }
    }
}
There are three methods: IterEval evaluates values.Length directly in the for loop. IterForeach uses a foreach loop. IterAssigned is the same as IterEval, but values.Length is only evaluated once, outside the loop.
This is what each of these looks like.
C.IterEval(System.String[])
L0000: push ebp
L0001: mov ebp, esp
L0003: push edi
L0004: push esi
L0005: push ebx
L0006: mov esi, ecx
L0008: xor edi, edi
L000a: mov ebx, [esi+4]
L000d: test ebx, ebx
L000f: jle short L001f
L0011: mov ecx, [esi+edi*4+8]
L0015: call System.Console.WriteLine(System.String)
L001a: inc edi
L001b: cmp ebx, edi
L001d: jg short L0011
L001f: pop ebx
L0020: pop esi
L0021: pop edi
L0022: pop ebp
L0023: ret
IterEval has the following instructions.
L8: xor edi, edi (edi is cleared, as it represents i).
C.IterForeach(System.String[])
L0000: push ebp
L0001: mov ebp, esp
L0003: push edi
L0004: push esi
L0005: push ebx
L0006: mov esi, ecx
L0008: xor edi, edi
L000a: mov ebx, [esi+4]
L000d: test ebx, ebx
L000f: jle short L001f
L0011: mov ecx, [esi+edi*4+8]
L0015: call System.Console.WriteLine(System.String)
L001a: inc edi
L001b: cmp ebx, edi
L001d: jg short L0011
L001f: pop ebx
L0020: pop esi
L0021: pop edi
L0022: pop ebp
L0023: ret
Turns out that even though conceptually foreach is a very different beast from a for loop (with its use of enumerators and whatnot), the compiler recognizes we're iterating over an array and generates the same code as for the straightforward for loop.
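Roughly speaking, a foreach over an array is treated as if it had been written as an index-based loop over a local copy of the array reference, with no enumerator object involved. Here's a sketch of that rewrite; the class and method names (ForeachSketch, WithForeach, WithIndexLoop) are made up, and the methods collect into a list instead of printing so the two versions are easy to compare.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachSketch
{
    // What the source code says.
    public static List<string> WithForeach(string[] values)
    {
        var seen = new List<string>();
        foreach (var value in values)
        {
            seen.Add(value);
        }
        return seen;
    }

    // Approximately what it is treated as when the collection is an array:
    // a local copy of the reference, an index, and a comparison against Length.
    public static List<string> WithIndexLoop(string[] values)
    {
        var seen = new List<string>();
        string[] array = values;
        for (int index = 0; index < array.Length; index++)
        {
            string value = array[index];
            seen.Add(value);
        }
        return seen;
    }

    public static void Main()
    {
        var input = new[] { "a", "b", "c" };
        Console.WriteLine(string.Join(",", WithForeach(input)));   // prints "a,b,c"
        Console.WriteLine(string.Join(",", WithIndexLoop(input))); // prints "a,b,c"
    }
}
```

Pasting both methods into the code pane is a quick way to check whether the two still produce matching output on whatever compiler version the site is running.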
C.IterAssigned(System.String[])
L0000: push ebp
L0001: mov ebp, esp
L0003: push edi
L0004: push esi
L0005: push ebx
L0006: mov esi, ecx
L0008: mov edi, [esi+4]
L000b: xor ebx, ebx
L000d: test edi, edi
L000f: jle short L003c
L0011: test edi, edi
L0013: setge cl
L0016: movzx ecx, cl
L0019: test cl, 1
L001c: je short L002e
L001e: mov ecx, [esi+ebx*4+8]
L0022: call System.Console.WriteLine(System.String)
L0027: inc ebx
L0028: cmp ebx, edi
L002a: jl short L001e
L002c: jmp short L003c
L002e: mov ecx, [esi+ebx*4+8]
L0032: call System.Console.WriteLine(System.String)
L0037: inc ebx
L0038: cmp ebx, edi
L003a: jl short L002e
L003c: pop ebx
L003d: pop esi
L003e: pop edi
L003f: pop ebp
L0040: ret
Now, here we see some differences at last. I've added some extra line breaks this time. This is what's happening in IterAssigned, in comparison to the prior two functions.
The length of the array is loaded up front into edi.
ebx is cleared (this has been scheduled a bit earlier than before, and here ebx will be i instead of edi).
The block at L11 tests edi to jump to L2e or fall through to L1e.
Here, I find myself a bit stumped - I can't quite say why all of this remains in the code.
The setge/movzx pair leaves a value in ecx, but that gets overwritten before it's used.
The test at L19 checks whether cl has a 1 bit; at this point cl is 1 for zero or positive values, 0 for negative (just based on this block).
OK, so I managed to confuse myself and/or the compiler a bit, but we still didn't get a bounds check. Let's try this instead.
...
    public static void IterArg(string[] values, int l) {
        for (int i = 0; i < l; i++) {
            Console.WriteLine(values[i]);
        }
    }
...
And then, hey presto!
C.IterArg(System.String[], Int32)
L0000: push ebp
L0001: mov ebp, esp
L0003: push edi
L0004: push esi
L0005: push ebx
L0006: mov esi, ecx
L0008: mov edi, edx
L000a: xor ebx, ebx
L000c: test edi, edi
L000e: jle short L004c
L0010: test esi, esi
L0012: je short L0039
L0014: cmp [esi+4], edi
L0017: setge cl
L001a: movzx ecx, cl
L001d: test edi, edi
L001f: setge al
L0022: movzx eax, al
L0025: test eax, ecx
L0027: je short L0039
L0029: mov ecx, [esi+ebx*4+8]
L002d: call System.Console.WriteLine(System.String)
L0032: inc ebx
L0033: cmp ebx, edi
L0035: jl short L0029
L0037: jmp short L004c
L0039: cmp ebx, [esi+4]
L003c: jae short L0051
L003e: mov ecx, [esi+ebx*4+8]
L0042: call System.Console.WriteLine(System.String)
L0047: inc ebx
L0048: cmp ebx, edi
L004a: jl short L0039
L004c: pop ebx
L004d: pop esi
L004e: pop edi
L004f: pop ebp
L0050: ret
L0051: call 0x71b775b0
L0056: int3
I'm not going to go into the detailed disassembly this time, as the patterns are roughly the same as before.
I will however call attention to L3c, which jumps past the return on L50 and onto L51, which calls an external (fixed-address, known to the JIT) error handler followed by a debug break interrupt.
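From the C# side, that failing bounds check should surface as an IndexOutOfRangeException. Here's a small sketch to see it happen; BoundsCheckDemo and ThrowsOutOfRange are made-up names, and the loop has the same shape as IterArg.

```csharp
using System;

public static class BoundsCheckDemo
{
    // Same loop shape as IterArg: the limit comes from the caller, so nothing
    // guarantees that it matches values.Length.
    // Returns true if indexing past the end of the array was caught.
    public static bool ThrowsOutOfRange(string[] values, int l)
    {
        try
        {
            for (int i = 0; i < l; i++)
            {
                Console.WriteLine(values[i]);
            }
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var values = new[] { "a", "b" };
        Console.WriteLine(ThrowsOutOfRange(values, 2)); // prints "a", "b", then "False"
        Console.WriteLine(ThrowsOutOfRange(values, 3)); // prints "a", "b", then "True"
    }
}
```

With a limit of 3 on a two-element array, the first two iterations print normally and the third one fails the check.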
A great way of learning more is looking at how other interesting constructs are handled, like async and await, throwing and handling exceptions, or switches with patterns.
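As a starting point for the last of those, here's a small switch with patterns that could be pasted into the code pane. It's just a sketch with made-up names (PatternSketch, Describe); the interesting part is comparing it with the decompiled C# output.

```csharp
using System;

public static class PatternSketch
{
    // A switch that mixes a constant pattern, type patterns, and a when clause.
    public static string Describe(object value)
    {
        switch (value)
        {
            case null:
                return "null";
            case int n when n < 0:
                return "negative int";
            case int n:
                return "int " + n;
            case string s:
                return "string of length " + s.Length;
            default:
                return "something else";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe(-5));      // prints "negative int"
        Console.WriteLine(Describe("hello")); // prints "string of length 5"
    }
}
```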
If you're curious, the GitHub repo has the sources for the website.
Happy decompiling!
It's been a while since I looked at x86, so I ended up using a few references.