Ninkasi Bugs 2

Fixed one more Ninkasi bug today!

Functions in Ninkasi have three undocumented variables. Named function arguments just refer to stack positions relative to the current stack frame, so it was pretty simple to add variables that refer to the function ID itself, the return pointer, and the argument count, because all of these values exist on the stack just like the arguments to the function itself.

These are considered debugging-only features, but they should at least be not-broken debugging-only features.

These variables are:

The Bug

At some point along the way, I had designed the 'call' instruction to expect a stack that looked like this (example 3-parameter function):

index | value          | type
---------------------------------
-6    | ...            |
-5    | arg0           | any type
-4    | arg1           | any type
-3    | arg2           | any type
-2    | _functionId    | function
-1    | _argumentCount | integer

Later down the road, I had to change it to this:

index | value          | type
---------------------------------
-6    | ...            |
-5    | _functionId    | function
-4    | arg0           | any type
-3    | arg1           | any type
-2    | arg2           | any type
-1    | _argumentCount | integer

I'm not sure off the top of my head why I had to do this, but my guess is that it was simply the order that arguments needed to be evaluated and then pushed onto the stack.

Anyway, when I made this change, I apparently forgot to update the position of _functionId. The arguments still needed a +1 offset in the stack that I always considered a sort of mystery. Well, the +1 offset was to skip the _functionId, and the functionId was pointing to one of the arguments.

The Fix

This was a pretty simple fix. Just remove the offset, and move the _functionId setup code to before the argument setup loop.

Sometimes it really is that easy!

Posted: 2021-09-06


Rain the Lamia

Started a new D&D game, so that means new lamia art!

Rain the Lamia

Posted: 2021-09-05


Ninkasi Bug Fixes

I spent a bit of time hacking on Ninkasi code this weekend!

I got rid of one terrible idea I implemented, replaced it with something much better, fixed one crash bug, and two potential coroutine-related memory leaks!

Removing One Ugly Anti-feature

First order of business was to change the object-function-call stupidity I put in there forever ago. It's an anti-feature I never should have considered. Here's how it went:

An object can have values inside of it assigned to functions, like so:

function testFunc(testValue)
{
    print("Value passed in: ", testValue, "\n");
}

var ob = object();
ob["foo"] = testFunc;

Previously, these functions can be called like so:

ob.foo();

When called in this manner, the object itself is inserted as the first argument to the function. This is roughly equivalent to the following code:

{
    var _temp = ob.foo;
    _temp(ob);
}

And the output of this is, predictably:

Value passed in: <object:0>

The problem with this is that there's no way to distinguish between when you want the object to be inserted as the first argument to a function and when you do not.

My reasoning at the time I developed this "feature" was sketchy at best, so I finally got around to cutting it out. Now there is no special case considered for calling a function directly from an object's field using '.'.

Now this code...

function testFunc(testValue)
{
    print("Value passed in: ", testValue, "\n");
}

var ob = object();
ob["foo"] = testFunc;

ob.foo();

... is a runtime error, as it should be:

error:
test.nks:14: Incorrect argument count for function call. Expected: 1 Found: 0

To replace this feature, because still like some aspects of it, I have stolen and repurposed C's indirection operator. The following code acts the way ob.foo() acted before:

ob->foo();

Which is equivalent to:

ob.foo(ob);

This gets us back to the original result:

Value passed in: <object:0>

But now you get to decide if you want to call a function on an object without the object itself being added in as an argument, by using the -> or the ..

One Crash Bug

Callable Objects

First, an explanation of the feature this affected:

Callable objects in Ninkasi are a little oddity I thought it would be useful to throw in. It's sort of a halfway point to actual closures.

Objects may be called as functions through the use of specially-named fields.

Take the following code:

var ob = newobject;
ob._exec = function(foo) {
    print("foo: ", foo, "\n");
};
ob._data = "blah blah blah";
ob();

The output of this code is:

foo: blah blah blah

It's that simple. Set an object's _exec field to a function, set an object's _data field to a value, and you may call the object itself as though it were a function.

This code...

ob();

... is equivalent to this code...

ob._exec(ob._data);

You can even add extra arguments after the _data one, which is inserted first:

var ob = object();
ob._exec = function(foo, bar) {
    print("foo: ", foo, ", ", bar, "\n");
};
ob._data = "blah blah blah";
ob("whatever");

Output:

foo: blah blah blah, whatever

The Bug

This feature does, unfortunately, involve some weird shifting-around of stack data.

This part wasn't thought out super-well, so bare with me here.

The setup for the function-call operator involves pushing the function ID onto the stack, followed by all the function arguments, followed by an integer indicating the number of arguments there are.

Here's what the stack will look like for a normal 3-parameter function call, at the time the call opcode executes (with stack indices relative to the top of the stack):

index | value       | type
------------------------------
-6    | ...         |
-5    | function-ID | function
-4    | arg0        | any type
-3    | arg1        | any type
-2    | arg2        | any type
-1    | 3           | integer

We have the function ID, followed by the arguments, followed by an argument count.

In order for the _data field to always be first, we need the stack to look like this:

index | value       | type
------------------------------
-7    | ...         |
-6    | function-ID | function
-5    | _data       | any type
-4    | arg0        | any type
-3    | arg1        | any type
-2    | arg2        | any type
-1    | 4           | integer

That just involves inserting the _data into the stack, shifting everything forward by 1, and adding 1 to the argument count entry.

I know this is probably tickling everyone's inefficient-VM sense, but this feature was kind of an afterthought for my personal toy language, so be nice. :)

Anyway, our bug today was in the code to shift the stack data around. The argument count coming into the function call opcode could be larger than the stack, and it would happily underrun the bottom of the stack, leading to your usual C buffer overrun/underrun type of error. That code is also not the prettiest code in the world, leading me to think I was either tired or in a rush when I wrote it. Let's have a look...

// Make room for the _data contents on the stack.
nkiVmStackPush_internal(vm);

// Shift everything up the stack.
for(i = argumentCount; i >= 1; i--) {
  vm->currentExecutionContext->stack.values[
    vm->currentExecutionContext->stack.size - argumentCount - 2 + i + 1] =
    vm->currentExecutionContext->stack.values[
      vm->currentExecutionContext->stack.size - argumentCount - 2 + i];
}

Yikes.

How to Setup This Bug

So here's an interesting thing: You can't.

Well, let me be more clear, you can't if you're using the scripting language directly. This is purely a vulnerability exposed through the binary state snapshots.

I haven't actually reverse-engineered AFL's crash output for this one, to be entirely honest, but my thinking is that the bad executable just pushes a bad argument count onto the stack before doing a normal callable-object call.

The Fix

First, I know what you're going to say. How dare I not have a system in place to protect against these kinds of overrun/underruns, when it's so easy to check for!? That usually comes after "why the hell are you using C89 for this project when it's absolutely prone to these kinds of errors!?" followed by a recommendation that I switch to Rust.

The thing is, though, I do have a way of safely interacting with the stack. I'm just not using it here because... uh... reasons?

The stoage space allocated to the stack only doubles in size when it reallocates, so, given that the stack capacity is always a power of two, it's pretty easy to just mask off the lowest some-number-of bits for stack indices going into the array. I even store that stack mask, and use it like I actually know what I'm doing in most spots!

The fix (along with some much-needed cleanup) look like this:

struct NKVMStack *stack = &(vm->currentExecutionContext->stack);
struct NKValue *stackValues;
nkuint32_t stackSize;
nkuint32_t stackMask;
nkuint32_t stackSizeMinusArguments;

// Make room for the _data contents on the stack.
nkiVmStackPush_internal(vm);

stackValues = stack->values;
stackSize = stack->size;
stackMask = stack->indexMask;
stackSizeMinusArguments = stackSize - argumentCount - 2;

// Shift everything up the stack.
for(i = argumentCount; i >= 1; i--) {
    stackValues[(stackSizeMinusArguments + i + 1) & stackMask] =
      stackValues[(stackSizeMinusArguments + i) & stackMask];
}

This also throws in a few extra variables to cut down on those giant globs of indirection and indexing inside the loop, potentially offering a speedup on compilers that can't optimize out those redundant operations (not that it's a particularly fast piece of code to begin with).

Memory Leaks

I'm not going to get into too much detail with this one, because it's pretty simple. A failure during coroutine creation would make the program allocate the storage space needed for the coroutine's execution context (including stack, program counter, etc) and then fail to set the pointer to store it, causing it to leak.

The fix is just to check for those failures and clean up the mess when it does.

Memory still Tracked

So with this leak still have a backup system in place for these. Allocations inside the VM are actually tracked and easily cleaned up during deallocation of the VM object itself. It's not a C memory leak, it's just a leak of VM address space, more or less.

When the VM's address space is exhausted, it will bail out with its own internal allocation error and can still be cleaned up by the hosting application. So as far as the stability of the hosting application is concerned, and in relation to potentially hostile script code, this isn't a "real" threat. It does suck for long-running scripts, though.

Final Thoughts

AFL is great. Valgrind is awesome, too. All these issues found today were through those two tools. Thinking you can write secure code without them or some equivalent to them is hubris in its purest form.

The issues with coroutine setup here are definitely showing my lack of testing after writing the coroutine code. The callable object issues are something I wasn't expecting, because that code's been in there forever.

Ninkasi may never be perfect and vulnerability-proof, but at least it got a little bit closer today, and I'm pretty happy with that.

Posted: 2021-09-05


[ 0 ] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

Next