
Scripting Language Ideas

TwnKey edited this page Nov 13, 2021 · 11 revisions

Prior Art

https://github.com/Ouroboros/EDDecompiler

Here's an example of a Python script as decompiled by EDDecompiler for the Sky/Crossbell games: [image: EDDecompiler]

The goal is to have something similar while improving it as much as possible.

Some improvements we could make:

  • EDDecompiler requires Python and several libraries to be installed, which can be troublesome. We can avoid that by replacing Python with a custom language that we define ourselves.
  • Some instructions are hard to read. In the screenshot above, "Jc" corresponds to op code 5 in Cold Steel, which performs a jump depending on some conditions (mainly scenario flags that control the story flow). I think it can be translated into another syntax where the conditions are more apparent.
  • Modders have worked out what many op codes do, so overall readability can be improved. We can also include TBL content as constants (ID_REAN, ID_ALISA, ... instead of 0x00, 0x01), which would further help readability and help the users.

Instruction Set Documentation

We need to decompile the binary inside the dat files into some sort of human-readable ASM format. The current tool can give each instruction a name and list its operands and their types, which covers most of the decompilation process (this is the disassembling part).

However, the recompilation process is left entirely to the xlsx file which contains the type of each operand, and indicates to the tool how to translate each operand into binary.

If we are to use a scripting language, we also need a way to know the types of the operands, which are not apparent without a function prototype. Thus, we need to document each instruction: its name and description, and the type and size (in bytes) of its operands. This also adds safety and prevents users from messing up their files with incorrect instructions (but it also prevents them from being creative, which is why we need to keep the XLSX format, which gives more freedom).

An instruction is not defined only by its op code, but also by its control bytes.

Once this documentation is completed, the tool can read it and know exactly how to parse the scripting language and recompile it into binary.
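To make the idea concrete, here is a minimal sketch of what one documented instruction could look like and how the tool could use it to turn arguments back into binary. Everything in it is an assumption for illustration: the `Operand` and `Instruction` classes, the type names (`u8`, `u16`, `i32`, `f32`), the little-endian layout, the control bytes sitting right after the op code, and the `SetValue` instruction with op code 0x7F are all hypothetical.

```python
import struct
from dataclasses import dataclass

# Assumed operand types and their struct formats (little-endian is an assumption).
TYPE_FORMATS = {"u8": "<B", "u16": "<H", "i32": "<i", "f32": "<f"}


@dataclass(frozen=True)
class Operand:
    name: str
    kind: str            # key into TYPE_FORMATS
    description: str = ""

    @property
    def size(self):
        """Size of the operand in bytes, derived from its type."""
        return struct.calcsize(TYPE_FORMATS[self.kind])


@dataclass(frozen=True)
class Instruction:
    opcode: int
    control: bytes       # an instruction is identified by op code AND control bytes
    name: str
    operands: tuple
    description: str = ""

    def encode(self, *args):
        """Pack the arguments according to the documented operand types."""
        if len(args) != len(self.operands):
            raise TypeError(
                f"{self.name} expects {len(self.operands)} operand(s), got {len(args)}"
            )
        # Assumption: the control bytes directly follow the op code.
        out = bytes([self.opcode]) + self.control
        for operand, value in zip(self.operands, args):
            out += struct.pack(TYPE_FORMATS[operand.kind], value)
        return out


# Hypothetical instruction, not taken from the real instruction set.
SET_VALUE = Instruction(
    opcode=0x7F,
    control=b"\x01",
    name="SetValue",
    operands=(Operand("slot", "u8"), Operand("value", "i32")),
)

print(SET_VALUE.encode(3, 100).hex())  # 7f010364000000
```

A value that does not fit its documented type (for example 300 for a `u8`) makes `struct.pack` raise `struct.error`, which is the kind of early failure the documentation is meant to enable.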

Language choice

We need this language to:

  • Be simple: no types apparent in the script itself (though the instruction set definition could have some), no need for return values, no need for arithmetic.
  • Be safe: it should refuse to compile into dat if something is wrong, such as an unrecognized function, too many parameters, unparsable parameters, etc.
  • Have a binding to C++, so that the "already existing logic for header creation and pointers update" can do the work. This would also make the conversion between XLSX, DAT, and this new code format easier.
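The "safe" requirement could be sketched roughly as follows: every call in a script is checked against the documented prototypes before anything is written to a dat file. The prototype table, the function names `SetFlag` and `Wait`, the type names and the `check_call` helper are hypothetical and only illustrate the three failure cases listed above.

```python
# Hypothetical prototypes: function name -> ordered operand types.
PROTOTYPES = {"SetFlag": ("u16",), "Wait": ("u16", "u8")}

# Value ranges for the assumed operand types.
RANGES = {"u8": (0, 0xFF), "u16": (0, 0xFFFF)}


def check_call(name, raw_args):
    """Return a list of error messages for one call; an empty list means OK."""
    if name not in PROTOTYPES:
        return [f"unrecognized function '{name}'"]

    expected = PROTOTYPES[name]
    errors = []
    if len(raw_args) != len(expected):
        errors.append(
            f"'{name}' expects {len(expected)} parameter(s), got {len(raw_args)}"
        )

    for index, (kind, text) in enumerate(zip(expected, raw_args)):
        try:
            value = int(text, 0)  # base 0 accepts both decimal and 0x.. literals
        except ValueError:
            errors.append(f"parameter {index} of '{name}' is not a number: {text!r}")
            continue
        low, high = RANGES[kind]
        if not low <= value <= high:
            errors.append(f"parameter {index} of '{name}' does not fit in {kind}: {value}")
    return errors


for call in [("SetFlag", ["0x10"]), ("Nope", []), ("Wait", ["abc", "300"])]:
    print(call[0], check_call(*call))
```

Collecting all errors instead of stopping at the first one is a design choice here: the compiler can refuse to produce a dat file while still reporting every problem in one pass.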

The Fifth Instruction:

This one is a challenge that needs to be addressed when choosing the language, because it can take any number of operands: it only finishes when it reaches a 0x01 byte. This can lead to a very long instruction. It also manipulates conditions, which are better translated into an if / else statement than into a simple function call. Thus we are looking for the most readable syntax there is for this type of instruction.

Here's the code decompiled with Ghidra:
[image: Ghidra decompilation]

We can see that the sequence of operators/conditions/instructions is meant to return a value. If that value equals 0 (which often means a condition was not met), we jump to the pointer at the end of the instruction; otherwise we ignore the pointer and proceed with the normal flow of the file.

The sequence of actions can contain 37 different operators, which I will try to list below. They may also operate on an array of operands (uint) with a hardcoded size of 99, referred to as the stack. For simplicity we call the two most recently added operands "a" and "b", with a being the most recent one.

Byte value  Meaning
0           Adds the integer following this operator (x) to the stack (a = x)
1           Stops the sequence
2           b = (a == b)
3           b = (a != b)
4           b = (b < a)
5           b = (b > a)
6           b = (b <= a)
7           b = (b >= a)
8           b = (a == 0)
9           b = (a && b)
10          b = (a & b)
11          b = (a | b)
12          b = (a + b)
13          b = (a - b)
14          b = -b
15          b = b ^ a
16          b = b * a
17          b = b / a
18          b = b % a
19          Nothing
20          b = b * a
21          b = b / a
22          b = b % a
23          b = (a + b)
24          b = (a - b)
25          b = (a & b)
26          b = b ^ a
27          b = (a | b)
28          Executes the instruction that follows, then b = the value that instruction returned (some instructions seem to return something at a specific location in memory, not sure though)
29          b = (a ~ b)
30          Reads a word (2 bytes) wd and splits it in two parts: the 13 most significant bits (ptr) and the 3 least significant bits (i). Then checks whether the i-th bit of *ptr is set.
31          Reads 1 byte bt; if (bt < 0x20), b = memory[bt], else b = 0
32          Reads 1 byte, which gives an index to one location in memory (I counted 18 of them)
33          Reads a word (2 bytes), then a single byte, then runs a function that takes those two as parameters (no idea what it does); the result of that function is assigned to b
34          Takes no parameters; runs a function that returns a number (if negative, an ABS is applied); the result is assigned to b
35          Reads 1 byte bt, b = integers_array_in_memory[bt]
36          Reads 1 integer int (or float?), then b = ((int & some_constant_location_in_memory) != 0)
37          Reads 1 byte bt, then reads an integer int from memory; if (int == 0), b = 0, else b = some_memory_location[bt]
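As a rough illustration of how such a sequence could be evaluated, here is a small sketch covering only operators 0 to 7 and 30 from the table. It works on already decoded tokens, so it makes no claim about how immediates are laid out in the binary. Several points are assumptions: that a binary operator consumes a and leaves its result in b's slot, that operator 30 pushes its result, that ptr can be treated as an index into a flag buffer (the hypothetical `flags` argument), and that bit i is counted from the least significant bit.

```python
STACK_LIMIT = 99  # hardcoded stack size mentioned in the text


def split_flag_word(wd):
    """Split a 16-bit word into (ptr, i): the top 13 bits and the low 3 bits."""
    return (wd >> 3) & 0x1FFF, wd & 0x7


def evaluate(tokens, flags=b""):
    """Evaluate a pre-decoded operator sequence and return its final value.

    tokens: list of tuples, either (operator,) or (operator, immediate).
    flags:  hypothetical flag memory, indexed by the ptr part of operator 30.
    """
    stack = []

    def push(value):
        if len(stack) >= STACK_LIMIT:
            raise OverflowError("operand stack is full")
        stack.append(value)

    # Comparison operators as listed in the table (a is the most recent operand).
    binary = {
        2: lambda a, b: a == b,
        3: lambda a, b: a != b,
        4: lambda a, b: b < a,
        5: lambda a, b: b > a,
        6: lambda a, b: b <= a,
        7: lambda a, b: b >= a,
    }

    for token in tokens:
        op = token[0]
        if op == 0:            # push the immediate that follows the operator
            push(token[1])
        elif op == 1:          # end of the sequence
            break
        elif op in binary:
            a = stack.pop()    # assumption: a is consumed by the operator
            b = stack.pop()
            push(int(binary[op](a, b)))
        elif op == 30:         # test one bit of the flag memory
            ptr, i = split_flag_word(token[1])
            push((flags[ptr] >> i) & 1)
        else:
            raise ValueError(f"operator {op} is not covered by this sketch")
    return stack[-1] if stack else 0


print(evaluate([(0, 7), (0, 4), (5,), (1,)]))  # 1, because 7 > 4
```

The remaining operators would follow the same pattern; the ones that call into the game (28, 33, 34) would need callbacks or symbolic placeholders rather than real values.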

We can now try to interpret what some of the fifth instructions do in CS4: [image]

First, byte 28 indicates that we execute the instruction that follows. This instruction returns a result, after which 4 is pushed onto the stack, which finally gives a = 4 and b = the instruction's result. Then a "5" operator follows, which is b > a. The result of the sequence is therefore (instruction's result > 4). If this is false, the instruction's result is less than or equal to 4, and we jump to offset 80990.
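The same reading can be done mechanically by rebuilding an infix expression from the operator sequence, which is one possible way a decompiler could make the condition apparent. The sketch below is an assumption-laden illustration: tokens are already decoded, the instruction executed by operator 28 is represented only by a placeholder name, the `mem[...]` notation for operator 35 is made up, and only the comparison operators are handled.

```python
# Comparison operators from the table, rendered with b on the left
# because the table defines them as "b <op> a".
SYMBOLS = {2: "==", 3: "!=", 4: "<", 5: ">", 6: "<=", 7: ">="}


def to_infix(tokens):
    """Rebuild a readable expression from a pre-decoded operator sequence."""
    stack = []
    for token in tokens:
        op = token[0]
        if op == 0:                      # immediate integer
            stack.append(str(token[1]))
        elif op == 1:                    # end of the sequence
            break
        elif op == 28:                   # nested instruction, shown as a call
            stack.append(f"{token[1]}()")
        elif op == 35:                   # value read from an integer array
            stack.append(f"mem[{token[1]}]")
        elif op in SYMBOLS:
            a = stack.pop()              # most recent operand
            b = stack.pop()
            stack.append(f"({b} {SYMBOLS[op]} {a})")
        else:
            raise ValueError(f"operator {op} is not covered by this sketch")
    return stack[-1]


# "instr" is a placeholder for whatever instruction follows byte 28.
print(to_infix([(28, "instr"), (0, 4), (5,), (1,)]))  # (instr() > 4)
```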

Another example:
[image]
Here we have operator 35, which loads the value at index 247 of an integer array in memory into b. Then -1 is loaded into a, and operator 3 checks (a != b). The jump is therefore taken if a == b.
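Since the jump is taken when the sequence evaluates to 0, a decompiler that wants to show "when do we jump" has to print the negated condition. A minimal sketch of that step, assuming integer operands and a made-up `if (...) goto ...` output syntax (the `mem[247]` and `result` names are placeholders as well):

```python
# Each comparison and its logical negation (valid for integer operands).
NEGATED = {"==": "!=", "!=": "==", "<": ">=", ">": "<=", "<=": ">", ">=": "<"}


def jump_condition(lhs, op, rhs):
    """Condition under which the jump IS taken, i.e. the sequence yields 0."""
    return f"{lhs} {NEGATED[op]} {rhs}"


def render_jump(lhs, op, rhs, target):
    """Render the instruction in a hypothetical if/goto syntax."""
    return f"if ({jump_condition(lhs, op, rhs)}) goto {target}"


print(jump_condition("mem[247]", "!=", "-1"))    # mem[247] == -1
print(render_jump("result", ">", "4", 80990))    # if (result <= 4) goto 80990
```

Whether the final language prints the jump condition or the fall-through condition is a design choice; an if / else block would naturally use the original, non-negated condition for its body.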
