5.0.1 has more broken 'NOREGNAME' syntax on ARM32. #2145

@gerph

Summary

I made some changes for 5.0.1 to the CS_OPT_SYNTAX_NOREGNAME handling, which I thought were working, but the behaviour is more broken in 5.0.1 than it was before.

It looks like NOREGNAME produces the same output as DEFAULT.

Example code

This example code prints out the default register form and the 'noregname' form.

#!/usr/bin/env python

import sys

from capstone import *
import capstone.arm_const

code = bytearray([0, 0x10, 0x90, 0xe5]) # LDR r1,[r0]

# A decoder for 'no regname'
mdnr = Cs(CS_ARCH_ARM, CS_MODE_ARM)
mdnr.detail = True
mdnr.syntax = capstone.CS_OPT_SYNTAX_NOREGNAME

# A decoder for default format
mddef = Cs(CS_ARCH_ARM, CS_MODE_ARM)
mddef.detail = True
mddef.syntax = capstone.CS_OPT_SYNTAX_DEFAULT

optype_names = dict((getattr(capstone.arm_const, optype), optype) for optype in dir(capstone.arm_const) if optype.startswith('ARM_OP_'))

print("cs_version() = %r" % (cs_version(),))

for regnum in range(0, 16):
    # Set the base register (Rn), held in the low nibble of byte 2
    code[2] = (code[2] & 0xF0) | regnum
    for i in mddef.disasm(bytes(code), 0x1000):
        dis_default = "%-6s%s" % (i.mnemonic, i.op_str)
    for i in mdnr.disasm(bytes(code), 0x1000):
        dis_noregname = "%-6s%s" % (i.mnemonic, i.op_str)
    print("Register %2i: default: %-20s  noregname: %s" % (regnum, dis_default, dis_noregname))

Test results for 4.0.2

cs_version() = (4, 0, 1024)
Register  0: default: ldr   r1, [r0]        noregname: ldr   r1, [r0]
Register  1: default: ldr   r1, [r1]        noregname: ldr   r1, [r1]
Register  2: default: ldr   r1, [r2]        noregname: ldr   r1, [r2]
Register  3: default: ldr   r1, [r3]        noregname: ldr   r1, [r3]
Register  4: default: ldr   r1, [r4]        noregname: ldr   r1, [r4]
Register  5: default: ldr   r1, [r5]        noregname: ldr   r1, [r5]
Register  6: default: ldr   r1, [r6]        noregname: ldr   r1, [r6]
Register  7: default: ldr   r1, [r7]        noregname: ldr   r1, [r7]
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [r9]
Register 10: default: ldr   r1, [sl]        noregname: ldr   r1, [r10]
Register 11: default: ldr   r1, [fp]        noregname: ldr   r1, [r11]
Register 12: default: ldr   r1, [ip]        noregname: ldr   r1, [r12]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [sp]
Register 14: default: ldr   r1, [lr]        noregname: ldr   r1, [lr]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]

Test results for 5.0.0

cs_version() = (5, 0, 1280)
Register  0: default: ldr   r1, [r0]        noregname: ldr   r1, [r0]
Register  1: default: ldr   r1, [r1]        noregname: ldr   r1, [r1]
Register  2: default: ldr   r1, [r2]        noregname: ldr   r1, [r2]
Register  3: default: ldr   r1, [r3]        noregname: ldr   r1, [r3]
Register  4: default: ldr   r1, [r4]        noregname: ldr   r1, [r4]
Register  5: default: ldr   r1, [r5]        noregname: ldr   r1, [r5]
Register  6: default: ldr   r1, [r6]        noregname: ldr   r1, [r6]
Register  7: default: ldr   r1, [r7]        noregname: ldr   r1, [r7]
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [r9]
Register 10: default: ldr   r1, [sl]        noregname: ldr   r1, [r10]
Register 11: default: ldr   r1, [fp]        noregname: ldr   r1, [r11]
Register 12: default: ldr   r1, [ip]        noregname: ldr   r1, [r12]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [r13]
Register 14: default: ldr   r1, [lr]        noregname: ldr   r1, [r14]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]

Notice that in the noregname case every register except pc is printed as a number (including r13 and r14); this is the behaviour I tried to make more consistent with 4.0.x.

Test results for 5.0.1

cs_version() = (5, 0, 1280)
Register  0: default: ldr   r1, [r0]        noregname: ldr   r1, [r0]
Register  1: default: ldr   r1, [r1]        noregname: ldr   r1, [r1]
Register  2: default: ldr   r1, [r2]        noregname: ldr   r1, [r2]
Register  3: default: ldr   r1, [r3]        noregname: ldr   r1, [r3]
Register  4: default: ldr   r1, [r4]        noregname: ldr   r1, [r4]
Register  5: default: ldr   r1, [r5]        noregname: ldr   r1, [r5]
Register  6: default: ldr   r1, [r6]        noregname: ldr   r1, [r6]
Register  7: default: ldr   r1, [r7]        noregname: ldr   r1, [r7]
Register  8: default: ldr   r1, [r8]        noregname: ldr   r1, [r8]
Register  9: default: ldr   r1, [sb]        noregname: ldr   r1, [sb]
Register 10: default: ldr   r1, [sl]        noregname: ldr   r1, [sl]
Register 11: default: ldr   r1, [fp]        noregname: ldr   r1, [fp]
Register 12: default: ldr   r1, [ip]        noregname: ldr   r1, [ip]
Register 13: default: ldr   r1, [sp]        noregname: ldr   r1, [sp]
Register 14: default: ldr   r1, [lr]        noregname: ldr   r1, [lr]
Register 15: default: ldr   r1, [pc]        noregname: ldr   r1, [pc]

Note that the noregname case is exactly the same as the default.

Expected output

I had hoped that 5.0.1 would be closer to the 4.0.x output. Instead, it seems to have got worse.

Possible reason

I looked at the CS_OPT_SYNTAX constants in the Python capstone/__init__.py and I see a possible problem.

On 5.0.0 the constants are:

# Capstone syntax value
CS_OPT_SYNTAX_DEFAULT = 0    # Default assembly syntax of all platforms (CS_OPT_SYNTAX)
CS_OPT_SYNTAX_INTEL = 1    # Intel X86 asm syntax - default syntax on X86 (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_ATT = 2      # ATT asm syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_NOREGNAME = 3   # Asm syntax prints register name with only number - (CS_OPT_SYNTAX, CS_ARCH_PPC, CS_ARCH_ARM)
CS_OPT_SYNTAX_MASM = 4      # MASM syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_MOTOROLA = 5 # MOS65XX use $ as hex prefix

On 5.0.1 the constants are:

# Capstone syntax value
CS_OPT_SYNTAX_DEFAULT = 1 << 1  # Default assembly syntax of all platforms (CS_OPT_SYNTAX)
CS_OPT_SYNTAX_INTEL = 1 << 2  # Intel X86 asm syntax - default syntax on X86 (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_ATT = 1 << 3  # ATT asm syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_NOREGNAME = 1 << 4  # Asm syntax prints register name with only number - (CS_OPT_SYNTAX, CS_ARCH_PPC, CS_ARCH_ARM)
CS_OPT_SYNTAX_MASM = 1 << 5  # MASM syntax (CS_OPT_SYNTAX, CS_ARCH_X86)
CS_OPT_SYNTAX_MOTOROLA = 1 << 6  # MOS65XX use $ as hex prefix
CS_OPT_SYNTAX_CS_REG_ALIAS = 1 << 7  # Prints common register alias which are not defined in LLVM (ARM: r9 = sb etc.)

These values may well be correct, but the syntax constants changed value in the same release in which the output stopped working, which makes me think the two might be related.

If I look at the setter for syntax in 5.0.1, I see:

    # syntax setter: modify assembly syntax.
    @syntax.setter
    def syntax(self, style):
        status = _cs.cs_option(self.csh, CS_OPT_SYNTAX, style)
        if status != CS_ERR_OK:
            raise CsError(status)
        # save syntax
        self._syntax = style

But for 'skipdata' I see it has this form:

    # setter: modify skipdata status
    @skipdata.setter
    def skipdata(self, opt):
        if opt == False:
            status = _cs.cs_option(self.csh, CS_OPT_SKIPDATA, CS_OPT_OFF)
        else:
            status = _cs.cs_option(self.csh, CS_OPT_SKIPDATA, CS_OPT_ON)
        if status != CS_ERR_OK:
            raise CsError(status)

        # save this option
        self._skipdata = opt

That is, the skipdata setter passes CS_OPT_ON or CS_OPT_OFF to cs_option, whilst the syntax setter passes the style value straight through. In capstone.h the actual definitions are:

/// Runtime option value (associated with option type above)
typedef enum cs_opt_value {
	CS_OPT_OFF = 0,  ///< Turn OFF an option - default for CS_OPT_DETAIL, CS_OPT_SKIPDATA, CS_OPT_UNSIGNED.
	CS_OPT_ON = 1 << 0, ///< Turn ON an option (CS_OPT_DETAIL, CS_OPT_SKIPDATA).
	CS_OPT_SYNTAX_DEFAULT = 1 << 1, ///< Default asm syntax (CS_OPT_SYNTAX).
	CS_OPT_SYNTAX_INTEL = 1 << 2, ///< X86 Intel asm syntax - default on X86 (CS_OPT_SYNTAX).
	CS_OPT_SYNTAX_ATT = 1 << 3,   ///< X86 ATT asm syntax (CS_OPT_SYNTAX).
	CS_OPT_SYNTAX_NOREGNAME = 1 << 4, ///< Prints register name with only number (CS_OPT_SYNTAX)
	CS_OPT_SYNTAX_MASM = 1 << 5, ///< X86 Intel Masm syntax (CS_OPT_SYNTAX).
	CS_OPT_SYNTAX_MOTOROLA = 1 << 6, ///< MOS65XX use $ as hex prefix
	CS_OPT_SYNTAX_CS_REG_ALIAS = 1 << 7, ///< Prints common register alias which are not defined in LLVM (ARM: r9 = sb etc.)
} cs_opt_value;

CS_OPT_ON and CS_OPT_OFF are 1 and 0 respectively, and every other value now occupies its own bit, which makes me think that this was intended to be an OR'd bitfield to control the flags.
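To illustrate what an OR'd bitfield would look like, here is a minimal sketch that only does arithmetic on the values quoted from capstone.h above. The dictionary CS_OPT_VALUES_5_0_1 and the helper flag_names are hypothetical names made up for this example and are not part of the capstone bindings. Whether cs_option actually accepts combined values is an assumption that this sketch does not test.

```python
# Values copied from the 5.0.1 cs_opt_value enum quoted above.
# This dictionary is a hypothetical mirror, not part of capstone itself.
CS_OPT_VALUES_5_0_1 = {
    "CS_OPT_ON": 1 << 0,
    "CS_OPT_SYNTAX_DEFAULT": 1 << 1,
    "CS_OPT_SYNTAX_INTEL": 1 << 2,
    "CS_OPT_SYNTAX_ATT": 1 << 3,
    "CS_OPT_SYNTAX_NOREGNAME": 1 << 4,
    "CS_OPT_SYNTAX_MASM": 1 << 5,
    "CS_OPT_SYNTAX_MOTOROLA": 1 << 6,
    "CS_OPT_SYNTAX_CS_REG_ALIAS": 1 << 7,
}


def flag_names(value):
    """Return the names of all flags whose bit is set in value."""
    return [name for name, bit in CS_OPT_VALUES_5_0_1.items() if value & bit]


# Combining two flags with OR (assumed usage; not verified against the library).
combined = (CS_OPT_VALUES_5_0_1["CS_OPT_SYNTAX_NOREGNAME"]
            | CS_OPT_VALUES_5_0_1["CS_OPT_SYNTAX_CS_REG_ALIAS"])

print("combined = 0x%02x -> %s" % (combined, flag_names(combined)))
```

Because each value is a distinct power of two, any combination can be decoded unambiguously, which would not have been possible with the sequential 5.0.0 values.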

But I'm guessing here. It seems odd that a patch version update would change the meaning of the constants: that could cause problems for compiled languages that expect to be able to dynamically link against newer minor versions without an ABI change. Again, I'm guessing that's the case.
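To make the ABI concern concrete, the sketch below takes the 5.0.0 values quoted earlier and shows which 5.0.1 bits each one would correspond to if a client built against the old constants passed them to a library using the new ones. The names OLD_5_0_0, NEW_5_0_1 and reinterpret are hypothetical and exist only for this example. It assumes the 5.0.0 C header used the same values as the Python constants shown above, and it says nothing about how the library really handles such values.

```python
# 5.0.0 values, as listed in the Python bindings quoted above.
OLD_5_0_0 = {
    "CS_OPT_SYNTAX_DEFAULT": 0,
    "CS_OPT_SYNTAX_INTEL": 1,
    "CS_OPT_SYNTAX_ATT": 2,
    "CS_OPT_SYNTAX_NOREGNAME": 3,
    "CS_OPT_SYNTAX_MASM": 4,
    "CS_OPT_SYNTAX_MOTOROLA": 5,
}

# 5.0.1 values, as listed in the capstone.h enum quoted above.
NEW_5_0_1 = {
    "CS_OPT_ON": 1 << 0,
    "CS_OPT_SYNTAX_DEFAULT": 1 << 1,
    "CS_OPT_SYNTAX_INTEL": 1 << 2,
    "CS_OPT_SYNTAX_ATT": 1 << 3,
    "CS_OPT_SYNTAX_NOREGNAME": 1 << 4,
    "CS_OPT_SYNTAX_MASM": 1 << 5,
    "CS_OPT_SYNTAX_MOTOROLA": 1 << 6,
    "CS_OPT_SYNTAX_CS_REG_ALIAS": 1 << 7,
}


def reinterpret(old_value):
    """Name the 5.0.1 bits that are set in a value taken from 5.0.0."""
    return [name for name, bit in NEW_5_0_1.items() if old_value & bit]


for name, value in OLD_5_0_0.items():
    print("%-26s = %d -> 5.0.1 bits: %s" % (name, value, reinterpret(value)))
```

Under these assumptions, the old CS_OPT_SYNTAX_NOREGNAME value of 3 has the same bits as CS_OPT_ON | CS_OPT_SYNTAX_DEFAULT in the new scheme, and none of the old non-zero values keeps its original meaning.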
