Skip to content

Increased size of cs_isn.bytes to 33 to be able to hold big EVM instructions#1231

Closed
f0rki wants to merge 1694 commits intocapstone-engine:nextfrom
f0rki:evm_bytes_size
Closed

Increased size of cs_isn.bytes to 33 to be able to hold big EVM instructions#1231
f0rki wants to merge 1694 commits intocapstone-engine:nextfrom
f0rki:evm_bytes_size

Conversation

@f0rki
Copy link
Copy Markdown

@f0rki f0rki commented Aug 8, 2018

The EVM arch has a couple of really big instructions, with the biggest one being PUSH32 (with 33 bytes size). Currently cs_inst can only hold 16 bytes of raw instruction data, which is not enough for EVM. This results in the bytes being truncated to 16 bytes. This didn't really affect the disassembly or the op_str, but if you use cs_inst.size or cs_inst.bytes afterwards, it would return wrong data for those big instructions (PUSH15 up to PUSH32).

@aquynh
Copy link
Copy Markdown
Collaborator

aquynh commented Aug 9, 2018

33 looks weird though, should we round the size up to some other number?

@f0rki
Copy link
Copy Markdown
Author

f0rki commented Aug 9, 2018

I don't know. It already feels like a lot of wasted memory for all the other archs.
Maybe it should also be surrounded by #ifdef HAS_EVM and fall back to the 16 byte size?

aquynh and others added 28 commits March 2, 2019 14:59
)

* m68k: store correct m68k_reg value in op.reg_pair

Originally, value - M68K_REG_D0 was stored and the print logic added
M68K_REG_D0.

* m68k: fix license typo
* normalize tab character in cs

* normalize in issue mode
* Added RISCV dir to contain the RISCV architecture engine code. Adding the TableGen files generated from llvm-tblgen. Add Disassembler.h

* Started working on RISCVDisassembler.c - RISCV_init(), RISCVDisassembler_getInstruction, and RISCV_getInstruction

* Added all functions to RISCVDisassembler.c and needed modifications to RISCVGenDisassemblerTables.inc. Add and modified RISCVGenSubtargetInfo.inc. Start creation of RISCVInstPrinter.h

* Finished RISCVGenAsmWriter.inc. Finished RISCVGenRegisterInfo.inc. Minor fixes to RISCVDisassembler.c. Working on RISCVInstPrinter

* Finished RISCVInstPrinter, RISCVMapping, RISCVBaseInfo, RISCVGenInstrInfo.inc, RISCVModule.c. Working on riscv.h

* Backport it from: porto703@0db412c

* All RISCV files added. Compiled correctly and initial test for ADD, ADDI, AND works properly.

* Add refactored cs.c for RISCV

* Testing all I instructions in test_riscv.c

* Modify the orignal backport for RISCVGenRegisterInfo.inc, capstone.h and test_iter to work w/ the current code strcuture

* Fix issue with RISCVGenRegisterInfo.inc - RISCVRegDesc[] (Excess elements in struct initializer). Added RISCV tests to test_iter.c

* fixed bug related to incorrect initialization of memory after malloc

* fix compile bug

* Fix compile errors.

* move riscv.h to include/capstone

* fix indentation issues

* fix coding style issues

* Fix indentation issues

* fix coding style

* Move variable declaration to the top of the block

* Fix coding indentation

* Move some stuff into RISCVMappingInsn.inc

* Fix code sytle

* remove cs_mode support for RISCV

* update asmwriter-inc to LLVM upstream

* update the .inc files to riscv upstream

* update riscv disassembler function for suport 16bit instructions

* update printer & tablegen inc files which have fixed arguments mismatch

* update headers and mapping source

* add riscv architecture specific test code

* fix all RISCV tons of compiler errors

* pass final tests

* add riscv tablegen patchs

* merge with upstream/next

* fix cstool missing riscv file

* fix root Makefile

* add new TableGen patchs for riscv

* fix cmakefile.txt of missing one riscv file

* fix declaration conflict

* fix incompatible declaration type

* change riscvc from arch to mode

* fix test_riscv warnning

* fix code style and add riscv part of test_basic

* add RISCV64 mode

* add suite for riscv

* crack fuzz test

* fix getfeaturebits test add riscvc

* fix test missing const qualifier warnning

* fix testcase type mismatch

* fix return value missing

* change getfeaturebits test

* add test cs files

* using a winder type contain the decode string

* fix a copy typo

* remove useless mode for riscv

* change cs file blank type

* add repo for update_riscv & fix cstool missing riscv mode

* fix typo

* add riscv for cstool useage

* add TableGen patch for riscv asmwriter

* clean ctags file

* remove black comment line

* fix fuzz related something

* fix missing RISCV string of fuzz

* update readme, etc..

* add riscv *.s.cs file

* add riscv *.s.cs file & clear ctags

* clear useless array declarations at capstone_test

* update to 5e4069f

* update readme change name more formal

* change position of riscv after bpf and modify copyright more uniform

* clear useless ctags file

* change blank with tab in riscv.h

* add riscv python bindings

* add riscv in __init__.py

* fix riscv define value for python binding

* fix test_riscv.py typo

* add missing riscvc in __init__.py of python bindings

* fix alias-insn printer bug, remove useless newline

* change inst print delimter from tab to bankspace for travis

* add riscv tablegen patch

* fix inst output more consistency

* add TableGen patch which fix inst output formal

* crack the effective address output for detail and change register print function

* fix not detail crash bug

* change item declaration position at cs_riscv

* update riscv.py

* change function name more meaningfull

* update python binding makefile

* fix register enum sequence according to riscvgenreginfo.inc

* test function name

* add enum s0/fp in riscv.h & update riscv_const.py

* add register name enum
aeflores and others added 8 commits March 7, 2021 21:57
* fix bug in displacement offset

* fix k0-k7 registers in X86 table.
…e-engine#1702)

* mos65xx: use imm field for immediate operand value

using the wrong field works on little-endian hosts, but on big-endian the wrong value would be read

* mos65xx: set operand mem field to address also in relative modes

previously the last operand would have an offset, which doesn't match the printed operand

* mos65xx: add bpl instruction to test

this demonstrates an address operand with relative addressing
This was initially introduced in dce7da9 but lost in the LLVM 7 sync
in 5a99624.
@kabeor
Copy link
Copy Markdown
Member

kabeor commented Nov 10, 2021

I don't know. It already feels like a lot of wasted memory for all the other archs. Maybe it should also be surrounded by #ifdef HAS_EVM and fall back to the 16 byte size?

it's a good idea, can you do this continue?

Smartsmurf and others added 4 commits November 10, 2021 17:05
…ieri/moffset_disp

Fix the displacement offset for moffset-encoded operands
fixed library extension to build properly under CYGWIN
@tmfink
Copy link
Copy Markdown
Contributor

tmfink commented Nov 11, 2021

  • master branch is not really used, new development should go into next branch (unable to disassemble f3 48 0f 1e c8 (rdsspq rax) in Ubuntu 20.04 #1759)
  • Changing the size of a field in cs_insn is an ABI break, meaning a change like this should only go into next branch (and not backported to older branches)
    • As a consequence, I don't think we want the memory layout of cs_insn to change as a result of something like HAS_EVM. Otherwise, the ABI used in a libcapstone.so would depend on whether the library was compiled with HAS_EVM. Any headers distributed headers and and all language bindings would need to know how the library was compiled.

@kabeor
Copy link
Copy Markdown
Member

kabeor commented Nov 11, 2021

yes, plz rebase your branch to 'next'. 'master' is not accept to PR

  • As a consequence, I don't think we want the memory layout of cs_insn to change as a result of something like HAS_EVM. Otherwise, the ABI used in a libcapstone.so would depend on whether the library was compiled with HAS_EVM. Any headers distributed headers and and all language bindings would need to know how the library was compiled.

@aquynh any thoughts?

@f0rki f0rki changed the base branch from master to next November 11, 2021 12:54
@f0rki
Copy link
Copy Markdown
Author

f0rki commented Nov 11, 2021

I rebased to next.

@aquynh
Copy link
Copy Markdown
Collaborator

aquynh commented Nov 11, 2021

i hesitate on this PR since it would break the API :-(

@sylvainpelissier
Copy link
Copy Markdown

sylvainpelissier commented Jan 31, 2022

cs_isn.bytes size is internal only no ? why it would break the API ?

@f0rki
Copy link
Copy Markdown
Author

f0rki commented Feb 1, 2022

Alternatively: Would it make sense to move the opcode constant to the cs_evm struct?

@f0rki f0rki mentioned this pull request Feb 1, 2022
@tmfink
Copy link
Copy Markdown
Contributor

tmfink commented Feb 1, 2022

cs_isn.bytes size is internal only no ? why it would break the API ?

cs_insn is in capstone.h, which is a public header. Changing the bytes field changes the size of cs_insn which is an ABI break.

@david942j
Copy link
Copy Markdown
Contributor

Will it make more sense to declare the field bytes to the very end of struct cs_insn as a dynamic-length array? Something like:

struct cs_insn {
  // other fields..
  uint32_t len_bytes; // length of "bytes"
  uint8_t bytes[];
};

Or alternatively declaring bytes as a pointer just like the detail field.

The length changing of bytes was the reason that Capstone has to bump from 4 to 5 (see #1315 (comment)), maybe it's better to consider a solution instead of bumping major versions because of "minor" fixes.

@tmfink
Copy link
Copy Markdown
Contributor

tmfink commented Mar 28, 2022

The length changing of bytes was the reason that Capstone has to bump from 4 to 5 (see #1315 (comment)), maybe it's better to consider a solution instead of bumping major versions because of "minor" fixes.

Adding an extra pointer indirection/allocation is less efficient but could help maintain ABI compatibility. It depends on what we want to do.

@Rot127
Copy link
Copy Markdown
Collaborator

Rot127 commented Mar 20, 2024

Thank you for the PR! I closed it because it is out of date. With the new auto-sync update for v6 we made many changes to some main architectures and will do also to others.
This also changed the requirements we have now for new PRs.

If you still want to merge the changes, please rebase your fix onto the newest next branch and open a new PR.

@Rot127 Rot127 closed this Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.