feat: add r2 extension#1172

Merged
wtdcode merged 25 commits into
qilingframework:dev from
chinggg:r2
Jul 18, 2022
Conversation

@chinggg
Contributor

@chinggg chinggg commented Jun 14, 2022

Checklist

Which kind of PR do you create?

  • This PR only contains minor fixes.
  • This PR contains major feature update.
  • This PR introduces a new function/api for Qiling Framework.

Coding convention?

  • The new code conforms to Qiling Framework naming convention.
  • The imports are arranged properly.
  • Essential comments are added.
  • The reference of the new code is pointed out.

Extra tests?

  • No extra tests are needed for this PR.
  • I have added enough tests for this PR.
  • Tests will be added after some discussion and review.

Changelog?

  • This PR doesn't need to update Changelog.
  • Changelog will be updated after some proper review.
  • Changelog has been updated in my PR.

Target branch?

  • The target branch is dev branch.

One last thing


This PR introduces radare2 extension and adds an example hello_r2.py to show its basic usage.

My main concerns are:

  • the organization of R2's members
    • dataclass vs namedtuple
  • how many r2 output details should be kept
    • section
    • string
    • function
    • flag
    • xref

Feel free to review it and give your advice.

Member

@elicn elicn left a comment

Thanks for the contribution! This functionality looks very cool.

I think this is a nice PoC that could be even cooler if it didn't take r2 data "as-is" and provided some post-processing, like turning all offsets into vaddrs.

Anyway, I've made additional comments there.

Comment thread examples/extensions/r2/hello_r2.py Outdated
Comment thread examples/extensions/r2/hello_r2.py
Comment thread qiling/extensions/r2/r2.py
Comment thread qiling/extensions/r2/r2.py Outdated
Comment thread qiling/extensions/r2/r2.py Outdated
class R2:
    def __init__(self, ql: Qiling):
        super().__init__()
        path = ql.path.encode()
Member

Is there any adaptation we can do for "shellcode mode"?

Member

Good point, I remember r2 supports writing code to analyze?

Member

Yes, you could load r2 with an empty buffer, but you'll have to specify `-a` and `-b` for architecture and bits.
Shellcode content can then be inserted with `wx`.
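
As a rough sketch of the suggestion above: the helper below only builds the r2 command string that writes shellcode bytes with `wx`; it does not talk to r2 itself. The name `wx_command` and the load address are illustrative assumptions, and the r2 session is assumed to have been opened on an empty buffer with `-a` and `-b` already set.

```python
def wx_command(code: bytes, addr: int) -> str:
    """Build an r2 `wx` command that writes `code` as hex pairs at `addr`."""
    if not code:
        raise ValueError("shellcode must not be empty")
    # `wx <hexpairs> @ <addr>` writes the bytes at a temporary seek address
    return f"wx {code.hex()} @ {addr:#x}"


# x86-64 `xor eax, eax; ret`, written at a hypothetical load address
print(wx_command(b"\x31\xc0\xc3", 0x1000))
```

The resulting string can be handed to whatever command runner the extension exposes.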

Contributor Author

Thanks! I have implemented "shellcode mode" using the command `wx HEX`.
But when I try to use the ctypes API `libr.r_io.r_io_write_at(self._r2c.contents.io, loadaddr, code, size)`, it complains: `ctypes.ArgumentError: argument 1: <class 'TypeError'>: expected LP_struct_r_io_t instance instead of LP_struct_r_io_t`

Contributor Author

@chinggg chinggg Jun 18, 2022

Also, there is currently no public util to accurately map the QL_ARCH enum to r2's arch and bits.
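
One possible shape for such a util is a plain lookup table. This is only a sketch: the `Arch` enum below is a hypothetical stand-in for Qiling's `QL_ARCH` (its member names are assumed, not taken from the codebase), and the table covers just a few architectures.

```python
from enum import Enum, auto
from typing import Tuple


class Arch(Enum):
    """Hypothetical stand-in for QL_ARCH; member names are assumptions."""
    X86 = auto()
    X8664 = auto()
    ARM = auto()
    ARM64 = auto()
    MIPS = auto()


# (value for r2 `-a`, value for r2 `-b`) per architecture
_R2_ARCH_BITS = {
    Arch.X86: ("x86", 32),
    Arch.X8664: ("x86", 64),
    Arch.ARM: ("arm", 32),
    Arch.ARM64: ("arm", 64),
    Arch.MIPS: ("mips", 32),
}


def r2_arch_bits(arch: Arch) -> Tuple[str, int]:
    """Return the r2 arch name and bit width for a given architecture."""
    try:
        return _R2_ARCH_BITS[arch]
    except KeyError:
        raise ValueError(f"no r2 mapping for {arch!r}") from None
```

An explicit table fails loudly on an unmapped architecture instead of guessing.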

Comment thread qiling/extensions/r2/r2.py Outdated
Comment thread qiling/extensions/r2/r2.py Outdated


class R2:
    def __init__(self, ql: Qiling):
Member

Suggesting to allow the user to specify r2 evars to allow more analysis flexibility.
That could perhaps be done by accepting a dictionary of keys and values (e.g. {'anal.hasnext' : 'true', 'anal.depth' : 5}) or any other reasonable format. Consider also allowing an "r2 init script" to let the user run a few r2 commands before starting the analysis.
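
A minimal sketch of the dictionary idea, assuming the extension has some way to run r2 commands: each key/value pair is rendered as an r2 `e key=value` command. The function name `evar_commands` is hypothetical and not part of this PR.

```python
from typing import List, Mapping, Union

EvarValue = Union[str, int, bool]


def evar_commands(evars: Mapping[str, EvarValue]) -> List[str]:
    """Render a dict of r2 evars as a list of `e key=value` commands."""
    cmds = []
    for key, value in evars.items():
        # r2 spells boolean evars as lowercase true/false
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmds.append(f"e {key}={value}")
    return cmds


print(evar_commands({'anal.hasnext': 'true', 'anal.depth': 5}))
```

An "init script" could be handled the same way, as a list of command strings run before analysis starts.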

Contributor Author

@chinggg chinggg Jun 18, 2022

R2 has its own kind of config variables. I think they are similar to Qiling's profile, although that one is per-OS. We can extend it later.

Comment thread examples/extensions/r2/hello_r2.py
Comment thread qiling/extensions/r2/r2.py Outdated


@dataclass(unsafe_hash=True)
class Section:
Member

Unless there is a fields post-processing or any additional functionality, these classes could be typing.NamedTuple.

Contributor Author

@chinggg chinggg Jun 17, 2022

Yes. Right now these classes have no special post-processing. I use dataclass because it allows a custom init function, so I can pass all dict items obtained from r2 to construct the class without specifying fields. I can try using NamedTuple though.

Ref:
https://stackoverflow.com/questions/51671699/data-classes-vs-typing-namedtuple-primary-use-cases
https://stackoverflow.com/questions/17622419/creating-a-namedtuple-object-using-only-a-subset-of-arguments-passed
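
For comparison, a `typing.NamedTuple` can also be built from a larger dict by filtering on its declared fields. The sketch below is illustrative only: the field names and the `from_dict` helper are assumptions, not the classes from this PR.

```python
from typing import Any, Dict, NamedTuple


class Section(NamedTuple):
    # field names are illustrative; the PR's actual fields may differ
    name: str
    size: int
    vaddr: int

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Section":
        # keep only the keys declared as fields and ignore the rest
        return cls(**{k: data[k] for k in cls._fields if k in data})


raw = {"name": ".text", "size": 16, "vaddr": 0x1000, "perm": "r-x"}
print(Section.from_dict(raw))
```

A missing required key still raises `TypeError`, so incomplete r2 output is not silently accepted.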

@wtdcode wtdcode self-assigned this Jun 16, 2022
Comment thread examples/extensions/r2/hello_r2.py Outdated
Comment thread setup.py Outdated
        "fuzzercorn>=0.0.1;platform_system=='Linux'"
    ],
    "SCA" : [
        "r2libr"
Member

@elicn maybe we could add r2libr as an essential dep? My thought roughly is:

  • First of all, unlike the previously discussed udbserver, r2libr is maintained by me, in other words, by Qiling team directly.
  • r2libr could be tightly integrated into our loader.
  • r2libr could help us polish some old APIs like set_api with the additional information.
  • There seems to be no use case where users would really want to use Qiling without r2libr?

Member

Also note that radare2 sometimes introduces breaking changes. To specify r2libr, please pin it with something like r2libr==5.7.0.

Breaking changes should only happen when the major or minor number changes. Only patch releases are ABI stable, so if you have r2libr-5.7.0 you can use r2-5.7.0, but also 5.7.4 or 5.7.8, without runtime problems. That relaxes the stress of maintaining updates a little and also makes it easier to transition between ABI-breaking versions.
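
The rule described here (same major and minor means ABI compatible) can be sketched as a small check. The helper name `same_abi` is hypothetical. As a pip requirement, the PEP 440 compatible-release form `r2libr~=5.7.0` (equivalent to `>=5.7.0, ==5.7.*`) would express a similar constraint, assuming r2libr's version numbers track radare2's.

```python
def same_abi(a: str, b: str) -> bool:
    """True when two dotted versions share their major and minor numbers."""
    return a.split(".")[:2] == b.split(".")[:2]


print(same_abi("5.7.0", "5.7.8"))  # patch-level difference only
print(same_abi("5.7.0", "5.8.0"))  # minor number differs
```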

Member

IMHO, I would refrain from defining something as a prerequisite unless it is absolutely required; that is, the program cannot operate without it. Plugging r2 into Qiling sounds like a great idea, where Qiling can benefit a lot from r2 static analysis.

Integrating r2 static analysis into Qiling requires some thinking about how to do it right and consistently across all modules. I really don't want it to be integrated into some modules and not in others. Until r2 is integrated across all Qiling modules, I believe we can wait with this prerequisite.

Member

Yes, I understand your concern so let me illustrate my motivation a bit more:

  1. Radare2 helps reduce redundant code and improves robustness, e.g. ELF loader, PE loader, even disassembler etc.
  2. Indeed, I strongly agree with you that r2 should be, in some sense, everywhere in Qiling. I hope to achieve this by using r2 to build a few basic utils or modules. An example I showed before is identifying function boundaries. Imagine you can set_api on a binary without any symbol tables, which is really useful and I don't think users will reject it.

Based on these two points, I think r2 will become a prerequisite someday but also I agree we should wait until we really have to do so.

chinggg added 2 commits June 18, 2022 09:49
- eliminate magic number of baseaddr and loadaddr
- update example of shellcode mode
Comment thread qiling/extensions/r2/r2.py Outdated
Comment thread qiling/extensions/r2/r2.py Outdated
Comment thread examples/extensions/r2/hello_r2.py Outdated
Comment thread qiling/extensions/r2/r2.py Outdated
Comment thread qiling/extensions/r2/r2.py
chinggg added 2 commits June 19, 2022 21:43
- Remove redundant __init__
- Abstract `cmdj` to parse json in only one place
- avoid importing whole functool
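
The `cmdj` refactoring in this commit can be pictured roughly as follows. This is a sketch of the idea, not the PR's code: `make_cmdj` is a hypothetical helper, and `cmd` stands for any callable that runs an r2 command and returns its text output.

```python
import json
from typing import Any, Callable


def make_cmdj(cmd: Callable[[str], str]) -> Callable[[str], Any]:
    """Wrap a text-returning command runner so JSON is parsed in one place."""
    def cmdj(command: str) -> Any:
        return json.loads(cmd(command))
    return cmdj


# stand-in runner that returns canned JSON instead of calling r2
cmdj = make_cmdj(lambda command: '{"bits": 64}')
print(cmdj("iIj"))
```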
Comment thread qiling/extensions/r2/r2.py Outdated

chinggg added 2 commits June 21, 2022 20:38
- replace str enum with typing.Literal
- avoid generic typing annotation for compatibility with Python 3.8
@chinggg
Contributor Author

chinggg commented Jun 23, 2022

Have we reached an agreement about how to integrate r2? Or we need a PoC of using r2 everywhere in qiling at first?

@elicn
Member

elicn commented Jun 24, 2022

Have we reached an agreement about how to integrate r2? Or we need a PoC of using r2 everywhere in qiling at first?

IMHO, r2 should be optional for now, and not a requirement (i.e. revert the last commit).

@chinggg
Contributor Author

chinggg commented Jul 2, 2022

I am trying to implement a loader of minidump using r2. If r2 can help load different file formats, it may be considered as a requirement.

chinggg added 3 commits July 2, 2022 20:47
- use `s addr`, `p8 size` to get hex data
- return bytes that can be used to write memory
- r2 command `iIj` returns JSON of binary info like baddr and bintype
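
The `s addr` / `p8 size` approach from these commits might look roughly like this. It is a sketch under assumptions: `read_bytes` is a hypothetical name, and `cmd` is any callable that sends one r2 command and returns its text output.

```python
from typing import Callable


def read_bytes(cmd: Callable[[str], str], addr: int, size: int) -> bytes:
    """Read `size` bytes at `addr` by seeking and dumping hex pairs."""
    cmd(f"s {addr:#x}")          # seek to the address
    hexstr = cmd(f"p8 {size}")   # p8 prints `size` bytes as a hex string
    # convert the hex string into bytes suitable for writing to memory
    return bytes.fromhex(hexstr.strip())
```

The returned bytes can then be written into emulated memory by the caller.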
Comment thread qiling/extensions/r2/r2.py Outdated
@elicn
Member

elicn commented Jul 4, 2022

I am trying to implement a loader of minidump using r2. If r2 can help load different file formats, it may be considered as a requirement.

As mentioned above, until Qiling becomes dependent on radare2 (by design, not opportunistically), radare2 should not appear as a required dependency. If someone wants to analyze a specific file format whose loading relies on radare2, they can install radare2 for that purpose (similar to EVM and its dependencies).
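
One common way to keep such a dependency optional is to probe for it only when the feature is used. A minimal sketch, assuming `libr` is the module installed by r2libr (as the `libr.r_io...` call quoted earlier suggests); the helper names are hypothetical.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Report whether a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_r2() -> None:
    """Raise a helpful error if the optional r2 bindings are missing."""
    # `libr` is assumed to be the module name provided by r2libr
    if not has_module("libr"):
        raise ImportError("the r2 extension needs r2libr: pip install r2libr")
```

Users who never touch the extension never hit the check.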

@wtdcode
Member

wtdcode commented Jul 4, 2022

I am trying to implement a loader of minidump using r2. If r2 can help load different file formats, it may be considered as a requirement.

As mentioned above, until Qiling becomes dependent on radare2 (by design, not opportunistically), radare2 should not appear as a required dependency. If someone wants to analyze a specific file format whose loading relies on radare2, they can install radare2 for that purpose (similar to EVM and its dependencies).

I just had a talk with @chinggg and I plan to break this into two phases (also required by GSoC)

Phase 1

The main goal of this phase is to make r2 smoother to use, including:

  • Integrate radare2 flags system into Qiling. The integration doesn't mean simply executing commands, but means building a map between all our Qiling virtual memory addresses and the r2 flags. For example, given 0x40001001, we could know it's sym.main + 0x1 while given sym.puts we could know it's 0x654800. In this case, we no longer care about the base address, the relocation etc.
  • Based on the previous step, we introduce xrefs to Qiling. This enables a few powerful features like viewing the current backtrace, tracing references to some strings in .const, or knowing the target function we will jump to. This also involves some other related features like stack frame analysis, etc.
  • If we still have time left, we can introduce retdec to Qiling. This could enable Qiling to do more analysis on the source level, e.g. function prototype inference.

All these things can really ease the pain of writing Qiling scripts so far.
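
The two-way lookup described in the first bullet of Phase 1 could be sketched with a sorted address list and `bisect`. Everything here is illustrative: `FlagMap` is a hypothetical class, and the flag names and addresses are the ones used as examples in this comment.

```python
import bisect
from typing import Dict, Optional


class FlagMap:
    """Two-way lookup between flag names and addresses."""

    def __init__(self, flags: Dict[str, int]):
        self._by_name = dict(flags)
        pairs = sorted((addr, name) for name, addr in flags.items())
        self._addrs = [addr for addr, _ in pairs]
        self._names = [name for _, name in pairs]

    def address_of(self, name: str) -> int:
        return self._by_name[name]

    def describe(self, addr: int) -> Optional[str]:
        # find the closest flag at or below addr
        i = bisect.bisect_right(self._addrs, addr) - 1
        if i < 0:
            return None
        off = addr - self._addrs[i]
        return self._names[i] if off == 0 else f"{self._names[i]} + {off:#x}"


fm = FlagMap({"sym.main": 0x40001000, "sym.puts": 0x654800})
print(fm.describe(0x40001001), hex(fm.address_of("sym.puts")))
```

Note that this sketch has no notion of flag size, so an address far past the last flag is still attributed to it; a real implementation would need bounds.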

Expected date: July 24

Phase 2

For this phase, I haven't decided on the exact task, but here are a few ideas:

  • Re-implement deflat and polish the IDA plugin. This one is simple but the most doable.
  • Introduce minidump. This will introduce a new loader and possibly other stuff and that's why @chinggg said it might result in radare2 as a requirement of Qiling.
  • Improve dyld cache analysis. This task is the most challenging but interesting as currently there is no good tool to dynamically emulate it.

Expected date: Sept 4

@elicn
Member

elicn commented Jul 4, 2022

Leveraging radare2 static analysis capabilities to enhance Qiling emulation is a great idea, especially strings and flags. I implemented similar resolving capabilities in my enhanced trace module [see here], but that relied on an address-to-symbol mapping which I got by parsing map files. Back then I was thinking about enabling PDB and DWARF parsing in Qiling, but never got to it.

All that being said, it is essential to keep in mind that radare2, as powerful as it might be, is a static analysis tool, whereas Qiling dynamically emulates code. Though some of radare2's analysis may be useful in a dynamic environment, it may break or become irrelevant when dealing with dynamically loaded libraries, self-modifying code, and basically everything that is not static.

My understanding is that radare2 may be useful to extract debugging information and strings, but I am not sure about the rest. We'll still have to load dynamic libraries, resolve imports, etc.

Like I mentioned in the past, QlOS and QlLoader are a mess and they need to be re-designed to better support processes, threads, etc. If there is a design plan behind minidump along with design changes to the Loader, it would be great if you could share it to get feedback. Please note that linking to an existing design might not be sufficient (ex: "see pwntools loader"); we need to see how it is going to be plugged into Qiling with all necessary adjustments.

chinggg added 5 commits July 5, 2022 00:06
- flag is a bookmark that associates a name with a given offset
- memory address in qiling can be interpreted better
- set arch and bits for r2 asm only in shellcode mode
@chinggg
Contributor Author

chinggg commented Jul 8, 2022

@wtdcode I have made a simple PoC of address-to-flag mapping (by calling the `at` function instead of mapping every address).
But I don't fully understand using xrefs to view the current backtrace. I don't think it means producing possibly inaccurate backtraces statically, since we want to bridge it with Qiling instead of doing pure static analysis.
I cannot find an existing backtrace in Qiling; I guess it may be a kind of code hook. Currently I am not sure about the implementation.

def enable_trace(self, mode='full'):
    # simple map from addr to flag name, cannot resolve addresses in the middle
    self.ql.loader.symsmap = {flag.offset: flag.name for flag in self.flags}
    if mode == 'full':
Member

Use an enumeration instead.
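
A sketch of what that could look like. Only the `'full'` value comes from the snippet under review; the `TraceMode` name and the second member are hypothetical, and the function is reduced to a free function for illustration.

```python
from enum import Enum


class TraceMode(Enum):
    FULL = "full"
    HISTORY = "history"  # hypothetical second mode, for illustration only


def enable_trace(mode: TraceMode = TraceMode.FULL) -> str:
    """Validate the mode up front instead of comparing raw strings."""
    if not isinstance(mode, TraceMode):
        raise TypeError(f"mode must be a TraceMode, got {type(mode).__name__}")
    return mode.value
```

With an enum, a misspelled mode fails at the call site rather than silently falling through an `if` chain.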

Contributor Author

@chinggg chinggg Jul 10, 2022

This is just a PoC. The trace extension made by @elicn needs a symsmap to resolve addresses, so I have to make further modifications to the trace extension.

Member

@elicn elicn left a comment

Turning r2 into an integral part of core is a bad idea, in my opinion.
As an extension, r2 analysis should stay an external add-on feature that may [or may not] be used with Qiling.

Comment thread qiling/extensions/trace.py Outdated
Comment thread qiling/extensions/r2/r2.py Outdated
Comment thread qiling/core.py Outdated
Comment thread qiling/arch/utils.py Outdated
Comment thread qiling/extensions/r2/r2.py Outdated
@wtdcode
Member

wtdcode commented Jul 17, 2022

@elicn I think this PR is fine to go, and we would like a few more PRs about applying R2 within Qiling internally.

- "Qiling" is only used for type hint
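
When a name is needed only for annotations, the usual pattern is to import it under `typing.TYPE_CHECKING` and quote the annotation, which avoids a runtime import. A minimal sketch; the class body is reduced to the bare constructor and is not the PR's actual code.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # seen by type checkers only; skipped at runtime
    from qiling import Qiling


class R2:
    def __init__(self, ql: "Qiling"):
        # the quoted annotation is never evaluated at runtime
        self.ql = ql
```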
@elicn
Member

elicn commented Jul 17, 2022

Looks good to me.
Can we just remove the r2libr from the dependencies and make it optional, like it used to be?

@wtdcode wtdcode merged commit 34b4898 into qilingframework:dev Jul 18, 2022
@wtdcode
Member

wtdcode commented Jul 18, 2022

Looking forward to the following work!

@chinggg chinggg mentioned this pull request Jul 25, 2022
14 tasks

4 participants