Leak in LLVM code generated by Rust not found: functions with String return parameter  #643

@StamesJames

  • I have searched open and closed issues for duplicates
  • I made sure that I am not using an old project version (DO: pull Phasar, update git submodules, rebuild the project and check if the bug is still there)

Bug description

I'm trying to find leaks in LLVM code generated from Rust for the following program:

#[inline(never)]
#[no_mangle]
fn source() -> String {
    "Test".to_string()
}

#[inline(never)]
#[no_mangle]
fn sink(source: &str) -> String {
    source.to_string()
}

#[inline(never)]
#[no_mangle]
fn sanitize(source: &str) -> String {
    source.to_owned()
}

fn main() {
    let unsanitized = source();
    let source = source();
    let sanitized = sanitize(&source);
    let sink_unsanitized = sink(&unsanitized);
    let sink_sanitized = sink(&sanitized);
    println!("{sink_unsanitized}");
    println!("{sink_sanitized}");
}

A simpler example worked (#642). Now I have changed the functions from returning ints to returning Strings. They get compiled to the following LLVM code:

; Function Attrs: noinline nonlazybind uwtable
define dso_local void @source(%"alloc::string::String"* sret(%"alloc::string::String") %0) unnamed_addr #1 {
start:
; call <str as alloc::string::ToString>::to_string
  call void @"_ZN47_$LT$str$u20$as$u20$alloc..string..ToString$GT$9to_string17h488739110bf80537E"(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 bitcast (<{ [4 x i8] }>* @alloc63 to [0 x i8]*), i64 4)
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}

; Function Attrs: noinline nonlazybind uwtable
define dso_local void @sink(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 %source.0, i64 %source.1) unnamed_addr #1 {
start:
; call <str as alloc::string::ToString>::to_string
  call void @"_ZN47_$LT$str$u20$as$u20$alloc..string..ToString$GT$9to_string17h488739110bf80537E"(%"alloc::string::String"* sret(%"alloc::string::String") %0, [0 x i8]* align 1 %source.0, i64 %source.1)
  br label %bb1

bb1:                                              ; preds = %start
  ret void
}

I set my analysis-config to:

{
    "name": "taint-03-simple-functions-string",
    "version": 1,
    "functions": [
        {
            "name": "source",
            "params": {
                "source": [0]
            }
        },
        {
            "name": "sink",
            "params": {
                "sink": [1]
            }
        },
        {
            "name": "sanitize",
            "ret": "sanitizer"
        }
    ],
    "variables": []
  }

I chose these parameter indices because, as I understand it, the two functions no longer return anything; instead they receive a pointer (the sret parameter %0) to which they write the value to return.
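
To make the parameter numbering concrete, here is a minimal hand-written sketch of what the lowered signatures look like when expressed back in Rust. The names `source_lowered`, `sink_lowered`, `call_source` and `call_sink` are hypothetical and only model the IR shown above; the real lowering is done by the compiler and may differ between versions and targets.

```rust
use std::mem::MaybeUninit;

// Hypothetical model of the lowered `source`:
// parameter 0 is the sret slot that receives the returned String.
fn source_lowered(ret_slot: &mut MaybeUninit<String>) {
    ret_slot.write("Test".to_string());
}

// Hypothetical model of the lowered `sink`:
// parameter 0 is the sret slot, parameters 1 and 2 are the
// data pointer and the length of the incoming &str.
fn sink_lowered(ret_slot: &mut MaybeUninit<String>, data: *const u8, len: usize) {
    // SAFETY: the caller passes a pointer/length pair taken from a valid &str.
    let s = unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(data, len)) };
    ret_slot.write(s.to_string());
}

// Caller side: allocate the return slot, pass it as parameter 0, then read it.
fn call_source() -> String {
    let mut slot = MaybeUninit::<String>::uninit();
    source_lowered(&mut slot);
    // SAFETY: source_lowered always initializes the slot.
    unsafe { slot.assume_init() }
}

fn call_sink(input: &str) -> String {
    let mut slot = MaybeUninit::<String>::uninit();
    sink_lowered(&mut slot, input.as_ptr(), input.len());
    // SAFETY: sink_lowered always initializes the slot.
    unsafe { slot.assume_init() }
}

fn main() {
    let unsanitized = call_source();
    let leaked = call_sink(&unsanitized);
    println!("{leaked}");
}
```

In this model the tainted value leaves `source` through parameter 0 and enters `sink` through parameters 1 and 2, which is the assumption behind the indices in the config above.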
I invoke my analysis with:

phasar-cli \
   -m target/debug/deps/sql_injection_03_simple_requests-0a2c4db10e6afc34.ll \
   -D ifds-taint \
   --analysis-config=analysis-config.json \
   --entry-points _ZN32sql_injection_03_simple_requests4main17h3819e5f83b074069E

Where _ZN32sql_injection_03_simple_requests4main17h3819e5f83b074069E is the mangled name of my main function.

If I instead set the 0th parameter of the sink function as the sink, Phasar reports a leak, but the report is not simply the leaked variable obtained from the source function; it is a very long list of values. Here are the first lines of it:

Leak(s):
IR  : %"core::fmt::Arguments"* %0 | ID: _ZN4core3fmt9Arguments6new_v117hc8a21f4658044cffE.0
IR  : %"alloc::string::String"* %0 | ID: _ZN5alloc6string6String19from_utf8_unchecked17h6553b59f13851d7cE.0
IR  : %"alloc::vec::Vec<u8>"* %bytes | ID: _ZN5alloc6string6String19from_utf8_unchecked17h6553b59f13851d7cE.1
IR  : @alloc55 = private unnamed_addr constant <{ [75 x i8] }> <{ [75 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/fmt/mod.rs" }>, align 1, !psr.id !4 | ID: 4
IR  : @alloc59 = private unnamed_addr constant <{ [74 x i8] }> <{ [74 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/alloc.rs" }>, align 1, !psr.id !8 | ID: 8
IR  : @alloc60 = private unnamed_addr constant <{ i8*, [16 x i8] }> <{ i8* getelementptr inbounds (<{ [74 x i8] }>, <{ [74 x i8] }>* @alloc59, i32 0, i32 0, i32 0), [16 x i8] c"J\00\00\00\00\00\00\00\AC\00\00\00\1B\00\00\00" }>, align 8, !psr.id !9 | ID: 9
IR  : @alloc61 = private unnamed_addr constant <{ [76 x i8] }> <{ [76 x i8] c"/rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/raw_vec.rs" }>, align 1, !psr.id !10 | ID: 10
IR  : @alloc62 = private unnamed_addr constant <{ i8*, [16 x i8] }> <{ i8* getelementptr inbounds (<{ [76 x i8] }>, <{ [76 x i8] }>* @alloc61, i32 0, i32 0, i32 0), [16 x i8] c"L\00\00\00\00\00\00\00\F7\00\00\00;\00\00\00" }>, align 8, !psr.id !11 | ID: 11
IR  : @alloc22 = private unnamed_addr constant <{ [1 x i8] }> <{ [1 x i8] c"\0A" }>, align 1, !psr.id !13 | ID: 13
IR  : @alloc21 = private unnamed_addr constant <{ i8*, [8 x i8], i8*, [8 x i8] }> <{ i8* bitcast (<{}>* @alloc20 to i8*), [8 x i8] zeroinitializer, i8* getelementptr inbounds (<{ [1 x i8] }>, <{ [1 x i8] }>* @alloc22, i32 0, i32 0, i32 0), [8 x i8] c"\01\00\00\00\00\00\00\00" }>, align 8, !psr.id !14 | ID: 14
IR  : %_2 = call i8* @"_ZN4core3ptr6unique15Unique$LT$T$GT$6as_ptr17h3b210c5ac01b064fE"(i8* %unique), !psr.id !18 | ID: 15
IR  : i8* %unique | ID: _ZN119_$LT$core..ptr..non_null..NonNull$LT$T$GT$$u20$as$u20$core..convert..From$LT$core..ptr..unique..Unique$LT$T$GT$$GT$$GT$4from17hd493d251c602c8e8E.0
IR  : %0 = call i8* @"_ZN4core3ptr8non_null16NonNull$LT$T$GT$13new_unchecked17h6f1d783941022635E"(i8* %_2), !psr.id !20 | ID: 17

But as I understand it, the 0th parameter should not be a sink parameter, because it acts as the return value. The 1st and 2nd parameters, however, should produce a leak, because values from inside the source String are passed through them.
I attached all relevant files.

Steps to reproduce

  • compile with cargo using rust-toolchain.toml and .cargo/config.toml
  • run phasar-cli taint analysis with custom entry-point and analysis-config.json

Actual result:

  • Phasar doesn't find the leak

Expected result:

  • Phasar finds a leak, because some values from the source String variable get passed into the sink function
    • not the whole String gets passed, because Rust dereferences the String into a string slice, which, in my understanding of the generated LLVM code, is then passed not as a struct but as its two components (pointer and length) separately
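
The string-slice point can be checked from the Rust side with a small sketch. The helper `components` is a hypothetical name used only for illustration; the sketch shows that a `&str` borrowed from a String is a two-word value (data pointer plus length) that points into the String's own buffer, which matches the `%source.0` / `%source.1` pair in the IR above.

```rust
use std::mem::size_of;

// Hypothetical helper: splits a &str into the two scalar components
// that appear as separate arguments in the generated IR.
fn components(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    let owned: String = "Test".to_string();
    let slice: &str = &owned; // deref coercion from &String to &str

    // A &str is a fat pointer: one word for the data pointer, one for the length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    let (ptr, len) = components(slice);
    // The slice points into the String's own heap buffer; nothing is copied.
    assert_eq!(ptr, owned.as_ptr());
    assert_eq!(len, owned.len());
    println!("ptr = {ptr:p}, len = {len}");
}
```

Because the pointer component aliases the source String's buffer, a taint on the String would be expected to reach `sink` through that pointer argument, assuming the analysis tracks the alias.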

Context (Environment)

Operating System:

  • Linux
  • Windows
  • macOS

Build Type:

  • cmake
  • custom build

Example files

Files:

examplefiles.zip
