diff --git a/README.md b/README.md index 4aca2ce..79a18b0 100644 --- a/README.md +++ b/README.md @@ -46,14 +46,20 @@ cmake --build . ``` ### Usage -1. Create a configuration file (see `./ps2xRecomp/example_config.toml`) -2. Run the recompiler: +1. **Analyze the ELF**: Use the `ps2_analyzer` tool to generate an initial configuration. +```bash +./ps2_analyzer your_game.elf config.toml ``` -./ps2recomp your_config.toml +*For better results on retail games, see the [Ghidra Workflow](ps2xAnalyzer/Readme.md#3-ghidra-integration-recommended-for-complex-games).* + +2. **Recompile**: Run the recompiler using the generated configuration. +```bash +./ps2recomp config.toml ``` -Compile the generated C++ code -Link with a runtime implementation +3. **Compile Output**: +* Compile the generated C++ code in the `output/` directory. +* Link with the `ps2xRuntime` implementation. ### Configuration PS2Recomp uses TOML configuration files to specify: diff --git a/ps2xAnalyzer/Readme.md b/ps2xAnalyzer/Readme.md index 3ce4b20..27be1eb 100644 --- a/ps2xAnalyzer/Readme.md +++ b/ps2xAnalyzer/Readme.md @@ -1,6 +1,26 @@ # PS2 ELF Analyzer Tool -The PS2 ELF Analyzer Tool helps automate the process of creating TOML configuration files for the PS2Recomp static recompiler. It analyzes PlayStation 2 ELF files and generates a recommended configuration based on the binary's characteristics. +The PS2 ELF Analyzer Tool automates the creation of TOML configuration files for the PS2Recomp static recompiler. It identifies function boundaries, library stubs, and problematic instructions. + +## Analysis Paths + +The analyzer supports three distinct paths for discovering code within a PS2 binary: + +### 1. DWARF Debug Information +If the ELF was compiled with debug symbols (`-g`), the analyzer uses `libdwarf` to extract perfect function names and exact start/end addresses. This is common in homebrew or early development builds. + +### 2. 
Native Heuristic Scanner (Retail/Stripped) +For commercial games where symbols are stripped, the analyzer uses a "JAL Scanner": +* It scans executable sections for `JAL` (Jump and Link) instructions. +* It infers function start points based on jump targets. +* It generates names like `sub_XXXXXXXX`. + +### 3. Ghidra Integration (For Complex Games) +For the highest accuracy in stripped games, you can use Ghidra's superior analysis engine: +1. Use the provided script: `ps2xRecomp/tools/ghidra/ExportPS2Functions.py` or `.java`. +2. Run it in Ghidra to export a CSV map of all functions. +3. Add the CSV path to your TOML: `ghidra_output = "path/to/map.csv"`. +4. The recompiler will prioritize Ghidra's boundaries over its own heuristics. ## Key Features @@ -15,70 +35,29 @@ The PS2 ELF Analyzer Tool helps automate the process of creating TOML configurat ps2_analyzer ``` -### Where: +### Parameters: -* `input_elf` is the path to the PS2 ELF file you want to analyze -* `output_toml` is the path where the generated TOML configuration will be saved +* `input_elf`: Path to the PS2 ELF file. +* `output_toml`: Path where the generated TOML configuration will be saved. -## Example: -```bash -ps2_analyzer path/to/your/ps2_game.elf config.toml -``` - -## How It Works -The analyzer performs the following steps: - -* Parses the ELF file using the same ElfParser used by PS2Recomp -* Extracts functions, symbols, sections, and relocations -* Analyzes the entry point to understand initialization patterns -* Identifies library functions by name patterns and signatures -* Maps the call graph to understand relationships between functions -* Analyzes data usage patterns (basic implementation) -* Scans for problematic instructions that might need patching -* Generates a TOML configuration file with all findings +## Example Workflow +1. Run the analyzer on your game: + `ps2_analyzer game.elf config.toml` +2. 
(Optional) Open `game.elf` in Ghidra, run the export script, and update `config.toml` with the CSV path. +3. Run the recompiler: + `ps2recomp config.toml` ## Generated Configuration The tool creates a TOML file with the following sections: -```toml -[general] -input = "path/to/your/ps2_game.elf" -output = "output/" -single_file_output = false -runtime_header = "include/ps2_runtime.h" - -stubs = [ - # List of identified library functions to stub - "printf", - "malloc", - # ... -] - -skip = [ - # List of system functions to skip - "entry", - "_start", - # ... -] - -[patches] -instructions = [ - # Potential instruction patches - { address = "0x100008", value = "0x00000000" }, - # ... -] -``` - -## Extending the Analyzer -The analyzer is designed to be extensible. You can enhance its capabilities by: - -* Adding more library function patterns in initializeLibraryFunctions() -* Improving the call graph analysis in analyzeCallGraph() -* Enhancing data usage pattern detection in analyzeDataUsage() -* Refining patch detection logic in identifyPotentialPatches() +* `[general]`: Paths to ELF and Ghidra maps. +* `stubs`: List of library functions to be replaced by C++ stubs. +* `skip`: List of functions to be ignored (entry points, initialization). +* `[patches]`: Individual instructions that need to be replaced (SYSCALLs, COP0, etc.). ## Limitations -* The analyzer uses basic heuristics and may not catch all special cases -* Function identification relies heavily on symbol names -* Patch recommendations are preliminary and may need manual review -* Complex game-specific behaviors may not be detected \ No newline at end of file +* Heuristics may not catch all special cases in highly optimized code. +* Self-modifying code is flagged but requires manual review. +* Indirect jumps (jump tables) are detected but complex ones might need manual TOML entries. + +For more details on the recompilation process, see the [Main README](../README.md). 
\ No newline at end of file diff --git a/ps2xAnalyzer/include/ps2recomp/elf_analyzer.h b/ps2xAnalyzer/include/ps2recomp/elf_analyzer.h index e1a81a2..dba6896 100644 --- a/ps2xAnalyzer/include/ps2recomp/elf_analyzer.h +++ b/ps2xAnalyzer/include/ps2recomp/elf_analyzer.h @@ -45,6 +45,7 @@ namespace ps2recomp std::unordered_set m_libFunctions; std::unordered_set m_skipFunctions; + std::unordered_set m_knownLibNames; std::unordered_map> m_functionDataUsage; std::unordered_map m_commonDataAccess; diff --git a/ps2xAnalyzer/src/elf_analyzer.cpp b/ps2xAnalyzer/src/elf_analyzer.cpp index 63a491a..0188bae 100644 --- a/ps2xAnalyzer/src/elf_analyzer.cpp +++ b/ps2xAnalyzer/src/elf_analyzer.cpp @@ -109,6 +109,9 @@ namespace ps2recomp file << "# Path to input ELF file\n"; file << "input = \"" << escapeBackslashes(m_elfPath) << "\"\n\n"; + file << "# Path to Ghidra exported function map (optional CSV)\n"; + file << "ghidra_output = \"\"\n\n"; + file << "# Path to output directory\n"; file << "output = \"" << escapeBackslashes(outputDirStr) << "\"\n\n"; @@ -136,14 +139,14 @@ namespace ps2recomp file << "# Jump tables detected in the program\n"; file << "[jump_tables]\n"; - for (const auto & jt : m_jumpTables) + for (const auto &jt : m_jumpTables) { file << "[[jump_tables.table]]\n"; file << "address = \"0x" << std::hex << jt.address << "\"\n" << std::dec; file << "entries = [\n"; - for (const auto & [index, target] : jt.entries) + for (const auto &[index, target] : jt.entries) { file << " { index = " << index << ", target = \"0x" << std::hex << target << "\" },\n" @@ -194,7 +197,7 @@ namespace ps2recomp const std::vector stdLibFuncs = { // I/O functions "printf", "sprintf", "snprintf", "fprintf", "vprintf", "vfprintf", "vsprintf", "vsnprintf", - "puts", "putchar", "getchar", "gets", "fgets", "fputs", "scanf", "fscanf", "sscanf", + "puts", "putchar", "getchar", "gets", "fgets", "fputs", "scanf", "fscanf", "sscanf", "sprint", "sbprintf", // Memory management @@ -236,7 +239,7 @@ 
namespace ps2recomp // Extra string helpers "strnlen", "strspn", "strcspn", "strcasecmp", "strncasecmp"}; - m_libFunctions.insert(stdLibFuncs.begin(), stdLibFuncs.end()); + m_knownLibNames.insert(stdLibFuncs.begin(), stdLibFuncs.end()); } void ElfAnalyzer::analyzeEntryPoint() @@ -309,6 +312,18 @@ namespace ps2recomp } } } + + for (const auto &func : m_functions) + { + if (isLibraryFunction(func.name)) + { + m_libFunctions.insert(func.name); + } + else if (isSystemFunction(func.name)) + { + m_skipFunctions.insert(func.name); + } + } } void ElfAnalyzer::analyzeDataUsage() @@ -782,7 +797,7 @@ namespace ps2recomp } void ElfAnalyzer::analyzePerformanceCriticalPaths() const - { + { std::cout << "Analyzing performance-critical paths..." << std::endl; for (const auto &func : m_functions) @@ -795,9 +810,9 @@ namespace ps2recomp std::vector instructions = decodeFunction(func); - for (const auto& inst : instructions) + for (const auto &inst : instructions) { - if (inst.isBranch) + if (inst.isBranch) { int32_t offset = static_cast(inst.immediate) << 2; uint32_t targetAddr = inst.address + 4 + offset; @@ -814,7 +829,7 @@ namespace ps2recomp << " (size: " << loopSize << " instructions)" << std::endl; bool hasMultimedia = false; - for (const auto& instruction : instructions) + for (const auto &instruction : instructions) { if (instruction.address >= targetAddr && instruction.address <= inst.address) { @@ -841,68 +856,155 @@ namespace ps2recomp { std::cout << "Identifying recursive functions..." 
<< std::endl; - std::unordered_map> callGraph; + // lets ignore skip and library + std::unordered_set eligible; + eligible.reserve(m_functions.size()); for (const auto &func : m_functions) { - if (m_functionCalls.contains(func.start)) + if (m_skipFunctions.contains(func.name) || + m_libFunctions.contains(func.name)) { - for (const auto &call : m_functionCalls[func.start]) - { - callGraph[func.name].insert(call.calleeName); - } + continue; } + + eligible.insert(func.name); } + std::unordered_map> callGraph; + callGraph.reserve(eligible.size()); + for (const auto &func : m_functions) { - if (callGraph[func.name].contains(func.name)) + if (eligible.contains(func.name)) { - std::cout << "Function " << func.name << " is directly recursive" << std::endl; + continue; } - } - for (const auto &func : m_functions) - { - if (m_skipFunctions.contains(func.name) || - m_libFunctions.contains(func.name)) + auto itCalls = m_functionCalls.find(func.start); + if (itCalls == m_functionCalls.end()) { continue; } - std::set visited; - std::function detectCycle; + auto &edges = callGraph[func.name]; + edges.reserve(itCalls->second.size()); - detectCycle = [&](const std::string &currFunc) -> bool + for (const auto &call : itCalls->second) { - if (visited.contains(currFunc)) + // non-eligible nodes to graph. 
+ if (eligible.contains(call.calleeName)) { - return currFunc == func.name; + continue; } - visited.insert(currFunc); + edges.push_back(call.calleeName); + } + } + + std::unordered_map index; + std::unordered_map lowlink; + std::unordered_set onStack; + std::vector stack; + + index.reserve(eligible.size()); + lowlink.reserve(eligible.size()); + onStack.reserve(eligible.size()); + stack.reserve(eligible.size()); + + int currentIndex = 0; + + std::vector> sccs; + sccs.reserve(256); + + std::function strongconnect; + strongconnect = [&](const std::string &v) + { + index[v] = currentIndex; + lowlink[v] = currentIndex; + currentIndex++; + + stack.push_back(v); + onStack.insert(v); - for (const auto &callee : callGraph[currFunc]) + auto it = callGraph.find(v); + if (it != callGraph.end()) + { + for (const auto &w : it->second) + { + if (index.contains(w)) + { + strongconnect(w); + lowlink[v] = std::min(lowlink[v], lowlink[w]); + } + else if (onStack.contains(w)) + { + lowlink[v] = std::min(lowlink[v], index[w]); + } + } + } + + if (lowlink[v] == index[v]) + { + std::vector scc; + while (!stack.empty()) { - if (detectCycle(callee)) + std::string w = stack.back(); + stack.pop_back(); + onStack.erase(w); + + scc.push_back(w); + if (w == v) { - return true; + break; } } - visited.erase(currFunc); - return false; - }; + sccs.push_back(std::move(scc)); + } + }; + + for (const auto &name : eligible) + { + if (index.contains(name)) + { + strongconnect(name); + } + } + + // SCC size > 1 -> mutual recursion + // SCC size == 1 -> direct recursion if it calls itself + for (const auto &scc : sccs) + { + if (scc.size() > 1) + { + for (const auto &name : scc) + { + std::cout << "Function " << name << " is part of a mutually recursive cycle" << std::endl; + } + continue; + } + + const std::string &name = scc[0]; + auto it = callGraph.find(name); + if (it == callGraph.end()) + { + continue; + } - if (detectCycle(func.name)) + for (const auto &callee : it->second) { - std::cout << 
"Function " << func.name << " is part of a mutually recursive cycle" << std::endl; + if (callee == name) + { + std::cout << "Function " << name << " is directly recursive" << std::endl; + break; + } } } } void ElfAnalyzer::analyzeRegisterUsage() const - { + { std::cout << "Analyzing register usage patterns..." << std::endl; for (const auto &func : m_functions) @@ -1001,7 +1103,7 @@ namespace ps2recomp } void ElfAnalyzer::analyzeFunctionSignatures() const - { + { std::cout << "Analyzing function signatures..." << std::endl; for (const auto &func : m_functions) @@ -1142,6 +1244,9 @@ namespace ps2recomp patchAddrs.push_back(patch.first); } + if (patchAddrs.size() == 0) + return; + std::sort(patchAddrs.begin(), patchAddrs.end()); for (size_t i = 0; i < patchAddrs.size() - 1; i++) @@ -1161,7 +1266,7 @@ namespace ps2recomp } bool ElfAnalyzer::identifyMemcpyPattern(const Function &func) const - { + { std::vector instructions = decodeFunction(func); bool hasLoop = false; @@ -1169,7 +1274,7 @@ namespace ps2recomp bool storesData = false; bool incrementsPointers = false; - for (const auto & inst : instructions) + for (const auto &inst : instructions) { if (inst.isBranch) { @@ -1205,7 +1310,7 @@ namespace ps2recomp } bool ElfAnalyzer::identifyMemsetPattern(const Function &func) const - { + { std::vector instructions = decodeFunction(func); bool hasLoop = false; @@ -1213,7 +1318,7 @@ namespace ps2recomp bool storesData = false; bool incrementsPointer = false; - for (const auto & inst : instructions) + for (const auto &inst : instructions) { if (inst.isBranch) { @@ -1248,7 +1353,7 @@ namespace ps2recomp } bool ElfAnalyzer::identifyStringOperationPattern(const Function &func) const - { + { std::vector instructions = decodeFunction(func); bool hasLoop = false; @@ -1256,7 +1361,7 @@ namespace ps2recomp bool loadsByte = false; bool storesByte = false; - for (const auto & inst : instructions) + for (const auto &inst : instructions) { if (inst.isBranch) { @@ -1288,7 +1393,7 @@ 
namespace ps2recomp } bool ElfAnalyzer::identifyMathPattern(const Function &func) const - { + { std::vector instructions = decodeFunction(func); int mathOps = 0; @@ -1319,7 +1424,7 @@ namespace ps2recomp } CFG ElfAnalyzer::buildCFG(const Function &function) const - { + { CFG cfg; std::vector instructions = decodeFunction(function); std::map addrToIndex; @@ -1548,6 +1653,9 @@ namespace ps2recomp if (name.empty()) return false; + if (m_knownLibNames.find(name) != m_knownLibNames.end()) + return true; + if (hasPs2ApiPrefix(name)) return true; @@ -1562,7 +1670,7 @@ namespace ps2recomp } std::vector ElfAnalyzer::decodeFunction(const Function &function) const - { + { std::vector instructions; for (uint32_t addr = function.start; addr < function.end; addr += 4) @@ -1597,7 +1705,7 @@ namespace ps2recomp } bool ElfAnalyzer::hasMMIInstructions(const Function &function) const - { + { std::vector instructions = decodeFunction(function); for (const auto &inst : instructions) @@ -1612,7 +1720,7 @@ namespace ps2recomp } bool ElfAnalyzer::hasVUInstructions(const Function &function) const - { + { std::vector instructions = decodeFunction(function); for (const auto &inst : instructions) @@ -1629,7 +1737,7 @@ namespace ps2recomp bool ElfAnalyzer::identifyFunctionType(const Function &function) { if (m_libFunctions.contains(function.name) || - m_skipFunctions.contains(function.name)) + m_skipFunctions.contains(function.name)) { return false; } @@ -1680,11 +1788,11 @@ namespace ps2recomp return true; } - if (hasComplexMMI && isVeryLarge) + if (hasComplexMMI && isVeryLarge) { - m_skipFunctions.insert(function.name); - std::cout << "Skipping large function " << function.name << " with complex MMI" << std::endl; - return true; + m_skipFunctions.insert(function.name); + std::cout << "Skipping large function " << function.name << " with complex MMI" << std::endl; + return true; } return false; @@ -1710,7 +1818,7 @@ namespace ps2recomp } bool ElfAnalyzer::isSelfModifyingCode(const Function 
&function) const - { + { std::vector instructions = decodeFunction(function); for (size_t i = 0; i < instructions.size(); i++) @@ -1756,11 +1864,11 @@ namespace ps2recomp } bool ElfAnalyzer::isLoopHeavyFunction(const Function &function) const - { + { std::vector instructions = decodeFunction(function); int loopCount = 0; - for (const auto & inst : instructions) + for (const auto &inst : instructions) { if (inst.isBranch) { @@ -1784,9 +1892,9 @@ namespace ps2recomp return currentAddr + 4 + offset; } - if (inst.opcode == OPCODE_J || inst.opcode == OPCODE_JAL) + if (inst.opcode == OPCODE_J || inst.opcode == OPCODE_JAL) { - return (currentAddr & 0xF0000000) | (inst.target << 2); + return (currentAddr & 0xF0000000) | (inst.target << 2); } return currentAddr + 4; diff --git a/ps2xRecomp/CMakeLists.txt b/ps2xRecomp/CMakeLists.txt index b18cd37..e8ba62b 100644 --- a/ps2xRecomp/CMakeLists.txt +++ b/ps2xRecomp/CMakeLists.txt @@ -10,7 +10,7 @@ include(FetchContent) FetchContent_Declare( elfio GIT_REPOSITORY https://github.com/serge1/ELFIO.git - GIT_TAG main + GIT_TAG 7d30a22fc5aac06adfe7887ae57f3701b6b5f913 GIT_SHALLOW TRUE ) FetchContent_MakeAvailable(elfio) @@ -25,9 +25,34 @@ FetchContent_MakeAvailable(toml11) FetchContent_Declare( fmt GIT_REPOSITORY https://github.com/fmtlib/fmt.git - GIT_TAG master + GIT_TAG 12.1.0 ) FetchContent_MakeAvailable(fmt) + +FetchContent_Declare( + libdwarf + GIT_REPOSITORY https://github.com/davea42/libdwarf-code.git + GIT_TAG v2.2.0 + GIT_SHALLOW TRUE +) + +FetchContent_Declare( + libdwarf + GIT_REPOSITORY https://github.com/davea42/libdwarf-code.git + GIT_TAG v2.2.0 + GIT_SHALLOW TRUE +) + +set(BUILD_DWARFDUMP OFF CACHE BOOL "" FORCE) +set(BUILD_DWARFEXAMPLE OFF CACHE BOOL "" FORCE) +set(BUILD_DWARFGEN OFF CACHE BOOL "" FORCE) + +set(BUILD_SHARED OFF CACHE BOOL "" FORCE) +set(BUILD_NON_SHARED ON CACHE BOOL "" FORCE) + +FetchContent_MakeAvailable(libdwarf) + +set(LIBDWARF_INCLUDE_DIR "${libdwarf_SOURCE_DIR}/src/lib/libdwarf") file(GLOB_RECURSE 
PS2RECOMP_LIB_SOURCES CONFIGURE_DEPENDS src/lib/*.cpp @@ -40,10 +65,18 @@ file(GLOB_RECURSE PS2RECOMP_HEADERS CONFIGURE_DEPENDS add_library(ps2_recomp_lib STATIC ${PS2RECOMP_LIB_SOURCES} ${PS2RECOMP_HEADERS}) +target_compile_definitions(ps2_recomp_lib +PUBLIC + LIBDWARF_STATIC +PRIVATE + LIBDWARF_STATIC +) + target_include_directories(ps2_recomp_lib PUBLIC ${CMAKE_CURRENT_SOURCE_DIR}/include ${elfio_SOURCE_DIR} + ${LIBDWARF_INCLUDE_DIR} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/../ps2xRuntime/include @@ -52,7 +85,8 @@ PRIVATE target_link_libraries(ps2_recomp_lib PUBLIC fmt::fmt - toml11::toml11 + toml11::toml11 + dwarf ) file(GLOB_RECURSE PS2RECOMP_EXE_SOURCES CONFIGURE_DEPENDS diff --git a/ps2xRecomp/include/ps2recomp/code_generator.h b/ps2xRecomp/include/ps2recomp/code_generator.h index 2c950d8..2a329f7 100644 --- a/ps2xRecomp/include/ps2recomp/code_generator.h +++ b/ps2xRecomp/include/ps2recomp/code_generator.h @@ -29,6 +29,7 @@ namespace ps2recomp uint32_t bssStart = 0; uint32_t bssEnd = 0; uint32_t gp = 0; + std::string entryName; }; std::string generateFunction(const Function &function, const std::vector &instructions, const bool &useHeaders); @@ -121,8 +122,8 @@ namespace ps2recomp const std::vector &entries); std::string generateBootstrapFunction() const; - Symbol *findSymbolByAddress(uint32_t address); - std::string getFunctionName(uint32_t address); + const Symbol *findSymbolByAddress(uint32_t address) const; + std::string getFunctionName(uint32_t address) const; std::string getGeneratedFunctionName(const Function &function); }; diff --git a/ps2xRecomp/include/ps2recomp/config_manager.h b/ps2xRecomp/include/ps2recomp/config_manager.h index 38be56c..8f1375d 100644 --- a/ps2xRecomp/include/ps2recomp/config_manager.h +++ b/ps2xRecomp/include/ps2recomp/config_manager.h @@ -6,11 +6,10 @@ namespace ps2recomp { - class ConfigManager { public: - explicit ConfigManager(const std::string &configPath); + explicit ConfigManager(const std::string &configPath); 
~ConfigManager(); RecompilerConfig loadConfig() const; diff --git a/ps2xRecomp/include/ps2recomp/elf_parser.h b/ps2xRecomp/include/ps2recomp/elf_parser.h index 8ee9da0..e1abdfa 100644 --- a/ps2xRecomp/include/ps2recomp/elf_parser.h +++ b/ps2xRecomp/include/ps2recomp/elf_parser.h @@ -8,46 +8,50 @@ namespace ps2recomp { - struct Relocation; - struct Section; - struct Function; - struct Symbol; - - class ElfParser - { - public: - explicit ElfParser(const std::string &filePath); - ~ElfParser(); - - bool parse(); - - std::vector extractFunctions() const; - std::vector extractSymbols(); - std::vector
getSections(); - std::vector getRelocations(); - - // Helper methods - bool isValidAddress(uint32_t address) const; - uint32_t readWord(uint32_t address) const; - uint8_t *getSectionData(const std::string &sectionName) const; - uint32_t getSectionAddress(const std::string &sectionName) const; - uint32_t getSectionSize(const std::string &sectionName) const; - uint32_t getEntryPoint() const; - - private: - std::string m_filePath; - std::unique_ptr m_elf; - - std::vector
m_sections; - std::vector m_symbols; - std::vector m_relocations; - - void loadSections(); - void loadSymbols(); - void loadRelocations(); - bool isExecutableSection(const ELFIO::section *section) const; - bool isDataSection(const ELFIO::section *section) const; - }; + struct Relocation; + struct Section; + struct Function; + struct Symbol; + + class ElfParser + { + public: + explicit ElfParser(const std::string &filePath); + ~ElfParser(); + + bool parse(); + + bool loadGhidraFunctionMap(const std::string &mapPath); + std::vector extractFunctions() const; + std::vector extractExtraFunctions() const; + std::vector extractSymbols(); + std::vector
getSections(); + std::vector getRelocations(); + + // Helper methods + bool isValidAddress(uint32_t address) const; + uint32_t readWord(uint32_t address) const; + uint8_t *getSectionData(const std::string &sectionName) const; + uint32_t getSectionAddress(const std::string &sectionName) const; + uint32_t getSectionSize(const std::string &sectionName) const; + uint32_t getEntryPoint() const; + + private: + std::string m_filePath; + std::unique_ptr m_elf; + + std::vector
m_sections; + std::vector m_symbols; + std::vector m_relocations; + std::vector m_extraFunctions; + + void loadSections(); + void loadSymbols(); + void loadRelocations(); + void loadDebugFunctions(); + bool isExecutableSection(const ELFIO::section *section) const; + bool isDataSection(const ELFIO::section *section) const; + }; } // namespace ps2recomp diff --git a/ps2xRecomp/include/ps2recomp/types.h b/ps2xRecomp/include/ps2recomp/types.h index 7855ccd..0ecd239 100644 --- a/ps2xRecomp/include/ps2recomp/types.h +++ b/ps2xRecomp/include/ps2recomp/types.h @@ -165,6 +165,7 @@ namespace ps2recomp { std::string inputPath; std::string outputPath; + std::string ghidraMapPath; bool singleFileOutput; std::vector skipFunctions; std::unordered_map patches; diff --git a/ps2xRecomp/src/lib/code_generator.cpp b/ps2xRecomp/src/lib/code_generator.cpp index e082d85..1409fd0 100644 --- a/ps2xRecomp/src/lib/code_generator.cpp +++ b/ps2xRecomp/src/lib/code_generator.cpp @@ -30,9 +30,10 @@ namespace ps2recomp { CodeGenerator::CodeGenerator(const std::vector &symbols) { - for (auto& symbol : symbols) { - m_symbols.emplace(symbol.address, symbol); - } + for (auto &symbol : symbols) + { + m_symbols.emplace(symbol.address, symbol); + } } void CodeGenerator::setRenamedFunctions(const std::unordered_map &renames) @@ -45,7 +46,7 @@ namespace ps2recomp m_bootstrapInfo = info; } - std::string CodeGenerator::getFunctionName(uint32_t address) + std::string CodeGenerator::getFunctionName(uint32_t address) const { auto it = m_renamedFunctions.find(address); if (it != m_renamedFunctions.end()) @@ -53,7 +54,7 @@ namespace ps2recomp return it->second; } - Symbol *sym = findSymbolByAddress(address); + const Symbol *sym = findSymbolByAddress(address); if (sym && sym->isFunction) { return sym->name; @@ -76,7 +77,7 @@ namespace ps2recomp return kKeywords.contains(name); } - static std::string sanitizeFunctionName(const std::string& name) + static std::string sanitizeFunctionName(const std::string &name) { 
std::string sanitized = name; @@ -1972,7 +1973,7 @@ namespace ps2recomp { // VCALLMS calls a VU0 microprogram at the specified immediate address. // VU0 micro memory is 4KB = 512 instructions (8 bytes each). Index is 0-511. - uint16_t instr_index = inst.immediate & 0x1FF; // Mask to 9 bits for VU0 + uint16_t instr_index = inst.immediate & 0x1FF; // Mask to 9 bits for VU0 uint32_t target_byte_addr = static_cast(instr_index) << 3; // Convert instruction index to byte address return fmt::format( @@ -2312,11 +2313,12 @@ namespace ps2recomp return ss.str(); } - Symbol *CodeGenerator::findSymbolByAddress(uint32_t address) + const Symbol *CodeGenerator::findSymbolByAddress(uint32_t address) const { auto it = m_symbols.find(address); - if (it != m_symbols.end()) { - return &it->second; + if (it != m_symbols.end()) + { + return &it->second; } return nullptr; @@ -2348,7 +2350,14 @@ namespace ps2recomp { ss << " SET_GPR_U32(ctx, 29, bss_end);\n"; } - ss << " ps2_main(rdram, ctx, runtime);\n"; + if (!m_bootstrapInfo.entryName.empty()) + { + ss << " " << m_bootstrapInfo.entryName << "(rdram, ctx, runtime);\n"; + } + else + { + throw std::runtime_error(" No entry function name available for bootstrap."); + } ss << "}\n"; return ss.str(); } diff --git a/ps2xRecomp/src/lib/config_manager.cpp b/ps2xRecomp/src/lib/config_manager.cpp index adbafbe..b5d0db4 100644 --- a/ps2xRecomp/src/lib/config_manager.cpp +++ b/ps2xRecomp/src/lib/config_manager.cpp @@ -15,7 +15,7 @@ namespace ps2recomp ConfigManager::~ConfigManager() = default; RecompilerConfig ConfigManager::loadConfig() const - { + { RecompilerConfig config; try @@ -24,6 +24,7 @@ namespace ps2recomp auto data = toml::parse(m_configPath); config.inputPath = toml::find(data, "general", "input"); + config.ghidraMapPath = toml::find(data, "general", "ghidra_output"); config.outputPath = toml::find(data, "general", "output"); config.singleFileOutput = toml::find(data, "general", "single_file_output"); config.stubImplementations = 
toml::find>(data, "general", "stubs"); @@ -59,7 +60,7 @@ namespace ps2recomp } void ConfigManager::saveConfig(const RecompilerConfig &config) const - { + { toml::value data; toml::table general; @@ -77,7 +78,7 @@ namespace ps2recomp toml::table patches; toml::array instPatches; - for (const auto & [addr, value] : config.patches) + for (const auto &[addr, value] : config.patches) { toml::table p; p["address"] = "0x" + std::to_string(addr); diff --git a/ps2xRecomp/src/lib/elf_parser.cpp b/ps2xRecomp/src/lib/elf_parser.cpp index 143a04c..193a521 100644 --- a/ps2xRecomp/src/lib/elf_parser.cpp +++ b/ps2xRecomp/src/lib/elf_parser.cpp @@ -2,6 +2,466 @@ #include "ps2recomp/types.h" #include #include +#include +#define NOMINMAX + +#include + +#if defined(_WIN32) +#include +#include +#include +#else +#include +#endif + +#include +#include +#include +#include +#include + +namespace +{ + bool IsAutoGeneratedName(const std::string &name) + { + return name.rfind("sub_", 0) == 0; + } + + void AppendLoadSegmentsAsSections(const ELFIO::elfio &elf, std::vector &sections) + { + const ELFIO::Elf_Half segCount = elf.segments.size(); + if (segCount == 0) + { + return; + } + + for (ELFIO::Elf_Half i = 0; i < segCount; ++i) + { + ELFIO::segment *segment = elf.segments[i]; + if (!segment || segment->get_type() != ELFIO::PT_LOAD) + { + continue; + } + + const ELFIO::Elf64_Addr vaddr = segment->get_virtual_address(); + const ELFIO::Elf_Xword fileSize = segment->get_file_size(); + const ELFIO::Elf_Xword memSize = segment->get_memory_size(); + const ELFIO::Elf_Word flags = segment->get_flags(); + + if (vaddr > 0xFFFFFFFFu || fileSize > 0xFFFFFFFFu || memSize > 0xFFFFFFFFu) + { + continue; + } + + if (fileSize > 0) + { + ps2recomp::Section load{}; + load.name = "LOAD" + std::to_string(i); + load.address = static_cast(vaddr); + load.size = static_cast(fileSize); + load.offset = static_cast(segment->get_offset()); + load.isCode = (flags & ELFIO::PF_X) != 0; + load.isData = (flags & ELFIO::PF_W) != 0 
|| (flags & ELFIO::PF_R) != 0; + load.isBSS = false; + load.isReadOnly = (flags & ELFIO::PF_W) == 0; + load.data = const_cast( + reinterpret_cast(segment->get_data())); + + sections.push_back(load); + } + + if (memSize > fileSize) + { + ps2recomp::Section bss{}; + bss.name = "LOAD" + std::to_string(i) + ".bss"; + bss.address = static_cast(vaddr + fileSize); + bss.size = static_cast(memSize - fileSize); + bss.offset = static_cast(segment->get_offset() + fileSize); + bss.isCode = false; + bss.isData = true; + bss.isBSS = true; + bss.isReadOnly = false; + bss.data = nullptr; + + sections.push_back(bss); + } + } + + if (!sections.empty()) + { + std::sort(sections.begin(), sections.end(), + [](const ps2recomp::Section &a, const ps2recomp::Section &b) + { return a.address < b.address; }); + } + } + + const ps2recomp::Section *FindSectionByAddress(const std::vector &sections, uint32_t address) + { + for (const auto &section : sections) + { + if (address >= section.address && address < (section.address + section.size)) + { + return &section; + } + } + return nullptr; + } +} + +namespace +{ + bool HasDwarfSections(const ELFIO::elfio &elf) + { + for (ELFIO::Elf_Half i = 0; i < elf.sections.size(); ++i) + { + const ELFIO::section *section = elf.sections[i]; + const std::string &name = section->get_name(); + if (name.rfind(".debug_", 0) == 0 || name.rfind(".zdebug_", 0) == 0) + { + return true; + } + } + return false; + } + + const ps2recomp::Section *FindCodeSectionByAddress(const std::vector &sections, uint32_t address) + { + for (const auto &section : sections) + { + if (!section.isCode) + { + continue; + } + + if (address >= section.address && address < (section.address + section.size)) + { + return &section; + } + } + return nullptr; + } + + std::string MakeAutoFunctionName(uint32_t address) + { + char buffer[32]{}; + std::snprintf(buffer, sizeof(buffer), "sub_%08X", address); + return std::string(buffer); + } + + std::string ReadDieName(Dwarf_Debug dbg, Dwarf_Die die, Dwarf_Error *error) + { + 
const int kAttrsToTry[] = + { +#ifdef DW_AT_linkage_name + DW_AT_linkage_name, +#endif +#ifdef DW_AT_MIPS_linkage_name + DW_AT_MIPS_linkage_name, +#endif + }; + + for (int attrNum : kAttrsToTry) + { + Dwarf_Attribute attr = nullptr; + if (dwarf_attr(die, attrNum, &attr, error) == DW_DLV_OK) + { + char *attrString = nullptr; + if (dwarf_formstring(attr, &attrString, error) == DW_DLV_OK && attrString) + { + std::string result(attrString); + dwarf_dealloc(dbg, attrString, DW_DLA_STRING); + dwarf_dealloc(dbg, attr, DW_DLA_ATTR); + return result; + } + dwarf_dealloc(dbg, attr, DW_DLA_ATTR); + } + } + + // Fallback: DW_AT_name + char *dieName = nullptr; + if (dwarf_diename(die, &dieName, error) == DW_DLV_OK && dieName) + { + std::string result(dieName); + dwarf_dealloc(dbg, dieName, DW_DLA_STRING); + return result; + } + + return {}; + } + + bool TryReadDieRange( + Dwarf_Debug dbg, + Dwarf_Die die, + uint32_t &outLowPc, + uint32_t &outHighPc, + Dwarf_Error *error) + { + outLowPc = 0; + outHighPc = 0; + + Dwarf_Addr lowPc = 0; + if (dwarf_lowpc(die, &lowPc, error) != DW_DLV_OK) + { + return false; + } + + // high_pc can be absolute address (DWARF2/3) or offset from low_pc (DWARF4+) + Dwarf_Addr highPc = 0; + Dwarf_Half highPcForm = 0; + Dwarf_Form_Class highPcClass = DW_FORM_CLASS_UNKNOWN; + + if (dwarf_highpc_b(die, &highPc, &highPcForm, &highPcClass, error) == DW_DLV_OK) + { + if (highPcClass == DW_FORM_CLASS_CONSTANT) + { + highPc = lowPc + highPc; + } + + if (lowPc <= 0xFFFFFFFFu && highPc <= 0xFFFFFFFFu && highPc > lowPc) + { + outLowPc = static_cast(lowPc); + outHighPc = static_cast(highPc); + return true; + } + return false; + } + + // If no high_pc, try DW_AT_ranges + Dwarf_Attribute rangesAttr = nullptr; + if (dwarf_attr(die, DW_AT_ranges, &rangesAttr, error) != DW_DLV_OK) + { + if (lowPc <= 0xFFFFFFFFu) + { + outLowPc = static_cast(lowPc); + outHighPc = static_cast(lowPc + 4); + return true; + } + return false; + } + + Dwarf_Off rangesOffset = 0; + if 
(dwarf_global_formref(rangesAttr, &rangesOffset, error) != DW_DLV_OK) + { + dwarf_dealloc(dbg, rangesAttr, DW_DLA_ATTR); + return false; + } + + Dwarf_Ranges *ranges = nullptr; + Dwarf_Signed rangesCount = 0; + Dwarf_Unsigned byteCount = 0; + Dwarf_Off realOffset = 0; + + if (dwarf_get_ranges_b(dbg, rangesOffset, die, &realOffset, &ranges, &rangesCount, &byteCount, error) != DW_DLV_OK) + { + dwarf_dealloc(dbg, rangesAttr, DW_DLA_ATTR); + return false; + } + + Dwarf_Addr baseAddr = lowPc; + Dwarf_Addr minPc = 0; + Dwarf_Addr maxPc = 0; + bool hasAny = false; + + for (Dwarf_Signed i = 0; i < rangesCount; ++i) + { + const Dwarf_Ranges &entry = ranges[i]; + + if (entry.dwr_type == DW_RANGES_END) + { + break; + } + + if (entry.dwr_type == DW_RANGES_ADDRESS_SELECTION) + { + baseAddr = entry.dwr_addr2; + continue; + } + + if (entry.dwr_type != DW_RANGES_ENTRY) + { + continue; + } + + const Dwarf_Addr start = baseAddr + entry.dwr_addr1; + const Dwarf_Addr end = baseAddr + entry.dwr_addr2; + + if (end <= start) + { + continue; + } + + if (!hasAny) + { + minPc = start; + maxPc = end; + hasAny = true; + } + else + { + minPc = std::min(minPc, start); + maxPc = std::max(maxPc, end); + } + } + + dwarf_dealloc_ranges(dbg, ranges, rangesCount); + dwarf_dealloc(dbg, rangesAttr, DW_DLA_ATTR); + + if (!hasAny) + { + return false; + } + + if (minPc <= 0xFFFFFFFFu && maxPc <= 0xFFFFFFFFu && maxPc > minPc) + { + outLowPc = static_cast(minPc); + outHighPc = static_cast(maxPc); + return true; + } + + return false; + } + + void VisitDieTreeAndCollectFunctions( + Dwarf_Debug dbg, + Dwarf_Die rootDie, + ps2recomp::ElfParser *parser, + std::vector &outFunctions) + { + Dwarf_Error error = nullptr; + + Dwarf_Die current = rootDie; + while (current) + { + Dwarf_Half tag = 0; + if (dwarf_tag(current, &tag, &error) == DW_DLV_OK) + { + if (tag == DW_TAG_subprogram) + { + uint32_t lowPc = 0; + uint32_t highPc = 0; + + if (TryReadDieRange(dbg, current, lowPc, highPc, &error)) + { + if 
(FindCodeSectionByAddress(parser->getSections(), lowPc)) + { + ps2recomp::Function func{}; + func.name = ReadDieName(dbg, current, &error); + func.start = lowPc; + func.end = highPc; + func.isRecompiled = false; + func.isStub = false; + + if (func.name.empty()) + { + func.name = MakeAutoFunctionName(func.start); + } + + outFunctions.push_back(std::move(func)); + } + } + } + } + + // Depth-first: child first + Dwarf_Die child = nullptr; + if (dwarf_child(current, &child, &error) == DW_DLV_OK) + { + VisitDieTreeAndCollectFunctions(dbg, child, parser, outFunctions); + } + + // Next sibling + Dwarf_Die sibling = nullptr; + const int siblingResult = dwarf_siblingof_b(dbg, current, TRUE, &sibling, &error); + dwarf_dealloc(dbg, current, DW_DLA_DIE); + + if (siblingResult != DW_DLV_OK) + { + break; + } + + current = sibling; + } + } + + void ScanJalTargetsFallback(ps2recomp::ElfParser *parser, std::vector<ps2recomp::Function> &outFunctions) + { + std::unordered_set<uint32_t> starts; + starts.reserve(4096); + + const uint32_t entry = parser->getEntryPoint(); + if (FindCodeSectionByAddress(parser->getSections(), entry)) + { + starts.insert(entry); + } + + const auto &sections = parser->getSections(); + for (const auto &section : sections) + { + if (!section.isCode || !section.data || section.size < 4) + { + continue; + } + + for (uint32_t offset = 0; offset + 4 <= section.size; offset += 4) + { + const uint32_t pc = section.address + offset; + + uint32_t raw = 0; + std::memcpy(&raw, section.data + offset, sizeof(uint32_t)); + + const uint32_t op = (raw >> 26) & 0x3F; + if (op != 0x03) // JAL + { + continue; + } + + const uint32_t index = raw & 0x03FFFFFF; + const uint32_t target = ((pc + 4) & 0xF0000000u) | (index << 2); + + if (FindCodeSectionByAddress(sections, target)) + { + starts.insert(target); + } + } + } + + std::vector<uint32_t> sortedStarts(starts.begin(), starts.end()); + std::sort(sortedStarts.begin(), sortedStarts.end()); + + for (size_t i = 0; i < sortedStarts.size(); ++i) + { + const uint32_t start =
sortedStarts[i]; + + const ps2recomp::Section *sec = FindCodeSectionByAddress(sections, start); + if (!sec) + { + continue; + } + + const uint32_t secEnd = sec->address + sec->size; + + uint32_t end = secEnd; + if (i + 1 < sortedStarts.size()) + { + const uint32_t next = sortedStarts[i + 1]; + if (next > start && next < secEnd) + { + end = next; + } + } + + ps2recomp::Function func{}; + func.name = MakeAutoFunctionName(start); + func.start = start; + func.end = (end > start) ? end : (start + 4); + func.isRecompiled = false; + func.isStub = false; + + outFunctions.push_back(std::move(func)); + } + } +} namespace ps2recomp { @@ -23,28 +483,96 @@ namespace ps2recomp } std::vector ElfParser::extractFunctions() const - { + { std::vector functions; + functions.reserve(m_symbols.size() + m_extraFunctions.size()); - for (const auto &symbol : m_symbols) + std::unordered_map indexByStart; + indexByStart.reserve(functions.capacity()); + + auto addOrMerge = [&](const Function &newFunction) { - if (symbol.isFunction && symbol.size > 0) + if (newFunction.start == 0) { - Function func; - func.name = symbol.name; - func.start = symbol.address; - func.end = symbol.address + symbol.size; - func.isRecompiled = false; - func.isStub = false; + return; + } - functions.push_back(func); + auto it = indexByStart.find(newFunction.start); + if (it == indexByStart.end()) + { + indexByStart.emplace(newFunction.start, functions.size()); + functions.push_back(newFunction); + return; } + + Function &existing = functions[it->second]; + + if (!newFunction.name.empty()) + { + if (existing.name.empty() || (IsAutoGeneratedName(existing.name) && !IsAutoGeneratedName(newFunction.name))) + { + existing.name = newFunction.name; + } + } + + if (newFunction.end > existing.end) + { + existing.end = newFunction.end; + } + + existing.isStub = existing.isStub || newFunction.isStub; + }; + + for (const auto &symbol : m_symbols) + { + if (!symbol.isFunction || symbol.isImported) + { + continue; + } + Function 
func; + func.name = symbol.name; + func.start = symbol.address; + func.end = (symbol.size > 0) ? (symbol.address + symbol.size) : 0; + func.isRecompiled = false; + func.isStub = false; + + addOrMerge(func); + } + + for (const auto &func : m_extraFunctions) + { + addOrMerge(func); } std::sort(functions.begin(), functions.end(), [](const Function &a, const Function &b) { return a.start < b.start; }); + for (size_t index = 0; index < functions.size(); ++index) + { + Function &func = functions[index]; + + if (func.end > func.start) + { + continue; + } + + const Section *section = FindSectionByAddress(m_sections, func.start); + uint32_t sectionEnd = section ? (section->address + section->size) : (func.start + 4); + + uint32_t nextStart = sectionEnd; + if (index + 1 < functions.size()) + { + const uint32_t candidate = functions[index + 1].start; + if (candidate > func.start && section && candidate < sectionEnd) + { + nextStart = candidate; + } + } + + func.end = (nextStart > func.start) ? nextStart : (func.start + 4); + } + return functions; } @@ -63,6 +591,11 @@ namespace ps2recomp return m_relocations; } + std::vector<Function> ElfParser::extractExtraFunctions() const + { + return m_extraFunctions; + } + bool ElfParser::isValidAddress(uint32_t address) const { for (const auto &section : m_sections) @@ -94,7 +627,7 @@ namespace ps2recomp } uint8_t *ElfParser::getSectionData(const std::string &sectionName) const - { + { for (const auto &section : m_sections) { if (section.name == sectionName) @@ -107,7 +640,7 @@ namespace ps2recomp } uint32_t ElfParser::getSectionAddress(const std::string &sectionName) const - { + { for (const auto &section : m_sections) { if (section.name == sectionName) @@ -120,7 +653,7 @@ namespace ps2recomp } uint32_t ElfParser::getSectionSize(const std::string &sectionName) const - { + { for (const auto &section : m_sections) { if (section.name == sectionName) @@ -137,6 +670,91 @@ namespace ps2recomp return static_cast<uint32_t>(m_elf->get_entry()); } + bool ElfParser::loadGhidraFunctionMap(const
std::string &mapPath) + { + if (mapPath.empty()) + { + return false; + } + + std::ifstream file(mapPath); + if (!file.is_open()) + { + std::cerr << "Warning: Could not open Ghidra function map: " << mapPath << std::endl; + return false; + } + + std::string line; + if (!std::getline(file, line)) + { + return false; + } + + int count = 0; + while (std::getline(file, line)) + { + if (line.empty()) + continue; + + std::stringstream ss(line); + std::string name, startStr, endStr, sizeStr; + + if (!std::getline(ss, name, ',') || + !std::getline(ss, startStr, ',') || + !std::getline(ss, endStr, ',') || + !std::getline(ss, sizeStr, ',')) + { + continue; + } + + try + { + uint32_t start = std::stoul(startStr, nullptr, 0); + uint32_t end = std::stoul(endStr, nullptr, 0); + + Function func{}; + func.name = name; + func.start = start; + func.end = end; + func.isRecompiled = false; + func.isStub = false; + + m_extraFunctions.push_back(std::move(func)); + count++; + } + catch (...) + { + continue; + } + } + + if (count > 0) + { + std::cout << "Loaded " << count << " functions from Ghidra map" << std::endl; + + std::sort(m_extraFunctions.begin(), m_extraFunctions.end(), + [](const Function &a, const Function &b) + { return a.start < b.start; }); + + m_extraFunctions.erase( + std::unique(m_extraFunctions.begin(), m_extraFunctions.end(), + [](const Function &a, const Function &b) + { + if (a.start == b.start) + { + // Same start address: std::unique keeps the first entry; names are not compared here + return true; + } + return false; + }), + m_extraFunctions.end()); + + return true; + } + + return false; + } + ElfParser::~ElfParser() = default; bool ElfParser::parse() @@ -157,6 +775,7 @@ namespace ps2recomp loadSections(); loadSymbols(); loadRelocations(); + loadDebugFunctions(); return true; } @@ -192,6 +811,16 @@ namespace ps2recomp m_sections.push_back(section); } + + if (m_sections.empty()) + { + AppendLoadSegmentsAsSections(*m_elf, m_sections); + if (!m_sections.empty()) + { + std::cout << "Info: ELF
has no section headers; using loadable segments as sections (" + << m_sections.size() << " entries)." << std::endl; + } + } } void ElfParser::loadSymbols() @@ -204,6 +833,12 @@ namespace ps2recomp if (psec->get_type() == ELFIO::SHT_SYMTAB || psec->get_type() == ELFIO::SHT_DYNSYM) { + if (psec->get_link() >= m_elf->sections.size()) + { + std::cerr << "Warning: Symbol section link out of bounds: " << psec->get_link() << std::endl; + continue; + } + ELFIO::symbol_section_accessor symbols(*m_elf, psec); ELFIO::Elf_Xword sym_num = symbols.get_symbols_num(); @@ -223,8 +858,7 @@ namespace ps2recomp symbols.get_symbol(j, name, value, size, bind, type, section_index, other); - // Skip empty symbols or those with invalid section index - if (name.empty() || section_index == ELFIO::SHN_UNDEF) + if (name.empty()) { continue; } @@ -234,8 +868,9 @@ namespace ps2recomp symbol.address = static_cast(value); symbol.size = static_cast(size); symbol.isFunction = (type == ELFIO::STT_FUNC); - symbol.isImported = (bind == ELFIO::STB_GLOBAL && section_index == ELFIO::SHN_UNDEF); - symbol.isExported = (bind == ELFIO::STB_GLOBAL && section_index != ELFIO::SHN_UNDEF); + + symbol.isImported = section_index == ELFIO::SHN_UNDEF; + symbol.isExported = (!symbol.isImported && bind == ELFIO::STB_GLOBAL); m_symbols.push_back(symbol); } @@ -253,9 +888,22 @@ namespace ps2recomp if (psec->get_type() == ELFIO::SHT_REL || psec->get_type() == ELFIO::SHT_RELA) { + if (psec->get_link() >= m_elf->sections.size()) + { + std::cout << "Warning: Relocation section link out of bounds: " << psec->get_link() << std::endl; + continue; + } + ELFIO::relocation_section_accessor relocs(*m_elf, psec); ELFIO::section *symSec = m_elf->sections[psec->get_link()]; + + if (symSec->get_link() >= m_elf->sections.size()) + { + std::cout << "Warning: Symbol section link out of bounds (in relocation): " << symSec->get_link() << std::endl; + continue; + } + ELFIO::symbol_section_accessor symbols(*m_elf, symSec); ELFIO::section 
*strSec = m_elf->sections[symSec->get_link()]; @@ -294,4 +942,91 @@ namespace ps2recomp } } } -} \ No newline at end of file + + void ElfParser::loadDebugFunctions() + { + m_extraFunctions.clear(); + + if (HasDwarfSections(*m_elf)) + { +#if defined(_WIN32) + const int fileDescriptor = _open(m_filePath.c_str(), _O_RDONLY | _O_BINARY); +#else + const int fileDescriptor = ::open(m_filePath.c_str(), O_RDONLY); +#endif + if (fileDescriptor >= 0) + { + Dwarf_Debug dbg = nullptr; + Dwarf_Error error = nullptr; + + const int initResult = dwarf_init_b(fileDescriptor, DW_GROUPNUMBER_BASE, nullptr, nullptr, &dbg, &error); + if (initResult == DW_DLV_OK) + { + for (;;) + { + Dwarf_Unsigned cuHeaderLength = 0; + Dwarf_Half versionStamp = 0; + Dwarf_Unsigned abbrevOffset = 0; + Dwarf_Half addressSize = 0; + Dwarf_Half lengthSize = 0; + Dwarf_Half extensionSize = 0; + Dwarf_Sig8 typeSignature = {0}; + Dwarf_Unsigned typeOffset = 0; + Dwarf_Unsigned nextCuHeader = 0; + Dwarf_Half headerCuType = 0; + Dwarf_Die cuDie = nullptr; + + const int cuResult = dwarf_next_cu_header_e( + dbg, + TRUE, + &cuDie, + &cuHeaderLength, + &versionStamp, + &abbrevOffset, + &addressSize, + &lengthSize, + &extensionSize, + &typeSignature, + &typeOffset, + &nextCuHeader, + &headerCuType, + &error); + + if (cuResult != DW_DLV_OK) + { + break; + } + + if (cuDie != nullptr) + { + VisitDieTreeAndCollectFunctions(dbg, cuDie, this, m_extraFunctions); + } + } + + dwarf_finish(dbg); + } + +#if defined(_WIN32) + _close(fileDescriptor); +#else + ::close(fileDescriptor); +#endif + } + } + + if (m_extraFunctions.empty()) + { + ScanJalTargetsFallback(this, m_extraFunctions); + } + + std::sort(m_extraFunctions.begin(), m_extraFunctions.end(), + [](const Function &a, const Function &b) + { return a.start < b.start; }); + + m_extraFunctions.erase( + std::unique(m_extraFunctions.begin(), m_extraFunctions.end(), + [](const Function &a, const Function &b) + { return a.start == b.start; }), + m_extraFunctions.end()); + } +} 
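Review note on the `elf_parser.cpp` changes above: the JAL fallback in `ScanJalTargetsFallback` rebuilds each call target from the 26-bit index field of the instruction and the top four bits of the delay-slot address (`pc + 4`). The Python sketch below illustrates only that decoding step. It assumes little-endian instruction words and a single contiguous code blob; the helper name `scan_jal_targets` and its simple range filter are hypothetical stand-ins for the patch's per-section lookup, not part of the patch.

```python
import struct

JAL_OPCODE = 0x03  # primary opcode field of the MIPS `jal` instruction


def scan_jal_targets(code, base_address):
    """Return sorted, de-duplicated JAL targets that land inside `code`.

    `code` is a bytes object holding little-endian 32-bit instruction words
    (an assumption of this sketch); `base_address` is the address of code[0].
    """
    end_address = base_address + len(code)
    targets = set()
    for offset in range(0, len(code) - 3, 4):
        (raw,) = struct.unpack_from("<I", code, offset)
        if (raw >> 26) & 0x3F != JAL_OPCODE:
            continue
        pc = base_address + offset
        index = raw & 0x03FFFFFF
        # Top 4 bits come from the delay-slot address, the rest from index << 2.
        target = ((pc + 4) & 0xF0000000) | (index << 2)
        # Stand-in for the patch's code-section check: keep in-range targets only.
        if base_address <= target < end_address:
            targets.add(target)
    return sorted(targets)
```

Each returned address would correspond to one `sub_XXXXXXXX` start; the patch then ends every function at the next start or at the end of its section.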
diff --git a/ps2xRecomp/src/lib/ps2_recompiler.cpp b/ps2xRecomp/src/lib/ps2_recompiler.cpp index cd817f5..9b65eeb 100644 --- a/ps2xRecomp/src/lib/ps2_recompiler.cpp +++ b/ps2xRecomp/src/lib/ps2_recompiler.cpp @@ -71,6 +71,11 @@ namespace ps2recomp return false; } + if (!m_config.ghidraMapPath.empty()) + { + m_elfParser->loadGhidraFunctionMap(m_config.ghidraMapPath); + } + m_functions = m_elfParser->extractFunctions(); m_symbols = m_elfParser->extractSymbols(); m_sections = m_elfParser->getSections(); @@ -250,6 +255,27 @@ namespace ps2recomp m_codeGenerator->setRenamedFunctions(m_functionRenames); } + if (m_bootstrapInfo.valid && m_codeGenerator) + { + auto entryIt = std::find_if(m_functions.begin(), m_functions.end(), + [&](const Function &fn) + { return fn.start == m_bootstrapInfo.entry; }); + if (entryIt != m_functions.end()) + { + auto renameIt = m_functionRenames.find(entryIt->start); + if (renameIt != m_functionRenames.end()) + { + m_bootstrapInfo.entryName = renameIt->second; + } + else + { + m_bootstrapInfo.entryName = sanitizeFunctionName(entryIt->name); + } + } + + m_codeGenerator->setBootstrapInfo(m_bootstrapInfo); + } + m_generatedStubs.clear(); for (const auto &function : m_functions) { @@ -709,7 +735,7 @@ namespace ps2recomp return outputPath; } - std::string PS2Recompiler::sanitizeFunctionName(const std::string& name) const + std::string PS2Recompiler::sanitizeFunctionName(const std::string &name) const { std::string sanitized = name; std::replace(sanitized.begin(), sanitized.end(), '.', '_'); @@ -727,7 +753,7 @@ namespace ps2recomp if (sanitized.size() >= 2 && sanitized[0] == '_' && (sanitized[1] == '_' || - std::isupper(static_cast(sanitized[1])))) + std::isupper(static_cast(sanitized[1])))) { return "ps2_" + sanitized; } diff --git a/ps2xRecomp/tools/ghidra/ExportPS2Functions.java b/ps2xRecomp/tools/ghidra/ExportPS2Functions.java new file mode 100644 index 0000000..4198258 --- /dev/null +++ b/ps2xRecomp/tools/ghidra/ExportPS2Functions.java @@ -0,0 
+1,54 @@ +// Exports function addresses and names to CSV for PS2Recomp +// @category PS2Recomp + +import ghidra.app.script.GhidraScript; +import ghidra.program.model.address.AddressSetView; +import ghidra.program.model.listing.Function; +import ghidra.program.model.listing.FunctionIterator; +import ghidra.program.model.listing.FunctionManager; + +import java.io.File; +import java.io.PrintWriter; + +public class ExportPS2Functions extends GhidraScript { + + @Override + public void run() throws Exception { + File file = askFile("Choose output CSV file", "Save"); + + if (file == null) { + return; + } + + int count = 0; + try (PrintWriter writer = new PrintWriter(file)) { + writer.println("Name,Start,End,Size"); + + FunctionManager fm = currentProgram.getFunctionManager(); + FunctionIterator it = fm.getFunctions(true); + + while (it.hasNext() && !monitor.isCancelled()) { + Function func = it.next(); + + String name = func.getName(); + long start = func.getEntryPoint().getOffset(); + + AddressSetView body = func.getBody(); + long maxAddr = body.getMaxAddress().getOffset(); + + long size = body.getNumAddresses(); + + writer.printf("%s,0x%08X,0x%08X,%d%n", + name, + start, + maxAddr + 1, // End address is exclusive + size + ); + + count++; + } + } + + println(String.format("Exported %d functions to %s", count, file.getAbsolutePath())); + } +} diff --git a/ps2xRecomp/tools/ghidra/ExportPS2Functions.py b/ps2xRecomp/tools/ghidra/ExportPS2Functions.py new file mode 100644 index 0000000..b0e7ed8 --- /dev/null +++ b/ps2xRecomp/tools/ghidra/ExportPS2Functions.py @@ -0,0 +1,43 @@ +# Exports function addresses and names to CSV for PS2Recomp +# @category PS2Recomp + +import csv +import os + +from ghidra.program.model.symbol import SourceType + +def run(): + f = askFile("Choose output CSV file", "Save") + + if f is None: + return + + with open(f.getAbsolutePath(), 'w') as csvfile: + writer = csv.writer(csvfile) + writer.writerow(['Name', 'Start', 'End', 'Size']) + + fm = 
currentProgram.getFunctionManager() + functions = fm.getFunctions(True) # True = iterate in ascending address order + + count = 0 + for func in functions: + name = func.getName() + start = func.getEntryPoint().getOffset() + + body = func.getBody() + max_addr = body.getMaxAddress().getOffset() + + size = body.getNumAddresses() + + writer.writerow([ + name, + "0x{:08X}".format(start), + "0x{:08X}".format(max_addr + 1), # End address is exclusive + size + ]) + count += 1 + + print("Exported {} functions to {}".format(count, f.getAbsolutePath())) + +if __name__ == "__main__": + run()
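Review note on the export format: both scripts write a header row followed by `Name,Start,End,Size` rows with hexadecimal `Start` and exclusive `End`, and `loadGhidraFunctionMap` reads them back. The Python sketch below shows one way a consumer could parse that file, skipping malformed rows and keeping one entry per start address. The function name `load_function_map` is hypothetical, and keeping the first entry per address is an assumption of this sketch rather than a statement about the C++ loader's ordering.

```python
import csv
import io


def load_function_map(csv_text):
    """Parse 'Name,Start,End,Size' rows into (name, start, end) tuples.

    Rows with fewer than four fields or non-numeric addresses are skipped.
    When several rows share a start address, the first one is kept.
    """
    by_start = {}
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 4:
            continue
        name, start_str, end_str = row[0], row[1], row[2]
        try:
            # Base 0 accepts the "0x..." form the export scripts write.
            start = int(start_str, 0)
            end = int(end_str, 0)
        except ValueError:
            continue
        by_start.setdefault(start, (name, start, end))
    return [by_start[address] for address in sorted(by_start)]
```

One point that may be worth checking: Python's `csv.writer` quotes a field that contains a comma, whereas splitting each line on `,` with `std::getline` does not undo that quoting, so a function name containing a comma could be misread by the C++ loader. This has not been verified against real exports.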