The execution of the project will not only crash in main_ga, but also hang midway in other scripts such as main_baseline #5

@lilejin322

Sorry to bother you again. We've run into some problems while running this project and don't know how to solve them.

Describe the issue

  The script test_main.py completes without problems because it runs only a single iteration. However, long-running scripts such as main_baseline.py and main_ga.py are unstable: after running the scenario several times, the process eventually crashes with a segmentation fault reported by the shell.
  We've tried to diagnose the project with tools such as pdb, but still could not determine where the error originates. A screenshot is shown below.

[Screenshot: shell reporting a segmentation fault]
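Since pdb cannot step into a crash that happens in native code, one option for narrowing this down is Python's standard `faulthandler` module. The following is only a minimal sketch, not something the project provides: the helper names `install_crash_diagnostics` and `remove_crash_diagnostics` and the 600-second interval are assumptions made for illustration.

```python
import faulthandler
import sys


def install_crash_diagnostics(log_file=sys.stderr, dump_interval_s=600):
    """Hypothetical helper: make fatal signals and long waits leave a Python traceback."""
    # On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python stack
    # of every thread to log_file before the process dies.
    faulthandler.enable(file=log_file, all_threads=True)
    # Additionally dump all thread stacks every dump_interval_s seconds, so the
    # last dumps written before the crash show where the script was waiting.
    faulthandler.dump_traceback_later(dump_interval_s, repeat=True, file=log_file)


def remove_crash_diagnostics():
    """Undo install_crash_diagnostics()."""
    faulthandler.cancel_dump_traceback_later()
    faulthandler.disable()
```

Calling `install_crash_diagnostics()` at the top of main_ga.py or main_baseline.py would be one way to use it; the same effect for fatal signals alone is available without code changes via `python -X faulthandler main_ga.py`.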

Environment

  1. Hardware configuration
    CPU: Intel Core i9 14900K (24-core)
    Memory: 128GB
    Graphics card: None
  2. Software configuration
    OS: Ubuntu 18.04
    Docker-CE: version 24.0.2
    Python: version 3.9.18
  3. All requirements listed in README.md are met

To reproduce

  1. Run python main_ga.py
  2. Open Dreamview in the browser
  3. After several iterations of the genetic cycle (g0s0, g0s1, …), the shell hangs
  4. Judging from the browser, the system appears to freeze whenever one of the ADCs reaches its destination
  5. Dozens of minutes later, the shell reports “Segmentation fault” and the Python script exits
  6. Kill all running containers with ~$ docker kill $(docker ps -q)
  7. Change the DoppelTest map data to san_mateo in config.py
  8. Update {ApolloROOT}/modules/common/data/global_flagfile.txt accordingly
  9. Run python main_baseline.py
  10. Steps 2-5 occur again in the same way

Current Result

  After running for several generations, the script hangs and eventually exits with a segmentation fault.

Expected Result

  The script should exit normally once the timeout set by the RUN_FOR_HOUR parameter in config.py has elapsed.
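To make the expected behaviour concrete, here is a minimal sketch of a time-bounded run loop. It is not the project's actual main loop: `run_for`, `run_one_scenario` and the injectable `clock` are hypothetical names, and only the idea of an hour-based limit (RUN_FOR_HOUR) comes from config.py.

```python
import time


def run_for(hours, run_one_scenario, clock=time.monotonic):
    """Run scenarios back to back until `hours` have elapsed, then return normally.

    run_one_scenario is a hypothetical callable that receives the index of the
    scenario and blocks until that scenario has finished.
    """
    deadline = clock() + hours * 3600
    completed = 0
    # The deadline is checked only between scenarios, so a scenario that has
    # already started is allowed to finish before the loop exits.
    while clock() < deadline:
        run_one_scenario(completed)
        completed += 1
    return completed
```

With this shape, a scenario that never returns keeps the loop from ever reaching the deadline check, which matches the hang we observe instead of a clean exit.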

Debugging Endeavors

  1. Checked that the package versions in the conda environment match requirements.txt
  2. Tried different map asset bundles
  3. Forced the containers to restart on every iteration by moving the container start-up code into the while loop in main()
    ctn = ApolloContainer(APOLLO_ROOT, 'ROUTE_0')
    ctn.start_instance()
    ctn.start_dreamview()
  4. In framework/scenario/ScenarioRunner.py, commented out (with #) every use of MessageBroker
    mbk = MessageBroker(self.__runners)
    mbk.spin()

    mbk.broadcast(Topics.TrafficLight, tld.SerializeToString())
