Main thread spread exception when thread-mgr enabled by xujuntwt95329 · Pull Request #1889 · bytecodealliance/wasm-micro-runtime

xujuntwt95329 · 2023-01-16T02:01:19Z

No description provided.

wenyongh · 2023-01-16T03:32:47Z

core/iwasm/common/wasm_runtime_common.c

        wasm_runtime_set_exception(exec_env->module_inst,
                                   "the result conversion is failed");
+#if WASM_ENABLE_THREAD_MGR != 0
+        wasm_cluster_spread_exception(exec_env);


Should also spread exception when wasm_runtime_prepare_call_function fails, L1767?

According to our discussion, I've moved the spread into the wasm_set_exception function, so we don't need to handle the spread anywhere else.
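For illustration only, here is a minimal sketch of the idea of spreading from a single set-exception entry point. The `Demo*` types and `demo_*` functions are hypothetical stand-ins, not WAMR APIs, and locking is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_THREADS 4
#define EXCEPTION_LEN 64

/* Hypothetical stand-ins for a module instance and its cluster. */
typedef struct DemoCluster DemoCluster;

typedef struct DemoInstance {
    char cur_exception[EXCEPTION_LEN];
    DemoCluster *cluster; /* NULL when the instance is not in a cluster */
} DemoInstance;

struct DemoCluster {
    DemoInstance *members[MAX_THREADS];
    size_t count;
};

/* Record the message on one instance only. */
static void
demo_set_exception_local(DemoInstance *inst, const char *msg)
{
    strncpy(inst->cur_exception, msg, EXCEPTION_LEN - 1);
    inst->cur_exception[EXCEPTION_LEN - 1] = '\0';
}

/* Single entry point: set the exception, then copy it to every sibling,
   so callers never have to spread it themselves. */
static void
demo_set_exception(DemoInstance *inst, const char *msg)
{
    size_t i;

    demo_set_exception_local(inst, msg);
    if (!inst->cluster)
        return;
    for (i = 0; i < inst->cluster->count; i++) {
        DemoInstance *other = inst->cluster->members[i];
        if (other != inst)
            demo_set_exception_local(other, msg);
    }
}
```

With this shape, every call site that sets an exception gets the spread for free, which is why the per-call-site `#if WASM_ENABLE_THREAD_MGR != 0` blocks are no longer needed.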

wenyongh · 2023-01-16T03:36:54Z

core/iwasm/aot/aot_runtime.c

 clear_wasi_proc_exit_exception(AOTModuleInstance *module_inst)
 {
-#if WASM_ENABLE_LIBC_WASI != 0
+#if (WASM_ENABLE_LIBC_WASI != 0) && (WASM_ENABLE_THREAD_MGR == 0)


Not sure why clearing the exception is skipped for multi-threading?

Restored, thanks~

wenyongh · 2023-01-16T03:38:53Z

core/iwasm/interpreter/wasm_interp_classic.c

 clear_wasi_proc_exit_exception(WASMModuleInstance *module_inst)
 {
-#if WASM_ENABLE_LIBC_WASI != 0
+#if (WASM_ENABLE_LIBC_WASI != 0) && (WASM_ENABLE_THREAD_MGR == 0)


Same as above: why skip the clear for multi-threading?

yamt · 2023-01-16T07:00:41Z

i guess you need some kind of a retry loop as other threads can create threads in the meantime.
i guess traverse_list should require a lock. (cluster->lock?) but it isn't a fault of this PR.

xujuntwt95329 · 2023-01-16T08:53:32Z

Thanks for the suggestions.

> i guess you need some kind of a retry loop as other threads can create threads in the meantime.

I think we can add a status field to record the status of the cluster. When spreading the exception, we:

1. get `cluster->lock`
2. set the cluster's status to `exception`
3. spread the exception

Then, if any other thread is creating a new thread in the meantime, we fail the creation directly because the cluster's status is `exception`.

@yamt @wenyongh what do you think about this solution?
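The three steps above could be sketched roughly as follows. This is an assumption-laden illustration using POSIX mutexes, not the code in this PR: `DemoCluster`, `has_exception`, and the `demo_*` functions are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cluster with a status field guarded by its lock. */
typedef struct DemoCluster {
    pthread_mutex_t lock;
    bool has_exception; /* hypothetical status field */
    int thread_count;
} DemoCluster;

static void
demo_cluster_init(DemoCluster *cluster)
{
    pthread_mutex_init(&cluster->lock, NULL);
    cluster->has_exception = false;
    cluster->thread_count = 1; /* the main thread */
}

/* Steps 1-3: take the lock, mark the cluster, spread the exception. */
static void
demo_spread_exception(DemoCluster *cluster)
{
    pthread_mutex_lock(&cluster->lock);
    cluster->has_exception = true;
    /* ... copy the exception to every thread in the cluster here ... */
    pthread_mutex_unlock(&cluster->lock);
}

/* Thread creation checks the status under the same lock and fails
   once an exception has been spread. */
static bool
demo_create_thread(DemoCluster *cluster)
{
    bool ok;

    pthread_mutex_lock(&cluster->lock);
    ok = !cluster->has_exception;
    if (ok)
        cluster->thread_count++;
    pthread_mutex_unlock(&cluster->lock);
    return ok;
}
```

Because both paths take the same lock, a creation either completes before the status is set (and then receives the spread) or observes the status and fails, so no retry loop is needed in this sketch.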

> i guess traverse_list should require a lock. (cluster->lock?) but it isn't a fault of this PR.

Yes, and traverse_list is used for several different lists. I'll analyze each case and add locks where necessary.

samples/multi-thread/wasm-apps/main_thread_exception.c

core/iwasm/libraries/thread-mgr/thread_manager.c

yamt · 2023-01-16T09:24:56Z

> Thanks for the suggestions.
>
> > i guess you need some kind of a retry loop as other threads can create threads in the meantime.
>
> I think we can add a status field to record the status of the cluster. When spreading the exception, we:
>
> 1. get `cluster->lock`
> 2. set the cluster's status to `exception`
> 3. spread the exception
>
> Then, if any other thread is creating a new thread in the meantime, we fail the creation directly because the cluster's status is `exception`.
>
> @yamt @wenyongh what do you think about this solution?

i think it works.

> > i guess traverse_list should require a lock. (cluster->lock?) but it isn't a fault of this PR.
>
> Yes, and traverse_list is used for several different lists. I'll analyze each case and add locks where necessary.

thank you.

Co-authored-by: Marcin Kolny <marcin.kolny@gmail.com>

wenyongh · 2023-01-16T12:26:43Z

> Thanks for the suggestions.
>
> > i guess you need some kind of a retry loop as other threads can create threads in the meantime.
>
> I think we can add a status field to record the status of the cluster. When spreading the exception, we:
>
> 1. get `cluster->lock`
> 2. set the cluster's status to `exception`
> 3. spread the exception
>
> Then, if any other thread is creating a new thread in the meantime, we fail the creation directly because the cluster's status is `exception`.
>
> @yamt @wenyongh what do you think about this solution?

It looks good to me.

wenyongh · 2023-01-16T12:30:16Z

core/iwasm/common/wasm_runtime_common.c

+    bh_assert(module_inst_comm->module_type == Wasm_Module_Bytecode
+              || module_inst_comm->module_type == Wasm_Module_AoT);
+
+    const char *exception = wasm_get_exception(module_inst);


It would be better to declare the variable at the beginning of the function; older compilers might report a warning about a declaration that follows a statement.

Done, thanks
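As a small illustration of the declaration-first style requested above (C89/C90 does not allow declarations after statements), here is a sketch with hypothetical `demo_*` helpers standing in for the real runtime functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical getter standing in for wasm_get_exception():
   returns NULL when no exception message is stored. */
static const char *
demo_get_exception(const char *stored)
{
    return (stored && stored[0] != '\0') ? stored : NULL;
}

static int
demo_has_exception(const char *stored)
{
    /* Declarations come first, so C89/C90 compilers accept the function. */
    const char *exception;

    assert(stored != NULL); /* the first statement follows the declarations */
    exception = demo_get_exception(stored);
    return exception != NULL;
}
```

The variable is declared at the top and assigned after the assertion, which keeps the behaviour the same while avoiding the mixed declaration-and-code warning.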

wenyongh · 2023-01-16T12:34:00Z

core/iwasm/common/wasm_runtime_common.c

+#if WASM_ENABLE_THREAD_MGR != 0
+    wasm_cluster_spread_exception(
+        wasm_clusters_search_exec_env((WASMModuleInstanceCommon *)module_inst),
+        exception ? false : true);


When exception is NULL, does that mean clearing the exception of the other threads?

core/iwasm/libraries/thread-mgr/thread_manager.c

xujuntwt95329 · 2023-01-16T13:14:38Z

When adding a lock for traverse_list, we hit a deadlock: wasm_cluster_terminate_all_except_self takes the lock and calls os_thread_join to wait, but the exiting thread also tries to take the lock, so a deadlock occurs.

I use this strategy to avoid it:

1. add a new field `processing` to the cluster
2. in wasm_cluster_terminate_all_except_self, set `processing` to true and release the lock before calling os_thread_join
3. other threads can still take the lock, but wasm_cluster_spawn_exec_env and wasm_cluster_create_thread fail directly if `processing` is true
4. wasm_cluster_terminate_all_except_self clears the `processing` flag once it has finished
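A rough sketch of this strategy with POSIX threads is shown below. It is not the code from this PR: `DemoCluster`, `live_threads`, and the `demo_*` functions are hypothetical, and the workers here exit on their own instead of being asked to terminate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEMO_WORKERS 3

typedef struct DemoCluster {
    pthread_mutex_t lock;
    bool processing;  /* hypothetical flag: termination in progress */
    int live_threads; /* workers still registered in the cluster */
    pthread_t workers[DEMO_WORKERS];
} DemoCluster;

/* An exiting worker needs the lock to remove itself from the cluster. */
static void *
demo_worker(void *arg)
{
    DemoCluster *cluster = arg;

    pthread_mutex_lock(&cluster->lock);
    cluster->live_threads--;
    pthread_mutex_unlock(&cluster->lock);
    return NULL;
}

/* Creation is refused while termination is in progress. */
static bool
demo_can_create_thread(DemoCluster *cluster)
{
    bool ok;

    pthread_mutex_lock(&cluster->lock);
    ok = !cluster->processing;
    pthread_mutex_unlock(&cluster->lock);
    return ok;
}

static void
demo_terminate_all(DemoCluster *cluster)
{
    int i;

    pthread_mutex_lock(&cluster->lock);
    cluster->processing = true;
    /* Release before joining: holding the lock across the join would
       block the exiting workers and deadlock. */
    pthread_mutex_unlock(&cluster->lock);

    for (i = 0; i < DEMO_WORKERS; i++)
        pthread_join(cluster->workers[i], NULL);

    pthread_mutex_lock(&cluster->lock);
    cluster->processing = false;
    pthread_mutex_unlock(&cluster->lock);
}

/* Start the workers, terminate them, and report how many remain. */
static int
demo_run(void)
{
    DemoCluster cluster;
    int i;

    pthread_mutex_init(&cluster.lock, NULL);
    cluster.processing = false;
    cluster.live_threads = DEMO_WORKERS;
    for (i = 0; i < DEMO_WORKERS; i++)
        if (pthread_create(&cluster.workers[i], NULL, demo_worker,
                           &cluster) != 0)
            return -1;
    demo_terminate_all(&cluster);
    if (!demo_can_create_thread(&cluster))
        return -2; /* flag should be cleared once termination is done */
    pthread_mutex_destroy(&cluster.lock);
    return cluster.live_threads;
}
```

The flag takes over the job the lock could not do during the join: it keeps new threads out of the cluster while the lock itself stays available to the threads that are exiting.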

wenyongh · 2023-01-19T06:12:21Z

core/iwasm/aot/aot_runtime.c

    wasm_exec_env_destroy(exec_env);
-    (void)clear_wasi_proc_exit_exception(module_inst);
+    (void)clear_wasi_proc_exit_exception(
+        (WASMModuleInstanceCommon *)module_inst);


There seems to be no need to clear the WASI proc exit exception here. A WASI module doesn't set an internal start function index; instead it exports a function named "_start". If WASI proc exit really is called here, we can also let the instantiation fail. @lum1n0us what is your opinion?

Agree with that, and updated

wenyongh · 2023-01-19T06:12:59Z

core/iwasm/common/wasm_runtime_common.c

 }
 #endif

+bool


This can be made static if the above is OK.

wenyongh · 2023-01-19T06:15:18Z

core/iwasm/libraries/thread-mgr/thread_manager.c

+    os_mutex_unlock(&cluster->lock);
+
    traverse_list(&cluster->exec_env_list, terminate_thread_visitor, NULL);
+


Does traverse_list take a lock for the list? Why not remove L845 and L849?

In terminate_thread_visitor we call os_thread_join to wait for the other threads to exit, and an exiting thread needs to take cluster->lock to access the list, so we can't hold the lock during terminate_thread_visitor.

Got it, thanks.

wenyongh

LGTM

xujuntwt95329 and others added 4 commits January 15, 2023 23:45

main thread spread exception

3d416ab

terminate threads when spread exception

9067254

add file header

c72dfc9

auto format

4174215

wenyongh reviewed Jan 16, 2023

xujuntwt95329 added 3 commits January 16, 2023 13:59

spread exception during throwing

2d3da36

restore unused modification

ff1bd10

minor fix

a4a2bb6

xujuntwt95329 added 3 commits January 16, 2023 15:42

refactor wasi proc exception processing

0016228

minor fix

2426fd8

fix build error

fc5c945

xujuntwt95329 changed the title ~~Main thread spread~~ Main thread spread exception when thread-mgr enabled Jan 16, 2023

loganek reviewed Jan 16, 2023

samples/multi-thread/wasm-apps/main_thread_exception.c (outdated)

loganek reviewed Jan 16, 2023

samples/multi-thread/wasm-apps/main_thread_exception.c (outdated)

loganek reviewed Jan 16, 2023

core/iwasm/libraries/thread-mgr/thread_manager.c (outdated)

xujuntwt95329 and others added 4 commits January 16, 2023 17:39

fix typo

c770575

Co-authored-by: Marcin Kolny <marcin.kolny@gmail.com>

address some PR comments

12cf0af

add lock when traversing list in thread-mgr

8f0a046

avoid creating new thread while spreading exception

1ace00f

wenyongh reviewed Jan 16, 2023

xujuntwt95329 added 3 commits January 16, 2023 20:53

fix dead lock issue

d0f5d94

update comments

d7d78c0

check return value

f30f900

xujuntwt95329 and others added 2 commits January 17, 2023 19:05

fix spec test issue

1d6553e

fix sgx spec test issue

315985b

wenyongh reviewed Jan 19, 2023

address PR comments

d939080

wenyongh approved these changes Jan 20, 2023

wenyongh merged commit cadf9d0 into bytecodealliance:main Jan 20, 2023

wenyongh mentioned this pull request Jan 20, 2023

Handle thread exit #1869

Merged

eloparco mentioned this pull request Jan 23, 2023

proc_exit and traps do not stop thread executing blocking instructions #1910

Closed

loganek mentioned this pull request Jan 31, 2023

Fix data race when terminating or waiting for a thread. #1924

Merged

vickiegpt pushed a commit to vickiegpt/wamr-aot-gc-checkpoint-restore that referenced this pull request May 27, 2024

Main thread spread exception when thread-mgr is enabled (bytecodealli…

9b3c241

…ance#1889) And refactor clear_wasi_proc_exit_exception, refer to bytecodealliance#1869

4 participants