net: tcp2: fix sysworkq corruption in tcp_conn_unref()
Bug description:
When in tcp_conn_unref(), in case one of the delayed works is already
submitted to sysworkq (after delay period), e.g. send_timer, the check
of k_delayed_work_remaining_get() prevents calling
k_delayed_work_cancel().
This leads to corrupting sysworkq when zeroing struct tcp* conn.
Note that the "next" pointer for the work queue is part of the struct
work (in _reserved field). Which is, in this case, a member of struct
tcp.
Scenario leading to the bug:
(1) net_tcp_connect() is called from a work in sysworkq
(2) net_tcp_connect() submits conn->send_timer to sysworkq
(3) while net_tcp_connect() is waiting on connect_sem, delay period
passes (z_timeout) and send_timer enters sysworkq work slist
(4) also, some other code (app) submits more works to queue, now pointed
by conn->send_timer in sysworkq work list
(5) connection fails (no answer to SYN), causing a call to
tcp_conn_unref()
(6) tcp_conn_unref() is calling tcp_send_queue_flush()
(7) checking k_delayed_work_remaining_get(&conn->send_timer) returns 0
due to delay period end, but send_timer is still in sysworkq work
slist (sysworkq thread still hasn't handled the work)
(8) BUG!: no call to k_delayed_work_cancel(&conn->send_timer)
(9) back in tcp_conn_unref(), a call to memset(conn, 0, sizeof(*conn))
zeroes conn->send_timer
(10) conn->send_timer is pointed to in sysworkq work slist, but is
zeroed, clearing pointer to following works submitted in stage (4)
(11) EFFECT! the works in stage (4) are never executed!!
NOTES:
* k_delayed_work_cancel(), handles both states:
(1) delayed work pends on timeout and
(2) work already in queue.
So there is no need to check k_delayed_work_remaining_get()
* This is also relevant for conn->send_data_timer
Solution:
removing checks of k_delayed_work_remaining_get(), always calling
k_delayed_work_cancel() for work in struct tcp, in unref, before memset
Signed-off-by:
David Komel <a8961713@gmail.com>
Loading
Please sign in to comment