Skip to content

Commit da314c9

Browse files
herbertxdavem330
authored andcommitted
netlink: Replace rhash_portid with bound
On Mon, Sep 21, 2015 at 02:20:22PM -0400, Tejun Heo wrote: > > store_release and load_acquire are different from the usual memory > barriers and can't be paired this way. You have to pair store_release > and load_acquire. Besides, it isn't a particularly good idea to OK I've decided to drop the acquire/release helpers as they don't help us at all and simply pessimises the code by using full memory barriers (on some architectures) where only a write or read barrier is needed. > depend on memory barriers embedded in other data structures like the > above. Here, especially, rhashtable_insert() would have write barrier > *before* the entry is hashed not necessarily *after*, which means that > in the above case, a socket which appears to have set bound to a > reader might not visible when the reader tries to look up the socket > on the hashtable. But you are right we do need an explicit write barrier here to ensure that the hashing is visible. > There's no reason to be overly smart here. This isn't a crazy hot > path, write barriers tend to be very cheap, store_release more so. > Please just do smp_store_release() and note what it's paired with. It's not about being overly smart. It's about actually understanding what's going on with the code. I've seen too many instances of people simply sprinkling synchronisation primitives around without any knowledge of what is happening underneath, which is just a recipe for creating hard-to-debug races. > > @@ -1539,7 +1546,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr, > > } > > } > > > > - if (!nlk->portid) { > > + if (!nlk->bound) { > > I don't think you can skip load_acquire here just because this is the > second deref of the variable. That doesn't change anything. Race > condition could still happen between the first and second tests and > skipping the second would lead to the same kind of bug. The reason this one is OK is because we do not use nlk->portid or try to get nlk from the hash table before we return to user-space. However, there is a real bug here that none of these acquire/release helpers discovered. The two bound tests here used to be a single one. Now that they are separate it is entirely possible for another thread to come in the middle and bind the socket. So we need to repeat the portid check in order to maintain consistency. > > @@ -1587,7 +1594,7 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr, > > !netlink_allowed(sock, NL_CFG_F_NONROOT_SEND)) > > return -EPERM; > > > > - if (!nlk->portid) > > + if (!nlk->bound) > > Don't we need load_acquire here too? Is this path holding a lock > which makes that unnecessary? Ditto. ---8<--- The commit 1f770c0 ("netlink: Fix autobind race condition that leads to zero port ID") created some new races that can occur due to inconcsistencies between the two port IDs. Tejun is right that a barrier is unavoidable. Therefore I am reverting to the original patch that used a boolean to indicate that a user netlink socket has been bound. Barriers have been added where necessary to ensure that a valid portid and the hashed socket is visible. I have also changed netlink_insert to only return EBUSY if the socket is bound to a portid different to the requested one. This combined with only reading nlk->bound once in netlink_bind fixes a race where two threads that bind the socket at the same time with different port IDs may both succeed. Fixes: 1f770c0 ("netlink: Fix autobind race condition that leads to zero port ID") Reported-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Nacked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 7bbe33f commit da314c9

File tree

2 files changed

+29
-12
lines changed

2 files changed

+29
-12
lines changed

net/netlink/af_netlink.c

Lines changed: 28 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1031,7 +1031,7 @@ static inline int netlink_compare(struct rhashtable_compare_arg *arg,
10311031
const struct netlink_compare_arg *x = arg->key;
10321032
const struct netlink_sock *nlk = ptr;
10331033

1034-
return nlk->rhash_portid != x->portid ||
1034+
return nlk->portid != x->portid ||
10351035
!net_eq(sock_net(&nlk->sk), read_pnet(&x->pnet));
10361036
}
10371037

@@ -1057,7 +1057,7 @@ static int __netlink_insert(struct netlink_table *table, struct sock *sk)
10571057
{
10581058
struct netlink_compare_arg arg;
10591059

1060-
netlink_compare_arg_init(&arg, sock_net(sk), nlk_sk(sk)->rhash_portid);
1060+
netlink_compare_arg_init(&arg, sock_net(sk), nlk_sk(sk)->portid);
10611061
return rhashtable_lookup_insert_key(&table->hash, &arg,
10621062
&nlk_sk(sk)->node,
10631063
netlink_rhashtable_params);
@@ -1110,16 +1110,16 @@ static int netlink_insert(struct sock *sk, u32 portid)
11101110

11111111
lock_sock(sk);
11121112

1113-
err = -EBUSY;
1114-
if (nlk_sk(sk)->portid)
1113+
err = nlk_sk(sk)->portid == portid ? 0 : -EBUSY;
1114+
if (nlk_sk(sk)->bound)
11151115
goto err;
11161116

11171117
err = -ENOMEM;
11181118
if (BITS_PER_LONG > 32 &&
11191119
unlikely(atomic_read(&table->hash.nelems) >= UINT_MAX))
11201120
goto err;
11211121

1122-
nlk_sk(sk)->rhash_portid = portid;
1122+
nlk_sk(sk)->portid = portid;
11231123
sock_hold(sk);
11241124

11251125
err = __netlink_insert(table, sk);
@@ -1135,7 +1135,9 @@ static int netlink_insert(struct sock *sk, u32 portid)
11351135
goto err;
11361136
}
11371137

1138-
nlk_sk(sk)->portid = portid;
1138+
/* We need to ensure that the socket is hashed and visible. */
1139+
smp_wmb();
1140+
nlk_sk(sk)->bound = portid;
11391141

11401142
err:
11411143
release_sock(sk);
@@ -1521,6 +1523,7 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
15211523
struct sockaddr_nl *nladdr = (struct sockaddr_nl *)addr;
15221524
int err;
15231525
long unsigned int groups = nladdr->nl_groups;
1526+
bool bound;
15241527

15251528
if (addr_len < sizeof(struct sockaddr_nl))
15261529
return -EINVAL;
@@ -1537,9 +1540,14 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
15371540
return err;
15381541
}
15391542

1540-
if (nlk->portid)
1543+
bound = nlk->bound;
1544+
if (bound) {
1545+
/* Ensure nlk->portid is up-to-date. */
1546+
smp_rmb();
1547+
15411548
if (nladdr->nl_pid != nlk->portid)
15421549
return -EINVAL;
1550+
}
15431551

15441552
if (nlk->netlink_bind && groups) {
15451553
int group;
@@ -1555,7 +1563,10 @@ static int netlink_bind(struct socket *sock, struct sockaddr *addr,
15551563
}
15561564
}
15571565

1558-
if (!nlk->portid) {
1566+
/* No need for barriers here as we return to user-space without
1567+
* using any of the bound attributes.
1568+
*/
1569+
if (!bound) {
15591570
err = nladdr->nl_pid ?
15601571
netlink_insert(sk, nladdr->nl_pid) :
15611572
netlink_autobind(sock);
@@ -1603,7 +1614,10 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
16031614
!netlink_allowed(sock, NL_CFG_F_NONROOT_SEND))
16041615
return -EPERM;
16051616

1606-
if (!nlk->portid)
1617+
/* No need for barriers here as we return to user-space without
1618+
* using any of the bound attributes.
1619+
*/
1620+
if (!nlk->bound)
16071621
err = netlink_autobind(sock);
16081622

16091623
if (err == 0) {
@@ -2444,10 +2458,13 @@ static int netlink_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
24442458
dst_group = nlk->dst_group;
24452459
}
24462460

2447-
if (!nlk->portid) {
2461+
if (!nlk->bound) {
24482462
err = netlink_autobind(sock);
24492463
if (err)
24502464
goto out;
2465+
} else {
2466+
/* Ensure nlk is hashed and visible. */
2467+
smp_rmb();
24512468
}
24522469

24532470
/* It's a really convoluted way for userland to ask for mmaped
@@ -3273,7 +3290,7 @@ static inline u32 netlink_hash(const void *data, u32 len, u32 seed)
32733290
const struct netlink_sock *nlk = data;
32743291
struct netlink_compare_arg arg;
32753292

3276-
netlink_compare_arg_init(&arg, sock_net(&nlk->sk), nlk->rhash_portid);
3293+
netlink_compare_arg_init(&arg, sock_net(&nlk->sk), nlk->portid);
32773294
return jhash2((u32 *)&arg, netlink_compare_arg_len / sizeof(u32), seed);
32783295
}
32793296

net/netlink/af_netlink.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,6 @@ struct netlink_ring {
2525
struct netlink_sock {
2626
/* struct sock has to be the first member of netlink_sock */
2727
struct sock sk;
28-
u32 rhash_portid;
2928
u32 portid;
3029
u32 dst_portid;
3130
u32 dst_group;
@@ -36,6 +35,7 @@ struct netlink_sock {
3635
unsigned long state;
3736
size_t max_recvmsg_len;
3837
wait_queue_head_t wait;
38+
bool bound;
3939
bool cb_running;
4040
struct netlink_callback cb;
4141
struct mutex *cb_mutex;

0 commit comments

Comments
 (0)