Commit 68d4fd30 authored by David S. Miller's avatar David S. Miller
Browse files

Merge branch 'net-bridge-mcast-IGMPv3-MLDv2-fast-path-part-2'



Nikolay Aleksandrov says:

====================
net: bridge: mcast: IGMPv3/MLDv2 fast-path (part 2)

This is the second part of the IGMPv3/MLDv2 support which adds support
for the fast-path. In order to be able to handle source entries we add
mdb support for S,G entries (i.e. we add source address support to
br_ip), that requires to extend the current mdb netlink API, fortunately
we just add another attribute which will contain nested future mdb
attributes, then we use it to add support for S,G user- add, del and
dump. The lookup sequence is simple: when IGMPv3/MLDv2 are enabled do
the S,G lookup first and if it fails fallback to *,G. The more complex
part is when we begin handling source lists and auto-installing S,G entries
and *,G filter mode transitions. We have the following cases:
 1) *,G INCLUDE -> EXCLUDE transition: we need to install the port in
    all of *,G's installed S,G entries for proper replication (except
    the ones explicitly blocked), this is also necessary when adding a
    new *,G EXCLUDE port group

 2) *,G EXCLUDE -> INCLUDE transition: we need to remove the port from
    all of *,G's installed S,G entries, this is also necessary when
    removing a *,G port group

 3) New S,G port entry: we need to install all current *,G EXCLUDE ports

 4) Remove S,G port entry: if all other port groups were auto-installed we
    can safely remove them and delete the whole S,G entry

Currently we compute these operations from the available ports, their
source lists and their filter mode. In the future we can extend the port
group structure and reduce the running time of these ops. Also one
current limitation is that host-joined S,G entries are not supported.
I.e. one cannot add "dev bridge port bridge" mdb S,G entries. The host
join is currently considered an EXCLUDE {} join, so it's reflected in
all of *,G's installed S,G entries. If an S,G,port entry is added as
temporary then the kernel can take it over if a source shows up from a
report, permanent entries are skipped. In order to properly handle
blocked sources we add a new port group blocked flag to avoid forwarding
to that port group in the S,G. Finally when forwarding we use the port
group filter mode (if it's INCLUDE and the port group is from a *,G then
don't replicate to it, respectively if it's EXCLUDE then forward) and the
blocked flag (obviously if it's set - skip that port unless it's a
router port) to decide if the port should be skipped. Another limitation
is that we can't do some of the above transitions without small traffic
drop while installing/removing entries. That will be taken care of when
we add atomic swap of port replication lists later.

Patch break down:
 patches 1-3: prepare the mdb code for better extack support which is
              used in future patches to return a more meaningful error
 patches 4-6: add the source address field to struct br_ip, and do minor
              cleanups around it
 patches 7-8: extend the mdb netlink API so we can send new mdb
              attributes and uses the new API for S,G entry add/del/dump
              support
 patch     9: takes care of S,G entries when doing a lookup (first S,G
              then *,G lookup)
 patch    10: adds a new port group field and attribute for origin protocol
              we use the already available RTPROT_ definitions,
              currently user-space entries are added as RTPROT_STATIC and
              kernel entries are added as RTPROT_KERNEL, we may allow
              user-space to set custom values later (e.g. for FRR, clag)
 patch    11: adds an internal S,G,port rhashtable to speed up filter
              mode transitions
 patch    12: initial automatic install of S,G entries based on port
              groups' source lists
 patch    13: handles port group modes on transitions or when new
              port group entries are added
 patch    14: self-explanatory - adds support for blocked port group
              entries needed to stop forwarding to particular S,G,port
              entries
 patch    15: handles host-join/leave state changes, treats host-joins
              as EXCLUDE {} groups (reflected in all *,G's S,G entries)
 patch    16: finally adds the fast-path filter mode and block flag
              support

Here're the sets that will come next (in order):
 - iproute2 support for IGMPv3/MLDv2
 - selftests for all mode transitions and group flags
 - explicit host tracking for proper fast-leave support
 - atomic port replication lists (these are also needed for broadcast
   forwarding optimizations)
 - mode transition optimization and removal of open-coded sorted lists

Not implemented yet:
 - Host IGMPv3/MLDv2 filter support (currently we handle only join/leave
   as before)
 - Proper other querier source timer and value updates
 - IGMPv3/v2 MLDv2/v1 compat (I have a few rough patches for this one)

v2: fix build with CONFIG_BATMAN_ADV_MCAST in patch 6
====================

Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
parents 35c52c5c 36cfec73
Loading
Loading
Loading
Loading
+7 −1
Original line number Diff line number Diff line
@@ -19,7 +19,13 @@ struct br_ip {
#if IS_ENABLED(CONFIG_IPV6)
		struct in6_addr ip6;
#endif
	} u;
	} src;
	union {
		__be32	ip4;
#if IS_ENABLED(CONFIG_IPV6)
		struct in6_addr ip6;
#endif
	} dst;
	__be16		proto;
	__u16           vid;
};
+17 −0
Original line number Diff line number Diff line
@@ -457,6 +457,8 @@ enum {
	MDBA_MDB_EATTR_TIMER,
	MDBA_MDB_EATTR_SRC_LIST,
	MDBA_MDB_EATTR_GROUP_MODE,
	MDBA_MDB_EATTR_SOURCE,
	MDBA_MDB_EATTR_RTPROT,
	__MDBA_MDB_EATTR_MAX
};
#define MDBA_MDB_EATTR_MAX (__MDBA_MDB_EATTR_MAX - 1)
@@ -516,6 +518,8 @@ struct br_mdb_entry {
	__u8 state;
#define MDB_FLAGS_OFFLOAD	(1 << 0)
#define MDB_FLAGS_FAST_LEAVE	(1 << 1)
#define MDB_FLAGS_STAR_EXCL	(1 << 2)
#define MDB_FLAGS_BLOCKED	(1 << 3)
	__u8 flags;
	__u16 vid;
	struct {
@@ -530,10 +534,23 @@ struct br_mdb_entry {
enum {
	MDBA_SET_ENTRY_UNSPEC,
	MDBA_SET_ENTRY,
	MDBA_SET_ENTRY_ATTRS,
	__MDBA_SET_ENTRY_MAX,
};
#define MDBA_SET_ENTRY_MAX (__MDBA_SET_ENTRY_MAX - 1)

/* [MDBA_SET_ENTRY_ATTRS] = {
 *    [MDBE_ATTR_xxx]
 *    ...
 * }
 */
enum {
	MDBE_ATTR_UNSPEC,
	MDBE_ATTR_SOURCE,
	__MDBE_ATTR_MAX,
};
#define MDBE_ATTR_MAX (__MDBE_ATTR_MAX - 1)

/* Embedded inside LINK_XSTATS_TYPE_BRIDGE */
enum {
	BRIDGE_XSTATS_UNSPEC,
+7 −7
Original line number Diff line number Diff line
@@ -221,7 +221,7 @@ static u8 batadv_mcast_mla_rtr_flags_bridge_get(struct batadv_priv *bat_priv,
		 * address here, only IPv6 ones
		 */
		if (br_ip_entry->addr.proto == htons(ETH_P_IPV6) &&
		    ipv6_addr_is_ll_all_routers(&br_ip_entry->addr.u.ip6))
		    ipv6_addr_is_ll_all_routers(&br_ip_entry->addr.dst.ip6))
			flags &= ~BATADV_MCAST_WANT_NO_RTR6;

		list_del(&br_ip_entry->list);
@@ -562,10 +562,10 @@ out:
static void batadv_mcast_mla_br_addr_cpy(char *dst, const struct br_ip *src)
{
	if (src->proto == htons(ETH_P_IP))
		ip_eth_mc_map(src->u.ip4, dst);
		ip_eth_mc_map(src->dst.ip4, dst);
#if IS_ENABLED(CONFIG_IPV6)
	else if (src->proto == htons(ETH_P_IPV6))
		ipv6_eth_mc_map(&src->u.ip6, dst);
		ipv6_eth_mc_map(&src->dst.ip6, dst);
#endif
	else
		eth_zero_addr(dst);
@@ -609,11 +609,11 @@ static int batadv_mcast_mla_bridge_get(struct net_device *dev,
				continue;

			if (tvlv_flags & BATADV_MCAST_WANT_ALL_UNSNOOPABLES &&
			    ipv4_is_local_multicast(br_ip_entry->addr.u.ip4))
			    ipv4_is_local_multicast(br_ip_entry->addr.dst.ip4))
				continue;

			if (!(tvlv_flags & BATADV_MCAST_WANT_NO_RTR4) &&
			    !ipv4_is_local_multicast(br_ip_entry->addr.u.ip4))
			    !ipv4_is_local_multicast(br_ip_entry->addr.dst.ip4))
				continue;
		}

@@ -623,11 +623,11 @@ static int batadv_mcast_mla_bridge_get(struct net_device *dev,
				continue;

			if (tvlv_flags & BATADV_MCAST_WANT_ALL_UNSNOOPABLES &&
			    ipv6_addr_is_ll_all_nodes(&br_ip_entry->addr.u.ip6))
			    ipv6_addr_is_ll_all_nodes(&br_ip_entry->addr.dst.ip6))
				continue;

			if (!(tvlv_flags & BATADV_MCAST_WANT_NO_RTR6) &&
			    IPV6_ADDR_MC_SCOPE(&br_ip_entry->addr.u.ip6) >
			    IPV6_ADDR_MC_SCOPE(&br_ip_entry->addr.dst.ip6) >
			    IPV6_ADDR_SCOPE_LINKLOCAL)
				continue;
		}
+15 −2
Original line number Diff line number Diff line
@@ -274,14 +274,23 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
	struct net_bridge *br = netdev_priv(dev);
	struct net_bridge_port *prev = NULL;
	struct net_bridge_port_group *p;
	bool allow_mode_include = true;
	struct hlist_node *rp;

	rp = rcu_dereference(hlist_first_rcu(&br->router_list));
	p = mdst ? rcu_dereference(mdst->ports) : NULL;
	if (mdst) {
		p = rcu_dereference(mdst->ports);
		if (br_multicast_should_handle_mode(br, mdst->addr.proto) &&
		    br_multicast_is_star_g(&mdst->addr))
			allow_mode_include = false;
	} else {
		p = NULL;
	}

	while (p || rp) {
		struct net_bridge_port *port, *lport, *rport;

		lport = p ? p->port : NULL;
		lport = p ? p->key.port : NULL;
		rport = hlist_entry_safe(rp, struct net_bridge_port, rlist);

		if ((unsigned long)lport > (unsigned long)rport) {
@@ -292,6 +301,10 @@ void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
						   local_orig);
				goto delivered;
			}
			if ((!allow_mode_include &&
			     p->filter_mode == MCAST_INCLUDE) ||
			    (p->flags & MDB_PG_FLAGS_BLOCKED))
				goto delivered;
		} else {
			port = rport;
		}
+272 −99

File changed.

Preview size limit exceeded, changes collapsed.

Loading