Commit fed07ef3 authored by David S. Miller

Merge tag 'mlx5-updates-2019-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux



Saeed Mahameed says:

====================
mlx5 tc flow handling for concurrent execution (Part 3)

This series includes updates to mlx5 ethernet and core driver:

Vlad submits part 3 of 3 part series to allow TC flow handling
for concurrent execution.

Vlad says:
==========

Structure mlx5e_neigh_hash_entry and the code that uses it are refactored
in the following ways:

- Extend neigh_hash_entry with rcu and modify its users to always take a
  reference to the structure when using it (neigh_hash_entry already had
  an atomic reference counter, which was only used when scheduling a neigh
  update on a workqueue from the atomic context of the neigh update
  netevent).

- Always use mlx5e_neigh_update_table->encap_lock when modifying the neigh
  update hash table and list. Originally, this lock was only used to
  synchronize with the netevent handler function, which is called from bh
  context and cannot use the rtnl lock for synchronization. Use the rcu
  read lock instead of encap_lock to look up nhe in the atomic context of
  the netevent event handler function. Convert encap_lock to a mutex to
  allow creating new neigh hash entries while holding it, which is safe to
  do because the lock is no longer used in atomic context.

- Rcu-ify mlx5e_neigh_hash_entry->encap_list by changing operations on
  encap list to their rcu counterparts and extending encap structure
  with rcu_head to free the encap instances after rcu grace period. This
  allows fast traversal of list of encaps attached to nhe under rcu read
  lock protection.

- Take encap_table_lock when accessing encap entries in neigh update and
  neigh stats update code to protect from concurrent encap entry
  insertion or removal.
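
The "always take a reference" rule in the first item can be illustrated in
userspace. This is a minimal sketch, not the driver's code: it assumes an
increment-unless-zero scheme (the kernel offers refcount_inc_not_zero() for
this purpose, but the message above does not say which helper the series
uses), and struct entry, entry_hold() and entry_put() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an entry that lookups may race with teardown. */
struct entry {
	atomic_int refcnt;	/* 0 means teardown has already started */
};

/* Take a reference only if the entry is still live (count > 0). */
static bool entry_hold(struct entry *e)
{
	int old = atomic_load(&e->refcnt);

	while (old > 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool entry_put(struct entry *e)
{
	int old = atomic_fetch_sub(&e->refcnt, 1);

	assert(old > 0);
	return old == 1;
}
```

A lookup that finds an entry but fails entry_hold() treats the entry as
already gone, which is what makes a lockless lookup safe against a
concurrent final put.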

The refactoring described above leads to a potential race condition in
which neigh update and neigh stats update code can access encap and flow
entries that are not fully initialized or are being destroyed, or in which
neigh can change state without updating encaps that are created
concurrently. Prevent these issues with the following changes in flow and
encap initialization:

- Extend mlx5e_tc_flow with 'init_done' completion. Modify neigh update
  to wait for both encap and flow completions to prevent concurrent
  access to a structure that is being initialized by tc.

- Skip structures that failed during initialization: encaps with
  encap_id<0 and flows that don't have OFFLOADED flag set.

- To ensure that no new flows are added to encap when it is being
  accessed by neigh update or neigh stats update, take encap_table_lock
  mutex.

- To prevent concurrent deletion by tc, ensure that neigh update and
  neigh stats update hold references to encap and flow instances while
  using them.
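
The "skip structures that failed during initialization" rule from the list
above can be sketched as a plain filter. This is an illustration under
stated assumptions only: all demo_* names and the flag value are
hypothetical, and the wait on the 'init_done' completion and the locking
are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FLOW_OFFLOADED (1u << 0)	/* hypothetical flag bit */

struct demo_flow { unsigned int flags; };
struct demo_encap { int encap_id; };	/* negative id: initialization failed */

static bool demo_encap_usable(const struct demo_encap *e)
{
	return e->encap_id >= 0;
}

static bool demo_flow_usable(const struct demo_flow *f)
{
	return (f->flags & DEMO_FLOW_OFFLOADED) != 0;
}

/* Count the flows an update pass would touch for one encap entry. */
static size_t demo_count_updatable(const struct demo_encap *e,
				   const struct demo_flow *flows, size_t n)
{
	size_t i, count = 0;

	assert(e != NULL && (flows != NULL || n == 0));
	if (!demo_encap_usable(e))
		return 0;	/* failed encap: skip all of its flows */
	for (i = 0; i < n; i++)
		if (demo_flow_usable(&flows[i]))
			count++;
	return count;
}
```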

With the changes presented in this patch set it is now safe to execute tc
concurrently with neigh update and neigh stats update. However, these
two workqueue tasks both modify the same flow "tmp_list" field, which
stores flows with a reference taken in a temporary list so that the
references can be released after the update operation finishes. They
therefore should not be executed concurrently with each other.

Last 3 patches of this series provide 3 new mlx5 trace points to track
mlx5 tc requests and mlx5 neigh updates.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
parents 2b9b5e74 5970882a
+46 −0
@@ -12,6 +12,7 @@ Contents
- `Enabling the driver and kconfig options`_
- `Devlink info`_
- `Devlink health reporters`_
- `mlx5 tracepoints`_

Enabling the driver and kconfig options
================================================
@@ -219,3 +220,48 @@ User commands examples:
    $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal

NOTE: This command can run only on PF.

mlx5 tracepoints
================

The mlx5 driver provides internal tracepoints for tracking and debugging using
the kernel tracepoint interfaces (refer to Documentation/trace/ftrace.rst).

For the list of supported mlx5 events check /sys/kernel/debug/tracing/events/mlx5/

tc and eswitch offloads tracepoints:

- mlx5e_configure_flower: trace flower filter actions and cookies offloaded to mlx5::

    $ echo mlx5:mlx5e_configure_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6535  [019] ...1  2672.404466: mlx5e_configure_flower: cookie=0000000067874a55 actions= REDIRECT

- mlx5e_delete_flower: trace flower filter actions and cookies deleted from mlx5::

    $ echo mlx5:mlx5e_delete_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6569  [010] .N.1  2686.379075: mlx5e_delete_flower: cookie=0000000067874a55 actions= NULL

- mlx5e_stats_flower: trace flower stats request::

    $ echo mlx5:mlx5e_stats_flower >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    tc-6546  [010] ...1  2679.704889: mlx5e_stats_flower: cookie=0000000060eb3d6a bytes=0 packets=0 lastused=4295560217

- mlx5e_tc_update_neigh_used_value: trace tunnel rule neigh update value offloaded to mlx5::

    $ echo mlx5:mlx5e_tc_update_neigh_used_value >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    kworker/u48:4-8806  [009] ...1 55117.882428: mlx5e_tc_update_neigh_used_value: netdev: ens1f0 IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_used=1

- mlx5e_rep_neigh_update: trace neigh update tasks scheduled due to neigh state change events::

    $ echo mlx5:mlx5e_rep_neigh_update >> /sys/kernel/debug/tracing/set_event
    $ cat /sys/kernel/debug/tracing/trace
    ...
    kworker/u48:7-2221  [009] ...1  1475.387435: mlx5e_rep_neigh_update: netdev: ens1f0 MAC: 24:8a:07:9a:17:9a IPv4: 1.1.1.10 IPv6: ::ffff:1.1.1.10 neigh_connected=1
+1 −1
@@ -35,7 +35,7 @@ mlx5_core-$(CONFIG_MLX5_EN_RXNFC) += en_fs_ethtool.o
mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o en/port_buffer.o
mlx5_core-$(CONFIG_MLX5_ESWITCH)     += en_rep.o en_tc.o en/tc_tun.o lib/port_tun.o lag_mp.o \
					lib/geneve.o en/tc_tun_vxlan.o en/tc_tun_gre.o \
-					en/tc_tun_geneve.o
+					en/tc_tun_geneve.o diag/en_tc_tracepoint.o

#
# Core extra
+54 −0
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2019 Mellanox Technologies. */

#undef TRACE_SYSTEM
#define TRACE_SYSTEM mlx5

#if !defined(_MLX5_EN_REP_TP_) || defined(TRACE_HEADER_MULTI_READ)
#define _MLX5_EN_REP_TP_

#include <linux/tracepoint.h>
#include <linux/trace_seq.h>
#include "en_rep.h"

TRACE_EVENT(mlx5e_rep_neigh_update,
	    TP_PROTO(const struct mlx5e_neigh_hash_entry *nhe, const u8 *ha,
		     bool neigh_connected),
	    TP_ARGS(nhe, ha, neigh_connected),
	    TP_STRUCT__entry(__string(devname, nhe->m_neigh.dev->name)
			     __array(u8, ha, ETH_ALEN)
			     __array(u8, v4, 4)
			     __array(u8, v6, 16)
			     __field(bool, neigh_connected)
			     ),
	    TP_fast_assign(const struct mlx5e_neigh *mn = &nhe->m_neigh;
			struct in6_addr *pin6;
			__be32 *p32;

			__assign_str(devname, mn->dev->name);
			__entry->neigh_connected = neigh_connected;
			memcpy(__entry->ha, ha, ETH_ALEN);

			p32 = (__be32 *)__entry->v4;
			pin6 = (struct in6_addr *)__entry->v6;
			if (mn->family == AF_INET) {
				*p32 = mn->dst_ip.v4;
				ipv6_addr_set_v4mapped(*p32, pin6);
			} else if (mn->family == AF_INET6) {
				*pin6 = mn->dst_ip.v6;
			}
			),
	    TP_printk("netdev: %s MAC: %pM IPv4: %pI4 IPv6: %pI6c neigh_connected=%d\n",
		      __get_str(devname), __entry->ha,
		      __entry->v4, __entry->v6, __entry->neigh_connected
		      )
);

#endif /* _MLX5_EN_REP_TP_ */

/* This part must be outside protection */
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH ./diag
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE en_rep_tracepoint
#include <trace/define_trace.h>
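
The AF_INET branch of TP_fast_assign above stores the IPv4 address and also
derives its IPv4-mapped IPv6 form, which is why the sample trace lines show
both 1.1.1.10 and ::ffff:1.1.1.10. The mapping can be reproduced in
userspace. This is a minimal sketch using the standard socket helpers
instead of the kernel's ipv6_addr_set_v4mapped(); v4_to_v4mapped() and
format_v4mapped() are hypothetical names.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Build ::ffff:a.b.c.d from an IPv4 address in network byte order. */
static void v4_to_v4mapped(const struct in_addr *v4, struct in6_addr *v6)
{
	assert(v4 != NULL && v6 != NULL);
	memset(v6, 0, sizeof(*v6));
	v6->s6_addr[10] = 0xff;
	v6->s6_addr[11] = 0xff;
	memcpy(&v6->s6_addr[12], &v4->s_addr, sizeof(v4->s_addr));
}

/* Format the mapped form of a dotted-quad string; returns NULL on error. */
static const char *format_v4mapped(const char *dotted, char *buf,
				   socklen_t len)
{
	struct in_addr v4;
	struct in6_addr v6;

	if (inet_pton(AF_INET, dotted, &v4) != 1)
		return NULL;
	v4_to_v4mapped(&v4, &v6);
	return inet_ntop(AF_INET6, &v6, buf, len);
}
```

Calling format_v4mapped("1.1.1.10", buf, sizeof(buf)) with a buffer of
INET6_ADDRSTRLEN bytes should yield the same mapped string as in the sample
trace output.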
+58 −0
// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
/* Copyright (c) 2019 Mellanox Technologies. */

#define CREATE_TRACE_POINTS
#include "en_tc_tracepoint.h"

void put_ids_to_array(int *ids,
		      const struct flow_action_entry *entries,
		      unsigned int num)
{
	unsigned int i;

	for (i = 0; i < num; i++)
		ids[i] = entries[i].id;
}

#define NAME_SIZE 16

static const char FLOWACT2STR[NUM_FLOW_ACTIONS][NAME_SIZE] = {
	[FLOW_ACTION_ACCEPT]	= "ACCEPT",
	[FLOW_ACTION_DROP]	= "DROP",
	[FLOW_ACTION_TRAP]	= "TRAP",
	[FLOW_ACTION_GOTO]	= "GOTO",
	[FLOW_ACTION_REDIRECT]	= "REDIRECT",
	[FLOW_ACTION_MIRRED]	= "MIRRED",
	[FLOW_ACTION_VLAN_PUSH]	= "VLAN_PUSH",
	[FLOW_ACTION_VLAN_POP]	= "VLAN_POP",
	[FLOW_ACTION_VLAN_MANGLE]	= "VLAN_MANGLE",
	[FLOW_ACTION_TUNNEL_ENCAP]	= "TUNNEL_ENCAP",
	[FLOW_ACTION_TUNNEL_DECAP]	= "TUNNEL_DECAP",
	[FLOW_ACTION_MANGLE]	= "MANGLE",
	[FLOW_ACTION_ADD]	= "ADD",
	[FLOW_ACTION_CSUM]	= "CSUM",
	[FLOW_ACTION_MARK]	= "MARK",
	[FLOW_ACTION_WAKE]	= "WAKE",
	[FLOW_ACTION_QUEUE]	= "QUEUE",
	[FLOW_ACTION_SAMPLE]	= "SAMPLE",
	[FLOW_ACTION_POLICE]	= "POLICE",
	[FLOW_ACTION_CT]	= "CT",
};

const char *parse_action(struct trace_seq *p,
			 int *ids,
			 unsigned int num)
{
	const char *ret = trace_seq_buffer_ptr(p);
	unsigned int i;

	for (i = 0; i < num; i++) {
		if (ids[i] < NUM_FLOW_ACTIONS)
			trace_seq_printf(p, "%s ", FLOWACT2STR[ids[i]]);
		else
			trace_seq_printf(p, "UNKNOWN ");
	}

	trace_seq_putc(p, 0);
	return ret;
}
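
The id-to-name translation done by parse_action() above can be tried outside
the kernel. This is a userspace sketch, not the driver's code: the enum is a
shortened, hypothetical stand-in for the kernel's action ids, snprintf()
replaces trace_seq_printf(), and the sketch additionally rejects negative
ids and empty table slots. Whether such values can reach the tracepoint in
practice is not established by the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, shortened action enum; not the kernel's flow_action_id. */
enum demo_action {
	DEMO_ACCEPT,
	DEMO_DROP,
	DEMO_REDIRECT,
	NUM_DEMO_ACTIONS,
};

static const char *const demo_names[NUM_DEMO_ACTIONS] = {
	[DEMO_ACCEPT]	= "ACCEPT",
	[DEMO_DROP]	= "DROP",
	[DEMO_REDIRECT]	= "REDIRECT",
};

/*
 * Append "NAME " for each id to out; ids without a name become "UNKNOWN ".
 * Returns the number of bytes written, or a value >= len on truncation.
 */
static size_t format_actions(char *out, size_t len, const int *ids, size_t num)
{
	size_t used = 0;
	size_t i;

	assert(out != NULL && len > 0);
	out[0] = '\0';
	for (i = 0; i < num && used < len; i++) {
		const char *name = "UNKNOWN";
		int n;

		/* Guard both ends of the range before indexing the table. */
		if (ids[i] >= 0 && ids[i] < NUM_DEMO_ACTIONS &&
		    demo_names[ids[i]])
			name = demo_names[ids[i]];
		n = snprintf(out + used, len - used, "%s ", name);
		if (n < 0)
			break;
		used += (size_t)n;
	}
	return used;
}
```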
+114 −0
/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
/* Copyright (c) 2019 Mellanox Technologies. */

#undef TRACE_SYSTEM
#define TRACE_SYSTEM mlx5

#if !defined(_MLX5_TC_TP_) || defined(TRACE_HEADER_MULTI_READ)
#define _MLX5_TC_TP_

#include <linux/tracepoint.h>
#include <linux/trace_seq.h>
#include <net/flow_offload.h>
#include "en_rep.h"

#define __parse_action(ids, num) parse_action(p, ids, num)

void put_ids_to_array(int *ids,
		      const struct flow_action_entry *entries,
		      unsigned int num);

const char *parse_action(struct trace_seq *p,
			 int *ids,
			 unsigned int num);

DECLARE_EVENT_CLASS(mlx5e_flower_template,
		    TP_PROTO(const struct flow_cls_offload *f),
		    TP_ARGS(f),
		    TP_STRUCT__entry(__field(void *, cookie)
				     __field(unsigned int, num)
				     __dynamic_array(int, ids, f->rule ?
					     f->rule->action.num_entries : 0)
				     ),
		    TP_fast_assign(__entry->cookie = (void *)f->cookie;
			__entry->num = (f->rule ?
				f->rule->action.num_entries : 0);
			if (__entry->num)
				put_ids_to_array(__get_dynamic_array(ids),
						 f->rule->action.entries,
						 f->rule->action.num_entries);
			),
		    TP_printk("cookie=%p actions= %s\n",
			      __entry->cookie, __entry->num ?
				      __parse_action(__get_dynamic_array(ids),
						     __entry->num) : "NULL"
			      )
);

DEFINE_EVENT(mlx5e_flower_template, mlx5e_configure_flower,
	     TP_PROTO(const struct flow_cls_offload *f),
	     TP_ARGS(f)
	     );

DEFINE_EVENT(mlx5e_flower_template, mlx5e_delete_flower,
	     TP_PROTO(const struct flow_cls_offload *f),
	     TP_ARGS(f)
	     );

TRACE_EVENT(mlx5e_stats_flower,
	    TP_PROTO(const struct flow_cls_offload *f),
	    TP_ARGS(f),
	    TP_STRUCT__entry(__field(void *, cookie)
			     __field(u64, bytes)
			     __field(u64, packets)
			     __field(u64, lastused)
			     ),
	    TP_fast_assign(__entry->cookie = (void *)f->cookie;
		__entry->bytes = f->stats.bytes;
		__entry->packets = f->stats.pkts;
		__entry->lastused = f->stats.lastused;
		),
	    TP_printk("cookie=%p bytes=%llu packets=%llu lastused=%llu\n",
		      __entry->cookie, __entry->bytes,
		      __entry->packets, __entry->lastused
		      )
);

TRACE_EVENT(mlx5e_tc_update_neigh_used_value,
	    TP_PROTO(const struct mlx5e_neigh_hash_entry *nhe, bool neigh_used),
	    TP_ARGS(nhe, neigh_used),
	    TP_STRUCT__entry(__string(devname, nhe->m_neigh.dev->name)
			     __array(u8, v4, 4)
			     __array(u8, v6, 16)
			     __field(bool, neigh_used)
			     ),
	    TP_fast_assign(const struct mlx5e_neigh *mn = &nhe->m_neigh;
			struct in6_addr *pin6;
			__be32 *p32;

			__assign_str(devname, mn->dev->name);
			__entry->neigh_used = neigh_used;

			p32 = (__be32 *)__entry->v4;
			pin6 = (struct in6_addr *)__entry->v6;
			if (mn->family == AF_INET) {
				*p32 = mn->dst_ip.v4;
				ipv6_addr_set_v4mapped(*p32, pin6);
			} else if (mn->family == AF_INET6) {
				*pin6 = mn->dst_ip.v6;
			}
			),
	    TP_printk("netdev: %s IPv4: %pI4 IPv6: %pI6c neigh_used=%d\n",
		      __get_str(devname), __entry->v4, __entry->v6,
		      __entry->neigh_used
		      )
);

#endif /* _MLX5_TC_TP_ */

/* This part must be outside protection */
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH ./diag
#undef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE en_tc_tracepoint
#include <trace/define_trace.h>