AF_SMC - Sockets for SMC communication
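#include <sys/socket.h>
#define AF_SMC 43
#define SMCPROTO_SMC 0   /* AF_INET compatible socket semantics */
#define SMCPROTO_SMC6 1  /* AF_INET6 compatible socket semantics */
tcp_sockfd = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC);
tcp_sockfd = socket(AF_SMC, SOCK_STREAM, SMCPROTO_SMC6);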
Shared Memory Communication via RDMA (SMC) is a socket over the RDMA communication protocol that allows existing TCP socket applications to transparently benefit from RDMA when exchanging data over an RDMA over Converged Ethernet (RoCE) network. These networks are not routable. SMC provides host-to-host direct memory access without the processing overhead of the traditional TCP/IP stack. SMC preserves the existing IP topology and IP security, and requires minimal administrative and operational changes. Using SMC is transparent to TCP socket applications.
The address family AF_SMC supports the SMC protocol on Linux. It keeps the address format of AF_INET and AF_INET6 sockets and supports only stream sockets (SOCK_STREAM).
Two usage modes are possible:
|AF_SMC native usage||uses the socket domain AF_SMC instead of AF_INET and AF_INET6. Specify SMCPROTO_SMC for AF_INET compatible socket semantics, and SMCPROTO_SMC6 for AF_INET6 compatible socket semantics.|
|Usage of AF_INET socket applications with SMC preload library||converts AF_INET and AF_INET6 sockets to AF_SMC sockets. The SMC preload library is part of the SMC tools package.|
SMC socket capabilities are negotiated at connection setup. If one peer is not SMC capable, further socket processing falls back to TCP automatically.
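The native usage mode can be sketched in C as follows. This is a minimal example under stated assumptions, not part of any SMC API: the helpers smc_proto_for_family, open_smc_or_tcp, and bind_loopback_ephemeral are hypothetical. The sketch picks SMCPROTO_SMC or SMCPROTO_SMC6 from the address family, falls back to a plain TCP socket when socket(AF_SMC, ...) fails (for example with EAFNOSUPPORT on a kernel without SMC support), and binds with an ordinary struct sockaddr_in, because AF_SMC keeps the AF_INET address format. This socket-creation fallback is separate from the automatic fallback to TCP during connection setup, which needs no application code.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Values from the SYNOPSIS; guarded in case the C library already defines them. */
#ifndef AF_SMC
#define AF_SMC 43
#endif
#ifndef SMCPROTO_SMC
#define SMCPROTO_SMC 0
#endif
#ifndef SMCPROTO_SMC6
#define SMCPROTO_SMC6 1
#endif

/* Hypothetical helper: SMC protocol matching an IP address family, or -1. */
static int smc_proto_for_family(int family)
{
    switch (family) {
    case AF_INET:  return SMCPROTO_SMC;
    case AF_INET6: return SMCPROTO_SMC6;
    default:       return -1;
    }
}

/* Hypothetical helper: open a stream socket for FAMILY, preferring AF_SMC.
 * If the AF_SMC socket cannot be created (e.g. EAFNOSUPPORT), fall back to a
 * plain TCP socket. *used_smc reports which kind was created. */
static int open_smc_or_tcp(int family, int *used_smc)
{
    int proto = smc_proto_for_family(family);
    if (proto < 0) {
        errno = EAFNOSUPPORT;
        return -1;
    }
    int fd = socket(AF_SMC, SOCK_STREAM, proto);
    if (fd >= 0) {
        *used_smc = 1;
        return fd;
    }
    *used_smc = 0;
    return socket(family, SOCK_STREAM, 0);
}

/* Bind to 127.0.0.1 on an ephemeral port with an ordinary sockaddr_in:
 * the address format is the AF_INET one. Returns the bound port or -1. */
static int bind_loopback_ephemeral(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;             /* AF_INET, not AF_SMC */
    addr.sin_port = htons(0);              /* let the kernel pick a port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        return -1;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

After this, the descriptor is used with connect() or listen() exactly as an AF_INET socket would be.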
Implementation details: Links and Link Groups
To run RDMA traffic to a peer, a so-called link is established between a local RoCE card and a remote RoCE card. To enhance availability, you can configure alternate links with automatic failover. The primary and backup links to a given peer are combined in a so-called link group.
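The relationship between links and link groups can be illustrated with a small conceptual model. The structures and the function lg_select_link below are hypothetical and do not reflect the kernel's internal SMC data structures. The model assumes one primary and one backup link and shows only the failover decision: traffic stays on the current link while it is active and moves to the next active link in the group otherwise.

```c
#include <stddef.h>

/* Conceptual model only: hypothetical types, not kernel data structures. */
#define LG_MAX_LINKS 2  /* assumption: one primary plus one backup link */

struct smc_link_model {
    const char *local_roce;   /* local RoCE card (illustrative name) */
    const char *remote_roce;  /* peer RoCE card (illustrative name) */
    int active;               /* 1 = usable, 0 = failed */
};

struct smc_link_group_model {
    const char *peer;                           /* peer host */
    struct smc_link_model links[LG_MAX_LINKS];  /* [0] primary, [1] backup */
    size_t nlinks;                              /* links in use, <= LG_MAX_LINKS */
    size_t current;                             /* index of the link carrying traffic */
};

/* Return the link that carries traffic, failing over to the next active
 * link in the group if the current one is down. NULL if none is usable. */
static struct smc_link_model *lg_select_link(struct smc_link_group_model *lg)
{
    for (size_t i = 0; i < lg->nlinks; i++) {
        size_t idx = (lg->current + i) % lg->nlinks;
        if (lg->links[idx].active) {
            lg->current = idx;
            return &lg->links[idx];
        }
    }
    return NULL;
}
```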
RoCE adapter mapping: Creation of a pnet table
The SMC protocol requires grouping multiple physical networks (standard Ethernet and RoCE networks). Such groups are called Physical Networks (PNets). For SMC, RoCE adapter mapping is configured in a table called the pnet table. Any available Ethernet interface can be combined with available RDMA-capable network interface cards (RNICs) if they belong to the same Converged Ethernet fabric. To configure RoCE adapter mapping, you must create a pnet table. Modify the table with the smc-tools command smc_pnet.
For details see smc_pnet(8).
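To illustrate the idea behind the pnet table, the following conceptual sketch assigns Ethernet interfaces and RNICs to PNet IDs and looks up an RNIC for a given Ethernet interface. The types, the function pnet_find_rnic, and the device names used in the test are hypothetical; the real table is maintained by the kernel and edited with smc_pnet, not with code like this.

```c
#include <stddef.h>
#include <string.h>

/* Conceptual model of a pnet table: each entry assigns one device
 * (Ethernet interface or RNIC) to a PNet ID. Hypothetical types. */
enum dev_kind { DEV_ETH, DEV_RNIC };

struct pnet_entry {
    const char *pnet_id;  /* name of the Physical Network */
    const char *dev;      /* device name (illustrative) */
    enum dev_kind kind;
};

/* Find an RNIC in the same PNet as ETH_IF. Returns its name, or NULL if the
 * interface is in no PNet or its PNet contains no RNIC (in this model, no
 * RoCE adapter is then available for that interface). */
static const char *pnet_find_rnic(const struct pnet_entry *tab, size_t n,
                                  const char *eth_if)
{
    const char *pnet = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].kind == DEV_ETH && strcmp(tab[i].dev, eth_if) == 0) {
            pnet = tab[i].pnet_id;
            break;
        }
    }
    if (pnet == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        if (tab[i].kind == DEV_RNIC && strcmp(tab[i].pnet_id, pnet) == 0)
            return tab[i].dev;
    }
    return NULL;
}
```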
Displaying SMC socket state information
SMC socket state information can be obtained with the smc-tools command smcss. For details see smcss(8).
Starting a TCP application to work with SMC
To use an existing TCP application to work with SMC, use the SMC preload library. The SMC Tools package provides the command smc_run to convert AF_INET and AF_INET6 socket calls to AF_SMC socket calls by means of the preload technique. For more information about the preload mechanism, see also ld.so(8).
This command-line example starts an FTP client over SMC:
smc_run ftp
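The preload technique behind smc_run can be sketched as a small interposition library. This is an illustrative sketch, not the smc-tools implementation. The function smc_rewrite_args is a hypothetical helper that rewrites only TCP stream sockets over AF_INET or AF_INET6 to AF_SMC. The optional socket() wrapper, compiled only when the hypothetical macro SMC_PRELOAD_SHIM is defined, uses dlsym(RTLD_NEXT, "socket") to reach the C library's socket() and retries with the original arguments if the kernel rejects AF_SMC with EAFNOSUPPORT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for RTLD_NEXT in the optional shim below */
#endif
#include <netinet/in.h>
#include <sys/socket.h>

#ifndef AF_SMC
#define AF_SMC 43
#endif
#ifndef SMCPROTO_SMC
#define SMCPROTO_SMC 0
#endif
#ifndef SMCPROTO_SMC6
#define SMCPROTO_SMC6 1
#endif

/* Hypothetical helper: return the domain a preload shim would pass on and
 * adjust *protocol. Only TCP stream sockets over AF_INET/AF_INET6 change. */
static int smc_rewrite_args(int domain, int type, int *protocol)
{
    int base_type = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);

    if (base_type != SOCK_STREAM)
        return domain;
    if (*protocol != 0 && *protocol != IPPROTO_TCP)
        return domain;
    if (domain == AF_INET) {
        *protocol = SMCPROTO_SMC;
        return AF_SMC;
    }
    if (domain == AF_INET6) {
        *protocol = SMCPROTO_SMC6;
        return AF_SMC;
    }
    return domain;
}

#ifdef SMC_PRELOAD_SHIM
/* Build as a shared object, e.g.
 *   cc -shared -fPIC -DSMC_PRELOAD_SHIM shim.c -o shim.so -ldl
 * and start a program with LD_PRELOAD=./shim.so (see ld.so(8)). */
#include <dlfcn.h>
#include <errno.h>

int socket(int domain, int type, int protocol)
{
    static int (*real_socket)(int, int, int);

    if (real_socket == NULL)
        real_socket = (int (*)(int, int, int))dlsym(RTLD_NEXT, "socket");
    if (real_socket == NULL) {
        errno = ENOSYS;
        return -1;
    }

    int smc_protocol = protocol;
    int smc_domain = smc_rewrite_args(domain, type, &smc_protocol);
    if (smc_domain != domain) {
        int fd = real_socket(smc_domain, type, smc_protocol);
        if (fd >= 0 || errno != EAFNOSUPPORT)
            return fd;
        /* kernel without AF_SMC: fall through to the original call */
    }
    return real_socket(domain, type, protocol);
}
#endif
```

Keeping the rewrite decision in a separate function makes it testable without preloading anything; only the thin wrapper depends on the dynamic linker.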
MTU and InfiniBand data transfer
InfiniBand traffic can use MTU sizes of 256, 512, 1024, 2048, or 4096 bytes. SMC determines the configured MTU size of the RoCE Ethernet port, announces this MTU size to the peer during connection setup, and uses the smaller of the two peers' MTU sizes.
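The MTU agreement reduces to checking that both announced values are valid InfiniBand MTU sizes and taking the smaller one. The helpers ib_mtu_is_valid and smc_agree_mtu are hypothetical; they model MTU sizes in bytes and capture only the selection rule, not the connection setup messages themselves.

```c
/* Hypothetical helper: 1 if MTU (bytes) is one of the InfiniBand MTU sizes. */
static int ib_mtu_is_valid(int mtu)
{
    switch (mtu) {
    case 256: case 512: case 1024: case 2048: case 4096:
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: MTU both peers use (the smaller announced value),
 * or -1 if either announcement is not a valid InfiniBand MTU size. */
static int smc_agree_mtu(int local_mtu, int peer_mtu)
{
    if (!ib_mtu_is_valid(local_mtu) || !ib_mtu_is_valid(peer_mtu))
        return -1;
    return local_mtu < peer_mtu ? local_mtu : peer_mtu;
}
```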
|AF_SMC, version 1.0.0|