My abilities being limited, there is still a lot (the difference between SO_REUSEADDR and SO_REUSEPORT, for instance) that I could not explain clearly in a single post. This one fills some of those gaps and gives me something to review later.
The two are not the same thing and are not really comparable. I sometimes get confused by them myself, and my own summary would not be a good one, so I recommend the StackOverflow answer "Socket options SO_REUSEADDR and SO_REUSEPORT, how do they differ?", which covers the topic thoroughly.
In short:
If in doubt, set both; they do not conflict.
The previous post covered SO_REUSEPORT with multiple processes bound to the same port, where the number of processes can be adjusted as needed. This post looks at the equally practical case of multiple threads inside a single process binding the same port, based on Netty 4.0.25 plus the epoll native transport.
Here is a PING-PONG sample application:
public void run() throws Exception {
final EventLoopGroup bossGroup = new EpollEventLoopGroup();
final EventLoopGroup workerGroup = new EpollEventLoopGroup();
ServerBootstrap b = new ServerBootstrap();
b.group(bossGroup, workerGroup)
.channel(EpollServerSocketChannel. class)
.childHandler( new ChannelInitializer<SocketChannel>() {
@Override
public void initChannel(SocketChannel ch) throws Exception {
ch.pipeline().addLast(
new StringDecoder(CharsetUtil.UTF_8 ),
new StringEncoder(CharsetUtil.UTF_8 ),
new PingPongServerHandler());
}
}).option(ChannelOption. SO_REUSEADDR, true)
.option(EpollChannelOption. SO_REUSEPORT, true)
.childOption(ChannelOption. SO_KEEPALIVE, true);
int workerThreads = Runtime.getRuntime().availableProcessors();
ChannelFuture future;
for ( int i = 0; i < workerThreads; ++i) {
future = b.bind( port).await();
if (!future.isSuccess())
throw new Exception(String. format("fail to bind on port = %d.",
port), future.cause());
}
Runtime. getRuntime().addShutdownHook (new Thread(){
@Override
public void run(){
workerGroup.shutdownGracefully();
bossGroup.shutdownGracefully();
}
});
}
Package it as a jar, run it on CentOS 7, and check the file handles open on that port.
# lsof -i:8000
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 3515 root 42u IPv6 29040 0t0 TCP *:irdmi (LISTEN)
java 3515 root 43u IPv6 29087 0t0 TCP *:irdmi (LISTEN)
java 3515 root 44u IPv6 29088 0t0 TCP *:irdmi (LISTEN)
java 3515 root 45u IPv6 29089 0t0 TCP *:irdmi (LISTEN)
It is one process, but each listener has its own file handle.
/**
 * UDP quote server: demo of several threads in one process binding the same port
*/
public final class QuoteOfTheMomentServer {
private static final int PORT = Integer.parseInt(System. getProperty("port" ,
"9000" ));
public static void main(String[] args) throws Exception {
final EventLoopGroup group = new EpollEventLoopGroup();
Bootstrap b = new Bootstrap();
b.group(group).channel(EpollDatagramChannel. class)
.option(EpollChannelOption. SO_REUSEPORT, true )
.handler( new QuoteOfTheMomentServerHandler());
int workerThreads = Runtime.getRuntime().availableProcessors();
for (int i = 0; i < workerThreads; ++i) {
ChannelFuture future = b.bind( PORT).await();
if (!future.isSuccess())
throw new Exception(String.format ("Fail to bind on port = %d.",
PORT), future.cause());
}
Runtime. getRuntime().addShutdownHook(new Thread() {
@Override
public void run() {
group.shutdownGracefully();
}
});
}
}
@Sharable
class QuoteOfTheMomentServerHandler extends
SimpleChannelInboundHandler<DatagramPacket> {
private static final String[] quotes = {
"Where there is love there is life." ,
"First they ignore you, then they laugh at you, then they fight you, then you win.",
"Be the change you want to see in the world." ,
"The weak can never forgive. Forgiveness is the attribute of the strong.", };
private static String nextQuote() {
int quoteId = ThreadLocalRandom.current().nextInt( quotes .length );
return quotes [quoteId];
}
@Override
public void channelRead0(ChannelHandlerContext ctx, DatagramPacket packet)
throws Exception {
if ("QOTM?" .equals(packet.content().toString(CharsetUtil. UTF_8))) {
ctx.write( new DatagramPacket(Unpooled.copiedBuffer( "QOTM: "
+ nextQuote(), CharsetUtil. UTF_8), packet.sender()));
}
}
@Override
public void channelReadComplete(ChannelHandlerContext ctx) {
ctx.flush();
}
@Override
public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
cause.printStackTrace();
}
}
Again, check which file handles are open on the port:
# lsof -i:9000
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 3181 root 26u IPv6 27188 0t0 UDP *:cslistener
java 3181 root 27u IPv6 27217 0t0 UDP *:cslistener
java 3181 root 28u IPv6 27218 0t0 UDP *:cslistener
java 3181 root 29u IPv6 27219 0t0 UDP *:cslistener
That covers binding one port from multiple threads with Netty and SO_REUSEPORT; recorded here for reference.
This post collects my notes on SO_REUSEPORT. At the end there is also a small tool, bindp, that lets existing programs benefit from the new feature.
Network applications on Linux usually take advantage of multiple cores with one of the following typical multi-process/multi-thread server models:
These models can pin threads to CPU cores, but they all suffer from:
For example, HTTP CPS (connections per second) throughput does not grow linearly as CPU cores are added:
Linux kernel 3.9 introduced SO_REUSEPORT, which solves most of the problems above.
The Linux man page describes its purpose as follows:
The new socket option allows multiple sockets on the same host to bind to the same port, and is intended to improve the performance of multithreaded network server applications running on top of multicore systems.
SO_REUSEPORT lets multiple processes or threads bind to the same port, improving server performance. The problems it solves:
Its core implementation comes down to three points:
For an analysis of the code, see the reference [Implementation analysis of multiple processes binding the same port [Google Patch]].
Previously this was done by forking multiple child processes. With SO_REUSEPORT, fork is no longer required for several processes to listen on the same port: each process has its own accept socket fd, and when a new connection is established the kernel wakes exactly one process to accept it, while keeping the wake-ups balanced.
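The "one listening socket per worker, kernel picks one" behaviour can be observed with a small experiment. The sketch below is an assumption-laden simplification: it keeps all listening sockets in one process instead of separate processes, and the helper names are made up for the example. It requires Linux 3.9 or later.

```python
import select
import socket
from collections import Counter

def reuseport_listeners(n, host="127.0.0.1"):
    """Open n listening sockets that all share one kernel-chosen port."""
    listeners, port = [], 0
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(128)
        port = s.getsockname()[1]    # later sockets reuse the first socket's port
        listeners.append(s)
    return listeners, port

def count_accepts(listeners, port, connections, host="127.0.0.1"):
    """Connect clients one by one and count which listener receives each."""
    hits = Counter()
    for _ in range(connections):
        client = socket.create_connection((host, port))
        # Only the socket chosen by the kernel becomes readable.
        ready = select.select(listeners, [], [], 2.0)[0]
        for s in ready:
            conn = s.accept()[0]
            hits[listeners.index(s)] += 1
            conn.close()
        client.close()
    return hits

if __name__ == "__main__":
    socks, shared_port = reuseport_listeners(4)
    print(dict(count_accepts(socks, shared_port, 40)))
    for s in socks:
        s.close()
```

Each connection lands in exactly one socket's accept queue, so the counts always add up to the number of clients; how evenly they spread depends on the kernel's hash of the connection four-tuple.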
The model is simple and easier to maintain. Process management is decoupled from application logic, and control over horizontal scaling is handed to the programmer or administrator, who can start and stop processes as the situation requires, which adds flexibility.
This opens up a fairly fine-grained approach to horizontal scaling: is the thread count right, is any state shared, can each process depend on fewer resources? It suits stateless server architectures best.
It also makes testing new features easy: different versions of the same program can run side by side, and the results decide whether the new version replaces the old.
Clients notice nothing on the surface, because all of this happens on the server side.
The idea: after iterating on a version that needs to go live, start a new process for it and shut down the old version's process a little later. The service keeps running without interruption, provided the transition is smooth. It resembles the hot code upgrade that Erlang offers at the language level.
A nice idea, but in practice it is not quite that smooth. Fortunately there is an open-source tool, hubtime, built on a SIGHUP signal handler + SO_REUSEPORT + LD_PRELOAD, that makes it easy; check it out and try it if you need this.
SO_REUSEPORT distributes packets based on the four-tuple {src ip, src port, dst ip, dst port} and the number of server sockets currently bound to the port. If that number changes, the kernel may hand packets of client connections that the previous server socket should have handled (half-open connections still in the three-way handshake, or connections that completed the handshake but are still waiting in the queue) to other server sockets, which can make client requests fail. Common workarounds:
Together with RFS/RPS/XPS-mq, performance can improve further:
The goal is to keep a packet's hard and soft interrupts, reception and processing on one CPU core, processing in parallel and using resources as fully as possible.
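Why a change in the number of bound sockets can misroute connections is easier to see with a toy model of the four-tuple dispatch described above. The sketch uses CRC32 modulo the socket count purely as a stand-in; the kernel's real hash function is different, and the addresses are invented for the example.

```python
import zlib

def pick_socket(four_tuple, n_sockets):
    """Map a (src ip, src port, dst ip, dst port) tuple to a socket index."""
    key = "|".join(str(part) for part in four_tuple).encode()
    return zlib.crc32(key) % n_sockets    # stand-in for the kernel's hash

def remapped(flows, before, after):
    """Flows whose target socket changes when the socket count changes."""
    return [f for f in flows if pick_socket(f, before) != pick_socket(f, after)]

# Hypothetical client flows towards one server address and port.
flows = [("10.0.0.%d" % i, 40000 + i, "10.0.0.1", 80) for i in range(1, 101)]
moved = remapped(flows, 4, 3)
print("%d of %d flows would land on a different socket" % (len(moved), len(flows)))
```

As long as the socket count is stable, every packet of a flow maps to the same socket; once a socket is added or removed, part of the in-flight flows hash elsewhere, which is the failure mode the paragraph above warns about.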
Although SO_REUSEPORT solves binding/listening on one port from multiple processes, tests by Lin Xiaofeng at Sina show that it still falls short of ideal linear scaling across cores:
See the improvements Fastsocket makes on top of it (link).
Taobao's Tengine already supports SO_REUSEPORT. Its test report includes a simple benchmark that shows the performance gain from SO_REUSEPORT:
With SO_REUSEPORT, the most visible effect is that requests are much less likely to be dropped under load, and CPU load stays evenly balanced.
JDK 1.6 has no language-level support. I have not used later versions yet, so I will not comment on them.
Netty 3 and 4 do not support SO_REUSEPORT by default. Starting with Netty 4.0.19, however, there is a separately packaged, JNI-based epoll native transport (Linux only) that exposes options such as SO_REUSEPORT which Java NIO does not provide. They are defined in io.netty.channel.epoll.EpollChannelOption (see the online source).
On Linux, the epoll native transport lets you benefit from kernel-level networking stack enhancements; see the Native transports documentation for usage.
Using the epoll native transport is simple; just swap a few class names:
NioEventLoopGroup → EpollEventLoopGroup
NioEventLoop → EpollEventLoop
NioServerSocketChannel → EpollServerSocketChannel
NioSocketChannel → EpollSocketChannel
For example, a PING-PONG server looks something like this:
public void run() throws Exception {
EventLoopGroup bossGroup = new EpollEventLoopGroup();
EventLoopGroup workerGroup = new EpollEventLoopGroup();
try {
ServerBootstrap b = new ServerBootstrap();
ChannelFuture f = b
.group(bossGroup, workerGroup)
.channel(EpollServerSocketChannel.class)
.childHandler(new ChannelInitializer<SocketChannel>() {
@Override
public void initChannel(SocketChannel ch)
throws Exception {
ch.pipeline().addLast(
new StringDecoder(CharsetUtil.UTF_8),
new StringEncoder(CharsetUtil.UTF_8),
new PingPongServerHandler());
}
}).option(ChannelOption.SO_REUSEADDR, true)
.option(EpollChannelOption.SO_REUSEPORT, true)
.childOption(ChannelOption.SO_KEEPALIVE, true).bind(port)
.sync();
f.channel().closeFuture().sync();
} finally {
workerGroup.shutdownGracefully();
bossGroup.shutdownGracefully();
}
}
If you would rather not go to this trouble and want existing Java/Netty applications to enjoy SO_REUSEPORT on Linux kernel >= 3.9 without any changes, try bindp, which is the cheaper route; it is covered below.
bindp is a small program I wrote earlier that binds an existing program to a specified IP address and port. It saves hard-coding them and also makes testing easier.
To let applications that never hard-coded SO_REUSEPORT get the kernel enhancement on Linux 3.9 and later, I modified it slightly to add support.
The requirements are:
If these conditions are not met, the feature has no effect.
Usage example:
REUSE_PORT=1 BIND_PORT=9999 LD_PRELOAD=./libbindp.so java -server -jar pingpongserver.jar &
Of course you can run the command as many times as needed: several processes listen on the same port, scaling processes horizontally on a single machine.
A quick prototype in Python: two processes both listen on port 10000 and answer client requests with different content. Just for fun.
server_v1.py, a simple PING-PONG:
# -*- coding:UTF-8 -*-
import socket
import os
PORT = 10000
BUFSIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', PORT))
s.listen(1)
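while True:
    conn, addr = s.accept()
    # Read at most BUFSIZE bytes of the request (the original passed PORT by mistake)
    data = conn.recv(BUFSIZE)
    # Encode so the same line works on Python 2 and Python 3
    conn.send(('Connected to server[%s] from client[%s]\n' % (os.getpid(), addr)).encode('utf-8'))
    conn.close()
s.close()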
server_v2.py, which prints the current time:
# -*- coding:UTF-8 -*-
import socket
import time
import os
PORT = 10000
BUFSIZE = 1024
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', PORT))
s.listen(1)
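while True:
    conn, addr = s.accept()
    # Read at most BUFSIZE bytes of the request (the original passed PORT by mistake)
    data = conn.recv(BUFSIZE)
    # Encode so the same line works on Python 2 and Python 3
    conn.send(('server[%s] time %s\n' % (os.getpid(), time.ctime())).encode('utf-8'))
    conn.close()
s.close()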
Run both versions with the help of bindp:
REUSE_PORT=1 LD_PRELOAD=/opt/bindp/libbindp.so python server_v1.py &
REUSE_PORT=1 LD_PRELOAD=/opt/bindp/libbindp.so python server_v2.py &
Simulate 10 client requests:
for i in {1..10};do echo "hello" | nc 127.0.0.1 10000;done
And the result:
Connected to server[3139] from client[('127.0.0.1', 48858)]
server[3140] time Thu Feb 12 16:39:12 2015
server[3140] time Thu Feb 12 16:39:12 2015
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48862)]
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48864)]
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48866)]
Connected to server[3139] from client[('127.0.0.1', 48867)]
As you can see, the load is split evenly: each process handled 50% of the requests.
A small toy, but an interesting one :))
For more usage details, see the README.
The previous rambling posts introduced Fastsocket from various angles, rather like the blind men and the elephant. My abilities are limited and there is more to learn, but it is time for a summary to wrap things up.
With Linux as the server, performance is no concern when request volume is small. Under massive request volume, however, the kernel's TCP/IP processing becomes the bottleneck. For example, when Sina sampled one of its HAProxy servers, ?0% of CPU time was consumed by the kernel, leaving the application only a small share of CPU cycles.
A thorough analysis of the HAProxy system showed that most CPU resources were spent in the kernel, and that on multi-core platforms the kernel incurs heavy synchronization overhead while processing the network protocol stack.
Tests across multiple cores also showed that HTTP CPS (connections per second) throughput does not grow linearly as CPU cores are added:
TCP processing & multi-core
Heavy synchronization overhead in the Linux VFS
CPUs share no data and each handles its TCP connections independently and in parallel, which is the main reason for its efficiency. The architecture diagram shows the improvements:
The Fastsocket architecture diagram shows the overall structure clearly: kernel space and user space communicate through ioctl. As I recall, netmap passes data straight through to user space via ioctl in its rewritten NIC driver, which is more efficient, but it lacks a complete TCP/IP stack.
In Sina's tests on a 24-core server running CentOS 6.5, Fastsocket delivered striking gains in connections per second for Nginx and HAProxy: 290% and ?20% respectively. This shows that Fastsocket brings fast TCP connection handling. Beyond that, with the help of hardware features:
Fastsocket V1.0 has been in production at Sina since 2014, used for proxy servers, so it is worth considering for adoption. The following environments benefit most from the 1.0 release:
As for multi-threaded programs, follow the practical advice that comes with the sample applications.
From the benchmark charts below we can see:
In the test results:
Results from an 8-core production server running for ?4 hours: figure (a) shows CPU utilization before deploying fastsocket, figure (b) after deploying it. The benefits Fastsocket brings:
Frankly, I hope Sina publishes more data here.
Support for long-lived connections will take a while longer. But which kind of long-lived connection? Application servers with millions of connections, or Redis? Probably the latter. Work is under way but there is no schedule yet; the features done so far are summarized below:
Test environment:
Redis configuration options:
Test results:
Points to note:
V1.1 is to add support for long-lived connections, which will benefit Redis-like server applications a great deal. With no concrete schedule, all we can do is wait.
I call it a comparison, but the table is really taken from the mTCP paper with a Fastsocket row added. It shows how people have kept pushing forward.
Types | Accept queue | Conn. Locality | Socket API | Event Handling | Packet I/O | Application Modification | Kernel Modification |
---|---|---|---|---|---|---|---|
PSIO, DPDK, PF_RING, netmap | No TCP stack | - | - | - | Batched | No interface for transport layer | No (NIC driver) |
Linux-2.6 | Shared | None | BSD socket | Syscalls | Per packet | Transparent | No |
Linux-3.9 | Per-core | None | BSD socket | Syscalls | Per packet | Add option SO_REUSEPORT | No |
Affinity-Accept | Per-core | Yes | BSD socket | Syscalls | Per packet | Transparent | Yes |
MegaPipe | Per-core | Yes | lwsocket | Batched syscalls | Per packet | Event model to completion I/O | Yes |
FlexSC, VOS | Shared | None | BSD socket | Batched syscalls | Per packet | Change to use new API | Yes |
mTCP | Per-core | Yes | User-level socket | Batched function calls | Batched | Socket API to mTCP API | No (NIC driver) |
Fastsocket | Per-core | Yes | BSD socket | Ioctl + kernel calls | Per packet | Transparent | No |
This gives a rough picture and makes comparison easier, but it is only a snapshot; the pursuit of performance keeps moving forward.
Fastsocket was developed for the server programs everyone knows, such as Nginx and HAProxy. If your workload consists of a large number of short connections requesting small files, with no hard requirement for Keep-alive (short connections are about a fast request/response followed by a close), administrators can give Fastsocket a try. As for rollout, deploy it selectively on a few machines as an experiment and look at the results.
That wraps up this series for now. Looking ahead, I hope Fastsocket ships long-lived connection support soon, along with even better performance :))
My analysis of Fastsocket has slowly piled up into several clumsy posts. Sticking with something can feel like chewing wax at times, but having chosen it I have to push on. Enough chatter, back to the topic. This post follows the previous one on the kernel module and continues my notes on the Fastsocket kernel.
Linux kernel 3.9 includes the feature that lets multiple processes or threads bind the same IP and port for TCP/UDP, namely SO_REUSEPORT. At the kernel level it also gives each thread/process its own socket, so CPU cores no longer contend for a lock on the accept queue. It shows up in the definition of the sock_common structure in fastsocket/kernel/net/sock.h:
unsigned char skc_reuse:4;
unsigned char skc_reuseport:4;
Several socket.h files (for example fastsocket/kernel/include/asm/socket.h) define the value of SO_REUSEPORT:
#define SO_REUSEPORT 15
SO_REUSEPORT also appears in both sock_setsockopt and sock_getsockopt in fastsocket/kernel/net/core/sock.c:
In sock_setsockopt:
case SO_REUSEADDR:
sk->sk_reuse = valbool;
break;
case SO_REUSEPORT:
sk->sk_reuseport = valbool;
break;
In sock_getsockopt:
case SO_REUSEADDR:
v.val = sk->sk_reuse;
break;
case SO_REUSEPORT:
v.val = sk->sk_reuseport;
break;
Resource contention in event-driven servers before SO_REUSEPORT:
Afterwards, processing can be regarded as parallel:
Fastsocket does not reinvent the wheel; it optimizes further on top of SO_REUSEPORT.
Later I plan to write a small shared library so that programs that never hard-coded SO_REUSEPORT can also enjoy true port reuse on Linux kernel >= 3.9.
The kernel-level components are listed below from top to bottom, following the architecture diagram.
Because synchronization overhead in the Linux kernel VFS is heavy
Commits:
a209dfc vfs: dont chain pipe/anon/socket on superblock s_inodes list
4b93688 fs: improve scalability of pseudo filesystems
The VFS improvements account for more than ?0% of the overall performance gain, a very noticeable effect:
With multiple cores and multiple receive queues, the stock Linux protocol stack can only listen on a single socket. Every socket that has completed the three-way handshake but has not yet been accepted by the application goes into that socket's accept queue, and the accept system call must take connections off the queue serially. Under high concurrency the cores contend for it, and this becomes the bottleneck that limits how fast connections can be established.
Local Listen Table: fastsocket clones the listening socket for every CPU core and stores the clone in that core's local table, so cores do not compete over accept. The quoted description follows:
The flow chart summarizes the above:
The Linux kernel maintains established sockets (the sockets used to track connections) in one global hash table protected by locks. Fastsocket's idea is to split the global table into per-core tables: when a core needs a socket it searches only its own table, so no locking is needed and there is no contention. Sockets created by fastsocket live in the local established table, while other regular sockets stay in the global table. A core first looks in its own local table (no lock required) and then in the global one.
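The lookup order described above (local table first, global table second) can be modelled with plain dictionaries. This is a conceptual sketch only: the class and method names are invented for the illustration and do not correspond to fastsocket's kernel data structures.

```python
class EstablishedTables:
    """Per-core tables consulted first, with a shared global table as fallback."""

    def __init__(self, cores):
        # One table per core; only its owning core touches it, so no lock is needed.
        self.local = [dict() for _ in range(cores)]
        # Shared by all cores; a real kernel would protect this with a lock.
        self.global_table = {}

    def insert_local(self, core, four_tuple, sock):
        self.local[core][four_tuple] = sock

    def insert_global(self, four_tuple, sock):
        self.global_table[four_tuple] = sock

    def lookup(self, core, four_tuple):
        # Step 1: the core's own table. Step 2: the global table.
        sock = self.local[core].get(four_tuple)
        if sock is None:
            sock = self.global_table.get(four_tuple)
        return sock

tables = EstablishedTables(cores=2)
tables.insert_local(0, ("10.0.0.2", 40001, "10.0.0.1", 80), "fastsocket socket")
tables.insert_global(("10.0.0.3", 40002, "10.0.0.1", 22), "regular socket")
print(tables.lookup(0, ("10.0.0.2", 40001, "10.0.0.1", 80)))
print(tables.lookup(1, ("10.0.0.3", 40002, "10.0.0.1", 22)))
```

The model also shows why connection locality matters: a socket stored in core 0's table is invisible to core 1, so packets of that connection have to arrive on core 0.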
By default, when an application actively sends packets, they go out through whichever CPU core (chosen by the system) is running the process, whereas the core for incoming packets is decided by RSS or RPS as mentioned earlier. A connection may therefore be handled by two different cores, when it should be processed locally. RFS and the Flow Director of Intel NICs can mitigate this in software and hardware respectively, but not completely.
The main idea of RFD (Receive Flow Deliver) is that when a CPU core actively initiates a connection, the core's identity is encoded into the connection's source port. The relation between CPU cores and ports is defined by a mapping [cores, ports] in which every port corresponds to exactly one core. When a core creates a connection, RFD randomly picks a port that maps to the current core. When a packet is received, RFD decides which core should process it; if the current core is not the chosen one, the packet is delivered to the chosen core.
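The port-to-core encoding can be illustrated with a small sketch. The post only states that every port maps to exactly one core; the modulo mapping, the port range and the function names below are assumptions chosen for the example, not fastsocket's actual scheme.

```python
import random

def pick_source_port(core, cores, low=32768, high=61000):
    """Randomly pick a source port that maps back to `core` (assumed: port % cores)."""
    candidates = [p for p in range(low, high) if p % cores == core]
    return random.choice(candidates)

def core_for_port(port, cores):
    """Recover the owning core from a port number."""
    return port % cores

def deliver(current_core, dst_port, cores):
    """Return (target core, whether the packet must be handed to another core)."""
    target = core_for_port(dst_port, cores)
    return target, target != current_core

CORES = 8
port = pick_source_port(core=3, cores=CORES)
# A reply to this connection carries `port` as its destination port.
print(port, deliver(current_core=5, dst_port=port, cores=CORES))
```

Because the reply's destination port is the connection's own source port, the receiving side can recover the core without any lookup table, which is what makes the hand-off cheap.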
Generally, RFD benefits proxy programs most; plain web servers can leave it disabled.
The above was compiled from a large amount of outside material and gives a fairly complete picture of the Fastsocket kernel architecture.
Fastsocket's achievement is that the whole life of a single TCP connection, from the hardware interrupt raised by the NIC through the soft interrupt, three-way handshake, data transfer and four-way close, is handled on one CPU core. TCP resources thus become local to each core, which lays the foundation for scaling across cores: less contention for global resources, connections processed in parallel, and reduced side effects from file locks. The result is an extremely efficient way to handle short connections, which deserves real praise.
This post studies the Fastsocket kernel module fastsocket.ko, which provides kernel-space support for the user-space libfsocket.so and handles the data passed via ioctl to /dev/fastsocket. It is core, foundational code. As usual, translation first, with some comments mixed in afterwards.
The Fastsocket kernel module (fastsocket.ko) provides several features, each with options to turn it on or off.
Lock contention is everywhere in the kernel that ships with CentOS 6.5, so no amount of TCP/IP stack tuning yields good scalability. Two of the worst offenders are inode_lock and dcache_lock, which are not actually necessary for the socket file system sockfs. Fastsocket solves this by providing a fastpath when VFS structures are initialized, and has submitted two changes to the vanilla kernel:
a209dfc vfs: dont chain pipe/anon/socket on superblock s_inodes list
4b93688 fs: improve scalability of pseudo filesystems
This change has no configuration option, so every socket created by fastsocket is forced through the fastpath.
fastsocket creates a local listen table for each CPU. An application can decide to handle new connections on a particular CPU core by copying the original listening socket and inserting the copy into that core's local listen table. When a new connection is processed on some CPU, the kernel tries to match it against the local listen table; on a match the connection is inserted into the local accept queue, and the CPU later takes it from that queue for processing.
This way every network soft interrupt has its own local socket queue into which new connections are pushed, and every process pops connections from its local queue. When processes are pinned to CPUs, once the NIC decides to deliver to a given CPU core, that core takes care of everything: hard interrupt, soft interrupt, system calls and the user process. The benefit is that client connection requests are spread across the CPUs without lock contention, and each CPU handles its passive connections locally.
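A rough user-space model of the per-core accept queues may help. It is a sketch under assumptions: the class, its methods and in particular the fallback to a shared global queue when a core has no local listener are illustrative choices, not taken from the fastsocket source.

```python
from collections import deque

class ListenSpawn:
    """Per-core accept queues for cores that cloned the listening socket."""

    def __init__(self):
        self.local_queues = {}        # core id -> deque of pending connections
        self.global_queue = deque()   # assumed fallback for cores without a clone

    def spawn(self, core):
        """Clone the listener for `core` by giving it a local accept queue."""
        self.local_queues.setdefault(core, deque())

    def on_new_connection(self, core, conn):
        """Called on the core that processed the handshake."""
        queue = self.local_queues.get(core)
        if queue is None:
            queue = self.global_queue
        queue.append(conn)

    def accept(self, core):
        """Pop from the local queue first, then from the global one."""
        queue = self.local_queues.get(core)
        if queue:
            return queue.popleft()
        if self.global_queue:
            return self.global_queue.popleft()
        return None

table = ListenSpawn()
table.spawn(0)
table.on_new_connection(0, "conn-a")   # lands in core 0's local queue
table.on_new_connection(1, "conn-b")   # core 1 never spawned: global queue
print(table.accept(0), table.accept(1))
```

Since a core only pushes to and pops from its own queue in the common case, no lock is shared between cores on the accept path.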
This feature is best suited to the following cases:
In the first case, RPS can be used when the NIC has fewer receive queues than CPU cores. The second approach satisfies two things:
Therefore enable_listen_spawn can take three values:
Once enabled, an extra field is added to the file structure to store the mapping between the file and its epitem, which saves the cost of looking up the epitem in the epoll red-black tree whenever epoll_ctl is called.
This optimization changes epoll semantics somewhat but improves socket performance. It requires that a socket be added to only one epoll instance (listening sockets excepted). The default of true suits the vast majority of applications; if your program does not meet the condition, disable it.
enable_fast_epoll is a boolean option:
RFD (Receive Flow Deliver) encodes the ID of the CPU assigned to a new connection into the connection's port number, instead of picking the source port of a newly created active connection at random.
When a packet arrives on an active connection, RFD decodes the CPU core ID from the destination port and forwards the packet to that core. Combined with listen_spawn, this keeps all CPU processing of a connection local.
enable_receive_flow is a boolean option:
Notes:
End of translation.
The fastsocket kernel module lives at the relative path fastsocket/module/. Apart from README.md it contains just two symlinks:
In other words, the module's real path is fastsocket/kernel/net/fastsocket, with the following files:
fastsocket_api.c implements the kernel module interface. The source registers quite a few configurable items that the documentation does not yet describe:
int enable_fastsocket_debug = 3;
/* Fastsocket feature switches */
int enable_listen_spawn = 2;
int enable_receive_flow_deliver;
int enable_fast_epoll = 1;
int enable_skb_pool;
int enable_rps_framework;
int enable_receive_cpu_selection = 0;
int enable_direct_tcp = 0;
int enable_socket_pool_size = 0;
module_param(enable_fastsocket_debug,int, 0);
module_param(enable_listen_spawn, int, 0);
module_param(enable_receive_flow_deliver, int, 0);
module_param(enable_fast_epoll, int, 0);
module_param(enable_direct_tcp, int, 0);
module_param(enable_skb_pool, int, 0);
module_param(enable_receive_cpu_selection, int, 0);
module_param(enable_socket_pool_size, int, 0);
MODULE_PARM_DESC(enable_fastsocket_debug, " Debug level [Default: 3]" );
MODULE_PARM_DESC(enable_listen_spawn, " Control Listen-Spawn: 0 = Disabled, 1 = Process affinity required, 2 = Autoset process affinity[Default]");
MODULE_PARM_DESC(enable_receive_flow_deliver, " Control Receive-Flow-Deliver: 0 = Disabled[Default], 1 = Enabled");
MODULE_PARM_DESC(enable_fast_epoll, " Control Fast-Epoll: 0 = Disabled, 1 = Enabled[Default]");
MODULE_PARM_DESC(enable_direct_tcp, " Control Direct-TCP: 0 = Disbale[Default], 1 = Enabled");
MODULE_PARM_DESC(enable_skb_pool, " Control Skb-Pool: 0 = Disbale[Default], 1 = Receive skb pool, 2 = Send skb pool, 3 = Both skb pool");
MODULE_PARM_DESC(enable_receive_cpu_selection, " Control RCS: 0 = Disabled[Default], 1 = Enabled");
MODULE_PARM_DESC(enable_socket_pool_size, "Control socket pool size: 0 = Disabled[Default], other are the pool size");
It receives the data that the user-space libfsocket.so passes in through ioctl and dispatches on the command:
static long fastsocket_ioctl(struct file *filp, unsigned int cmd, unsigned long __user u_arg)
{
struct fsocket_ioctl_arg k_arg;
if (copy_from_user(&k_arg, (struct fsocket_ioctl_arg *)u_arg, sizeof(k_arg))) {
EPRINTK_LIMIT(ERR, "copy ioctl parameter from user space to kernel failed\n");
return -EFAULT;
}
switch (cmd) {
case FSOCKET_IOC_SOCKET:
return fastsocket_socket(&k_arg);
case FSOCKET_IOC_LISTEN:
return fastsocket_listen(&k_arg);
case FSOCKET_IOC_SPAWN_LISTEN:
return fastsocket_spawn_listen(&k_arg);
case FSOCKET_IOC_ACCEPT:
return fastsocket_accept(&k_arg);
case FSOCKET_IOC_CLOSE:
return fastsocket_close(&k_arg);
case FSOCKET_IOC_SHUTDOWN_LISTEN:
return fastsocket_shutdown_listen(&k_arg);
//case FSOCKET_IOC_EPOLL_CTL:
// return fastsocket_epoll_ctl((struct fsocket_ioctl_arg *)arg);
default:
EPRINTK_LIMIT(ERR, "ioctl [%d] operation not support\n", cmd);
break;
}
return -EINVAL;
}
These match one-to-one with the FSOCKET_IOC_* operation codes defined in the header fastsocket/library/libsocket.h. Data passed via ioctl goes from user space to kernel space, which takes one copy (copy_from_user), and is then routed by the cmd value.
Interaction goes through the dedicated device /dev/fastsocket:
Open the /dev/fastsocket device to get a file handle, then start passing data with ioctl
That was a quick tour of the fastsocket kernel module. Many points remain untouched; I may come back to them in the Fastsocket kernel post.
With the fastsocket-enabled kernel module and the shared library libfsocket.so built and installed, the next step is setting up the NIC.
A few terms used below:
Most of these NIC setup notes come from fastsocket/scripts/ in the fastsocket source tree. As usual, translation first.
The nic.sh script configures the NIC to get the most benefit from fastsocket. Given a network interface, it adjusts various interface features and some system settings.
Each NIC hardware queue and its associated interrupt is bound to a different CPU core. If there are more hardware queues than CPU cores, the queues are assigned round-robin. The irqbalance service must be disabled so that it does not change the configuration.
nic.sh uses ethtool to cap the number of interrupts per second and prevent interrupt storms. The interval between two Rx interrupts is set to at least 333us, about 3000 interrupts per second.
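The relation between the 333us interval and the roughly 3000 interrupts per second is simple arithmetic, shown here as a tiny sketch (the function name is made up for the example):

```python
def interrupts_per_second(rx_usecs):
    """Upper bound on Rx interrupts per second for a minimum gap in microseconds."""
    return 1000000.0 / rx_usecs

print(round(interrupts_per_second(333)))   # 3003
```

Halving the interval doubles the ceiling, so the value is a trade-off between latency and interrupt load.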
Each CPU core is mapped one-to-one to a different NIC hardware queue so that the cores handle network packets evenly. When there are fewer hardware queues than CPU cores, nic.sh uses RPS (Receive Packet Steering) to balance incoming traffic in software, in which case CPUs and hardware queues have no fixed correspondence. RPS allows incoming packets to be distributed freely to any CPU core.
Interrupts generated on receive are balanced across the corresponding CPUs.
XPS (Transmit Packet Steering) maps CPU cores to Tx queues and governs outgoing packets. On a system with N CPU cores, the script sets up XPS when the interface has at least N Tx queues, so that cores and Tx queues can be mapped 1:1.
Interrupts generated on transmit are likewise spread evenly across CPUs, so no single core gets too busy.
During load tests, iptables firewall rules consume extra CPU cycles and reduce network stack performance somewhat. If nic.sh detects iptables running in the background, it prints a warning suggesting you turn it off.
The list of Intel and Broadcom gigabit and 10-gigabit NICs that nic.sh has been verified to work with:
# igb
"Intel Corporation 82576 Gigabit Network Connection (rev 01)"
"Intel Corporation I350 Gigabit Network Connection (rev 01)"
# ixgbe
"Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)"
"Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)"
# tg3
"Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe"
"Broadcom Corporation NetXtreme BCM5761 Gigabit Ethernet PCIe (rev 10)"
# bnx2
"Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)"
"Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)"
If the server has none of these NICs, the script prints a warning; no harm done.
Here I have pulled out the routine checks of CPU, NIC driver and network queues on their own, revisiting a number of commands I had forgotten. They are slightly changed and simplified for later use:
egrep -c eth0 /proc/interrupts
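The script first gathers CPU and NIC information, then sets the interrupt rate limit: ethtool -C eth0 rx-usecs 333 > /dev/null 2>&1
XPS makes full use of the NIC's transmit queues to raise transmit throughput, but it is enabled only on one condition: the number of transmit queues must be at least the number of CPU cores (the script tests with -ge):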
if [[ $TX_QUEUES -ge $CORES ]]; then
for i in $(seq 0 $((CORES-1))); do
cpuid_to_mask $((i%CORES)) | xargs -i echo {} > /sys/class/net/$IFACE/queues/tx-$i/xps_cpus
done
info_msg " XPS enabled"
fi
Next it decides whether RPS can be enabled, sparing you the manual setup. RPS is enabled only when the number of CPU cores differs from the number of NIC hardware queues:
if [[ ! $HW_QUEUES == $CORES ]]; then
for i in /sys/class/net/$IFACE/queues/rx-*; do
printf "%x\n" $((2**CORES-1)) | xargs -i echo {} > $i/rps_cpus;
done
info_msg " RPS enabled"
else
for i in /sys/class/net/$IFACE/queues/rx-*; do
echo 0 > $i/rps_cpus;
done
info_msg " RPS disabled"
fi
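The mask written to rps_cpus in the excerpt above is just "one bit per core". A short Python sketch of the same arithmetic and decision, with function names invented for the illustration:

```python
def rps_cpus_mask(cores):
    """Hex mask with one bit set per core, like printf "%x" $((2**CORES-1))."""
    return format(2 ** cores - 1, "x")

def rps_setting(hw_queues, cores):
    """Value the excerpt writes to rps_cpus: all cores, or 0 to disable RPS."""
    return rps_cpus_mask(cores) if hw_queues != cores else "0"

print(rps_cpus_mask(8))      # ff
print(rps_setting(8, 8))     # 0
print(rps_setting(4, 24))    # ffffff
```

With a mask of all ones every core is eligible to process packets from the queue; writing 0 turns RPS off for that queue.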
Without fastsocket, relying on RPS alone means the CPU that handles the interrupt and the CPU that processes the packet are not the same, which naturally causes CPU cache misses and a small performance hit. To avoid this, people turn to RFS (Receive Flow Steering).
With fastsocket none of that is necessary.
irqbalance conflicts with fastsocket and is forcibly disabled:
if ps aux | grep irqbalance | grep -v grep; then
info_msg "Disable irqbalance..."
# XXX Do we have a more moderate way to do this?
killall irqbalance > /dev/null 2>&1
fi
The script also sets the CPU affinity of the interrupts:
i=0
intr_list $IFACE $DRIVER | while read irq; do
cpuid_to_mask $((i%CORES)) | xargs -i echo {} > /proc/irq/$irq/smp_affinity
i=$((i+1))
done
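The round-robin assignment in the loop above can be sketched in Python. The helper `cpuid_to_mask` is not shown in the post, so its behaviour here (a hex bitmask with a single bit set) is an assumption, and the IRQ numbers are invented for the example.

```python
def cpuid_to_mask(cpu):
    """Hex bitmask selecting a single CPU (assumed behaviour of the script's helper)."""
    return format(1 << cpu, "x")

def plan_irq_affinity(irqs, cores):
    """Assign IRQs to cores round-robin, as i % CORES does in the shell loop."""
    return {irq: cpuid_to_mask(i % cores) for i, irq in enumerate(irqs)}

# Hypothetical IRQ numbers of one NIC's queues on a 4-core machine.
for irq, mask in sorted(plan_irq_affinity([64, 65, 66, 67, 68, 69], 4).items()):
    print("echo %s > /proc/irq/%d/smp_affinity" % (mask, irq))
```

When there are more queues than cores, the modulo wraps around, so the fifth and sixth IRQs share cores 0 and 1 with the first two.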
If the iptables service is present, the script politely suggests disabling it, since it costs performance. It also warns when the open file limit is no greater than 1024; see my earlier post on how to raise it.
For servers not using fastsocket, the popular ways to scale and tune the network stack around the NIC are RSS, RPS, RFS and XPS, which exploit multiple CPU cores and the NIC hardware for parallel/concurrent processing. The table below summarizes them; it will do for a quick look.
| | RSS (Receive Side Scaling) | RPS (Receive Packet Steering) | RFS (Receive Flow Steering) | Accelerated RFS (Accelerated Receive Flow Steering) | XPS (Transmit Packet Steering) |
|---|---|---|---|---|---|
| Problem solved | NIC and driver support | RSS implemented in software | Keeps the interrupt for a packet and the application that consumes it on the same CPU | Hardware-accelerated load balancing based on RFS | Picks a transmit queue intelligently on multi-queue NICs for fast sending |
| Kernel support | Introduced in 2.6.36; needs hardware support | 2.6.35 | 2.6.35 | 2.6.35 | 2.6.38 |
| Recommendation | NIC queue count equal to the number of physical cores | Not needed on a multi-queue NIC where RSS is already configured | Requires the rps_sock_flow_entries and rps_flow_cnt settings | Requires acceleration support in both NIC and driver, with ntuple filtering enabled via ethtool | No effect on NICs with a single transmit queue; if there are fewer queues than CPUs, the CPUs sharing a queue should ideally share a cache with the CPU that handles that queue's transmit interrupts |
| fastsocket | NIC feature | Improved RPS with better performance | Present in the source, not covered in the docs | Not covered in the docs | Requires at least as many transmit queues as CPU cores |
| Direction | NIC receive | Kernel receive | CPU receive processing | Accelerated receive | NIC transmit |
For more detailed tuning, see the document Scaling in the Linux Networking Stack.
Also, if the NIC supports Flow Director Filters (there is a very entertaining animated introduction, Intel® Ethernet Flow Director, well worth watching), it can be combined with Fastsocket for further acceleration. For example, in Fastsocket's Redis long-lived connection test, enabling Flow Director gave a ?5% performance gain over disabling it.
Naturally, combining software and hardware does better still.
Further reading: An introduction to multi-queue NICs
These are my notes on fastsocket's NIC setup script.
That said, nic.sh is worth keeping around: whether or not you use fastsocket, it is a good choice for tuning the NICs of production servers.
The environment is CentOS 6.5 with the default kernel 2.6.32-431.el6.x86_64. All build and install steps below are run as root.
The first step, obviously, is to download the code, here into /opt:
git clone https://github.com/fastos/fastsocket.git
After downloading, enter its directory:
cd fastsocket/kernel
Since the kernel is involved, some options have to be configured before building. make config would be exhausting, with thousands of options to answer one by one; most of the time the defaults are fine:
make defconfig
Then build the kernel:
make
Building the kernel takes quite a while, at least ?0 minutes. Next come the required kernel modules, the fastsocket module among them:
make modules_install
¾~–译完æˆä¹‹åŽåQŒæœ€åŽä¸€æ¡è¾“出,会看刎ͼš
DEPMOD 2.6.32-431.17.1.el6.FASTSOCKET
fastsocketå†…æ ¸æ¨¡å—¾~–译好之åŽï¼Œéœ€è¦å®‰è£…å†…æ ¸ï¼š
make install
上é¢å‘½ä×o其实执行shell脚本˜q›è¡Œå®‰è£…åQ?/p>
sh /opt/fastsocket/kernel/arch/x86/boot/install.sh 2.6.32-431.17.1.el6.FASTSOCKET arch/x86/boot/bzImage \ System.map "/boot"
基本上,fastsocketå†…æ ¸æ¨¡å—å·²ç»æž„å¾å®‰è£…完毕了,但需è¦å‘ŠçŸ¥Linux¾pÈ»Ÿåœ¨ä¸‹‹Æ¡å¯åŠ¨çš„æ—¶å€™åˆ‡æ¢åˆ°æ–°ç¼–译的ã€åŒ…嫿œ‰fastsocket模å—çš„å†…æ ¸ã€?/p>
˜q™éƒ¨åˆ†éœ€è¦åœ¨/etc/grup.confä¸é…¾|®ï¼ŒçŽ°åœ¨çœ‹ä¸€ä¸‹å…¶æ–‡äšg内容åQ?/p>
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.32-431.17.1.el6.FASTSOCKET)
root (hd0,0)
kernel /vmlinuz-2.6.32-431.17.1.el6.FASTSOCKET ro root=/dev/mapper/vg_centos6-lv_root rd_NO_LUKS rd_NO_MD rd_LVM_LV=vg_centos6/lv_swap crashkernel=auto LANG=zh_CN.UTF-8 rd_LVM_LV=vg_centos6/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
initrd /initramfs-2.6.32-431.17.1.el6.FASTSOCKET.img
title CentOS (2.6.32-431.el6.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.32-431.el6.x86_64 ro root=/dev/mapper/vg_centos6-lv_root rd_NO_LUKS rd_NO_MD rd_LVM_LV=vg_centos6/lv_swap crashkernel=auto LANG=zh_CN.UTF-8 rd_LVM_LV=vg_centos6/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
initrd /initramfs-2.6.32-431.el6.x86_64.img
default=1 means the system currently boots the original kernel, which is the second title entry (entries are counted from 0). To switch to the new kernel, change it to default=0, save the file, and reboot for the change to take effect.
After the reboot, load the fastsocket module into the running system, here with its default options:
modprobe fastsocket
Then list the loaded modules to check that it worked:
lsmod | grep fastsocket
Output similar to the following means everything is OK:
fastsocket 39766 0
With the kernel module installed, build fastsocket's shared library:
cd /opt/fastsocket/library/
make
You may see some warnings; they are harmless:
gcc -g -shared -ldl -fPIC libsocket.c -o libfsocket.so -Wall
libsocket.c: In function 'fastsocket_init':
libsocket.c:59: warning: implicit declaration of function 'open'
libsocket.c: In function 'fastsocket_expand_fdset':
libsocket.c:109: warning: implicit declaration of function 'ioctl'
libsocket.c: In function 'accept':
libsocket.c:186: warning: pointer targets in assignment differ in signedness
libsocket.c: In function 'accept4':
libsocket.c:214: warning: pointer targets in assignment differ in signedness
Finally, the libfsocket.so library produced by gcc shows that the build succeeded.
OK, that completes the build and installation. Next up is testing with fastsocket's demo program.
The previous part covered building and installing the fastsocket kernel module. What follows is translated and adapted from fastsocket/demo/README.md.
Now for the translation.
The demo is a simple TCP server used to benchmark and profile the Linux kernel network stack, and of course to demonstrate Fastsocket's scalability and performance improvements.
The demo is built on epoll and non-blocking I/O. It only works well in multi-core mode: each process is bound to a different CPU core, starting from core 0, and handles client connections independently.
The demo has two working modes, server mode and proxy mode, both run below.
It is a deliberately naive TCP server meant for testing only. Requests and responses must each fit in a single packet, otherwise the program cannot handle them.
Build it as follows:
cd demo && make
The simplest way to run it is with the default configuration and no arguments:
./server
The options used in the runs below are -w (number of worker processes), -a (listen address) and -x (backend address for proxy mode).
Two things need attention before running, one of which involves the script directory.
Server mode needs at least two hosts:
Assume each host has 12 CPU cores and the network is set up roughly like this:
+--------------------+ +--------------------+
| Host A | | Host B |
| | | |
| 10.0.0.1/24 |-----| 10.0.0.2/24 |
| | | |
+--------------------+ +--------------------+
Steps for running on the two hosts:
Host B:
Run the web server mode on its own with 12 worker processes, matching the number of CPU cores:
./server -w 12 -a 10.0.0.2:80
Or, to measure the performance Fastsocket brings:
LD_PRELOAD=../library/libfsocket.so ./server -w 12 -a 10.0.0.2:80
Host A:
ab -n 1000000 -c 100 http://10.0.0.2:80/
N=12; for i in $(seq 1 $N); do ab -n 1000000 -c 100 http://10.0.0.2:80/ > /dev/null 2>&1; done
Proxy mode needs three machines:
Assume each machine has 12 CPU cores and the network is laid out as follows:
+--------------------+ +--------------------+ +--------------------+
| Host A | | Host B | | Host C |
| | | | | |
| 10.0.0.1/24 | | 10.0.0.2/24 | | 10.0.0.3/24 |
+---------+----------+ +---------+----------+ +----------+---------+
| | |
+---------+--------------------------+---------------------------+---------+
| switch |
+--------------------------------------------------------------------------+
The concrete steps are:
Host B:
./server -w 12 -a 10.0.0.2:80 -x 10.0.0.3:80
LD_PRELOAD=../library/libfsocket.so ./server -w 12 -a 10.0.0.2:80 -x 10.0.0.3:80
Host C:
./server -w 12 -a 10.0.0.3:80
Host A:
N=12; for i in $(seq 1 $N); do ab -n 1000000 -c 100 http://10.0.0.2:80/ > /dev/null 2>&1; done
That concludes the translation. Now for some hands-on testing based on the above.
First find the package that provides Apache's ab command:
yum provides /usr/bin/ab
You should see something like:
httpd-tools-2.2.15-39.el6.centos.x86_64 : Tools for use with the Apache HTTP Server
Installing it is all that is needed:
yum install httpd-tools
The test setup is Windows 7 Professional running VMware Workstation 10.04 with two identically configured CentOS 6.5 virtual machines, each with 2 GB of RAM and the same number of logical CPU cores.
The client installs Apache ab and runs 8 instances: for i in $(seq 1 8); do ab -n 10000 -c 100 http://192.168.192.16:80/ > /dev/null 2>&1; done
On the server, two runs are recorded: /opt/fast/server -w 8
LD_PRELOAD=../library/libfsocket.so ./server -w 8
Comparison of the two runs:
Run mode | Elapsed time (s) | Total handled | Average handled per second | Maximum |
---|---|---|---|---|
Standalone | 34s | 80270 | 2361 | 2674 |
With fastsocket loaded | 28s | 80399 | 2871 | 2964 |
The method is the same as above, with three identically configured servers (load generator + proxy + backend). In the first run the proxy is started on its own; in the second it is started with fastsocket preloaded.
Run mode | Elapsed time (s) | Total handled | Average handled per second | Maximum |
---|---|---|---|---|
First run, backend | 44s | 80189 | 1822 | 2150 |
First run, proxy | 44s | 80189 | 1822 | 2152 |
Second run, backend | 42s | 80051 | 1906 | 2188 |
Second run, proxy | 42s | 80051 | 1906 | 2167 |
Note: these figures come from virtual machines and do not represent real servers; treat them as indicative only.
Even on virtual machines, with a limited test environment, the fastsocket-based server model shows a gain in processing performance: total elapsed time, average handled per second, and the per-second maximum all improve.
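As a rough cross-check of the tables above, the per-second averages and the relative gain can be recomputed from the totals and elapsed times. This is only a sketch; the class and method names are made up for illustration.

```java
// Hypothetical helper: recompute throughput figures from the measured totals.
final class ThroughputGain {
    /** Requests handled per second. */
    static double perSecond(long total, long seconds) {
        return (double) total / seconds;
    }

    /** Relative improvement of `after` over `before`, in percent. */
    static double gainPercent(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        double plain = perSecond(80270, 34); // server mode, standalone
        double fast = perSecond(80399, 28);  // server mode, fastsocket preloaded
        System.out.printf("%.0f -> %.0f req/s (%.1f%% gain)%n",
                plain, fast, gainPercent(plain, fast));
    }
}
```

The same two calls applied to the proxy-mode totals (80189 in 44s, 80051 in 42s) give a much smaller gain, which matches the impression from the second table.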
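Preloading a shared library with LD_PRELOAD is a sharp tool but not a cure-all. LD_PRELOAD stops working in the following situations:
Things get complicated, so be careful.
Having studied and tested the demo part of the fastsocket source, the before/after comparison shows that fastsocket does improve processing performance.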
Network latency is a fact of life, but the online game industry has accumulated a great deal of solid experience in using strategies and techniques on the client to eliminate or hide the inconvenience it causes. The aim is to mask the real delay as far as possible while still rendering in real time, so that players get a fast, interactive, real-time experience.
As a result, players with somewhat higher latency can still play smoothly alongside others whose connections vary widely in quality.
Latency sets the lower bound on reaction time in a real-time game, but what matters most is that the client looks fluid. First-person shooters (FPS) work around the problem cleverly and manage fast real-time interaction under the network conditions of ordinary users (200ms).
What follows is what I have pieced together recently.
Early online games used a P2P topology to exchange data between players. The high latency of the P2P model cannot be hidden well in an FPS: every player's latency is determined by the worst-connected player in the session. As in the barrel theory, players with good low-latency connections are dragged down by those with poor high-latency ones, and in the end nobody is happy. On a LAN the latency problem is not noticeable. In addition, most of the game logic lives on the client, which makes cheating hard to prevent.
Client/server (C/S) online games:
In some cases the server can allow the client to execute movement locally and immediately; this approach is called client-side prediction.
For example, when the keyboard controls a character walking, the client can predict the movement over a very short window (say 1-3 seconds) from direction and acceleration. All of these commands are still sent to the server for validation (to stop cheats such as teleporting). Client prediction is not always accurate, so the server has to correct it (the server is God: "The server is the man!"). The correction may mean that the character's real path differs somewhat from the predicted one. The client can then use interpolation (roughly, rendering the character moving between two points) to smooth the change of position in the game world, rather than snapping the character from one position to another, which would look baffling.
Interpolation is sometimes called path compensation; they are the same thing. It involves quite a bit of mathematics, such as linear and cubic interpolation, as covered in the article "插值那些事" (on interpolation).
Summary: the client predicts, the server corrects, and the client fine-tunes with interpolation.
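The "client predicts, server corrects, client interpolates" loop can be illustrated with plain linear interpolation on one coordinate. This is a minimal sketch, not how any particular engine does it; the class name, the one-dimensional position and the four-frame blend are assumptions for illustration.

```java
// Hypothetical sketch: blend a predicted position toward the server's correction.
final class PositionSmoother {
    /** Linear interpolation from a to b; t is clamped to [0, 1]. */
    static double lerp(double a, double b, double t) {
        double c = Math.max(0.0, Math.min(1.0, t));
        return a + (b - a) * c;
    }

    public static void main(String[] args) {
        double predicted = 12.0; // where the client drew the character
        double corrected = 10.0; // where the server says it really is
        int frames = 4;          // spread the correction over a few frames
        for (int frame = 1; frame <= frames; frame++) {
            System.out.println(lerp(predicted, corrected, (double) frame / frames));
        }
    }
}
```

Spreading the correction over several frames is what avoids the visible snap from one position to another.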
For a group of interacting players whose connections vary in quality, some in-game actions may need a "lag compensation" strategy.
Lag compensation is a strategy executed on the game server: when processing a player's command, the server rolls back to the exact time the client sent it (it arrived late because of latency) and adjusts according to that client's situation. It sacrifices realism in damage resolution to restore realism in actions such as attacking, so it is essentially a trade-off.
Note that lag compensation does not happen on the client.
An example of lag compensation:
If lag compensation is disabled, many players will complain that they clearly hit their opponent yet did no damage.
Something gained, something lost: this seems a little unfair to low-latency players. They move quickly and may already have run around a corner and hidden behind a crate when they are hit (bullets appear to ignore cover, and the player is shot through a wall), which understandably annoys them.
Lag compensation favours high-latency players and may reduce the advantage of low-latency players, but it helps keep the game world balanced.
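The server-side rewind described above can be sketched with a small position history. This is an illustrative model only: the class, the one-dimensional positions and the tolerance-based hit test are assumptions, not taken from any game engine.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of lag compensation: judge a shot against where the
// target was when the shooter fired, not where it is when the command arrives.
final class LagCompensation {
    // server time in ms -> target position at that time
    private final TreeMap<Long, Double> history = new TreeMap<>();

    void record(long serverTimeMs, double position) {
        history.put(serverTimeMs, position);
    }

    /** Position at or just before the given time, or NaN if nothing is recorded. */
    double positionAt(long serverTimeMs) {
        Map.Entry<Long, Double> e = history.floorEntry(serverTimeMs);
        return e == null ? Double.NaN : e.getValue();
    }

    /** Rewind by the shooter's latency and test the aim against that position. */
    boolean hit(long serverNowMs, long shooterLatencyMs, double aim, double tolerance) {
        double then = positionAt(serverNowMs - shooterLatencyMs);
        return Math.abs(then - aim) <= tolerance; // false when `then` is NaN
    }
}
```

With positions recorded at 1000 ms (5.0) and 1100 ms (9.0), a shot aimed at 5.0 by a player with 100 ms latency counts as a hit at server time 1100, while the same shot with zero latency misses. That is exactly the "shot behind cover" effect the low-latency player complains about.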
The client and server need to synchronise clocks and learn each other's latency, for example with the steps described by Yun Feng (云风):
The client sends its local time to the server. On receiving the packet, the server replies with its own time attached. When the client receives the reply it can estimate the time the packet spent in transit. It then attaches its new local time and sends again, so the server can also learn more about the response time.
Both ends use steps like these to compute each other's latency and clock offset. They also set a threshold for "real-time" synchronisation: for example, interactions with latency below 10ms (0.01 seconds) are treated as happening simultaneously rather than as delayed.
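The timestamp exchange above boils down to a little arithmetic. The sketch below assumes the path is symmetric (half the round trip each way), which is an assumption of this example rather than something the text guarantees; all names are illustrative.

```java
// Hypothetical sketch: estimate round-trip time and clock offset from one exchange.
final class ClockSync {
    /** Round-trip time measured entirely on the client's clock. */
    static long roundTripMs(long clientSendMs, long clientRecvMs) {
        return clientRecvMs - clientSendMs;
    }

    /**
     * Estimated (server clock - client clock), assuming the reply spent half
     * the round trip in flight.
     */
    static long offsetMs(long clientSendMs, long serverTimeMs, long clientRecvMs) {
        long oneWay = roundTripMs(clientSendMs, clientRecvMs) / 2;
        return serverTimeMs + oneWay - clientRecvMs;
    }

    /** The 10 ms threshold mentioned in the text. */
    static boolean treatedAsSimultaneous(long latencyMs) {
        return latencyMs < 10;
    }
}
```

For instance, a request sent at client time 1000 and answered at client time 1080 with server time 5040 gives an 80 ms round trip and an offset of 4000 ms.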
Different kinds of games favour different protocols; there are many variations:
TCP assumes that packet loss is caused by insufficient local bandwidth (which is only one of the causes). Domestic ISPs, however, may drop packets when their own data-centre networks are congested. In that case it can pay to send quickly and compete for the channel rather than shrink the TCP window. UDP carries no window-shrinking burden, so this is easy to do.
FPS games that put real-time behaviour first (e.g. Quake, CS) generally use UDP over the WAN, because some packet loss is tolerable (if a packet is lost while the client is waiting, interpolation and similar techniques can paper over it). Data can be sent as soon as it is ready, and when no retransmission is involved UDP is a little faster than TCP. Some protocol control, such as ACKs, is added at the application layer on top of UDP.
Protocols are often mixed. An MMO client might first use HTTP to fetch the latest update; important information such as items and experience gained by a character goes over TCP; movements of nearby characters, NPC movement, skill animation commands and the like can go over UDP, where occasional loss matters little.
Online games use client-side prediction, interpolation and server-side lag compensation to hide or remove the confusion caused by latency on the player's connection. We may never get to work on game development, but learning good practice from another field may well inspire how we handle certain parts of our current work.
This episode is sponsored by the Korean space agency: we want to go far away and see what else is ours, smida. ------ Wang Dachui, "万万没想到"
Situation in production:
Improvements made:
Actual results:
Commands whose name carries an m prefix generally accept multiple keys, i.e. batch submission.
Explicit ones...
MSET key value [key value ...]
MSETNX key value [key value ...]
HMGET key field [field ...]
HMSET key field value [field value ...]
Ordinary ones (variadic)...
HDEL key field [field ...]
SREM key member [member ...]
RPUSH key value [value ...]
......
For more, see: http://redis.cn/commands.html
Official documentation: http://redis.io/topics/pipelining
When the front end receives too many requests and producers run too fast, buffering them temporarily in a queue works better: consumers take tasks straight from the queue, which decouples producers from consumers. This is the approach commonly used in the industry.
Being able to watch how the queue is being consumed makes things much clearer. A colleague added a monitor thread to the queue so that consumption can be seen at a glance.
The demo uses a Redis pipeline, a thread pool, test-data preparation, a producer-consumer queue and queue monitoring; once consumption finishes the program shuts down.
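import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;
/**
* Tested against Jedis 2.6
*
* @author nieyong
*
*/
public class TestJedisPipeline {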
private static final int NUM = 512;
private static final int MAX = 1000000; // 100W
private static JedisPool redisPool;
private static final ExecutorService pool = Executors.newCachedThreadPool();
protected static final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(
MAX); // 100W
private static volatile boolean finished = false; // read by worker threads, written by main
static {
JedisPoolConfig config = new JedisPoolConfig();
config.setMaxTotal(64); // commons-pool2 name for the former setMaxActive
config.setMaxIdle(64);
try {
redisPool = new JedisPool(config, "192.168.192.8", 6379, 10000,
null, 0);
} catch (Exception e) {
System.err.println("Init msg redis factory error! " + e.toString());
}
}
public static void main(String[] args) throws InterruptedException {
System.out.println("prepare test data 100W");
prepareTestData();
System.out.println("prepare test data done!");
// producer: simulate one million requests
pool.execute(new Runnable() {
@Override
public void run() {
for (int i = 0; i < MAX; i++) {
if (i % 3 == 0) {
queue.offer("del_key key_" + i);
} else {
queue.offer("get_key key_" + i);
}
}
}
});
// worker threads: 2 * number of CPU cores
int threadNum = 2 * Runtime.getRuntime().availableProcessors();
for (int i = 0; i < threadNum; i++)
pool.execute(new ConsumerTask());
pool.execute(new MonitorTask());
Thread.sleep(10 * 1000);// 10sec
System.out.println("going to shutdown server ...");
setFinished(true);
pool.shutdown();
pool.awaitTermination(1, TimeUnit.MILLISECONDS);
System.out.println("close!");
}
private static void prepareTestData() {
Jedis redis = redisPool.getResource();
Pipeline pipeline = redis.pipelined();
for (int i = 0; i < MAX; i++) {
pipeline.set("key_" + i, (i * 2 + 1) + "");
if (i % (NUM * 2) == 0) {
pipeline.sync();
}
}
pipeline.sync();
redisPool.returnResource(redis);
}
// queue monitor: watches the producer-consumer queue
private static class MonitorTask implements Runnable {
@Override
public void run() {
while (!Thread.interrupted() && !isFinished()) {
System.out.println("queue.size = " + queue.size());
try {
Thread.sleep(500); // 0.5 second
} catch (InterruptedException e) {
break;
}
}
}
}
// consumer
private static class ConsumerTask implements Runnable {
@Override
public void run() {
while (!Thread.interrupted() && !isFinished()) {
if (queue.isEmpty()) {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
}
continue;
}
List<String> tasks = new ArrayList<String>(NUM);
queue.drainTo(tasks, NUM);
if (tasks.isEmpty()) {
continue;
}
Jedis jedis = redisPool.getResource();
boolean broken = false; // set when the connection must not be reused
try {
Pipeline pipeline = jedis.pipelined();
List<Response<String>> resultList = new ArrayList<Response<String>>(
tasks.size());
List<String> waitDeleteList = new ArrayList<String>(
tasks.size());
for (String task : tasks) {
String key = task.split(" ")[1];
if (task.startsWith("get_key")) {
resultList.add(pipeline.get(key));
waitDeleteList.add(key);
} else if (task.startsWith("del_key")) {
pipeline.del(key);
}
}
pipeline.sync();
// walk through the replies
for (int i = 0; i < resultList.size(); i++) {
resultList.get(i).get();
// handle value here ...
// System.out.println("get value " + value);
}
// values have been read, now delete the keys
for (String key : waitDeleteList) {
pipeline.del(key);
}
pipeline.sync();
} catch (Exception e) {
broken = true;
} finally {
// hand the connection back exactly once
if (broken) {
redisPool.returnBrokenResource(jedis);
} else {
redisPool.returnResource(jedis);
}
}
}
}
}
private static boolean isFinished(){
return finished;
}
private static void setFinished(boolean bool){
finished = bool;
}
}
The code is only a demonstration; production code would need proper exception handling and so on.
If requests can be merged and submitted in batches, a lot of network bandwidth, CPU and other resources can be saved. Anyone with a similar problem may want to consider it.
A newly provisioned server runs kernel 2.6.32. After moving the existing TCP server onto this Linux server unchanged, dmesg shows a large number of SYN flooding warnings:
possible SYN flooding on port 8080. Sending cookies.
Simply adjusting "net.ipv4.tcp_max_syn_backlog", which worked on the old 2.6.18 kernel, no longer has any effect on 2.6.32.
The only option was to read the 2.6.32 source again; the notes follow.
The direct conclusions are in the summary at the end, so if you are in a hurry you can skip straight there.
net/socket.c:
SYSCALL_DEFINE2(listen, int, fd, int, backlog)
{
struct socket *sock;
int err, fput_needed;
int somaxconn;
sock = sockfd_lookup_light(fd, &err, &fput_needed);
if (sock) {
somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
if ((unsigned)backlog > somaxconn)
backlog = somaxconn;
err = security_socket_listen(sock, backlog);
if (!err)
err = sock->ops->listen(sock, backlog);
fput_light(sock->file, fput_needed);
}
return err;
}
net/ipv4/af_inet.c:
/*
* Move a socket into listening state.
*/
int inet_listen(struct socket *sock, int backlog)
{
struct sock *sk = sock->sk;
unsigned char old_state;
int err;
lock_sock(sk);
err = -EINVAL;
if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
goto out;
old_state = sk->sk_state;
if (!((1 << old_state) & (TCPF_CLOSE | TCPF_LISTEN)))
goto out;
/* Really, if the socket is already in listen state
* we can only allow the backlog to be adjusted.
*/
if (old_state != TCP_LISTEN) {
err = inet_csk_listen_start(sk, backlog);
if (err)
goto out;
}
sk->sk_max_ack_backlog = backlog;
err = 0;
out:
release_sock(sk);
return err;
}
inet_listen calls inet_csk_listen_start, where the backlog argument reappears under a new name as the constant parameter nr_table_entries.
net/ipv4/inet_connection_sock.c:
int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
{
struct inet_sock *inet = inet_sk(sk);
struct inet_connection_sock *icsk = inet_csk(sk);
int rc = reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries);
if (rc != 0)
return rc;
sk->sk_max_ack_backlog = 0;
sk->sk_ack_backlog = 0;
inet_csk_delack_init(sk);
/* There is race window here: we announce ourselves listening,
* but this transition is still not validated by get_port().
* It is OK, because this socket enters to hash table only
* after validation is complete.
*/
sk->sk_state = TCP_LISTEN;
if (!sk->sk_prot->get_port(sk, inet->num)) {
inet->sport = htons(inet->num);
sk_dst_reset(sk);
sk->sk_prot->hash(sk);
return 0;
}
sk->sk_state = TCP_CLOSE;
__reqsk_queue_destroy(&icsk->icsk_accept_queue);
return -EADDRINUSE;
}
The next function (reqsk_queue_alloc, in net/core/request_sock.c) deals with connections in the TCP SYN_RECV state: they are mid-handshake, i.e. half-open, waiting for the third step of the handshake from the peer.
/*
* Maximum number of SYN_RECV sockets in queue per LISTEN socket.
* One SYN_RECV socket costs about 80bytes on a 32bit machine.
* It would be better to replace it with a global counter for all sockets
* but then some measure against one socket starving all other sockets
* would be needed.
*
* It was 128 by default. Experiments with real servers show, that
* it is absolutely not enough even at 100conn/sec. 256 cures most
* of problems. This value is adjusted to 128 for very small machines
* (<=32Mb of memory) and to 1024 on normal or better ones (>=256Mb).
* Note : Dont forget somaxconn that may limit backlog too.
*/
int reqsk_queue_alloc(struct request_sock_queue *queue,
unsigned int nr_table_entries)
{
size_t lopt_size = sizeof(struct listen_sock);
struct listen_sock *lopt;
nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
nr_table_entries = max_t(u32, nr_table_entries, 8);
nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
lopt_size += nr_table_entries * sizeof(struct request_sock *);
if (lopt_size > PAGE_SIZE)
lopt = __vmalloc(lopt_size,
GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
PAGE_KERNEL);
else
lopt = kzalloc(lopt_size, GFP_KERNEL);
if (lopt == NULL)
return -ENOMEM;
for (lopt->max_qlen_log = 3;
(1 << lopt->max_qlen_log) < nr_table_entries;
lopt->max_qlen_log++);
get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd));
rwlock_init(&queue->syn_wait_lock);
queue->rskq_accept_head = NULL;
lopt->nr_table_entries = nr_table_entries;
write_lock_bh(&queue->syn_wait_lock);
queue->listen_opt = lopt;
write_unlock_bh(&queue->syn_wait_lock);
return 0;
}
The variable to watch is nr_table_entries. Inside reqsk_queue_alloc it is an unsigned, modifiable value, and its range is bounded.
Suppose the actual kernel parameter is:
net.ipv4.tcp_max_syn_backlog = 65535
and the backlog passed in (not larger than net.core.somaxconn = 65535) is 8102. Then:
// minimum of the listen() backlog and sysctl_max_syn_backlog: 8102
nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
// maximum of nr_table_entries and 8: still 8102
nr_table_entries = max_t(u32, nr_table_entries, 8);
// round 8102 + 1 up to the next power of two: 8192
nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
which gives max_qlen_log = 13, since 2^13 = 8192:
for (lopt->max_qlen_log = 3;
(1 << lopt->max_qlen_log) < nr_table_entries;
lopt->max_qlen_log++);
The listen_sock structure then holds a half-open connection table of nr_table_entries slots, 8192 in this example.
/** struct listen_sock - listen state
*
* @max_qlen_log - log_2 of maximal queued SYNs/REQUESTs
*/
struct listen_sock {
u8 max_qlen_log;
/* 3 bytes hole, try to use */
int qlen;
int qlen_young;
int clock_hand;
u32 hash_rnd;
u32 nr_table_entries;
struct request_sock *syn_table[0];
};
In summary, 2^max_qlen_log is the length of the half-open connection queue: once qlen reaches it, the queue counts as full.
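The sizing arithmetic of reqsk_queue_alloc can be replayed outside the kernel to check a given configuration. The following is a sketch that mirrors the three steps and the max_qlen_log loop from the 2.6.32 code quoted above, with the somaxconn clamp from listen() applied first; the Java class and method names are made up for illustration.

```java
// Hypothetical helper mirroring the half-open queue sizing in Linux 2.6.32.
final class SynQueueSizing {
    /** Smallest power of two >= n (for n >= 1), like roundup_pow_of_two(). */
    static int roundupPowOfTwo(int n) {
        int p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    /** somaxconn clamp from listen(), then the steps of reqsk_queue_alloc(). */
    static int tableEntries(int backlog, int somaxconn, int maxSynBacklog) {
        int n = Math.min(backlog, somaxconn);
        n = Math.min(n, maxSynBacklog);
        n = Math.max(n, 8);
        return roundupPowOfTwo(n + 1);
    }

    /** Same loop as the kernel: start at 3 and grow until 2^log >= entries. */
    static int maxQlenLog(int tableEntries) {
        int log = 3;
        while ((1 << log) < tableEntries) {
            log++;
        }
        return log;
    }

    public static void main(String[] args) {
        int entries = tableEntries(8102, 65535, 65535);
        System.out.println(entries + " " + maxQlenLog(entries)); // prints "8192 13"
    }
}
```

Note that the result is only about twice the input when the input is itself a power of two (128 becomes 256); for 8102 it is 8192.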
Now back to the function that reports SYN flooding:
net/ipv4/tcp_ipv4.c
#ifdef CONFIG_SYN_COOKIES
static void syn_flood_warning(struct sk_buff *skb)
{
static unsigned long warntime;
if (time_after(jiffies, (warntime + HZ * 60))) {
warntime = jiffies;
printk(KERN_INFO
"possible SYN flooding on port %d. Sending cookies.\n",
ntohs(tcp_hdr(skb)->dest));
}
}
#endif
This is where it is called (with some code trimmed):
int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
......
#ifdef CONFIG_SYN_COOKIES
int want_cookie = 0;
#else
#define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
#endif
......
/* TW buckets are converted to open requests without
* limitations, they conserve resources and peer is
* evidently real one.
*/
// is the half-open queue full && isn == 0
if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
#ifdef CONFIG_SYN_COOKIES
if (sysctl_tcp_syncookies) {
want_cookie = 1;
} else
#endif
goto drop;
}
/* Accept backlog is full. If we have already queued enough
* of warm entries in syn queue, drop request. It is better than
* clogging syn queue with openreqs with exponentially increasing
* timeout.
*/
if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
goto drop;
req = inet_reqsk_alloc(&tcp_request_sock_ops);
if (!req)
goto drop;
......
if (!want_cookie)
TCP_ECN_create_request(req, tcp_hdr(skb));
if (want_cookie) {
#ifdef CONFIG_SYN_COOKIES
syn_flood_warning(skb);
req->cookie_ts = tmp_opt.tstamp_ok;
#endif
isn = cookie_v4_init_sequence(sk, skb, &req->mss);
} else if (!isn) {
......
}
......
}
The function that decides whether the half-open queue is full is the key; here is the rule it applies:
include/net/inet_connection_sock.h:
static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
{
return reqsk_queue_is_full(&inet_csk(sk)->icsk_accept_queue);
}
include/net/request_sock.h:
static inline int reqsk_queue_is_full(const struct request_sock_queue *queue)
{
// shift right by max_qlen_log bits
return queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log;
}
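A non-zero result (1) means the half-open queue is full.
The above only analyses the condition for the half-open queue being full. The point is that the backlog passed in by the application is critical: if it is too small, the result becomes 1 very easily.
For example, with somaxconn = 128, sysctl_max_syn_backlog = 4096 and backlog = 511, the backlog is first clamped to 128, so nr_table_entries = 256 and max_qlen_log = 8. As soon as the queue holds 256 half-open connections, 256 >> 8 = 1 and it is reported as full.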
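The shift-based test is easy to try in isolation. A minimal sketch, assuming nothing beyond the `qlen >> max_qlen_log` expression quoted above (the Java wrapper itself is hypothetical):

```java
// Hypothetical wrapper around the kernel's "qlen >> max_qlen_log" fullness test.
final class SynQueueFull {
    /** True once qlen has reached 2^maxQlenLog. */
    static boolean isFull(int qlen, int maxQlenLog) {
        return (qlen >> maxQlenLog) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isFull(255, 8)); // prints "false"
        System.out.println(isFull(256, 8)); // prints "true"
    }
}
```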
How to set the backlog depends on the application, which has to supply the value when it calls listen.
The TCP server uses Netty 3.7, an older version. If we do not specify a backlog by hand, JDK 1.6 defaults it to 50.
The evidence, from java.net.ServerSocket:
public void bind(SocketAddress endpoint, int backlog) throws IOException {
if (isClosed())
throw new SocketException("Socket is closed");
if (!oldImpl && isBound())
throw new SocketException("Already bound");
if (endpoint == null)
endpoint = new InetSocketAddress(0);
if (!(endpoint instanceof InetSocketAddress))
throw new IllegalArgumentException("Unsupported address type");
InetSocketAddress epoint = (InetSocketAddress) endpoint;
if (epoint.isUnresolved())
throw new SocketException("Unresolved address");
if (backlog < 1)
backlog = 50;
try {
SecurityManager security = System.getSecurityManager();
if (security != null)
security.checkListen(epoint.getPort());
getImpl().bind(epoint.getAddress(), epoint.getPort());
getImpl().listen(backlog);
bound = true;
} catch(SecurityException e) {
bound = false;
throw e;
} catch(IOException e) {
bound = false;
throw e;
}
}
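With plain java.net, the 50-entry default can be avoided by passing the backlog explicitly to bind. This is a small sketch using only standard JDK calls; the class name, port 0 (let the OS pick a free port) and the value 1024 are choices made for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch: bind a listening socket with an explicit backlog instead of the default 50.
final class ExplicitBacklog {
    static ServerSocket listen(int port, int backlog) throws IOException {
        ServerSocket server = new ServerSocket(); // unbound socket
        server.setReuseAddress(true);
        // the second argument ends up as the backlog of the listen() call
        server.bind(new InetSocketAddress(port), backlog);
        return server;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = listen(0, 1024)) {
            System.out.println("listening on port " + server.getLocalPort());
        }
    }
}
```

The kernel still clamps whatever is passed here to net.core.somaxconn.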
In Netty, the backlog is handled here:
org/jboss/netty/channel/socket/DefaultServerSocketChannelConfig.java:
@Override
public boolean setOption(String key, Object value) {
if (super.setOption(key, value)) {
return true;
}
if ("receiveBufferSize".equals(key)) {
setReceiveBufferSize(ConversionUtil.toInt(value));
} else if ("reuseAddress".equals(key)) {
setReuseAddress(ConversionUtil.toBoolean(value));
} else if ("backlog".equals(key)) {
setBacklog(ConversionUtil.toInt(value));
} else {
return false;
}
return true;
}
Since the backlog has to be specified by hand, it can be done like this:
bootstrap.setOption("backlog", 8102); // a generous value is fine: the kernel compares it with net.core.somaxconn and uses the smaller one
This is less smart than Netty 4.0; see: http://www.aygfsteel.com/yongboy/archive/2014/07/30/416373.html
On Linux 2.6.32, provided the server is not actually under a SYN flooding attack, the following can be raised as appropriate:
# sysctl -w takes effect immediately for sockets that call listen() afterwards
sysctl -w net.core.somaxconn=32768
sysctl -w net.ipv4.tcp_max_syn_backlog=65535
# sysctl -p reloads /etc/sysctl.conf; put both settings there too if they should survive a reboot
sysctl -p
Also, do not forget to change the backlog that the TCP server passes to listen. If it is left unset or is too small, SYN flooding warnings may still appear. Start with something like 1024, observe for a while, and raise it gradually as the real situation requires.
Whatever you set, the effective backlog is bounded by:
backlog <= net.core.somaxconn
The half-open queue length is:
half-open queue length = roundup_pow_of_two(max(min(backlog, net.ipv4.tcp_max_syn_backlog), 8) + 1)
Also, when SYN flooding is reported, the number of sockets in SYN_RECV indicates that the half-open queue has filled up. It can be checked with:
ss -ant | awk 'NR>1 {++s[$1]} END {for(k in s) print k,s[k]}'
Thanks to a colleague on the operations team for this handy command.
Recently dmesg on a production server has been printing warnings like this:
possible SYN flooding on port 8080. Sending cookies.
At first glance it looks like a DoS attack, but on closer analysis there are only about 1000 or so of these per day, which feels like a normal, acceptable level.
The next step is to find where the message comes from and why. What follows is based on the Linux 2.6.18 kernel.
net/ipv4/tcp_ipv4.c:
#ifdef CONFIG_SYN_COOKIES
static void syn_flood_warning(struct sk_buff *skb)
{
static unsigned long warntime; // zero on first use, afterwards the jiffies value of the last warning
if (time_after(jiffies, (warntime + HZ * 60))) {
warntime = jiffies;
printk(KERN_INFO
"possible SYN flooding on port %d. Sending cookies.\n",
ntohs(skb->h.th->dest));
}
}
#endif
Clearly CONFIG_SYN_COOKIES was enabled when this Linux kernel was built.
The time_after macro is defined as:
#define time_after(a,b) \
(typecheck(unsigned long, a) && \
typecheck(unsigned long, b) && \
((long)(b) - (long)(a) < 0))
It compares two unsigned time values and decides which one comes later, staying correct even when the counter wraps around.
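Why the macro subtracts and then looks at the sign, instead of comparing directly, shows up once the counter wraps. The sketch below models the tick counter with a 32-bit Java int (the kernel uses unsigned long); the class name is hypothetical.

```java
// Sketch of wraparound-safe tick comparison, modelled on time_after(a, b).
final class JiffiesCompare {
    /** True if tick a is later than tick b, even across a counter wrap. */
    static boolean timeAfter(int a, int b) {
        return (b - a) < 0; // the subtraction wraps, only the sign of the difference matters
    }

    public static void main(String[] args) {
        int before = -10; // 0xFFFFFFF6: ten ticks before the counter wraps to 0
        int after = 5;    // five ticks after the wrap
        System.out.println(timeAfter(after, before));                    // prints "true"
        System.out.println(Integer.compareUnsigned(after, before) > 0);  // prints "false": naive compare gets it wrong
    }
}
```

The trick works as long as the two ticks are less than half the counter range apart.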
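What jiffies expands to in the user-space RAID6 test shim (inside the kernel proper, jiffies is a tick counter that advances HZ times per second):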
# define jiffies raid6_jiffies()
#define HZ 1000
......
static inline uint32_t raid6_jiffies(void)
{
struct timeval tv;
gettimeofday(&tv, NULL);
return tv.tv_sec*1000 + tv.tv_usec/1000; // seconds*1000 + microseconds/1000, i.e. milliseconds
}
With that in mind, look at syn_flood_warning again:
static void syn_flood_warning(struct sk_buff *skb)
{
static unsigned long warntime; // zero on first use, afterwards the jiffies value of the last warning
if (time_after(jiffies, (warntime + HZ * 60))) {
warntime = jiffies;
printk(KERN_INFO
"possible SYN flooding on port %d. Sending cookies.\n",
ntohs(skb->h.th->dest));
}
}
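warntime is static: it is zero on the first call and afterwards holds the jiffies value of the previous warning. A warning is printed only when more than HZ*60 jiffies (60 seconds) have passed since the previous one, so the message appears at most once per minute.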
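The same "at most one warning per interval" pattern can be sketched in user space. This is an illustration only: milliseconds stand in for jiffies, and the class and method names are invented for the example.

```java
// Hypothetical user-space analogue of the rate limit in syn_flood_warning().
final class RateLimitedWarning {
    private static final long INTERVAL_MS = 60_000L; // plays the role of HZ * 60
    private long lastWarnMs;                          // zero-initialised, like the static warntime

    /** Prints and returns true only if more than INTERVAL_MS passed since the last warning. */
    boolean warn(long nowMs, int port) {
        if (nowMs - (lastWarnMs + INTERVAL_MS) > 0) { // same shape as time_after(now, last + interval)
            lastWarnMs = nowMs;
            System.out.println("possible SYN flooding on port " + port + ". Sending cookies.");
            return true;
        }
        return false;
    }
}
```

This also explains why a count of roughly a thousand messages a day says little about the real rate of overflow: the limiter caps the log at 1440 lines per day no matter how often the queue fills.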
A few articles on time_after and jiffies:
http://wenku.baidu.com/view/c75658d480eb6294dd886c4e.html
Pay attention to the conditions under which want_cookie becomes 1.
net/ipv4/tcp_ipv4.c:
int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
struct inet_request_sock *ireq;
struct tcp_options_received tmp_opt;
struct request_sock *req;
__u32 saddr = skb->nh.iph->saddr;
__u32 daddr = skb->nh.iph->daddr;
__u32 isn = TCP_SKB_CB(skb)->when; // when is set to 0 in tcp_v4_rcv()
struct dst_entry *dst = NULL;
#ifdef CONFIG_SYN_COOKIES
int want_cookie = 0;
#else
#define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
#endif
/* Never answer to SYNs send to broadcast or multicast */
if (((struct rtable *)skb->dst)->rt_flags &
(RTCF_BROADCAST | RTCF_MULTICAST))
goto drop;
/* TW buckets are converted to open requests without
* limitations, they conserve resources and peer is
* evidently real one.
*/
// if (half-open queue is full && isn == 0)
if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
#ifdef CONFIG_SYN_COOKIES
if (sysctl_tcp_syncookies) { // net.ipv4.tcp_syncookies = 1
want_cookie = 1;
} else
#endif
goto drop;
}
/* Accept backlog is full. If we have already queued enough
* of warm entries in syn queue, drop request. It is better than
* clogging syn queue with openreqs with exponentially increasing
* timeout.
*/
// if (accept queue is full && more than one half-open request whose SYN-ACK has not been retransmitted yet)
if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
goto drop;
......
tcp_openreq_init(req, &tmp_opt, skb);
ireq = inet_rsk(req);
ireq->loc_addr = daddr;
ireq->rmt_addr = saddr;
ireq->opt = tcp_v4_save_options(sk, skb);
if (!want_cookie)
TCP_ECN_create_request(req, skb->h.th);
if (want_cookie) { // reached when the half-open queue is full
#ifdef CONFIG_SYN_COOKIES
syn_flood_warning(skb);
#endif
isn = cookie_v4_init_sequence(sk, skb, &req->mss);
} else if (!isn) {
......
}
/* Kill the following clause, if you dislike this way. */
// what sysctl_max_syn_backlog does when net.ipv4.tcp_syncookies is not set
else if (!sysctl_tcp_syncookies &&
(sysctl_max_syn_backlog - inet_csk_reqsk_queue_len(sk) <
(sysctl_max_syn_backlog >> 2)) &&
(!peer || !peer->tcp_ts_stamp) &&
(!dst || !dst_metric(dst, RTAX_RTT))) {
/* Without syncookies last quarter of
* backlog is filled with destinations,
* proven to be alive.
* It means that we continue to communicate
* to destinations, already remembered
* to the moment of synflood.
*/
LIMIT_NETDEBUG(KERN_DEBUG "TCP: drop open "
"request from %u.%u.%u.%u/%u\n",
NIPQUAD(saddr),
ntohs(skb->h.th->source));
dst_release(dst);
goto drop_and_free;
}
isn = tcp_v4_init_sequence(sk, skb);
}
tcp_rsk(req)->snt_isn = isn;
if (tcp_v4_send_synack(sk, req, dst))
goto drop_and_free;
if (want_cookie) {
reqsk_free(req);
} else {
inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
}
return 0;
drop_and_free:
reqsk_free(req);
drop:
return 0;
}
In short, if the system reports:
possible SYN flooding on port 8080. Sending cookies.
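and the volume is small, it is a reminder to check whether sysctl_max_syn_backlog is too low:
sysctl -a | grep 'max_syn_backlog'
Try doubling it:
# takes effect immediately; add it to /etc/sysctl.conf as well if it should survive a reboot
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -p
If the process cannot pick up the change by itself, restart the application so that it runs with the new kernel parameter, then keep observing for a while.
I have probably not yet understood the full scope of the tcp_max_syn_backlog parameter; I will write more when I find the time.
Some things are easily forgotten: you know them one day and have handed them back to the Duke of Zhou two days later. Better to write the odds and ends down together so they can be looked up directly later.
What follows is based on the Linux 2.6.18 kernel.
For the exact meaning of this parameter, start with the description of listen for Linux sockets: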
man listen
#include <sys/socket.h>
int listen(int sockfd, int backlog);
The int backlog argument of listen is the length of the queue that holds sockets which have completed the three-way handshake and are fully established, waiting to be accepted.
Usually we choose the backlog ourselves. If the value we set is larger than net.core.somaxconn, it is reduced to net.core.somaxconn. To follow the system setting instead of hard-coding a value, read /proc/sys/net/core/somaxconn.
net/socket.c:
/*
* Perform a listen. Basically, we allow the protocol to do anything
* necessary for a listen, and if that works, we mark the socket as
* ready for listening.
*/
int sysctl_somaxconn = SOMAXCONN;
asmlinkage long sys_listen(int fd, int backlog)
{
struct socket *sock;
int err, fput_needed;
if ((sock = sockfd_lookup_light(fd, &err, &fput_needed)) != NULL) {
if ((unsigned) backlog > sysctl_somaxconn)
backlog = sysctl_somaxconn;
err = security_socket_listen(sock, backlog);
if (!err)
err = sock->ops->listen(sock, backlog);
fput_light(sock->file, fput_needed);
}
return err;
}
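One detail of sys_listen is easy to miss: the comparison is done on the unsigned value, so a negative backlog is treated as a huge number and also ends up clamped to somaxconn. A small sketch of that rule (the Java helper is hypothetical; `Integer.compareUnsigned` is standard since Java 8):

```java
// Hypothetical helper reproducing the clamp "(unsigned) backlog > sysctl_somaxconn".
final class ListenBacklog {
    static int effectiveBacklog(int backlog, int somaxconn) {
        return Integer.compareUnsigned(backlog, somaxconn) > 0 ? somaxconn : backlog;
    }

    public static void main(String[] args) {
        System.out.println(effectiveBacklog(511, 128)); // prints "128"
        System.out.println(effectiveBacklog(50, 128));  // prints "50"
        System.out.println(effectiveBacklog(-1, 128));  // prints "128"
    }
}
```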
For example, the widely used Netty (4.0) framework reads /proc/sys/net/core/somaxconn at startup on Linux and uses that value as the backlog argument when it calls the system's listen:
int somaxconn = 3072;
BufferedReader in = null;
try {
in = new BufferedReader(new FileReader("/proc/sys/net/core/somaxconn"));
somaxconn = Integer.parseInt(in.readLine());
logger.debug("/proc/sys/net/core/somaxconn: {}", somaxconn);
} catch (Exception e) {
// Failed to get SOMAXCONN
} finally {
if (in != null) {
try {
in.close();
} catch (Exception e) {
// Ignored.
}
}
}
SOMAXCONN = somaxconn;
......
private volatile int backlog = NetUtil.SOMAXCONN;
So it is usually well worth raising net.core.somaxconn somewhat.
To set it:
sysctl -w net.core.somaxconn=65535
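On a Linux machine with plenty of memory, 65535 is generally fine.
sysctl -w applies the value immediately; to keep it across reboots, put it in /etc/sysctl.conf and load it with sysctl -p. Then restart your server application so that it calls listen again.
The kernel's sysctl.c describes net.core.netdev_max_backlog as: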
number of unprocessed input packets before kernel starts dropping them, default 300
As I understand it: when a network interface receives packets faster than the kernel can process them, this is the maximum number allowed to sit in the input queue; anything beyond that is dropped.
Where it takes effect, net/core/dev.c:
int netif_rx(struct sk_buff *skb)
{
struct softnet_data *queue;
unsigned long flags;
/* if netpoll wants it, pretend we never saw it */
if (netpoll_rx(skb))
return NET_RX_DROP;
if (!skb->tstamp.off_sec)
net_timestamp(skb);
/*
* The code is rearranged so that the path is the most
* short when CPU is congested, but is still operating.
*/
local_irq_save(flags);
queue = &__get_cpu_var(softnet_data);
__get_cpu_var(netdev_rx_stat).total++;
if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
if (queue->input_pkt_queue.qlen) {
enqueue:
dev_hold(skb->dev);
__skb_queue_tail(&queue->input_pkt_queue, skb);
local_irq_restore(flags);
return NET_RX_SUCCESS;
}
netif_rx_schedule(&queue->backlog_dev);
goto enqueue;
}
__get_cpu_var(netdev_rx_stat).dropped++;
local_irq_restore(flags);
kfree_skb(skb);
return NET_RX_DROP;
}
Reading through the code above gives a rough idea of when netdev_max_backlog comes into play.
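The enqueue-or-drop decision in netif_rx can be modelled in a few lines. This is a simplified sketch with invented names, not kernel code; it keeps the `<=` comparison from the quoted source, which means the queue can hold one packet more than the configured limit.

```java
import java.util.ArrayDeque;

// Hypothetical model of the per-CPU input queue guarded by netdev_max_backlog.
final class InputPacketQueue {
    private final ArrayDeque<String> queue = new ArrayDeque<>();
    private final int maxBacklog;
    long total;   // packets seen
    long dropped; // packets discarded because the queue was over the limit

    InputPacketQueue(int maxBacklog) {
        this.maxBacklog = maxBacklog;
    }

    /** Returns true if the packet was queued, false if it was dropped. */
    boolean enqueue(String packet) {
        total++;
        if (queue.size() <= maxBacklog) { // same test as qlen <= netdev_max_backlog
            queue.addLast(packet);
            return true;
        }
        dropped++;
        return false;
    }

    /** Stand-in for the softirq draining one packet. */
    String poll() {
        return queue.pollFirst();
    }
}
```

With a limit of 2, three packets are accepted and the fourth is dropped until the consumer drains the queue.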