Feed: http://www.aygfsteel.com/yongboy/category/54837.html (notes on bits and pieces of work and study), zh-cn, last updated Mon, 01 Jun 2015 03:42:48 GMT

SO_REUSEPORT Study Notes: Addendum
nieyong, Wed, 25 Feb 2015 14:23:00 GMT
http://www.aygfsteel.com/yongboy/archive/2015/02/25/423037.html

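Preface

My ability is limited, and there is still a lot (the difference between SO_REUSEADDR and SO_REUSEPORT, for example) that I could not explain clearly in a single post. This addendum fills some of those gaps and gives me something to review later.

SO_REUSEADDR vs. SO_REUSEPORT

The two are different things and not really comparable. They are easy to mix up, and my own summary is not great, so I recommend the StackOverflow question "Socket options SO_REUSEADDR and SO_REUSEPORT, how do they differ?", whose answer is very thorough.

In short:

  • An application that sets SO_REUSEADDR avoids being unable to reuse a port because TCP connections linger too long in the TIME_WAIT state. This matters most in the instant between an application shutting down and restarting.
  • SO_REUSEPORT is more powerful: multiple processes/threads belonging to the same user (to prevent port hijacking) share one port, and the kernel balances incoming packets across those processes/threads on behalf of the application.

If in doubt, set both; they do not conflict.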
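The advice to set both options can be sketched in a few lines of Python. This is only an illustration: the helper name `make_reusable_listener`, the loopback address, and the backlog of 128 are assumptions made for the example, and SO_REUSEPORT is only applied when the platform exposes it (Linux 3.9 or later).

```python
import socket


def make_reusable_listener(port, host="127.0.0.1"):
    """Create a TCP listener with SO_REUSEADDR and, where available, SO_REUSEPORT.

    The function name and defaults are hypothetical, chosen for this sketch.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Lets the port be rebound while old connections sit in TIME_WAIT.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Lets several sockets of the same user share the port.
    if hasattr(socket, "SO_REUSEPORT"):
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind((host, port))
    s.listen(128)
    return s


if __name__ == "__main__":
    listener = make_reusable_listener(0)  # port 0: let the kernel pick a free port
    print("listening on port", listener.getsockname()[1])
    listener.close()
```

Both options are set before bind(), because they influence how the kernel validates the bind.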

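Using SO_REUSEPORT from multiple threads in Netty

The previous post covered SO_REUSEPORT with multiple processes bound to the same port, where the number of processes can be adjusted as needed. This post covers multiple threads inside a single process binding the same port, based on Netty 4.0.25 plus the Epoll native transport, which is also quite practical.

TCP server: multiple threads in one process bound to the same port

This is a PING-PONG sample application: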

public void run() throws Exception {
    final EventLoopGroup bossGroup = new EpollEventLoopGroup();
    final EventLoopGroup workerGroup = new EpollEventLoopGroup();
    ServerBootstrap b = new ServerBootstrap();

    b.group(bossGroup, workerGroup)
            .channel(EpollServerSocketChannel.class)
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(SocketChannel ch) throws Exception {
                    ch.pipeline().addLast(
                            new StringDecoder(CharsetUtil.UTF_8),
                            new StringEncoder(CharsetUtil.UTF_8),
                            new PingPongServerHandler());
                }
            }).option(ChannelOption.SO_REUSEADDR, true)
            .option(EpollChannelOption.SO_REUSEPORT, true)
            .childOption(ChannelOption.SO_KEEPALIVE, true);

    // Bind the same port once per CPU core; every bind opens its own listening socket.
    int workerThreads = Runtime.getRuntime().availableProcessors();
    for (int i = 0; i < workerThreads; ++i) {
        ChannelFuture future = b.bind(port).await();
        if (!future.isSuccess()) {
            throw new Exception(String.format("fail to bind on port = %d.", port), future.cause());
        }
    }

    Runtime.getRuntime().addShutdownHook(new Thread() {
        @Override
        public void run() {
            workerGroup.shutdownGracefully();
            bossGroup.shutdownGracefully();
        }
    });
}

Package it as a jar, run it on CentOS 7, and check the file handles opened on that port.

# lsof -i:8000
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    3515 root   42u  IPv6  29040      0t0  TCP *:irdmi (LISTEN)
java    3515 root   43u  IPv6  29087      0t0  TCP *:irdmi (LISTEN)
java    3515 root   44u  IPv6  29088      0t0  TCP *:irdmi (LISTEN)
java    3515 root   45u  IPv6  29089      0t0  TCP *:irdmi (LISTEN)

Same process, but the open file descriptors are all different.

UDP server: multiple threads bound to the same port

/**
 * UDP quote-of-the-moment server: a single process binds several threads to the same port.
 */
public final class QuoteOfTheMomentServer {

    private static final int PORT = Integer.parseInt(System.getProperty("port", "9000"));

    public static void main(String[] args) throws Exception {
        final EventLoopGroup group = new EpollEventLoopGroup();

        Bootstrap b = new Bootstrap();
        b.group(group).channel(EpollDatagramChannel.class)
                .option(EpollChannelOption.SO_REUSEPORT, true)
                .handler(new QuoteOfTheMomentServerHandler());

        // Bind the same UDP port once per CPU core.
        int workerThreads = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < workerThreads; ++i) {
            ChannelFuture future = b.bind(PORT).await();
            if (!future.isSuccess()) {
                throw new Exception(String.format("Fail to bind on port = %d.", PORT), future.cause());
            }
        }

        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                group.shutdownGracefully();
            }
        });
    }
}

@Sharable
class QuoteOfTheMomentServerHandler extends SimpleChannelInboundHandler<DatagramPacket> {

    private static final String[] quotes = {
            "Where there is love there is life.",
            "First they ignore you, then they laugh at you, then they fight you, then you win.",
            "Be the change you want to see in the world.",
            "The weak can never forgive. Forgiveness is the attribute of the strong.", };

    private static String nextQuote() {
        int quoteId = ThreadLocalRandom.current().nextInt(quotes.length);
        return quotes[quoteId];
    }

    @Override
    public void channelRead0(ChannelHandlerContext ctx, DatagramPacket packet) throws Exception {
        if ("QOTM?".equals(packet.content().toString(CharsetUtil.UTF_8))) {
            ctx.write(new DatagramPacket(
                    Unpooled.copiedBuffer("QOTM: " + nextQuote(), CharsetUtil.UTF_8),
                    packet.sender()));
        }
    }

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) {
        ctx.flush();
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        cause.printStackTrace();
    }
}

Again, check which file handles are open on the port:

# lsof -i:9000
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    3181 root   26u  IPv6  27188      0t0  UDP *:cslistener
java    3181 root   27u  IPv6  27217      0t0  UDP *:cslistener
java    3181 root   28u  IPv6  27218      0t0  UDP *:cslistener
java    3181 root   29u  IPv6  27219      0t0  UDP *:cslistener

Summary

The above covers a few cases of binding multiple threads to the same port with Netty + SO_REUSEPORT, recorded here for future reference.

SO_REUSEPORT Study Notes
nieyong, Thu, 12 Feb 2015 08:50:00 GMT
http://www.aygfsteel.com/yongboy/archive/2015/02/12/422893.html

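Preface

This post records my notes and takeaways from studying SO_REUSEPORT. At the end it also introduces a small tool, bindp, that lets existing programs benefit from the new feature.

Problems with current Linux network applications

To take advantage of multiple cores, network applications on Linux typically use one of the following multi-process/multi-thread server models:

  1. A single thread does listen/accept and dispatches work to multiple worker threads. CPU load is no longer the problem, but:
    • The single listener thread still becomes the bottleneck when handling a high rate of incoming connections
    • CPU cache line misses on the socket structure are severe
  2. All worker threads call accept() on the same server socket, which has its own problems:
    • Heavy lock contention when many threads access the server socket
    • Under high load, work is distributed unevenly across threads, sometimes with an imbalance as high as ?:1
    • It causes CPU cache line bouncing
    • Latency is high on busy CPUs

Both models can pin threads to CPU cores, but both still suffer from:

  • A single listener thread that becomes the bottleneck when connections arrive at a high rate
  • Cache line bouncing
  • Difficulty balancing load across CPUs
  • Performance that does not improve as cores are added

For example, HTTP CPS (Connections Per Second) throughput does not grow linearly with the number of CPU cores:

[Figure: HTTP CPS throughput versus number of CPU cores]

Linux kernel 3.9 introduced SO_REUSEPORT, which solves most of the problems above.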

What problems does SO_REUSEPORT solve?

A passage from the Linux man documentation describes its purpose:

The new socket option allows multiple sockets on the same host to bind to the same port, and is intended to improve the performance of multithreaded network server applications running on top of multicore systems.

SO_REUSEPORT lets multiple processes or threads bind to the same port, which improves server performance. It addresses the following:

  • Multiple sockets may bind()/listen() on the same TCP/UDP port
    • Each thread owns its own server socket
    • There is no lock contention on the server socket
  • Load balancing is done in the kernel
  • For security, sockets listening on the same port must belong to the same user

The core of the implementation comes down to three points:

  • Extend the socket options with SO_REUSEPORT, which sets reuseport on the socket.
  • Modify the bind system call so that sockets can bind to the same IP and port.
  • Modify the handling of new connections so that the listener lookup can choose evenly among multiple socks listening on the same IP and port.

For a code walkthrough, see the referenced article "多个进程绑定相同端口的实现分析 [Google Patch]" (an analysis of how multiple processes bind the same port).
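From user space, the effect of these three changes is that several independent listening sockets can share one port. The following Python sketch illustrates that; the helper name `open_reuseport_listeners` and the loopback address are assumptions for the example, and it presumes Linux 3.9 or later, where `socket.SO_REUSEPORT` exists.

```python
import socket


def open_reuseport_listeners(count, host="127.0.0.1"):
    """Bind `count` independent listening sockets to one TCP port via SO_REUSEPORT.

    Hypothetical helper for illustration; requires Linux >= 3.9.
    """
    listeners = []
    port = 0  # 0 lets the kernel choose a free port for the first socket
    for _ in range(count):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))
        s.listen(16)
        port = s.getsockname()[1]  # later sockets bind the same port again
        listeners.append(s)
    return listeners


if __name__ == "__main__":
    socks = open_reuseport_listeners(4)
    print("port:", socks[0].getsockname()[1])
    print("file descriptors:", [s.fileno() for s in socks])
    for s in socks:
        s.close()
```

Each socket has its own file descriptor and its own accept queue, which is what removes the contention on a single shared server socket.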

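Balanced processing across CPUs, horizontal scaling

Previously, multiple child processes were created with fork. With SO_REUSEPORT, fork is no longer needed for several processes to listen on the same port: each process has its own accept socket fd, and when a new connection is established the kernel wakes only one process to accept it, keeping the wake-ups balanced.

The model is simple and easier to maintain. Process management is decoupled from application logic, and control over horizontal scaling is handed to the programmer or administrator, who can start and stop processes as the situation requires. This adds flexibility.

It also suggests a finer-grained way to think about horizontal scaling: how many threads are appropriate, whether state is shared, and how to reduce each process's resource dependencies. It fits stateless server architectures best.

Testing new features or running multiple versions side by side

New features are easy to test: different versions of the same program run at the same time, and the results decide whether the new version replaces the old one.

Clients notice nothing on the surface, because all of this happens on the server side.

Seamless server restart/switchover

The idea: after iterating on a new version that must go live, start a new process for it and shut down the old version's process a little later, so the service keeps running without interruption and the transition is smooth. This resembles the hot code upgrade that Erlang provides at the language level.

The idea is good, but in practice it is not that smooth. Fortunately there is an open-source tool, hubtime, built on a SIGHUP signal handler + SO_REUSEPORT + LD_PRELOAD, that makes this easy. Check it out and give it a try if you need it.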

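Known issues with SO_REUSEPORT

SO_REUSEPORT distributes packets based on the packet's 4-tuple {src ip, src port, dst ip, dst port} and the number of server sockets currently bound to the port. If that number changes, the kernel may deliver packets from a client connection that the previous server socket should have handled (for example half-open connections in the middle of the three-way handshake, or connections that completed the handshake but are still waiting in the queue) to a different server socket, which can make the client request fail. Common ways to deal with this:

  • Use a fixed number of server sockets, and avoid changing it while the load is heavy
  • Allow multiple server sockets to share the TCP request table
  • Do not use the 4-tuple as the hash value for choosing the local socket; instead pick a socket that belongs to the same CPU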
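Why a change in the number of sockets disturbs in-flight connections can be seen in a small model. The sketch below is not the kernel's algorithm: it assumes a CRC32 of the 4-tuple taken modulo the socket count as a stand-in hash, and the function names and addresses are made up for illustration.

```python
import zlib


def pick_listener(flow, listener_count):
    """Map a (src_ip, src_port, dst_ip, dst_port) tuple to a listener index.

    CRC32 modulo the listener count is an assumed stand-in for the kernel hash.
    """
    key = "{}:{}->{}:{}".format(*flow).encode("ascii")
    return zlib.crc32(key) % listener_count


def remapped_fraction(flows, old_count, new_count):
    """Fraction of flows whose chosen listener changes when the count changes."""
    moved = sum(
        1 for f in flows
        if pick_listener(f, old_count) != pick_listener(f, new_count)
    )
    return moved / len(flows)


# 1000 hypothetical client flows towards one server port.
flows = [("10.0.0.1", 40000 + i, "10.0.0.2", 80) for i in range(1000)]

if __name__ == "__main__":
    print("unchanged count:", remapped_fraction(flows, 4, 4))
    print("4 -> 5 sockets:", remapped_fraction(flows, 4, 5))
```

With a fixed count every flow keeps its listener, while adding a fifth socket moves a share of the flows to a different listener, which is the situation the first mitigation avoids.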

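Working together with RFS/RPS/XPS-mq gives further performance gains:

  • Server threads are pinned to CPUs
  • RPS steers TCP SYN packets to the corresponding CPU core
  • The TCP connection is accept()ed by the thread pinned to that CPU
  • XPS-mq (Transmit Packet Steering for multiqueue) binds transmit queues to CPUs for sending data
  • RFS/RPS ensures that subsequent packets of the same connection are delivered to the same CPU
  • If the NIC receive queues are already bound to CPUs, RFS/RPS does not need to be configured
  • Check whether the hardware supports these features

The goal is for a packet's hard and soft interrupts, reception, and processing to stay on one CPU core, so that work proceeds in parallel and resources are used as fully as possible.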

SO_REUSEPORT is not a cure-all

SO_REUSEPORT solves the problem of multiple processes binding and listening on the same port, but according to tests by Lin Xiaofeng of Sina it still does not scale linearly across cores as one would hope.

See Fastsocket for improvements built on top of it.

Tengine supports SO_REUSEPORT

Taobao's Tengine already supports SO_REUSEPORT. Its test report includes a simple test that shows the performance gain from SO_REUSEPORT.

With SO_REUSEPORT, the most visible effect is that requests are much less likely to be dropped under pressure, and the load across CPUs stays even.

Does Java support it?

JDK 1.6 does not support it at the language level. I have not used later versions yet, so I will not comment on them.

Netty 3 and 4 do not support SO_REUSEPORT by default, but Netty 4.0.19 and later provide an epoll native transport, wrapped separately through JNI and available only on Linux, which can configure options such as SO_REUSEPORT that Java NIO does not expose. These options are defined in io.netty.channel.epoll.EpollChannelOption.

Using the epoll native transport on Linux brings the benefits of the kernel's networking enhancements. See the Native transports documentation for usage.

Using the epoll native transport is simple; just swap a few class names:

NioEventLoopGroup → EpollEventLoopGroup
NioEventLoop → EpollEventLoop
NioServerSocketChannel → EpollServerSocketChannel
NioSocketChannel → EpollSocketChannel

For example, a PING-PONG server could look like this:

public void run() throws Exception {
    EventLoopGroup bossGroup = new EpollEventLoopGroup();
    EventLoopGroup workerGroup = new EpollEventLoopGroup();
    try {
        ServerBootstrap b = new ServerBootstrap();
        ChannelFuture f = b
                .group(bossGroup, workerGroup)
                .channel(EpollServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    public void initChannel(SocketChannel ch)
                            throws Exception {
                        ch.pipeline().addLast(
                                new StringDecoder(CharsetUtil.UTF_8),
                                new StringEncoder(CharsetUtil.UTF_8),
                                new PingPongServerHandler());
                    }
                }).option(ChannelOption.SO_REUSEADDR, true)
                .option(EpollChannelOption.SO_REUSEPORT, true)
                .childOption(ChannelOption.SO_KEEPALIVE, true).bind(port)
                .sync();
        f.channel().closeFuture().sync();
    } finally {
        workerGroup.shutdownGracefully();
        bossGroup.shutdownGracefully();
    }
}

If you would rather avoid this effort and want existing Java/Netty applications to benefit from SO_REUSEPORT on Linux kernel >= 3.9 without any code changes, try bindp, which is cheaper. It is covered below.

bindp: adding SO_REUSEPORT to existing applications

bindp is a small program I wrote earlier that binds an existing program to a specified IP address and port. It avoids hard-coding these values and also makes testing more convenient.

I have now modified it slightly so that applications that do not hard-code SO_REUSEPORT can also get this kernel enhancement on Linux 3.9 and later.

The requirements are:

  1. A Linux kernel (>= 3.9) that supports SO_REUSEPORT
  2. REUSE_PORT=1 must be set

If these conditions are not met, the feature has no effect.

Usage example:

REUSE_PORT=1 BIND_PORT=9999 LD_PRELOAD=./libbindp.so java -server -jar pingpongserver.jar &

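You can of course run the command several times as needed, so that multiple processes listen on the same port and the service scales horizontally on a single machine.

Demonstration

Two Python scripts make a quick prototype: two processes listen on the same port, 10000, and return different content to clients. This is just for fun.

server_v1.py, a simple PING-PONG: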

# -*- coding:UTF-8 -*-

import socket
import os

PORT = 10000
BUFSIZE = 1024

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', PORT))
s.listen(1)

while True:
    conn, addr = s.accept()
    # Read at most BUFSIZE bytes of the request.
    data = conn.recv(BUFSIZE)
    # Encode the reply so the script runs under both Python 2 and Python 3.
    reply = 'Connected to server[%s] from client[%s]\n' % (os.getpid(), addr)
    conn.send(reply.encode('utf-8'))
    conn.close()

s.close()

server_v2.py, which prints the current time:

# -*- coding:UTF-8 -*-

import socket
import time
import os

PORT = 10000
BUFSIZE = 1024

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', PORT))
s.listen(1)

while True:
    conn, addr = s.accept()
    # Read at most BUFSIZE bytes of the request.
    data = conn.recv(BUFSIZE)
    # Encode the reply so the script runs under both Python 2 and Python 3.
    reply = 'server[%s] time %s\n' % (os.getpid(), time.ctime())
    conn.send(reply.encode('utf-8'))
    conn.close()

s.close()

Run both versions with the help of bindp:

REUSE_PORT=1 LD_PRELOAD=/opt/bindp/libbindp.so python server_v1.py &
REUSE_PORT=1 LD_PRELOAD=/opt/bindp/libbindp.so python server_v2.py &

Simulate 10 client requests:

for i in {1..10};do echo "hello" | nc 127.0.0.1 10000;done

And the result:

Connected to server[3139] from client[('127.0.0.1', 48858)]
server[3140] time Thu Feb 12 16:39:12 2015
server[3140] time Thu Feb 12 16:39:12 2015
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48862)]
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48864)]
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48866)]
Connected to server[3139] from client[('127.0.0.1', 48867)]

As the output shows, the load is distributed evenly, with each process handling 50% of the requests.
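The 50/50 split can be checked by counting the replies per server process. The snippet below is a small sketch that tallies the ten output lines shown above by the pid inside `server[...]`; the function name `count_by_server` is made up for the example.

```python
import re
from collections import Counter

# The ten replies printed by the test run above.
OUTPUT = """\
Connected to server[3139] from client[('127.0.0.1', 48858)]
server[3140] time Thu Feb 12 16:39:12 2015
server[3140] time Thu Feb 12 16:39:12 2015
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48862)]
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48864)]
server[3140] time Thu Feb 12 16:39:12 2015
Connected to server[3139] from client[('127.0.0.1', 48866)]
Connected to server[3139] from client[('127.0.0.1', 48867)]
"""


def count_by_server(text):
    """Count replies per server pid, taken from the 'server[pid]' marker."""
    return Counter(re.findall(r"server\[(\d+)\]", text))


if __name__ == "__main__":
    for pid, hits in sorted(count_by_server(OUTPUT).items()):
        print(pid, hits)
```

Both pids, 3139 and 3140, appear five times each in the ten replies.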

It is only a toy, but an interesting one :))

Using bindp

For more usage instructions, see the README.

Fastsocket Study Notes: Summary
nieyong, Thu, 05 Feb 2015 07:21:00 GMT
http://www.aygfsteel.com/yongboy/archive/2015/02/05/422760.html

Preface

The previous, rather long-winded posts introduced Fastsocket from several angles, a bit like the blind men and the elephant. My understanding is limited and there is more to study, but it is time to wrap up with a summary.

Background: the kernel has become the bottleneck

When a Linux server handles only a small number of requests, performance is not a concern. Under massive request volume, however, the Linux kernel's TCP/IP processing becomes the bottleneck. For example, when Sina sampled one of its HAProxy servers, ?0% of CPU time was spent in the kernel, leaving the application only a small share of CPU cycles.

A detailed analysis of the HAProxy system showed that most CPU resources were consumed in the kernel, and that on multi-core platforms the kernel incurs heavy synchronization overhead in network protocol stack processing.

Tests on multi-core machines likewise showed that HTTP CPS (Connections Per Second) throughput does not grow linearly with the number of CPU cores:

[Figure: HTTP CPS throughput versus number of CPU cores]

Linux TCP calls before kernel 3.9

[Figure: Linux TCP call flow before kernel 3.9]

  • The TCP socket implementation before kernel 3.9:
  • The bind system call binds the socket to a port and adds it to the bhash list of the global tcp_hashinfo
  • Every bind call searches this bhash list; if the port is already taken, the kernel makes bind fail
  • listen pre-allocates memory for TCP connections according to the queue size set by the user
  • An application can listen only once on a given port, so there is only one queue holding established connections
  • After listen, nginx forks several workers. Each worker inherits the listening socket, creates its own epoll fd, and adds the listen fd and the fds of accepted connections to that epoll fd
  • When new connections arrive, however, the nginx workers must take turns accepting them
  • With a large number of short-lived connections, accept clearly becomes a bottleneck
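The bind behaviour described in this list, where a second bind on an occupied port fails, is easy to observe from user space. The following Python sketch assumes plain TCP sockets with no reuse options on the loopback interface; the function name `second_bind_error` is made up for the example.

```python
import errno
import socket


def second_bind_error(host="127.0.0.1"):
    """Bind two plain TCP sockets to one port; return the errno of the second bind.

    Returns None if the second bind unexpectedly succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the kernel pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # no SO_REUSEPORT: the port is taken
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_error()
    print("second bind failed with EADDRINUSE:", code == errno.EADDRINUSE)
```

This is the single-listener constraint that forces all workers to share one accept queue.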

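Problems in the Linux network stack

  • TCP processing and multiple cores

    • For a single TCP connection, the interrupt may be handled on one CPU core while the application processes the data on another
    • Handling on different CPU cores brings lock contention and CPU cache misses (uneven, fluctuating load)
    • Multiple processes listening on one TCP socket share a single listen queue
    • The global hash table used for connection management suffers from resource contention
    • With the epoll I/O model, multiple processes waiting on accept cause the thundering herd problem
  • Heavy synchronization overhead in the Linux VFS

    • Sockets are managed by the VFS
    • The VFS needs synchronization for inodes and dentries
    • A socket only needs to exist in memory; it is not a file system object in the strict sense and does not need an inode or dentry
    • At the code level, unnecessary regular locks are skipped while sufficient compatibility is kept

Improvements made by Fastsocket

  1. The complete processing of each TCP connection is local to one CPU, which avoids resource contention
  2. The full BSD socket API is preserved

CPUs share no data and each processes its TCP connections independently and in parallel, which is the main reason for its efficiency. The architecture diagram shows the improvements:

[Figure: Fastsocket architecture diagram]

The diagram shows the overall structure clearly: kernel space and user space communicate through ioctl. As I recall, netmap, in its rewritten NIC drivers, also uses ioctl to pass data straight through to user space, which is more efficient still, but it has no complete TCP/IP stack.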

Fastsocket TCP call flow

[Figure: Fastsocket TCP call flow]

  • Multiple processes can listen on the same port at the same time
  • The shared library libfsocket.so intercepts system calls such as socket, bind, and listen and handles them inside the library
  • For the listen system call, fastsocket records the fd. When the application adds this fd to an epoll fdset, libfsocket.so uses ioctl to clone, for that process, the socket, sock, and file resources associated with the listen fd
  • The kernel module calls bind and listen again on the cloned socket
  • When the bind system call detects another process binding to a port that is already bound, it performs the relevant checks
  • If the checks pass, the sock is recorded in a list associated with the port; through that list all socks bound to the same port can be found
  • A sock is associated with an fd, and a process holds the fd, so all the resources are now linked together
  • When the new process calls the listen system call again, the fastsocket kernel allocates an accept queue for its associated sock
  • As a result, multiple processes have multiple accept queues, which avoids CPU cache misses
  • fastsocket can bind each listening and accepting process to a CPU core specified by the user
  • If the user specifies none, fastsocket binds the process to an idle CPU core by default

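Fastsocket short-connection performance

In Sina's tests on a 24-core server running CentOS 6.5, Fastsocket improved the connections-per-second figures of Nginx and HAProxy dramatically, by 290% and ?20% respectively. This shows that Fastsocket gives TCP connections a fast processing path. In addition, with help from hardware features:

  • With Intel Hyper-Threading, a further ?0% performance gain is possible
  • With NIC Flow-Director support, throughput of the HAProxy proxy server can increase by 15%

The official Fastsocket V1.0 release has been running in Sina's production environment as a proxy server since month ? of 2014, so it is worth considering for adoption. This version benefits the following environments most:

  • The server has at least 8 CPU cores
  • Short-lived connections are used heavily
  • Most CPU cycles are spent in network soft interrupts and socket system calls
  • The application uses non-blocking I/O based on epoll
  • The application uses multiple processes that accept connections independently

For multi-threaded applications, refer to the practical advice given in the sample applications.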

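Nginx test server configuration

  • The number of nginx worker processes equals the number of CPU cores
  • HTTP keep-alive is disabled
  • The test client http_load fetches a 64-byte static file from nginx with a concurrency of 500 * number of CPU cores
  • In-memory caching of static files is enabled to rule out disk effects
  • accept_mutex must be disabled (accept from multiple cores causes lock contention, which the fastsocket kernel module removes)

The test chart shows that:

  1. On the ?4-core server Fastsocket reaches ?75K connections/second, a 21x improvement
  2. CentOS 6.5 stops scaling linearly once the core count reaches ?2, and drops to ?59K CPS at 24 cores
  3. Linux kernel 3.13 reaches ?83K CPS at ?4 cores, nearly twice the throughput of CentOS 6.5, but shows a scalability bottleneck beyond ?2 cores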

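Key HAProxy configuration

  • The number of worker processes equals the number of CPU cores
  • RFD (Receive Flow Deliver) must be enabled
  • HTTP keep-alive must be disabled
  • The test client http_load uses a concurrency of 500 * number of CPU cores
  • The backend server responds with a ?4-byte message

Test results:

  • fastsocket shows remarkable scalability
  • At 24 cores, Linux kernel 3.13 achieves ?39K CPS
  • At 24 cores, CentOS 6.5 with Fastsocket achieves a throughput of 370K CPS

[Figure: Fastsocket throughput]

Results from a production deployment

[Figure: Fastsocket Online]

These are results from an 8-core production server over ?4 hours. Figure (a) shows CPU utilization before deploying fastsocket, and figure (b) shows CPU utilization after deploying it. The benefits Fastsocket brings:

  • Load is balanced across CPU cores
  • Average CPU utilization drops by ?0%
  • HAProxy processing capacity increases by 85%

I do hope Sina publishes more data on this.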

Long-lived connection support is under development

Support for long-lived connections will take a while longer. Which kind of long-lived connection will be supported: application servers with millions of connections, or Redis? Probably the latter. Work is under way but there is no schedule. The features being worked on are:

  1. Network stack customization
    • SKB-Pool: each CPU core has a pre-allocated skb pool that replaces the kernel slab
      • Per-core skb pool
      • Merged skb header and data
      • Local pool and recycled pool (Flow-Director)
    • Fast-Epoll
      • TCP connections are rarely shared between processes
      • The epoll entry is stored in the file structure, saving the red-black tree lookup on each epoll_ctl call
  2. Cross-layer design
    • Direct-TCP: packets belonging to an established socket skip the routing process
      • Record input route information in the TCP socket
      • Look up the socket directly before the network stack
      • Read input route information from the socket
      • Mark the packet as routed
    • Receive-CPU-Selection: similar to RFS, but lighter, more precise, and faster
      • The application marks the current CPU id in the socket
      • Look up the socket directly before the network stack
      • Read the CPU id from the socket and deliver accordingly
    • RPS-Framework: before packets enter the network stack, developers can customize packet delivery rules outside the kernel module, extending RPS

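Redis test results

Test environment:

  • CPU: Intel E5 2640 v2 (6 core) * 2
  • NIC: Intel X520

Redis configuration:

  • Persistent TCP connections
  • 8 Redis instances bound to different ports
  • ? CPU cores are used, with processes pinned to cores

Test results:

  • RSS only: ?0% throughput increase
  • With the NIC Flow-Director feature enabled: 45% throughput increase

Note, however:

  • This is still at the experimental testing stage
  • It supplements V1.0; Nginx and HAProxy benefit as well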

Fastsocket v1.1

V1.1 will add support for long-lived connections, which will greatly benefit server applications like Redis. Since there is no concrete schedule, we can only wait.

Planned future optimizations

  1. Avoid copy operations on context switches (zero-copy)
  2. Improve the interrupt mechanism to reduce interrupts
  3. Support batched submission to reduce system calls
  4. Submit the work to the Linux kernel mainline
  5. HugeTLB/HugePage and so on

A brief comparison of Fastsocket with mTCP and others

I call it a comparison, but I actually took the table from the mTCP paper and added a Fastsocket row. It shows how people keep pushing forward.
Types | Accept queue | Conn. Locality | Socket API | Event Handling | Packet I/O | Application Modification | Kernel Modification
PSIO, DPDK, PF_RING, netmap | No TCP stack (spans Accept queue through Event Handling) | | | | Batched | No interface for transport layer | No (NIC driver)
Linux-2.6 | Shared | None | BSD socket | Syscalls | Per packet | Transparent | No
Linux-3.9 | Per-core | None | BSD socket | Syscalls | Per packet | Add option SO_REUSEPORT | No
Affinity-Accept | Per-core | Yes | BSD socket | Syscalls | Per packet | Transparent | Yes
MegaPipe | Per-core | Yes | lwsocket | Batched syscalls | Per packet | Event model to completion I/O | Yes
FlexSC, VOS | Shared | None | BSD socket | Batched syscalls | Per packet | Change to use new API | Yes
mTCP | Per-core | Yes | User-level socket | Batched function calls | Batched | Socket API to mTCP API | No (NIC driver)
Fastsocket | Per-core | Yes | BSD socket | Ioctl + kernel calls | Per packet | Transparent | No

This gives a rough overall picture and makes comparison easier, but it is only a snapshot; the pursuit of performance keeps moving forward.

Trying a deployment

Fastsocket was developed for well-known server programs such as Nginx and HAProxy. If your workload consists of a large number of short-lived connections requesting small files, with no hard requirement for keep-alive (short connections are about a quick request/response followed by a close), administrators can give Fastsocket a try. As for deployment strategy, deploy it selectively on a few machines as an experiment and look at the results.

Summary

This series ends here for now. Going forward, I naturally hope Fastsocket ships long-lived connection support soon, along with further performance gains :))

Fastsocket Study Notes: Kernel
nieyong, Wed, 04 Feb 2015 06:22:00 GMT
http://www.aygfsteel.com/yongboy/archive/2015/02/04/422732.html

Preface

My earlier analysis of Fastsocket has slowly piled up into a few rough posts. Sticking with something can feel tedious at times, but having chosen it, I have to push on. Enough chatter, back to the topic. This post continues from the previous one on the kernel module and records my notes on the Fastsocket kernel.

Fastsocket builds on SO_REUSEPORT support

Linux kernel 3.9 lets multiple processes or threads bind to the same IP and port over TCP/UDP, which is SO_REUSEPORT. At the kernel level each thread/process also gets its own socket, so CPU cores no longer fight over a lock to reach a shared accept queue. It shows up in the sock_common structure defined in fastsocket/kernel/net/sock.h:

unsigned char          skc_reuse:4;
unsigned char          skc_reuseport:4;

Several socket.h files (for example fastsocket/kernel/include/asm/socket.h) define the value of SO_REUSEPORT:

#define SO_REUSEPORT     15

SO_REUSEPORT also appears in both the sock_setsockopt and sock_getsockopt functions in fastsocket/kernel/net/core/sock.c.

In sock_setsockopt:

case SO_REUSEADDR:
  sk->sk_reuse = valbool;
  break;
case SO_REUSEPORT:
  sk->sk_reuseport = valbool;
  break;

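In the body of sock_getsockopt:

case SO_REUSEADDR:
  v.val = sk->sk_reuse;
  break;
case SO_REUSEPORT:
  v.val = sk->sk_reuseport;
  break;

Before SO_REUSEPORT was supported, event-driven servers competed for resources.

Afterwards, processing can be regarded as parallel.

Fastsocket does not reinvent the wheel; it optimizes further on top of SO_REUSEPORT.

Later I plan to write a small shared library so that programs that do not hard-code SO_REUSEPORT can also enjoy true port reuse on Linux kernel >= 3.9.

Fastsocket architecture diagram

[Figure: Fastsocket architecture diagram]

The sections below walk through the kernel layer from top to bottom, following the architecture diagram.

Improvements to the virtual file system (VFS)

The synchronization overhead of the Linux kernel VFS is heavy:

  • The VFS needs synchronization for inodes and dentries
  • A socket, however, only needs to exist in memory. It is not a file system object in the strict sense, needs no path, and does not need locking for an inode and dentry
  • At the code level, unnecessary regular locks are skipped while sufficient compatibility is kept

Commits: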

a209dfc vfs: dont chain pipe/anon/socket on superblock s_inodes list
4b93688 fs: improve scalability of pseudo filesystems

The VFS improvements account for more than ?0% of the total performance gain, a very noticeable effect.

Local Listen Table

With multiple cores and multiple receive queues, the stock Linux protocol stack can listen on only one socket, and every connection that has completed the three-way handshake but has not yet been accepted by the application goes into that socket's accept queue. The accept system call must take connections off this queue serially. Under high concurrency the cores contend for it, which becomes a performance bottleneck and slows down connection establishment.

With the Local Listen Table, fastsocket clones the listening socket for each CPU core and stores the clone in that core's local table, so cores do not compete on accept. The following description is quoted:

  • Each core has a listen socket table. When the application establishes connections, the process calls local_listen(), which takes two arguments: the socket FD and the core number. A new socket is copied from the original (global) listen socket into the per-core local socket table. All of this is transparent to the application: the socket FD it sees is an abstraction that hides the underlying implementation.
  • When a TCP SYN arrives, the kernel first looks for a matching listen socket in the local listen table. If one is found, the socket is delivered to a core through NIC RSS; otherwise the kernel looks in the global listen table.
  • For fault tolerance, when a process crashes its local listen socket is closed and incoming connections are directed to the global listen socket, so another process can handle them. Because the local and global listen sockets share an FD, the kernel notifies the corresponding process of the new connection.
  • When an application process calls accept(), the kernel first checks the global listen table (this is a read operation, so no lock is used); if nothing is found there, it checks the core's local table. If a connection is found, it is returned to the application. Because the socket was bound to a core at listen time, the lookup goes to that core's local table.
  • For epoll compatibility, when the application uses epoll_ctl() to add a listen socket to an epoll set, both the local and the global listen socket are monitored by epoll. When an event occurs, epoll_wait() returns the listen socket and accept() then handles it. This keeps the epoll behavior compatible.
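The SYN lookup order and the crash fallback described above can be modelled with a toy data structure. This is a user-space sketch only, not Fastsocket code: the class `ListenTables`, its method names, and the string labels standing in for sockets are all hypothetical.

```python
class ListenTables:
    """Toy model: per-core local listen tables backed by one global table."""

    def __init__(self, cores):
        self.global_table = {}  # port -> label of the global listen socket
        self.local_tables = [dict() for _ in range(cores)]

    def listen(self, port, name):
        # Register the original (global) listen socket.
        self.global_table[port] = name

    def local_listen(self, port, core):
        # Clone the global listen socket into this core's local table.
        self.local_tables[core][port] = "{}@core{}".format(
            self.global_table[port], core)

    def close_local(self, port, core):
        # Models a worker crash: its local listen socket goes away.
        self.local_tables[core].pop(port, None)

    def lookup_for_syn(self, port, core):
        # Local table first, then fall back to the global table.
        return self.local_tables[core].get(port) or self.global_table.get(port)


if __name__ == "__main__":
    tables = ListenTables(cores=2)
    tables.listen(80, "nginx")
    tables.local_listen(80, core=0)
    print(tables.lookup_for_syn(80, core=0))  # served by the local clone
    print(tables.lookup_for_syn(80, core=1))  # no clone here: global socket
    tables.close_local(80, core=0)
    print(tables.lookup_for_syn(80, core=0))  # after a crash: global socket
```

The fallback to the global table is what keeps connections from being lost when a worker and its local socket disappear.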

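A flow chart summarizes the above:

[Figure: Local Listen Table flow chart]

Local Established Table

The Linux kernel maintains established sockets (the sockets used to track connections) in one global hash table protected by locks. Fastsocket's idea is to split the global table into per-core tables. When a core needs to access a socket, it searches only its own table, so no locking is needed and there is no contention. Sockets established by fastsocket are kept in the local established table, while other regular sockets stay in the global table. A core looks in its own local table first (no lock needed) and then in the global table.

[Figure: Local Established Table]

Receive Flow Deliver

By default, when an application actively sends packets, they go out through the CPU core currently running the process (assigned by the system), whereas for received packets the CPU core is chosen by RSS or RPS as mentioned earlier. A connection may therefore be handled by two different CPU cores, when it should be handled locally. RFS and the FlowDirector feature of Intel NICs can mitigate this in software and hardware respectively, but not completely.

The main idea of RFD (Receive Flow Deliver) is that when a CPU core actively initiates a connection, the core's identifier is encoded into the connection's source port. The relationship between CPU cores and ports is given by a relation set [cores, ports]: each port corresponds to exactly one core. When a core establishes a connection, RFD randomly picks a port that matches the current core. When a packet is received, RFD decides which core should handle it; if the current core is not the selected one, the packet is delivered to the selected core.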
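The port-to-core relation can be illustrated with a short Python sketch. The source does not say which mapping Fastsocket uses, so the modulo mapping below is an assumption, as are the function names and the ephemeral port range 32768 to 60999.

```python
import random


def core_of_port(port, cores):
    """Assumed mapping for illustration: each port belongs to exactly one core."""
    return port % cores


def pick_source_port(core, cores, low=32768, high=60999, rng=random):
    """Randomly choose a source port in [low, high] that maps back to `core`."""
    first = low + (core - low) % cores  # smallest port >= low owned by `core`
    return rng.choice(range(first, high + 1, cores))


def deliver(dst_port, current_core, cores):
    """Return (target_core, needs_steering) for a packet arriving on dst_port."""
    target = core_of_port(dst_port, cores)
    return target, target != current_core


if __name__ == "__main__":
    cores = 8
    port = pick_source_port(core=3, cores=cores)
    print("source port chosen by core 3:", port)
    print("reply arriving on core 5 goes to:", deliver(port, 5, cores))
```

Because the core is recoverable from the port alone, the receive side needs no per-connection lookup table to steer a reply back to the core that opened the connection.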

[Figure: Receive Flow Deliver]

Generally speaking, RFD benefits proxy programs most; a pure web server can choose to disable it.

Summary

The above was compiled from a large number of external sources and gives a reasonably complete picture of the Fastsocket kernel architecture.

Fastsocket's effort is to handle the entire life of a single TCP connection, from the hard interrupt raised by the NIC through soft interrupts, the three-way handshake, data transfer, and the four-way close, on one CPU core. TCP resources thereby become local to each CPU core, which lays the foundation for horizontal scaling across cores: global resource contention is reduced, connections are processed in parallel, and the side effects of file locks are lowered. The result is an extremely efficient solution for short-lived connections, which deserves praise.

Fastsocket Study Notes: Kernel Module
nieyong, Tue, 03 Feb 2015 05:26:00 GMT
http://www.aygfsteel.com/yongboy/archive/2015/02/03/422694.html

Preface

This post studies the Fastsocket kernel module fastsocket.ko. It is the kernel-space counterpart of the user-space libfsocket.so and processes the data passed to /dev/fastsocket through ioctl, so it is central and fundamental. I will translate first and then add some commentary.

Module overview

The Fastsocket kernel module (fastsocket.ko) provides several features, each with its own options for enabling and disabling it.

VFS optimization

Kernel lock contention is everywhere in CentOS 6.5, so no amount of TCP/IP stack optimization can deliver good scalability on its own. Two of the worst examples are inode_lock and dcache_lock, which are not needed for the socket file system, sockfs. fastsocket solves this by providing a fast path when the VFS structures are initialized. Two patches have already been submitted to the vanilla kernel:

a209dfc vfs: dont chain pipe/anon/socket on superblock s_inodes list
4b93688 fs: improve scalability of pseudo filesystems

This change has no configuration option, so every socket created by fastsocket is forced through the fast path.

Kernel module parameters

enable_listen_spawn

fastsocket creates a local listen table for each CPU. An application can choose to handle new connections on a particular CPU core: the original listen socket is copied and the copy is inserted into that core's local listen table. When a new connection is processed on a CPU, the kernel first tries to match it against the local listen table; on a match, the connection is put on the local accept queue, from which that CPU later picks it up for processing.

With this scheme every network softirq has its own local socket queue onto which new connections are pushed, and every process pops connections from its local queue. Once a process is pinned to a CPU, any connection that the NIC delivers to that CPU is handled there from start to finish: hard interrupt, soft interrupt, system calls, and the user process. The benefit is that client connection requests are spread across the CPUs and passively handled as local connections, with no lock contention.

This feature works best when:

  • the NIC has as many Rx queues as there are CPU cores
  • the application's worker processes are statically pinned, one to each CPU

For the first condition, RPS can be used when the NIC has fewer Rx queues than CPU cores. The second condition can be met in two ways:

  • the application sets the CPU affinity of its worker processes itself at startup
  • fastsocket is allowed to set the CPU affinity of the worker processes automatically

Accordingly, enable_listen_spawn accepts three values:

  • enable_listen_spawn=0: disabled completely
  • enable_listen_spawn=1: enabled, but the application must pin its processes to CPUs itself
  • enable_listen_spawn=2 (default): enabled, and fastsocket pins each worker process to a CPU

enable_fast_epoll

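When enabled, an extra field is added to the file structure to store the mapping between the file and its epitem. This saves the cost of looking up the epitem in the epoll red-black tree every time epoll_ctl is called.

The optimization changes epoll semantics slightly, but it improves socket performance. It requires that a socket be added to only one epoll instance; listen sockets are exempt from this restriction. The default of true suits the vast majority of applications; if your program does not meet the requirement, disable the option.

enable_fast_epoll is a boolean option:

  • enable_fast_epoll=0: disable fast-epoll
  • enable_fast_epoll=1 (default): enable fast-epoll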

enable_receive_flow_deliver

RFD (Receive Flow Deliver) encodes the ID of the CPU assigned to a newly created active connection into that connection's port number, rather than picking the source port at random and leaving the CPU assignment to chance.

When a packet arrives on an active connection, RFD decodes the CPU core ID from the destination port and forwards the packet to that core. Combined with listen_spawn, this keeps the processing of a connection entirely local to one CPU.

enable_receive_flow_deliver is a boolean option:

  • enable_receive_flow_deliver=0 (default): disable RFD
  • enable_receive_flow_deliver=1: enable RFD

Notes:

  • In the current implementation, enabling RFD completely overrides the RPS policy and makes RPS ineffective. If you rely on RPS, disable this feature
  • RFD only benefits applications such as proxies, so we recommend disabling it on web servers
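Since these switches are kernel module parameters, they would normally be passed on the `modprobe` command line when the module is loaded. The sketch below only assembles and prints such a command line rather than running it. The helper name `fastsocket_modprobe_cmd` is hypothetical; the parameter names and defaults are the ones listed in this section.

```shell
#!/bin/sh
# Build (but do not run) a modprobe command line for fastsocket.
# Arguments: listen_spawn (0|1|2), fast_epoll (0|1), rfd (0|1).
# Missing arguments fall back to the defaults documented in this section.
fastsocket_modprobe_cmd() {
    spawn=${1:-2}; epoll=${2:-1}; rfd=${3:-0}
    echo "modprobe fastsocket enable_listen_spawn=$spawn" \
         "enable_fast_epoll=$epoll enable_receive_flow_deliver=$rfd"
}

# Defaults: listen-spawn with automatic affinity, fast-epoll on, RFD off.
fastsocket_modprobe_cmd
# A proxy host might turn RFD on and pin its workers itself.
fastsocket_modprobe_cmd 1 1 1
```

Actually running the printed command requires root privileges and a kernel that ships fastsocket.ko.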

That concludes the translation.

A quick tour of the source

The kernel module lives under the relative path fastsocket/module/. Apart from README.md, that directory holds just two symbolic links:

  • fastsocket.c -> ../kernel/net/fastsocket/fastsocket.c (the target does not actually exist, possibly a bug)
  • fastsocket.h -> ../kernel/net/fastsocket/fastsocket.h (the header does exist)

Put differently, the real location of the kernel module is fastsocket/kernel/net/fastsocket, which contains:

  • Kconfig
  • Makefile
  • fastsocket.h: declares the variables and functions used by the kernel module
  • fastsocket_core.c: implements the functions called from fastsocket_api.c
  • fastsocket_api.c: module load/unload and similar operations, and handling of the data the user-space library passes in through ioctl

fastsocket_api.c implements the module's interface. The source registers quite a few configurable items that the documentation does not yet cover:

int enable_fastsocket_debug = 3;
/* Fastsocket feature switches */
int enable_listen_spawn = 2;
int enable_receive_flow_deliver;
int enable_fast_epoll = 1;
int enable_skb_pool;
int enable_rps_framework;
int enable_receive_cpu_selection = 0;
int enable_direct_tcp = 0;
int enable_socket_pool_size = 0;

module_param(enable_fastsocket_debug,int, 0);
module_param(enable_listen_spawn, int, 0);
module_param(enable_receive_flow_deliver, int, 0);
module_param(enable_fast_epoll, int, 0);
module_param(enable_direct_tcp, int, 0);
module_param(enable_skb_pool, int, 0);
module_param(enable_receive_cpu_selection, int, 0);
module_param(enable_socket_pool_size, int, 0);

MODULE_PARM_DESC(enable_fastsocket_debug, " Debug level [Default: 3]" );
MODULE_PARM_DESC(enable_listen_spawn, " Control Listen-Spawn: 0 = Disabled, 1 = Process affinity required, 2 = Autoset process affinity[Default]");
MODULE_PARM_DESC(enable_receive_flow_deliver, " Control Receive-Flow-Deliver: 0 = Disabled[Default], 1 = Enabled");
MODULE_PARM_DESC(enable_fast_epoll, " Control Fast-Epoll: 0 = Disabled, 1 = Enabled[Default]");
MODULE_PARM_DESC(enable_direct_tcp, " Control Direct-TCP: 0 = Disbale[Default], 1 = Enabled");
MODULE_PARM_DESC(enable_skb_pool, " Control Skb-Pool: 0 = Disbale[Default], 1 = Receive skb pool, 2 = Send skb pool,  3 = Both skb pool");
MODULE_PARM_DESC(enable_receive_cpu_selection, " Control RCS: 0 = Disabled[Default], 1 = Enabled");
MODULE_PARM_DESC(enable_socket_pool_size, "Control socket pool size: 0 = Disabled[Default], other are the pool size");

It receives the data that the user-space libfsocket.so passes in through ioctl and dispatches on the command:

static long fastsocket_ioctl(struct file *filp, unsigned int cmd, unsigned long __user u_arg)
{
     struct fsocket_ioctl_arg k_arg;

     if (copy_from_user(&k_arg, (struct fsocket_ioctl_arg *)u_arg, sizeof(k_arg))) {
          EPRINTK_LIMIT(ERR, "copy ioctl parameter from user space to kernel failed\n");
          return -EFAULT;
     }

     switch (cmd) {
     case FSOCKET_IOC_SOCKET:
          return fastsocket_socket(&k_arg);
     case FSOCKET_IOC_LISTEN:
          return fastsocket_listen(&k_arg);
     case FSOCKET_IOC_SPAWN_LISTEN:
          return fastsocket_spawn_listen(&k_arg);
     case FSOCKET_IOC_ACCEPT:
          return fastsocket_accept(&k_arg);
     case FSOCKET_IOC_CLOSE:
          return fastsocket_close(&k_arg);
     case FSOCKET_IOC_SHUTDOWN_LISTEN:
          return fastsocket_shutdown_listen(&k_arg);
     //case FSOCKET_IOC_EPOLL_CTL:
     //     return fastsocket_epoll_ctl((struct fsocket_ioctl_arg *)arg);
     default:
          EPRINTK_LIMIT(ERR, "ioctl [%d] operation not support\n", cmd);
          break;
     }
     return -EINVAL;
}

The FSOCKET_IOC_* command codes defined in the header fastsocket/library/libsocket.h correspond one to one with the cases above. Data sent through ioctl travels from user space to kernel space and is copied once on the way (copy_from_user); the request is then routed according to cmd.

How libfsocket.so talks to the fastsocket kernel module

The two sides communicate through a dedicated device, /dev/fastsocket:

  1. The fastsocket kernel module registers the device it listens on as /dev/fastsocket
  2. libfsocket opens /dev/fastsocket, obtains a file descriptor, and starts sending data with ioctl

Summary

This was a brief tour of the fastsocket kernel module. Plenty of points were left untouched; I may come back to them in a later post on the Fastsocket kernel.

]]>
Fastsocket学习½W”记之动态链接库½‹?/title><link>http://www.aygfsteel.com/yongboy/archive/2015/02/02/422658.html</link><dc:creator>nieyong</dc:creator><author>nieyong</author><pubDate>Mon, 02 Feb 2015 06:16:00 GMT</pubDate><guid>http://www.aygfsteel.com/yongboy/archive/2015/02/02/422658.html</guid><wfw:comment>http://www.aygfsteel.com/yongboy/comments/422658.html</wfw:comment><comments>http://www.aygfsteel.com/yongboy/archive/2015/02/02/422658.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.aygfsteel.com/yongboy/comments/commentRss/422658.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/yongboy/services/trackbacks/422658.html</trackback:ping><description><![CDATA[<div id="wmqeeuq" class="wrap"> <h3 id="-">前言</h3> <p>本篇为fastsocket的动态链接库学习½W”è®°åQŒå¯¹åº”æºç ç›®å½•äØ“ fastsocket/libraryåQŒå…ˆ¾˜»è¯‘README.mdæ–‡äšg内容åQŒåŽé¢æ·»åŠ ä¸Šä¸ªäh学习心得ã€?/p> <h3 id="-">介绍</h3> <p>动态链接库<code>libfsocket.so</code>åQŒäؓ已有应用½E‹åºæä¾›åŠ é€ŸæœåŠ¡ï¼Œå…ähœ‰å¯ç»´æŠ¤æ€§å’Œå…¼å®¹æ€§ã€?/p> <ul> <li><strong>可维护æ€?/strong>åQšFastsocket优化在于重新实现套接字的¾pȝ»Ÿè°ƒç”¨ä»Žè€Œè¾¾åˆ°Linux内核¾|‘络堆栈效率的提高。而应用程序是不用修改˜q™äº›¾pȝ»Ÿè°ƒç”¨åQŒå€ŸåŠ©äºŽFastsocketž®±å¯ä»¥è¾¾åˆ°åŠ é€Ÿçš„ç›®çš„ã€‚Fastsocket在内核模块提供了一个新的ioctl接口åQŒä¾›ä¸Šå±‚应用½E‹åºè°ƒç”¨ã€? 
</li><li><strong>兼容æ€?/strong>åQšè‹¥è®©åº”用程序必™åÖM¿®æ”¹å…¶ä»£ç ä»¥é€‚应新的¾pȝ»Ÿè°ƒç”¨æŽ¥å£åQŒåœ¨çŽ°å®žä¸–ç•Œä¸­è¿™å¾ˆéº»çƒ¦ä¹Ÿä¸å¯è¡Œã€‚å€ŸåŠ©äºŽlibfsocket拦截¾pȝ»Ÿè°ƒç”¨òq¶æä¾›æ–°çš„æŽ¥å£è¿›è¡Œæ›¿æ¢ç³»¾lŸè°ƒç”¨ï¼ŒåŒæ—¶Fastsocket提供了与BSD socket完全兼容的调用接口,˜q™ä‹É得应用程序在无需更改ä»ÖM½•代码的情况下åQŒå¯ç›´æŽ¥ä½¿ç”¨FastsocketåQŒèŽ·å¾—ç½‘¾lœåŠ é€Ÿçš„æ•ˆæžœã€?</li></ul> <h3 id="-">¾~–译</h3> <p>很简单,˜q›å…¥ç›®å½•之后åQŒæ‰§è¡?code>make</code>命ä×o¾~–译卛_¯åQ?/p><pre><code><span id="wmqeeuq" class="hljs-keyword">cd</span> fastsocket/library <span id="wmqeeuq" class="hljs-keyword">make</span> </code></pre> <p>最后在当前目录下生æˆ?code>libfsocket.so</code>æ–‡äšgã€?/p> <h3 id="-">用法</h3> <p>很简单的è¯ß_¼Œå€ŸåŠ©äº?code>LD_PRELOAD</code>加蝲<code>libfsocket.so</code>åQŒå¯åŠ¨åº”ç”¨ç¨‹åºï¼Œä»¥nginxä¸ÞZ¾‹åQ?/p><pre><code><span id="wmqeeuq" class="hljs-constant">LD_PRELOAD</span>=<span id="wmqeeuq" class="hljs-regexp">/your_path/fastsocket</span><span id="wmqeeuq" class="hljs-regexp">/library/libfsocket</span>.so nginx </code></pre> <p>若回滚,ž®Þq®€å•了åQŒç›´æŽ¥å¯åЍnginxž®Þp¡ŒåQ?/p><pre><code>nginx </code></pre> <p>注意事项åQ?/p> <ul> <li>¼‹®ä¿<code>fastsocket.ko</code>内核模块已经加蝲成功 </li><li>只对启动时以预加è½?code>libfsocket.so</code>的上层应用程序有效果 </li></ul> <h3 id="-">内部构äšg</h3> <p>Fastsocket拦截¾|‘络套接字的常规¾pȝ»Ÿè°ƒç”¨åQŒåƈ使用ioctl接口取代之ã€?/p> <p>若不依赖äº?code>libfsocket.so</code>åQŒä¸Šå±‚应用程序要想ä‹É用Fastsocket Percore-Listen-Table的特点,应用½E‹åºéœ€è¦åœ¨çˆ¶æµ½E‹forking之后åQŒä»¥åŠæå‰åšäº‹äšg循环åQˆevent loopåQ‰å¤„理,应用工作˜q›ç¨‹éœ€è¦æ‰‹åŠ¨è°ƒç”?code>listen_spawn</code>函数åQŒå¤åˆ¶å…¨å±€çš„监听套接字òq¶æ’入到本地监听表中ã€?/p> <p><code>libfsocket.so</code>ä¸ÞZ¸Šå±‚应用程序做äº?code>listien_spawn</code>的工作,用以保持应用½E‹åºçš„代码不变,æ–ÒŽ³•如下:</p> <ul> <li><code>libfsocket.so</code>跟踪所有需要监听的套接字文件句æŸ? 
</li><li><code>libfsocket.so</code>拦截äº?code>epoll_ctl</code>¾pȝ»Ÿè°ƒç”¨ </li><li>当监听到应用½E‹åºè°ƒç”¨<code>epoll_ctl</code>æ·ÕdŠ ç›‘å¬å¥—æŽ¥å­—æ–‡ä»¶å¥æŸ„åˆ°epollæ—Óž¼Œ<code>libfsocket.so</code>会调ç”?code>listen_spawn</code>æ–ÒŽ³• </li></ul> <p>不是所有应用程序都适合本方案,但nginx、haproxy、lighttpd与之配合ž®±å·¥ä½œå¾—相当不错。因此当你在其他应用½E‹åºä¸­æƒ³ä½¿ç”¨Percore-Listen-Tableç‰ÒŽ€§æ—¶åQŒè¯·åŠ¡å¿…ž®å¿ƒ‹¹‹è¯•了,¼‹®ä¿æ˜¯å¦åˆé€‚ã€?/p> <p><em>OKåQŒç¿»è¯‘完毕ã€?/em></p> <h3 id="-">源码一è§?/h3> <p>fastsocket/library用于构徏<code>libfsocket.so</code>动态链接库åQŒä¸»è¦ç»„成:</p> <ul> <li>Makefile ¾~–译脚本 </li><li>libsocket.h 头文ä»Óž¼Œå®šä¹‰å˜é‡ã€ç»“构等 </li><li>libsocket.c 动态链接库实现 </li></ul> <h3 id="libsocket-h">libsocket.h</h3> <p>定义äº?code>ioctl</code>åQˆäØ“Input/Output ConTroL¾~©å†™åQ‰å‡½æ•°å’Œä¼ªè®¾å¤?<code>/dev/fastsocket</code>)交换数据所使用到的几个命ä×oåQ?/p><pre><code><span id="wmqeeuq" class="hljs-hexcolor">#def</span>ine IOC_ID <span id="wmqeeuq" class="hljs-number">0</span>xf5 <span id="wmqeeuq" class="hljs-hexcolor">#def</span>ine FSOCKET_IOC_SOCKET _IO(IOC_ID, <span id="wmqeeuq" class="hljs-number">0</span>x01) <span id="wmqeeuq" class="hljs-hexcolor">#def</span>ine FSOCKET_IOC_LISTEN _IO(IOC_ID, <span id="wmqeeuq" class="hljs-number">0</span>x02) <span id="wmqeeuq" class="hljs-hexcolor">#def</span>ine FSOCKET_IOC_ACCEPT _IO(IOC_ID, <span id="wmqeeuq" class="hljs-number">0</span>x03) <span id="wmqeeuq" class="hljs-hexcolor">#def</span>ine FSOCKET_IOC_CLOSE _IO(IOC_ID, <span id="wmqeeuq" class="hljs-number">0</span>x04) <span id="wmqeeuq" class="hljs-comment">//#define FSOCKET_IOC_EPOLL_CTL _IO(IOC_ID, 0x05)</span> <span id="wmqeeuq" class="hljs-hexcolor">#def</span>ine FSOCKET_IOC_SPAWN_LISTEN _IO(IOC_ID, <span id="wmqeeuq" class="hljs-number">0</span>x06) <span id="wmqeeuq" class="hljs-hexcolor">#def</span>ine FSOCKET_IOC_SHUTDOWN_LISTEN _IO(IOC_ID, <span id="wmqeeuq" class="hljs-number">0</span>x07) </code></pre> <p>紧接着定义了需要在用户态和内核态通过<code>ioctl</code>˜q›è¡Œäº¤äº’的结构:</p><pre><code><span id="wmqeeuq" 
class="hljs-tag">struct</span> <span id="wmqeeuq" class="hljs-tag">fsocket_ioctl_arg</span> { <span id="wmqeeuq" class="hljs-attribute">u32</span> fd; <span id="wmqeeuq" class="hljs-attribute">u32</span> backlog; <span id="wmqeeuq" class="hljs-tag">union</span> <span id="wmqeeuq" class="hljs-tag">ops_arg</span> { <span id="wmqeeuq" class="hljs-tag">struct</span> <span id="wmqeeuq" class="hljs-tag">socket_accept_op_t</span> { <span id="wmqeeuq" class="hljs-attribute">void</span> *sockaddr; <span id="wmqeeuq" class="hljs-attribute">int</span> *sockaddr_len; <span id="wmqeeuq" class="hljs-attribute">int</span> flags; }<span id="wmqeeuq" class="hljs-attribute">accept_op</span>; <span id="wmqeeuq" class="hljs-tag">struct</span> <span id="wmqeeuq" class="hljs-tag">spawn_op_t</span> { <span id="wmqeeuq" class="hljs-attribute">int</span> cpu; }<span id="wmqeeuq" class="hljs-attribute">spawn_op</span>; <span id="wmqeeuq" class="hljs-tag">struct</span> <span id="wmqeeuq" class="hljs-tag">io_op_t</span> { <span id="wmqeeuq" class="hljs-attribute">char</span> *buf; <span id="wmqeeuq" class="hljs-attribute">u32</span> buf_len; }<span id="wmqeeuq" class="hljs-attribute">io_op</span>; <span id="wmqeeuq" class="hljs-tag">struct</span> <span id="wmqeeuq" class="hljs-tag">socket_op_t</span> { <span id="wmqeeuq" class="hljs-attribute">u32</span> family; <span id="wmqeeuq" class="hljs-attribute">u32</span> type; <span id="wmqeeuq" class="hljs-attribute">u32</span> protocol; }<span id="wmqeeuq" class="hljs-attribute">socket_op</span>; <span id="wmqeeuq" class="hljs-tag">struct</span> <span id="wmqeeuq" class="hljs-tag">shutdown_op_t</span> { <span id="wmqeeuq" class="hljs-attribute">int</span> how; }<span id="wmqeeuq" class="hljs-attribute">shutdown_op</span>; <span id="wmqeeuq" class="hljs-tag">struct</span> <span id="wmqeeuq" class="hljs-tag">epoll_op_t</span> { <span id="wmqeeuq" class="hljs-attribute">u32</span> epoll_fd; <span id="wmqeeuq" class="hljs-attribute">u32</span> size; 
<span id="wmqeeuq" class="hljs-attribute">u32</span> ep_ctl_cmd; <span id="wmqeeuq" class="hljs-attribute">u32</span> time_out; <span id="wmqeeuq" class="hljs-attribute">struct</span> epoll_event *ev; }<span id="wmqeeuq" class="hljs-attribute">epoll_op</span>; }<span id="wmqeeuq" class="hljs-attribute">op</span>; }; </code></pre> <p>˜q™æ ·çœ‹æ¥åQ?code>ioctl</code>函数原型调用为:</p><pre><code> <span id="wmqeeuq" class="hljs-function"><span id="wmqeeuq" class="hljs-title">ioctl</span><span id="wmqeeuq" class="hljs-params">(/dev/fastsocket讑֤‡æ–‡äšg句柄åQ?FSOCKET_IOC_具体宏命令, fsocket_ioctl_arg¾l“构指针)</span></span> </code></pre> <p>现在大致能够弄清楚了内核态和用户态之间通过<code>ioctl</code>传递结构化的数据的方式了ã€?/p> <h3 id="libsocket-c-">libsocket.c ½Ž€è¦åˆ†æž?/h3> <p>˜qžæŽ¥å†…核模块已经注册好的讑֤‡½Ž¡é“<code>/dev/fastsocket</code>åQŒèŽ·å–åˆ°æ–‡äšg描述½W¦ï¼ŒåŒæ—¶åšäº›CPU˜q›ç¨‹¾l‘定的工ä½?/p><pre><code><span id="wmqeeuq" class="hljs-preprocessor">#<span id="wmqeeuq" class="hljs-keyword">define</span> INIT_FDSET_NUM 65536</span> ...... __attribute__((constructor)) <span id="wmqeeuq" class="hljs-keyword">void</span> fastsocket_init(<span id="wmqeeuq" class="hljs-keyword">void</span>) { <span id="wmqeeuq" class="hljs-keyword">int</span> ret = <span id="wmqeeuq" class="hljs-number">0</span>; <span id="wmqeeuq" class="hljs-keyword">int</span> i; cpu_set_t cmask; ret = open(<span id="wmqeeuq" class="hljs-string">"/dev/fastsocket"</span>, O_RDONLY); <span id="wmqeeuq" class="hljs-comment">// 建立fastsocket通道</span> <span id="wmqeeuq" class="hljs-keyword">if</span> (ret < <span id="wmqeeuq" class="hljs-number">0</span>) { FSOCKET_ERR(<span id="wmqeeuq" class="hljs-string">"Open fastsocket channel failed, please CHECK\n"</span>); <span id="wmqeeuq" class="hljs-comment">/* Just exit for safty*/</span> <span id="wmqeeuq" class="hljs-built_in">exit</span>(-<span id="wmqeeuq" class="hljs-number">1</span>); } fsocket_channel_fd = ret; fsocket_fd_set = <span id="wmqeeuq" class="hljs-built_in">calloc</span>(INIT_FDSET_NUM, <span 
id="wmqeeuq" class="hljs-keyword">sizeof</span>(<span id="wmqeeuq" class="hljs-keyword">int</span>)); <span id="wmqeeuq" class="hljs-keyword">if</span> (!fsocket_fd_set) { FSOCKET_ERR(<span id="wmqeeuq" class="hljs-string">"Allocate memory for listen fd set failed\n"</span>); <span id="wmqeeuq" class="hljs-built_in">exit</span>(-<span id="wmqeeuq" class="hljs-number">1</span>); } fsocket_fd_num = INIT_FDSET_NUM; <span id="wmqeeuq" class="hljs-comment">// å€égØ“65535</span> CPU_ZERO(&cmask); <span id="wmqeeuq" class="hljs-keyword">for</span> (i = <span id="wmqeeuq" class="hljs-number">0</span>; i < get_cpus(); i++) CPU_SET(i, &cmask); ret = sched_setaffinity(<span id="wmqeeuq" class="hljs-number">0</span>, get_cpus(), &cmask); <span id="wmqeeuq" class="hljs-keyword">if</span> (ret < <span id="wmqeeuq" class="hljs-number">0</span>) { FSOCKET_ERR(<span id="wmqeeuq" class="hljs-string">"Clear process CPU affinity failed\n"</span>); <span id="wmqeeuq" class="hljs-built_in">exit</span>(-<span id="wmqeeuq" class="hljs-number">1</span>); } <span id="wmqeeuq" class="hljs-keyword">return</span>; } </code></pre> <p>ä¸»è§‚ä¸Šï¼Œä»…ä»…æ˜¯äØ“äº†çŸ­˜qžæŽ¥è€Œè®¾¾|®çš„åQŒå®šä¹‰çš„fastsocketæ–‡äšg句柄数组大小ä¸?5535åQŒé’ˆå¯¹ç±»ä¼égºŽWEB Server、HTTP API½{‰çŽ¯å¢ƒèƒö够了åQŒé’ˆå¯¹ç™¾ä¸‡çñ”别的长连接服务器环境ž®×ƒ¸é€‚合了ã€?/p> <p>socket/listen/accept/close/shutdown/epoll_ctl½{‰å‡½æ•ŽÍ¼Œé€šè¿‡<code>dlsym</code>方式替换已有套接字系¾lŸå‡½æ•°ç­‰åQŒå…·ä½“的交互˜q‡ç¨‹ä½¿ç”¨<code>ioctl</code>替代一些系¾lŸè°ƒç”¨ã€?/p> <p>除了重写socket/listen/accept/close/shutdown½{‰å¥—接字接口åQŒåŒæ—¶ä¹Ÿå¯?code>epoll_ctl</code>æ–ÒŽ³•动了手术åQˆæ±Ÿæ¹–ä¼ ­a€CPU多核多进½E‹çš„epoll服务器存在惊¾Ÿ¤çŽ°è±¡ï¼‰åQŒæ›´å¥½åˆ©ç”¨å¤šæ ¸ï¼š</p><pre><code><span id="wmqeeuq" class="hljs-function"><span id="wmqeeuq" class="hljs-keyword">int</span> <span id="wmqeeuq" class="hljs-title">epoll_ctl</span><span id="wmqeeuq" class="hljs-params">(<span id="wmqeeuq" class="hljs-keyword">int</span> efd, <span id="wmqeeuq" class="hljs-keyword">int</span> cmd, <span id="wmqeeuq" 
class="hljs-keyword">int</span> fd, <span id="wmqeeuq" class="hljs-keyword">struct</span> epoll_event *ev)</span> </span>{ <span id="wmqeeuq" class="hljs-function"><span id="wmqeeuq" class="hljs-keyword">static</span> <span id="wmqeeuq" class="hljs-title">int</span> <span id="wmqeeuq" class="hljs-params">(*real_epoll_ctl)</span><span id="wmqeeuq" class="hljs-params">(<span id="wmqeeuq" class="hljs-keyword">int</span>, <span id="wmqeeuq" class="hljs-keyword">int</span>, <span id="wmqeeuq" class="hljs-keyword">int</span>, <span id="wmqeeuq" class="hljs-keyword">struct</span> epoll_event *)</span> </span>= NULL; <span id="wmqeeuq" class="hljs-keyword">int</span> ret; <span id="wmqeeuq" class="hljs-keyword">struct</span> fsocket_ioctl_arg arg; <span id="wmqeeuq" class="hljs-keyword">if</span> (fsocket_channel_fd >= <span id="wmqeeuq" class="hljs-number">0</span>) { arg.fd = fd; arg.op.spawn_op.cpu = -<span id="wmqeeuq" class="hljs-number">1</span>; <span id="wmqeeuq" class="hljs-comment">/* "Automatically" do the spawn */</span> <span id="wmqeeuq" class="hljs-keyword">if</span> (fsocket_fd_set[fd] && cmd == EPOLL_CTL_ADD) { ret = ioctl(fsocket_channel_fd, FSOCKET_IOC_SPAWN_LISTEN, &arg); <span id="wmqeeuq" class="hljs-keyword">if</span> (ret < <span id="wmqeeuq" class="hljs-number">0</span>) { FSOCKET_ERR(<span id="wmqeeuq" class="hljs-string">"FSOCKET: spawn failed!\n"</span>); } } } <span id="wmqeeuq" class="hljs-keyword">if</span> (!real_epoll_ctl) real_epoll_ctl = dlsym(RTLD_NEXT, <span id="wmqeeuq" class="hljs-string">"epoll_ctl"</span>); ret = real_epoll_ctl(efd, cmd, fd, ev); <span id="wmqeeuq" class="hljs-keyword">return</span> ret; } </code></pre> <p>å› äØ“å®šä¹‰äº†ä½œç”¨äºŽå†…éƒ¨çš„é™æ€å˜é‡?code>real_epoll_ctl</code>åQŒåªæœ‰åœ¨½W¬ä¸€‹Æ¡åŠ è½½çš„æ—¶å€™æ‰ä¼šè¢«èµ‹å€û|¼Œ<code>real_epoll_ctl = dlsym(RTLD_NEXT, "epoll_ctl")</code>åQŒåŽé¢è°ƒç”¨æ—¶é€šè¿‡<code>ioctl</code>把fsocket_ioctl_arg传递到内核模块中去ã€?/p> 
<p>其它socket/listen/accept/close/shutdown½{‰å¥—接字接口åQŒæµ½E‹ç±»ä¼¹{€?/p> <h3 id="-">ž®ç»“</h3> <p>以上½Ž€å•翻译、粗略分析用æˆäh€fastsocket动态链接库大致情况åQŒè‹¥è¦è“v作用åQŒéœ€è¦å’Œå†…核态Fastsocket˜q›è¡Œäº¤äº’、传递数据才能够作用的很好ã€?/p></div><img src ="http://www.aygfsteel.com/yongboy/aggbug/422658.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/yongboy/" target="_blank">nieyong</a> 2015-02-02 14:16 <a href="http://www.aygfsteel.com/yongboy/archive/2015/02/02/422658.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Fastsocket学习½W”记之网卡设¾|®ç¯‡http://www.aygfsteel.com/yongboy/archive/2015/01/30/422592.htmlnieyongnieyongFri, 30 Jan 2015 08:49:00 GMThttp://www.aygfsteel.com/yongboy/archive/2015/01/30/422592.htmlhttp://www.aygfsteel.com/yongboy/comments/422592.htmlhttp://www.aygfsteel.com/yongboy/archive/2015/01/30/422592.html#Feedback1http://www.aygfsteel.com/yongboy/comments/commentRss/422592.htmlhttp://www.aygfsteel.com/yongboy/services/trackbacks/422592.html

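Preface

With the kernel containing the fastsocket module built and installed, along with fastsocket's shared library libfsocket.so, the next step is to configure the NIC.

Two terms used throughout:

  • Rx: receive queue
  • Tx: transmit queue

Most of these notes come from the fastsocket source under the relative path fastsocket/scripts/. As usual, the translation comes first.

NIC setup: translation of the original document

Introduction

The nic.sh script configures a NIC so as to get the most benefit out of fastsocket. Given a network interface, it adjusts various features of the interface as well as some system settings.

Configuration

Interrupt and CPU affinity

Each NIC hardware queue, together with its interrupt, is bound to a different CPU core. If there are more hardware queues than CPU cores, the queues are assigned round-robin. The irqbalance service has to be disabled so that it does not change this configuration.

Interrupt throttle rate

nic.sh uses ethtool to cap the number of interrupts per second and so prevent interrupt storms. The interval between two Rx interrupts is set to at least 333us, which is roughly 3000 interrupts per second.

RPS

Each CPU core is mapped one to one to a different NIC hardware queue, so that the cores process network packets evenly. When the NIC has fewer hardware queues than there are CPU cores, nic.sh falls back on RPS (Receive Packet Steering) to balance incoming traffic in software; in that case there is no fixed correspondence between CPUs and hardware queues, and RPS can distribute an incoming packet to any CPU core.

The interrupts generated by NIC receive can then be spread evenly over the corresponding CPUs.

XPS

XPS (Transmit Packet Steering) maps CPU cores to Tx queues and controls outgoing packets. On a system with N CPU cores, the script sets up XPS when the interface has at least N Tx queues, so that cores and Tx queues can be mapped one to one.

The interrupts generated by NIC transmit are likewise spread evenly over the CPUs, so that no single core becomes overloaded.

IPTABLES

Under load testing, iptables firewall rules consume extra CPU cycles and reduce network stack performance. If nic.sh detects that iptables is running, it prints a warning suggesting that it be turned off.

Analysis of the nic.sh script

Intel and Broadcom gigabit and 10-gigabit NICs that have been verified to work: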

# igb
"Intel Corporation 82576 Gigabit Network Connection (rev 01)"
"Intel Corporation I350 Gigabit Network Connection (rev 01)"
# ixgbe
"Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)"
"Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)"
# tg3
"Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe"
"Broadcom Corporation NetXtreme BCM5761 Gigabit Ethernet PCIe (rev 10)"
# bnx2
"Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)"
"Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)"

If the server has none of the NICs above, the script prints a warning, which is harmless.

I have pulled the routine checks of CPU, NIC driver, and queue state out of the script here. It was a chance to revisit commands I had half forgotten; I changed them a little to keep them simple and handy for later use:

  • Number of CPU cores: grep -c processor /proc/cpuinfo
  • Number of software receive queues on the NIC: ls /sys/class/net/eth0/queues | grep -c rx
  • Number of software transmit queues on the NIC: ls /sys/class/net/eth0/queues | grep -c tx
  • Number of hardware queues currently on the NIC: egrep -c eth0 /proc/interrupts
  • NIC name and revision: lspci | grep Ethernet | sed "s/Ethernet controller: //g"
  • NIC driver name: ethtool -i eth0 | grep driver

The script first collects the CPU and NIC information, then limits the interrupt rate: ethtool -C eth0 rx-usecs 333 > /dev/null 2>&1
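The relation between the rx-usecs value and the "roughly 3000 interrupts per second" figure is simple arithmetic: one second is 1,000,000 microseconds, divided by the minimum interval between interrupts. A small sketch, with `irq_rate_for_usecs` as a name introduced here for illustration:

```shell
#!/bin/sh
# Convert an rx-usecs interrupt interval into an approximate upper bound
# on Rx interrupts per second (integer division).
irq_rate_for_usecs() {
    echo $((1000000 / $1))
}

echo "rx-usecs 333 -> at most about $(irq_rate_for_usecs 333) interrupts per second"
```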

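XPS makes full use of the NIC's transmit queues to raise transmit throughput, but it is enabled only on one condition: the number of transmit queues must be no smaller than the number of CPU cores: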

if [[ $TX_QUEUES -ge $CORES ]]; then
    for i in $(seq 0 $((CORES-1))); do
     cpuid_to_mask $((i%CORES)) | xargs -i echo {} > /sys/class/net/$IFACE/queues/tx-$i/xps_cpus
    done
    info_msg "    XPS enabled"
fi
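The loop relies on a helper, `cpuid_to_mask`, whose body is not shown in this post. A minimal sketch of what such a helper might look like, assuming it simply turns a core index into the hexadecimal one-bit CPU mask that `xps_cpus` accepts:

```shell
#!/bin/sh
# Hypothetical stand-in for nic.sh's cpuid_to_mask (the real one is not shown here).
# Prints the hex bitmask with only bit <id> set, e.g. core 4 -> 10.
# NOTE: the shell's integer width (typically 64 bits) limits the core ids
# this sketch can handle.
cpuid_to_mask() {
    printf '%x\n' $((1 << $1))
}

for id in 0 1 4 11; do
    echo "core $id -> mask $(cpuid_to_mask "$id")"
done
```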

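Next the script decides whether to enable RPS, which saves the trouble of setting it up by hand. RPS is enabled only when the number of CPU cores differs from the number of NIC hardware queues: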

if [[ ! $HW_QUEUES == $CORES ]]; then
    for i in /sys/class/net/$IFACE/queues/rx-*; do
     printf "%x\n" $((2**CORES-1)) | xargs -i echo {} > $i/rps_cpus;
    done
    info_msg "    RPS enabled"
else
    for i in /sys/class/net/$IFACE/queues/rx-*; do
     echo 0 > $i/rps_cpus;
    done
    info_msg "    RPS disabled"
fi
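The RPS branch writes the same all-cores mask to every Rx queue. The expression `2**CORES-1` sets the lowest CORES bits; the sketch below reproduces that arithmetic with a shift so that it also runs in a plain POSIX shell. `rps_all_cores_mask` is a name introduced here for illustration.

```shell
#!/bin/sh
# Hex mask with the lowest <cores> bits set, i.e. the value (2^cores - 1)
# that the script writes to each rps_cpus file.
rps_all_cores_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

for cores in 1 4 12; do
    echo "$cores cores -> rps_cpus mask $(rps_all_cores_mask "$cores")"
done
```

For 12 cores the mask is fff, meaning that any of the twelve cores may process packets arriving on that queue.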

Without fastsocket, relying on RPS alone means that the CPU handling the interrupt and the CPU processing the packet may not be the same one. That naturally causes CPU cache misses and costs a little performance, and to avoid it people turn to RFS (Receive Flow Steering).

With fastsocket, none of that trouble is necessary.

irqbalance conflicts with fastsocket, so the script forcibly stops it:

if ps aux | grep irqbalance | grep -v grep; then
    info_msg "Disable irqbalance..."
    # XXX Do we have a more moderate way to do this?
    killall irqbalance > /dev/null 2>&1
fi

The script also sets the affinity between interrupts and CPUs:

i=0
intr_list $IFACE $DRIVER | while read irq; do
    cpuid_to_mask $((i%CORES)) | xargs -i echo {} > /proc/irq/$irq/smp_affinity
    i=$((i+1))
done
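The round-robin effect of `$((i%CORES))` is easier to see in a dry run. The sketch below reads IRQ numbers on standard input and prints the mask each one would receive, instead of writing to /proc/irq/<irq>/smp_affinity. The function name `plan_irq_affinity` and the IRQ numbers are made up for illustration.

```shell
#!/bin/sh
# Dry run of round-robin IRQ pinning: reads IRQ numbers on stdin and prints
# "<irq> <hex cpu mask>" rather than touching /proc/irq/<irq>/smp_affinity.
plan_irq_affinity() {
    cores=$1
    i=0
    while read -r irq; do
        printf '%s %x\n' "$irq" $((1 << ($i % $cores)))
        i=$(($i + 1))
    done
}

# Five hypothetical IRQ numbers spread over 4 cores; the fifth wraps to core 0.
printf '%s\n' 64 65 66 67 68 | plan_irq_affinity 4
```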

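If the iptables service is present, the script politely suggests disabling it, since it does cost performance. The script also warns when the open-file limit is no larger than 1024; for how to raise that limit, see my earlier posts.

Common scaling optimizations for the Linux network stack

On servers that do not use fastsocket, the popular ways of scaling and tuning network stack performance around the NIC are RSS, RPS, RFS, XPS, and the like. They make full use of multiple CPU cores and of the NIC hardware itself in order to process traffic in parallel. The table below is a rough summary.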

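| | RSS (Receive Side Scaling) | RPS (Receive Packet Steering) | RFS (Receive Flow Steering) | Accelerated RFS (Accelerated Receive Flow Steering) | XPS (Transmit Packet Steering) |
|---|---|---|---|---|---|
| Problem solved | Supported by the NIC and driver | RSS implemented in software | The interrupt for a packet and the application processing it run on the same CPU | Hardware-accelerated load balancing based on RFS | Picks a queue of a multi-queue NIC intelligently for fast transmission |
| Kernel support | Introduced in 2.6.36; needs hardware support | 2.6.35 | 2.6.35 | 2.6.35 | 2.6.38 |
| Recommendation | Number of NIC queues equal to the number of physical cores | Not needed on a multi-queue NIC where RSS is already configured | Requires the rps_sock_flow_entries and rps_flow_cnt settings | Needs acceleration support in both the NIC and the driver, and ntuple filtering enabled through ethtool | No effect on NICs with a single transmit queue; if there are fewer queues than CPUs, the CPUs sharing a queue should ideally share a cache with the CPU that handles that queue's transmit hard interrupt |
| fastsocket | NIC feature | Improved RPS with better performance | Included in the source, not covered in the documentation | Not covered in the documentation | Requires at least as many transmit queues as CPU cores |
| Direction | NIC receive | Kernel receive | CPU receive processing | Accelerated receive | NIC transmit |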

For more detailed tuning advice, see the document Scaling in the Linux Networking Stack.
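As an illustration of the two RFS settings named in the table, the sketch below prints (without applying) the writes that would enable RFS on an interface, splitting the global flow table evenly across the Rx queues. The function name `plan_rfs`, the interface name, and the value 32768 are assumptions chosen for the example, not taken from this post.

```shell
#!/bin/sh
# Print (do not apply) the RFS settings for an interface.
# Arguments: interface name, total flow entries, number of Rx queues.
# Each queue gets an equal share of the global flow table.
plan_rfs() {
    iface=$1; entries=$2; queues=$3
    per_queue=$(($entries / $queues))
    echo "echo $entries > /proc/sys/net/core/rps_sock_flow_entries"
    q=0
    while [ "$q" -lt "$queues" ]; do
        echo "echo $per_queue > /sys/class/net/$iface/queues/rx-$q/rps_flow_cnt"
        q=$(($q + 1))
    done
}

plan_rfs eth0 32768 4
```

Applying the printed lines requires root, and suitable values depend on the expected number of active connections.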

Also, if the NIC supports Flow Director Filters (there is a very entertaining animated introduction, Intel® Ethernet Flow Director, which is worth watching), it can be combined with Fastsocket for further acceleration. For example, in their Redis long-connection tests, enabling Flow Director gave a ?5% performance gain over leaving it disabled.

Naturally, combining software and hardware does better still.

Further reading: An introduction to multi-queue NICs

Summary

These are my notes on fastsocket's NIC setup script.

That said, nic.sh is worth keeping around: whether or not you use fastsocket, it is a good choice for tuning the NICs of production servers.

]]>
Fastsocket Study Notes: Installation
http://www.aygfsteel.com/yongboy/archive/2015/01/30/422579.html
nieyong, Fri, 30 Jan 2015 05:14:00 GMT

Preface

The environment is CentOS 6.5 with the default kernel 2.6.32-431.el6.x86_64. All build and install steps below are run as root.

Building and installing the fastsocket kernel

The first step, obviously, is to fetch the code, here into /opt:

 git clone https://github.com/fastos/fastsocket.git

Build and install

After the download, change into the kernel directory:

 cd fastsocket/kernel

Because this is a kernel, some options have to be configured before building. Using make config would be exhausting, with several thousand options to answer one by one; most of the time the default configuration is perfectly fine:

 make defconfig

Then build the kernel:

 make

Building the kernel takes a long time, at least ?0 minutes. Next, install the kernel modules that were built, including the fastsocket module:

 make modules_install

When it finishes, the last line of output reads:

DEPMOD 2.6.32-431.17.1.el6.FASTSOCKET

With the fastsocket kernel module in place, install the kernel:

 make install

That command actually runs a shell script to do the installation:

sh /opt/fastsocket/kernel/arch/x86/boot/install.sh 2.6.32-431.17.1.el6.FASTSOCKET arch/x86/boot/bzImage \
    System.map "/boot"

The kernel with the fastsocket module is now essentially built and installed, but Linux still has to be told to boot into the newly built kernel next time.

Choosing the kernel to boot

This is configured in /etc/grub.conf. Its current contents:

default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.32-431.17.1.el6.FASTSOCKET)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-431.17.1.el6.FASTSOCKET ro root=/dev/mapper/vg_centos6-lv_root rd_NO_LUKS rd_NO_MD rd_LVM_LV=vg_centos6/lv_swap crashkernel=auto LANG=zh_CN.UTF-8 rd_LVM_LV=vg_centos6/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
        initrd /initramfs-2.6.32-431.17.1.el6.FASTSOCKET.img
title CentOS (2.6.32-431.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-431.el6.x86_64 ro root=/dev/mapper/vg_centos6-lv_root rd_NO_LUKS rd_NO_MD rd_LVM_LV=vg_centos6/lv_swap crashkernel=auto LANG=zh_CN.UTF-8 rd_LVM_LV=vg_centos6/lv_root  KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
        initrd /initramfs-2.6.32-431.el6.x86_64.img

default=1 means the system currently boots the original kernel, which is the second entry (the second root (hd0,0) block). To switch to the new kernel, change the line to default=0, save the file, and reboot for the change to take effect.
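The edit itself is a one-line substitution. The sketch below applies it to a stream rather than to the real file, so it can be tried without touching the boot configuration; `set_grub_default` is a name introduced here for illustration.

```shell
#!/bin/sh
# Rewrite the "default=" line of a grub.conf-style stream read from stdin.
# Working on a stream keeps the real boot configuration untouched.
set_grub_default() {
    sed "s/^default=.*/default=$1/"
}

printf 'default=1\ntimeout=5\n' | set_grub_default 0
```

For a real change, it is safer to run this against a copy of grub.conf and review the result before replacing the original.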

Verifying

After the reboot, load the fastsocket module into the running system, here with its default parameters:

modprobe fastsocket

Then list the loaded modules to check that it worked:

lsmod | grep fastsocket

Output similar to the following means everything is fine:

fastsocket 39766 0
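The same check can be scripted. The sketch below looks for the module in /proc/modules and for the /dev/fastsocket device node that the library opens; the function name `fastsocket_ready` is hypothetical, and both paths are parameters so the check can be exercised on a machine without fastsocket.

```shell
#!/bin/sh
# Check that fastsocket is usable: module listed and device node present.
# Returns 1 if the module is not listed, 2 if the device node is missing.
fastsocket_ready() {
    modules_file=${1:-/proc/modules}
    device=${2:-/dev/fastsocket}
    grep -q '^fastsocket ' "$modules_file" 2>/dev/null || return 1
    [ -e "$device" ] || return 2
    return 0
}

if fastsocket_ready; then
    echo "fastsocket module loaded and /dev/fastsocket present"
else
    echo "fastsocket not ready (status $?)"
fi
```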

Building the libfsocket.so shared library

With the kernel module installed, the fastsocket shared library can be built:

cd /opt/fastsocket/library/
make

A few warnings may appear; they are harmless:

gcc -g -shared -ldl -fPIC libsocket.c -o libfsocket.so -Wall
libsocket.c: In function 'fastsocket_init':
libsocket.c:59: warning: implicit declaration of function 'open'
libsocket.c: In function 'fastsocket_expand_fdset':
libsocket.c:109: warning: implicit declaration of function 'ioctl'
libsocket.c: In function 'accept':
libsocket.c:186: warning: pointer targets in assignment differ in signedness
libsocket.c: In function 'accept4':
libsocket.c:214: warning: pointer targets in assignment differ in signedness

Finally, the libfsocket.so file produced by gcc shows that the build succeeded.

Summary

That completes the build and installation. The next step is to test fastsocket with its demo program.

]]>
Fastsocket Study Notes: The Demo Application
http://www.aygfsteel.com/yongboy/archive/2015/01/29/422550.html
nieyong, Thu, 29 Jan 2015 09:16:00 GMT

Preface

The previous post covered building and installing the fastsocket kernel module. This one is a translation, with some tidying, of fastsocket/demo/README.md.

On to the translation.

Introduction

The demo is a simple TCP server used to benchmark and profile the network stack of the Linux kernel, and of course to show off Fastsocket's scalability and performance improvements.

The demo is built on epoll and non-blocking IO to handle network connections, and it only works well in multi-core mode: each of its processes is pinned to a different CPU core, starting from core 0, and handles client connections independently.

The demo has two modes:

  • Server mode: every request is answered directly with HTTP 200 OK
  • Proxy mode: the server forwards each client request to a backend server and relays the backend's response to the client

This is a deliberately simple, naive TCP server meant for testing only. Requests and responses must each fit in a single packet, otherwise the program cannot handle them.

Build

Build it as follows:

cd demo && make

Usage

The simplest way is to run it with the default configuration and no arguments:

./server

The options are:

  • -w worker_num: number of worker processes
    • Default: one process per available CPU core
  • -c start_core: index of the first CPU core that processes are pinned to
    • Default: 0
  • -o log_file: name of the log file
    • Default: ./demo.log
  • -a listen_address: listen address as an [ip:port] string; multiple addresses may be added
    • Default: 0.0.0.0:80
  • -x backend_address: enable proxy mode; takes a backend address as [ip:port], and multiple backends are supported
    • Default: off
  • -v: print detailed statistics
    • Default: off
  • -d: debug mode; debug messages are written to the log file
    • Default: off
  • -k: enable HTTP keepalive; currently works in server mode only
    • Default: off

Examples

Two things to keep in mind before running:

  • To saturate the CPUs, neither the client nor the backend server may be the bottleneck. Two workable options:
    • provide enough machines to act as clients and backend servers
    • or have one machine play the client and backend roles and run it with fastsocket (recommended, since it needs fewer servers)
  • Configure the NIC correctly; if unsure how, see the scripts directory in the source

Server mode example

Server mode needs at least two hosts:

  • Host A acts as the client and generates HTTP requests
  • Host B is the web server

Assume each host has 12 CPU cores and the network is set up roughly like this:

 +--------------------+     +--------------------+
 |       Host A       |     |      Host B        |
 |                    |     |                    |
 |    10.0.0.1/24     |-----|    10.0.0.2/24     |
 |                    |     |                    |
 +--------------------+     +--------------------+

Steps to run on the two hosts:

Host B:

  • Run it standalone in web server mode, with 12 worker processes to match the number of CPU cores:

    ./server -w 12 -a 10.0.0.2:80

  • Or, to measure the performance that Fastsocket brings:

    LD_PRELOAD=../library/libfsocket.so ./server -w 12 -a 10.0.0.2:80

Host A:

  • Run Apache ab as the request generator:

    ab -n 1000000 -c 100 http://10.0.0.2:80/

  • A single ab process cannot show the server's full load capacity; running several instances concurrently works much better. To start 12 instances in the background, matching the number of CPU cores:

    N=12; for i in $(seq 1 $N); do ab -n 1000000 -c 100 http://10.0.0.2:80/ > /dev/null 2>&1 & done

Proxy mode example

Proxy mode needs three machines:

  • Host A acts as the client and generates HTTP requests.
  • Host B acts as the proxy.
  • Host C is the backend server.

Assume each machine has 12 CPU cores and the network is laid out as follows:

 +--------------------+     +--------------------+     +--------------------+
 |       Host A       |     |       Host B       |     |       Host C       |
 |                    |     |                    |     |                    |
 |    10.0.0.1/24     |     |    10.0.0.2/24     |     |     10.0.0.3/24    |
 +---------+----------+     +---------+----------+     +----------+---------+
           |                          |                           |
 +---------+--------------------------+---------------------------+---------+
 |                                 switch                                   |
 +--------------------------------------------------------------------------+

The steps are:

Host B:

  • Start the proxy with 12 processes:

    ./server -w 12 -a 10.0.0.2:80 -x 10.0.0.3:80

  • Or start it with Fastsocket:

    LD_PRELOAD=../library/libfsocket.so ./server -w 12 -a 10.0.0.2:80 -x 10.0.0.3:80

Host C:

  • In principle any web server can serve as the backend; here we simply reuse the demo program:

    ./server -w 12 -a 10.0.0.3:80

Host A:

  • As the request generator, again start 12 Apache ab instances in the background:

    N=12; for i in $(seq 1 $N); do ab -n 1000000 -c 100 http://10.0.0.2:80/ > /dev/null 2>&1 & done

Hands-on

That is the end of the translation. What follows describes my own test runs based on it.

Installing Apache ab

Find out which package provides the ab command:

yum provides /usr/bin/ab

The output should include a line like this:

httpd-tools-2.2.15-39.el6.centos.x86_64 : Tools for use with the Apache HTTP Server

Installing that package is all that is needed:

yum install httpd-tools

Virtual machine test

The host is Windows 7 Professional running VMware Workstation 10.04, with two CentOS 6.5 guests configured identically: 2 GB of RAM and 8 logical CPU cores each.

On the client, Apache ab is installed and run as 8 instances:

    for i in $(seq 1 8); do ab -n 10000 -c 100 http://192.168.192.16:80/ > /dev/null 2>&1; done

On the server, results are recorded for each of:

    /opt/fast/server -w 8
    LD_PRELOAD=../library/libfsocket.so ./server -w 8

Server mode comparison

The two runs compare as follows:

Run mode               Elapsed (s)   Total requests   Avg requests/s   Max
Standalone             34            80270            2361             2674
fastsocket preloaded   28            80399            2871             2964

Proxy mode data

The method is the same as above, with three identically configured servers (load generator + proxy + backend). In the first run the proxy was started standalone; in the second it was started with fastsocket preloaded.

Run                    Elapsed (s)   Total requests   Avg requests/s   Max
First run, backend     44            80189            1822             2150
First run, proxy       44            80189            1822             2152
Second run, backend    42            80051            1906             2188
Second run, proxy      42            80051            1906             2167

Note: these figures come from virtual machines and do not represent real servers. Treat them as a rough reference only.

Even on virtual machines, in a constrained test environment, the fastsocket-based server shows a gain in every measure: total processing time, average requests per second, and the per-second maximum.
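To put a number on that gain, here is a minimal sketch that recomputes the per-second averages from the server mode table and the relative improvement between the two runs. The class and method names are made up for illustration; only the request counts and durations come from the tables above.

```java
public final class ThroughputGain {
    /** Average requests per second, rounded to the nearest integer. */
    public static long perSecond(long requests, long seconds) {
        return Math.round((double) requests / seconds);
    }

    /** Relative gain of the improved rate over the baseline, in percent. */
    public static double gainPercent(double baseline, double improved) {
        return (improved - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        long plain = perSecond(80270, 34); // standalone run
        long fast = perSecond(80399, 28);  // run with libfsocket.so preloaded
        System.out.printf("plain=%d/s fast=%d/s gain=%.1f%%%n",
                plain, fast, gainPercent(plain, fast));
    }
}
```

For server mode this works out to a gain of a little over 20 percent, on virtual machines and from a single pair of runs, so it should be read as indicative rather than as a benchmark result.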

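Notes on LD_PRELOAD

Preloading a shared library through LD_PRELOAD is a handy tool but not a cure-all. It has no effect in these cases:

  • The executable is statically linked, i.e. built with gcc -static so that libc is linked into the binary.
  • The executable has the SUID bit set (e.g. chmod 4755 daemon), which can also make LD_PRELOAD ineffective.

There are plenty of subtleties here, so tread carefully.

Summary

Having studied and tested the demo part of the fastsocket source, the before-and-after comparison shows that fastsocket does improve processing performance.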

]]>
Fastsocket Study Notes: Getting Started</title><link>http://www.aygfsteel.com/yongboy/archive/2015/01/29/422536.html</link><dc:creator>nieyong</dc:creator><author>nieyong</author><pubDate>Thu, 29 Jan 2015 06:11:00 GMT</pubDate><guid>http://www.aygfsteel.com/yongboy/archive/2015/01/29/422536.html</guid><wfw:comment>http://www.aygfsteel.com/yongboy/comments/422536.html</wfw:comment><comments>http://www.aygfsteel.com/yongboy/archive/2015/01/29/422536.html#Feedback</comments><slash:comments>3</slash:comments><wfw:commentRss>http://www.aygfsteel.com/yongboy/comments/commentRss/422536.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/yongboy/services/trackbacks/422536.html</trackback:ping><description><![CDATA[<div id="wmqeeuq" class="wrap"> <h3 id="-">Preface</h3> <p>A while ago I saw fastsocket promoted on infoq in the article <a >"1800+ Stars on Github in Two Weeks: The Story Behind Fastsocket, a Kernel-Level Network Stack Optimization Project"</a>, which made clear what fastsocket is:</p> <ul> <li>A highly scalable socket </li><li>A low-level network implementation at the Linux kernel layer </li><li>Excellent performance on multi-core machines: throughput grows linearly up to 24 cores, far ahead of the stock kernel, which starts to degrade on machines with more than 12 cores </li><li>Very easy to use and maintain; application code needs no changes </li><li>Implemented against kernel-2.6.32-431.17.1.el6/CentOS-6.5 </li><li>Already deployed in Sina's production environment </li><li>Started by Sina's operating system team </li><li>Supported by the Tsinghua University operating systems lab, Intel, and the Zeuux free software community 
</li><li> <p>Licensed under GPLv2</p> <p>All in all it is very appealing: the TCP/IP network stack is optimized at the kernel level, and network applications on top get better processing performance without any modification. Very nice!</p></li></ul> <h3 id="fastsocket-">Fastsocket study notes: contents</h3> <p>I have had a little spare time recently and started following Fastsocket. There is not much material available, but I have written up a series of study notes. In most of them the approach is to translate the official documentation first and then add some notes of my own.</p> <p>The fastsocket project lives at <a >https://github.com/fastos/fastsocket</a>; its wiki and code are the main sources for this series. At first I wanted to understand fastsocket as a whole but found no obvious place to start, so I approached it from the side, piece by piece, and went deeper gradually. The notes follow the layout of the source tree, one post per area:</p> <ul> <li><a href="http://www.aygfsteel.com/yongboy/archive/2015/01/30/422579.html">Build and install</a> </li><li><a href="http://www.aygfsteel.com/yongboy/archive/2015/01/29/422550.html">Demo application</a>, covering the demo directory </li><li><a href="http://www.aygfsteel.com/yongboy/archive/2015/01/30/422592.html">NIC setup</a>, covering the scripts directory </li><li><a href="http://www.aygfsteel.com/yongboy/archive/2015/02/02/422658.html">Shared library</a>, covering the library directory </li><li><a href="http://www.aygfsteel.com/yongboy/archive/2015/02/03/422694.html">Kernel module</a>, covering the module directory, which is really the kernel/net/fastsocket directory </li><li><a href="http://www.aygfsteel.com/yongboy/archive/2015/02/04/422732.html">Kernel</a>, covering the kernel directory, which is also part of the kernel module post </li><li><a href="http://www.aygfsteel.com/yongboy/archive/2015/02/05/422760.html">Wrap-up</a></li></ul> <p>My abilities are limited, so if you spot problems or omissions, please point them out promptly. I would be very grateful.</p> <h3 id="-">Miscellaneous</h3> <p>Among the code contributors, apart from <a >Lin Xiaofeng (林晓峰)</a>, the most frequent committer at the moment is <a >greewind</a>, whose blog is at <a >http://blog.chinaunix.net/uid/23629988.html</a>. Another very capable engineer.</p> <p>Outstanding open source projects always manage to attract the best developers.</p></div><img src ="http://www.aygfsteel.com/yongboy/aggbug/422536.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/yongboy/" target="_blank">nieyong</a> 
2015-01-29 14:11 <a href="http://www.aygfsteel.com/yongboy/archive/2015/01/29/422536.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Learning from Online Games How to Handle Latency
http://www.aygfsteel.com/yongboy/archive/2014/12/23/421672.html
nieyong, Tue, 23 Dec 2014 02:02:00 GMT

Preface

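Network latency is a fact of life, but the online game industry has built up a large body of solid experience in dealing with it. Games use a range of strategies and techniques on the client to remove or hide the inconvenience latency causes, masking the real delay as far as possible while still rendering in real time, so that players get a fast, interactive, real-time experience.

Handled this way, players with somewhat higher latency can still play smoothly alongside others whose connections vary widely in quality.

Latency sets a floor on a real-time game's reaction time, but what matters most is that the client looks smooth. First-person shooters (FPS) can work around the delay cleverly and still deliver fast real-time interaction under typical consumer network conditions (around 200 ms).

Right, what follows is what I have pieced together recently.

P2P and C/S architectures in online games

Early online games used a P2P topology to exchange data between players. But the high latency the P2P model causes cannot be hidden well in an FPS: everyone's latency is determined by the worst-connected player in the session. As with the barrel principle, players with good, low-latency connections are dragged down by those with poor, high-latency ones, and in the end nobody is happy. On a LAN the problem is not noticeable. In addition, most of the game logic lives on the client, which makes cheating hard to prevent.

C/S online games:

  • In the C/S architecture the server runs all game logic and input handling. The client only has to render, sync down the state it needs, send user input to the server, and display the result.
  • The biggest advantage of C/S is that the latency that matters is no longer that of the slowest player but that of each player's own connection to the server. The client's bandwidth requirement also drops considerably, since it only sends input to the server and receives the server's responses.
  • C/S moves the latency problem but does not remove it: real networks still add noticeable delay. If the client had to wait for the server's word on every action, the controls would feel sluggish. The usual answer is for the client to hide network latency at the rendering layer with prediction and interpolation.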

Client-side prediction and interpolation

In some situations the server can allow the client to carry out movement locally and immediately. This approach is known as client-side prediction.

Take a character walking under keyboard control. Over a very short window (say 1-3 seconds) the client can predict the player's trajectory (direction plus acceleration, i.e. where the walk ends up). All of these commands are still sent to the server, which checks that they are valid (to block cheats such as teleporting). Client-side prediction is not always right, so the server has to correct it (the server is God, as the saying goes: "The server is the man!"). After a correction, the character's actual path may differ from the path the client predicted. The client can then use interpolation (roughly: rendering the character moving between two points) to smooth the change of position in the game world, rather than snapping the character from one position to another and leaving the player baffled.
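The correction step can be sketched with plain linear interpolation. This is an illustrative sketch only, not how any particular game engine does it: the class name, the 2D position array, and the blend window `blendMs` are all assumptions introduced here.

```java
public final class PositionSmoother {
    /** Linear interpolation between a and b; t is clamped to [0, 1]. */
    public static double lerp(double a, double b, double t) {
        double clamped = Math.max(0.0, Math.min(1.0, t));
        return a + (b - a) * clamped;
    }

    /**
     * Position to draw elapsedMs after a server correction arrived.
     * Blends from the locally predicted position to the server's
     * authoritative one over blendMs milliseconds instead of snapping.
     */
    public static double[] render(double[] predicted, double[] authoritative,
                                  long elapsedMs, long blendMs) {
        if (blendMs <= 0) {
            // No blend window configured: show the server's position directly.
            return authoritative.clone();
        }
        double t = (double) elapsedMs / blendMs;
        return new double[] {
            lerp(predicted[0], authoritative[0], t),
            lerp(predicted[1], authoritative[1], t)
        };
    }

    public static void main(String[] args) {
        double[] predicted = {0.0, 0.0};
        double[] authoritative = {10.0, 20.0};
        // Halfway through a 100 ms blend the character is drawn at the midpoint.
        double[] shown = render(predicted, authoritative, 50, 100);
        System.out.println(shown[0] + ", " + shown[1]);
    }
}
```

A short blend window keeps the character close to the server's truth; a longer one looks smoother but leaves the displayed position wrong for longer, so the value is a tuning choice.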

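Interpolation is also called path compensation by some; it is the same thing. The methods involve a fair amount of math (linear interpolation, cubic interpolation and so on), as covered for example in the article 插值那些事.

In short: the client predicts, the server corrects, and the client fine-tunes with interpolation.

For a group of players interacting with one another over connections of uneven quality, the outcome of some in-game actions may need to be resolved with a "lag compensation" strategy.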

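Lag compensation

Lag compensation is a strategy executed on the game server. When handling a player's command, the server rewinds to the exact time the client sent it (the gap being caused by latency) and resolves the command against the state the client actually saw. It gives up some realism in damage resolution in exchange for realism in actions such as attacking, so in essence it is a compromise.

Note in particular that lag compensation does not happen on the client.

An example of lag compensation:

  1. In an FPS, player A fires at player B at 10.5 s and hits. The shot is packed up and sent (network latency: 100 ms), and the server receives it at 10.6 s, by which time player B may have moved somewhere else.
  2. If the server judged the shot purely by the time of receipt (10.6 s), player B would take no damage, and the shot might even hit player C following close behind B (100 ms later, C could well be standing where A was aiming).
  3. To make up for the problems latency causes, the server needs a "lag compensation" strategy that corrects the distorted picture latency creates.
  4. The server works out when the shot command was actually executed, looks up the state of the players in the world at 10.5 s, and runs the shooting algorithm against that state to decide whether the shot hit, so the result is as accurate as possible.

If lag compensation is disabled, many players will complain that they clearly hit their opponent and yet did no damage.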
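The rewind in step 4 can be sketched as a small position history kept by the server. This is a simplified illustration under stated assumptions: one player, one coordinate, a known latency per shot, and a hypothetical `hitRadius`; none of these names come from a real engine.

```java
import java.util.Map;
import java.util.TreeMap;

public final class LagCompensation {
    // timestamp (ms) -> x position of one player, recorded by the server each tick
    private final TreeMap<Long, Double> history = new TreeMap<>();

    public void record(long timeMs, double x) {
        history.put(timeMs, x);
    }

    /** Position at timeMs, interpolated between the two nearest snapshots. */
    public double positionAt(long timeMs) {
        Map.Entry<Long, Double> before = history.floorEntry(timeMs);
        Map.Entry<Long, Double> after = history.ceilingEntry(timeMs);
        if (before == null && after == null) {
            throw new IllegalStateException("no position history recorded");
        }
        if (before == null) {
            return after.getValue();
        }
        if (after == null || before.getKey().equals(after.getKey())) {
            return before.getValue();
        }
        double t = (double) (timeMs - before.getKey())
                / (after.getKey() - before.getKey());
        return before.getValue() + (after.getValue() - before.getValue()) * t;
    }

    /** Hit test evaluated at the moment the shot was fired, not when it arrived. */
    public boolean isHit(double aimX, long receivedAtMs, long latencyMs, double hitRadius) {
        long firedAtMs = receivedAtMs - latencyMs;
        return Math.abs(positionAt(firedAtMs) - aimX) <= hitRadius;
    }

    public static void main(String[] args) {
        LagCompensation playerB = new LagCompensation();
        playerB.record(10500, 5.0); // where B stood when A fired
        playerB.record(10600, 8.0); // where B stands when the shot arrives
        System.out.println("compensated: " + playerB.isHit(5.0, 10600, 100, 0.5));
        System.out.println("uncompensated: " + playerB.isHit(5.0, 10600, 0, 0.5));
    }
}
```

With the numbers from the example, the compensated check rewinds to 10.5 s and counts the shot as a hit, while judging by arrival time alone would miss.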

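There is a price to pay, though, and it looks somewhat unfair to low-latency players. A fast-moving player may already have run into a corner and crouched behind a crate, only to be hit by an opponent anyway; from their point of view the bullet ignored cover and they were shot through a wall. Understandably, they are not pleased.

Lag compensation favors players with high latency and can erode the advantage of low-latency players, but it still helps keep the game world balanced.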

Time sync and threshold

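The client and server need to synchronize clocks and learn each other's latency. Here is one procedure described by Yun Feng (云风):

The client sends its local time to the server. On receiving the packet, the server replies with its own time attached. When the client receives that reply, it can estimate how long the packet spent in transit. It then attaches its new local time and sends again, so the server can also refine its picture of the response time.

Using steps like these, both ends work out each other's latency and clock difference. They also set a threshold for real-time sync: for example, interactions with latency under 10 ms (0.01 s) are treated as happening simultaneously rather than as delayed.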
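The exchange above boils down to a little arithmetic. The sketch below assumes the network path is symmetric, so the one-way delay is half the round trip; that assumption, the class name, and the constant name are mine, while the 10 ms threshold is the example figure from the text.

```java
public final class ClockSync {
    /** Delays below this are treated as simultaneous (example value from the text). */
    public static final long INSTANT_THRESHOLD_MS = 10;

    /** Round-trip time measured entirely on the client's clock. */
    public static long roundTripMs(long clientSendMs, long clientReceiveMs) {
        return clientReceiveMs - clientSendMs;
    }

    /**
     * Estimated (server clock - client clock), assuming the reply spent
     * half of the round trip in transit.
     */
    public static long offsetMs(long clientSendMs, long serverTimeMs, long clientReceiveMs) {
        long oneWayMs = roundTripMs(clientSendMs, clientReceiveMs) / 2;
        return serverTimeMs + oneWayMs - clientReceiveMs;
    }

    public static boolean isInstant(long delayMs) {
        return delayMs < INSTANT_THRESHOLD_MS;
    }

    public static void main(String[] args) {
        // Client sends at 1000, server stamps 5040, client receives at 1080.
        System.out.println("rtt=" + roundTripMs(1000, 1080)
                + " offset=" + offsetMs(1000, 5040, 1080));
    }
}
```

Real paths are rarely symmetric, so in practice the estimate is usually repeated and averaged, which matches the text's point that each further exchange refines the picture.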

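UDP or TCP

Different kinds of games favor different protocols:

  • If the client issues stateless queries intermittently and the occasional delay is tolerable, use HTTP/HTTPS.
  • If both client and server can send independently and the occasional delay is tolerable (e.g. online card games, many MMOs), use long-lived TCP connections.
  • If both client and server can send independently and delay is not tolerable (e.g. most multiplayer FPS action games such as Quake and CS, and some MMOs), use UDP.

TCP assumes that packet loss is caused by insufficient local bandwidth (which is indeed one cause), but ISPs in China may drop packets when their own networks are congested. In that case what you may want is to send quickly and compete for the channel rather than let the TCP window shrink. UDP carries no window-shrinking burden, so this is easy to do.

FPS games that put real-time responsiveness first (e.g. Quake, CS) generally use UDP over the WAN. Some packet loss is tolerable (if the client misses packets for a while, it can paper over the gap with interpolation and similar techniques), data can be sent out quickly once a loss is detected, and when no retransmission is involved UDP is a little faster than TCP. Such games do add some protocol control of their own on top of UDP at the application layer, such as ACKs.

Protocols are often mixed. An MMO client might first use HTTP to fetch the latest update, send important information such as items and experience gained over TCP, and use UDP for the movements of nearby characters, NPC movement, skill animation commands and the like, where a lost packet does little harm.

Summary

Online games use client-side prediction, interpolation, and server-side lag compensation to defuse or remove the stalls that network latency causes on the player's side. We may never get to work on games ourselves, but borrowing good experience and practice from another field may well spark ideas for handling parts of our own work.

This episode is brought to you by the Korean Space Agency: we are off to faraway places to see what else is ours, smida. ------ Wang Dachui, 《万万没想到》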



]]>
Quick Notes: Android Network Debugging in Brief</title><link>http://www.aygfsteel.com/yongboy/archive/2014/11/20/420371.html</link><dc:creator>nieyong</dc:creator><author>nieyong</author><pubDate>Thu, 20 Nov 2014 14:05:00 GMT</pubDate><guid>http://www.aygfsteel.com/yongboy/archive/2014/11/20/420371.html</guid><wfw:comment>http://www.aygfsteel.com/yongboy/comments/420371.html</wfw:comment><comments>http://www.aygfsteel.com/yongboy/archive/2014/11/20/420371.html#Feedback</comments><slash:comments>2</slash:comments><wfw:commentRss>http://www.aygfsteel.com/yongboy/comments/commentRss/420371.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/yongboy/services/trackbacks/420371.html</trackback:ping><description><![CDATA[<p>For a while now, the connection success rate of clients on China Mobile 2G/3G has been poor, which has been a real headache.</p> <p>"Android network debugging" here just means working on a rooted Android system: using adb shell to run ordinary terminal commands, checking 2G/3G/4G/WIFI connectivity, and from there pinning down problems caused by the network. But adb shell ships with very few commands. Even basic essentials such as <code>cat</code> and <code>tcpdump</code> are missing. If you want to see how many hops a network request takes, it is practically impossible without outside help.</p> <p>This post covers roughly the following:</p> <blockquote> <ul> <li>How to install the Linux terminal commands needed, tcpdump and mtr </li><li>Debugging connectivity on 2G/3G and similar networks, and the hops a request to a domain takes </li><li>Packet loss on requests </li></ul></blockquote> <h3>The essential Android terminal add-on: <strong>opkg</strong></h3> <p>Calling it essential is no exaggeration. The homepage (http://dan.drown.org/android/) states its purpose up front:</p> <blockquote> <p>Unix command-line programs ported to run on android. This project uses opkg, which handles downloading and installing packages and their dependencies (like yum or apt). 
Source for all packages are available.</p></blockquote> <p>The author, <strong>Dan</strong> (http://blog.dan.drown.org/), ported it to Android and also built a good number of commonly used programs for us. The full list is in the <code>Changelog</code> (http://dan.drown.org/android/), so I will not repeat it here.</p> <p>This is a rare gift, and I am sincerely grateful.</p> <h3>Downloading the opkg package</h3> <p>Download the dependencies to your local machine first:</p> <blockquote> <p>http://dan.drown.org/android/system/xbin/busybox <br />http://dan.drown.org/android/opkg.tar.gz</p></blockquote> <h3>Installing opkg</h3> <p>Suppose it is installed into /data/local on the Android phone. First make sure that directory is readable and writable.</p> <blockquote> <p>Remember to switch to the root account with su, so that operations are not blocked by permissions.</p></blockquote><pre><code>adb shell chmod 777 /data/local
</code></pre> <p>Copy opkg to the /data/local directory:</p><pre><code>adb push busybox /data/local
adb push opkg.tar.gz /data/local
</code></pre> <p>Enter adb shell, then unpack and install:</p><pre><code>cd /data/local
chmod 777 busybox
./busybox tar zxvf opkg.tar.gz
</code></pre> <p>Set the environment variable:</p><pre><code>export PATH=$PATH:/data/local/bin
</code></pre> <p>Update and get ready to install:</p><pre><code>opkg update
opkg install opkg
opkg list # lists the terminal programs (commands) available for installation
</code></pre> <p>Incidentally, opkg works in all kinds of embedded environments, which makes it very powerful.</p> <h4>Installing Linux terminal programs/commands</h4> <p>You can install several in one go:</p><pre><code>opkg install mtr curl tcpdump cat
</code></pre> <p>Or, of course, install them one at a time.</p> <p>Once installed, just run the commands. Here is a test of name resolution and packet loss for baidu.com:</p> <blockquote> <p>mtr -r baidu.com HOST: localhost Loss% Snt Last<br />Avg Best Wrst StDev<br />1.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0<br />2.|-- 192.168.61.1 0.0% 10 504.3 635.0 339.3 1024. 
238.7<br />3.|-- 192.168.63.138 0.0% 10 392.9 588.7 298.5 847.7 220.3<br />4.|-- 221.130.39.106 0.0% 10 340.9 557.3 257.4 823.5 211.7<br />5.|-- 221.179.159.45 10.0% 10 649.6 631.4 332.6 821.4 165.0<br />6.|-- 111.13.14.6 10.0% 10 561.9 551.3 268.2 777.0 170.0<br />7.|-- 111.13.0.162 10.0% 10 510.6 570.6 385.5 767.6 116.6<br />8.|-- 111.13.1.14 10.0% 10 775.4 565.2 377.7 775.4 130.9<br />9.|-- 111.13.2.130 10.0% 10 707.2 564.6 381.1 887.3 173.4</p></blockquote> <p>With mtr it really is easy to see the number of hops and the packet loss rate at each node. That makes it straightforward to locate the cause of the severe connection timeouts on China Mobile 2G/3G. The next step is to hope the ops team sorts it out quickly, so that traffic stops being routed from the China Unicom data center on to the China Mobile data center.</p> <blockquote> <p>Many thanks to Chen Jie for recommending mtr, which is handier than ping plus traceroute. Once you have it, you will not let go!</p></blockquote> <h4>Capturing packets on China Mobile 2G/3G</h4> <p>To capture packets on a 2G/3G network, you need the tcpdump command:</p><pre><code>opkg install tcpdump
</code></pre> <p>opkg thoughtfully installs the libpcap dependency as well, so there is no need to worry about versions!</p><pre><code>tcpdump -i any -p -vv -s 0 -w /sdcard/capture.pcap
</code></pre> <p>Then pull the capture off the device and analyze it in wireshark:</p><pre><code>adb pull /sdcard/capture.pcap c:/tmp
</code></pre> <h3>Other apps that help diagnose the network</h3> <p>If you are not comfortable diagnosing the network from a terminal, you can use ready-made apps.</p> <ol> <li>First place: Fing. Very well known, available on both Android and iOS, handles DNS, PING and more with ease, and a household essential </li><li>Second place: nothing found yet </li><li>shark for root is also decent and recommended on Android </li><li>Speed test apps, which show the current network latency among other things, are also decent </li></ol> <p>If you know of better apps, recommendations are welcome.</p> <h3>Summary</h3> <ol> <li>I hope this helps others who run into the same problem 
</li><li>Written down for my own future reference </li></ol><img src ="http://www.aygfsteel.com/yongboy/aggbug/420371.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/yongboy/" target="_blank">nieyong</a> 2014-11-20 22:05 <a href="http://www.aygfsteel.com/yongboy/archive/2014/11/20/420371.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Why Batch Requests Should Be Merged Wherever Possible
http://www.aygfsteel.com/yongboy/archive/2014/11/09/419829.html
nieyong, Sun, 09 Nov 2014 14:08:00 GMT

Preface

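The production setup:

  1. A production Redis cluster behind several Twemproxy (nutcracker) proxies, load-balanced by LVS in DR mode.
  2. Clients talk to the Redis cluster through Jedis. Each application process originally used 1024 worker threads to handle requests, and there were several such processes.
  3. Well over a billion requests per day; under normal network conditions, more than ten thousand connection-failure exceptions per day.
  4. The ops team reported that LVS was under heavy load.

What we changed:

  1. Reduced the worker threads from 1024 to 16.
  2. Each thread now submits Redis commands in batches, capped at a fixed number of thousands of commands per batch.

The results:

  1. Fewer than 100 million requests per day.
  2. LVS load dropped sharply.
  3. CPU load fell to a fraction of its former level.
  4. In sampled requests, the average time per request dropped by 1-90 ms (most noticeably for cross-data-center requests).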

Batch submission in Redis

Natively supported batch operations

As a rule, a command prefixed with the letter m accepts multiple keys, i.e. it is a batch command.

Explicit ones...

MSET key value [key value ...]
MSETNX key value [key value ...]

HMGET key field [field ...]
HMSET key field value [field value ...]

Ordinary commands that take multiple arguments...

HDEL key field [field ...]
SREM key member [member ...]
RPUSH key value [value ...]
......

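For more, see http://redis.cn/commands.html

Pipelining

Official documentation: http://redis.io/topics/pipelining

  1. The Redis client packs all the commands together, sends them to the Redis server, and then blocks waiting for the results.
  2. The Redis server has to buffer the results of all the commands until every one of them has been processed.
  3. The more commands are packed together, the more memory the buffering consumes.
  4. So packing more commands is not always better.
  5. In a real environment, choose the number of commands per batch according to command execution time and other factors, and confirm the choice by testing.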
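Points 3 to 5 amount to capping the batch size. A minimal sketch of that idea, independent of any Redis client: split the pending commands into bounded chunks and pipeline one chunk at a time. `BatchPartitioner` and `maxBatch` are hypothetical names introduced for this illustration; the right cap has to come from measurement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public final class BatchPartitioner {
    /** Splits commands into consecutive batches of at most maxBatch items. */
    public static <T> List<List<T>> partition(List<T> commands, int maxBatch) {
        if (maxBatch <= 0) {
            throw new IllegalArgumentException("maxBatch must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < commands.size(); start += maxBatch) {
            int end = Math.min(start + maxBatch, commands.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(commands.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> commands = Arrays.asList(
                "get key_1", "get key_2", "del key_3", "get key_4", "get key_5");
        for (List<String> batch : partition(commands, 2)) {
            // In real code, each batch would be written to one pipeline and synced once.
            System.out.println(batch);
        }
    }
}
```

Each batch then costs one round trip, while the cap bounds how much the server has to buffer before replying.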

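Java queue support

When business or front-end request volume is too high and producers run too fast, buffering in a queue works better: consumers take tasks straight from the queue, and the queue decouples producers from consumers. This is common practice in the industry.

Monitoring the queue

It also helps to be able to watch how the queue is being consumed, which makes things very visible. A colleague added a monitoring thread to the queue, giving a clear view of consumption.

Example

The example uses Redis pipelining, a thread pool, test data preparation, a producer-consumer queue, and queue monitoring. Once consumption finishes, the program shuts down.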

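import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

/**
 * Tested against Jedis 2.6.
 * 
 * @author nieyong
 * 
 */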
public class TestJedisPipeline {
    private static final int NUM = 512; // max commands per pipeline batch
    private static final int MAX = 1000000; // 1,000,000 requests

    private static JedisPool redisPool;
    private static final ExecutorService pool = Executors.newCachedThreadPool();
    protected static final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(
            MAX); // capacity: 1,000,000
    // volatile: written by main, read by the worker and monitor threads
    private static volatile boolean finished = false;

    static {
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxActive(64);
        config.setMaxIdle(64);

        try {
            redisPool = new JedisPool(config, "192.168.192.8", 6379, 10000,
                    null, 0);
        } catch (Exception e) {
            System.err.println("Init msg redis factory error! " + e.toString());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("prepare test data 100W");
        prepareTestData();
        System.out.println("prepare test data done!");

        // Producer: simulates 1,000,000 requests
        pool.execute(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < MAX; i++) {
                    if (i % 3 == 0) {
                        queue.offer("del_key key_" + i);
                    } else {
                        queue.offer("get_key key_" + i);
                    }
                }
            }
        });

        // Worker threads: 2 x the number of CPU cores
        int threadNum = 2 * Runtime.getRuntime().availableProcessors();

        for (int i = 0; i < threadNum; i++)
            pool.execute(new ConsumerTask());

        pool.execute(new MonitorTask());

        Thread.sleep(10 * 1000);// 10sec
        System.out.println("going to shutdown server ...");
        setFinished(true);
        pool.shutdown();

        pool.awaitTermination(1, TimeUnit.MILLISECONDS);

        System.out.println("close!");
    }

    private static void prepareTestData() {
        Jedis redis = redisPool.getResource();
        Pipeline pipeline = redis.pipelined();

        for (int i = 0; i < MAX; i++) {
            pipeline.set("key_" + i, (i * 2 + 1) + "");

            if (i % (NUM * 2) == 0) {
                pipeline.sync();
            }
        }
        pipeline.sync();
        redisPool.returnResource(redis);
    }

    // Queue monitor: reports the size of the producer/consumer queue
    private static class MonitorTask implements Runnable {

        @Override
        public void run() {
            while (!Thread.interrupted() && !isFinished()) {
                System.out.println("queue.size = " + queue.size());
                try {
                    Thread.sleep(500); // 0.5 second
                } catch (InterruptedException e) {
                    break;
                }
            }
        }
    }

    // Consumer
    private static class ConsumerTask implements Runnable {
        @Override
        public void run() {
            while (!Thread.interrupted() && !isFinished()) {
                if (queue.isEmpty()) {
                    try {
                        Thread.sleep(100);
                    } catch (InterruptedException e) {
                    }

                    continue;
                }

                List<String> tasks = new ArrayList<String>(NUM);
                queue.drainTo(tasks, NUM);
                if (tasks.isEmpty()) {
                    continue;
                }

                Jedis jedis = redisPool.getResource();
                Pipeline pipeline = jedis.pipelined();

                try {
                    List<Response<String>> resultList = new ArrayList<Response<String>>(
                            tasks.size());

                    List<String> waitDeleteList = new ArrayList<String>(
                            tasks.size());

                    for (String task : tasks) {
                        String key = task.split(" ")[1];
                        if (task.startsWith("get_key")) {
                            resultList.add(pipeline.get(key));
                            waitDeleteList.add(key);
                        } else if (task.startsWith("del_key")) {
                            pipeline.del(key);
                        }
                    }

                    pipeline.sync();

                    // Process the returned values
                    for (int i = 0; i < resultList.size(); i++) {
                        resultList.get(i).get();
                        // handle value here ...
                        // System.out.println("get value " + value);
                    }

                    // Reading is done; delete the keys right away
                    for (String key : waitDeleteList) {
                        pipeline.del(key);
                    }

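                    pipeline.sync();
                    // Normal path: hand the healthy connection back to the pool.
                    redisPool.returnResource(jedis);
                } catch (Exception e) {
                    // Failure path: the connection may be unusable, so return it
                    // as broken, and only once (not again in a finally block).
                    redisPool.returnBrokenResource(jedis);
                }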
            }
        }
    }

    private static boolean isFinished(){
        return finished;
    }

    private static void setFinished(boolean bool){
        finished = bool;
    }
}

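The code is for illustration only. Production code would need proper exception handling and so on.

Summary

Merging requests into batches wherever possible saves a good deal of network bandwidth, CPU, and other resources. If you are facing a similar problem, it is worth considering.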

]]>
Quick Notes: SYN flooding Warnings on the Linux 2.6.32 Kernel
http://www.aygfsteel.com/yongboy/archive/2014/08/20/417165.html
nieyong, Wed, 20 Aug 2014 12:43:00 GMT

Preface

Our newly provisioned servers run kernel 2.6.32. Running the existing TCP server unchanged on the new Linux kernel and then checking dmesg shows a large number of SYN flooding warnings:

possible SYN flooding on port 8080. Sending cookies.

The settings that worked on the 2.6.18 kernel no longer help on 2.6.32: simply raising "net.ipv4.tcp_max_syn_backlog" has no effect.

The only option was to read the 2.6.32 source again. The notes follow.

The summary at the end states the conclusion directly, so if you are in a hurry, skip straight to it.

How Linux kernel 2.6.32 handles the backlog value

net/socket.c:

SYSCALL_DEFINE2(listen, int, fd, int, backlog)
{
    struct socket *sock;
    int err, fput_needed;
    int somaxconn;

    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    if (sock) {
        somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
        if ((unsigned)backlog > somaxconn)
            backlog = somaxconn;

        err = security_socket_listen(sock, backlog);
        if (!err)
            err = sock->ops->listen(sock, backlog);

        fput_light(sock->file, fput_needed);
    }
    return err;
}

net/ipv4/af_inet.c:

/*
 *  Move a socket into listening state.
 */
int inet_listen(struct socket *sock, int backlog)
{
    struct sock *sk = sock->sk;
    unsigned char old_state;
    int err;

    lock_sock(sk);

    err = -EINVAL;
    if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
        goto out;

    old_state = sk->sk_state;
    if (!((1 << old_state) & (TCPF_CLOSE | TCPF_LISTEN)))
        goto out;

    /* Really, if the socket is already in listen state
     * we can only allow the backlog to be adjusted.
     */
    if (old_state != TCP_LISTEN) {
        err = inet_csk_listen_start(sk, backlog);
        if (err)
            goto out;
    }
    sk->sk_max_ack_backlog = backlog;
    err = 0;

out:
    release_sock(sk);
    return err;
}

inet_listen calls inet_csk_listen_start, where the backlog argument reappears under a new name as the constant (const int) parameter nr_table_entries.

net/ipv4/inet_connection_sock.c:

int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
{
    struct inet_sock *inet = inet_sk(sk);
    struct inet_connection_sock *icsk = inet_csk(sk);
    int rc = reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries);

    if (rc != 0)
        return rc;

    sk->sk_max_ack_backlog = 0;
    sk->sk_ack_backlog = 0;
    inet_csk_delack_init(sk);

    /* There is race window here: we announce ourselves listening,
     * but this transition is still not validated by get_port().
     * It is OK, because this socket enters to hash table only
     * after validation is complete.
     */
    sk->sk_state = TCP_LISTEN;
    if (!sk->sk_prot->get_port(sk, inet->num)) {
        inet->sport = htons(inet->num);

        sk_dst_reset(sk);
        sk->sk_prot->hash(sk);

        return 0;
    }

    sk->sk_state = TCP_CLOSE;
    __reqsk_queue_destroy(&icsk->icsk_accept_queue);
    return -EADDRINUSE;
}

The next function deals with connections in the TCP SYN_RECV state: connections still in the handshake phase, also called half-open connections, waiting for the third step of the handshake from the peer.

/*
 * Maximum number of SYN_RECV sockets in queue per LISTEN socket.
 * One SYN_RECV socket costs about 80bytes on a 32bit machine.
 * It would be better to replace it with a global counter for all sockets
 * but then some measure against one socket starving all other sockets
 * would be needed.
 *
 * It was 128 by default. Experiments with real servers show, that
 * it is absolutely not enough even at 100conn/sec. 256 cures most
 * of problems. This value is adjusted to 128 for very small machines
 * (<=32Mb of memory) and to 1024 on normal or better ones (>=256Mb).
 * Note : Dont forget somaxconn that may limit backlog too.
 */
int reqsk_queue_alloc(struct request_sock_queue *queue,
              unsigned int nr_table_entries)
{
    size_t lopt_size = sizeof(struct listen_sock);
    struct listen_sock *lopt;
    nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
    nr_table_entries = max_t(u32, nr_table_entries, 8);
    nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
    lopt_size += nr_table_entries * sizeof(struct request_sock *); 
    if (lopt_size > PAGE_SIZE)
        lopt = __vmalloc(lopt_size,
            GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
            PAGE_KERNEL);
    else
        lopt = kzalloc(lopt_size, GFP_KERNEL);
    if (lopt == NULL)
        return -ENOMEM;

    for (lopt->max_qlen_log = 3;
         (1 << lopt->max_qlen_log) < nr_table_entries;
         lopt->max_qlen_log++);

    get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd));
    rwlock_init(&queue->syn_wait_lock);
    queue->rskq_accept_head = NULL;
    lopt->nr_table_entries = nr_table_entries;

    write_lock_bh(&queue->syn_wait_lock);
    queue->listen_opt = lopt;
    write_unlock_bh(&queue->syn_wait_lock);

    return 0;
}

The variable to watch is nr_table_entries. Inside reqsk_queue_alloc it is an unsigned variable that can be modified, but only within bounds.

For example, suppose the kernel parameter is actually set to:

net.ipv4.tcp_max_syn_backlog = 65535

所传入的backlogåQˆä¸å¤§äºŽnet.core.somaxconn = 65535åQ‰äØ“8102åQŒé‚£ä¹?/p>

// 取listen函数的backlogå’Œsysctl_max_syn_backlog最ž®å€û|¼Œ¾l“æžœä¸?102
nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
// 取nr_table_entrieså’?˜q›è¡Œæ¯”较的最大å€û|¼Œ¾l“æžœä¸?102
nr_table_entries = max_t(u32, nr_table_entries, 8);
// 可看å?nr_table_entries*2åQŒç»“æžœäØ“8102*2=16204
nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

计算¾l“æžœåQŒmax_qlen_log = 14

2.6.18内核中max_qlen_log的计½Ž—æ–¹æ³?/h4>
for (lopt->max_qlen_log = 6;
     (1 << lopt->max_qlen_log) < sysctl_max_syn_backlog;
     lopt->max_qlen_log++);
  1. Clearly sysctl_max_syn_backlog takes part in the calculation: a large sysctl_max_syn_backlog yields a correspondingly large max_qlen_log.
  2. With sysctl_max_syn_backlog = 65535, max_qlen_log = 16.
  3. So in the 2.6.18 kernel the half-open queue length is 2^16 = 65536.
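The 2.6.18 loop can be tried out the same way. A minimal sketch, assuming only the loop quoted above; `max_qlen_log_2_6_18` is a hypothetical helper name.

```c
#include <assert.h>

/* Mirrors the 2.6.18 loop: start at 6 and grow until
 * 2^log is no longer smaller than sysctl_max_syn_backlog. */
static unsigned max_qlen_log_2_6_18(unsigned max_syn_backlog)
{
    unsigned log;

    assert(max_syn_backlog <= (1u << 30)); /* keep 1u << log well defined */
    for (log = 6; (1u << log) < max_syn_backlog; log++)
        ;
    return log;
}
```

Because the loop starts at 6, even a tiny sysctl_max_syn_backlog gives a queue limit of 2^6 = 64 in this version, and the listen() backlog plays no part.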

The listen_sock structure records the number of half-open queue slots it manages in nr_table_entries, that is, the value computed by reqsk_queue_alloc above.

/** struct listen_sock - listen state
 *
 * @max_qlen_log - log_2 of maximal queued SYNs/REQUESTs
 */
struct listen_sock {
    u8          max_qlen_log;
    /* 3 bytes hole, try to use */
    int         qlen;
    int         qlen_young;
    int         clock_hand;
    u32         hash_rnd;
    u32         nr_table_entries;
    struct request_sock *syn_table[0];
};

To sum up: 2^max_qlen_log is the limit on the half-open queue length qlen.

Now back to the function that reports SYN flooding:

net/ipv4/tcp_ipv4.c

#ifdef CONFIG_SYN_COOKIES
static void syn_flood_warning(struct sk_buff *skb)
{
    static unsigned long warntime;

    if (time_after(jiffies, (warntime + HZ * 60))) {
        warntime = jiffies;
        printk(KERN_INFO
               "possible SYN flooding on port %d. Sending cookies.\n",
               ntohs(tcp_hdr(skb)->dest));
    }
}
#endif

Its call site (with some code trimmed):

int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
......
#ifdef CONFIG_SYN_COOKIES
    int want_cookie = 0;
#else
#define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
#endif
    ......
    /* TW buckets are converted to open requests without
     * limitations, they conserve resources and peer is
     * evidently real one.
     */
     // is the half-open queue full && !0
    if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
#ifdef CONFIG_SYN_COOKIES
        if (sysctl_tcp_syncookies) {
            want_cookie = 1;
        } else
#endif
        goto drop;
    }

    /* Accept backlog is full. If we have already queued enough
     * of warm entries in syn queue, drop request. It is better than
     * clogging syn queue with openreqs with exponentially increasing
     * timeout.
     */
    if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
        goto drop;

    req = inet_reqsk_alloc(&tcp_request_sock_ops);
    if (!req)
        goto drop;

    ......

    if (!want_cookie)
        TCP_ECN_create_request(req, tcp_hdr(skb));

    if (want_cookie) {
#ifdef CONFIG_SYN_COOKIES
        syn_flood_warning(skb);
        req->cookie_ts = tmp_opt.tstamp_ok;
#endif
        isn = cookie_v4_init_sequence(sk, skb, &req->mss);
    } else if (!isn) {
        ......
    }       
    ......
}

The function that decides whether the half-open queue is full is the crux, so look at how it computes the answer:

include/net/inet_connection_sock.h:

static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
{
    return reqsk_queue_is_full(&inet_csk(sk)->icsk_accept_queue);
}

include/net/request_sock.h:

static inline int reqsk_queue_is_full(const struct request_sock_queue *queue)
{
    // shift right by max_qlen_log bits
    return queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log;
}

A nonzero result means the half-open queue is full.

That covers only the full-queue test, but the point is that the backlog an application passes in matters a great deal: if it is too small, this test returns 1 very easily.

For example, with somaxconn = 128, sysctl_max_syn_backlog = 4096 and backlog = 511, the final values are nr_table_entries = 256 and max_qlen_log = 8. As soon as the half-open queue holds 256 entries, 256 >> 8 = 1 and the queue counts as full.

How to set backlog depends on the application, which has to supply the value when it calls listen().
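The full-queue test discussed above is a single right shift, which a few lines of userspace C make concrete. A minimal sketch; `syn_queue_is_full` is a hypothetical stand-in for reqsk_queue_is_full that takes plain integers instead of kernel structures.

```c
#include <assert.h>

/* Same arithmetic as reqsk_queue_is_full(): the result becomes nonzero
 * once qlen reaches 2^max_qlen_log. */
static int syn_queue_is_full(int qlen, unsigned max_qlen_log)
{
    assert(qlen >= 0 && max_qlen_log < 31);
    return qlen >> max_qlen_log;
}
```

With max_qlen_log = 8, `syn_queue_is_full(255, 8)` is 0 and `syn_queue_is_full(256, 8)` is 1, so the limit is reached at exactly 256 pending requests.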

Netty backlog handling

Our TCP server uses Netty 3.7, a fairly old version. If we do not set backlog explicitly, JDK 1.6 defaults it to 50.

The evidence, from java.net.ServerSocket:

public void bind(SocketAddress endpoint, int backlog) throws IOException {
    if (isClosed())
        throw new SocketException("Socket is closed");
    if (!oldImpl && isBound())
        throw new SocketException("Already bound");
    if (endpoint == null)
        endpoint = new InetSocketAddress(0);
    if (!(endpoint instanceof InetSocketAddress))
        throw new IllegalArgumentException("Unsupported address type");
    InetSocketAddress epoint = (InetSocketAddress) endpoint;
    if (epoint.isUnresolved())
        throw new SocketException("Unresolved address");
    if (backlog < 1)
      backlog = 50;
    try {
        SecurityManager security = System.getSecurityManager();
        if (security != null)
        security.checkListen(epoint.getPort());
        getImpl().bind(epoint.getAddress(), epoint.getPort());
        getImpl().listen(backlog);
        bound = true;
    } catch(SecurityException e) {
        bound = false;
        throw e;
    } catch(IOException e) {
        bound = false;
        throw e;
    }
}

In Netty, backlog is handled here:

org/jboss/netty/channel/socket/DefaultServerSocketChannelConfig.java:

@Override
public boolean setOption(String key, Object value) {
    if (super.setOption(key, value)) {
        return true;
    }

    if ("receiveBufferSize".equals(key)) {
        setReceiveBufferSize(ConversionUtil.toInt(value));
    } else if ("reuseAddress".equals(key)) {
        setReuseAddress(ConversionUtil.toBoolean(value));
    } else if ("backlog".equals(key)) {
        setBacklog(ConversionUtil.toInt(value));
    } else {
        return false;
    }
    return true;
}

Since the backlog has to be set by hand, this is one way to do it:

bootstrap.setOption("backlog", 8102); // a generous value is harmless: the kernel compares it with net.core.somaxconn and uses the smaller one

Compared with Netty 4.0 this is less convenient; see http://www.aygfsteel.com/yongboy/archive/2014/07/30/416373.html

Summary

On Linux kernel 2.6.32, provided the server is not actually under a SYN flooding attack, these settings can be raised:

sysctl -w net.core.somaxconn=32768

sysctl -w net.ipv4.tcp_max_syn_backlog=65535

sysctl -p

Also, never forget to change the backlog that the TCP server passes to listen(): leaving it unset or too small can itself trigger the SYN flooding warning. Start with 1024, watch for a while, then raise it gradually as needed.

Whatever you configure, the effective backlog is bounded by:

backlog <= net.core.somaxconn

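The half-open queue length is roughly:

half-open queue length ≈ 2 * min(backlog, net.ipv4.tcp_max_syn_backlog)

More precisely, it is min(backlog, net.ipv4.tcp_max_syn_backlog) + 1 rounded up to a power of two, so it lies between one and two times that minimum.

When SYN flooding is reported, the number of TCP connections in SYN_RECV reflects a half-open queue that has filled up. It can be checked with: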

ss -ant | awk 'NR>1 {++s[$1]} END {for(k in s) print k,s[k]}'

Thanks to Shukun from the ops team for this handy command.

]]>
Quick notes: the Linux kernel SYN flooding warning
http://www.aygfsteel.com/yongboy/archive/2014/08/06/416647.html
nieyong, Wed, 06 Aug 2014 13:57:00 GMT

Preface

Recently dmesg on our production servers has been printing warnings like this:

possible SYN flooding on port 8080. Sending cookies.

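At first glance this looks like a DoS attack, but on closer analysis there are only a little over 1000 such lines a day, which seems within a normal, acceptable range.

The next step is to find the source and the cause. Everything below is based on the Linux 2.6.18 kernel.

Where the warning comes from

net/ipv4/tcp_ipv4.c: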

#ifdef CONFIG_SYN_COOKIES
static void syn_flood_warning(struct sk_buff *skb)
{
    static unsigned long warntime; // zero on first use, afterwards warntime = jiffies

    if (time_after(jiffies, (warntime + HZ * 60))) {
        warntime = jiffies;
        printk(KERN_INFO
           "possible SYN flooding on port %d. Sending cookies.\n",
           ntohs(skb->h.th->dest));
    }
}
#endif

Evidently CONFIG_SYN_COOKIES was enabled when this kernel was built.

Definition of the time_after macro:

#define time_after(a,b)     \
    (typecheck(unsigned long, a) && \
     typecheck(unsigned long, b) && \
     ((long)(b) - (long)(a) < 0))

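It compares two unsigned timestamps to decide which one comes first.

What jiffies looks like (note: this particular definition comes from the userspace test shim in the raid6 code; inside the kernel proper, jiffies is a global tick counter that advances HZ times per second):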
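The reason time_after works on the sign of a difference, rather than comparing the two values directly, is counter wraparound. A minimal userspace sketch of the same idea; `time_after_ul` is a hypothetical function form of the macro, and it does the subtraction in unsigned arithmetic before reinterpreting the result as signed (that conversion is implementation-defined in C and wraps on GCC and Clang).

```c
#include <assert.h>
#include <limits.h>

/* True when timestamp a is later than timestamp b, even if the
 * counter has wrapped past zero between the two readings. */
static int time_after_ul(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

static void time_after_demo(void)
{
    assert(time_after_ul(10, 5));
    assert(!time_after_ul(5, 10));
    assert(!time_after_ul(7, 7));
    /* a has wrapped past zero, b has not: a still counts as later */
    assert(time_after_ul(5, ULONG_MAX - 5));
}
```

A plain `a > b` would get the last case wrong, which is why the kernel never compares jiffies values directly.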

# define jiffies    raid6_jiffies()

#define HZ 1000

......

static inline uint32_t raid6_jiffies(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec*1000 + tv.tv_usec/1000; // seconds*1000 + microseconds/1000
}

With that in mind, look at syn_flood_warning again:

static void syn_flood_warning(struct sk_buff *skb)
{
    static unsigned long warntime; // zero on first use, afterwards warntime = jiffies

    if (time_after(jiffies, (warntime + HZ * 60))) {
        warntime = jiffies;
        printk(KERN_INFO
           "possible SYN flooding on port %d. Sending cookies.\n",
           ntohs(skb->h.th->dest));
    }
}

warntime is static: it is zero on the first call and afterwards holds the jiffies value of the last warning. A new warning is printed only once more than HZ*60 ticks (60 seconds) have passed since the previous one, so the message appears at most once a minute.

Some articles on time_after and jiffies:
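The once-a-minute behaviour of syn_flood_warning can be reproduced without a kernel. A minimal sketch under stated assumptions: `should_warn` and `SKETCH_HZ` are hypothetical names, the tick value is passed in instead of read from jiffies, and the static variable becomes an explicit pointer so the state is visible.

```c
#include <assert.h>

#define SKETCH_HZ 1000UL /* assumed tick rate, matching the HZ shown above */

/* Returns 1 when a warning should be printed at tick `now`, using the
 * same test as syn_flood_warning(): now is after *warntime + HZ * 60. */
static int should_warn(unsigned long *warntime, unsigned long now)
{
    assert(warntime != 0);
    if ((long)((*warntime + SKETCH_HZ * 60) - now) < 0) {
        *warntime = now; /* remember when we last warned */
        return 1;
    }
    return 0;
}
```

Starting from `unsigned long w = 0;`, calls at ticks 60001 and 120002 return 1 while every call in between returns 0, which is why a day of sustained overflow produces at most 1440 log lines.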

http://wenku.baidu.com/view/c75658d480eb6294dd886c4e.html

http://www.360doc.com/content/11/1201/09/1317564_168810003.shtml

 

Conditions for the warning to be printed

Pay attention to the conditions under which want_cookie = 1.

net/ipv4/tcp_ipv4.c:

int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
{
    struct inet_request_sock *ireq;
    struct tcp_options_received tmp_opt;
    struct request_sock *req;
    __u32 saddr = skb->nh.iph->saddr;
    __u32 daddr = skb->nh.iph->daddr;
    __u32 isn = TCP_SKB_CB(skb)->when; // when is set to 0 in tcp_v4_rcv()
    struct dst_entry *dst = NULL;
#ifdef CONFIG_SYN_COOKIES
    int want_cookie = 0;
#else
#define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
#endif

    /* Never answer to SYNs send to broadcast or multicast */
    if (((struct rtable *)skb->dst)->rt_flags &
    (RTCF_BROADCAST | RTCF_MULTICAST))
        goto drop;

    /* TW buckets are converted to open requests without
     * limitations, they conserve resources and peer is
     * evidently real one.
     */
    // if (half-open queue is full && !0)
    if (inet_csk_reqsk_queue_is_full(sk) && !isn) { 
#ifdef CONFIG_SYN_COOKIES
        if (sysctl_tcp_syncookies) { // net.ipv4.tcp_syncookies = 1
            want_cookie = 1;
        } else
#endif
        goto drop;
    }

    /* Accept backlog is full. If we have already queued enough
     * of warm entries in syn queue, drop request. It is better than
     * clogging syn queue with openreqs with exponentially increasing
     * timeout.
     */
    // if (accept queue is full && more than 1 half-open request has not yet had its SYN-ACK retransmitted)
    if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
        goto drop;

    ......

    tcp_openreq_init(req, &tmp_opt, skb);

    ireq = inet_rsk(req);
    ireq->loc_addr = daddr;
    ireq->rmt_addr = saddr;
    ireq->opt = tcp_v4_save_options(sk, skb);
    if (!want_cookie)
        TCP_ECN_create_request(req, skb->h.th);

    if (want_cookie) { // triggered when the half-open queue is full
#ifdef CONFIG_SYN_COOKIES
        syn_flood_warning(skb);
#endif
        isn = cookie_v4_init_sequence(sk, skb, &req->mss);
    } else if (!isn) {
        ......
    }
    /* Kill the following clause, if you dislike this way. */
    // where sysctl_max_syn_backlog comes into play when net.ipv4.tcp_syncookies is not set
    else if (!sysctl_tcp_syncookies &&
             (sysctl_max_syn_backlog - inet_csk_reqsk_queue_len(sk) <
              (sysctl_max_syn_backlog >> 2)) &&
             (!peer || !peer->tcp_ts_stamp) &&
             (!dst || !dst_metric(dst, RTAX_RTT))) {
            /* Without syncookies last quarter of
             * backlog is filled with destinations,
             * proven to be alive.
             * It means that we continue to communicate
             * to destinations, already remembered
             * to the moment of synflood.
             */
            LIMIT_NETDEBUG(KERN_DEBUG "TCP: drop open "
                   "request from %u.%u.%u.%u/%u\n",
                   NIPQUAD(saddr),
                   ntohs(skb->h.th->source));
            dst_release(dst);
            goto drop_and_free;
        }

        isn = tcp_v4_init_sequence(sk, skb);
    }
    tcp_rsk(req)->snt_isn = isn;

    if (tcp_v4_send_synack(sk, req, dst))
        goto drop_and_free;

    if (want_cookie) {
        reqsk_free(req);
    } else {
        inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
    }
    return 0;

drop_and_free:
    reqsk_free(req);
drop:
    return 0;
}
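The control flow of tcp_v4_conn_request above boils down to a small decision table. A minimal sketch, assuming only the two early checks shown in the listing: `classify_syn` and `enum syn_action` are hypothetical names, and the allocation failure and the "last quarter of backlog" branch are deliberately left out.

```c
#include <assert.h>

/* Hypothetical outcome labels, not kernel identifiers. */
enum syn_action { SYN_QUEUE, SYN_COOKIE, SYN_DROP };

/* synq_full:    inet_csk_reqsk_queue_is_full(sk)
 * isn_preset:   isn != 0 (request converted from a TIME_WAIT bucket)
 * syncookies:   net.ipv4.tcp_syncookies
 * acceptq_full: sk_acceptq_is_full(sk)
 * young:        inet_csk_reqsk_queue_young(sk) */
static enum syn_action classify_syn(int synq_full, int isn_preset,
                                    int syncookies, int acceptq_full,
                                    int young)
{
    int want_cookie = 0;

    assert(young >= 0);
    if (synq_full && !isn_preset) {
        if (syncookies)
            want_cookie = 1;   /* this is the path that logs the warning */
        else
            return SYN_DROP;
    }
    if (acceptq_full && young > 1)
        return SYN_DROP;
    return want_cookie ? SYN_COOKIE : SYN_QUEUE;
}
```

Only the `SYN_COOKIE` outcome reaches syn_flood_warning, so the log line needs both a full half-open queue and tcp_syncookies enabled.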

Summary

In short, if the system logs:

possible SYN flooding on port 8080. Sending cookies.

and the volume is low, it is a hint to check whether sysctl_max_syn_backlog is set too low:

sysctl -a | grep 'max_syn_backlog'

Try doubling it:

sysctl -w net.ipv4.tcp_max_syn_backlog=8192

sysctl -p

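If the process cannot reload the setting on its own, restart the application so it picks up the new kernel parameter, then keep observing for a while.

I do not seem to fully understand the complete scope of tcp_max_syn_backlog yet; I will write more when I have time.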

]]>
Quick notes: Linux kernel backlog
http://www.aygfsteel.com/yongboy/archive/2014/07/30/416373.html
nieyong, Wed, 30 Jul 2014 09:22:00 GMT

0. Preface

Some things are easy to forget: you remember them for a moment and two days later they are gone. Rather than keep scattered fragments, I am writing them down together so they can be looked up later.

Everything below is based on the Linux 2.6.18 kernel.

1. The backlog argument of listen, and net.core.somaxconn

To see what this parameter means, start with the Linux socket listen documentation:

man listen

   #include <sys/socket.h>

   int listen(int sockfd, int backlog);

The int backlog argument of listen is the length of the queue that holds sockets which have completed the three-way handshake and are fully established, waiting to be accepted.

We normally choose the backlog value ourselves. If it is larger than net.core.somaxconn, it is clamped to net.core.somaxconn. To follow the system setting rather than hard-code a value, read /proc/sys/net/core/somaxconn.

net/socket.c:
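Reading the system limit instead of hard-coding a backlog takes only a few lines. A minimal sketch; `read_int_or` is a hypothetical helper, and the fallback value is whatever the caller considers a sane default when the file is missing or unreadable.

```c
#include <assert.h>
#include <stdio.h>

/* Reads one integer from a /proc-style file.
 * Returns `fallback` if the file cannot be opened or parsed. */
static int read_int_or(const char *path, int fallback)
{
    FILE *f;
    int value;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return fallback;
    if (fscanf(f, "%d", &value) != 1)
        value = fallback;
    fclose(f);
    return value;
}
```

A server could then call `listen(fd, read_int_or("/proc/sys/net/core/somaxconn", 128))` so that its backlog follows whatever the administrator has configured.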

/*
 *  Perform a listen. Basically, we allow the protocol to do anything
 *  necessary for a listen, and if that works, we mark the socket as
 *  ready for listening.
 */

int sysctl_somaxconn = SOMAXCONN;

asmlinkage long sys_listen(int fd, int backlog)
{
    struct socket *sock;
    int err, fput_needed;

    if ((sock = sockfd_lookup_light(fd, &err, &fput_needed)) != NULL) {
        if ((unsigned) backlog > sysctl_somaxconn)
            backlog = sysctl_somaxconn;

        err = security_socket_listen(sock, backlog);
        if (!err)
            err = sock->ops->listen(sock, backlog);

        fput_light(sock->file, fput_needed);
    }
    return err;
}
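The clamp inside sys_listen has one subtle detail: the comparison is done on the unsigned value, so a negative backlog is treated as a huge number and ends up clamped as well. A minimal sketch; `effective_backlog` is a hypothetical name for that one `if`.

```c
#include <assert.h>

/* Same comparison as sys_listen(): compare as unsigned, clamp to somaxconn. */
static int effective_backlog(int backlog, int somaxconn)
{
    assert(somaxconn >= 0);
    if ((unsigned)backlog > (unsigned)somaxconn)
        backlog = somaxconn;
    return backlog;
}
```

So with somaxconn = 128, a requested backlog of 511 and a requested backlog of -1 both become 128, while 50 is passed through unchanged.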

For example, the widely used Netty (4.0) framework reads /proc/sys/net/core/somaxconn at startup on Linux and passes that value as the backlog when it calls listen during initialization.

int somaxconn = 3072;
BufferedReader in = null;
try {
    in = new BufferedReader(new FileReader("/proc/sys/net/core/somaxconn"));
    somaxconn = Integer.parseInt(in.readLine());
    logger.debug("/proc/sys/net/core/somaxconn: {}", somaxconn);
} catch (Exception e) {
    // Failed to get SOMAXCONN
} finally {
    if (in != null) {
        try {
            in.close();
        } catch (Exception e) {
            // Ignored.
        }
    }
}

SOMAXCONN = somaxconn;
......
private volatile int backlog = NetUtil.SOMAXCONN;

Raising net.core.somaxconn somewhat is therefore usually worthwhile.

To set it:

sysctl -w net.core.somaxconn=65535

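On a Linux machine with plenty of memory, 65535 is generally fine.

sysctl -w applies the value immediately; to keep it across reboots, also put it in /etc/sysctl.conf, which sysctl -p reloads. Then restart your server application so that its listen call picks up the new limit.

2. The length of the queue into which the NIC places incoming packets: netdev_max_backlog

The explanation in the kernel's sysctl.c: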

number of unprocessed input packets before kernel starts dropping them, default 300

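As I understand it: when a network interface receives packets faster than the kernel can process them, this is the maximum number allowed to sit in the queue; anything beyond it is dropped.

Where it takes effect, net/core/dev.c: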

int netif_rx(struct sk_buff *skb)
{
    struct softnet_data *queue;
    unsigned long flags;

    /* if netpoll wants it, pretend we never saw it */
    if (netpoll_rx(skb))
        return NET_RX_DROP;

    if (!skb->tstamp.off_sec)
        net_timestamp(skb);

    /*
     * The code is rearranged so that the path is the most
     * short when CPU is congested, but is still operating.
     */
    local_irq_save(flags);
    queue = &__get_cpu_var(softnet_data);

    __get_cpu_var(netdev_rx_stat).total++;
    if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
        if (queue->input_pkt_queue.qlen) {
enqueue:
            dev_hold(skb->dev);
            __skb_queue_tail(&queue->input_pkt_queue, skb);
            local_irq_restore(flags);
            return NET_RX_SUCCESS;
        }

        netif_rx_schedule(&queue->backlog_dev);
        goto enqueue;
    }

    __get_cpu_var(netdev_rx_stat).dropped++;
    local_irq_restore(flags);

    kfree_skb(skb);
    return NET_RX_DROP;
}

Reading the code above gives a rough idea of when netdev_max_backlog comes into play.
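The enqueue-or-drop decision in netif_rx can be modelled with two counters. A minimal sketch, assuming only the comparison shown in the listing: `struct rx_queue` and `rx_enqueue` are hypothetical, and scheduling of the backlog device is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the per-CPU input queue: just two counters. */
struct rx_queue {
    int qlen;     /* packets currently queued */
    int dropped;  /* packets discarded because the queue was over the limit */
};

/* Returns 1 if the packet is queued, 0 if it is dropped.
 * Uses the same test as netif_rx(): qlen <= netdev_max_backlog. */
static int rx_enqueue(struct rx_queue *q, int netdev_max_backlog)
{
    assert(q != NULL);
    if (q->qlen <= netdev_max_backlog) {
        q->qlen++;
        return 1;
    }
    q->dropped++;
    return 0;
}
```

Because the test is `<=`, a queue with netdev_max_backlog = 2 accepts a third packet before it starts dropping, so the real ceiling is one more than the sysctl value.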

]]>
Things to know about ports on a Linux server</title><link>http://www.aygfsteel.com/yongboy/archive/2014/06/28/415240.html</link><dc:creator>nieyong</dc:creator><author>nieyong</author><pubDate>Sat, 28 Jun 2014 06:15:00 GMT</pubDate><guid>http://www.aygfsteel.com/yongboy/archive/2014/06/28/415240.html</guid><wfw:comment>http://www.aygfsteel.com/yongboy/comments/415240.html</wfw:comment><comments>http://www.aygfsteel.com/yongboy/archive/2014/06/28/415240.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.aygfsteel.com/yongboy/comments/commentRss/415240.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/yongboy/services/trackbacks/415240.html</trackback:ping><description><![CDATA[<h3>Preface</h3> <p>This is an internal tech-talk deck. It contains nothing confidential, so I am sharing it here.</p> <h3>Slides</h3><script async class="speakerdeck-embed" data-id="2bdcab50e0b60131fe8936fb43230fd9" data-ratio="1.41436464088398" src="http://speakerdeck.com/assets/embed.js"></script> <p>Link: <a >https://speakerdeck.com/yongboy/linuxxi-tong-fu-wu-duan-kou-de-na-xie-shi</a></p> <p>It is a little rough and some points may not be clearly expressed. If you spot a mistake, please point it out.</p><img src ="http://www.aygfsteel.com/yongboy/aggbug/415240.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/yongboy/" target="_blank">nieyong</a> 2014-06-28 14:15 <a href="http://www.aygfsteel.com/yongboy/archive/2014/06/28/415240.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>"Make Web Pages Load Faster": training slides</title><link>http://www.aygfsteel.com/yongboy/archive/2013/06/28/401054.html</link><dc:creator>nieyong</dc:creator><author>nieyong</author><pubDate>Fri, 28 Jun 2013 08:56:00 
GMT</pubDate><guid>http://www.aygfsteel.com/yongboy/archive/2013/06/28/401054.html</guid><wfw:comment>http://www.aygfsteel.com/yongboy/comments/401054.html</wfw:comment><comments>http://www.aygfsteel.com/yongboy/archive/2013/06/28/401054.html#Feedback</comments><slash:comments>2</slash:comments><wfw:commentRss>http://www.aygfsteel.com/yongboy/comments/commentRss/401054.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/yongboy/services/trackbacks/401054.html</trackback:ping><description><![CDATA[<div>"Make Web Pages Load Faster" is a slide deck meant to broaden horizons (it does not go deep). It gives colleagues some approaches to web performance optimization so they do not keep running into walls.<br /> <br /> Goals:<br /> <div>How to make a page load faster: that is the theme<br /> Every stage a page passes through is touched on briefly<br /> Broad coverage (front to back), but only skimming the surface<br /> It may widen your view a little (in which case the goal is met)<br /> Do not optimize early, but do plan ahead!<br /> <br /> <embed src='http://www.docin.com/DocinViewer-671771410-144.swf' width='650' height='490' type=application/x-shockwave-flash ALLOWFULLSCREEN='true' ALLOWSCRIPTACCESS='always'></embed> <br />Docin link:<br /> <div><a >http://www.docin.com/p-671771410.html</a></div></div> </div><img src ="http://www.aygfsteel.com/yongboy/aggbug/401054.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/yongboy/" target="_blank">nieyong</a> 2013-06-28 16:56 <a href="http://www.aygfsteel.com/yongboy/archive/2013/06/28/401054.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Building a TCP server on Erlang OTP</title><link>http://www.aygfsteel.com/yongboy/archive/2012/10/24/390185.html</link><dc:creator>nieyong</dc:creator><author>nieyong</author><pubDate>Wed, 24 Oct 2012 10:14:00 
GMT</pubDate><guid>http://www.aygfsteel.com/yongboy/archive/2012/10/24/390185.html</guid><wfw:comment>http://www.aygfsteel.com/yongboy/comments/390185.html</wfw:comment><comments>http://www.aygfsteel.com/yongboy/archive/2012/10/24/390185.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.aygfsteel.com/yongboy/comments/commentRss/390185.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/yongboy/services/trackbacks/390185.html</trackback:ping><description><![CDATA[<h2><strong>Socket modes</strong></h2>
<p>Active mode (option {active, true}) is popular because messages are received without blocking. But when the system cannot keep up with very heavy traffic, and clients send data faster than the server can process it, the message buffer may fill up. In the extreme case of sustained heavy traffic the system is overwhelmed by requests, and the VM risks running out of memory and crashing.</p>
<p>With a passive-mode socket (option {active, false}), the underlying TCP buffer can be used to throttle requests and push back on client messages. Data is received by calling gen_tcp:recv, which blocks (in a single-process design the server can then only wait on one particular client socket, which is dangerous). Note that the operating system may still buffer a little, letting the client machine send a small amount of data before it is blocked, even though Erlang has not yet called recv.</p>
<p>Hybrid mode (semi-blocking) is enabled with option {active, once}: the socket is active for exactly one message. After the controlling process has been sent one data message, it must explicitly call inet:setopts(Socket, [{active, once}]) to re-arm the socket before it can receive the next message (until then the socket stays blocked). Hybrid mode therefore combines the advantages of active and passive modes, giving flow control and keeping the server from being flooded with messages.</p>
<p>All the TCP server code below is built on hybrid (semi-blocking) mode.</p>
<h2><strong>About prim_inet</strong></h2>
<p>prim_inet has no official documentation; it can be regarded as a thin wrapper over the underlying sockets. According to <a target="_blank">yufeng</a> of Taobao, it is an OTP implementation detail, a private low-level module meant for Erlang library developers, and its use is not recommended. Still, the <a target="_blank">Building a Non-blocking TCP server using OTP principles</a> example uses prim_inet to demonstrate asynchronous socket handling.</p>
<h2><strong>Design pattern</strong></h2>
<p>In general, one dedicated process listens for client connections, and each child process handles the socket requests of one particular client.</p>
<p>In the <a target="_blank">Building a Non-blocking TCP server using OTP principles</a> example the child processes are implemented with gen_fsm, neatly combining a state machine with message events; it is worth studying.</p>
<p>The article <a target="_blank">Erlang: A Generalized TCP Server</a> uses the same pattern, but its child processes do not follow OTP conventions, so I do not consider it a good model.</p>
<h2><strong>simple_one_for_one</strong></h2>
<p>A simplified one-for-one supervisor, used to create a group of dynamic child processes. It suits servers that must handle many requests concurrently: for instance, after a socket server accepts a new client connection, it needs to create a new process dynamically to handle that connection. Following OTP principles, that process is a supervised child.</p>
<h2><strong>TCP server implementation</strong> </h2>
<h3><strong>A simple implementation on the standard API</strong></h3>
<p>This is also based on {active, once} mode, but the blocking wait for the next client connection is handed to the supervised child process.</p>
<p>Start with the entry point, tcp_server_app:</p><script src="https://gist.github.com/3945140.js?file=tcp_server_app.erl"></script>
<p>It reads the port and starts the top-level supervisor (which does not yet listen for or handle client sockets), then immediately starts a supervised child, which begins handling client socket connections.</p>
<p>The supervisor tcp_server_sup is just as simple:</p><script src="https://gist.github.com/3945155.js?file=tcp_server_sup.erl"></script>
<p>Note that tcp_server_handler:start_link([LSock]) is actually invoked only when start_child is called.</p>
<p>The tcp_server_handler code is not complicated either:</p><script src="https://gist.github.com/3945175.js?file=tcp_server_handler.erl"></script>
<p>The code is compact and contains a few tricks. When the child is started via start_link, init returns {ok, #state{lsock = Socket}, 0}. The 0 is the timeout value, so gen_server immediately calls handle_info(timeout, #state{lsock = LSock} = State), which waits for a client connection and blocks there, without affecting other calls in this mode. When a client connects, a new tcp_server_handler child is started and the current child stops blocking.</p>
<h3><strong>An implementation based on prim_inet</strong></h3>
<p>This implementation follows the article Non-blocking TCP server using OTP principles, except that the child processes are implemented with gen_server.</p>
<p>The entry point is simple:</p><script src="https://gist.github.com/3945273.js?file=tcp_server_app.erl"></script>
<p>The supervisor code:</p><script src="https://gist.github.com/3945280.js?file=tcp_server_sup.erl"></script>
<p>The strategy differs: a one_for_one supervisor holds the listener process tcp_listener plus a tcp_client_sup process tree (simple_one_for_one strategy).</p>
<p>tcp_listener is a separate process that listens for client socket connections:</p><script src="https://gist.github.com/3945295.js?file=tcp_listener.erl"></script>
<p>Once a client connection is accepted, it is handed to the tcp_client_handler module:</p><script src="https://gist.github.com/3945302.js?file=tcp_client_handler.erl"></script>
<p>Compare this with the standard-API version to appreciate the benefit of asynchronous I/O.</p>
<h2><strong>Summary</strong></h2>
<p>This post implemented a simple Erlang OTP TCP server in two different styles. It is also a study summary, written down so that I do not forget it.</p>
<p>If you have better suggestions, please let me know. Thanks.</p>
<h2><strong>References</strong></h2>
<ol> <li><a target="_blank">Building a Non-blocking TCP server using OTP principles</a></li> <li><a target="_blank">Erlang: A Generalized TCP Server</a></li> <li>《Erlang程序设计》 (Programming Erlang)</li> <li>《Erlang/OTP并发编程实战》 (Erlang and OTP in Action)</li></ol><img src ="http://www.aygfsteel.com/yongboy/aggbug/390185.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/yongboy/" target="_blank">nieyong</a> 2012-10-24 18:14 <a href="http://www.aygfsteel.com/yongboy/archive/2012/10/24/390185.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss> </body>