Nie Yong's Blog

          Notes from work and study, one bit at a time.

          Quick Notes: SYN Flooding Warnings on the Linux 2.6.32 Kernel

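          Preface

          Our newly provisioned servers run kernel 2.6.32. After the existing TCP server was moved unchanged onto these Linux machines, dmesg showed a large number of SYN flooding warnings:

          possible SYN flooding on port 8080. Sending cookies.

          The sysctl settings carried over from kernel 2.6.18 no longer help on 2.6.32: simply raising "net.ipv4.tcp_max_syn_backlog" has no effect.

          The only way forward was to read the 2.6.32 source again; the notes follow.

          The summary at the end states the conclusions directly, so impatient readers can skip straight to it.

          How Linux kernel 2.6.32 handles the backlog value

          net/socket.c: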

          SYSCALL_DEFINE2(listen, int, fd, int, backlog)
          {
              struct socket *sock;
              int err, fput_needed;
              int somaxconn;
          
              sock = sockfd_lookup_light(fd, &err, &fput_needed);
              if (sock) {
                  somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn;
                  if ((unsigned)backlog > somaxconn)
                      backlog = somaxconn;
          
                  err = security_socket_listen(sock, backlog);
                  if (!err)
                      err = sock->ops->listen(sock, backlog);
          
                  fput_light(sock->file, fput_needed);
              }
              return err;
          }
          

          net/ipv4/af_inet.c:

          /*
           *  Move a socket into listening state.
           */
          int inet_listen(struct socket *sock, int backlog)
          {
              struct sock *sk = sock->sk;
              unsigned char old_state;
              int err;
          
              lock_sock(sk);
          
              err = -EINVAL;
              if (sock->state != SS_UNCONNECTED || sock->type != SOCK_STREAM)
                  goto out;
          
              old_state = sk->sk_state;
              if (!((1 << old_state) & (TCPF_CLOSE | TCPF_LISTEN)))
                  goto out;
          
              /* Really, if the socket is already in listen state
               * we can only allow the backlog to be adjusted.
               */
              if (old_state != TCP_LISTEN) {
                  err = inet_csk_listen_start(sk, backlog);
                  if (err)
                      goto out;
              }
              sk->sk_max_ack_backlog = backlog;
              err = 0;
          
          out:
              release_sock(sk);
              return err;
          }
          

          inet_listen calls inet_csk_listen_start, where the backlog argument reappears under a new name as the const parameter nr_table_entries.

          net/ipv4/inet_connection_sock.c:

          int inet_csk_listen_start(struct sock *sk, const int nr_table_entries)
          {
              struct inet_sock *inet = inet_sk(sk);
              struct inet_connection_sock *icsk = inet_csk(sk);
              int rc = reqsk_queue_alloc(&icsk->icsk_accept_queue, nr_table_entries);
          
              if (rc != 0)
                  return rc;
          
              sk->sk_max_ack_backlog = 0;
              sk->sk_ack_backlog = 0;
              inet_csk_delack_init(sk);
          
              /* There is race window here: we announce ourselves listening,
               * but this transition is still not validated by get_port().
               * It is OK, because this socket enters to hash table only
               * after validation is complete.
               */
              sk->sk_state = TCP_LISTEN;
              if (!sk->sk_prot->get_port(sk, inet->num)) {
                  inet->sport = htons(inet->num);
          
                  sk_dst_reset(sk);
                  sk->sk_prot->hash(sk);
          
                  return 0;
              }
          
              sk->sk_state = TCP_CLOSE;
              __reqsk_queue_destroy(&icsk->icsk_accept_queue);
              return -EADDRINUSE;
          }
          

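          The code below (reqsk_queue_alloc, in net/core/request_sock.c) sets up the queue for connections in the TCP SYN_RECV state: the handshake is still in progress (a half-open connection) and the server is waiting for the client's final ACK, the third step of the handshake.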

          /*
           * Maximum number of SYN_RECV sockets in queue per LISTEN socket.
           * One SYN_RECV socket costs about 80bytes on a 32bit machine.
           * It would be better to replace it with a global counter for all sockets
           * but then some measure against one socket starving all other sockets
           * would be needed.
           *
           * It was 128 by default. Experiments with real servers show, that
           * it is absolutely not enough even at 100conn/sec. 256 cures most
           * of problems. This value is adjusted to 128 for very small machines
           * (<=32Mb of memory) and to 1024 on normal or better ones (>=256Mb).
           * Note : Dont forget somaxconn that may limit backlog too.
           */
          int reqsk_queue_alloc(struct request_sock_queue *queue,
                        unsigned int nr_table_entries)
          {
              size_t lopt_size = sizeof(struct listen_sock);
              struct listen_sock *lopt;
              nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
              nr_table_entries = max_t(u32, nr_table_entries, 8);
              nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);
              lopt_size += nr_table_entries * sizeof(struct request_sock *); 
              if (lopt_size > PAGE_SIZE)
                  lopt = __vmalloc(lopt_size,
                      GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO,
                      PAGE_KERNEL);
              else
                  lopt = kzalloc(lopt_size, GFP_KERNEL);
              if (lopt == NULL)
                  return -ENOMEM;
          
              for (lopt->max_qlen_log = 3;
                   (1 << lopt->max_qlen_log) < nr_table_entries;
                   lopt->max_qlen_log++);
          
              get_random_bytes(&lopt->hash_rnd, sizeof(lopt->hash_rnd));
              rwlock_init(&queue->syn_wait_lock);
              queue->rskq_accept_head = NULL;
              lopt->nr_table_entries = nr_table_entries;
          
              write_lock_bh(&queue->syn_wait_lock);
              queue->listen_opt = lopt;
              write_unlock_bh(&queue->syn_wait_lock);
          
              return 0;
          }
          

          The variable to watch is nr_table_entries. Inside reqsk_queue_alloc it is an unsigned, modifiable variable, but the range it can take is bounded.

          Suppose the actual kernel parameter is:

          net.ipv4.tcp_max_syn_backlog = 65535

          and the backlog passed in (no larger than net.core.somaxconn = 65535) is 8102. Then:

          // min of the listen() backlog and sysctl_max_syn_backlog: 8102
          nr_table_entries = min_t(u32, nr_table_entries, sysctl_max_syn_backlog);
          // max of nr_table_entries and 8: still 8102
          nr_table_entries = max_t(u32, nr_table_entries, 8);
          // round 8102 + 1 up to the next power of two: 16384 (roughly nr_table_entries * 2)
          nr_table_entries = roundup_pow_of_two(nr_table_entries + 1);

          Result: max_qlen_log = 14
          

          How kernel 2.6.18 computes max_qlen_log:

          for (lopt->max_qlen_log = 6;
               (1 << lopt->max_qlen_log) < sysctl_max_syn_backlog;
               lopt->max_qlen_log++);
          
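          1. sysctl_max_syn_backlog clearly takes part in the calculation, so a large sysctl_max_syn_backlog yields a correspondingly large max_qlen_log.
          2. With sysctl_max_syn_backlog = 65535, max_qlen_log = 16.
          3. On 2.6.18 the half-open queue therefore holds up to 2^16 = 65536 entries, regardless of the backlog passed to listen.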
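
For comparison, the 2.6.18 loop quoted above can be sketched the same way. The function names `max_qlen_log_2_6_18` and `syn_queue_limit_2_6_18` are hypothetical; the sketch assumes, as the quoted loop suggests, that only sysctl_max_syn_backlog feeds the result.

```c
#include <assert.h>
#include <stdint.h>

/* 2.6.18: smallest log >= 6 such that 2^log >= sysctl_max_syn_backlog. */
static unsigned max_qlen_log_2_6_18(uint32_t max_syn_backlog)
{
    unsigned log = 6;

    assert(max_syn_backlog <= 0x80000000u);
    while ((1u << log) < max_syn_backlog)
        log++;
    return log;
}

/* Half-open queue limit implied by that log value. */
static uint32_t syn_queue_limit_2_6_18(uint32_t max_syn_backlog)
{
    return 1u << max_qlen_log_2_6_18(max_syn_backlog);
}
```

With `max_syn_backlog = 65535` this gives a log of 16 and a limit of 65536, which is why raising the sysctl alone was enough on the old kernel.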

          The listen_sock structure records the number of slots in the half-open (SYN) table as nr_table_entries, which is 16384 in the 2.6.32 example above.

          /** struct listen_sock - listen state
           *
           * @max_qlen_log - log_2 of maximal queued SYNs/REQUESTs
           */
          struct listen_sock {
              u8          max_qlen_log;
              /* 3 bytes hole, try to use */
              int         qlen;
              int         qlen_young;
              int         clock_hand;
              u32         hash_rnd;
              u32         nr_table_entries;
              struct request_sock *syn_table[0];
          };
          

          As the struct comment indicates, 2^max_qlen_log is the limit on qlen, the number of queued half-open connections.

          Now back to the function that reports SYN flooding:

          net/ipv4/tcp_ipv4.c:

          #ifdef CONFIG_SYN_COOKIES
          static void syn_flood_warning(struct sk_buff *skb)
          {
              static unsigned long warntime;
          
              if (time_after(jiffies, (warntime + HZ * 60))) {
                  warntime = jiffies;
                  printk(KERN_INFO
                         "possible SYN flooding on port %d. Sending cookies.\n",
                         ntohs(tcp_hdr(skb)->dest));
              }
          }
          #endif
          

          Its call site, with much of the code trimmed:

          int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
          {
          ......
          #ifdef CONFIG_SYN_COOKIES
              int want_cookie = 0;
          #else
          #define want_cookie 0 /* Argh, why doesn't gcc optimize this :( */
          #endif
              ......
              /* TW buckets are converted to open requests without
               * limitations, they conserve resources and peer is
               * evidently real one.
               */
              // is the half-open (SYN) queue full, and is isn still 0?
              if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
          #ifdef CONFIG_SYN_COOKIES
                  if (sysctl_tcp_syncookies) {
                      want_cookie = 1;
                  } else
          #endif
                  goto drop;
              }
          
              /* Accept backlog is full. If we have already queued enough
               * of warm entries in syn queue, drop request. It is better than
               * clogging syn queue with openreqs with exponentially increasing
               * timeout.
               */
              if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
                  goto drop;
          
              req = inet_reqsk_alloc(&tcp_request_sock_ops);
              if (!req)
                  goto drop;
          
              ......
          
              if (!want_cookie)
                  TCP_ECN_create_request(req, tcp_hdr(skb));
          
              if (want_cookie) {
          #ifdef CONFIG_SYN_COOKIES
                  syn_flood_warning(skb);
                  req->cookie_ts = tmp_opt.tstamp_ok;
          #endif
                  isn = cookie_v4_init_sequence(sk, skb, &req->mss);
              } else if (!isn) {
                  ......
              }       
              ......
          }
          

          The function that decides whether the half-open queue is full is the crux, so look at how it computes the result:

          include/net/inet_connection_sock.h:

          static inline int inet_csk_reqsk_queue_is_full(const struct sock *sk)
          {
              return reqsk_queue_is_full(&inet_csk(sk)->icsk_accept_queue);
          }
          

          include/net/request_sock.h:

          static inline int reqsk_queue_is_full(const struct request_sock_queue *queue)
          {
              // shift right by max_qlen_log bits; nonzero once qlen >= 2^max_qlen_log
              return queue->listen_opt->qlen >> queue->listen_opt->max_qlen_log;
          }
          

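          A return value of 1 means the half-open queue is full.

          The above only covers the condition for a full half-open queue. The takeaway is that the backlog passed in by the application matters a great deal: if it is too small, the check returns 1 very easily.

          For example, with somaxconn = 128, sysctl_max_syn_backlog = 4096 and backlog = 511, listen() first clamps backlog to 128, so nr_table_entries ends up as 256 and max_qlen_log = 8. As soon as 256 half-open connections are queued, 256 >> 8 = 1 and the queue counts as full.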
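
The shift test is easy to check in isolation. Below is a small sketch with hypothetical names (`syn_queue_is_full`, `first_full_qlen`) that mirrors the expression in reqsk_queue_is_full without any kernel types.

```c
#include <assert.h>

/* Same expression as reqsk_queue_is_full(): nonzero once qlen >= 2^max_qlen_log. */
static int syn_queue_is_full(int qlen, unsigned max_qlen_log)
{
    assert(qlen >= 0 && max_qlen_log < 31);
    return qlen >> max_qlen_log;
}

/* Smallest qlen for which the queue is reported as full. */
static int first_full_qlen(unsigned max_qlen_log)
{
    assert(max_qlen_log < 31);
    return 1 << max_qlen_log;
}
```

For max_qlen_log = 8, `syn_queue_is_full(255, 8)` is 0 and `syn_queue_is_full(256, 8)` is 1, so the 256th half-open connection is the one that flips the check.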

          How to set backlog depends on the application: it has to pass a suitable value when it calls listen.

          Netty backlog handling

          The TCP server uses Netty 3.7, a fairly old version. If the backlog is not set explicitly, JDK 1.6 falls back to 50.

          The evidence is in java.net.ServerSocket:

          public void bind(SocketAddress endpoint, int backlog) throws IOException {
              if (isClosed())
                  throw new SocketException("Socket is closed");
              if (!oldImpl && isBound())
                  throw new SocketException("Already bound");
              if (endpoint == null)
                  endpoint = new InetSocketAddress(0);
              if (!(endpoint instanceof InetSocketAddress))
                  throw new IllegalArgumentException("Unsupported address type");
              InetSocketAddress epoint = (InetSocketAddress) endpoint;
              if (epoint.isUnresolved())
                  throw new SocketException("Unresolved address");
              if (backlog < 1)
                backlog = 50;
              try {
                  SecurityManager security = System.getSecurityManager();
                  if (security != null)
                  security.checkListen(epoint.getPort());
                  getImpl().bind(epoint.getAddress(), epoint.getPort());
                  getImpl().listen(backlog);
                  bound = true;
              } catch(SecurityException e) {
                  bound = false;
                  throw e;
              } catch(IOException e) {
                  bound = false;
                  throw e;
              }
          }
          

          Where Netty handles backlog:

          org/jboss/netty/channel/socket/DefaultServerSocketChannelConfig.java:

          @Override
          public boolean setOption(String key, Object value) {
              if (super.setOption(key, value)) {
                  return true;
              }
          
              if ("receiveBufferSize".equals(key)) {
                  setReceiveBufferSize(ConversionUtil.toInt(value));
              } else if ("reuseAddress".equals(key)) {
                  setReuseAddress(ConversionUtil.toBoolean(value));
              } else if ("backlog".equals(key)) {
                  setBacklog(ConversionUtil.toInt(value));
              } else {
                  return false;
              }
              return true;
          }
          

          Since the backlog has to be set by hand, this will do:

          bootstrap.setOption("backlog", 8102); // A generous value is harmless: the kernel compares it with net.core.somaxconn and uses the smaller of the two
          

          Compared with Netty 4.0 this is less smart; see: http://www.aygfsteel.com/yongboy/archive/2014/07/30/416373.html

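          Summary

          On Linux kernel 2.6.32, provided the machine is not actually under a SYN flood attack, these values can be raised as appropriate:

          sysctl -w net.core.somaxconn=32768

          sysctl -w net.ipv4.tcp_max_syn_backlog=65535

          sysctl -p

          Note that sysctl -w only changes the running kernel. To keep the values across reboots, also add them to /etc/sysctl.conf, which is the file sysctl -p reloads.

          Also, never forget to change the backlog that the TCP server passes to listen: leaving it unset or too small can by itself trigger the SYN flooding warning. A reasonable start is 1024; watch for a while, then raise it gradually as real traffic requires.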
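
For a server written directly against the socket API, the backlog is simply the second argument of listen. The sketch below is illustrative only: `open_listener` is a hypothetical helper, it binds to all interfaces, and the 1024 in the usage note is just the starting value suggested above.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on the given port with an
 * explicit backlog. Returns the listening fd, or -1 on error. */
static int open_listener(uint16_t port, int backlog)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Allow quick restarts of the server on the same port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);   /* port 0 lets the kernel pick a free port */

    /* The kernel clamps backlog to net.core.somaxconn inside listen(). */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A server would call something like `open_listener(8080, 1024)` once at startup and then accept connections on the returned descriptor.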

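          Whatever you configure, the effective backlog is bounded by:

          backlog <= net.core.somaxconn

          The half-open queue length is approximately:

          half-open queue length ≈ 2 * min(backlog, net.ipv4.tcp_max_syn_backlog)

          More precisely, it is max(min(backlog, net.ipv4.tcp_max_syn_backlog), 8) + 1 rounded up to the next power of two, so twice the smaller value is an upper bound.

          Also, when the SYN flooding warning appears, the number of sockets in the TCP SYN_RECV state shows whether the half-open queue has filled up. To count sockets per state: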

          ss -ant | awk 'NR>1 {++s[$1]} END {for(k in s) print k,s[k]}'
          

          Thanks to Shukun from the ops team for this handy command.

          posted on 2014-08-20 20:43 by nieyong, category: Socket

          Comments

          # re: Quick Notes: SYN Flooding Warnings on the Linux 2.6.32 Kernel 2014-08-22 08:52 全智賢代言

          bootstrap.setOption("backlog", 8102); where does this value come from?

          # re: Quick Notes: SYN Flooding Warnings on the Linux 2.6.32 Kernel 2014-08-25 11:34 nieyong

          @全智賢代言
          It should have been 8192 (65536 / 8 = 8192).
          That said, passing 8102 works just as well.

          Notice

          All articles are original. If you repost, please credit the source. Thanks.
