          /*  $RCSfile$
           *  $Author$
           *  $Date$
           *  $Revision$
           *
           *  Copyright (C) 1997-2007  The Chemistry Development Kit (CDK) project
           *
           *  Contact: cdk-devel@lists.sourceforge.net
           *
           *  This program is free software; you can redistribute it and/or
           *  modify it under the terms of the GNU Lesser General Public License
           *  as published by the Free Software Foundation; either version 2.1
           *  of the License, or (at your option) any later version.
           *  All we ask is that proper credit is given for our work, which includes
           *  - but is not limited to - adding the above copyright notice to the beginning
           *  of your source code files, and to any copyright notice that you may distribute
           *  with programs based on this work.
           *
           *  This program is distributed in the hope that it will be useful,
           *  but WITHOUT ANY WARRANTY; without even the implied warranty of
           *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
           *  GNU Lesser General Public License for more details.
           *
           *  You should have received a copy of the GNU Lesser General Public License
           *  along with this program; if not, write to the Free Software
           *  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
           *
           */
          package org.openscience.cdk.similarity;


          import org.openscience.cdk.annotations.TestClass;
          import org.openscience.cdk.annotations.TestMethod;
          import org.openscience.cdk.exception.CDKException;

          import java.util.BitSet;

          /**
           *  Calculates the Tanimoto coefficient for a given pair of two
           *  fingerprint bitsets or real valued feature vectors.
           *
           *  The Tanimoto coefficient is one way to
           *  quantitatively measure the "distance" or similarity of
           *  two chemical structures.
           *
           *  <p>You can use the Fingerprinter class to retrieve two fingerprint bitsets.
           *  We assume that you have two structures stored in cdk.Molecule objects.
           *  A Tanimoto coefficient can then be calculated like:
           *  <pre>
           *   BitSet fingerprint1 = Fingerprinter.getFingerprint(molecule1);
           *   BitSet fingerprint2 = Fingerprinter.getFingerprint(molecule2);
           *   float tanimoto_coefficient = Tanimoto.calculate(fingerprint1, fingerprint2);
           *  </pre>
           *
           *  <p>The Fingerprinter assumes that hydrogens are explicitly given, if this
           *  is desired!
           *  <p>Note that the continuous Tanimoto coefficient does not lead to a metric space.
           *
           *  @author         steinbeck
           *  @cdk.githash
           *  @cdk.created    2005-10-19
           *  @cdk.keyword    jaccard
           *  @cdk.keyword    similarity, tanimoto
           *  @cdk.module     fingerprint
           */
          @TestClass("org.openscience.cdk.similarity.TanimotoTest")
          public class Tanimoto
          {

              /**
               * Evaluates Tanimoto coefficient for two bit sets.
               *
               * @param bitset1 A bitset (such as a fingerprint) for the first molecule
               * @param bitset2 A bitset (such as a fingerprint) for the second molecule
               * @return The Tanimoto coefficient
               * @throws org.openscience.cdk.exception.CDKException  if bitsets are not of the same length
               */
              @TestMethod("testTanimoto1,testTanimoto2")
              public static float calculate(BitSet bitset1, BitSet bitset2) throws CDKException
              {
                  float _bitset1_cardinality = bitset1.cardinality();
                  float _bitset2_cardinality = bitset2.cardinality();
                  if (bitset1.size() != bitset2.size()) {
                      throw new CDKException("Bitsets must have the same bit length");
                  }
                  BitSet one_and_two = (BitSet)bitset1.clone();
                  one_and_two.and(bitset2);
                  float _common_bit_count = one_and_two.cardinality();
                  return _common_bit_count/(_bitset1_cardinality + _bitset2_cardinality - _common_bit_count);
              }
             
              /**
               * Evaluates the continuous Tanimoto coefficient for two real valued vectors.
               *
               * @param features1 The first feature vector
               * @param features2 The second feature vector
               * @return The continuous Tanimoto coefficient
               * @throws org.openscience.cdk.exception.CDKException  if the features are not of the same length
               */
              @TestMethod("testTanimoto3")
              public static float calculate(double[] features1, double[] features2) throws CDKException {

                  if (features1.length != features2.length) {
                      throw new CDKException("Feature vectors must be of the same length");
                  }

                  int n = features1.length;
                  double ab = 0.0;
                  double a2 = 0.0;
                  double b2 = 0.0;

                  for (int i = 0; i < n; i++) {
                      ab += features1[i] * features2[i];
                      a2 += features1[i]*features1[i];
                      b2 += features2[i]*features2[i];
                  }
                  return (float)ab/(float)(a2+b2-ab);
              }
          }

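          As the source shows, the calculate(BitSet bitset1, BitSet bitset2) method computes similarity by comparing the bits of the two molecules' fingerprints. A BitSet and operation yields the number of bits set in both fingerprints. That count is then divided by the total number of bits that are true in either fingerprint (the two cardinalities added together, minus the common count), which gives the similarity value.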
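
          The same calculation can be tried without CDK on the classpath. The following is a minimal sketch using only java.util.BitSet. The class name TanimotoSketch, the method name tanimoto, and the two hand-built 8-bit "fingerprints" are assumptions made for illustration and are not part of CDK. Unlike the CDK method, this sketch does not check the bitset sizes.

```java
import java.util.BitSet;

// Hypothetical stand-alone class, not part of CDK.
public class TanimotoSketch {

    // Tanimoto coefficient: |A AND B| / (|A| + |B| - |A AND B|).
    // Returns NaN when neither bitset has any bit set (0 / 0 in float arithmetic).
    public static float tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone(); // work on a copy so the input is not modified
        common.and(b);                      // keep only the bits set in both
        float both = common.cardinality();
        float either = a.cardinality() + b.cardinality() - both;
        return both / either;
    }

    public static void main(String[] args) {
        // Made-up fingerprints: bits 0,1,2 versus bits 1,2,3.
        BitSet fp1 = new BitSet(8);
        fp1.set(0); fp1.set(1); fp1.set(2);
        BitSet fp2 = new BitSet(8);
        fp2.set(1); fp2.set(2); fp2.set(3);

        // Two common bits, four bits set overall: 2 / (3 + 3 - 2) = 0.5
        System.out.println(tanimoto(fp1, fp2));
    }
}
```

          Here two bits are shared and four distinct bits are set across both inputs, so the result is 0.5. Identical fingerprints give 1.0, and fingerprints with no bits in common give 0.0.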

          posted on 2009-10-18 13:36 by 周銳. Categories: Chemistry, Java, CDK