Skynet

---------- ---------- My new blog: liukaiyi.cublog.cn ---------- ----------


Reference: http://mifunny.info/python-image-watermark-script-68.html

Download: http://code.google.com/p/nothing-at-all/downloads/detail?name=watermark_py_20080724.tar.bz2&can=2&q=

Extract the archive, install PIL, and run the script:
easy_install PIL
>>python "watermark.py" "C:\Documents and Settings\lky\桌面\py\fangfei.jpg" 2

Save to C:\Documents and Settings\lky\桌面\py\fangfei_watermark.jpg

posted @ 2009-10-27 23:49 劉凱毅 | Read (1047) | Comments (0)

Python telnet server

Test it with: telnet 192.168.101.103 8014

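import socket
import threading


class myThread(threading.Thread):
    """Serve one telnet client: read until a newline arrives, then print the line."""

    def __init__(self, conn, add):
        threading.Thread.__init__(self)
        self.inputstr = b''
        self.connection = conn
        self.address = add

    def run(self):
        ii = 0
        self.connection.settimeout(50)  # drop clients that stay idle for 50 seconds
        try:
            while True:
                buf = self.connection.recv(1024)
                if not buf:  # the client closed the connection
                    break
                self.inputstr += buf
                if b"\n" in buf:
                    # A whole line has arrived: print it and end the session
                    print("**-" + self.inputstr.decode('utf-8', 'replace').strip())
                    break
                if ii == 0:
                    self.connection.send(buf)  # echo only the first chunk back
                ii += 1
        except socket.timeout:
            # recv() in this thread is what times out, so handle it here
            print('time out')
        finally:
            self.connection.close()


if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('192.168.101.103', 8014))
    sock.listen(5)
    while True:
        # accept() blocks until a client connects; each client gets its own thread
        connection, address = sock.accept()
        ithread = myThread(connection, address)
        ithread.start()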
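To try the read-until-newline logic without a telnet client or the fixed 192.168.101.103 address, here is a self-contained Python 3 sketch (an illustration, not the author's code). It listens on 127.0.0.1 with port 0 so the OS picks a free port, an assumption made for local testing, sends one CRLF-terminated line the way a telnet client does, and returns what the server read. The names `read_line` and `demo` are hypothetical.

```python
import socket
import threading


def read_line(server_sock, received):
    """Accept one client and read until a newline arrives, as myThread.run does."""
    conn, _addr = server_sock.accept()
    conn.settimeout(5)
    data = b''
    with conn:
        while b'\n' not in data:
            chunk = conn.recv(1024)
            if not chunk:  # client closed before sending a newline
                break
            data += chunk
    received.append(data.decode('utf-8', 'replace').strip())


def demo(line):
    """Start a one-shot server on localhost, send `line` to it, return what it read."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))  # port 0: let the OS choose a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = []
    worker = threading.Thread(target=read_line, args=(server, received))
    worker.start()
    with socket.create_connection(('127.0.0.1', port)) as client:
        client.sendall(line.encode('utf-8') + b'\r\n')  # telnet ends lines with CRLF
    worker.join()
    server.close()
    return received[0]
```

Calling `demo('hello')` should return `'hello'`: the trailing CRLF is stripped, just as the server above strips it before printing.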

posted @ 2009-10-27 19:16 劉凱毅 | Read (1865) | Comments (0)

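Scheduled MySQL data import script (shell)

A scheduled import script I found on a server; noting it down because it's quite useful. The crontab entry below runs it every day at 06:30 and appends its output to a log: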
30 06 * * * /data/dmsp/shell/crontab_search_stats_import.sh >> /data/dmsp/logs/crontab_search_stats_import.log

#!/bin/bash
# bash (not sh) is needed for the C-style for (( )) loop below

y=$(date -d "-1 day" +%Y)
m=$(date -d "-1 day" +%m)
d=$(date -d "-1 day" +%d)

# e.g. /data/dmsp/ftp/dim_stats/2009/09/14

act=dim_stats
mypath=/data/dmsp/ftp/${act}/${y}/${m}/${d}/

echo $mypath
# The .state directory next to the archive marks a finished upload
statpath=${mypath}${act}${y}${m}${d}.tar.bz2.state

for (( j=1; j<10000; j=j+1 ))
do
    if [ -d "${statpath}" ] ; then
        tar xjf ${mypath}${act}${y}${m}${d}.tar.bz2 -C ${mypath}
        mysql -h 127.0.0.1 -P3306 -u root -pmysql -e "LOAD DATA INFILE '${mypath}part-00000' INTO TABLE dmsp.dmsp_veidoo character set utf8 FIELDS TERMINATED BY '\t' lines terminated by '\n'";
        break
    else
        echo "[${j}:10000] not ready. sleep 10 seconds then retry."
        sleep 10
    fi
done
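The script's wait-for-marker loop can also be sketched in Python, as an illustration rather than a replacement for the cron job: it builds the same yesterday-dated paths and polls for the `.state` directory. The function names and the injectable `sleep` parameter are assumptions added so the logic can be tested without waiting; the tar and mysql steps are left out.

```python
import datetime
import os
import time


def yesterday_paths(base, act, today=None):
    """Return (dir, archive, state marker) for yesterday's export, as the script builds them."""
    day = (today or datetime.date.today()) - datetime.timedelta(days=1)
    y, m, d = day.strftime('%Y'), day.strftime('%m'), day.strftime('%d')
    mypath = os.path.join(base, act, y, m, d)
    archive = os.path.join(mypath, act + y + m + d + '.tar.bz2')
    return mypath, archive, archive + '.state'


def wait_for_marker(statpath, retries=10000, delay=10, sleep=time.sleep):
    """Poll for the .state directory; return the attempt number that found it, or None."""
    for j in range(1, retries):
        if os.path.isdir(statpath):
            return j
        print('[%d:%d] not ready. sleep %d seconds then retry.' % (j, retries, delay))
        sleep(delay)
    return None
```

For example, `yesterday_paths('/data/dmsp/ftp', 'dim_stats', datetime.date(2009, 9, 15))` yields the `/data/dmsp/ftp/dim_stats/2009/09/14` directory from the comment in the script.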

posted @ 2009-10-23 10:51 劉凱毅 | Read (2435) | Comments (1)

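Data mining: research topics and essence (repost)

	Research topics and essence of data mining
	As research in DMKD (data mining and knowledge discovery) has deepened, the field has come to rest on three strong technical pillars: databases, artificial intelligence, and mathematical statistics. For this reason, the program committee of the KDD conference was at one time co-chaired by leading figures from all three disciplines. The main research topics in DMKD today include foundational theory, discovery algorithms, data warehousing, visualization, models for converting between qualitative and quantitative representations, knowledge representation, maintenance and reuse of discovered knowledge, knowledge discovery in semi-structured and unstructured data, and web data mining. The knowledge discovered by data mining most commonly falls into the following four categories: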
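-	Generalized knowledge (Generalization)
	Generalized knowledge is summary knowledge that describes the characteristics of a class. Starting from the micro-level properties of the data, it discovers representative, general, higher-level concepts at the meso and macro levels that reflect the common properties of things of the same kind; it is a summarization, refinement, and abstraction of the data. There are many methods and techniques for discovering generalized knowledge, such as data cubes and attribute-oriented induction. The data cube also goes by other names, such as "multidimensional database", "materialized view", and "OLAP". Its basic idea is to materialize the results of frequently used, expensive aggregate functions, such as count, sum, average, and maximum, and to store these materialized views in a multidimensional database. Since many aggregate functions must be computed over and over, storing precomputed results in a multidimensional data cube enables fast responses and flexibly provides views of the data from different angles and at different levels of abstraction. Another method for discovering generalized knowledge is attribute-oriented induction, proposed at Simon Fraser University in Canada. It expresses a data mining query in an SQL-like language, collects the relevant data set from the database, and then applies a series of generalization techniques to that data set, including attribute removal, concept-tree climbing, attribute-threshold control, and propagation of counts and other aggregate functions.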

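-	Association knowledge (Association)
	This is knowledge about dependencies or associations between one event and other events. If two or more attributes are associated, the value of one can be predicted from the values of the others. The best-known method for discovering association rules is the Apriori algorithm proposed by R. Agrawal. Association-rule discovery proceeds in two steps. The first step iteratively identifies all frequent itemsets, i.e. itemsets whose support is at least a user-specified minimum; the second step builds, from the frequent itemsets, rules whose confidence is at least a user-specified minimum. Identifying all frequent itemsets is the core of association-rule discovery and also its most computationally expensive part.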

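-	Classification knowledge (Classification & Clustering)
	This covers characteristic knowledge describing the common properties of things in the same class, and discriminant knowledge describing the differences between different things. The most typical classification method is decision-tree classification. It builds a decision tree from a set of examples and is a supervised learning method. The method first forms a decision tree from a training subset (also called the window). If the tree cannot classify all objects correctly, some exceptions are added to the window, and the process is repeated until a correct decision set is formed. The end result is a tree whose leaf nodes are class names and whose internal nodes are attributes with branches, each branch corresponding to one possible value of that attribute. The most typical decision-tree learning system is ID3, which uses a top-down, non-backtracking strategy to find a simple tree. C4.5 and C5.0 are both extensions of ID3 that extend classification from categorical attributes to numeric attributes. Other data classification methods include statistical methods and rough sets (Rough Set). Linear regression and linear discriminant analysis are typical statistical models. To reduce the cost of building decision trees, an interval classifier has also been proposed. More recently, researchers have also studied using neural networks for classification and rule extraction in databases.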

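-	Predictive knowledge (Prediction)
	Working from time-series data, it infers future data from historical and current data; it can also be viewed as association knowledge with time as the key attribute. Current time-series forecasting methods include classical statistical methods, neural networks, and machine learning. In 1968, Box and Jenkins proposed a fairly complete theory and methodology for time-series modeling; these classical mathematical methods forecast time series by building stochastic models such as the autoregressive (AR) model, the autoregressive moving-average (ARMA) model, the autoregressive integrated moving-average (ARIMA) model, and seasonal adjustment models. Because many time series are non-stationary, their characteristic parameters and data distribution change over time. Training a single neural-network forecasting model on one stretch of historical data is therefore not enough for accurate forecasting. To address this, statistics-based and accuracy-based retraining methods have been proposed: when the existing forecasting model is found to no longer fit the current data, it is retrained to obtain new weights and build a new model. Many systems also exploit the computational advantages of parallel algorithms for time-series forecasting.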

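-	Deviation knowledge (Deviation)
	Beyond these, other kinds of knowledge can be discovered, such as deviation knowledge: descriptions of differences and extreme special cases that reveal phenomena deviating from the norm, such as exceptions outside the standard classes or outliers outside the data clusters. All of these kinds of knowledge can be discovered at different conceptual levels, moving from the micro to the meso and macro levels as the conceptual level rises, to meet the decision-making needs of different users at different levels.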
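The two association-rule steps described above can be sketched in Python. This is a minimal, unoptimized teaching version, not R. Agrawal's original implementation: support is the fraction of transactions containing an itemset, candidates of size k are built by joining frequent (k-1)-itemsets and pruned when any (k-1)-subset is infrequent, and confidence is support(X ∪ Y) / support(X). The function names are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Step 1: return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items
    items = {frozenset([i]) for b in baskets for i in b}
    current = {s: support(s) for s in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Step 2: rules X -> Y with confidence = support(X | Y) / support(X)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]  # lhs is frequent because its superset is
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out
```

On five made-up shopping baskets where "diaper" and "beer" appear together three times and "beer" never appears without "diaper", `apriori(..., 0.6)` reports {diaper, beer} with support 0.6, and `rules(..., 0.7)` includes {beer} -> {diaper} with confidence 1.0.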

	Functions of data mining
	By predicting future trends and behavior, data mining supports proactive, knowledge-based decisions. Its goal is to discover implicit, meaningful knowledge in databases, and it has five main kinds of function:

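-	Automatic prediction of trends and behavior
	Data mining automatically searches large databases for predictive information, so questions that once required extensive manual analysis can now be answered quickly and directly from the data itself. A typical example is market forecasting: data mining uses past promotion data to find the customers likely to give the highest return on future investment. Other predictable problems include forecasting bankruptcy and identifying the groups most likely to respond to a given event.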

-	Association analysis
	Data associations are an important kind of discoverable knowledge in databases. If there is some regularity between the values of two or more variables, it is called an association. Associations can be simple, temporal, or causal. The purpose of association analysis is to uncover the hidden web of associations in a database. The association function among the data is sometimes unknown, and even when it is known it is uncertain, so the rules produced by association analysis carry a confidence value.

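-	Clustering
	Records in a database can be divided into a series of meaningful subsets, called clusters. Clustering deepens our understanding of objective reality and is a prerequisite for concept description and deviation analysis. Clustering techniques mainly include traditional pattern-recognition methods and mathematical taxonomy. In the early 1980s, Michalski proposed conceptual clustering. Its key idea is that, when partitioning objects, one considers not only the distances between objects but also requires each resulting class to have some intensional description, which avoids some of the one-sidedness of traditional techniques.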

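-	Concept description
	Concept description describes the intension of a class of objects and summarizes their relevant characteristics. It comes in two forms: characteristic descriptions, which describe the features shared by a class of objects, and discriminant descriptions, which describe the differences between classes. Generating a characteristic description of a class involves only what all objects in that class have in common. There are many ways to generate discriminant descriptions, such as decision-tree methods and genetic algorithms.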

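-	Deviation detection
	Databases often contain anomalous records, and detecting these deviations is valuable. Deviations carry a lot of latent knowledge, such as anomalous instances in classification, exceptions that violate rules, differences between observations and model predictions, and changes in values over time. The basic method of deviation detection is to look for meaningful differences between observed results and reference values.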
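As one concrete instance of "meaningful differences between observed results and reference values", here is a z-score sketch in Python. The reference value is taken to be the sample mean and "meaningful" to mean more than two standard deviations away; both are assumptions for illustration, since the excerpt names no specific method.

```python
import statistics


def deviations(values, threshold=2.0):
    """Return (index, value, z) for values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation; needs at least 2 values
    if stdev == 0:
        return []  # all values equal: nothing deviates
    return [(i, v, (v - mean) / stdev)
            for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]
```

For the series 10, 11, 9, 10, 12, 10, 11, 95 only the last value is flagged (z is about 2.5). In very small samples a single outlier can never get a large z-score, so the threshold has to be chosen with the sample size in mind.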

	Common data mining techniques
-	Artificial neural networks
	Nonlinear predictive models that imitate the structure of biological neural networks and perform pattern recognition through learning.

-	Decision trees
	Tree-shaped structures that represent sets of decisions.

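-	Genetic algorithms
	Optimization techniques based on the theory of evolution that use design methods such as genetic combination (crossover), mutation, and natural selection.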

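-	Nearest-neighbor algorithm
	A technique that classifies each record in a data set based on the classes of the k records most similar to it in a historical data set.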

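-	Rule induction
	Finding and deriving statistically meaningful "if-then" rules in the data.

	Specialized analysis tools built on some of the techniques above have existed for roughly a decade, but they usually handled fairly small volumes of data. Today these techniques have been integrated directly into many large, industry-standard data warehouses and online analytical processing (OLAP) systems.
	Excerpted from 《數據挖掘討論組》 (Data Mining Discussion Group)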
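The nearest-neighbor entry can be made concrete with a short k-nearest-neighbor classifier in Python. It is an illustrative sketch: Euclidean distance as the similarity measure, k=3, and a majority vote are assumptions, not details given in the excerpt.

```python
import math
from collections import Counter


def knn_classify(history, record, k=3):
    """Label `record` by majority vote among the k closest records in `history`.

    history: list of (feature_tuple, label); closeness is Euclidean distance.
    """
    nearest = sorted(history, key=lambda item: math.dist(item[0], record))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]
```

With historical points clustered around (1, 1) labeled 'a' and around (8, 8) labeled 'b', a new record at (1.5, 1.5) is classified as 'a'. `math.dist` requires Python 3.8 or later.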

posted @ 2009-10-22 18:05 劉凱毅 | Read (2030) | Comments (1)

The powerful R language

Chinese-language reference material: http://www.biosino.org/R/R-doc/
Chart examples: http://zoonek2.free.fr/UNIX/48_R/03.html

http://addictedtor.free.fr/graphiques/graphiques/graph_113.png

posted @ 2009-10-22 16:59 劉凱毅 | Read (359) | Comments (0)

Adding a wiki module to Python Django

1. Install Python 2.6 (grumble: python.org/download is blocked here)
2. Install setuptools (see the article "Charming Python: Hatching Python eggs with setuptools")
3. easy_install django
4. Download django-wiki from http://github.com/sneeu/django-wiki
5. django-admin.py startproject newtest
6. cd newtest
7. Copy the extracted django-wiki package to newtest/wiki
8. vim settings.py

DATABASE_ENGINE = 'sqlite3'
DATABASE_NAME = 'mysite.db'

INSTALLED_APPS = (
    ......
    'newtest.wiki',
)

9. vim urls.py

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    (r'^wiki/', include('wiki.urls')),
)

10. cd ..
11. manage.py syncdb
12. manage.py runserver

13. http://127.0.0.1:8000/wiki/

Hmm, not great yet. Needs work!
Time to customize my own wiki, heh.

posted @ 2009-10-21 00:13 劉凱毅 | Read (1491) | Comments (0)

Database test-data generation script - infobright

I'm testing the compression and query speed of MySQL Infobright.
I'll post the results as soon as I can.

#!/usr/bin/python
import MySQLdb
#conn = MySQLdb.Connection('127.0.0.1', 'root', '', 'dmspi')
conn=MySQLdb.connect(host="127.0.0.1",port=3307,user="root",passwd="",db="test")
cur =  conn.cursor()
st = "create table testtime4 ( "

try :
        for cc in xrange(1000):
                if cc % 2 == 0 :
                        st += 'a'+str(cc)+' varchar(20),\n'
                else :
                        st += 'a'+str(cc)+' int(20),\n'

        st += 'a int(20)'
        st = st + ");"
        cur.execute(st)

        # import sys
        # sys.exit(1)
        import random
        ccs = lambda : random.choice(['apple', 'pear', 'peach', 'orange', 'lemon',''])
        ccn = lambda : random.randint(0,10000)

        fd = open('/data/logs/dataFormat/test/t4.data','w')
        for cc in xrange(10000000):
                st = ''
                ss = ccs()
                nn = str(ccn())
                for cc in xrange(1000):
                        if cc < 15 :
                                if cc % 2 == 0 :
                                        st += ss+'\t'
                                else :
                                        st += nn+'\t'
                        else :
                                st += '\t'
                st += nn
                print >>fd,st
        fd.close()

        # cur.execute('load data infile \'/data/logs/dataFormat/test/t4.data\'  into table testtime4 fields terminated by "\t";')
finally :
        cur.close()
        conn.close()

MySQL Infobright test results:

Time to load ten million rows:

mysql> load data infile '/data/logs/dataFormat/test/t4.data'  into table testtime4 fields terminated by "\t";
Query OK, 10000000 rows affected (36 min 47.00 sec)

Test 1:
1. The table has 500 columns
2. Every column has a value; there is no NULL data
3. The raw file is 26 GB; after loading into the warehouse it takes 5 GB

Some query timings:
select count(*) from testtime where a0="pear" and a2="orange";
1 row in set (3.63 sec)

select a6,count(*) from testtime group by a6 order by a6 desc ;
5 rows in set (2.24 sec)

mysql> select count(*) from testtime where a0="apple" ;
1 row in set (5.68 sec)

Test 2:
1. The table has 1000 columns
2. Only the first 15 columns have values; all the rest are NULL
3. The raw file is 10 GB; after loading into the warehouse it takes 215 MB

mysql> select a0,count(*) from testtime4 group by a0 ;
+--------+----------+
| a0     | count(*) |
+--------+----------+
| lemon  |  1665543 |
| peach  |  1666276 |
| orange |  1667740 |
| pear   |  1665910 |
| apple  |  1665678 |
| NULL   |  1668863 |
+--------+----------+
6 rows in set (4.55 sec)

select * from testtime4 order by a6 desc limit 2000000,1 ;
1 row in set (3.30 sec)
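To put the numbers side by side, here is a tiny Python calculation of the compression ratios and the load rate reported above. It assumes 1 GB = 1024 MB, since the post doesn't say which convention it uses, and takes the 36 min 47 s figure to be the ten-million-row load into testtime4 shown earlier. The helper names are hypothetical.

```python
def load_rate(rows, minutes, seconds):
    """Rows loaded per second for a load that took minutes:seconds."""
    return rows / (minutes * 60 + seconds)


def compression_ratio(raw_gb, loaded_mb):
    """Raw file size divided by loaded size; assumes 1 GB = 1024 MB."""
    return raw_gb * 1024 / loaded_mb
```

This gives roughly 4,531 rows per second for the load, a ratio of about 5.2 : 1 for Test 1 (26 GB to 5 GB), and about 47.6 : 1 for Test 2 (10 GB to 215 MB). The much higher ratio in Test 2 is consistent with most of its columns being NULL.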

posted @ 2009-10-20 13:44 劉凱毅 | Read (1611) | Comments (0)

Using the MySQL FEDERATED table type

Reference: http://blog.chinaunix.net/u/29134/showart_485759.html
That post is really the key to this one.

Download the MySQL Max build,
or, if you're in the mood, compile it yourself. For reference:
./configure --prefix=/home/lky/tools/mysql2 --with-plugins=heap,innobase,myisam,ndbcluster,federated,blackhole --enable-assembler --enable-static
Then add this to the [mysqld] section of my.cnf:
[mysqld]
federated # add this line

MySQL reference: http://blog.chinaunix.net/u3/90603/showart_1925406.html

mysql> show engines;
+------------+----------+----------------------------------------------------------------+--------------+-----+------------+
| Engine     | Support  | Comment                                                        | Transactions | XA  | Savepoints |
+------------+----------+----------------------------------------------------------------+--------------+-----+------------+
| ndbcluster | DISABLED | Clustered, fault-tolerant, memory-based tables                 | YES          | NO  | NO         |
| FEDERATED  | YES      | Federated MySQL storage engine                                 | YES          | NO  | NO         |
| MRG_MYISAM | YES      | Collection of identical MyISAM tables                          | NO           | NO  | NO         |
| MyISAM     | DEFAULT  | Default engine as of MySQL 3.23 with great performance         | NO           | NO  | NO         |
| BLACKHOLE  | YES      | /dev/null storage engine (anything you write to it disappears) | NO           | NO  | NO         |
| InnoDB     | YES      | Supports transactions, row-level locking, and foreign keys     | YES          | YES | YES        |
| MEMORY     | YES      | Hash based, stored in memory, useful for temporary tables      | NO           | NO  | NO         |
| ARCHIVE    | YES      | Archive storage engine                                         | NO           | NO  | NO         |
+------------+----------+----------------------------------------------------------------+--------------+-----+------------+

http://topic.csdn.net/u/20071122/11/016C3D25-82A2-46DC-B8B0-3A22F8573C70.html

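Test steps:
0. Initialize the test data directory with mysql_install_db
1. Start the servers with mysqld_safe
2. Test with the mysql client

A gripe first: the Max build doesn't include mysql_install_db!! You'll have to work around it yourself, e.g. download another MySQL build and use it to install the database. The my.cnf for the second instance (d2):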

[client]
socket=/home/lky/data/d2/mysql.sock
port=3308

[mysqld]
port=3308
datadir=/home/lky/data/d2
socket=/home/lky/data/d2/mysql.sock

user=lky
# Default to using old password format for compatibility with mysql 3.x
# clients (those using the mysqlclient10 compatibility package).
#old_passwords=123

[mysqld_safe]
log-error=/home/lky/data/d2/mysqld.log
pid-file=/home/lky/data/d2/mysqld.pid

Commands
Start server 1 # mind the port and datadir in its my.cnf
cd /usr/local/mysql-max-5.1.5-alpha-linux-i686-glibc23
./bin/mysqld_safe --defaults-file=/home/lky/data/d1/my.cnf
# once it is running, let root connect from any host (run inside the mysql client, on the mysql database):
# update user set host="%" where user='root' ; flush privileges;

Start server 2 # mind the port and datadir in its my.cnf
cd /usr/local/mysql-max-5.1.5-alpha-linux-i686-glibc23
./bin/mysqld_safe --defaults-file=/home/lky/data/d2/my.cnf

Session 1 (server 1)
./bin/mysql --defaults-file=/home/lky/data/d1/my.cnf
>create table t_tableC (id int not null auto_increment primary key, c_str char(20) not null);
>insert into t_tableC values(1,'cc');
Session 2 (server 2)
./bin/mysql --defaults-file=/home/lky/data/d2/my.cnf
>create table t_tableC (id int not null auto_increment primary key, c_str char(20) not null)
engine federated
connection = 'mysql://lky@127.0.0.1:3307/test/t_tableC';
> select * from t_tableC ;
+----+-------+
| id | c_str |
+----+-------+
|  1 | cc    |
+----+-------+
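The CONNECTION value above follows MySQL's FEDERATED format, scheme://user[:password]@host[:port]/db/table. A small Python helper, hypothetical and for illustration only, that assembles the string and the matching CREATE TABLE statement can help avoid typos when mirroring many remote tables:

```python
def federated_connection(user, host, db, table, port=None, password=None, scheme='mysql'):
    """Build scheme://user[:password]@host[:port]/db/table for a FEDERATED table."""
    auth = user if password is None else '%s:%s' % (user, password)
    hostport = host if port is None else '%s:%d' % (host, port)
    return '%s://%s@%s/%s/%s' % (scheme, auth, hostport, db, table)


def federated_ddl(table, columns, connection):
    """CREATE TABLE statement for a local FEDERATED table that points at `connection`."""
    return "CREATE TABLE %s (%s) ENGINE=FEDERATED CONNECTION='%s';" % (table, columns, connection)
```

`federated_connection('lky', '127.0.0.1', 'test', 't_tableC', port=3307)` reproduces the string used in session 2. The local column definitions still have to match the remote table, since the FEDERATED engine does not read them from the remote server.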

The feature I like most:
on d2 (session 2), local tables can be joined with the FEDERATED table.

mysql> select * from t2 ;
+------+------+
| id   | vn   |
+------+------+
|    1 | cc   |
+------+------+
1 row in set (0.29 sec)

mysql> show create table t2 ;
+-------+----------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                         |
+-------+----------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `id` int(11) default NULL,
  `vn` char(10) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select * from t_tableC c,t2 t where c.id=c.id ;
+----+--------+------+------+
| id | c_str  | id   | vn   |
+----+--------+------+------+
|  1 | cc     |    1 | cc   |
|  2 | ccttcc |    1 | cc   |
+----+--------+------+------+
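2 rows in set (0.00 sec)

Note: the condition c.id=c.id is always true, so this query is effectively a cross join of the two tables; use c.id=t.id to join on the id column.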

posted @ 2009-10-08 12:59 劉凱毅 | Read (2066) | Comments (0)

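A quick try of Hadoop Streaming (Hadoop + Perl)

Reference:
http://hadoop.apache.org/common/docs/r0.15.2/streaming.html

Note
Streaming currently does not support Linux pipes (pipelines such as cat | wc -l), but that doesn't stop us from using Perl or Python one-liners!
The FAQ puts it this way: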
Can I use UNIX pipes? For example, will -mapper "cut -f1 | sed s/foo/bar/g" work?
    Currently this does not work and gives an "java.io.IOException: Broken pipe" error.
    This is probably a bug that needs to be investigated.
But if you're a die-hard Linux shell-pipe fan, try the approach below:
$> perl -e 'open( my $fh, "grep -v null tt |sed -n 1,5p |");while ( <$fh> ) {print;} '
     # though I haven't gotten this to work myself!

Environment: hadoop-0.18.3
$> find . -type f -name "*streaming*.jar"
./contrib/streaming/hadoop-0.18.3-streaming.jar

Test data:

-bash-3.00$ head tt
null    false    3702    208100
6005100    false    70    13220
6005127    false    24    4640
6005160    false    25    4820
6005161    false    20    3620
6005164    false    14    1280
6005165    false    37    7080
6005168    false    104    20140
6005169    false    35    6680
6005240    false    169    32140
......

Run:

c1=" perl -ne  'if(/.*\t(.*)/){\$sum+=\$1;}END{print \"\$sum\";}' "
# Note: inside double quotes, write $ as \$ and " as \"
echo $c1; # prints: perl -ne 'if(/.*\t(.*)/){$sum+=$1;}END{print "$sum";}'
hadoop jar hadoop-0.18.3-streaming.jar \
   -input file:///data/hadoop/lky/jar/tt \
   -mapper   "/bin/cat" \
   -reducer "$c1" \
   -output file:///tmp/lky/streamingx8

Result:
cat /tmp/lky/streamingx8/*
1166480

Output of a local run:
perl -ne 'if(/.*\t(.*)/){$sum+=$1;}END{print $sum;}' < tt
1166480

The results match!
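Because streaming runs any executable that reads stdin and writes stdout, the Perl reducer could just as well be a Python script. This is a sketch of the same sum, assuming, as the Perl regex does, that the value to add is the last tab-separated field of each line. One difference: Perl silently treats a non-numeric field as 0, while `int()` here raises, which makes bad input visible.

```python
def sum_last_column(lines):
    """Sum the last tab-separated field of each line (what /.*\\t(.*)/ captures in Perl).

    Lines without a tab are skipped, as the Perl `if` would skip them.
    """
    total = 0
    for line in lines:
        line = line.rstrip('\n')
        if '\t' in line:
            total += int(line.rsplit('\t', 1)[1])
    return total
```

Saved as a script (sum_last.py is a hypothetical name) ending with `import sys` and `print(sum_last_column(sys.stdin))`, it could be shipped with `-file sum_last.py` and used as `-reducer "python sum_last.py"`; both options appear in the help output below.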

The command's built-in help:

-bash-3.00$ hadoop jar hadoop-0.18.3-streaming.jar -info
09/09/25 14:50:12 ERROR streaming.StreamJob: Missing required option -input
Usage: $HADOOP_HOME/bin/hadoop [--config dir] jar \
          $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
  -input    <path>     DFS input file(s) for the Map step
  -output   <path>     DFS output directory for the Reduce step
  -mapper   <cmd|JavaClassName>      The streaming command to run
  -combiner <JavaClassName> Combiner has to be a Java class
  -reducer  <cmd|JavaClassName>      The streaming command to run
  -file     <file>     File/dir to be shipped in the Job jar file
  -dfs    <h:p>|local  Optional. Override DFS configuration
  -jt     <h:p>|local  Optional. Override JobTracker configuration
  -additionalconfspec specfile  Optional.
  -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName Optional.
  -outputformat TextOutputFormat(default)|JavaClassName  Optional.
  -partitioner JavaClassName  Optional.
  -numReduceTasks <num>  Optional.
  -inputreader <spec>  Optional.
  -jobconf  <n>=<v>    Optional. Add or override a JobConf property
  -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
  -mapdebug <path>  Optional. To run this script when a map task fails
  -reducedebug <path>  Optional. To run this script when a reduce task fails
  -cacheFile fileNameURI
  -cacheArchive fileNameURI
  -verbose

posted @ 2009-09-25 14:33 劉凱毅 | Read (3377) | Comments (0)

FTP upload (py)

No explanation needed; it's all in the code.

#!/usr/bin/env python
#-*- encoding: utf8 -*-
from ftplib import FTP
import sys,os,getopt

opts,args=getopt.getopt(sys.argv[1:],'hf:d:i:u:p:')

def usage():
    print('''
Help Information:
  After a successful upload, a success-marker directory named [uploaded file name.state]
  is created next to the uploaded file.
    -h : Show help information
    -f : local upload file   eg -> /home/user/xx/file.tar
    -d : upload to ftp path  eg -> /x/xx/xxx
    -i : [optional] Default 122.102.xx.xx
    -u : [optional] Default xx
    -p : [optional] Default *** (xx passwd)
    ''')

fip='122.xx.xx.xx'
fur='xx'
fpw='123'
upload_file=None
ftp_path=None
for o,a in opts:
    if o=='-h':
        usage()
        sys.exit()
    if o=='-f' : upload_file=a
    if o=='-d' : ftp_path=a
    if o=='-i' : fip=a
    if o=='-u' : fur=a
    if o=='-p' : fpw=a

# -f and -d are required: check them before connecting to the server
if upload_file is None or ftp_path is None:
    usage()
    sys.exit(1)

ftp = FTP(fip)
ftp.login(fur,fpw)

# Create the target directory level by level (like mkdir -p);
# this assumes the server's NLST returns full paths
to_path='/'
for sp in [p for p in ftp_path.split('/') if p]:
    drs = ftp.nlst(to_path)
    if to_path=='/':to_path+=sp
    else : to_path+='/'+sp
    if to_path not in drs :
        ftp.mkd(to_path)

# Move into the final directory
ftp.cwd(to_path)

# Prepare the upload
bufsize = 1024
file_name=os.path.basename(upload_file)

# If a success-marker directory from an earlier upload exists, delete it
sfile=to_path+'/'+file_name+'.state'
if sfile in ftp.nlst(to_path):
    print('[Resend] delete original dir state '+sfile)
    ftp.rmd( sfile )

# Upload the file in binary mode
with open(upload_file,'rb') as file_handler:
    ftp.storbinary('STOR %s' % (file_name),file_handler,bufsize)

# If the remote size differs from the local size, don't mark the upload as successful
if os.path.getsize(upload_file) != ftp.size(to_path+'/'+file_name) :
    print('[Error]  upload to ftp size Different ! ')
    ftp.quit()
    sys.exit(1)

# Upload succeeded: create the marker directory
ftp.mkd(sfile)
ftp.quit()
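The directory-creation loop in the script is easier to reason about, and to test, as two small functions. This Python 3 sketch is a refactoring for illustration, not the author's code. Like the script, `ensure_ftp_dirs` assumes `ftp.nlst(parent)` returns full paths, which depends on the FTP server; it only calls `nlst` and `mkd`, so it works with an `ftplib.FTP` object or a stand-in.

```python
def ftp_dir_prefixes(ftp_path):
    """Return each directory level of an FTP path, shortest first.

    '/x/xx/xxx' -> ['/x', '/x/xx', '/x/xx/xxx']; empty parts ('//', trailing '/') are ignored.
    """
    parts = [p for p in ftp_path.split('/') if p]
    return ['/' + '/'.join(parts[:i]) for i in range(1, len(parts) + 1)]


def ensure_ftp_dirs(ftp, ftp_path):
    """Create each missing level of ftp_path (like mkdir -p) and return the deepest path."""
    parent = '/'
    for path in ftp_dir_prefixes(ftp_path):
        if path not in ftp.nlst(parent):  # assumes NLST lists full paths
            ftp.mkd(path)
        parent = path
    return parent
```

If `/x` already exists, `ensure_ftp_dirs(ftp, '/x/xx/xxx')` creates only `/x/xx` and then `/x/xx/xxx`, in that order.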

posted @ 2009-09-09 10:01 劉凱毅 | Read (1569) | Comments (1)
