http://colobu.com/categories/%E6%9E%B6%E6%9E%84/page/2/

paulwong 2016-04-19 17:54
SPRING CACHE resources
http://docs.spring.io/spring/docs/3.2.x/spring-framework-reference/htmlsingle/#cache

Adding an expiration policy (TTL) to Spring's ConcurrentMap cache manager
http://stackoverflow.com/questions/8181768/can-i-set-a-ttl-for-cacheable

Composite cache keys built from multiple method arguments (a short sketch follows the link list below)
http://stackoverflow.com/questions/14072380/cacheable-key-on-multiple-method-arguments

The Spring Cache abstraction explained in detail
http://www.open-open.com/lib/view/open1389575623336.html

An introduction to the annotation-driven Spring cache
https://www.ibm.com/developerworks/cn/opensource/os-cn-spring-cache/




paulwong 2015-02-25 16:04
Using the distributed cache Infinispan in WildFly
  • Add a cache container via the admin console at http://127.0.0.1:9991/console/App.html#infinispan
    <cache-container name="tickets" default-cache="default" jndi-name="java:jboss/infinispan/tickets">
           <local-cache name="default" batching="true">
                  <locking isolation="REPEATABLE_READ"/>
           </local-cache>
    </cache-container>

  • Add the dependencies to pom.xml
            <dependency>
                <groupId>org.infinispan</groupId>
                <artifactId>infinispan-core</artifactId>
                <scope>provided</scope>
            </dependency>
            
            <dependency>
                <groupId>org.infinispan</groupId>
                <artifactId>infinispan-client-hotrod</artifactId>
                <scope>provided</scope>
            </dependency>

            <dependency>
                <groupId>org.jgroups</groupId>
                <artifactId>jgroups</artifactId>
                <scope>provided</scope>
            </dependency>

            <dependency>
                <groupId>org.infinispan</groupId>
                <artifactId>infinispan-spring</artifactId>
                <version>6.0.2.Final</version>
            </dependency>
            
            <dependency>
                <groupId>org.infinispan</groupId>
                <artifactId>infinispan-jcache</artifactId>
                <version>6.0.2.Final</version>
            </dependency>

  • Add the interceptors in WEB-INF/beans.xml
    <?xml version="1.0"?>
    <beans xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation
    ="http://java.sun.com/xml/ns/javaee http://jboss.org/schema/cdi/beans_1_0.xsd">
        <interceptors>
            <class>org.infinispan.jcache.annotation.CacheResultInterceptor</class>
            <class>org.infinispan.jcache.annotation.CachePutInterceptor</class>
            <class>org.infinispan.jcache.annotation.CacheRemoveEntryInterceptor</class>
            <class>org.infinispan.jcache.annotation.CacheRemoveAllInterceptor</class>
        </interceptors>
    </beans>

  • Add the project's global module dependencies in WEB-INF/jboss-deployment-structure.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <jboss-deployment-structure>
        <deployment>
            <dependencies>
                <module name="org.jboss.xnio" />
                <module name="org.infinispan" export="true"/>
                <module name="org.infinispan.commons" export="true"/>
                <module name="org.infinispan.client.hotrod" export="true"/>
            </dependencies>
        </deployment>
    </jboss-deployment-structure>

  • Use the cache in a CDI bean
    package com.paul.myejb;

    import javax.annotation.Resource;
    import javax.cache.annotation.CacheResult;
    import javax.ejb.Remote;
    import javax.ejb.Stateless;
    import javax.interceptor.Interceptors;

    import org.infinispan.Cache;
    import org.infinispan.manager.EmbeddedCacheManager;
    //import org.springframework.cache.annotation.Cacheable;
    import org.springframework.ejb.interceptor.SpringBeanAutowiringInterceptor;

    /**
     * Session Bean implementation class HelloWorldBean
     */
    @Stateless
    //@Local(HelloWorld.class)
    @Remote(HelloWorld.class)
    @Interceptors(SpringBeanAutowiringInterceptor.class)
    //@RolesAllowed({Roles.ADMIN})
    public class HelloWorldBean implements HelloWorld {
        
        @Resource(lookup = "java:jboss/infinispan/tickets")
        private EmbeddedCacheManager container;
        
        
        /**
         * Default constructor.
         */
        public HelloWorldBean() {
        }

    //    @Transactional
    //    @Cacheable(value = "books", key = "#name")
        @CacheResult
        public String sayHello(String name) {
            System.out.println("NO CACHE");
            String result = "Hello " + name + ", I am HelloWorldBean.";
            Cache<String, String> cache = this.container.getCache();
            cache.put(name, result);
            return result;
        }

    }


  • Modify modules/system/layers/base/org/infinispan/client/hotrod/main/modules.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!--
      ~ JBoss, Home of Professional Open Source.
      ~ Copyright 2010, Red Hat, Inc., and individual contributors
      ~ as indicated by the @author tags. See the copyright.txt file in the
      ~ distribution for a full listing of individual contributors.
      ~
      ~ This is free software; you can redistribute it and/or modify it
      ~ under the terms of the GNU Lesser General Public License as
      ~ published by the Free Software Foundation; either version 2.1 of
      ~ the License, or (at your option) any later version.
      ~
      ~ This software is distributed in the hope that it will be useful,
      ~ but WITHOUT ANY WARRANTY; without even the implied warranty of
      ~ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
      ~ Lesser General Public License for more details.
      ~
      ~ You should have received a copy of the GNU Lesser General Public
      ~ License along with this software; if not, write to the Free
      ~ Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
      ~ 02110-1301 USA, or see the FSF site: http://www.fsf.org.
      
    -->
    <module xmlns="urn:jboss:module:1.3" name="org.infinispan.client.hotrod">
        <properties>
            <property name="jboss.api" value="private"/>
        </properties>

        <resources>
            <resource-root path="infinispan-client-hotrod-6.0.2.Final.jar"/>
        </resources>

        <dependencies>
            <module name="javax.api"/>
            <!-- comment out the following line -->
            <!--<module name="com.google.protobuf"/>-->
            <module name="org.apache.commons.pool"/>
            <module name="org.infinispan.commons"/>
            <module name="org.infinispan.query.dsl"/>
            <module name="org.jboss.logging"/>
        </dependencies>
    </module>

  • The Spring version follows
    1. Add the required Spring bean definitions
      <?xml version="1.0" encoding="UTF-8"?>
      <beans xmlns="http://www.springframework.org/schema/beans"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:context="http://www.springframework.org/schema/context"
          xmlns:cache="http://www.springframework.org/schema/cache"
          xmlns:p="http://www.springframework.org/schema/p"
          xmlns:jee="http://www.springframework.org/schema/jee"
          xsi:schemaLocation="http://www.springframework.org/schema/context
                http://www.springframework.org/schema/context/spring-context-3.0.xsd
                http://www.springframework.org/schema/beans
                http://www.springframework.org/schema/beans/spring-beans-3.0.xsd
                http://www.springframework.org/schema/cache
                http://www.springframework.org/schema/cache/spring-cache.xsd
                http://www.springframework.org/schema/jee
                http://www.springframework.org/schema/jee/spring-jee.xsd">

          <cache:annotation-driven />
          
          <bean id="cacheManager"
                class
      ="org.infinispan.spring.provider.ContainerCacheManagerFactoryBean">
                <constructor-arg ref="cacheContainer"  />
          </bean>
          
          <jee:jndi-lookup id="cacheContainer" jndi-name="java:jboss/infinispan/tickets" > 
          </jee:jndi-lookup>
          
          <!-- <bean id="cacheContainer"
                class="com.paul.myejb.common.util.cache.JndiSpringCacheManagerFactoryBean"
                p:infinispanJNDI="java:jboss/infinispan/tickets" /> 
      -->
          
      </beans>

    2. Use the cache
      package com.paul.myejb.spring;

      import org.springframework.beans.factory.annotation.Autowired;
      import org.springframework.cache.CacheManager;
      import org.springframework.cache.annotation.Cacheable;
      import org.springframework.stereotype.Component;

      @Component
      public class MySpringBean {
          
          @Autowired
          private CacheManager cacheManager;
          
          @Cacheable(value = "my-local-cache", key = "#name")
          public String sayHello(String name)
          {
              System.out.println("MySpringBean NO CACHE");
              String result = "Hi " + name + ", I am Spring!";
              org.springframework.cache.Cache springCache = this.cacheManager.getCache("my-local-cache");
              System.out.println(springCache.get(name) == null ? "null" : springCache.get(name).get());
              springCache.put(name, result);
              return result;
          }

      }




    paulwong 2015-02-23 13:40
    SPRING-SESSION
    Read the full article: http://www.aygfsteel.com/paulwong/archive/2014/11/19/420309.html

    paulwong 2014-11-19 18:23
    Distributed scheduling with QUARTZ + SPRING

    Trigger: stores the time schedule.
    Job: holds the business code to execute.
    Scheduler: responsible for scheduling, i.e. running the corresponding job at the specified time.

    In a clustered (distributed) Quartz setup, every node registers its jobs in the database. At fire time the triggers are read back from the database, and when a trigger has the same name and fire time on several nodes, only one node executes the job. If that node fails, the job is handed over to another node for execution.
    quartz.properties
    #============================================================================
    # Configure JobStore
    # Using Spring datasource in quartzJobsConfig.xml
    # Spring uses LocalDataSourceJobStore extension of JobStoreCMT
    #============================================================================
    org.quartz.jobStore.useProperties=true
    org.quartz.jobStore.tablePrefix = QRTZ_
    org.quartz.jobStore.isClustered = true
    org.quartz.jobStore.clusterCheckinInterval = 5000
    org.quartz.jobStore.misfireThreshold = 60000
    org.quartz.jobStore.txIsolationLevelReadCommitted = true
     
    # Change this to match your DB vendor
    org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
    org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
     

    #============================================================================
    # Configure Main Scheduler Properties
    # Needed to manage cluster instances
    #============================================================================
    org.quartz.scheduler.instanceId=AUTO
    org.quartz.scheduler.instanceName=MY_CLUSTERED_JOB_SCHEDULER
    org.quartz.scheduler.rmi.export = false
    org.quartz.scheduler.rmi.proxy = false


    #============================================================================
    # Configure ThreadPool
    #============================================================================
    org.quartz.threadPool.class = org.quartz.simpl.SimpleThreadPool
    org.quartz.threadPool.threadCount = 10
    org.quartz.threadPool.threadPriority = 5
    org.quartz.threadPool.threadsInheritContextClassLoaderOfInitializingThread = true


    web-schedule-applicationcontext.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <beans xmlns="http://www.springframework.org/schema/beans"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:context="http://www.springframework.org/schema/context"
        xmlns:mongo="http://www.springframework.org/schema/data/mongo"
        xsi:schemaLocation="http://www.springframework.org/schema/context
              http://www.springframework.org/schema/context/spring-context-3.0.xsd
              http://www.springframework.org/schema/data/mongo
              http://www.springframework.org/schema/data/mongo/spring-mongo-1.3.xsd
              http://www.springframework.org/schema/beans
              http://www.springframework.org/schema/beans/spring-beans-3.0.xsd">


        <!-- Timer configuration -->
        <!-- Thread executor configuration, used for running the registered jobs -->
        <bean id="executor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
             <property name="corePoolSize" value="10" />
             <property name="maxPoolSize" value="100" />
             <property name="queueCapacity" value="500" />
        </bean>

        <!-- Scheduler configuration -->
        <bean id="webScheduler"
            class
    ="org.springframework.scheduling.quartz.SchedulerFactoryBean">

            <property name="configLocation" value="classpath:/properties/config/quartz.properties" />
            <property name="dataSource" ref="dataSourceCMS" />
            <property name="transactionManager" ref="txManager" />

            <!-- This name is persisted as SCHED_NAME in db. For local testing you could
                change it to a unique name to avoid collision with the dev server -->
            <property name="schedulerName" value="quartzScheduler" />

            <!-- Will update database cron triggers to what is in this jobs file on
                each deploy. Replaces all previous trigger and job data that was in the database.
                YMMV -->
            <property name="overwriteExistingJobs" value="true" />

            <property name="startupDelay" value="5"/>
            <property name="applicationContextSchedulerContextKey" value="applicationContext" />
            <property name="jobFactory">
                <bean class="com.tcl.project7.boss.common.scheduling.AutowiringSpringBeanJobFactory" />
            </property>
            
            <property name="triggers">
                  <list>
                           <ref bean="springQuertzClusterTaskSchedulerTesterTigger" />
                  </list>
             </property>
            <property name="jobDetails">
                <list>
                    <ref bean="springQuertzClusterTaskSchedulerTesterJobDetail" />
                </list>
            </property>
             <property name="taskExecutor" ref="executor" />

        </bean>


        
        
        
        <!-- Trigger -->
        <bean id="springQuertzClusterTaskSchedulerTesterTigger" class="common.scheduling.PersistableCronTriggerFactoryBean">
            <property name="jobDetail" ref="springQuertzClusterTaskSchedulerTesterJobDetail"/>
            <property name="cronExpression" value="* * * * * ?" />    
        </bean>
        
        <bean id="springQuertzClusterTaskSchedulerTesterJobDetail" class="org.springframework.scheduling.quartz.JobDetailBean">
            <property name="jobClass" value="common.scheduling.SpringQuertzClusterTaskSchedulerTester" />
            
            <!-- fail-over: re-run jobs whose execution failed, default=false -->
            <property name="requestsRecovery" value="false"/>
        </bean>
        
        
        
    </beans>


    The job class, SpringQuertzClusterTaskSchedulerTester.java:
    package common.scheduling;

    import java.util.Date;

    import org.quartz.JobExecutionContext;
    import org.quartz.JobExecutionException;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.scheduling.quartz.QuartzJobBean;

    import com.tcl.project7.boss.common.util.UrlUtil;
    import com.tcl.project7.boss.common.util.time.TimeUtils;
    /**
     * <p>Title: SpringQuertzClusterTaskSchedulerTester</p>
     * <p>Description:
     * Because the job has to support persistence and related features, it extends QuartzJobBean.
     * Since the job is persisted, it cannot hold references to objects such as xxxManager;
     * those must be fetched each time from the ApplicationContext injected into the QuartzJobBean.
     * </p>
     */
    public class SpringQuertzClusterTaskSchedulerTester extends QuartzJobBean {
        
        private static Logger logger = LoggerFactory.getLogger(SpringQuertzClusterTaskSchedulerTester.class);
        
        @Autowired
        private UrlUtil urlUtil;
        
        
        protected void executeInternal(JobExecutionContext arg0)
                throws JobExecutionException {
            logger.info("------" + TimeUtils.formatTime(new Date()) + "------" + urlUtil.getNginxHost());
            System.out.println("------" + TimeUtils.formatTime(new Date()) + "------" + urlUtil.getNginxHost());
        }
        
    }


    If the job needs to call Spring beans, this class is required: AutowiringSpringBeanJobFactory.java
    package common.scheduling;

    import org.quartz.spi.TriggerFiredBundle;
    import org.springframework.beans.factory.config.AutowireCapableBeanFactory;
    import org.springframework.context.ApplicationContext;
    import org.springframework.context.ApplicationContextAware;
    import org.springframework.scheduling.quartz.SpringBeanJobFactory;

    /**
     * Autowire Quartz Jobs with Spring context dependencies
     * 
    @see http://stackoverflow.com/questions/6990767/inject-bean-reference-into-a-quartz-job-in-spring/15211030#15211030
     
    */
    public final class AutowiringSpringBeanJobFactory extends SpringBeanJobFactory implements ApplicationContextAware {
        
        private transient AutowireCapableBeanFactory beanFactory;
     
        public void setApplicationContext(final ApplicationContext context) {
            beanFactory = context.getAutowireCapableBeanFactory();
        }
     
        @Override
        protected Object createJobInstance(final TriggerFiredBundle bundle) throws Exception {
            final Object job = super.createJobInstance(bundle);
            beanFactory.autowireBean(job);
            return job;
        }
    }


    Because jobs are stored in the database with useProperties=true, non-String entries in the JobDataMap cause problems, so the JobDetail reference has to be removed from it. This class is required: PersistableCronTriggerFactoryBean.java
    package common.scheduling;

    import org.springframework.scheduling.quartz.CronTriggerFactoryBean;
    import org.springframework.scheduling.quartz.JobDetailAwareTrigger;

    /**
     * Needed to set Quartz useProperties=true when using Spring classes,
     * because Spring sets an object reference on JobDataMap that is not a String
     * 
     * 
    @see http://site.trimplement.com/using-spring-and-quartz-with-jobstore-properties/
     * 
    @see http://forum.springsource.org/showthread.php?130984-Quartz-error-IOException
     
    */
    public class PersistableCronTriggerFactoryBean extends CronTriggerFactoryBean {
        @Override
        public void afterPropertiesSet() {
            super.afterPropertiesSet();
     
            //Remove the JobDetail element
            getJobDataMap().remove(JobDetailAwareTrigger.JOB_DETAIL_KEY);
        }
    }


    The table-creation statements (MySQL), quartzTables.sql:
    #
    # Quartz seems to work best with the driver mm.mysql-2.0.7-bin.jar
    #
    # In your Quartz properties file, you'll need to set
    # org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
    #

    DROP TABLE IF EXISTS QRTZ_JOB_LISTENERS;
    DROP TABLE IF EXISTS QRTZ_TRIGGER_LISTENERS;
    DROP TABLE IF EXISTS QRTZ_FIRED_TRIGGERS;
    DROP TABLE IF EXISTS QRTZ_PAUSED_TRIGGER_GRPS;
    DROP TABLE IF EXISTS QRTZ_SCHEDULER_STATE;
    DROP TABLE IF EXISTS QRTZ_LOCKS;
    DROP TABLE IF EXISTS QRTZ_SIMPLE_TRIGGERS;
    DROP TABLE IF EXISTS QRTZ_CRON_TRIGGERS;
    DROP TABLE IF EXISTS QRTZ_BLOB_TRIGGERS;
    DROP TABLE IF EXISTS QRTZ_TRIGGERS;
    DROP TABLE IF EXISTS QRTZ_JOB_DETAILS;
    DROP TABLE IF EXISTS QRTZ_CALENDARS;


    CREATE TABLE QRTZ_JOB_DETAILS
      (
        JOB_NAME  VARCHAR(200) NOT NULL,
        JOB_GROUP VARCHAR(200) NOT NULL,
        DESCRIPTION VARCHAR(250) NULL,
        JOB_CLASS_NAME   VARCHAR(250) NOT NULL,
        IS_DURABLE VARCHAR(1) NOT NULL,
        IS_VOLATILE VARCHAR(1) NOT NULL,
        IS_STATEFUL VARCHAR(1) NOT NULL,
        REQUESTS_RECOVERY VARCHAR(1) NOT NULL,
        JOB_DATA BLOB NULL,
        PRIMARY KEY (JOB_NAME,JOB_GROUP)
    );

    CREATE TABLE QRTZ_JOB_LISTENERS
      (
        JOB_NAME  VARCHAR(200) NOT NULL,
        JOB_GROUP VARCHAR(200) NOT NULL,
        JOB_LISTENER VARCHAR(200) NOT NULL,
        PRIMARY KEY (JOB_NAME,JOB_GROUP,JOB_LISTENER),
        FOREIGN KEY (JOB_NAME,JOB_GROUP)
            REFERENCES QRTZ_JOB_DETAILS(JOB_NAME,JOB_GROUP)
    );

    CREATE TABLE QRTZ_TRIGGERS
      (
        TRIGGER_NAME VARCHAR(200) NOT NULL,
        TRIGGER_GROUP VARCHAR(200) NOT NULL,
        JOB_NAME  VARCHAR(200) NOT NULL,
        JOB_GROUP VARCHAR(200) NOT NULL,
        IS_VOLATILE VARCHAR(1) NOT NULL,
        DESCRIPTION VARCHAR(250) NULL,
        NEXT_FIRE_TIME BIGINT(13) NULL,
        PREV_FIRE_TIME BIGINT(13) NULL,
        PRIORITY INTEGER NULL,
        TRIGGER_STATE VARCHAR(16) NOT NULL,
        TRIGGER_TYPE VARCHAR(8) NOT NULL,
        START_TIME BIGINT(13) NOT NULL,
        END_TIME BIGINT(13) NULL,
        CALENDAR_NAME VARCHAR(200) NULL,
        MISFIRE_INSTR SMALLINT(2) NULL,
        JOB_DATA BLOB NULL,
        PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
        FOREIGN KEY (JOB_NAME,JOB_GROUP)
            REFERENCES QRTZ_JOB_DETAILS(JOB_NAME,JOB_GROUP)
    );

    CREATE TABLE QRTZ_SIMPLE_TRIGGERS
      (
        TRIGGER_NAME VARCHAR(200) NOT NULL,
        TRIGGER_GROUP VARCHAR(200) NOT NULL,
        REPEAT_COUNT BIGINT(7) NOT NULL,
        REPEAT_INTERVAL BIGINT(12) NOT NULL,
        TIMES_TRIGGERED BIGINT(10) NOT NULL,
        PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
        FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
            REFERENCES QRTZ_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
    );

    CREATE TABLE QRTZ_CRON_TRIGGERS
      (
        TRIGGER_NAME VARCHAR(200) NOT NULL,
        TRIGGER_GROUP VARCHAR(200) NOT NULL,
        CRON_EXPRESSION VARCHAR(200) NOT NULL,
        TIME_ZONE_ID VARCHAR(80),
        PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
        FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
            REFERENCES QRTZ_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
    );

    CREATE TABLE QRTZ_BLOB_TRIGGERS
      (
        TRIGGER_NAME VARCHAR(200) NOT NULL,
        TRIGGER_GROUP VARCHAR(200) NOT NULL,
        BLOB_DATA BLOB NULL,
        PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
        FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
            REFERENCES QRTZ_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
    );

    CREATE TABLE QRTZ_TRIGGER_LISTENERS
      (
        TRIGGER_NAME  VARCHAR(200) NOT NULL,
        TRIGGER_GROUP VARCHAR(200) NOT NULL,
        TRIGGER_LISTENER VARCHAR(200) NOT NULL,
        PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP,TRIGGER_LISTENER),
        FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
            REFERENCES QRTZ_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
    );


    CREATE TABLE QRTZ_CALENDARS
      (
        CALENDAR_NAME  VARCHAR(200) NOT NULL,
        CALENDAR BLOB NOT NULL,
        PRIMARY KEY (CALENDAR_NAME)
    );



    CREATE TABLE QRTZ_PAUSED_TRIGGER_GRPS
      (
        TRIGGER_GROUP  VARCHAR(200) NOT NULL, 
        PRIMARY KEY (TRIGGER_GROUP)
    );

    CREATE TABLE QRTZ_FIRED_TRIGGERS
      (
        ENTRY_ID VARCHAR(95) NOT NULL,
        TRIGGER_NAME VARCHAR(200) NOT NULL,
        TRIGGER_GROUP VARCHAR(200) NOT NULL,
        IS_VOLATILE VARCHAR(1) NOT NULL,
        INSTANCE_NAME VARCHAR(200) NOT NULL,
        FIRED_TIME BIGINT(13) NOT NULL,
        PRIORITY INTEGER NOT NULL,
        STATE VARCHAR(16) NOT NULL,
        JOB_NAME VARCHAR(200) NULL,
        JOB_GROUP VARCHAR(200) NULL,
        IS_STATEFUL VARCHAR(1) NULL,
        REQUESTS_RECOVERY VARCHAR(1) NULL,
        PRIMARY KEY (ENTRY_ID)
    );

    CREATE TABLE QRTZ_SCHEDULER_STATE
      (
        INSTANCE_NAME VARCHAR(200) NOT NULL,
        LAST_CHECKIN_TIME BIGINT(13) NOT NULL,
        CHECKIN_INTERVAL BIGINT(13) NOT NULL,
        PRIMARY KEY (INSTANCE_NAME)
    );

    CREATE TABLE QRTZ_LOCKS
      (
        LOCK_NAME  VARCHAR(40) NOT NULL, 
        PRIMARY KEY (LOCK_NAME)
    );


    INSERT INTO QRTZ_LOCKS values('TRIGGER_ACCESS');
    INSERT INTO QRTZ_LOCKS values('JOB_ACCESS');
    INSERT INTO QRTZ_LOCKS values('CALENDAR_ACCESS');
    INSERT INTO QRTZ_LOCKS values('STATE_ACCESS');
    INSERT INTO QRTZ_LOCKS values('MISFIRE_ACCESS');


    commit;


    参考:
    http://wenku.baidu.com/view/82e3bcbdfd0a79563c1e7223.html

    Quartz + Spring MVC integration, approach 2 (persistent jobs, clustering and distribution)
    http://blog.csdn.net/congcong68/article/details/39256307

    paulwong 2014-11-14 18:46
    Why does LeTV load 4K content so quickly? Deconstructing its CDN network
    Among the content it offers, the most attractive is certainly its 4K films and series. As everyone knows, 4K files are huge, and streaming them over a network normally means constant stuttering. So why is playback so smooth on the X50 Air? Let's break it down:

    [image: letvcdn]

     

    All right, the answer revealed!

    In fact, much of the waiting when you watch streaming content happens because caching takes a long time, which delays loading.

    LeTV uses a CDN (Content Delivery / Distribution Network). Its total capacity is larger than the heaviest load carried by any single backbone, and it has off-site redundancy: if one server fails, the system automatically falls back to server resources in neighbouring regions, so reliability is extremely close to 100%.

    Even when nothing has failed, LeTV Hong Kong's CDN network can route around busy, congested links and automatically pick the cache server closest to the user, improving access speed and greatly shortening download times, which is why 4K streaming plays back smoothly.

    paulwong 2014-11-07 17:03
    JPPF, a Java parallel processing framework
    http://www.jppf.org/doc/v4/index.php?title=Main_Page

    paulwong 2014-07-19 09:55
    Tencent CKV, a massive-scale distributed storage system
    (The original post is a single infographic image: http://ww4.sinaimg.cn/bmiddle/a1ab8e59jw1eeeg66h72fj20c83ek7ra.jpg)

    paulwong 2014-07-16 07:58

    [Repost] A classic comic explaining how HDFS works
    The best-known distributed file systems are HDFS and GFS, HDFS being the simpler of the two. This post explains the principles of HDFS in a very concise, easy-to-follow comic form, far more accessible than the usual slide decks, and a rare, valuable learning resource.

    1. Three components: the client, the nameserver (think of it as the master controller and file index, similar to an inode in Linux), and the datanodes (which store the actual data).

    As far as I know the client comes in two forms: a program written against the API that Hadoop provides can talk to HDFS, and a node with Hadoop installed can also talk to HDFS from the command line. For example, to upload from a datanode: bin/hadoop fs -put example1 user/chunk/


    2. How data is written





    3. How data is read



    4. Fault tolerance, part 1: failure types and how they are detected (nameserver failure, network failure and corrupt data)




    5. Fault tolerance, part 2: read/write fault tolerance



    6. Fault tolerance, part 3: datanode failure



    7. Replication rules



    8. Closing remarks


    paulwong 2013-10-26 09:15
    Resources on data sharding, caching, RPC frameworks and NoSQL

    1. Data sharding

            1.1. MySQL middleware research (Atlas, Cobar, TDDL)

            1.2. Using MySQL Proxy for data sharding and aggregation

            1.3. MySQL sharding (splitting databases and tables) solutions

            1.4. TDDL and Diamond

    2. Caching

            2.1. Three ways to operate memcached from a Java client

            2.2. Evaluating MyBatis configured with memcached

            2.3. Memcached + Spring Caching

            2.4. memcachedb, a cache system with persistent storage

            2.5. memcachedb: making memcache data persistent

            2.6. Taobao's key-value cache framework Tair

            2.7. Extending the iBATIS cache: ibatis-tair-cache

    3. RPC frameworks

            3.1. Dubbo

            3.2. HSF (not open source)

            3.3. Service framework HSF analysis, part 1: container startup

    4. NoSQL

            4.1. Must-read material for learning NoSQL databases



    paulwong 2013-10-14 10:14
    Distributed search resources

    Cloud-based distributed search technology
    http://www.searchtech.pro

    Elasticsearch Chinese community
    http://es-bbs.medcl.net/categories/%E6%9C%80%E6%96%B0%E5%8A%A8%E6%80%81

    http://wangwei3.iteye.com/blog/1818599

    Welcome to the Apache Nutch Wiki
    https://wiki.apache.org/nutch/FrontPage

    A round-up of Elasticsearch clients
    http://www.searchtech.pro/elasticsearch-clients

    Client examples
    http://es-cn.medcl.net/guide/concepts/scaling-lucene/
    https://github.com/aglover/elasticsearch_article/blob/master/src/main/java/com/b50/usat/load/MusicReviewSearch.java

    paulwong 2013-08-31 15:52

    Install hadoop + hbase + nutch + elasticsearch
    Read the full article: http://www.aygfsteel.com/paulwong/archive/2013/08/31/403513.html

    paulwong 2013-08-31 01:17
    Implementation of CombineFileInputFormat for Hadoop 0.20.205

    The principle works in the following three steps:

    1. For each file under the input directory, if its length exceeds mapred.max.split.size, cut it on block boundaries into several splits (one split is the input of one map). Each such split is at least mapred.max.split.size long, and because the cut is made on block boundaries it is also longer than blockSize. If the length left over at the end of the file is greater than mapred.min.split.size.per.node, it becomes a split of its own; otherwise it is set aside for now.

    2. What remains now are short fragments. Merge the fragments within each rack: whenever the accumulated length exceeds mapred.max.split.size, emit a split. At the end, if the remaining fragments on the rack add up to more than mapred.min.split.size.per.rack, they become one split; otherwise they are set aside again.

    3. Merge the fragments left over from the different racks: whenever the accumulated length exceeds mapred.max.split.size, emit a split; whatever is left at the end, however short, becomes one final split.
    Example: mapred.max.split.size=1000,
    mapred.min.split.size.per.node=300,
    mapred.min.split.size.per.rack=100.
    There are five files under the input directory: three files on rack1 with lengths 2050, 1499 and 10, and two files on rack2 with lengths 1010 and 80. blockSize is 500.
    Step 1 produces five splits: 1000, 1000, 1000, 499, 1000. The remaining fragments are 50 and 10 on rack1, and 10 and 80 on rack2.
    Since the fragments on each rack add up to less than 100, step 2 changes nothing: splits and fragments stay as they are.
    Step 3 merges the four fragments into one split of length 150.

    To reduce the number of maps, increase mapred.max.split.size; to get more maps, decrease it.

    Its characteristics: a block feeds at most one map; a file may span several blocks; a file with many blocks may be divided among different maps; and one map may process several blocks and several files.

    Note: CombineFileInputFormat is an abstract class, so you have to write a subclass.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileSplit;
    import org.apache.hadoop.mapred.InputSplit;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.LineRecordReader;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
    import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
    import org.apache.hadoop.mapred.lib.CombineFileSplit;

    @SuppressWarnings("deprecation")
    public class CombinedInputFormat extends CombineFileInputFormat<LongWritable, Text> {

        @SuppressWarnings({ "unchecked", "rawtypes" })
        @Override
        public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf conf, Reporter reporter) throws IOException {

            return new CombineFileRecordReader(conf, (CombineFileSplit) split, reporter, (Class) myCombineFileRecordReader.class);
        }

        public static class myCombineFileRecordReader implements RecordReader<LongWritable, Text> {
            private final LineRecordReader linerecord;

            public myCombineFileRecordReader(CombineFileSplit split, Configuration conf, Reporter reporter, Integer index) throws IOException {
                FileSplit filesplit = new FileSplit(split.getPath(index), split.getOffset(index), split.getLength(index), split.getLocations());
                linerecord = new LineRecordReader(conf, filesplit);
            }

            @Override
            public void close() throws IOException {
                linerecord.close();

            }

            @Override
            public LongWritable createKey() {
                // TODO Auto-generated method stub
                return linerecord.createKey();
            }

            @Override
            public Text createValue() {
                // TODO Auto-generated method stub
                return linerecord.createValue();
            }

            @Override
            public long getPos() throws IOException {
                // TODO Auto-generated method stub
                return linerecord.getPos();
            }

            @Override
            public float getProgress() throws IOException {
                // TODO Auto-generated method stub
                return linerecord.getProgress();
            }

            @Override
            public boolean next(LongWritable key, Text value) throws IOException {

                // TODO Auto-generated method stub
                return linerecord.next(key, value);
            }

        }
    }


    Set it up like this at runtime:

    if (argument != null) {
        conf.set("mapred.max.split.size", argument);
    } else {
        conf.set("mapred.max.split.size", "134217728"); // 128 MB
    }

    conf.setInputFormat(CombinedInputFormat.class);




    paulwong 2013-08-29 16:08
    Using Sqoop to move data between HDFS and MySQL

    Overview
    Sqoop is a tool for moving data back and forth between Hadoop and relational databases: it can import data from a relational database (such as MySQL, Oracle or Postgres) into HDFS, and export data from HDFS back into a relational database.
    http://sqoop.apache.org/

    Environment
    An IncompatibleClassChangeError during debugging is almost always a version-compatibility problem.

    To keep the Hadoop and Sqoop versions compatible, use Cloudera:
    About Cloudera:

    Cloudera standardises Hadoop configuration and helps enterprises install, configure and run Hadoop for large-scale data processing and analysis.
    http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDHTarballs/3.25.2013/CDH4-Downloadable-Tarballs/CDH4-Downloadable-Tarballs.html

    Download and install hadoop-0.20.2-cdh3u6 and sqoop-1.3.0-cdh3u6.
    Installation
    Installation is simple: just unpack the archives.

    The only extra step is to copy the MySQL JDBC driver, mysql-connector-java-5.0.7-bin.jar, into $SQOOP_HOME/lib.
    Set the environment variables in /etc/profile:

    export SQOOP_HOME=/home/hadoop/sqoop-1.3.0-cdh3u6/

    export PATH=$SQOOP_HOME/bin:$PATH

    MySQL to HDFS example
    ./sqoop import --connect jdbc:mysql://10.8.210.166:3306/recsys --username root --password root --table shop -m 1 --target-dir /user/recsys/input/shop/$today


    HDFS to MySQL example
    ./sqoop export --connect jdbc:mysql://10.8.210.166:3306/recsys --username root --password root --table shopassoc --fields-terminated-by ',' --export-dir /user/recsys/output/shop/$today

    Parameter notes
    (Parameters I did not use are not explained here; see the command help for the rest.)

    Parameter type   Parameter name         Explanation
    common           connect                JDBC URL
    common           username               ---
    common           password               ---
    common           table                  table name
    import           target-dir             output HDFS directory; defaults to /user/$loginName/
    export           fields-terminated-by   field delimiter in the HDFS file; defaults to "\t"
    export           export-dir             path of the HDFS file

    paulwong 2013-05-11 21:27
    Comprehensive technical services and related technical solutions (pure-Java solutions)
    http://www.iteye.com/topic/1128561

    paulwong 2013-05-11 00:17

    13 open-source Java big-data tools in one sweep

    Below are the major open-source tools in the big-data field that support Java:
    1. HDFS

    HDFS is the main distributed storage system used by Hadoop applications. An HDFS cluster contains a NameNode (the master node), which manages all file-system metadata, and the DataNodes, which store the actual data (there can be many of them). HDFS is designed for huge data volumes: where traditional file systems are optimised for large numbers of small files, HDFS is optimised for storing and accessing a smaller number of very large files.

    2. MapReduce

    Hadoop MapReduce is a software framework for easily writing parallel applications that process huge amounts of data (terabytes) reliably and fault-tolerantly across large clusters of tens of thousands of commodity-hardware nodes.

    3. HBase

    Apache HBase is the Hadoop database: a distributed, scalable big-data store. It offers random, real-time read/write access to large data sets and is optimised for very large tables on clusters of commodity servers, tens of billions of rows by tens of millions of columns. At its core it is an open-source implementation of Google's Bigtable paper: distributed, column-oriented storage. Just as Bigtable builds on the distributed storage provided by GFS (Google File System), HBase provides Bigtable-like capabilities on top of Apache Hadoop's HDFS.
    4. Cassandra

    Apache Cassandra is a high-performance, linearly scalable, highly available database that can run on commodity hardware or cloud infrastructure as a platform for mission-critical data. With best-in-class cross-datacenter replication, it gives users lower latency and more dependable disaster recovery. Through log-structured updates, strong support for denormalisation and materialised views, and powerful built-in caching, Cassandra's data model also offers convenient secondary (column) indexes.

    5. Hive

    Apache Hive is a data-warehouse system for Hadoop that supports data summarisation (mapping structured data files onto database tables), ad-hoc queries, and the analysis of large data sets stored in Hadoop-compatible systems. Hive provides a full SQL-style query language, HiveQL; and when expressing some logic in HiveQL would be inefficient or awkward, it lets traditional Map/Reduce programmers plug in their own custom mappers and reducers.

    6. Pig

    Apache Pig is a platform for analysing large data sets. It consists of a high-level language for writing data-analysis programs and the infrastructure for evaluating them. The salient property of Pig programs is that their structure is amenable to substantial parallelisation, which lets them handle very large data sets. Pig's infrastructure layer contains a compiler that produces Map-Reduce jobs; its language layer currently consists of a native language, Pig Latin, designed to be easy to program in and to scale.
    7. Chukwa

    Apache Chukwa is an open-source data-collection system for monitoring large distributed systems. Built on top of HDFS and the Map/Reduce framework, it inherits Hadoop's scalability and robustness. Chukwa also includes a flexible and powerful toolkit for displaying, monitoring and analysing results, to make the best use of the collected data.

    8. Ambari

    Apache Ambari is a web-based tool for provisioning, managing and monitoring Apache Hadoop clusters. It supports Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a cluster-health dashboard, with features such as heatmaps and the ability to view MapReduce, Pig and Hive applications and diagnose their performance characteristics through a friendly user interface.

    9. ZooKeeper

    Apache ZooKeeper is a reliable coordination system for large distributed systems, providing configuration maintenance, naming, distributed synchronisation, group services and more. ZooKeeper's goal is to encapsulate these complex, error-prone but essential services and hand users a simple, easy-to-use interface backed by an efficient and dependable system.
    10. Sqoop

    Sqoop is a tool for moving data between Hadoop and relational databases: it can import data from a relational database into HDFS, and export data from HDFS into a relational database.

    11. Oozie

    Apache Oozie is a scalable, reliable and extensible workflow scheduling system for managing Hadoop jobs. Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions. Oozie Coordinator jobs are recurring Oozie Workflow jobs triggered by time (frequency) and data availability. Oozie is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs out of the box (Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system jobs such as Java programs and shell scripts.
    12. Mahout

    Apache Mahout is a scalable machine-learning and data-mining library. Mahout currently supports four main use cases:

    • Recommendation mining: take user behaviour and use it to recommend things the user might like.
    • Clustering: take documents and group related ones together.
    • Classification: learn from existing categorised documents what a category looks like, and assign unlabelled documents to the right category.
    • Frequent itemset mining: take groups of items and identify which individual items usually appear together.
HCatalog</strong></p><p style="margin: 0px 0px 1.5em; padding: 0px; list-style: none; color: #333333; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 24px; background-color: #ffffff;">Apache HCatalog是Hadoop建立数据的映表和存储管理服务,它包括:</p><p style="margin: 0px 0px 1.5em; padding: 0px; list-style: none; color: #333333; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 24px; background-color: #ffffff;"></p><ul style="margin: 0px 0px 1em 20px; padding: 0px; list-style: none; color: #333333; font-family: Helvetica, Tahoma, Arial, sans-serif; line-height: 24px; background-color: #ffffff;"><li style="margin: 0px; padding: 0px; list-style: disc;"><span style="line-height: 1.45em;">提供一个共享模式和数据cd机制?/span></li><li style="margin: 0px; padding: 0px; list-style: disc;"><span style="line-height: 1.45em;">提供一个抽象表Q这L户就不需要关注数据存储的方式和地址?/span></li><li style="margin: 0px; padding: 0px; list-style: disc;"><span style="line-height: 1.45em;">为类似Pig、MapReduce及Hiveq些数据处理工具提供互操作性?/span></li></ul><img src ="http://www.aygfsteel.com/paulwong/aggbug/398700.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/paulwong/" target="_blank">paulwong</a> 2013-05-03 09:05 <a href="http://www.aygfsteel.com/paulwong/archive/2013/05/03/398700.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>一个PIG脚本例子分析http://www.aygfsteel.com/paulwong/archive/2013/04/13/397791.htmlpaulwongpaulwongSat, 13 Apr 2013 07:21:00 GMThttp://www.aygfsteel.com/paulwong/archive/2013/04/13/397791.htmlhttp://www.aygfsteel.com/paulwong/comments/397791.htmlhttp://www.aygfsteel.com/paulwong/archive/2013/04/13/397791.html#Feedback0http://www.aygfsteel.com/paulwong/comments/commentRss/397791.htmlhttp://www.aygfsteel.com/paulwong/services/trackbacks/397791.html
    PIGGYBANK_PATH=$PIG_HOME/contrib/piggybank/java/piggybank.jar
    INPUT=pig/input/test-pig-full.txt
    OUTPUT=pig/output/test-pig-output-$(date  +%Y%m%d%H%M%S)
    PIGSCRIPT=analyst_status_logs.pig

    #analyst_500_404_month.pig
    #analyst_500_404_day.pig
    #analyst_404_percentage.pig
    #analyst_500_percentage.pig
    #analyst_unique_path.pig
    #analyst_user_logs.pig
    #analyst_status_logs.pig


    pig -p PIGGYBANK_PATH=$PIGGYBANK_PATH -p INPUT=$INPUT -p OUTPUT=$OUTPUT $PIGSCRIPT


    The data source to analyse, a LOG file:
    46.20.45.18 - - [25/Dec/2012:23:00:25 +0100] "GET / HTTP/1.0" 302 - "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)" "-" "-" 46.20.45.18 "" 11011AEC9542DB0983093A100E8733F8 0
    46.20.45.18 - - [25/Dec/2012:23:00:25 +0100] "GET /sign-in.jspx HTTP/1.0" 200 3926 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)" "-" "-" 46.20.45.18 "" 11011AEC9542DB0983093A100E8733F8 0
    69.59.28.19 - - [25/Dec/2012:23:01:25 +0100] "GET / HTTP/1.0" 302 - "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)" "-" "-" 69.59.28.19 "" 36D80DE7FE52A2D89A8F53A012307B0A 15


    The PIG script:
    --Register the piggybank jar, because DateExtractor is needed
    register '$PIGGYBANK_PATH';

    --Define short aliases for the DateExtractor UDF
    DEFINE DATE_EXTRACT_MM 
    org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('yyyy-MM');

    DEFINE DATE_EXTRACT_DD 
    org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('yyyy-MM-dd');

    -- pig/input/test-pig-full.txt
    --Load the data from the file the variable points to into Pig and name the columns; at this point each record is a tuple (a,b,c)
    raw_logs = load '$INPUT' USING org.apache.pig.piggybank.storage.MyRegExLoader('^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(\\S+) (\\S+) (HTTP[^"]+)" (\\S+) (\\S+) "([^"]*)" "([^"]*)" "(\\S+)" "(\\S+)" (\\S+) "(.*)" (\\S+) (\\S+)')
    as (remoteAddr: chararray, 
    n2: chararray, 
    n3: chararray, 
    time: chararray, 
    method: chararray,
    path:chararray,
    protocol:chararray,
    status: int, 
    bytes_string: chararray, 
    referrer: chararray, 
    browser: chararray, 
    n10:chararray,
    remoteLogname: chararray, 
    remoteAddr12: chararray, 
    path2: chararray, 
    sessionid: chararray, 
    n15: chararray
    );

    --Filter the data
    filter_logs = FILTER raw_logs BY not (browser matches '.*pingdom.*');
    --item_logs = FOREACH raw_logs GENERATE browser;

    --percent 500 logs
    --Re-project the records so the data set keeps only two fields: status and month
    reitem_percent_500_logs = FOREACH filter_logs GENERATE status,DATE_EXTRACT_MM(time) as month;
    --Group the data set; the structure is now a map of bags: (a{(aa,bb,cc),(dd,ee,ff)}, b{(bb,cc,dd),(ff,gg,hh)})
    group_month_percent_500_logs = GROUP reitem_percent_500_logs BY (month);
    --Re-project the grouped data and compute per-group statistics; both the grouped bag and the original relation are used here
    final_month_500_logs = FOREACH group_month_percent_500_logs 
    {
        --COUNT over the original relation; because it runs inside the FOREACH, the month == group condition is applied automatically
        --which shows that the grouped bag itself is not actually used here
        --this works per group key: it counts how many rows of the original data set fall under the current group
        total = COUNT(reitem_percent_500_logs);
        --FILTER over the original relation; again, inside the FOREACH the month == group condition is added automatically
        --so this re-filters the original data set down to the rows with status == 500 and month == group
        t = filter reitem_percent_500_logs by status== 500; --create a bag which contains only T values
        --Re-project: emit the group key and the computed percentage
        generate flatten(group) as col1, 100*(double)COUNT(t)/(double)total;
    }
    STORE final_month_500_logs into '$OUTPUT' using PigStorage(',');



    paulwong 2013-04-13 15:21 发表评论
    ]]>
    把命令行中的值传进PIG中 (Passing command-line values into PIG)
    http://www.aygfsteel.com/paulwong/archive/2013/04/10/397645.html
    paulwong Wed, 10 Apr 2013 07:32:00 GMT

    http://wiki.apache.org/pig/ParameterSubstitution

    %pig -param input=/user/paul/sample.txt -param output=/user/paul/output/

    Reading the parameter inside the PIG script:
    records = LOAD $input;

    paulwong 2013-04-10 15:32 发表评论
    ]]>
    PIG浅议 (A brief look at PIG)
    http://www.aygfsteel.com/paulwong/archive/2013/04/05/397411.html
    paulwong Fri, 05 Apr 2013 13:33:00 GMT

    What is PIG?
    It is a design language: you describe how the data should flow, and an engine turns that design into MapReduce jobs that run on Hadoop.
    PIG vs SQL
    The two are alike in that you execute one or more statements and some results come out.
    The difference is that SQL needs the data loaded into tables before it can run, and SQL does not care how the work is done in between: you send a SQL statement and a result comes back.
    PIG needs no loading of data into tables, but you have to design the intermediate process, step by step, all the way to the result.

    paulwong 2013-04-05 21:33 发表评论
    ]]>
    hadoop集群中添加节点方法 (How to add a node to a Hadoop cluster)
    http://www.aygfsteel.com/paulwong/archive/2013/03/16/396544.html
    paulwong Sat, 16 Mar 2013 15:04:00 GMT

    Install Hadoop on the new node.

    Copy the relevant NameNode configuration files to the node.

    Edit the masters and slaves files to add the node.

    Set up passwordless SSH login to the node.

    Start the DataNode and TaskTracker on that node only (hadoop-daemon.sh start datanode/tasktracker).

    Run start-balancer.sh to rebalance the data.

    Load balancing: when a node fails or a new node is added, the distribution of data blocks can become uneven; the balancer redistributes the blocks evenly across the DataNodes.

    paulwong 2013-03-16 23:04 发表评论
    ]]>
    Phoenix: HBase终于有SQL接口了~ (Phoenix: HBase finally gets a SQL interface)
    http://www.aygfsteel.com/paulwong/archive/2013/02/19/395432.html
    paulwong Tue, 19 Feb 2013 15:15:00 GMT
    For details see: https://github.com/forcedotcom/phoenix

    It supports select, from, where, group by, having, order by and table creation; secondary indexes, joins and dynamic columns are planned for the future.

    It is built on the native HBase API; response times are in milliseconds for data at the 10-million-row scale and in seconds at the 100-million-row scale.
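    Phoenix is used through a JDBC driver, so a query is plain java.sql code. A minimal sketch under assumptions: the Phoenix client jar is on the classpath (it registers the driver; very old versions may need an explicit Class.forName), the weblogs table is hypothetical, and ubuntu is the ZooKeeper host used elsewhere in this feed.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PhoenixQuery {
        public static void main(String[] args) throws Exception {
            // URL format is jdbc:phoenix:<zookeeper quorum>
            Connection conn = DriverManager.getConnection("jdbc:phoenix:ubuntu");
            try {
                // select / where / group by, as listed above; table and columns are made up
                PreparedStatement ps = conn.prepareStatement(
                        "SELECT host, COUNT(*) FROM weblogs WHERE status = ? GROUP BY host");
                ps.setInt(1, 500);
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
                }
            } finally {
                conn.close();
            }
        }
    }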



    paulwong 2013-02-19 23:15 发表评论
    ]]>
    监控HBASE (Monitoring HBase)
    http://www.aygfsteel.com/paulwong/archive/2013/02/04/395107.html
    paulwong Mon, 04 Feb 2013 07:08:00 GMT

    Hadoop/HBase is the open-source counterpart of Google's Bigtable, GFS and MapReduce. As the internet grows, processing big data matters more and more, and Hadoop/HBase is used ever more widely. To run a Hadoop/HBase system well you need a complete monitoring setup, so that you can see the real-time state of the system and keep everything under control. Hadoop/HBase ships with a very complete metrics framework that covers system metrics at many granularities, and the framework is well designed: users can easily add custom metrics. Even more important is how the metrics are exposed; three ways are currently supported: writing to local files, reporting to a Ganglia system, and exposing them over JMX. This article explains how to report Hadoop/HBase metrics to Ganglia and view them in a browser.

    Before going further, a quick introduction to Ganglia. Ganglia is an open-source system-monitoring system made up of three parts: gmond, gmetad and the webfrontend, which divide the work as follows:

    gmond: a daemon that runs on every node to be monitored; it collects monitoring statistics and sends and receives statistics on a shared multicast or unicast channel.
    gmetad: a daemon that periodically polls the gmonds, pulls their data and stores the metrics in the RRD storage engine.
    webfrontend: installed on the machine running gmetad so it can read the RRD files; it provides the front-end display.

    In short, gmond collects the metrics on each node, gmetad aggregates what the gmonds collect, and the webfrontend displays what gmetad has aggregated. Out of the box Ganglia monitors system metrics such as cpu/memory/net, but Hadoop/HBase has built-in support for Ganglia, so a simple configuration change is enough to feed their metrics into Ganglia as well.

    Next, how to hook Hadoop/HBase into Ganglia. The Hadoop/HBase version used here is 0.94.2; earlier versions may differ slightly, so take care. HBase used to be a Hadoop sub-project and therefore shared the same Hadoop metrics framework, but Hadoop later got an improved framework, metrics2 (metrics version 2), which the Hadoop projects have all started using. HBase, now a top-level Apache project alongside Hadoop, has not yet caught up and still uses the original metrics, so Hadoop and HBase have to be covered separately.

    Hooking Hadoop into Ganglia:

    1. The configuration file for Hadoop metrics2 is hadoop-metrics2.properties.
    2. Hadoop metrics2 introduces the notions of source and sink: a source collects data, and a sink consumes what the sources collect (writing to files, reporting to Ganglia, JMX and so on).
    3. Configure hadoop metrics2 for Ganglia:
    #*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink30
    *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
     
    *.sink.ganglia.period=10
    *.sink.ganglia.supportsparse=true
    *.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
    *.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
     
    #uncomment as your needs
    namenode.sink.ganglia.servers=10.235.6.156:8649
    #datanode.sink.ganglia.servers=10.235.6.156:8649
    #jobtracker.sink.ganglia.servers=10.0.3.99:8649
    #tasktracker.sink.ganglia.servers=10.0.3.99:8649
    #maptask.sink.ganglia.servers=10.0.3.99:8649
    #reducetask.sink.ganglia.servers=10.0.3.99:8649


    A few things to note here:

    (1) Ganglia 3.1 is not compatible with 3.0, so pick GangliaSink30 or GangliaSink31 according to your Ganglia version.
    (2) period sets the reporting interval, in seconds.
    (3) namenode.sink.ganglia.servers is the host:port of the Ganglia gmetad that the data is reported to.
    (4) If several Hadoop processes (namenode/datanode, etc.) run on the same physical machine, just configure sink.ganglia.servers for each process as needed.

    Hooking HBase into Ganglia:

    1. The configuration file for the hadoop metrics used by HBase is hadoop-metrics.properties.
    2. The core of hadoop metrics is the Context: TimeStampingFileContext writes to files, GangliaContext/GangliaContext31 reports to Ganglia.
    3. Configure hadoop metrics for Ganglia:
    # Configuration of the "hbase" context for ganglia
    # Pick one: Ganglia 3.0 (former) or Ganglia 3.1 (latter)
    # hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext
    hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
    hbase.period=10
    hbase.servers=10.235.6.156:8649

    Points to note:

    (1) Ganglia 3.1 is incompatible with 3.0: use GangliaContext for versions before 3.1, and GangliaContext31 for Ganglia 3.1.
    (2) period is in seconds and sets how often data is reported to Ganglia.
    (3) servers is the host:port of the Ganglia gmetad the data should be reported to.
    (4) The rpc- and jvm-related metrics can be configured in the same way.







    paulwong 2013-02-04 15:08 发表评论
    ]]>
    HBASE部署要点 (HBase deployment essentials)
    http://www.aygfsteel.com/paulwong/archive/2013/02/04/395101.html
    paulwong Mon, 04 Feb 2013 04:10:00 GMT

    RegionServers and TaskTrackers should not sit on the same machines. If MapReduce jobs are run, it is best to split into two separate clusters, i.e. two different groups of servers, so that the offline MapReduce load does not affect online load such as scans.

    If the cluster is mainly there for MapReduce jobs, putting RegionServers and TaskTrackers on the same machines is fine.


    Prototype cluster mode

    Ten nodes or fewer, no MapReduce jobs, mainly for low-latency access. Per-node configuration: 4-6 core CPU, 24-32 GB of RAM, SATA disks. Hadoop NameNode, JobTracker, HBase Master and ZooKeeper all live on the same node.


    Small cluster mode (10-20 servers)

    Put the HBase Master on its own machine, so a lower-spec machine can be used. ZooKeeper also goes on a separate machine; NameNode and JobTracker share one machine.

    Medium cluster mode (20-50 servers)

    Since there is no longer a need to save money, HBase Master and ZooKeeper can share machines, with three instances each of ZooKeeper and HBase Master. NameNode and JobTracker share one machine.

    Large cluster mode (>50 servers)

    Similar to the medium layout, but with five instances each of ZooKeeper and HBase Master. The NameNode and Secondary NameNode need enough memory.

    Hadoop master nodes

    NameNode and Secondary NameNode server requirements: (small cluster) 8-core CPU, 16 GB of RAM, 1 GbE NIC and SATA disks; medium clusters add another 16 GB of RAM, large clusters another 32 GB.

    HBase Master node

    Server requirements: 4-core CPU, 8-16 GB of RAM, 1 GbE NIC, 2 SATA disks: one for the operating system, the other for the HBase Master logs.

    Hadoop DataNode and HBase RegionServer nodes

    DataNode and RegionServer should run on the same server, and should not share it with a TaskTracker. Server requirements: 8-12 core CPU, 24-32 GB of RAM, 1 GbE NIC, 12 x 1 TB SATA disks; one for the operating system, another for logs.

    ZooKeeper nodes

    Server configuration similar to the HBase Master; they can also share a machine with the HBase Master, but then add an extra disk dedicated to ZooKeeper.

    Installing each node

    JVM settings:
    -Xmx8g — set the maximum heap to 8 GB; do not go up to 15 GB.
    -Xms8g — set the minimum heap to 8 GB.
    -Xmn128m — set the young generation to 128 MB; the default is too small.
    -XX:+UseParNewGC — young-generation collector type; it stops the Java process while collecting, but since the young generation is small the pauses usually last only a few milliseconds, which is acceptable.
    -XX:+UseConcMarkSweepGC — old-generation collector type; using the young-generation collector there would be unsuitable because it would pause the Java process for too long, whereas CMS collects concurrently while the Java process keeps running.
    -XX:CMSInitiatingOccupancyFraction — set how full the old generation gets before the CMS collector runs.






    paulwong 2013-02-04 12:10 发表评论
    ]]>
    HBASE学习笔记 (HBase study notes)
    http://www.aygfsteel.com/paulwong/archive/2013/02/01/395020.html
    paulwong Fri, 01 Feb 2013 05:55:00 GMT

    GET and PUT are online operations; MapReduce is an offline operation.


    HDFS write flow
    When the client is asked to store a file, it splits the file into blocks of 64 MB each, forming a list of blocks. It tells the NameNode "I want to store this", and the NameNode answers with a list saying which block should be written to which DataNode. The client sends the first block to the first node, DataNode A, asks it to store it and also to tell DataNode D and DataNode B to keep copies. DataNode D stores it and in turn tells DataNode B to store a copy; once DataNode B finishes, the client is notified that the block is stored. The client then reads from the NameNode where the next block should go, and the same steps repeat until every block has been stored.

    HDFS read flow
    The client asks the NameNode to read a file; the NameNode returns the DataNode IPs and block IDs of all the blocks that make up the file. The client then requests the blocks from the DataNodes in parallel, each DataNode sends back the requested block, and once the client has collected all the blocks it assembles them into the complete file, which ends the flow.
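    The same write and read paths can be driven from the HDFS Java API; a minimal sketch (the NameNode address and file path are illustrative):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadWrite {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.default.name", "hdfs://ubuntu:9000"); // NameNode address, illustrative
            FileSystem fs = FileSystem.get(conf);

            // Write: the client asks the NameNode for block placements, then streams to the DataNode pipeline
            Path file = new Path("/user/paul/demo.txt");
            FSDataOutputStream out = fs.create(file, true);
            out.writeBytes("hello hdfs\n");
            out.close();

            // Read: the client gets the block list from the NameNode and pulls the blocks from the DataNodes
            BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file)));
            System.out.println(in.readLine());
            in.close();
            fs.close();
        }
    }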

    MapReduce flow
    Input data -- not multi-threaded but multi-process selection of the data, i.e. the input is split into chunks and each process handles one chunk -- grouping -- multi-process collection of the data -- output.
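    The classic WordCount job is the usual concrete illustration of that flow (split the input, map per process, group by key, reduce, write output); a minimal sketch against the Hadoop 1.x mapreduce API, with input/output paths taken from the command line:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Each map task handles one input split and emits (word, 1)
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            protected void map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
                for (String w : value.toString().split("\\s+")) {
                    if (w.length() > 0) ctx.write(new Text(w), ONE);
                }
            }
        }

        // The framework groups by key; each reduce call sees one word with all of its counts
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "wordcount");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }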

    HBase table structure
    HBase splits a large table into smaller tables; each small table is called a region, and the server that holds regions is called a RegionServer; one RegionServer can hold many regions. A RegionServer and a DataNode are usually on the same server, to reduce network IO.
    The -ROOT- table lives on the master server and records how many RegionServers there are; each RegionServer has a .META. table that records which regions of which tables that RegionServer holds. To know how many regions a table has in total, you would have to check the .META. table on every RegionServer and add them up.
    If a client wants to look up ROW009, it first asks ZooKeeper where the -ROOT- table is, then asks -ROOT- which .META. knows about this row, then asks that .META. which region holds the row, and finally asks that region for ROW009; the region returns the data.
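    From the client's point of view that whole lookup chain is hidden behind the HTable API. A minimal sketch with the 0.94-era client; the wiki table and text family reuse the names from the MapReduce example elsewhere in this feed, the rest is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseGetPut {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum", "ubuntu"); // the client only needs ZooKeeper to find -ROOT-/.META.

            HTable table = new HTable(conf, "wiki");
            try {
                Put put = new Put(Bytes.toBytes("ROW009"));
                put.add(Bytes.toBytes("text"), Bytes.toBytes("qual1"), Bytes.toBytes("hello"));
                table.put(put);                       // online write

                Get get = new Get(Bytes.toBytes("ROW009"));
                Result r = table.get(get);            // online read
                System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("text"), Bytes.toBytes("qual1"))));
            } finally {
                table.close();
            }
        }
    }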


    HBASE MAPREDUCE
    There is one map task per region, and how many times the task's map method runs is determined by how many records the query returns: one call per record.
    Reduce tasks write data into regions; which region a write goes to depends on which region the key belongs to, so a reduce task may end up talking to every RegionServer.
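    A sketch of that write side: a reducer that emits Puts into an HBase table, wired up through TableMapReduceUtil.initTableReducerJob. The summary table and stats family are made-up names:

    import java.io.IOException;

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;

    // Each reduce call turns one key's values into a Put; the row key decides which region (and thus
    // which RegionServer) receives the write, so a reducer may talk to any of them.
    public class CountToHBaseReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.add(Bytes.toBytes("stats"), Bytes.toBytes("count"), Bytes.toBytes(String.valueOf(sum)));
            ctx.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }
    // In the driver: TableMapReduceUtil.initTableReducerJob("summary", CountToHBaseReducer.class, job);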


    Using joins in an HBase MapReduce job
    REDUCE-SIDE JOIN
    Use the existing shuffle grouping mechanism and do the join in the reduce stage; but because the map stage moves a lot of data, there can be performance problems.
    MAP-SIDE JOIN
    Read the smaller table into a shared file, then in the map method loop over the other table's records and read the needed data from that shared file. This cuts the shuffle and sort time, and no reduce task is needed.
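    A sketch of that map-side join, using the DistributedCache to ship the small table to every mapper; the cache path and tab-separated record format are assumptions:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Map-side join: the small table is loaded into memory once per task in setup()
    // and probed for each record of the big table. No shuffle/sort and no reducer.
    public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Map<String, String> smallTable = new HashMap<String, String>();

        @Override
        protected void setup(Context ctx) throws IOException {
            // Assumes the driver registered the small file with DistributedCache.addCacheFile(...)
            Path[] cached = DistributedCache.getLocalCacheFiles(ctx.getConfiguration());
            BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
            String line;
            while ((line = in.readLine()) != null) {
                String[] kv = line.split("\t", 2);   // key \t value, illustrative format
                smallTable.put(kv[0], kv[1]);
            }
            in.close();
        }

        @Override
        protected void map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            String joined = smallTable.get(fields[0]); // join on the first column, illustrative
            if (joined != null) {
                ctx.write(new Text(fields[0]), new Text(value + "\t" + joined));
            }
        }
    }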


    paulwong 2013-02-01 13:55 发表评论
    ]]>
    Hadoop的几种Join方法 (Several ways to do joins in Hadoop)
    http://www.aygfsteel.com/paulwong/archive/2013/01/31/395000.html
    paulwong Thu, 31 Jan 2013 10:24:00 GMT

    2) Compress the fields: pre-process the data and filter out the fields that are not needed.
    3) The last step is filtering in the Mapper stage; this is where the Bloom Filter comes into its own, and it is the part that needs a detailed explanation.


    Let's use a scenario everyone knows to illustrate the problem: find last month's tariff usage for M-Zone (动感地带) customers, both incoming and outgoing calls.

    (This is just an example I made up; given a real DB storage layout there are surely better solutions for this scenario, so don't take it too literally.)

    Both data sets involved are fairly large: last month's call records, and the list of M-Zone phone numbers.


    There are two fairly direct ways to handle this:

    1) Filter by the M-Zone numbers in the Reduce stage.

    Pros: the data that has to be dealt with there is relatively small, and this is also the more commonly used approach.

    Cons: a lot of data is laboriously gathered in the Mapper stage and shuffled over the network to the Reduce nodes, only to be filtered out at that late stage.



    2) Filter the data by the M-Zone numbers in the Mapper stage.

    Pros: a lot of non-M-Zone data (for example Shenzhouxing and GoTone numbers) is filtered out early, which saves a lot of network bandwidth.

    Cons: the list of M-Zone numbers is no small thing; handling it this way means copying that big chunk to every Mapper node, perhaps via the Distributed Cache. (The Bloom Filter is what solves this problem.)


    The Bloom Filter is there to fix the drawback of approach 2 above.

    The drawback of approach 2 is that a large amount of data has to be copied to many nodes. The Bloom Filter uses several hash functions to compress the number list into a bitmap, trading a certain error rate for space, much like the time-for-space trade-offs we usually talk about. For details see:

    http://blog.csdn.net/jiaomeng/article/details/1495500

    This algorithm does have a flaw: it will mistake some Shenzhouxing and GoTone numbers for M-Zone ones. In this scenario that is not a problem at all, because the algorithm is only a pre-filter for the numbers; whatever slips through is dropped during the exact match in the Reduce stage.

    With this improvement the drawbacks of approach 2 are basically avoided entirely:

    1) The huge list of M-Zone numbers no longer has to be sent to every Mapper node.
    2) Most non-M-Zone numbers are filtered in the Mapper stage (though not 100%), avoiding the network bandwidth cost and the latency.


    What still needs study: the size of the bitmap, the number of hash functions, and the amount of data stored. How should these three variables be chosen to strike a balance between storage space and error rate?
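    Hadoop ships a BloomFilter implementation in org.apache.hadoop.util.bloom, so the Mapper-stage filtering can be sketched as below. The filter file path, the record format and the sizing are assumptions; the vectorSize/nbHash choices are exactly the trade-off just mentioned.

    import java.io.DataInputStream;
    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.util.bloom.BloomFilter;
    import org.apache.hadoop.util.bloom.Key;

    // The filter over the M-Zone numbers is built offline, e.g. new BloomFilter(vectorSize, nbHash, Hash.MURMUR_HASH),
    // populated with add(new Key(number)) and saved to HDFS with write(); every mapper loads it in setup().
    // False positives (a GoTone number slipping through) are dropped later by the exact join in the reducer.
    public class CdrFilterMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final BloomFilter filter = new BloomFilter(); // fields are filled in by readFields()

        @Override
        protected void setup(Context ctx) throws IOException {
            // Path is illustrative; the filter file could also be shipped via the DistributedCache
            FileSystem fs = FileSystem.get(ctx.getConfiguration());
            DataInputStream in = fs.open(new Path("/user/paul/mzone.bloom"));
            filter.readFields(in);
            in.close();
        }

        @Override
        protected void map(LongWritable key, Text value, Context ctx) throws IOException, InterruptedException {
            String[] fields = value.toString().split(","); // call record, first field = phone number (illustrative)
            if (filter.membershipTest(new Key(fields[0].getBytes()))) {
                ctx.write(new Text(fields[0]), value);     // probably an M-Zone number: let it through
            }
        }
    }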

    paulwong 2013-01-31 18:24 发表评论
    ]]>
    配置secondarynamenodehttp://www.aygfsteel.com/paulwong/archive/2013/01/31/394998.htmlpaulwongpaulwongThu, 31 Jan 2013 09:39:00 GMThttp://www.aygfsteel.com/paulwong/archive/2013/01/31/394998.htmlhttp://www.aygfsteel.com/paulwong/comments/394998.htmlhttp://www.aygfsteel.com/paulwong/archive/2013/01/31/394998.html#Feedback0http://www.aygfsteel.com/paulwong/comments/commentRss/394998.htmlhttp://www.aygfsteel.com/paulwong/services/trackbacks/394998.html

    The SecondaryNameNode asks the NameNode for these two files (the FsImage and the EditLog) over HTTP. When the NameNode receives the request it rolls a new EditLog to record further changes; meanwhile the SecondaryNameNode merges the two files it fetched into a new FsImage and sends it back to the NameNode. The NameNode then takes that as the current image and archives the old one.

    The SecondaryNameNode has one more use: if the NameNode goes down, the SecondaryNameNode's IP can be changed to the IP the NameNode was using, so that it serves as the NameNode.

    The secondary namenode configuration is easy to overlook: as long as jps looks normal nobody pays much attention, and people only remember the secondary namenode when the namenode runs into trouble. Its configuration takes two steps:

    1. Add the secondarynamenode machine to the cluster configuration file conf/masters.
    2. Modify/add the following property in hdfs-site.xml:

    <property>
     <name>dfs.http.address</name>
     <value>{your_namenode_ip}:50070</value>
     <description>
     The address and the base port where the dfs namenode web ui will listen on.
     If the port is 0 then the server will start on a free port.
     </description>
     </property>


    Once these two settings are in place, start the cluster, go to the secondary namenode machine, and check whether the fs.checkpoint.dir directory (set in core-site.xml, default ${hadoop.tmp.dir}/dfs/namesecondary) is being kept in sync with the namenode.

    If the second item is not configured, the secondary namenode's sync directory stays empty, and the secondary namenode log shows the following error:


    2011-06-09 11:06:41,430 INFO org.apache.hadoop.hdfs.server.common.Storage: Recovering storage directory /tmp/hadoop-hadoop/dfs/namesecondary from failed checkpoint.
    2011-06-09 11:06:41,433 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint: 
    2011-06-09 11:06:41,434 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:211)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:529)
    at java.net.Socket.connect(Socket.java:478)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.http.HttpClient.New(HttpClient.java:323)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:151)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.downloadCheckpointFiles(SecondaryNameNode.java:256)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:313)
    at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
    at java.lang.Thread.run(Thread.java:662)


    Related core-site.xml properties that may be needed:
    <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
    <description>The number of seconds between two periodic checkpoints.
    </description>
    </property>

    <property>
     <name>fs.checkpoint.dir</name>
     <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
     <description>Determines where on the local filesystem the DFS secondary
     name node should store the temporary images to merge.
     If this is a comma-delimited list of directories then the image is
     replicated in all of the directories for redundancy.
     </description>
    </property>


    paulwong 2013-01-31 17:39 发表评论
    ]]>
    大规模数据查重的多种ҎQ及Bloom Filter的应?/title><link>http://www.aygfsteel.com/paulwong/archive/2013/01/31/394980.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Thu, 31 Jan 2013 05:55:00 GMT</pubDate><guid>http://www.aygfsteel.com/paulwong/archive/2013/01/31/394980.html</guid><wfw:comment>http://www.aygfsteel.com/paulwong/comments/394980.html</wfw:comment><comments>http://www.aygfsteel.com/paulwong/archive/2013/01/31/394980.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/paulwong/comments/commentRss/394980.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/paulwong/services/trackbacks/394980.html</trackback:ping><description><![CDATA[挺有意思的题目?br /><br /><br /><strong>1. l你A,B两个文gQ各存放50亿条URLQ每条URL占用64字节Q内存限制是4GQ让你找?A,B文g共同的URL?/strong> <br />解法一QHash成内存大的块文gQ然后分块内存内查交集?br />解法二:Bloom FilterQ广泛应用于URLqo、查重。参考http://en.wikipedia.org/wiki/Bloom_filter、http://blog.csdn.net/jiaomeng/archive/2007/01/28/1496329.aspxQ?br /><br /><br /><strong>2. ?0个文Ӟ每个文g1GQ?每个文g的每一行都存放的是用户的queryQ每个文件的query都可能重复。要你按照query的频度排序?/strong><br />解法一Q根据数据稀疏程度算法会有不同,通用Ҏ是用Hash把文仉排,让相同query一定会在同一个文Ӟ同时q行计数Q然后归qӞ用最堆来统计频度最大的?br />解法二:cM1Q但是用的是与简单Bloom FilterE有不同的CBFQCounting Bloom FilterQ或者更q一步的SBFQSpectral Bloom FilterQ参考http://blog.csdn.net/jiaomeng/archive/2007/03/19/1534238.aspxQ?br />解法三:MapReduceQ几分钟可以在hadoop集群上搞定。参考http://en.wikipedia.org/wiki/MapReduce<br /><br /><br /><strong>3. 有一?G大小的一个文Ӟ里面每一行是一个词Q词的大不过16个字节,内存限制大小?M。返回频数最高的100个词?/strong><br />解法一Q跟2cMQ只是不需要排序,各个文g分别l计?00Q然后一h?00?img src ="http://www.aygfsteel.com/paulwong/aggbug/394980.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/paulwong/" target="_blank">paulwong</a> 2013-01-31 13:55 <a href="http://www.aygfsteel.com/paulwong/archive/2013/01/31/394980.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Windows环境下用ECLIPSE提交MAPREDUCE JOB臌EHBASE中运?/title><link>http://www.aygfsteel.com/paulwong/archive/2013/01/29/394851.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Mon, 28 Jan 2013 16:19:00 GMT</pubDate><guid>http://www.aygfsteel.com/paulwong/archive/2013/01/29/394851.html</guid><wfw:comment>http://www.aygfsteel.com/paulwong/comments/394851.html</wfw:comment><comments>http://www.aygfsteel.com/paulwong/archive/2013/01/29/394851.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.aygfsteel.com/paulwong/comments/commentRss/394851.html</wfw:commentRss><trackback:ping>http://www.aygfsteel.com/paulwong/services/trackbacks/394851.html</trackback:ping><description><![CDATA[<ol> <li>假设q程HADOOPL名ؓubuntuQ则应在hosts文g中加?92.168.58.130       ubuntu<br /> <br /><br /> </li> <li>新徏MAVEN目Q加上相应的配置<br /> pom.xml<br /> <div style="background-color: #eeeeee; font-size: 13px; border: 1px solid #cccccc; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;"><!--<br /> <br /> Code highlighting produced by Actipro CodeHighlighter (freeware)<br /> http://www.CodeHighlighter.com/<br /> <br /> --><span style="color: #0000FF; "><</span><span style="color: #800000; ">project </span><span style="color: #FF0000; ">xmlns</span><span style="color: #0000FF; ">="http://maven.apache.org/POM/4.0.0"</span><span style="color: #FF0000; "> xmlns:xsi</span><span style="color: #0000FF; ">="http://www.w3.org/2001/XMLSchema-instance"</span><span style="color: #FF0000; "><br />   xsi:schemaLocation</span><span style="color: 
#0000FF; ">="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"</span><span style="color: #0000FF; ">></span><br />   <span style="color: #0000FF; "><</span><span style="color: #800000; ">modelVersion</span><span style="color: #0000FF; ">></span>4.0.0<span style="color: #0000FF; "></</span><span style="color: #800000; ">modelVersion</span><span style="color: #0000FF; ">></span><br /> <br />   <span style="color: #0000FF; "><</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span>com.cloudputing<span style="color: #0000FF; "></</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span><br />   <span style="color: #0000FF; "><</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span>bigdata<span style="color: #0000FF; "></</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span><br />   <span style="color: #0000FF; "><</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span>1.0<span style="color: #0000FF; "></</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span><br />   <span style="color: #0000FF; "><</span><span style="color: #800000; ">packaging</span><span style="color: #0000FF; ">></span>jar<span style="color: #0000FF; "></</span><span style="color: #800000; ">packaging</span><span style="color: #0000FF; ">></span><br /> <br />   <span style="color: #0000FF; "><</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span>bigdata<span style="color: #0000FF; "></</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span><br />   <span style="color: #0000FF; "><</span><span style="color: #800000; ">url</span><span style="color: #0000FF; ">></span>http://maven.apache.org<span style="color: #0000FF; "></</span><span style="color: #800000; ">url</span><span style="color: #0000FF; ">></span><br /> <br />   <span style="color: #0000FF; "><</span><span style="color: #800000; ">properties</span><span style="color: #0000FF; ">></span><br />     <span style="color: #0000FF; "><</span><span style="color: #800000; ">project</span><span style="color: #FF0000; ">.build.sourceEncoding</span><span style="color: #0000FF; ">></span>UTF-8<span style="color: #0000FF; "></</span><span style="color: #800000; ">project.build.sourceEncoding</span><span style="color: #0000FF; ">></span><br />   <span style="color: #0000FF; "></</span><span style="color: #800000; ">properties</span><span style="color: #0000FF; ">></span><br /> <br />     <span style="color: #0000FF; "><</span><span style="color: #800000; ">dependencies</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span>junit<span style="color: #0000FF; "></</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span>junit<span style="color: #0000FF; "></</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; 
">version</span><span style="color: #0000FF; ">></span>3.8.1<span style="color: #0000FF; "></</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">scope</span><span style="color: #0000FF; ">></span>test<span style="color: #0000FF; "></</span><span style="color: #800000; ">scope</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "></</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span>org.springframework.data<span style="color: #0000FF; "></</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span>spring-data-hadoop<span style="color: #0000FF; "></</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span>0.9.0.RELEASE<span style="color: #0000FF; "></</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "></</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span>org.apache.hbase<span style="color: #0000FF; "></</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span>hbase<span style="color: #0000FF; "></</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span>0.94.1<span style="color: #0000FF; "></</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "></</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />         <br />         <span style="color: #008000; "><!--</span><span style="color: #008000; "> <dependency><br />             <groupId>org.apache.hbase</groupId><br />             <artifactId>hbase</artifactId><br />             <version>0.90.2</version><br />         </dependency> </span><span style="color: #008000; ">--></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span>org.apache.hadoop<span style="color: #0000FF; "></</span><span style="color: 
#800000; ">groupId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span>hadoop-core<span style="color: #0000FF; "></</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span>1.0.3<span style="color: #0000FF; "></</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "></</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span>org.springframework<span style="color: #0000FF; "></</span><span style="color: #800000; ">groupId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span>spring-test<span style="color: #0000FF; "></</span><span style="color: #800000; ">artifactId</span><span style="color: #0000FF; ">></span><br />             <span style="color: #0000FF; "><</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span>3.0.5.RELEASE<span style="color: #0000FF; "></</span><span style="color: #800000; ">version</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "></</span><span style="color: #800000; ">dependency</span><span style="color: #0000FF; ">></span><br />     <span style="color: #0000FF; "></</span><span style="color: #800000; ">dependencies</span><span style="color: #0000FF; ">></span><br /> <span style="color: #0000FF; "></</span><span style="color: #800000; ">project</span><span style="color: #0000FF; ">></span></div> </li> <br /><br /> <li> <div>hbase-site.xml<br /> <div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br /> <br /> Code highlighting produced by Actipro CodeHighlighter (freeware)<br /> http://www.CodeHighlighter.com/<br /> <br /> --><span style="color: #0000FF; "><?</span><span style="color: #FF00FF; ">xml version="1.0"</span><span style="color: #0000FF; ">?></span><br /> <span style="color: #0000FF; "><?</span><span style="color: #FF00FF; ">xml-stylesheet type="text/xsl" href="configuration.xsl"</span><span style="color: #0000FF; ">?></span><br /> <span style="color: #008000; "><!--</span><span style="color: #008000; "><br /> /**<br />  * Copyright 2010 The Apache Software Foundation<br />  *<br />  * Licensed to the Apache Software Foundation (ASF) under one<br />  * or more contributor license agreements.  See the NOTICE file<br />  * distributed with this work for additional information<br />  * regarding copyright ownership.  The ASF licenses this file<br />  * to you under the Apache License, Version 2.0 (the<br />  * "License"); you may not use this file except in compliance<br />  * with the License.  
You may obtain a copy of the License at<br />  *<br />  *     http://www.apache.org/licenses/LICENSE-2.0<br />  *<br />  * Unless required by applicable law or agreed to in writing, software<br />  * distributed under the License is distributed on an "AS IS" BASIS,<br />  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.<br />  * See the License for the specific language governing permissions and<br />  * limitations under the License.<br />  */<br /> </span><span style="color: #008000; ">--></span><br /> <span style="color: #0000FF; "><</span><span style="color: #800000; ">configuration</span><span style="color: #0000FF; ">></span><br /> <br />     <span style="color: #0000FF; "><</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span>hbase.rootdir<span style="color: #0000FF; "></</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span>hdfs://ubuntu:9000/hbase<span style="color: #0000FF; "></</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span><br />     <span style="color: #0000FF; "></</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br /> <br />     <span style="color: #008000; "><!--</span><span style="color: #008000; "> 在构造JOBӞ会新Z文gҎ准备所需文g?br />            如果q一D|写,则默认本地环境ؓLINUXQ将用LINUX命od施,在WINDOWS环境下会出错 </span><span style="color: #008000; ">--></span><br />     <span style="color: #0000FF; "><</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span>mapred.job.tracker<span style="color: #0000FF; "></</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span>ubuntu:9001<span style="color: #0000FF; "></</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span><br />     <span style="color: #0000FF; "></</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br />     <br />     <span style="color: #0000FF; "><</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span>hbase.cluster.distributed<span style="color: #0000FF; "></</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span>true<span style="color: #0000FF; "></</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span><br />     <span style="color: #0000FF; "></</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br />     <br />     <span style="color: #008000; "><!--</span><span style="color: #008000; "> 此处会向ZOOKEEPER咨询JOB TRACKER的可用IP </span><span style="color: #008000; ">--></span><br />     <span 
style="color: #0000FF; "><</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span>hbase.zookeeper.quorum<span style="color: #0000FF; "></</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span>ubuntu<span style="color: #0000FF; "></</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span><br />     <span style="color: #0000FF; "></</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br />     <span style="color: #0000FF; "><</span><span style="color: #800000; ">property </span><span style="color: #FF0000; ">skipInDoc</span><span style="color: #0000FF; ">="true"</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span>hbase.defaults.for.version<span style="color: #0000FF; "></</span><span style="color: #800000; ">name</span><span style="color: #0000FF; ">></span><br />         <span style="color: #0000FF; "><</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span>0.94.1<span style="color: #0000FF; "></</span><span style="color: #800000; ">value</span><span style="color: #0000FF; ">></span><br />     <span style="color: #0000FF; "></</span><span style="color: #800000; ">property</span><span style="color: #0000FF; ">></span><br /> <br /> <span style="color: #0000FF; "></</span><span style="color: #800000; ">configuration</span><span style="color: #0000FF; ">></span></div> </div> </li> <br /><br /> <li>试文gQMapreduceTest.java<br /> <div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br /> <br /> Code highlighting produced by Actipro CodeHighlighter (freeware)<br /> http://www.CodeHighlighter.com/<br /> <br /> --><span style="color: #0000FF; ">package</span> com.cloudputing.mapreduce;<br /> <br /> <span style="color: #0000FF; ">import</span> java.io.IOException;<br /> <br /> <span style="color: #0000FF; ">import</span> junit.framework.TestCase;<br /> <br /> <span style="color: #0000FF; ">public</span> <span style="color: #0000FF; ">class</span> MapreduceTest <span style="color: #0000FF; ">extends</span> TestCase{<br />     <br />     <span style="color: #0000FF; ">public</span> <span style="color: #0000FF; ">void</span> testReadJob() <span style="color: #0000FF; ">throws</span> IOException, InterruptedException, ClassNotFoundException<br />     {<br />         MapreduceRead.read();<br />     }<br /> <br /> }</div> </li> <br /><br /> <li> <div>MapreduceRead.java</div> <div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br /> <br /> Code highlighting produced by Actipro CodeHighlighter (freeware)<br /> http://www.CodeHighlighter.com/<br /> <br /> --><span style="color: #0000FF; ">package</span> com.cloudputing.mapreduce;<br /> <br /> <span style="color: #0000FF; ">import</span> java.io.IOException;<br /> <br /> <span style="color: #0000FF; ">import</span> 
org.apache.hadoop.conf.Configuration;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.fs.FileSystem;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.fs.Path;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.hbase.HBaseConfiguration;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.hbase.client.Result;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.hbase.client.Scan;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.hbase.io.ImmutableBytesWritable;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.hbase.mapreduce.TableMapper;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.hbase.util.Bytes;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.io.Text;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.mapreduce.Job;<br /> <span style="color: #0000FF; ">import</span> org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;<br /> <br /> <span style="color: #0000FF; ">public</span> <span style="color: #0000FF; ">class</span> MapreduceRead {<br />     <br />     <span style="color: #0000FF; ">public</span> <span style="color: #0000FF; ">static</span> <span style="color: #0000FF; ">void</span> read() <span style="color: #0000FF; ">throws</span> IOException, InterruptedException, ClassNotFoundException<br />     {<br />         <span style="color: #008000; ">//</span><span style="color: #008000; "> Add these statements. XXX<br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">        File jarFile = EJob.createTempJar("target/classes");<br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">        EJob.addClasspath("D:/PAUL/WORK/WORK-SPACES/TEST1/cloudputing/src/main/resources");<br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">        ClassLoader classLoader = EJob.getClassLoader();<br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">        Thread.currentThread().setContextClassLoader(classLoader);</span><span style="color: #008000; "><br /> </span><br />         Configuration config = HBaseConfiguration.create();<br />         addTmpJar("file:/D:/PAUL/WORK/WORK-SPACES/TEST1/cloudputing/target/bigdata-1.0.jar",config);<br />         <br />         Job job = <span style="color: #0000FF; ">new</span> Job(config, "ExampleRead");<br />         <span style="color: #008000; ">//</span><span style="color: #008000; "> And add this statement. 
XXX<br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">        ((JobConf) job.getConfiguration()).setJar(jarFile.toString());<br /> <br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">        TableMapReduceUtil.addDependencyJars(job);<br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">        TableMapReduceUtil.addDependencyJars(job.getConfiguration(),<br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">                MapreduceRead.class,MyMapper.class);</span><span style="color: #008000; "><br /> </span>        <br />         job.setJarByClass(MapreduceRead.<span style="color: #0000FF; ">class</span>);     <span style="color: #008000; ">//</span><span style="color: #008000; "> class that contains mapper</span><span style="color: #008000; "><br /> </span>        <br />         Scan scan = <span style="color: #0000FF; ">new</span> Scan();<br />         scan.setCaching(500);        <span style="color: #008000; ">//</span><span style="color: #008000; "> 1 is the default in Scan, which will be bad for MapReduce jobs</span><span style="color: #008000; "><br /> </span>        scan.setCacheBlocks(<span style="color: #0000FF; ">false</span>);  <span style="color: #008000; ">//</span><span style="color: #008000; "> don't set to true for MR jobs<br />         </span><span style="color: #008000; ">//</span><span style="color: #008000; "> set other scan attrs</span><span style="color: #008000; "><br /> </span>        <br />         TableMapReduceUtil.initTableMapperJob(<br />                 "wiki",        <span style="color: #008000; ">//</span><span style="color: #008000; "> input HBase table name</span><span style="color: #008000; "><br /> </span>                scan,             <span style="color: #008000; ">//</span><span style="color: #008000; "> Scan instance to control CF and attribute selection</span><span style="color: #008000; "><br /> </span>                MapreduceRead.MyMapper.<span style="color: #0000FF; ">class</span>,   <span style="color: #008000; ">//</span><span style="color: #008000; "> mapper</span><span style="color: #008000; "><br /> </span>                <span style="color: #0000FF; ">null</span>,             <span style="color: #008000; ">//</span><span style="color: #008000; "> mapper output key </span><span style="color: #008000; "><br /> </span>                <span style="color: #0000FF; ">null</span>,             <span style="color: #008000; ">//</span><span style="color: #008000; "> mapper output value</span><span style="color: #008000; "><br /> </span>                job);<br />         job.setOutputFormatClass(NullOutputFormat.<span style="color: #0000FF; ">class</span>);   <span style="color: #008000; ">//</span><span style="color: #008000; "> because we aren't emitting anything from mapper<br />         <br /> </span><span style="color: #008000; ">//</span><span style="color: #008000; ">        DistributedCache.addFileToClassPath(new Path("hdfs:</span><span style="color: #008000; ">//</span><span style="color: #008000; ">node.tracker1:9000/user/root/lib/stat-analysis-mapred-1.0-SNAPSHOT.jar"),job.getConfiguration());</span><span style="color: #008000; "><br /> </span>        <br />         <span style="color: #0000FF; ">boolean</span> b = job.waitForCompletion(<span style="color: #0000FF; ">true</span>);<br />         <span style="color: #0000FF; ">if</span> (!b) {<br />             <span style="color: #0000FF; ">throw</span> <span 
style="color: #0000FF; ">new</span> IOException("error with job!");<br />         }<br />         <br />     }<br />     <br />     <span style="color: #008000; ">/**</span><span style="color: #008000; "><br />      * 为MapreducedW三方jar?br />      * <br />      * </span><span style="color: #808080; ">@param</span><span style="color: #008000; "> jarPath<br />      *            举例QD:/Java/new_java_workspace/scm/lib/guava-r08.jar<br />      * </span><span style="color: #808080; ">@param</span><span style="color: #008000; "> conf<br />      * </span><span style="color: #808080; ">@throws</span><span style="color: #008000; "> IOException<br />      </span><span style="color: #008000; ">*/</span><br />     <span style="color: #0000FF; ">public</span> <span style="color: #0000FF; ">static</span> <span style="color: #0000FF; ">void</span> addTmpJar(String jarPath, Configuration conf) <span style="color: #0000FF; ">throws</span> IOException {<br />         System.setProperty("path.separator", ":");<br />         FileSystem fs = FileSystem.getLocal(conf);<br />         String newJarPath = <span style="color: #0000FF; ">new</span> Path(jarPath).makeQualified(fs).toString();<br />         String tmpjars = conf.get("tmpjars");<br />         <span style="color: #0000FF; ">if</span> (tmpjars == <span style="color: #0000FF; ">null</span> || tmpjars.length() == 0) {<br />             conf.set("tmpjars", newJarPath);<br />         } <span style="color: #0000FF; ">else</span> {<br />             conf.set("tmpjars", tmpjars + ":" + newJarPath);<br />         }<br />     }<br />     <br />     <span style="color: #0000FF; ">public</span> <span style="color: #0000FF; ">static</span> <span style="color: #0000FF; ">class</span> MyMapper <span style="color: #0000FF; ">extends</span> TableMapper<Text, Text> {<br /> <br />         <span style="color: #0000FF; ">public</span> <span style="color: #0000FF; ">void</span> map(ImmutableBytesWritable row, Result value,<br />                 Context context) <span style="color: #0000FF; ">throws</span> InterruptedException, IOException {<br />             String val1 = getValue(value.getValue(Bytes.toBytes("text"), Bytes.toBytes("qual1")));<br />             String val2 = getValue(value.getValue(Bytes.toBytes("text"), Bytes.toBytes("qual2")));<br />             System.out.println(val1 + " -- " + val2);<br />         }<br />         <br />         <span style="color: #0000FF; ">private</span> String getValue(<span style="color: #0000FF; ">byte</span> [] value)<br />         {<br />             <span style="color: #0000FF; ">return</span> value == <span style="color: #0000FF; ">null</span>? "null" : <span style="color: #0000FF; ">new</span> String(value);<br />         }<br />     } <br /> <br /> }</div> </li> </ol><img src ="http://www.aygfsteel.com/paulwong/aggbug/394851.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.aygfsteel.com/paulwong/" target="_blank">paulwong</a> 2013-01-29 00:19 <a href="http://www.aygfsteel.com/paulwong/archive/2013/01/29/394851.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>某hadoop视频教程内容http://www.aygfsteel.com/paulwong/archive/2013/01/05/393807.htmlpaulwongpaulwongSat, 05 Jan 2013 04:59:00 GMThttp://www.aygfsteel.com/paulwong/archive/2013/01/05/393807.html > Hadoop背景
    > HDFS设计目标
    > HDFS不适合的场?
    > HDFS架构详尽分析
    > MapReduce的基本原?



    W?章节
    > Hadoop的版本介l?
    > 安装单机版Hadoop
    > 安装Hadoop集群



    W?章节
    > HDFS命o行基本操?
    > Namenode的工作机?
    > HDFS基本配置理



    W?章节
    > HDFS应用实战Q图片服务器(1) - pȝ设计
    > 应用的环境搭?php + bootstrap + java
    > 使用Hadoop Java API实现向HDFS写入文g



    W?章节
    > HDFS应用实战Q图片服务器(2)
    > 使用Hadoop Java API实现dHDFS中的文g
    > 使用Hadoop Java API实现获取HDFS目录列表
    > 使用Hadoop Java API实现删除HDFS中的文g


    W?章节
    > MapReduce的基本原?
    > MapReduce的运行过E?
    > 搭徏MapReduce的java开发环?
    > 使用MapReduce的java接口实现WordCount



    W?章节
    > WordCountq算q程分析
    > MapReduce的combiner
    > 使用MapReduce实现数据去重
    > 使用MapReduce实现数据排序
    > 使用MapReduce实现数据q_成W计算



    W?章节
    > HBase详细介绍
    > HBase的系l架?
    > HBase的表l构QRowKeyQ列族和旉?
    > HBase中的MasterQRegion以及Region Server


    W?章节
    > 使用HBase实现微博应用Q?Q?
    > 用户注册Q登陆和注销的设?
    > 搭徏环境 struts2 + jsp + bootstrap + jquery + HBase Java API
    > HBase和用L关的表结构设?
    > 用户注册的实?



    W?0章节
    > 使用HBase实现微博应用Q?Q?
    > 使用session实现用户d和注销
    > “x"功能的设?
    > “x"功能的表l构设计
    > “x"功能的实?


    W?1章节
    > 使用HBase实现微博应用Q?Q?
    > “发微?功能的设?
    > “发微?功能的表l构设计
    > “发微?功能的实?
    > 展现整个应用的运?



    W?2章节
    > HBase与MapReduce介绍
    > HBase如何使用MapReduce



    W?3章节

    > HBase应用实战Q话单查询与l计Q?Q?
    > 应用的整体设?
    > 开发环境搭?
    > 表结构设?



    W?4章节
    > HBase应用实战Q话单查询与l计Q?Q?
    > 话单入库单设计与实现
    > 话单查询的设计与实现



    W?5章节
    > HBase应用实战Q话单查询与l计Q?Q?
    > l计功能设计
    > l计功能实现



    W?6章节
    > 深入MapReduceQ?Q?
    > split的实现详?
    > 自定义输入的实现
    > 实例讲解



    W?7章节
    > 深入MapReduceQ?Q?
    > Reduce的partition
    > 实例讲解



    W?8章节
    > Hive入门
    > 安装Hive
    > 使用Hive向HDFS存入l构化数?
    > Hive的基本?


    W?9章节
    > 使用MySql作ؓHive的元数据?
    > Hivel合MapReduce



    W?0章节
    > Hive应用实战:数据l计Q?Q?
    > 应用设计Q表l构设计



    W?1章节
    > Hive应用实战Q数据统计(2Q?
    > 数据录入与统计的实现 

    paulwong 2013-01-05 12:59 发表评论
    ]]>
    HBase的一些应用设计tips (Some HBase application design tips)
    http://www.aygfsteel.com/paulwong/archive/2013/01/02/393701.html
    paulwong Wed, 02 Jan 2013 15:09:00 GMT

    2) Going from the logical storage structure to the actual physical storage structure involves a fold process: everything under a ColumnFamily is merged in order, because HBase stores one ColumnFamily as one StoreFile.

    3) Think of an HBase query as filtering layer by layer. When designing the storage you should understand that the closer the design is to a single key-value lookup, the better the performance; if complex business logic forces a query to pin down rowkey, column and timestamp, or, worse, to use HBase Filters to process values on the server side, overall performance will be very poor.

    4) For table design HBase therefore has two patterns, tall-narrow and flat-wide: the former has many rows and few columns, making the table tall and narrow; the latter has few rows and many columns, making it flat and wide. Because HBase can only split at row boundaries, if you pick flat-wide and some special rows grow too large (beyond the file or region limit), the resulting compaction has to read the whole row into memory. So tall-narrow is strongly recommended for table design: the structure is closer to pure key-value and performs better.

    5) An elegant row design is the partial row scan. The rowkey is usually designed as <key1>-<key2>-<key3>..., where each key is a query condition, separated by some delimiter. For the case of "I only care about everything under key1" (without a filter, for better performance), we can give each key a start and an end value, for example key1 as the start and key1+1 as the end; by setting the scan's start row and stop row we then get all the values under key1. The same applies recursively: every sub-key can be designed into the rowkey this way.

    6) For paginated queries the recommended design is again not to use a filter, but to emulate RDBMS-style paging with an offset and a limit on the scan: first position the start row, then skip offset rows, read limit rows, and finally close the scan. (A sketch of points 5 and 6 follows.)
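    A sketch of points 5 and 6 with the 0.94-era client API; the orders table and the <userId>-<date>-<seq> key layout are made up for illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PartialRowScan {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "orders");

            // Partial row scan: keys look like <userId>-<date>-<seq>; to get everything for user u001
            // use the prefix as the start row and "prefix + 1" as the (exclusive) stop row — no filter needed.
            byte[] startRow = Bytes.toBytes("u001-");
            byte[] stopRow  = Bytes.toBytes("u001-");
            stopRow[stopRow.length - 1]++;                   // smallest key greater than every "u001-..." key
            Scan scan = new Scan(startRow, stopRow);

            // Pagination without a filter: skip `offset` rows, return `limit` rows, then stop the scan.
            int offset = 20, limit = 10;
            ResultScanner scanner = table.getScanner(scan);
            int seen = 0, returned = 0;
            for (Result r : scanner) {
                if (seen++ < offset) continue;               // rows before the requested page
                System.out.println(Bytes.toString(r.getRow()));
                if (++returned >= limit) break;              // one page read, close the scan
            }
            scanner.close();
            table.close();
        }
    }

    Note that the skipped rows are still read by the scanner, so very deep pages get progressively more expensive; the trade-off is that no server-side filter is involved.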

    7) For queries with a time range, one design is to put the time at some key position; the drawback is that at query time you must already know which dimension's time range you want, and cannot query across all dimensions directly by time. Another design is to put the timestamp at the front and scatter it with a hashcode or MD5-style prefix; for real-time time-series data the scattering automatically spreads writes across other regions and gives better concurrent write throughput.

    8) For balancing reads and writes, the key designs can be summed up as: salting, which is equivalent to hashing a prefix; promoted, which is equivalent to adding another dimension field into the key; and random, which is an MD5-style key. (A salted-key sketch follows.)
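    A sketch of a salted key along the lines of points 7 and 8; the bucket count and the <salt>-<metricId>-<timestamp> layout are illustrative:

    import java.security.MessageDigest;

    import org.apache.hadoop.hbase.util.Bytes;

    // Prefixing the timestamp-based key with a small hash-derived salt spreads sequential writes
    // across regions instead of hammering one hot region.
    public class SaltedKeys {

        private static final int BUCKETS = 16;

        // The salt is derived from the metric id, so all rows of one metric stay in the same bucket
        // and can still be read back with a single partial row scan.
        public static byte[] saltedKey(String metricId, long timestamp) throws Exception {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            int salt = (md5.digest(metricId.getBytes())[0] & 0xFF) % BUCKETS;
            return Bytes.toBytes(String.format("%02d-%s-%013d", salt, metricId, timestamp));
        }

        public static void main(String[] args) throws Exception {
            System.out.println(Bytes.toString(saltedKey("cpu.load.node42", System.currentTimeMillis())));
        }
    }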

    9) A more advanced design uses columns as something like an RDBMS secondary index. Once the rowkey storage grows to a certain point, the ordering of columns can be used to build an index-like design. For example: one CF called data holds the data itself, with an MD5-form index as the ColumnQualifier and the actual data as the value; another CF called index stores that MD5, with the real index field (a name, or any table field — several are allowed) as the ColumnQualifier and the MD5 of the indexed record as the value. Each query first finds the index in the index CF (different query conditions pick different index fields), then uses it to fetch the data from the data CF: two lookups that together implement a genuinely complex conditional business query. (A sketch of this pattern follows.)
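    A sketch of that dual-ColumnFamily pattern; table, family and field names are made up:

    import java.security.MessageDigest;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    // One "data" family keyed by an MD5 qualifier, and one "index" family whose qualifier is the
    // searchable field and whose value is that MD5; a query does index lookup first, then data lookup.
    public class IntraRowIndex {

        static byte[] md5(String s) throws Exception {
            return MessageDigest.getInstance("MD5").digest(s.getBytes());
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "people");
            byte[] row = Bytes.toBytes("bucket-0001");

            // Write: data:<md5> = record, index:<name> = <md5>
            String record = "{\"name\":\"paul\",\"city\":\"GZ\"}";
            byte[] id = md5(record);
            Put put = new Put(row);
            put.add(Bytes.toBytes("data"), id, Bytes.toBytes(record));
            put.add(Bytes.toBytes("index"), Bytes.toBytes("paul"), id);
            table.put(put);

            // Query: hit the index family by name first, then fetch the record from the data family
            Result idx = table.get(new Get(row).addColumn(Bytes.toBytes("index"), Bytes.toBytes("paul")));
            byte[] foundId = idx.getValue(Bytes.toBytes("index"), Bytes.toBytes("paul"));
            Result data = table.get(new Get(row).addColumn(Bytes.toBytes("data"), foundId));
            System.out.println(Bytes.toString(data.getValue(Bytes.toBytes("data"), foundId)));
            table.close();
        }
    }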

    10) There are other ways to implement secondary indexes,
    for example:
    1) Client-side control: read everything back in one go and do all the filtering on the client; the upside is strong control, the downsides are performance and consistency guarantees;
    2) Indexed-Transactional HBase: an open-source project that extends HBase with client- and server-side additions implementing transactions and secondary indexes;
    3) Indexed-HBase;
    4) Coprocessors.

    11) There are several ways to integrate search with HBase: 1) client-side control, as above; 2) Lucene; 3) HBasene; 4) Coprocessors.

    12) Ways to add transactions to HBase: 1) ITHBase; 2) ZooKeeper, via distributed locks.

    13) Although it is called timestamp, the field can hold any value at all and so can carry user-defined version information.


    paulwong 2013-01-02 23:09 发表评论
    ]]>