paulwong

          Storm performance

          The configuration is used to tune various aspects of the running topology. The two configurations specified here are very common:

          1. TOPOLOGY_WORKERS (set with setNumWorkers) specifies how many processes you want allocated around the cluster to execute the topology. Each component in the topology executes as a number of threads; the number of threads allocated to a given component is configured through the setBolt and setSpout methods. Those threads exist within worker processes, and each worker process contains some number of threads for some number of components. For instance, you may have 300 threads specified across all your components and 50 worker processes specified in your config. Each worker process will then execute 6 threads (300 / 50), each of which could belong to a different component. You tune the performance of Storm topologies by tweaking the parallelism of each component and the number of worker processes those threads run within.
          2. TOPOLOGY_DEBUG (set with setDebug), when set to true, tells Storm to log every message emitted by a component. This is useful in local mode when testing topologies, but you probably want to keep it turned off when running topologies on the cluster.
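The thread-per-worker arithmetic in item 1 can be sketched in a few lines of Java. This is only a back-of-the-envelope helper, assuming threads are spread as evenly as possible; the class and method names are hypothetical, and the actual placement is decided by Storm's scheduler.

```java
// Toy arithmetic only: not part of the Storm API.
final class WorkerMath {
    /**
     * Largest number of threads any single worker would hold if
     * totalThreads were spread as evenly as possible over numWorkers
     * (integer division, rounded up).
     */
    static int threadsPerWorker(int totalThreads, int numWorkers) {
        if (numWorkers <= 0) {
            throw new IllegalArgumentException("numWorkers must be positive");
        }
        return (totalThreads + numWorkers - 1) / numWorkers;
    }

    public static void main(String[] args) {
        // The example from the text: 300 threads over 50 workers.
        System.out.println(threadsPerWorker(300, 50)); // prints 6
        // Hypothetical uneven case: 10 threads over 4 workers.
        System.out.println(threadsPerWorker(10, 4));   // prints 3
    }
}
```

When the division is not exact, some workers simply hold one thread fewer than the value returned here.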

          There are many other configurations you can set for the topology. The various configurations are detailed in the Javadoc for Config.
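To make the two settings above concrete without needing Storm on the classpath, here is a minimal sketch that mimics a topology config with a plain map. It assumes that Storm's Config behaves like a map from string keys to values and that TOPOLOGY_WORKERS and TOPOLOGY_DEBUG resolve to the key strings "topology.workers" and "topology.debug"; the class and the buildConf helper are hypothetical. In a real topology you would call setNumWorkers and setDebug on Config instead.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a topology config; not the Storm Config class itself.
final class TopologyConfSketch {
    // Assumed key strings; in real code use the constants on Storm's Config.
    static final String TOPOLOGY_WORKERS = "topology.workers";
    static final String TOPOLOGY_DEBUG = "topology.debug";

    /** Builds a config map holding the two common settings. */
    static Map<String, Object> buildConf(int numWorkers, boolean debug) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(TOPOLOGY_WORKERS, numWorkers); // what setNumWorkers would store
        conf.put(TOPOLOGY_DEBUG, debug);        // what setDebug would store
        return conf;
    }

    public static void main(String[] args) {
        // 50 workers, debug logging off, as you would want on a cluster.
        Map<String, Object> conf = buildConf(50, false);
        System.out.println(conf.get(TOPOLOGY_WORKERS));
        System.out.println(conf.get(TOPOLOGY_DEBUG));
    }
}
```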


          Common configurations


          There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found in the Javadoc for Config. The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology:

          1. Config.TOPOLOGY_WORKERS: This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If the combined parallelism across all components in the topology is 150, each worker process will have 6 tasks (150 / 25) running within it as threads.
          2. Config.TOPOLOGY_ACKERS: This sets the number of tasks that will track tuple trees and detect when a spout tuple has been fully processed. Ackers are an integral part of Storm's reliability model, and you can read more about them in Guaranteeing message processing.
          3. Config.TOPOLOGY_MAX_SPOUT_PENDING: This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.
          4. Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS: This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies. See Guaranteeing message processing for more information on how Storm's reliability model works.
          5. Config.TOPOLOGY_SERIALIZATIONS: You can register additional serializers with Storm using this config so that you can use custom types within tuples.
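The effect of TOPOLOGY_MAX_SPOUT_PENDING (item 3) can be illustrated with a toy model: a spout task stops emitting once the number of un-acked, un-failed tuples reaches the cap, and resumes when one completes. This is a simplified sketch, not Storm's actual implementation; the class and its methods are hypothetical, and tuple ids are assumed to be unique.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of a single spout task with a cap on pending tuples.
final class PendingThrottleSketch {
    private final int maxPending;
    private final Set<Long> pending = new HashSet<>();

    PendingThrottleSketch(int maxPending) {
        this.maxPending = maxPending;
    }

    /** Emits the tuple if the cap allows it; returns false when throttled. */
    boolean tryEmit(long tupleId) {
        if (pending.size() >= maxPending) {
            return false; // cap reached: wait for an ack or a fail
        }
        pending.add(tupleId);
        return true;
    }

    /** An ack or a fail both mean the tuple is no longer pending. */
    void complete(long tupleId) {
        pending.remove(tupleId);
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingThrottleSketch spout = new PendingThrottleSketch(2);
        System.out.println(spout.tryEmit(1L)); // prints true
        System.out.println(spout.tryEmit(2L)); // prints true
        System.out.println(spout.tryEmit(3L)); // prints false (2 already pending)
        spout.complete(1L);                    // tuple 1 acked or failed
        System.out.println(spout.tryEmit(3L)); // prints true
    }
}
```

Without such a cap, a spout that produces faster than the bolts can consume keeps adding tuples to downstream queues, which is the "queue explosion" the text warns about.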

          Reference:
          http://storm.incubator.apache.org/documentation/Running-topologies-on-a-production-cluster.html

          Adjusting topology parallelism with the storm rebalance command, and problem analysis
          http://blog.csdn.net/jmppok/article/details/17243857

          flume+kafka+storm+mysql data flow
          http://blog.csdn.net/jmppok/article/details/17259145



          http://storm.incubator.apache.org/documentation/Tutorial.html

          posted on 2014-05-08 09:19 paulwong  Category: STORM
