justinchen

隨筆 - 41 文章 - 29 trackbacks - 0

<

2009年4月

>

日

一

二

三

四

五

六

29

30

31

1

2

3

4

5

6

7

8

9

10

13

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

1

2

3

4

5

6

7

8

9

常用鏈接

留言簿(5)

隨筆分類(28)

隨筆檔案(23)

收藏夾(6)

Inside JVM

HotSpot: A new breed of virtual machine
Explain the JVM techniques in plain English

Java

Performance of SOAP/HTTP vs. SOAP/JMS

java performance

Java Performance blog - My Photo Markus Kohler - SAP Architect

Solr

搜索

閱讀排行榜

This document is a summary or outline of Sun's document: Tuning Garbage collection with the 1.4.2 Hotspot JVM located here: http://java.sun.com/docs/hotspot/gc1.4.2/

1.0 Introduction

For many applications garbage collection performance is not significant
Default collector should be first choice

2.0 Generations

Most straightforward GC will just iterate over every object in the heap and determine if any other objects reference it.

This gets really slow as the number of objects in the heap increase

GC's therefor make assumptions about how your application runs.
Most common assumption is that an object is most likely to die shortly after it was created: called infant mortality
This assumes that an object that has been around for a while, will likely stay around for a while.
GC organizes objects into generations (young, tenured, and perm) This is important!

2.1 Performance Considerations

Ways to measure GC Performance

Throughput - % of time not spent in GC over a long period of time.
Pauses - app unresponsive because of GC
Footprint - overall memory a process takes to execute
Promptness - time between object death, and time when memory becomes available

There is no one right way to size generations, make the call based on your applications usage.

2.2 Measurement

Throughput and footprint are best measured using metrics particular to the application.
Command line argument -verbose:gc output
[GC 325407K->83000K(776768K), 0.2300771 secs]

GC - Indicates that it was a minor collection (young generation). If it had said Full GC then that indicates that it was a major collection (tenured generation).
325407K - The combined size of live objects before garbage collection.
83000K - The combined size of live objects after garbage collection.
(776768K) - the total available space, not counting the space in the permanent generation, which is the total heap minus one of the survivor spaces.
0.2300771 secs - time it took for garbage collection to occur.

You can get more detailed output using -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps

3 Sizing the Generations

The -Xmx value determines the size of the heap to reserve at JVM initialization.
The -Xms value is the space in memory that is committed to the VM at init. The JVM can grow to the size of -Xmx.
The difference between -Xmx and -Xms is virtual memory (virtually committed)

3.1 Total Heap

Total available memory is the most important factor affecting GC performance
By default the JVM grows or shrinks the heap at each GC to keep the ratio of free space to live objects at each collection within a specified range.

-XX:MinHeapFreeRatio - when the percentage of free space in a generation falls below this value the generation will be expanded to meet this percentage. Default is 40
-XX:MaxHeapFreeRatio - when the percentage of free space in a generation exceeded this value the generation will shrink to meet this value. Default is 70

For server applications

Unless you have problems with pauses grant as much memory as possible to the JVM
Set -Xms and -Xmx close to each other or equal for a faster startup (removes constant resizing of JVM). But if you make a poor choice the JVM can't compensate for it.
Increase memory sa you increase # of processors because memory allocation can be parallelized.

3.2 The Young Generation

The bigger the young generation the less minor GC's, but this implies a smaller tenured generation which increases the frequency of major collections.
You need to look at your application and determine how long your objects live for to tune this.
-XX:NewRatio=3 - the young generation will occupy 1/4 the overall heap
-XX:NewSize - Size of the young generation at JVM init. Calculated automatically if you specify -XX:NewRatio
-XX:MaxNewSize - The largest size the young generation can grow to (unlimited if this value is not specified at command line)

3.2.1 Young Generation Guarantee

The -XX:SurvivorRatio option can be used to tune the number of survivor spaces.
Not often important for performance

-XX:SurvivorRatio=6 - each survivor space will be 1/8 the young generation
If survivor spaces are too small copying collection overflows directly into the tenured generation.
Survivor spaces too large uselessly empty
-XX:+PrintTenuringDistribution - shows the threshold chosen by JVM to keep survivors half full, and the ages of objects in the new generation.

Server Applications

First decide the total amount of memory you can afford to give the virtual machine. Then graph your own performance metric against young generation sizes to find the best setting.
Unless you find problems with excessive major collection or pause times, grant plenty of memory to the young generation.
Increasing the young generation becomes counterproductive at half the total heap or less (whenever the young generation guarantee cannot be met).
Be sure to increase the young generation as you increase the number of processors, since allocation can be parallelized.

4 Types of Collectors

Everything to this point talks about the default garbage collector, there are other GC's you can use
Throughput Collector - Uses a parallel version of the young generation collector

-XX:+UseParallelGC
Tenured collector is the same as in default

Concurrent Low Pause Collector

Collects tenured collection concurrently with the execution of the app.
The app is paused for short periods during collection
-XX:+UseConcMarkSweepGC
To enable a parallel young generation GC with the concurrent GC add -XX:+UseParNewGC to the startup. Don't add -XX:+UseParallelGC with this option.

Incremental Low Pause Collector

Sometimes called Train Collector
Collects a portion of the tenured generation at each minor collection.
Tries to minimize large pause of major collections
Slower than the default collector when considering overall throughput
Good for client apps (my observation)
-Xincgc

Don't mix these options, JVM may not behave as expected.

4.1 When to use Throughput Collector

Large number of processors
Reduces serial execution time of app, by using multiple threads for GC
App with lots of threads allocating objects should use this with a large young generation
Server Applications (my observation)

4.2 The Throughput collector

By default the throughput collector uses the number of CPU's as its value for number of GC threads.
On a computer with one CPU it will not perform as well as the default collector
Overhead from parallel execution (synchronization costs)
With 2 CPU's the throughput collector performs as well as the default garbage collector.
With more then 2 CPU's you can expect to see a reduction in minor GC pause times
You can control the number of threads with -XX:ParallelGCThreads=n
Fragmentation can occur

Reduce GC threads
Increase Tenured Generation size

4.2.1 Adaptive Sizing

Keeps stats about GC times, allocation rates, and free space then sizes young and tenured generation to best fit the app.
J2SE 1.4.1 and later
-XX:+UseAdaptiveSizePolicy (on by default)

4.2.2 Aggressive Heap

Attempts to make maximum use of physical memory for the heap
Inspects computer resources (memory, num processors) and sets params optimal for long running memory allocation intensive jobs.
Must have at least 256MB of RAM
For lots of CPU's and RAM, but 1.4.1+ has shown improvements on 4-Way machines.
-XX:+AggressiveHeap

4.3 When to use the Concurrent Low Pause Collector

Apps that benefit from shorter GC pauses, and can share resources with GC during execution.
Apps with large sets of long living data (tenured generation)
Two or more processors
Interactive apps with modest tenured generation size, and one CPU

4.4 The Concurrent Low Pause Collector

Uses a separate GC thread to do parts of the major collection concurrently with the app threads.
Pauses App threads in the beginning of a collection and toward the middle (longer pause in middle)
The rest of the GC is in a single thread that runs at the same time as the app

4.4.1 Overhead of Concurrency

Doesn't provide much of an advantage on single processor machines.
Fragmentation can occur.
Two processor machine eliminates pauses due to the GC thread.
The more CPU's the advantages of concurrent collector increase.

4.4.2 Young Generation Guarantee

There has to be enough contiguous space available in the tenured generation for all objects in the eden and one survivor space.
A larger heap is needed compared to the default collector.
Add the size of the young generation to the tenured generation.

4.4.3 Full Collections

If the concurrent collector is unable to finish collecting the tenured generation before the tenured generation fills up, the application is paused and the collection is completed.
When this happens you should make some adjustments to your GC params

4.4.4 Floating Garbage

Floating Garbage - Objects that die while the GC is running (after they have been checked).
Increase the tenured generation by 20% to reduce floating garbage.

4.4.5 Pauses

First Pause - marks live objects - initial marking
Second Pause - remarking phase - checks objects that were missed during the concurrent marking phase due to the concurrent execution of the app threads.

4.4.6 Concurrent Phases

Concurrent Marking phase occurs between initial mark and remarking phase.
Concurrent sweeping phase collects dead objects after the remarking phase.

4.4.7 Measurements with the Concurrent Collector

Use -verbose:gc with -XX:+PrintGCDetails
vCMS-initial-mark shows GC stats for the initial marking phase
CMS-concurrent-mark - shows GC stats for concurrent marking phase.
CMS-concurrent-sweep - shows stats for concurrent sweeping phase
CMS-concurrent-preclean - stats for determining work that can be done concurrently
CMS-remark - stats for the remarking phase.
CMS-concurrent-reset - concurrent stuff is done, ready for next collection.

4.4.8 Parallel Minor Collection Options with Concurrent Collector

-XX:+UseParNewGC - for multiprocessor machines, enables multi threaded young generation collection.
-XX:+CMSParallelRemarkEnabled - reduce remark pauses

4.5 When to use the Incremental Low Pause Collector

Use when you can afford to tradeoff longer and more frequent young generation GC pauses for shorter tenured generation pauses
You have a large tenured generation
Single Processor

4.6 The Incremental Low Pause Collector

Minor collections same as default collector.
Don't use try to use parallel GC with this collector
Incrementally Collects parts of the tenured generation at each young collection.
Tries to avoid long major collections by doing small chunks each minor collection.
Can cause fragmentation of the heap. Sometimes need to increase tenured generation size compared to the default.
There is some overhead required to maintain the position of the incremental collector. Less overhead than is required by the default collector.
First try the default collector, and adjust heap sizing. If major pauses are too long try incremental.
If the incremental collector can't collect the tenured generation fast enough you will run out of memory, try reducing the young generation.
If young generation collections do not free any space, could be because of fragmentation. Increase tenured generation size.

4.6.1 Measurements with the Incremental Collector

-verbose:gc and -XX:+PrintGCDetails
Look for the Train: to see the stats for the incremental collection.

5 Other Considerations

The permanent generation may be a factor on apps that dynamically generate and load many classes (JSP, CFM application servers)
You may need to increase the MaxPermSize, eg: -XX:MaxPermSize=128m
Apps that rely on finalization (finalize method, or finally clauses) will cause lag in garbage collection. This is a bad idea, use only for errorious situations.
Explicit garbage collection calls (System.gc()) force a major collection. You can measure the effectiveness of these calls by disabling them with -XX:+DisableExplicitGC
RMI garbage collection intervals can be controlled with

-Dsun.rmi.dgc.client.gcInteraval=3600000
-Dsun.rmi.dgc.server.gcInterval=3600000

On Solaris 8+ you can enable libthreads, lightweight thread processes, these may increase thread performance.
To enable add /usr/lib/lwp to LD_LIBRARY_PATH
Soft References cleared less aggressively in server.
-XX:SoftRefLRUPolicyMSPerMB=10000
Default value is 1000, or one second per MB

6 Conclusion

GC can be bottleneck in your app.

More Information

FAQ: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
Concurrent collector research paper: http://research.sun.com/techrep/2000/abstract-88.html
HotSpot VM command line options: http://java.sun.com/docs/hotspot/VMOptions.html

posted on 2009-04-01 17:19 Justin Chen 閱讀(1070) 評論(0) 編輯收藏所屬分類: Inside JVM

新用戶注冊刷新評論列表


只有注冊用戶登錄后才能發表評論。




網站導航: 博客園 IT新聞 Chat2DB C++博客博問管理
相關文章: FYI - Tuning Garbage Collection Outline - From http://www.petefreitag.com/articles/gctuning/ [絕對原創] [Java內存管理精彩概述] JVM Memeory Management Summary - 0 of Series "Inside JVM" GC(Garbage collection) Log and Analysis - 2 of Series "Inside JVM" How to use Eclipse Memory Analyzer to analyze JVM Memeory Issue - 3 of Series "Inside JVM" An Overview on Common JVM Level Problems in Java SE Application - 1 of Series "Inside JVM"