USENIX ‘09
Best Paper:
Tolerating File-System Mistakes with EnvyFS
Lakshmi N. Bairavasundaram, NetApp, Inc.; Swaminathan Sundararaman, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison
NSDI ‘09
Best Paper:
TrInc: Small Trusted Hardware for Large Distributed Systems
Dave Levin, University of Maryland; John R. Douceur, Jacob R. Lorch, and Thomas Moscibroda, Microsoft Research
Best Paper:
Sora: High Performance Software Radio Using General Purpose Multi-core Processors
Kun Tan and Jiansong Zhang, Microsoft Research Asia; Ji Fang, Beijing Jiaotong University; He Liu, Yusheng Ye, and Shen Wang, Tsinghua University; Yongguang Zhang, Haitao Wu, and Wei Wang, Microsoft Research Asia; Geoffrey M. Voelker, University of California, San Diego
FAST ‘09
Best Paper:
CA-NFS: A Congestion-Aware Network File System
Alexandros Batsakis, NetApp and Johns Hopkins University; Randal Burns, Johns Hopkins University; Arkady Kanevsky, James Lentini, and Thomas Talpey, NetApp
OSDI ‘08
Jay Lepreau Best Paper:
Difference Engine: Harnessing Memory Redundancy in Virtual Machines
Diwaker Gupta, University of California, San Diego; Sangmin Lee, University of Texas at Austin; Michael Vrable, Stefan Savage, Alex C. Snoeren, George Varghese, Geoffrey M. Voelker, and Amin Vahdat, University of California, San Diego
Jay Lepreau Best Paper:
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
Yuan Yu, Michael Isard, Dennis Fetterly, and Mihai Budiu, Microsoft Research Silicon Valley; Úlfar Erlingsson, Reykjavík University, Iceland, and Microsoft Research Silicon Valley; Pradeep Kumar Gunda and Jon Currey, Microsoft Research Silicon Valley
Jay Lepreau Best Paper:
KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs
Cristian Cadar, Daniel Dunbar, and Dawson Engler, Stanford University
LISA ‘08
Best Paper:
ENAVis: Enterprise Network Activities Visualization
Qi Liao, Andrew Blaich, Aaron Striegel, and Douglas Thain, University of Notre Dame
Best Student Paper:
Automatic Software Fault Diagnosis by Exploiting Application Signatures
Xiaoning Ding, The Ohio State University; Hai Huang, Yaoping Ruan, and Anees Shaikh, IBM T.J. Watson Research Center; Xiaodong Zhang, The Ohio State University
USENIX Security ‘08
Best Paper:
Highly Predictive Blacklisting
Jian Zhang and Phillip Porras, SRI International; Johannes Ullrich, SANS Institute
Best Student Paper:
Lest We Remember: Cold Boot Attacks on Encryption Keys
J. Alex Halderman, Princeton University; Seth D. Schoen, Electronic Frontier Foundation; Nadia Heninger and William Clarkson, Princeton University; William Paul, Wind River Systems; Joseph A. Calandrino and Ariel J. Feldman, Princeton University; Jacob Appelbaum; Edward W. Felten, Princeton University
USENIX ‘08
Best Paper:
Decoupling Dynamic Program Analysis from Execution in Virtual Environments
Jim Chow, Tal Garfinkel, and Peter M. Chen, VMware
Best Student Paper:
Vx32: Lightweight User-level Sandboxing on the x86
Bryan Ford and Russ Cox, Massachusetts Institute of Technology
NSDI ‘08
Best Paper:
Remus: High Availability via Asynchronous Virtual Machine Replication
Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, and Norm Hutchinson, University of British Columbia; Andrew Warfield, University of British Columbia and Citrix Systems, Inc.
Best Paper:
Consensus Routing: The Internet as a Distributed System
John P. John, Ethan Katz-Bassett, Arvind Krishnamurthy, and Thomas Anderson, University of Washington; Arun Venkataramani, University of Massachusetts Amherst
LEET ‘08
Best Paper:
Designing and Implementing Malicious Hardware
Samuel T. King, Joseph Tucek, Anthony Cozzie, Chris Grier, Weihang Jiang, and Yuanyuan Zhou, University of Illinois at Urbana-Champaign
FAST ‘08
Best Paper:
Portably Solving File TOCTTOU Races with Hardness Amplification
Dan Tsafrir, IBM T.J. Watson Research Center; Tomer Hertz, Microsoft Research; David Wagner, University of California, Berkeley; Dilma Da Silva, IBM T.J. Watson Research Center
LISA ‘07
Best Paper:
Application Buffer-Cache Management for Performance: Running the World’s Largest MRTG
David Plonka, Archit Gupta, and Dale Carder, University of Wisconsin—Madison
Best Paper:
PoDIM: A Language for High-Level Configuration Management
Thomas Delaet and Wouter Joosen, Katholieke Universiteit Leuven, Belgium
16th USENIX Security Symposium
Best Paper:
Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation
David Brumley, Juan Caballero, Zhenkai Liang, James Newsome, and Dawn Song, Carnegie Mellon University
Best Student Paper:
Keep Your Enemies Close: Distance Bounding Against Smartcard Relay Attacks
Saar Drimer and Steven J. Murdoch, Computer Laboratory, University of Cambridge
USENIX ‘07
Best Paper:
Hyperion: High Volume Stream Archival for Retrospective Querying
Peter Desnoyers and Prashant Shenoy, University of Massachusetts Amherst
Best Paper:
SafeStore: A Durable and Practical Storage System
Ramakrishna Kotla, Lorenzo Alvisi, and Mike Dahlin, The University of Texas at Austin
NSDI ‘07
Best Paper:
Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code
Charles Killian, James W. Anderson, Ranjit Jhala, and Amin Vahdat, University of California, San Diego
Best Student Paper:
Do Incentives Build Robustness in BitTorrent?
Michael Piatek, Tomas Isdal, Thomas Anderson, and Arvind Krishnamurthy, University of Washington; Arun Venkataramani, University of Massachusetts Amherst
FAST ‘07
Best Paper:
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?
Bianca Schroeder and Garth A. Gibson, Carnegie Mellon University
Best Paper:
TFS: A Transparent File System for Contributory Storage
James Cipar, Mark D. Corner, and Emery D. Berger, University of Massachusetts Amherst
LISA ‘06
Best Paper:
A Platform for RFID Security and Privacy Administration
Melanie R. Rieback, Vrije Universiteit Amsterdam; Georgi N. Gaydadjiev, Delft University of Technology; Bruno Crispo, Rutger F.H. Hofman, and Andrew S. Tanenbaum, Vrije Universiteit Amsterdam
Honorable Mention:
A Forensic Analysis of a Distributed Two-Stage Web-Based Spam Attack
Daniel V. Klein, LoneWolf Systems
OSDI ‘06
Best Paper:
Rethink the Sync
Edmund B. Nightingale, Kaushik Veeraraghavan, Peter M. Chen, and Jason Flinn, University of Michigan
Best Paper:
Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, Google, Inc.
15th USENIX Security Symposium
Best Paper:
Evaluating SFI for a CISC Architecture
Stephen McCamant, Massachusetts Institute of Technology; Greg Morrisett, Harvard University
Best Student Paper:
Keyboards and Covert Channels
Gaurav Shah, Andres Molina, and Matt Blaze, University of Pennsylvania
2006 USENIX Annual Technical Conference
Best Paper:
Optimizing Network Virtualization in Xen
Aravind Menon, EPFL; Alan L. Cox, Rice University; Willy Zwaenepoel, EPFL
Best Paper:
Replay Debugging for Distributed Applications
Dennis Geels, Gautam Altekar, Scott Shenker, and Ion Stoica, University of California, Berkeley
NSDI ‘06
Best Paper:
Experience with an Object Reputation System for Peer-to-Peer Filesharing
Kevin Walsh and Emin Gün Sirer, Cornell University
FAST ‘05
Best Paper:
Ursa Minor: Versatile Cluster-based Storage
Michael Abd-El-Malek, William V. Courtright II, Chuck Cranor, Gregory R. Ganger, James Hendricks, Andrew J. Klosterman, Michael Mesnier, Manish Prasad, Brandon Salmon, Raja R. Sambasivan, Shafeeq Sinnamohideen, John D. Strunk, Eno Thereska, Matthew Wachs, and Jay J. Wylie, Carnegie Mellon University
Best Paper:
On Multidimensional Data and Modern Disks
Steven W. Schlosser, Intel Research Pittsburgh; Jiri Schindler, EMC Corporation; Stratos Papadomanolakis, Minglong Shao, Anastassia Ailamaki, Christos Faloutsos, and Gregory R. Ganger, Carnegie Mellon University
LISA ‘05
Best Paper:
Toward a Cost Model for System Administration
Alva L. Couch, Ning Wu, and Hengky Susanto, Tufts University
Best Student Paper:
Toward an Automated Vulnerability Comparison of Open Source IMAP Servers
Chaos Golubitsky, Carnegie Mellon University
Best Student Paper:
Reducing Downtime Due to System Maintenance and Upgrades
Shaya Potter and Jason Nieh, Columbia University
IMC 2005
Best Student Paper:
Measurement-based Characterization of a Collection of On-line Games
Chris Chambers and Wu-chang Feng, Portland State University; Sambit Sahu and Debanjan Saha, IBM Research
Security ‘05
Best Paper:
Mapping Internet Sensors with Probe Response Attacks
John Bethencourt, Jason Franklin, and Mary Vernon, University of Wisconsin, Madison
Best Student Paper:
Security Analysis of a Cryptographically-Enabled RFID Device
Steve Bono, Matthew Green, and Adam Stubblefield, Johns Hopkins University; Ari Juels, RSA Laboratories; Avi Rubin, Johns Hopkins University; Michael Szydlo, RSA Laboratories
MobiSys ‘05
Best Paper:
Reincarnating PCs with Portable SoulPads
Ramón Cáceres, Casey Carter, Chandra Narayanaswami, and Mandayam Raghunath, IBM T.J. Watson Research Center
NSDI ‘05
Best Paper:
Detecting BGP Configuration Faults with Static Analysis
Nick Feamster and Hari Balakrishnan, MIT Computer Science and Artificial Intelligence Laboratory
Best Student Paper:
Botz-4-Sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds
Srikanth Kandula and Dina Katabi, Massachusetts Institute of Technology; Matthias Jacob, Princeton University; Arthur Berger, Massachusetts Institute of Technology/Akamai
2005 USENIX Annual Technical Conference
General Track
Best Paper:
Debugging Operating Systems with Time-Traveling Virtual Machines
Samuel T. King, George W. Dunlap, and Peter M. Chen, University of Michigan
Best Student Paper:
Itanium—A System Implementor’s Tale
Charles Gray, University of New South Wales; Matthew Chapman and Peter Chubb, University of New South Wales and National ICT Australia; David Mosberger-Tang, Hewlett-Packard Labs; Gernot Heiser, University of New South Wales and National ICT Australia
At Google, we've learned through experience to treat everything with healthy skepticism. We expect that servers, racks, shared GFS cells, and even entire datacenters will occasionally go down, sometimes with little or no warning. This has led us to try as hard as possible to design our products to run on multiple servers, multiple cells, and even multiple datacenters simultaneously, so that they keep running even if any one (or more) redundant underlying parts go down. We call this multihoming. It's a term that usually applies narrowly, to networking alone, but we use it much more broadly in our internal language.
Multihoming is straightforward for read-only products like web search, but it's more difficult for products that allow users to read and write data in real time, like Gmail, Google Calendar, and App Engine. I've personally spent a while thinking about how multihoming applies to the App Engine datastore. I even gave a talk about it at this year's Google I/O.
While I've got you captive, I'll describe how multihoming currently works in App Engine, and how we're going to improve it with a release next week. I'll wrap things up with more detail about App Engine's maintenance schedule.
When we launched App Engine, the datastore served each application's data out of one datacenter at a time. Data was replicated to other datacenters in the background, using Bigtable replication.
For example, if the datastore was serving data for some apps from datacenter A, and we needed to switch to serving their data from datacenter B, we simply flipped the datastore to read-only mode, waited for Bigtable replication to flush any remaining writes from A to B, then flipped the switch back and started serving in read/write mode from B. This generally works well, but it depends on the Bigtable cells in both A and B being healthy. Of course, we wouldn't want to move to B if it was unhealthy, but we definitely would if B was healthy and A wasn't.

Planning for trouble

Google continuously monitors the overall health of App Engine's underlying services, like GFS and Bigtable, in all of our datacenters. However, unexpected problems can crop up from time to time. When that happens, having backup options available is crucial.

You may remember the unplanned outage we had a few months ago. We published a detailed postmortem; in a nutshell, the shared GFS cell we use went down hard, which took us down as well, and it took a while to get the GFS cell back up.

The GFS cell is just one example of the extent to which we use shared infrastructure at Google. Shared infrastructure is one of our greatest strengths, in my opinion, but it has its drawbacks. One of the most noticeable drawbacks is loss of isolation: when a piece of shared infrastructure has problems or goes down, it affects everything that uses it.

In the example above, if the Bigtable cell in A is unhealthy, we're in trouble. Bigtable replication is fast, but it runs in the background, so it's usually at least a little behind, which is why we wait for that final flush before switching to B. If A is unhealthy, some of its data may be unavailable for extended periods of time. We can't get to it, so we can't flush it, we can't switch to B, and we're stuck in A until its Bigtable cell recovers enough to let us finish the flush.

In extreme cases like this, we might not know how soon the data in A will become available. Rather than waiting indefinitely for A to recover, we'd like to have the option to cut our losses and serve out of B instead of A, even if it means a small, bounded amount of disruption to application data. Following our example, that extreme recovery scenario would go something like this:

We give up on flushing the most recent writes in A that haven't replicated to B, and switch to serving the data that is in B. Thankfully, there isn't much data in A that hasn't replicated to B, because replication is usually quite fast. It depends on the nature of the failure, but the window of unreplicated data usually only includes a small fraction of apps, and is often as small as a few thousand recent puts, deletes, and transaction commits, across all affected apps.

Naturally, when A comes back online, we can recover that unreplicated data, but if we've already started serving from B, we can't automatically copy it over from A, since there may have been conflicting writes in B to the same entities. If your app had unreplicated writes, we can at least provide you with a full dump of those writes from A, so that your data isn't lost forever. We can also provide you with tools to apply those unreplicated writes to your current datastore serving out of B relatively easily.

Unfortunately, Bigtable replication on its own isn't quite enough for us to implement the extreme recovery scenario above. We use Bigtable single-row transactions, which let us do read/modify/write operations on multiple columns in a row, to make our datastore writes transactional and consistent. Unfortunately, Bigtable replication operates at the column-value level, not the row level. This means that after a Bigtable transaction in A that updates two columns, one of the new column values could be replicated to B but not the other. If that happened, and we switched to B without flushing the other column value, the datastore would be internally inconsistent and difficult to recover to a consistent state without the data in A.
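To make the planned switch concrete, here is a minimal Python sketch of the procedure described above. The objects and method names (datastore.set_read_only, replication.pending_bytes, and so on) are hypothetical stand-ins, not App Engine or Bigtable internals.

```python
import time

def planned_datacenter_switch(datastore, replication, src="A", dst="B"):
    """Planned move of datastore serving from datacenter src to dst.

    Mirrors the procedure described above: flip to read-only, wait for the
    background Bigtable replication flush to drain, then serve from dst.
    All objects and methods here are hypothetical stand-ins.
    """
    datastore.set_read_only(True)               # stop accepting writes in src
    while replication.pending_bytes(src, dst) > 0:
        time.sleep(1)                           # wait for remaining writes to flush
    datastore.set_primary(dst)                  # start serving from dst
    datastore.set_read_only(False)              # back to read/write mode
```

The extreme recovery path is essentially this sketch with the flush loop abandoned, which is exactly where the window of unreplicated writes comes from.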
In our July 2nd outage, it was partly this expectation of internal inconsistency that prevented us from switching to datacenter B when A became unhealthy.

Megastore replication saves the day!

Thankfully, there's a solution to our consistency problem: Megastore replication. Megastore is an internal library on top of Bigtable that supports declarative schemas, multi-row transactions, secondary indices, and, recently, consistent replication across datacenters. The App Engine datastore uses Megastore liberally. We don't need all of its features - declarative schemas, for example - but we've been following the consistent replication feature closely during its development.

Megastore replication is similar to Bigtable replication in that it replicates data across multiple datacenters, but it replicates at the level of entire entity group transactions, not individual Bigtable column values. Furthermore, transactions on a given entity group are always replicated in order. This means that if Bigtable in datacenter A becomes unhealthy and we must take the extreme option to switch to B before all of the data in A has flushed, B will be consistent and usable. Some writes may be stuck in A and unavailable in B, but B will always be a consistent recent snapshot of the data in A. Some scattered entity groups may be stale, i.e., they may not reflect the most recent updates, but we'd at least be able to start serving from B immediately, as opposed to waiting for A to recover.

To Paxos or not to Paxos

Megastore replication was originally intended to replicate across multiple datacenters synchronously and atomically, using Paxos. Unfortunately, as I described in my Google I/O talk, the latency of Paxos across datacenters is simply too high for a low-level, developer-facing storage system like the App Engine datastore.

Because of that, we've been working with the Megastore team on an alternative: asynchronous, background replication similar to Bigtable's. This system maintains the write latency our developers expect, since it doesn't replicate synchronously (with Paxos or otherwise), but it's still consistent and fast enough that we can switch datacenters at a moment's notice with a minimum of unreplicated data.

Onward and upward

We've had a fully functional version of asynchronous Megastore replication for a while. We've been testing it heavily, working out the kinks, and stressing it to make sure it's as robust as possible. We've also been using it in our internal version of App Engine for a couple of months. I'm excited to announce that we'll be migrating the public App Engine datastore to use it in a couple of weeks, on September 22nd.

This migration does require some datastore downtime. First, we'll switch the datastore to read-only mode for a short period, probably around 20-30 minutes, while we do our normal data replication flush and roll forward any transactions that have been committed but not fully applied. Then, since Megastore replication uses a new transaction log format, we need to take the entire datastore down while we drop and recreate our transaction log columns in Bigtable. We expect this to take only a few minutes. After that, we'll be back up and running on Megastore replication!

As described, Megastore replication will make App Engine much more resilient to hiccoughs and outages in individual datacenters and significantly reduce the likelihood of extended outages. It also opens the door to two new options that will give developers more control over how their data is read and written.
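The difference in replication granularity is the crux of the argument, so here is a toy illustration in Python. It is not Bigtable or Megastore code; it only models "ship individual column values" versus "ship whole entity-group transactions in order," using made-up column names.

```python
def apply_txn_log(replica, txn_log):
    """Apply whole transactions, in commit order (Megastore-style model).

    The replica may lag behind the primary, but it never holds half of a
    transaction, so it is always a consistent (possibly stale) snapshot.
    """
    for txn in txn_log:            # each txn: dict of column -> new value
        replica.update(txn)        # all columns of one txn land together
    return replica

def apply_cell_updates(replica, cell_updates):
    """Apply individual column values as they arrive (Bigtable-style model).

    If replication stops mid-stream, the replica can hold one column of a
    transaction but not another: an internally inconsistent row.
    """
    for column, value in cell_updates:
        replica[column] = value
    return replica

# A transaction in A updated both columns, but only one value reached B.
primary = {"balance": 90, "last_txn": "t42"}
torn_b  = apply_cell_updates({"balance": 100, "last_txn": "t41"},
                             [("balance", 90)])   # {'balance': 90, 'last_txn': 't41'} - inconsistent
stale_b = apply_txn_log({"balance": 100, "last_txn": "t41"},
                        [])                       # lagging, but still consistent
```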
First, we're exploring allowing reads from the non-primary datastore if the primary datastore is taking too long to respond, which could decrease the likelihood of timeouts on read operations. Second, we're exploring full Paxos for write operations on an opt-in basis, guaranteeing that data is always synchronously replicated across datacenters, which would increase availability at the cost of additional write latency. Both of these features are speculative right now, but we're looking forward to allowing developers to make the decisions that fit their applications best!

Planning for scheduled maintenance

Finally, a word about our maintenance schedule. App Engine's scheduled maintenance periods usually correspond to shifts in primary application serving between datacenters. Our maintenance periods usually last for about an hour, during which application serving is continuous, but access to the Datastore and memcache may be read-only or completely unavailable.

We've recently developed better visibility into when we expect to shift datacenters. This information isn't perfect, but we've heard from many developers that they'd like more advance notice from App Engine about when these maintenance periods will occur. Therefore, we're happy to announce the preliminary maintenance schedule for the rest of 2009. We don't expect this information to change, but if it does, we'll notify you (via the App Engine Downtime Notify Google Group) as soon as possible.

The App Engine team members are personally dedicated to keeping your applications serving without interruption, and we realize that weekday maintenance periods aren't ideal for many of you. However, we've selected the day of the week and time of day for maintenance to balance disruption to App Engine developers with the availability of the full engineering teams of the services App Engine relies upon, like GFS and Bigtable. In the coming months, we expect features like Megastore replication to help reduce the length of our maintenance periods.
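As a rough idea of what the first speculative option might look like from an application's point of view, here is a hypothetical sketch. None of these names are real App Engine APIs, and the feature itself is described above as exploratory.

```python
# Hypothetical sketch of "fall back to a non-primary replica when the
# primary read is too slow." ReadTimeout, primary, and secondary are
# made-up stand-ins, not real App Engine datastore objects.

class ReadTimeout(Exception):
    pass

def read_with_fallback(key, primary, secondary, deadline_ms=200):
    """Try the primary datastore first; fall back to a possibly stale replica."""
    try:
        return primary.get(key, deadline_ms=deadline_ms)
    except ReadTimeout:
        # With asynchronous replication the replica is consistent but may lag
        # slightly, so callers must tolerate reading a slightly older value.
        return secondary.get(key, deadline_ms=deadline_ms)
```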
Source: http://hi.baidu.com/knuthocean/blog/item/12bb9f3dea0e400abba1673c.html