Our further evaluation on 38 CVEs from 10 commonly-used programs shows that SanRazor reduced checks suffice to detect at least 33 out of the 38 CVEs. Pages should be numbered, and figures and tables should be legible in black and white, without requiring magnification. A graph embedding is a fixed length vector representation for each node (and/or edge-type) in a graph and has emerged as the de-facto approach to apply modern machine learning on graphs. USENIX Security '21 has three submission deadlines. As increasingly more sensitive data is being collected to gain valuable insights, the need to natively integrate privacy controls in data analytics frameworks is growing in importance. Machine learning (ML) models trained on personal data have been shown to leak information about users. In some cases, the quality of these artifacts is as important as that of the document itself. This motivates the need for a new approach to data privacy that can provide strong assurance and control to users. If in doubt about whether your submission to OSDI 2021 and your upcoming submission to SOSP are the same paper or not, please contact the PC chairs by email. Academic and industrial participants present research and experience papers that cover the full range of theory and practice of computer . This fast path contains programmable hardware support for low latency transport and congestion control as well as hardware support for efficient load balancing of RPCs to cores. (Jan 2019) Our REPT paper won a best paper at OSDI'18 (Oct 2018) I will serve in the SOSP'19 PC. Registering abstracts a week before paper submission is an essential part of the paper-reviewing process, as PC members use this time to identify which papers they are qualified to review. Submissions violating the detailed formatting and anonymization rules will not be considered for review. Marius is open-sourced at www.marius-project.org. These scripts often make pages slow to load, partly due to a fundamental inefficiency in how browsers process JavaScript content: browsers make it easy for web developers to reason about page state by serially executing all scripts on any frame in a page, but as a result, fail to leverage the multiple CPU cores that are readily available even on low-end phones. Academic and industrial participants present research and experience papers that cover the full range of theory and practice of computer . We describe Fluffy, a multi-transaction differential fuzzer for finding consensus bugs in Ethereum. Dorylus is up to 3.8 faster and 10.7 cheaper compared to existing sampling-based systems. However, a plethora of recent data breaches show that even widely trusted service providers can be compromised. Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline where graph and tensor parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas. Instead, we propose addressing the root cause of the heuristics problem by allowing software to explicitly specify to the device if submitted requests are latency-sensitive. Academic and industrial participants present research and experience papers that cover the full range of theory . Jiachen Wang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Ding Ding, Department of Computer Science, New York University; Huan Wang, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Conrad Christensen, Department of Computer Science, New York University; Zhaoguo Wang and Haibo Chen, Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University; Shanghai AI Laboratory; Engineering Research Center for Domain-specific Operating Systems, Ministry of Education, China; Jinyang Li, Department of Computer Science, New York University. PLDI is a premier forum for programming language research, broadly construed, including design, implementation, theory, applications, and performance. There are two major GNN training obstacles: 1) it relies on high-end servers with many GPUs which are expensive to purchase and maintain, and 2) limited memory on GPUs cannot scale to today's billion-edge graphs. However, the existing one-size-fits-all GNN implementations are insufficient to catch up with the evolving GNN architectures, the ever-increasing graph size, and the diverse node embedding dimensionality. The chairs may reject abstracts or papers on the basis of egregious missing or extraneous conflicts. A PC member is a conflict if any of the following three circumstances applies: Institution: You are currently employed at the same institution, have been previously employed at the same institution within the past two years (not counting concluded internships), or are going to begin employment at the same institution during the review period. To evaluate the security guarantees of Storm, we build a formally verified reference implementation using the Labeled IO (LIO) IFC framework. Submissions may include as many additional pages as needed for references but not for appendices. A graph neural network (GNN) enables deep learning on structured graph data. All the times listed below are in Pacific Daylight Time (PDT). As a result, data characteristics and device capabilities vary widely across clients. Manuela will present examples and discuss the scope of AI in her research in the finance domain. Even the little publishable OS work that is not based on Linux still assumes the same simplistic hardware model (essentially a multiprocessor VAX) that bears little resemblance to modern reality. Notification of conditional accept/reject for revisions: 3 March 2022. Further, Vegito can recover from cascading machine failures by using the columnar backup in less than 60 ms. JEL codes: Q18, Q28, Q57 . Message from the Program Co-Chairs. We identify that current systems for learning the embeddings of large-scale graphs are bottlenecked by data movement, which results in poor resource utilization and inefficient training. The key insight guiding our design is computation separation. Consensus bugs are bugs that make Ethereum clients transition to incorrect blockchain states and fail to reach consensus with other clients. We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. Welcome to the SOSP 2021 Website. NrOS replicates kernel state on each NUMA node and uses operation logs to maintain strong consistency between replicas. USENIX discourages program co-chairs from submitting papers to the conferences they organize, although they are allowed to do so. We propose PET, the first DNN framework that optimizes tensor programs with partially equivalent transformations and automated corrections. Perennial 2.0 makes this possible by introducing several techniques to formalize GoJournals specification and to manage the complexity in the proof of GoJournals implementation. He joined Intel Research at Berkeley in April 2002 as a principal architect of PlanetLab, an open, shared platform for developing and deploying planetary-scale services. The 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) will take place as a virtual event on July 14-16, 2021. Existing decentralized systems like Steemit, OpenBazaar, and the growing number of blockchain apps provide alternatives to existing services. Proceedings Front Matter We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. We demonstrate the above using design, implementation and evaluation of blk-switch, a new Linux kernel storage stack architecture. OSDI will provide an opportunity for authors to respond to reviews prior to final consideration of the papers at the program committee meeting. The conference papers and full proceedings are available to registered attendees now and will be available to everyone beginning Wednesday, July 14, 2021. OSDI brings together professionals from academic and industrial backgrounds in a premier forum for discussing the design, implementation, and implications of systems software. However, your OSDI submission must use an anonymized name for your project or system that differs from any used in such contexts. To resolve the problem, we propose a new LFS-aware ZNS interface, called ZNS+, and its implementation, where the host can offload data copy operations to the SSD to accelerate segment compaction. Main conference program: 5-8 April 2022. For general conference information, see https://www.usenix.org/conference/osdi22. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. We present the nanoPU, a new NIC-CPU co-design to accelerate an increasingly pervasive class of datacenter applications: those that utilize many small Remote Procedure Calls (RPCs) with very short (s-scale) processing times. blk-switch uses this insight to adapt techniques from the computer networking literature (e.g., multiple egress queues, prioritized processing of individual requests, load balancing, and switch scheduling) to the Linux kernel storage stack. Paper Submission Information All submissions must be received by 11:59 PM AoE (UTC-12) on the day of the corresponding deadline. We present TEMERAIRE, a hugepage-aware enhancement of TCMALLOC to reduce CPU overheads in the applications code. As a member of ACCT, I have served two years on the bylaws and governance committee and two years on the finance and audit committee. Ethereum is the second-largest blockchain platform next to Bitcoin. We present application studies for 8 applications, improving requests-per-second (RPS) by 7.7% and reducing RAM usage 2.4%. Authors may use this for content that may be of interest to some readers but is peripheral to the main technical contributions of the paper. OSDI is "a premier forum for discussing the design, implementation, and implications of systems software." A total of six research papers from the department were accepted to the . Sep 2021 - Present 1 year 7 months. For example, talks may be shorter than in prior years, or some parts of the conference may be multi-tracked. We present selective profiling, a technique that locates data locality problems with low-enough overhead that is suitable for production use. Calibrated interrupts increase throughput by up to 35%, reduce CPU consumption by as much as 30%, and achieve up to 37% lower latency when interrupts are coalesced. Radia Perlman is a Fellow at Dell Technologies. One important reason for the high cost is, as we observe in this paper, that many sanitizer checks are redundant the same safety property is repeatedly checked leading to unnecessarily wasted computing resources. Authors should email the program co-chairs, osdi21chairs@usenix.org, a copy of the related workshop paper and a short explanation of the new material in the conference paper beyond that published in the workshop version. She is the author of the textbook Interconnections (about network layers 2 and 3) and coauthor of Network Security. Existing algorithms are designed to work well for certain workloads. Proceedings Cover | Furthermore, such performance can be achieved without any modification in applications, network hardware, kernel CPU schedulers and/or kernel network stack. We particularly encourage contributions containing highly original ideas, new approaches, and/or groundbreaking results. Second, Fluffy uses multiple existing Ethereum clients that independently implement the specification as cross-referencing oracles. These results outperform state-of-the-art HTAP systems by several orders of magnitude on transactional performance, while just incurring little performance slowdown (5% over pure OLTP workloads) and still enjoying data freshness for analytical queries (less than 20 ms of maximum delay) in the failure-free case. This is the first OSDI in an odd year as OSDI moves to a yearly cadence. Papers accompanied by nondisclosure agreement forms will not be considered. We observe that, due to their intended security guarantees, SC schemes are inherently oblivioustheir memory access patterns are independent of the input data. We also show that Marius can scale training to datasets an order of magnitude beyond a single machine's GPU and CPU memory capacity, enabling training of configurations with more than a billion edges and 550 GB of total parameters on a single machine with 16 GB of GPU memory and 64 GB of CPU memory. Abstract registrations that do not provide sufficient information to understand the topic and contribution (e.g., empty abstracts, placeholder abstracts, or trivial abstracts) will be rejected, thereby precluding paper submission. Graph Neural Networks (GNNs) have gained significant attention in the recent past, and become one of the fastest growing subareas in deep learning. The experimental results show that Penglai can support 1,000s enclave instances running concurrently and scale up to 512GB secure memory with both encryption and integrity protection. Some recent schedulers choose job resources for users, but do so without awareness of how DL training can be re-optimized to better utilize the provided resources. Lifting predicates and crash framing make the specification easy to use for developers, and logically atomic crash specifications allow for modular reasoning in GoJournal, making the proof tractable despite complex concurrency and crash interleavings. The NAL maintains 1) per-node partial views in PM for serving insert/update/delete operations with failure atomicity and 2) a global view in DRAM for serving lookup operations. Second, it innovates on the underlying cryptographic machinery and constructs a new private information retrieval scheme, FastPIR, that reduces the time to process oblivious access requests for mailboxes. It then feeds those invariants and the desired safety properties to an SMT solver to check if the conjunction of the invariants and the safety properties is inductive. The papers will be available online to everyone beginning on the first day of the conference, July 14, 2021. The key to our solution, Horcrux, is to account for the non-determinism intrinsic to web page loads and the constraints placed by the browsers API for parallelism. PC members are not required to read supplementary material when reviewing the paper, so each paper should stand alone without it. To remedy this, we introduce DeSearch, the first decentralized search engine that guarantees the integrity and privacy of search results for decentralized services and blockchain apps. At a high level, Addra follows a template in which callers and callees deposit and retrieve messages from private mailboxes hosted at an untrusted server. Kyuhwa Han, Sungkyunkwan University and Samsung Electronics; Hyunho Gwak and Dongkun Shin, Sungkyunkwan University; Jooyoung Hwang, Samsung Electronics. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning, Oort: Efficient Federated Learning via Guided Participant Selection, PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections, Modernizing File System through In-Storage Indexing, Nap: A Black-Box Approach to NUMA-Aware Persistent Memory Indexes, Rearchitecting Linux Storage Stack for s Latency and High Throughput, Optimizing Storage Performance with Calibrated Interrupts, ZNS+: Advanced Zoned Namespace Interface for Supporting In-Storage Zone Compaction, DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling, CLP: Efficient and Scalable Search on Compressed Text Logs, Polyjuice: High-Performance Transactions via Learned Concurrency Control, Retrofitting High Availability Mechanism to Tame Hybrid Transaction/Analytical Processing, The nanoPU: A Nanosecond Network Stack for Datacenters, Beyond malloc efficiency to fleet efficiency: a hugepage-aware memory allocator, Scalable Memory Protection in the PENGLAI Enclave, NrOS: Effective Replication and Sharing in an Operating System, Addra: Metadata-private voice communication over fully untrusted infrastructure, Bringing Decentralized Search to Decentralized Services, Finding Consensus Bugs in Ethereum via Multi-transaction Differential Fuzzing, MAGE: Nearly Zero-Cost Virtual Memory for Secure Computation, Zeph: Cryptographic Enforcement of End-to-End Data Privacy, It's Time for Operating Systems to Rediscover Hardware, DistAI: Data-Driven Automated Invariant Learning for Distributed Protocols, GoJournal: a verified, concurrent, crash-safe journaling system, STORM: Refinement Types for Secure Web Applications, Horcrux: Automatic JavaScript Parallelism for Resource-Efficient Web Computation, SANRAZOR: Reducing Redundant Sanitizer Checks in C/C++ Programs, Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads, GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs, Marius: Learning Massive Graph Embeddings on a Single Machine, P3: Distributed Deep Graph Learning at Scale. They collectively make the backup fresh, columnar, and fault-tolerant, even facing millions of concurrent transactions per second. DistAI: Data-Driven Automated Invariant Learning for Distributed Protocols Jianan Yao, Runzhou Tao, Ronghui Gu, Jason Nieh . In this paper, we present P3, a system that focuses on scaling GNN model training to large real-world graphs in a distributed setting. Grand Rapids, Michigan, United States . The program co-chairs will use this information at their discretion to preserve the anonymity of the review process without jeopardizing the outcome of the current OSDI submission. This distinction forces a re-design of the scheduler. Existing systems that hide voice call metadata either require trusted intermediaries in the network or scale to only tens of users. Session Chairs: Dushyanth Narayanan, Microsoft Research, and Gala Yadgar, TechnionIsrael Institute of Technology, Jinhyung Koo, Junsu Im, Jooyoung Song, and Juhyung Park, DGIST; Eunji Lee, Soongsil University; Bryan S. Kim, Syracuse University; Sungjin Lee, DGIST. Memory allocation represents significant compute cost at the warehouse scale and its optimization can yield considerable cost savings. Hence, CLP enables efficient search and analytics on archived logs, something that was impossible without it. To help more profitably utilize sanitizers, we introduce SanRazor, a practical tool aiming to effectively detect and remove redundant sanitizer checks. Authors may upload supplementary material in files separate from their submissions. Under different configurations of TPC-C and TPC-E, Polyjuice can achieve throughput numbers higher than the best of existing algorithms by 15% to 56%. Fluffy found two new consensus bugs in the most popular Geth Ethereum client which were exploitable on the live Ethereum mainnet. We present case studies and end-to-end applications that show how Storm lets developers specify diverse policies while centralizing the trusted code to under 1% of the application, and statically enforces security with modest type annotation overhead, and no run-time cost. KEVIN combines a fast, lightweight, and POSIX compliant file system with a key-value storage device that performs in-storage indexing. Session Chairs: Gennady Pekhimenko, University of Toronto / Vector Institute, and Shivaram Venkataraman, University of WisconsinMadison, Aurick Qiao, Petuum, Inc. and Carnegie Mellon University; Sang Keun Choe and Suhas Jayaram Subramanya, Carnegie Mellon University; Willie Neiswanger, Petuum, Inc. and Carnegie Mellon University; Qirong Ho, Petuum, Inc.; Hao Zhang, Petuum, Inc. and UC Berkeley; Gregory R. Ganger, Carnegie Mellon University; Eric P. Xing, MBZUAI, Petuum, Inc., and Carnegie Mellon University. Title Page, Copyright Page, and List of Organizers | Pollux simultaneously considers both aspects. Concretely, Dorylus is 1.22 faster and 4.83 cheaper than GPU servers for massive sparse graphs. Camera-ready submission (all accepted papers): 15 Mars 2022. For example, optimistic concurrency control (OCC) is better than two-phase-locking (2PL) under low contention, while the converse is true under high contention. Advisor: You have a past or present association as thesis advisor or advisee. Her robot soccer teams have been RoboCup world champions several times, and the CoBot mobile robots have autonomously navigated for more than 1,000km in university buildings. Nico Lehmann and Rose Kunkel, UC San Diego; Jordan Brown, Independent; Jean Yang, Akita Software; Niki Vazou, IMDEA Software Institute; Nadia Polikarpova, Deian Stefan, and Ranjit Jhala, UC San Diego. Please identify yourself as a presenter and include your mailing address in your email. For more details on the submission process, and for templates to use with LaTeX, Word, etc., authors should consult the detailed submission requirements. Using selective profiling, we build DMon, a system that can automatically locate data locality problems in production, identify access patterns that hurt locality, and repair such patterns using targeted optimizations. Attaching supplementary material is optional; if your paper says that you have source code or formal proofs, you need not attach them to convince the PC of their existence. We present NrOS, a new OS kernel with a safer approach to synchronization that runs many POSIX programs. Secure hardware enclaves have been widely used for protecting security-critical applications in the cloud. As a result, the design of a file system with respect to space management and crash consistency is simplified, requiring only 10.8K LOC for full functionality. Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost. The device then "calibrates" its interrupts to completions of latency-sensitive requests. Haojie Wang, Jidong Zhai, Mingyu Gao, Zixuan Ma, Shizhi Tang, and Liyan Zheng, Tsinghua University; Yuanzhi Li, Carnegie Mellon University; Kaiyuan Rong and Yuanyong Chen, Tsinghua University; Zhihao Jia, Carnegie Mellon University and Facebook.