Evolution of High Performance Networking in Chromium
Jim Roskind, Member of Technical Staff
Chromium has used extensive client-side instrumentation to drive design decisions that have led to significant advances in both browser-wide technology and Internet protocol technology. This talk will describe the design and evolution of statistics gathering in Chromium (since adopted by Firefox), and then give numerous examples of how the resulting data has been used to improve network stack functionality and, eventually, to support a new protocol design (QUIC). The deep dive into statistics gathering will emphasize the API's ease of use, which simplified requests for information, combined with extremely efficient, race-tolerant code that gathers data without impacting performance. The network advances this enabled include numerous speculative activities layered atop both new and old protocols.
The Keys to Actionable Perf Investigations
Vance Morrison, Performance Architect
Almost all performance investigations are conceptually simple: you care either about making things faster or about using less memory. All other metrics merely support these two goals. In this talk, Vance Morrison will distill his most important lessons from over a decade of performance investigations and improvements on the .NET Runtime, and explain why the simple goals of making things fast and efficient can be so difficult to achieve. He will discuss which events to collect and how to use them to attack performance problems effectively. He will cover the concepts of thread time and causality tracking, an approach that makes it possible to diagnose scenarios with significant blocked time, concurrency, and asynchrony. Finally, he will show how relatively novel ways of grouping data can be very helpful in finding performance issues.
Automatic Regression Triaging at Facebook
Guilin Chen, Software Engineer
AutoTriage is a tool we built at Facebook that automates root cause analysis when triaging regressions. We use AutoTriage to understand performance regressions in Facebook's web tier, which houses the business logic for privacy and permission checking and for rendering HTML or JSON to various clients. The web tier is home to over 25 products, has more than 1,000 developers contributing to it, is pushed three times a day, and receives thousands of dynamic configuration changes per day. AutoTriage allows the Site Efficiency team to scale its regression analysis to this fast-paced development culture.
Increasing Ad Revenue by Improving Performance
Daniel Greenia, Senior Analyst
The same technologies that make Google’s Ad Network scalable for new customers also create opportunities for abuse in the system. Thousands of gray-area accounts are flagged every day for human review, leading to a classic resource-constraint problem: given more account reviews than available human reviewers, how do we prioritize targets for our limited resources? In this presentation, Daniel will demonstrate a method for identifying high-value work items through a combination of value forecasting and cost estimation. Applying this method has increased the efficiency of the ad review system and led to a threefold improvement in the group’s primary metric.
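As a rough illustration of prioritizing by forecast value against estimated cost, the sketch below ranks review items by value per unit of reviewer time and greedily fills a fixed reviewer capacity. This is a generic greedy heuristic, not the method from the talk; the `ReviewItem` fields, the `prioritize` function, and the units (value in arbitrary units, cost in reviewer minutes) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewItem:
    account_id: str        # hypothetical identifier for a flagged account
    forecast_value: float  # hypothetical predicted value of reviewing this account
    estimated_cost: float  # hypothetical predicted reviewer time, in minutes


def prioritize(items: List[ReviewItem], capacity: float) -> List[ReviewItem]:
    """Greedily pick items with the highest value per unit cost until capacity runs out."""
    for it in items:
        if it.estimated_cost <= 0:
            raise ValueError(f"estimated_cost must be positive for {it.account_id}")
    # Highest forecast value per minute of review first.
    ranked = sorted(items, key=lambda it: it.forecast_value / it.estimated_cost, reverse=True)
    selected: List[ReviewItem] = []
    remaining = capacity
    for it in ranked:
        # Skip items that no longer fit; a cheaper, lower-ranked item may still fit.
        if it.estimated_cost <= remaining:
            selected.append(it)
            remaining -= it.estimated_cost
    return selected
```

With items `a` (value 100, cost 10), `b` (value 30, cost 1), and `c` (value 50, cost 20) and a capacity of 12 minutes, the ranking is `b`, `a`, `c`, and only `b` and `a` fit. A ratio-based greedy pass is simple and fast but not optimal for every capacity; an exact knapsack solver is the heavier alternative.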
Real-World Performance Data for Mobile
Michelle Filiba, Software Engineer
Facebook's mobile performance affects people who rely on the app every day to connect and share. With this in mind, the Mobile Speed team focuses on making important experiences in the app fast. We have many tools that simulate the speed of an experience and look for performance improvements when testing on local phones. However, there are only so many phones and environments we can simulate. This pushed us to find a way to collect performance data from real people's phones as they use our application. The system we built, Loom, dynamically collects trace data from those devices, allowing us to dive deep into the data and learn more about the root causes of performance issues.
Visualizing and Optimizing Real User Performance on Mobile
Anant Rao, Engineering Manager
Desktop browsers offer several standard tools for visualizing page load performance. As more and more of our member traffic shifts to our native mobile apps, the immaturity of standard tools on mobile makes detecting performance hotspots challenging. To solve this, we took inspiration from the tools browsers offer and built their equivalents for mobile. This talk will focus on how we measure and visualize performance data, and how this has helped us find the root causes of subpar performance and identify optimization opportunities that directly affect our member experience.
Linux 4.x Performance: Using BPF Superpowers
Brendan Gregg, Senior Performance Architect
Linux performance analysis has been the domain of ancient tools and metrics, but that's now changing in the Linux 4.x series. A new tracer is available in the mainline kernel, built from dynamic tracing (kprobes, uprobes) and enhanced BPF (Berkeley Packet Filter). It allows us to measure latency distributions for file system I/O and run queue latency, print details of storage device I/O and TCP retransmits, investigate blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. This talk will summarize this new technology and some long-standing issues that it can solve, and how we intend to use it at Netflix.
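BPF-based tools such as `biolatency` and `runqlat` from the bcc collection summarize latency distributions as power-of-two histograms, with the bucketing done in the kernel so only counts are copied to user space. The sketch below shows only that bucketing idea in plain user-space Python over a list of hypothetical microsecond samples; it does not use BPF, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable


def bucket_floor(value_us: int) -> int:
    """Lower bound of the power-of-two bucket containing value_us (0 stays 0)."""
    return 0 if value_us <= 0 else 1 << (value_us.bit_length() - 1)


def log2_histogram(latencies_us: Iterable[int]) -> Dict[int, int]:
    """Count samples per power-of-two bucket, keyed by bucket lower bound."""
    counts = Counter(bucket_floor(v) for v in latencies_us)
    return dict(sorted(counts.items()))


def render(hist: Dict[int, int], width: int = 40) -> str:
    """Render an ASCII histogram in the spirit of bcc tool output."""
    peak = max(hist.values(), default=0)
    lines = []
    for low, count in hist.items():
        high = 0 if low == 0 else low * 2 - 1  # inclusive upper bound of the bucket
        bar = "*" * (round(width * count / peak) if peak else 0)
        lines.append(f"{low:>8} -> {high:<8} : {count:<6} |{bar:<{width}}|")
    return "\n".join(lines)


if __name__ == "__main__":
    samples = [0, 1, 2, 3, 4, 7, 8, 100]  # hypothetical latencies in microseconds
    print(render(log2_histogram(samples)))
```

Power-of-two buckets keep the histogram to a few dozen counters regardless of sample volume, which is part of why in-kernel summarization is cheap enough for always-on latency measurement.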