Android Performance

Android Systrace Basics - SurfaceFlinger Explained

2020/02/14

This is the fifth article in the Systrace series. It gives a brief introduction to the workflow of SurfaceFlinger: its key threads, how it handles the Vsync signal, how app buffers reach the display, and how to detect jank. Since Vsync has already been covered in Systrace Basics - Vsync Explained and Detailed Explanation of Android Rendering Mechanism Based on Choreographer, it won’t be discussed in detail here.

The purpose of this series is to view the overall operation of the Android system from a different perspective using Systrace, while also providing an alternative angle for learning the Framework. Perhaps you’ve read many articles about the Framework but can never remember the code, or you’re unclear about the execution flow. Maybe from Systrace’s graphical perspective, you can gain a deeper understanding.


Series Article Index

  1. Introduction to Systrace
  2. Systrace Basics - Prerequisites for Systrace
  3. Systrace Basics - Why 60 fps?
  4. Android Systrace Basics - SystemServer Explained
  5. Systrace Basics - SurfaceFlinger Explained
  6. Systrace Basics - Input Explained
  7. Systrace Basics - Vsync Explained
  8. Systrace Basics - Vsync-App: Detailed Explanation of Choreographer-Based Rendering Mechanism
  9. Systrace Basics - MainThread and RenderThread Explained
  10. Systrace Basics - Binder and Lock Contention Explained
  11. Systrace Basics - Triple Buffer Explained
  12. Systrace Basics - CPU Info Explained
  13. Systrace Smoothness in Action 1: Understanding Jank Principles
  14. Systrace Smoothness in Action 2: Case Analysis - MIUI Launcher Scroll Jank Analysis
  15. Systrace Smoothness in Action 3: FAQs During Jank Analysis
  16. Systrace Responsiveness in Action 1: Understanding Responsiveness Principles
  17. Systrace Responsiveness in Action 2: Responsiveness Analysis - Using App Startup as an Example
  18. Systrace Responsiveness in Action 3: Extended Knowledge on Responsiveness
  19. Systrace Thread CPU State Analysis Tips - Runnable
  20. Systrace Thread CPU State Analysis Tips - Running
  21. Systrace Thread CPU State Analysis Tips - Sleep and Uninterruptible Sleep

Main Content

Here is the official definition of SurfaceFlinger:

  1. Most apps display three layers on the screen: a status bar at the top, a navigation bar at the bottom or side, and the app interface itself. Some apps may have more or fewer layers (e.g., the default home screen app has a dedicated wallpaper layer, while a fullscreen game might hide the status bar). Each layer can be updated independently. The status bar and navigation bar are rendered by system processes, while the app layer is rendered by the app, with no coordination between them.
  2. Device displays refresh at a fixed rate, typically 60 fps on phones and tablets. If content updates during a refresh, tearing occurs; therefore, it’s essential to update content only between cycles. The system receives a signal from the display device when it’s safe to update content. For historical reasons, we call this the VSYNC signal.
  3. Refresh rates may change over time; for example, some mobile devices range between 58 fps and 62 fps depending on current conditions. For HDMI-connected TVs, the rate can theoretically drop to 24 Hz or 48 Hz to match video. Since each refresh cycle can only update the screen once, submitting buffers at 200 fps is wasteful as most frames will be discarded. SurfaceFlinger doesn’t act every time an app submits a buffer; instead, it wakes up only when the display device is ready for a new buffer.
  4. When a VSYNC signal arrives, SurfaceFlinger traverses its layer list searching for new buffers. If it finds one, it acquires it; otherwise, it continues using the previously acquired buffer. SurfaceFlinger must always display content, so it holds on to one buffer. If no buffer has ever been submitted on a layer, that layer is ignored.
  5. After collecting all buffers for visible layers, SurfaceFlinger consults the Hardware Composer on how to perform composition.

— Cited from SurfaceFlinger and Hardware Composer
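
The per-VSYNC behaviour described in points 3 and 4 can be sketched as a toy model. This is a minimal illustration, not SurfaceFlinger's real code: `ToyLayer`, `onVsync`, and `framePeriodMs` are hypothetical names, buffers are plain integers, and the model latches at most one buffer per layer per VSYNC.

```cpp
#include <cassert>
#include <deque>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a layer; not a real SurfaceFlinger type.
struct ToyLayer {
    std::string name;
    std::deque<int> queued;      // buffer ids the producer has queued, oldest first
    std::optional<int> latched;  // buffer currently used for composition, if any

    // Called once per VSYNC: take the next queued buffer if there is one,
    // otherwise keep showing the previously latched buffer.
    void latchOnVsync() {
        if (!queued.empty()) {
            latched = queued.front();
            queued.pop_front();
        }
    }
};

// Returns the buffer ids that would be composited for this VSYNC.
// Layers that have never received a buffer are skipped.
std::vector<int> onVsync(std::vector<ToyLayer>& layers) {
    std::vector<int> toCompose;
    for (auto& layer : layers) {
        layer.latchOnVsync();
        if (layer.latched) {
            toCompose.push_back(*layer.latched);
        }
    }
    return toCompose;
}

// Frame period in milliseconds for a refresh rate given in Hz.
constexpr double framePeriodMs(double refreshHz) { return 1000.0 / refreshHz; }
```

With a 60 Hz display, `framePeriodMs(60.0)` is roughly 16.67 ms, which is the budget each frame has. A layer with nothing new queued simply contributes its previous buffer again, and a layer that has never queued anything contributes nothing.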

Below is the flowchart corresponding to this process. Simply put, SurfaceFlinger’s primary function is: it accepts data buffers from multiple sources, composites them, and sends them to the display device.

In Systrace, we focus on the parts corresponding to this diagram:

  1. App Section
  2. BufferQueue Section
  3. SurfaceFlinger Section
  4. HWComposer Section

These four parts appear in Systrace in chronological order: 1, 2, 3, then 4. Let’s examine the rendering flow from these four perspectives.

App Section

The App section is detailed in Systrace Basics - MainThread and RenderThread Explained. The primary flow is shown below:

From SurfaceFlinger’s perspective, the App section is responsible for producing the Surfaces needed for composition.

The interaction between an App and SurfaceFlinger centers on three points:

  1. Vsync signal reception and processing.
  2. RenderThread’s dequeueBuffer.
  3. RenderThread’s queueBuffer.

Vsync Signal Reception and Processing

This is covered in Detailed Explanation of Android Rendering Mechanism Based on Choreographer. The first item in the diagram marks the Vsync-App signal arriving from SurfaceFlinger. Upon receiving it, the app begins preparing its frame.

RenderThread’s dequeueBuffer

dequeueBuffer means requesting a buffer from the BufferQueue within SurfaceFlinger. Before rendering begins, the app makes a Binder call to obtain a buffer:

App-side Systrace:
[Screenshot: dequeueBuffer on the app's RenderThread]

SurfaceFlinger-side Systrace:
[Screenshot: dequeueBuffer as seen in the SurfaceFlinger process]

RenderThread’s queueBuffer

queueBuffer puts the buffer back into the BufferQueue after the app has finished with it (that is, after it has issued its draw calls). This follows the eglSwapBuffersWithDamageKHR -> queueBuffer flow:

App-side Systrace:
[Screenshot: queueBuffer on the app's RenderThread]

SurfaceFlinger-side Systrace:
[Screenshot: queueBuffer as seen in the SurfaceFlinger process]

Through these three parts, you should have a clear understanding of the flow depicted below:
[Figure: the App-to-BufferQueue part of the flowchart]

BufferQueue Section

BufferQueue is discussed in Systrace Basics - Triple Buffer Explained. Each process with a display interface has a corresponding BufferQueue. The consumer creates and owns the data structure, which can exist in a different process from the producer. The flow is:

In this example, the App is the producer (filling buffers), and SurfaceFlinger is the consumer (compositing buffers).

  1. dequeue (initiated by producer): When the producer needs a buffer, it calls dequeueBuffer(), specifying width, height, pixel format, and flags.
  2. queue (initiated by producer): After filling the buffer, the producer calls queueBuffer() to return it to the queue.
  3. acquire (initiated by consumer): The consumer calls acquireBuffer() to take the buffer and use its content.
  4. release (initiated by consumer): Once finished, the consumer calls releaseBuffer() to return the buffer to the queue.
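
The four operations above can be modelled as slots moving between two lists. The following is a minimal sketch under heavy assumptions: `ToyBufferQueue` is a hypothetical class that tracks slot ownership only, with no fences, no Binder calls, no graphics memory, and none of the width/height/format arguments the real `dequeueBuffer()` takes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

// Hypothetical toy model of the dequeue/queue/acquire/release cycle.
// It only tracks which side currently owns each slot index.
class ToyBufferQueue {
public:
    explicit ToyBufferQueue(int slotCount) {
        for (int i = 0; i < slotCount; ++i) free_.push_back(i);
    }

    // Producer: take a free slot to draw into. Empty result if none is free.
    std::optional<int> dequeueBuffer() {
        if (free_.empty()) return std::nullopt;
        int slot = free_.front();
        free_.pop_front();
        return slot;
    }

    // Producer: hand a filled slot over to the consumer side.
    void queueBuffer(int slot) { queued_.push_back(slot); }

    // Consumer: take the oldest queued slot for composition.
    std::optional<int> acquireBuffer() {
        if (queued_.empty()) return std::nullopt;
        int slot = queued_.front();
        queued_.pop_front();
        return slot;
    }

    // Consumer: give the slot back so the producer can reuse it.
    void releaseBuffer(int slot) { free_.push_back(slot); }

    // Number of filled slots waiting for the consumer.
    std::size_t queuedCount() const { return queued_.size(); }

private:
    std::deque<int> free_;    // slots available to the producer
    std::deque<int> queued_;  // filled slots waiting for the consumer
};
```

In this model, a producer that has dequeued every slot gets an empty result from `dequeueBuffer()` until the consumer releases one, which is the situation in which a real app would stall waiting for a buffer. `queuedCount()` plays the role of the "available buffer" count referred to later in the jank section.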

SurfaceFlinger Section

Workflow

As mentioned, SurfaceFlinger’s main job is composition:

When a VSYNC signal arrives, SurfaceFlinger traverses its layer list searching for new buffers. If found, it acquires them; otherwise, it continues using previously acquired buffers… It then asks the Hardware Composer how to perform composition.

In Systrace, the main thread starts working upon receiving the Vsync signal:
[Screenshot: SurfaceFlinger main thread starting work on Vsync]

The corresponding code handles two main messages:

  1. MessageQueue::INVALIDATE — executes handleMessageTransaction and handleMessageInvalidate.
  2. MessageQueue::REFRESH — executes handleMessageRefresh.

frameworks/native/services/surfaceflinger/SurfaceFlinger.cpp

void SurfaceFlinger::onMessageReceived(int32_t what) NO_THREAD_SAFETY_ANALYSIS {
    ATRACE_CALL();
    switch (what) {
        case MessageQueue::INVALIDATE: {
            ......
            bool refreshNeeded = handleMessageTransaction();
            refreshNeeded |= handleMessageInvalidate();
            ......
            break;
        }
        case MessageQueue::REFRESH: {
            handleMessageRefresh();
            break;
        }
    }
}

// handleMessageInvalidate implementation:
bool SurfaceFlinger::handleMessageInvalidate() {
    ATRACE_CALL();
    bool refreshNeeded = handlePageFlip();

    if (mVisibleRegionsDirty) {
        computeLayerBounds();
        if (mTracingEnabled) {
            mTracing.notify("visibleRegionsDirty");
        }
    }

    for (auto& layer : mLayersPendingRefresh) {
        Region visibleReg;
        visibleReg.set(layer->getScreenBounds());
        invalidateLayerStack(layer, visibleReg);
    }
    mLayersPendingRefresh.clear();
    return refreshNeeded;
}

// handleMessageRefresh implementation; most SF work starts here:
void SurfaceFlinger::handleMessageRefresh() {
    ATRACE_CALL();

    mRefreshPending = false;

    const bool repaintEverything = mRepaintEverything.exchange(false);
    preComposition();
    rebuildLayerStacks();
    calculateWorkingSet();
    for (const auto& [token, display] : mDisplays) {
        beginFrame(display);
        prepareFrame(display);
        doDebugFlashRegions(display, repaintEverything);
        doComposition(display, repaintEverything);
    }

    logLayerStats();

    postFrame();
    postComposition();

    mHadClientComposition = false;
    mHadDeviceComposition = false;
    for (const auto& [token, displayDevice] : mDisplays) {
        auto display = displayDevice->getCompositionDisplay();
        const auto displayId = display->getId();
        mHadClientComposition =
                mHadClientComposition || getHwComposer().hasClientComposition(displayId);
        mHadDeviceComposition =
                mHadDeviceComposition || getHwComposer().hasDeviceComposition(displayId);
    }

    mVsyncModulator.onRefreshed(mHadClientComposition);

    mLayersWithQueuedFrames.clear();
}

Major functional categories in handleMessageRefresh:

  1. Preparation
    1. preComposition()
    2. rebuildLayerStacks()
    3. calculateWorkingSet()
  2. Composition
    1. beginFrame(display)
    2. prepareFrame(display)
    3. doDebugFlashRegions(display, repaintEverything)
    4. doComposition(display, repaintEverything)
  3. Cleanup
    1. logLayerStats()
    2. postFrame()
    3. postComposition()

Given the display system’s complexity, we won’t cover every detail. If your work involves this area, familiarize yourself with all flows; otherwise, understanding the high-level logic is enough.

Dropped Frames (Jank)

To determine if an app is dropping frames in Systrace, look at SurfaceFlinger:

  1. Does the SurfaceFlinger main thread skip composition at each Vsync-SF?
  2. If it skips, find the reason:
    1. No available buffers?
    2. Occupied by other tasks (screenshot, HWC, etc.)?
    3. Waiting for presentFence?
    4. Waiting for GPU fence?
  3. If composition occurs, check if your app’s available buffer count is normal. If it’s 0, investigate why the app didn’t queueBuffer in time (likely an app-side issue), as the composition might have been triggered by other processes having available buffers.

For more details on reading this in Systrace, see Systrace Basics - Triple Buffer Explained - Jank Detection.
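
The checklist above can be written down as a small decision function. This is only a sketch of the reasoning, not a tool that parses traces: `VsyncSfObservation`, `Verdict`, and `classify` are hypothetical names, the fields are values you would read off the trace by hand, and the order in which the skip reasons are tested is an assumption made for illustration.

```cpp
#include <cassert>

// Hypothetical summary of what you observe for one Vsync-SF interval in a trace.
struct VsyncSfObservation {
    bool sfComposited;           // did the SurfaceFlinger main thread composite?
    int appAvailableBuffers;     // queued buffers for your app's layer at that Vsync
    bool sfBusyWithOtherWork;    // e.g. screenshot or HWC work occupying the main thread
    bool waitingOnPresentFence;
    bool waitingOnGpuFence;
};

enum class Verdict {
    Ok,
    AppDidNotQueueInTime,  // SF composited, but for other layers; look at the app side
    NoAvailableBuffers,
    SfBusy,
    PresentFenceWait,
    GpuFenceWait,
    Unknown,
};

// Walks the checklist top to bottom and returns the first matching explanation.
Verdict classify(const VsyncSfObservation& o) {
    if (o.sfComposited) {
        // Composition happened; it only counts for your app if the app had a buffer ready.
        return o.appAvailableBuffers > 0 ? Verdict::Ok : Verdict::AppDidNotQueueInTime;
    }
    // Composition was skipped: look for the reason.
    if (o.appAvailableBuffers == 0) return Verdict::NoAvailableBuffers;
    if (o.sfBusyWithOtherWork) return Verdict::SfBusy;
    if (o.waitingOnPresentFence) return Verdict::PresentFenceWait;
    if (o.waitingOnGpuFence) return Verdict::GpuFenceWait;
    return Verdict::Unknown;
}
```

The useful distinction is the first branch: a frame in which SurfaceFlinger did composite can still be a dropped frame for your app if the composition was triggered by another process's buffer.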

HWComposer Section

Refer to the official introduction for Hardware Composer HAL (HWC):

  1. HWC determines the most efficient way to composite buffers using available hardware. As a HAL, its implementation is device-specific and typically provided by the display hardware OEM.
  2. Overlay planes composite multiple buffers in the display hardware rather than on the GPU. For example, a portrait phone screen with a status bar, navigation bar, and app content can be composited in either of two ways:
    1. Render app content to a temporary buffer, then render the status and navigation bars on top, then send that buffer to display hardware.
    2. Send all three buffers to display hardware and instruct it to read different screen parts from different buffers. This is significantly more efficient.
  3. Display processor capabilities vary. HWC performs these calculations to achieve optimal performance:
    1. SurfaceFlinger provides a layer list and asks, “How do you want to handle these?”
    2. HWC marks each layer as either an overlay or GLES composition.
    3. SurfaceFlinger handles GLES composition, sends the output buffer to HWC, and lets HWC handle the rest.
  4. When screen content doesn’t change, overlays can be less efficient than GL composition, especially with transparent pixels. HWC may opt for GLES composition in such cases to save battery during idle.
  5. Android 4.4+ devices typically support 4 overlay planes. Exceeding this triggers GLES composition for some layers, significantly affecting energy and performance.

— Cited from SurfaceFlinger and Hardware Composer
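
The negotiation in point 3, combined with the overlay-plane limit in point 5, can be illustrated with a deliberately naive policy. This is a sketch under stated assumptions, not how any real HWC decides: `Composition`, `chooseComposition`, and `countGles` are hypothetical names, and the rule "the first layers get overlays, the rest fall back to GLES" is invented for illustration. Real implementations are device-specific and weigh many more factors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical composition types, mirroring the overlay/GLES split described above.
enum class Composition { Overlay, Gles };

// Naive illustrative policy: give overlay planes to the first maxOverlays layers
// and mark every remaining layer for GLES composition.
std::vector<Composition> chooseComposition(std::size_t layerCount, std::size_t maxOverlays) {
    std::vector<Composition> result(layerCount, Composition::Gles);
    for (std::size_t i = 0; i < layerCount && i < maxOverlays; ++i) {
        result[i] = Composition::Overlay;
    }
    return result;
}

// Counts how many layers SurfaceFlinger would have to composite with GLES.
std::size_t countGles(const std::vector<Composition>& choices) {
    std::size_t n = 0;
    for (Composition c : choices) {
        if (c == Composition::Gles) ++n;
    }
    return n;
}
```

Under this policy, three layers on a device with four overlay planes need no GLES work, while six layers leave two for GLES, which is why the number of layers an app puts on screen can affect power and performance.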

Continuing with the SurfaceFlinger main thread, here is the communication with HWC (step 3 above):
[Screenshot: SurfaceFlinger main thread communicating with HWC]

This corresponds to the latter part of the earlier diagram:
[Figure: the SurfaceFlinger-to-HWComposer part of the flowchart]

The details here are extensive. HWC’s performance is critical: issues such as insufficient performance or slow interrupts can cause jank, as noted in Overview of Jank Causes in Android - System Side.

For more HWC knowledge, see Android P Display System (1): Hardware Composition HWC2 and the related series.

References

  1. Android P Display System (1): Hardware Composition HWC2
  2. Android P Graphics Display System
  3. Definition of SurfaceFlinger
  4. SurfaceFlinger Analysis Document

About Me && Blog

Below is my personal intro and related links. I look forward to exchanging ideas with fellow professionals. “When three walk together, one can always be my teacher!”

  1. Blogger Intro: Includes personal WeChat and WeChat group links.
  2. Blog Content Navigation: A guide for my blog content.
  3. Curated Excellent Blog Articles - Android Performance Optimization Must-Knows: Welcome to recommend projects/articles.
  4. Android Performance Optimization Knowledge Planet: Welcome to join and thank you for your support~

One walks faster alone, but a group walks further together.

