Android Performance

Android ANR Series 3 - ANR Case Studies

Word count: 6kReading time: 37 min
2025/02/08
loading

This is the third article in the Android App ANR series, focusing on real-world ANR cases. The series includes:

  1. Android App ANR Series 1: Understanding Android ANR Design Philosophy
  2. Android App ANR Series 2: ANR Analysis Workflow and Key Logs
  3. Android App ANR Series 3: ANR Case Studies

ANR (Application Not Responding) is a simple definition that covers many Android system design philosophies.

First, ANR belongs to the application scope. This differs from SNR (System Not Responding), which reflects issues in the system process (system_server). SNR is managed by the Watchdog mechanism, while ANR is managed by the message handling mechanism at the system layer through message scheduling and timeout handling.

Second, the ANR mechanism is primarily implemented in the system layer. All ANR-related messages are scheduled by system_server and dispatched to the application process. The system uses various timeout limits to track message processing. If an app fails to respond within these limits, the system collects diagnostics (CPU/IO usage, thread stacks) and reports that the process is not responding.

Third, ANR is essentially a performance issue. The mechanism limits the app’s main thread, requiring it to handle common operations (starting services, processing broadcasts, input) within fixed timeframes. Time-consuming operations in the main thread (heavy calculations, heavy I/O, complex layouts) reduce responsiveness.

Finally, some ANR issues are very difficult to analyze. System-level issues like Linux kernel bugs, memory fragmentation, or hardware failures can cause message scheduling failures that are hard to reproduce and diagnose.

– From duanqz

Common Causes of ANR

When investigating ANR reasons, usually we must follow the rule: Boldly hypothesize, carefully verify. After extracting the abnormal parts, first hypothesize that the problem is caused here, and then use this hypothesis as a starting point to check the Logs before and after to see if they can support your hypothesis. If not, then switch to another point.

Issues Within the Current Process

  1. Deadlocks.
  2. Main thread calls thread.join(), sleep(), wait(), or waits for a thread lock.
  3. Main thread is blocked in nSyncDraw (waiting for RenderThread).
  4. Time-consuming operations on the main thread, such as complex layouts, massive for loops, IO, etc.
  5. Main thread is blocked by a sub-thread’s synchronized lock.
  6. Main thread times out waiting for a sub-thread.
  7. Main thread Activity lifecycle method execution timeout.
  8. Main thread Service lifecycle method execution timeout.
  9. Main thread Broadcast.onReceive function execution timeout (even if goAsync is called).
  10. RenderThread is time-consuming.
  11. Time-consuming network access.
  12. Application reads and writes a large amount of data.
  13. Database operations.
  14. Hardware operations (such as Camera).
  15. The number of service binders reaches the upper limit.
  16. Other threads terminate or crash, causing the main thread to wait indefinitely.
  17. Dump memory operations.
  18. A large number of SharedPreference reads and writes at the same time.

Issues in Remote Processes or the System

  1. Binder communication with SystemServer, and SystemServer execution is time-consuming.
    1. The method itself takes a long time to execute causing timeout.
    2. Too much lock contention in SystemServer, resulting in lock wait timeout.
  2. Timeout waiting for other processes to return, such as getting data from other processes’ ContentProvider.
  3. Window state confusion leading to Input dispatch timeout.
  4. Frequent crashes of the process corresponding to the ContentProvider will also kill the current process.
  5. System-wide low memory.
  6. System-wide high CPU usage.
  7. System-wide high IO usage.
  8. SurfaceFlinger timeout.
  9. Bug in system freezing function.
  10. WatchDog ANR in System Server.
  11. The device triggers thermal control frequency limitation.

ANR Case Studies

ANR Case: Toutiao - Deadlock

The main operation is frequently pulling out Toutiao from the sidebar for split-screen operations. After multiple operations, the application experienced an ANR, which caused the phone to briefly black screen and freeze in the application interface in the split-screen bar. However, it returned to normal after 4-5 seconds.

The reason for the application ANR is that its own main thread is blocked.

1
2
3
4
5
6
7
8
9
10
11
12
"main"prio=5tid=1Blocked
| group="main"sCount=1dsCount=0obj=0x74f9bbe8self=0xe7084400
| sysTid=28210nice=0cgrp=defaultsched=0/0handle=0xe9dcd534
| state=S schedstat=(13454428309928953492) utm=121stm=13core=3HZ=100
| stack=0xff3b6000-0xff3b8000stackSize=8MB
| held mutexes=
at com.ss.android.common.applog.LogReaper.insertCrashLog(SourceFile:98)
- waiting to lock <0x0d3fbd00> (a com.ss.android.common.applog.LogReaper) held by thread34
at com.ss.android.common.applog.AppLog.uncaughtException(SourceFile:1408
at u.aly.n.uncaughtException(SourceFile:34)
at java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:1068)
at java.lang.ThreadGroup.uncaughtException(ThreadGroup.java:1063)

Key Information:

1
waiting to lock <0x0d3fbd00> (a com.ss.android.common.applog.LogReaper) held by thread34

Therefore, in the ANR log below, search for tid==34:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
"LogReaper"prio=5tid=34TimedWaiting
| group="main"sCount=1dsCount=0obj=0x12fcaba0self=0xcb226e00
| sysTid=28274nice=0cgrp=defaultsched=0/0handle=0xc9f9b920
| state=S schedstat=(77341565269270880) utm=7stm=0core=5HZ=100
| stack=0xc9e99000-0xc9e9b000stackSize=1038KB
| held mutexes=
at java.lang.Object.wait!(Native method)
- waiting on <0x00fc7065> (a java.util.concurrent.atomic.AtomicInteger)
at java.lang.Object.wait(Object.java:407)
at com.ss.android.action.b.d.a(SourceFile:216)
at com.ss.android.newmedia.b.onLogSessionBatchEvent(SourceFile:468)
at com.ss.android.common.applog.DBHelper.batchSession(SourceFile:616)
- locked <0x0d4ff1c4> (a com.ss.android.common.applog.DBHelper)
at com.ss.android.common.applog.LogReaper.switchSession(SourceFile:175)
at com.ss.android.common.applog.LogReaper.switchSession(SourceFile:153)
at com.ss.android.common.applog.LogReaper.processItem(SourceFile:122)
- locked <0x0d3fbd00> (a com.ss.android.common.applog.LogReaper)
at com.ss.android.common.applog.LogReaper.run(SourceFile:632)

Key Information:

1
waiting on <0x00fc7065> (a java.util.concurrent.atomic.AtomicInteger)

Caused by code blocking within the app.

ANR Case: ANR caused by Freezing

Search for am_anr to view the time when the ANR occurred.

1
05-0100:51:39.594 1449 5234I am_anr  : [0,2169,com.xxx.weather2,820526660,Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 1.)]

From the ANR description, it can be seen that the current Window timed out while processing the previous Input event, causing the new event not to be processed in time, so an ANR occurred.

Search for ANR in to view the CPU information at that time.

1
2
3
4
5
6
7
8
9
05-0100:51:53.974 1449 5234E ActivityManager: ANR in com.xxx.weather2 (com.xxx.weather2/com.xxx.weather.xxxMainActivity)
05-0100:51:53.974 1449 5234E ActivityManager: PID:2169
05-0100:51:53.974 1449 5234E ActivityManager: Reason:Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 1.)
05-01 00:51:53.974 1449 5234 E ActivityManager: Parent: com.xxx.weather2/com.xxx.weather.xxxMainActivity
05-01 00:51:53.974 1449 5234 E ActivityManager: Load: 29.89 / 31.82 / 32.27
05-01 00:51:53.974 1449 5234 E ActivityManager: CPU usage from 4583ms to 12043ms later (2020-05-01 00:51:44.177 to 2020-05-01 00:51:51.637):
05-01 00:51:53.974 1449 5234 E ActivityManager: 18% 1449/system_server: 9.1% user + 9.3% kernel / faults: 7819 minor 1 major
05-01 00:51:53.974 1449 5234 E ActivityManager: 10% 720/surfaceflinger: 6.2% user + 4.4% kernel / faults: 734 minor 26 major
05-01 00:51:53.974 1449 5234 E ActivityManager: 10% 651/android.hardware.audio@5.0-service-mediatek: 9.5% user + 0.8% kernel / faults: 1 minor 4 major

It can be seen that the CPU was not busy at that time, so it was not an ANR caused by system load. Therefore, subsequent analysis needs to extract corresponding information from the Log to analyze what the system was doing before and after the timeout input event occurred.

The analysis process is as follows:

  1. Find the specific time when the ANR occurred: 00:51:39.594.
  2. Start looking from the Log 5s ago and find useful information: 00:51:34.478.
  3. It was found that there was nothing abnormal at this time point, only that the application did not respond to key_back_down, causing an ANR reported 5s later.
  4. From the Log after ANR, it is found that xxxHansManager : unfreeze uid: 10182 package: com.xxx.weather2 reason: Signal. This Log indicates that com.xxx.weather2 was in a frozen state before and was unfrozen here. So we can verify the bold guess: was com.xxx.weather2 unable to respond to events after being frozen, causing the issue?
  5. Search for the freezing Log. It can be seen that at 00:51:04.000, the system froze com.xxx.weather2: xxxHansManager : freeze uid: 10182 package: com.xxx.weather2 reason: LcdOff.
  6. This requires the system team to check if there is a problem with the logic of freezing.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// Due to LcdOff, initiate freezing, weather2 is frozen: com.xxx.weather2 reason: LcdOff
05-01 00:51:03.994 1449 2187 I xxxHansManager : freeze uid: 10245 package: com.tencent.mm reason: LcdOff
05-01 00:51:03.994 1449 2187 D xxxListManagerImpl: com.tencent.mm in autoStart list!
05-01 00:51:03.998 1449 2187 I xxxHansManager : freeze uid: 10246 package: com.xunmeng.pinduoduo reason: LcdOff
05-01 00:51:03.998 1449 2187 D xxxListManagerImpl: com.xunmeng.pinduoduo in autoStart list!
05-01 00:51:04.000 1449 2187 I xxxHansManager : freeze uid: 10182 package: com.xxx.weather2 reason: LcdOff
05-01 00:51:04.002 645 30914 I netd : firewallSetUidRule(4, 10182, 2) <0.09ms>
05-01 00:51:04.005 1449 2187 I xxxHansManager : freeze uid: 10187 package: com.heytap.yoli reason: LcdOff
05-01 00:51:04.005 1449 2187 D xxxListManagerImpl: com.heytap.yoli in autoStart list!
05-01 00:51:04.005 1449 2187 I xxxHansManager : isHansImportantCase uid: 10196 pkg: cn.kuwo.player reason: audiofocus
05-01 00:51:04.007 1449 2187 I xxxHansManager : freeze uid: 10234 package: com.tencent.mobileqq reason: LcdOff
05-01 00:51:04.008 1449 2187 D xxxListManagerImpl: com.tencent.mobileqq in autoStart list!
05-01 00:51:04.014 1449 2187 I xxxHansManager : freeze uid: 10235 package: com.tencent.qqlive reason: LcdOff

// weather2 Received KEYCODE_BACK ACTION_DOWN
05-01 00:51:04.471 1449 9842 D xxxPhoneWindowManager: interceptKeyBeforeQueueing:KeyEvent { action=ACTION_DOWN, keyCode=KEYCODE_BACK, scanCode=0, metaState=0, flags=0x8, repeatCount=0, eventTime=38812125, downTime=38812125, deviceId=-1, source=0x101, displayId=0 }
05-01 00:51:04.471 1449 9842 I sysui_multi_action: [757,803,799,key_back_down,802,1]
05-01 00:51:04.477 1449 1620 D xxxPhoneWindowManager: interceptKeyBeforeDispatching key: win=Window{1b36ae8 u0 com.xxx.weather2/com.xxx.weather.xxxMainActivity} event = KeyEvent { action=ACTION_DOWN, keyCode=KEYCODE_BACK, scanCode=0, metaState=0, flags=0x8, repeatCount=0, eventTime=38812125, downTime=38812125, deviceId=-1, source=0x101, displayId=0 }
05-01 00:51:04.477 1449 1620 D xxxPhoneWindowManager: interceptKeyBeforeDispatching newEvent keyCode = 4

// weather2 Received KEYCODE_BACK ACTION_UP
05-01 00:51:34.478 1449 1620 D xxxPhoneWindowManager: interceptKeyBeforeDispatching key: win=Window{1b36ae8 u0 com.xxx.weather2/com.xxx.weather.xxxMainActivity} event = KeyEvent { action=ACTION_UP, keyCode=KEYCODE_BACK, scanCode=0, metaState=0, flags=0x8, repeatCount=0, eventTime=38842126, downTime=38842126, deviceId=-1, source=0x101, displayId=0 }
05-01 00:51:34.478 1449 1620 D xxxPhoneWindowManager: interceptKeyBeforeDispatching newEvent keyCode = 4

// KEYCODE_BACK ACTION_UP did not respond for 5s, triggering ANR
05-01 00:51:39.484 1449 1620 I WindowManager: Input event dispatching timed out sending to com.xxx.weather2/com.xxx.weather.xxxMainActivity. Reason: Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 1.
05-01 00:51:39.484 2256 5518 I QUALITY-TOTAL: exp: anr
05-01 00:51:39.594 1449 5234 I am_anr : [0,2169,com.xxx.weather2,820526660,Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 0. Wait queue length: 1.)]
05-01 00:51:39.593 1449 1620 D ActivityManager: ANR post Runnable for ProcessRecord{ae0c833 2169:com.xxx.weather2/u0a182} to deal with anr happend at 38847248@#@2169
05-01 00:51:39.593 1449 1620 D ActivityManager: ANR caller(2) = com.android.server.am.ActivityManagerService$LocalService.inputDispatchingTimedOut:19872 com.android.server.wm.ActivityRecord.keyDispatchingTimedOut:2641 com.android.server.wm.AppWindowToken.keyDispatchingTimedOut:2007 com.android.server.wm.InputManagerCallback.notifyANR:111 com.android.server.input.InputManagerService.notifyANR:1822 <bottom of call stack> <bottom of call stack> <bottom of call stack>

// The system sends Signal 3 to weather2 (SIGNAL_QUIT = 3, given to the Signal Catcher thread to output Trace)
05-01 00:51:40.868 2169 2202 I oloros.weather: Thread[7,tid=2202,WaitingInMainSignalCatcherLoop,Thread*=0xd9878400,peer=0x138c0250,"Signal Catcher"]: reacting to signal

// The system detected the Signal given to weather2, so unfreeze com.xxx.weather2
05-01 00:51:40.869 1449 9811 I xxxHansManager : unfreeze uid: 10182 package: com.xxx.weather2 reason: Signal
05-01 00:51:40.869 2169 2202 I oloros.weather:
05-01 00:51:40.869 2169 2169 W ViewRootImpl[xxxMainActivity]: Dropping event due to no window focus: KeyEvent { action=ACTION_DOWN, keyCode=KEYCODE_BACK, scanCode=0, metaState=0, flags=0x8, repeatCount=0, eventTime=38812125, downTime=38812125, deviceId=-1, source=0x101, displayId=0 }, hasFocus:true, mStopped:true, mPausedForTransition:false
05-01 00:51:40.869 2169 2169 W ViewRootImpl[xxxMainActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_BACK, scanCode=0, metaState=0, flags=0x28, repeatCount=0, eventTime=38847249, downTime=38812125, deviceId=-1, source=0x101, displayId=0 }, hasFocus:true, mStopped:true, mPausedForTransition:false

// You can see here that weather2 was stuck for 36391.9ms, which is the time from 00:51:04.471 (KEYCODE_BACK sent) to 00:51:40.869 (current time)
05-01 00:51:40.869 1449 1620 I InputDispatcher: Window '1b36ae8 com.xxx.weather2/com.xxx.weather.xxxMainActivity (server)' spent 36391.9ms processing the last input event: KeyEvent
05-01 00:51:40.869 2169 2169 V ViewRootImpl[xxxMainActivity]: Sending input event to IME: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_BACK, scanCode=0, metaState=0, flags=0x28, repeatCount=0, eventTime=38847249, downTime=38812125, deviceId=-1, source=0x101, displayId=0 }
05-01 00:51:40.870 2169 2169 I Choreographer: Skipped 2212 frames! The application may be doing too much work on its main thread.

ANR Case: Broadcast Timeout causing ANR

Logs for this case are missing, but specifically, the user was performing a migration. The BroadcastReceiver‘s onReceive started a thread to handle it, and eventually reported a BroadcastReceiver ANR.

1
2
04-0717:03:17.174435 1448 1476V WindowManager: Lookingforfocus: Window{efa2a4f u0 正在升级数据库,数据量较大,请耐心等待}, flags=25296898, canReceive=true1
04-0717:03:17.174459 1448 1476V WindowManager: findFocusedWindow: Foundnewfocus @ Window{efa2a4f u0 正在升级数据库,数据量较大,请耐心等待}2

Analysis is as follows:

  1. The reason for the calendar background broadcast ANR is that onReceive execution timed out. Previously, it wasn’t obvious because the time-consuming operation in onReceive used new Thread, so it was thought that doing time-consuming operations in a sub-thread would not affect the execution of subsequent broadcasts.
  2. Later it was found that PendingResult result = goAsync(); was called in onReceive. This statement allows time-consuming operations to be performed in a sub-thread during the execution of an ordered broadcast receiver, without affecting the receiver’s lifecycle. This method is very simple, returning mPendingResult and setting it to null.
  3. If we call this method in onReceive, it means the broadcast processing flow is interrupted. When onReceive finishes execution, since mPendingResult is null, AMS.finishReceiver will not be called immediately. And because goAsync returns a PendingResult, we can call PendingResult.finish at any time and on any thread to callback AMS. This is equivalent to turning a synchronous callback into an asynchronous callback. During this asynchronous callback process, we can start a new thread to perform some time-consuming IO operations, etc. Simply put, goAsync provides a mechanism that allows us to process broadcast messages in an asynchronous thread to prevent the main thread from being blocked.
  4. In this case, although a thread was used to handle the time-consuming task in onReceive, since goAsync was called, the timeout is still calculated. If it is not completed within the specified time, even if it is in a sub-thread, it will trigger an ANR for the BroadcastReceiver.

You can look at the official comments for the goAsync method:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
/**
* This can be called by an application in {@link #onReceive} to allow
* it to keep the broadcast active after returning from that function.
* This does <em>not</em> change the expectation of being relatively
* responsive to the broadcast, but does allow
* the implementation to move work related to it over to another thread
* to avoid glitching the main UI thread due to disk IO.
*
* <p>As a general rule, broadcast receivers are allowed to run for up to 10 seconds
* before they system will consider them non-responsive and ANR the app. Since these usually
* execute on the app's main thread, they are already bound by the ~5 second time limit
* of various operations that can happen there (not to mention just avoiding UI jank), so
* the receive limit is generally not of concern. However, once you use {@code goAsync}, though
* able to be off the main thread, the broadcast execution limit still applies, and that
* includes the time spent between calling this method and ultimately
* {@link PendingResult#finish() PendingResult.finish()}.</p>
*
* <p>If you are taking advantage of this method to have more time to execute, it is useful
* to know that the available time can be longer in certain situations. In particular, if
* the broadcast you are receiving is not a foreground broadcast (that is, the sender has not
* used {@link Intent#FLAG_RECEIVER_FOREGROUND}), then more time is allowed for the receivers
* to run, allowing them to execute for 30 seconds or even a bit more. This is something that
* receivers should rarely take advantage of (long work should be punted to another system
* facility such as {@link android.app.job.JobScheduler}, {@link android.app.Service}, or
* see especially {@link android.support.v4.app.JobIntentService}), but can be useful in
* certain rare cases where it is necessary to do some work as soon as the broadcast is
* delivered. Keep in mind that the work you do here will block further broadcasts until
* it completes, so taking advantage of this at all excessively can be counter-productive
* and cause later events to be received more slowly.</p>
*
* @return Returns a {@link PendingResult} representing the result of
* the active broadcast. The BroadcastRecord itself is no longer active;
* all data and other interaction must go through {@link PendingResult}
* APIs. The {@link PendingResult#finish PendingResult.finish()} method
* must be called once processing of the broadcast is done.
*/
public final PendingResult goAsync(){
PendingResult res = mPendingResult;
mPendingResult =null;
return res;
}

Its usage is as follows:

1
2
3
4
5
6
7
8
9
10
11
12
@Override
public void onReceive(final Context context, final Intent intent){
final PendingResult result = goAsync();
Runnable worker = new Runnable() {
@Override
public void run(){
onReceiveAsync(context, intent);
result.finish();
}
};
mAsyncHandler.post(worker);
}

ANR Case: Launcher - Input ANR

Search am_anr to find the time point when ANR occurred.

  1. It belongs to input dispatch anr.
  2. It belongs to application processing input event timeout (instead of no focus window).
1
19:44:56.815  2515 25056 I am_anr  : [0,3365,com.xxx.launcher,819478085,Input dispatching timed out (Waiting to send key event because the focused window has not finished processing all of the input events that were previously delivered to it.  Outbound queue length: 0.  Wait queue length: 9.)]

Search ANR in to find CPU usage.

  1. From Load : 0.02 / 0.01 / 0.0, the overall load of the device is not high, so it is likely an ANR caused by logic.
  2. The CPU usage of logd and SurfaceFlinger is slightly high, which can be regarded as suspicious objects for focused observation.
1
2
3
4
5
6
7
8
9
10
11
12
1-0119:45:07.753 251525056E ActivityManager: ANR in com.xxx.launcher (com.xxx.launcher/.Launcher)
19:45:07.753 251525056E ActivityManager: PID:3365
19:45:07.753 251525056E ActivityManager: Reason: Input dispatching timed out (Waiting to send key event because the focused window hasnotfinished processing all of the input events that were previously delivered to it. Outbound queue length:0. Wait queue length:9.)
19:45:07.753 251525056E ActivityManager: Parent: com.xxx.launcher/.Launcher
19:45:07.753 251525056E ActivityManager: Load:0.02/0.01/0.0
19:45:07.753 251525056E ActivityManager: CPU usage from0ms to10936ms later (2020- 19:44:56.751to2020- 19:45:07.688):
19:45:07.753 251525056E ActivityManager: 97%546/logd:97% user +0.2% kernel / faults:12minor
19:45:07.753 251525056E ActivityManager: 25%956/surfaceflinger:11% user +13% kernel / faults:151minor17major
19:45:07.753 251525056E ActivityManager: 1.4%1664/media.codec:0.9% user +0.5% kernel / faults:28729minor4major
19:45:07.753 251525056E ActivityManager: 7.5%2515/system_server:2.9% user +4.6% kernel / faults:1818minor8major
19:45:07.753 251525056E ActivityManager: 3.2%3365/com.xxx.launcher:2.1% user +1% kernel / faults:3535minor29major
19:45:07.753 251525056E ActivityManager: 2%3620/com.xxx.persist.system:1.2% user +0.7% kernel / faults:870minor

Analyze trace.txt to see if there is any useful information. First look at the Launcher’s main thread stack.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
"main"prio=5tid=1Native
| group="main"sCount=1dsCount=0flags=1obj=0x72a50cd0self=0x6f7ae10800
| sysTid=3365nice=-10cgrp=defaultsched=0/0handle=0x700123bed8
| state=S schedstat=(37451337214473599138010494263) utm=26967stm=10484core=0HZ=100
|stack=0x7fc0270000-0x7fc0272000stackSize=8192KB
| held mutexes=
kernel: (couldn't read /proc/self/task/3365/stack)
native: #00pc0000000000071a8c /apex/com.android.runtime/lib64/bionic/libc.so (syscall+28)
native: #01pc0000000000075710 /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(voidvolatile*,bool,int,bool, timespecconst*)+140)
native: #02pc00000000000d9744 /apex/com.android.runtime/lib64/bionic/libc.so (pthread_cond_wait+60)
native: #03pc00000000002bf5e8 /system/lib64/libhwui.so (android::uirenderer::renderthread::DrawFrameTask::postAndWait()+168)
native: #04pc00000000002bf510 /system/lib64/libhwui.so (android::uirenderer::renderthread::DrawFrameTask::drawFrame()+44)12
at android.graphics.HardwareRenderer.nSyncAndDrawFrame(Native method)
at android.graphics.HardwareRenderer.syncAndDrawFrame(HardwareRenderer.java:422)
at android.view.ThreadedRenderer.draw(ThreadedRenderer.java:671)
at android.view.ViewRootImpl.draw(ViewRootImpl.java:3983)
- locked <0x030694da> (a java.lang.Object)
at android.view.ViewRootImpl.performDraw(ViewRootImpl.java:3782)
at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:3085)
at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1994)
at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:8201)
at android.view.Choreographer$CallbackRecord.run(Choreographer.java:1085)
at android.view.Choreographer.doCallbacks(Choreographer.java:908)
at android.view.Choreographer.doFrame(Choreographer.java:835)
at android.view.Choreographer$FrameHandler.handleMessage(Choreographer.java:1013)
at android.os.Handler.dispatchMessage(Handler.java:107)
at android.os.Looper.loop(Looper.java:238)
at android.app.ActivityThread.main(ActivityThread.java:7798)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:492)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:995)

It can be seen that the main thread is blocked waiting for RenderThread to return, so continue to check the RenderThread stack.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
"RenderThread"daemon prio=7tid=25Native
| group="main"sCount=1dsCount=0flags=1obj=0x13100b00self=0x6f0c698c00
| sysTid=3616nice=-10cgrp=defaultsched=0/0handle=0x6f072e0d50
| state=S schedstat=(35808915205059629728352628704) utm=29226stm=6582core=3HZ=100
|stack=0x6f071ea000-0x6f071ec000stackSize=991KB
| held mutexes=
kernel: (couldn't read /proc/self/task/3616/stack)
native: #00pc00000000000c49f4 /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+4)
native: #01pc000000000007d518 /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+132)
native: #02pc0000000000059e58 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+244)
native: #03pc000000000005ad94 /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*,int*)+60
native: #04pc000000000005ab38 /system/lib64/libbinder.so (android::IPCThreadState::transact(int,unsignedint, android::Parcelconst&, android::Parcel*,unsignedint)+180)
native: #05pc000000000004f000 /system/lib64/libbinder.so (android::BpBinder::transact(unsignedint, android::Parcelconst&, android::Parcel*,unsignedint)+228)
native: #06pc000000000007fef8 /system/lib64/libgui.so (android::BpGraphicBufferProducer::dequeueBuffer(int*, android::sp<android::Fence>*,unsignedint,unsignedint,int,unsignedlong,unsignedlong*, android::FrameEventHistoryDelta*)+224)
native: #07pc00000000000b9880 /system/lib64/libgui.so (android::Surface::dequeueBuffer(ANativeWindowBuffer**,int*)+392)
native: #08pc0000000000380540 /system/lib64/libhwui.so (android::uirenderer::renderthread::ReliableSurface::hook_dequeueBuffer(ANativeWindow*, ANativeWindowBuffer**,int*)+104)
native: #09pc000000000000989c /vendor/lib64/egl/eglSubDriverAndroid.so (???)
native: #10pc0000000000009388 /vendor/lib64/egl/eglSubDriverAndroid.so (???)
native: #11pc000000000022cd2c /vendor/lib64/egl/libGLESv2_adreno.so (???)
native: #12pc0000000000212b8c /vendor/lib64/egl/libGLESv2_adreno.so (???)
native: #13pc0000000000020838 /system/lib64/libEGL.so (android::eglQuerySurfaceImpl(void*,void*,int,int*)+248)
native: #14pc00000000002c6794 /system/lib64/libhwui.so (android::uirenderer::renderthread::EglManager::beginFrame(void*)+224)
native: #15pc00000000002d414c /system/lib64/libhwui.so (android::uirenderer::renderthread::CanvasContext::draw()+236)
native: #16pc00000000002d34e8 /system/lib64/libhwui.so (_ZNSt3__110__function6__funcIZN7android10uirenderer12renderthread13DrawFrameTask11postAndWaitEvE3$_0NS_9allocatorIS6_EEFvvEEclEv$c303f2d2360db58ed70a2d0ac7ed911b+380)
native: #17pc00000000002de044 /system/lib64/libhwui.so (android::uirenderer::WorkQueue::process()+228)
native: #18pc00000000002ddd24 /system/lib64/libhwui.so (android::uirenderer::renderthread::RenderThread::threadLoop()+576)
native: #19pc0000000000013654 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+328)
native: #20pc00000000000da200 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36)
native: #21pc00000000000769d4 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64)

It can be seen that RenderThread is stuck on binder communication, calling the method dequeueBuffer, waiting for the dequeueBuffer return result. Normally, the main thread does not wait for dequeueBuffer; it continues to execute after nSyncAndDrawFrame returns. It is stuck here because it is waiting for the previous draw task of RenderThread to complete, that is, the previous draw task is stuck in dequeueBuffer, causing the main thread of this frame to be stuck and timeout.

If you are familiar with the frame rendering process, you should know that the peer of dequeueBuffer is SurfaceFlinger. At this time, you need to check whether SurfaceFlinger has any problems (here if there is binder info, you can directly see which thread RenderThread is communicating with in SurfaceFlinger. Binder info can see the details of binder communication, for example, we should be able to see this case (since the binder info is not dumped, the following is just an example).

1
2
3
4
5
6
7
// 956 is SurfaceFlinger, here is the incoming binder information of SurfaceFlinger
proc956:
incoming transaction28350131:0000000000000000from3365:3616to956:1357code2flags10pri0:110r1 node28346208size112:0data0000000000000000

// 3365 is Launcher, 3365:3616 => 956:1356 means from Launcher (3365) rendering thread (3616) to SurfaceFlinger (956) Binder: 956_4 (1357)
proc3365:
outgoing transaction28350131:0000000000000000from3365:3616to956:1357code2flags10pri0:110r1

Here we search for dequeueBuffer to see which Binder thread in SurfaceFlinger that RenderThread is communicating with, and found that it is Binder:956_4.

Corresponding sysTid=1357

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
"Binder:956_4"sysTid=1357
#00pc0000000000071a8c /apex/com.android.runtime/lib64/bionic/libc.so (syscall+28) (BuildId:340eb42dd36c6541aec54c306eb4d0ee)
#01pc0000000000075710 /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(voidvolatile*,bool,int,bool, timespecconst*)+140) (BuildId:340eb42dd36c6541aec54c306eb4d0ee)
#02pc00000000000d97e8 /apex/com.android.runtime/lib64/bionic/libc.so (pthread_cond_timedwait+120) (BuildId:340eb42dd36c6541aec54c306eb4d0ee)
#03pc0000000000072df0 /system/lib64/libc++.so (_ZNSt3__118condition_variable15__do_timed_waitERNS_11unique_lockINS_5mutexEEENS_6chrono10time_pointINS5_12system_clockENS5_8durationIxNS_5ratioILl1ELl1000000000EEEEEEE+108) (BuildId:5aad1075509f6e517eb78db32da8fbf6)
#04pc000000000006f9d8 /system/lib64/libgui.so (android::BufferQueueProducer::waitForFreeSlotThenRelock(android::BufferQueueProducer::FreeSlotCaller, std::__1::unique_lock<std::__1::mutex>&,int*)const+788) (BuildId: e017958f19275814d1f2d55ce8b10bfa)
#05pc000000000006fd8c /system/lib64/libgui.so (android::BufferQueueProducer::dequeueBuffer(int*, android::sp<android::Fence>*,unsignedint,unsignedint,int,unsignedlong,unsignedlong*, android::FrameEventHistoryDelta*)+776) (BuildId: e017958f19275814d1f2d55ce8b10bfa)
#06pc000000000007f0a8 /system/lib64/libgui.so (android::BnGraphicBufferProducer::onTransact(unsignedint, android::Parcelconst&, android::Parcel*,unsignedint)+3580) (BuildId: e017958f19275814d1f2d55ce8b10bfa)
#07pc000000000004d67c /system/lib64/libbinder.so (android::BBinder::transact(unsignedint, android::Parcelconst&, android::Parcel*,unsignedint)+136) (BuildId: bb7ca5d12323a310f26473506cf070ed)
#08pc000000000005a55c /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+1008) (BuildId: bb7ca5d12323a310f26473506cf070ed)
#09pc000000000005a0b8 /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+156) (BuildId: bb7ca5d12323a310f26473506cf070ed)
#10pc000000000005a898 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+220) (BuildId: bb7ca5d12323a310f26473506cf070ed)
#11pc000000000008093c /system/lib64/libbinder.so (android::PoolThread::threadLoop()+24) (BuildId: bb7ca5d12323a310f26473506cf070ed)
#12pc0000000000013654 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+328) (BuildId:594f10db565bb0b9cf0223c7a1990ce5)
#13pc00000000000da200 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36) (BuildId:340eb42dd36c6541aec54c306eb4d0ee)
#14pc00000000000769d4 /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId:340eb42dd36c6541aec54c306eb4d0ee)

It can be seen that it is stuck in waitForFreeSlotThenRelock, and its code logic is in frameworks/native/libs/gui/BufferQueueProducer.cpp. Those interested can check it out themselves.

Being stuck in waitForFreeSlotThenRelock means that there are not enough Buffers, because the number of Buffers corresponding to each Layer is fixed (2 or 3 or 4, generally 3 or 4). If all 4 Buffers are in use, then the application calls dequeueBuffer to apply for a free Buffer will not be successful, and it needs to wait. Here, wait for free buffer times out, resulting in the application’s ANR.

The SurfaceFlinger and hwui parts of the Log also frequently print prompts for problem points.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
// SurfaceFlinger Log prompts that the number of available buffers is insufficient
19:44:42.371 956 956 E BufferQueueConsumer: [com.xxx.launcher/com.xxx.launcher.Launcher#0] acquireBuffer: max acquired buffer count reached: 2 (max 1)
19:44:42.371 956 956 E BufferLayerConsumer: [com.xxx.launcher/com.xxx.launcher.Launcher#0] updateTexImage: acquire failed: Function not implemented (-38)

// Prompt is dequeueBuffer failed
19:45:45.127 3365 3616 W OpenGLRenderer: dequeueBuffer failed, error = -110; switching to fallback

19:45:45.138 3365 3616 I OpenGLRenderer: Davey! duration=4015ms; Flags=0, IntendedVsync=21148991192817, Vsync=21148991192817, OldestInputEvent=9223372036854775807, NewestInputEvent=0, HandleInputStart=21148991206202, AnimationStart=21148991249587, PerformTraversalsStart=21148991252765, DrawStart=21148991798858, SyncQueued=21148991974796, SyncStart=21152137782451, IssueDrawCommandsStart=21152138724534, SwapBuffers=21156150646824, FrameCompleted=21156152280366, DequeueBufferDuration=0, QueueBufferDuration=0,

// MessageQueue information identifies which Message blocks
19:45:45.142 3365 3365 E ANR_LOG : >>> msg's executing time is too long
19:45:45.142 3365 3365 E ANR_LOG : Blocked msg = { when=-4s2ms what=0 target=android.view.Choreographer$FrameHandler callback=android.view.Choreographer$FrameDisplayEventReceiver } , cost = 4002 ms
19:45:45.142 3365 3365 E ANR_LOG : >>>Current msg List is:
19:45:45.142 3365 3365 E ANR_LOG : Current msg <1> = { when=-4s5ms what=13 target=android.view.ViewRootImpl$ViewRootHandler }
19:45:45.142 3365 3365 E ANR_LOG : Current msg <2> = { when=-4s1ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=com.android.launcher3.util.ViewOnDrawExecutor }
19:45:45.143 3365 3365 E ANR_LOG : Current msg <3> = { when=-4s1ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.widget.Editor$1 }
19:45:45.143 3365 3365 E ANR_LOG : Current msg <4> = { when=-3s23ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.widget.ViewFlipper$2 }
19:45:45.143 3365 3365 E ANR_LOG : Current msg <5> = { when=-1s502ms what=6 target=android.view.inputmethod.InputMethodManager$H arg1=74627 obj=android.view.inputmethod.InputMethodManager$PendingEvent@561bec9 }
19:45:45.143 3365 3365 E ANR_LOG : Current msg <6> = { when=+14s859ms what=0 target=android.os.Handler callback=xxxUIEngineProguard.i.b$b }
19:45:45.143 3365 3365 E ANR_LOG : Current msg <7> = { when=+24d20h30m18s414ms what=0 target=android.view.ViewRootImpl$ViewRootHandler callback=android.widget.ViewFlipper$2 }
19:45:45.143 3365 3365 E ANR_LOG : >>>CURRENT MSG DUMP OVER<<<

This kind of problem is generally a SurfaceFlinger problem, which can be transferred to the corresponding person in charge for further analysis.

Anomalous Log Examples

Main Thread Database Operation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
"main" prio=5 tid=1 Native
held mutexes=
kernel: (couldn't read /proc/self/task/11003/stack)
native: #00 pc 000492a4 /system/lib/libc.so (nanosleep+12)
native: #01 pc 0002dc21 /system/lib/libc.so (usleep+52)
native: #02 pc 00009cab /system/lib/libsqlite.so (???)
native: #03 pc 00011119 /system/lib/libsqlite.so (???)
native: #04 pc 00016455 /system/lib/libsqlite.so (???)
native: #16 pc 0000fa29 /system/lib/libsqlite.so (???)
native: #17 pc 0000fad7 /system/lib/libsqlite.so (sqlite3_prepare16_v2+14)
native: #18 pc 0007f671 /system/lib/libandroid_runtime.so (???)
native: #19 pc 002b4721 /system/framework/arm/boot-framework.oat (Java_android_database_sqlite_SQLiteConnection_nativePrepareStatement__JLjava_lang_String_2+116)
at android.database.sqlite.SQLiteConnection.setWalModeFromConfiguration(SQLiteConnection.java:294)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:215)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:193)
at android.database.sqlite.SQLiteConnectionPool.openConnectionLocked(SQLiteConnectionPool.java:463)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:185)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:177)
at android.database.sqlite.SQLiteDatabase.openInner(SQLiteDatabase.java:808)
locked <0x0db193bf> (a java.lang.Object)
at android.database.sqlite.SQLiteDatabase.open(SQLiteDatabase.java:793)
at android.database.sqlite.SQLiteDatabase.openDatabase(SQLiteDatabase.java:696)
at android.app.ContextImpl.openOrCreateDatabase(ContextImpl.java:690)
at android.content.ContextWrapper.openOrCreateDatabase(ContextWrapper.java:299)
at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:223)
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:163)
locked <0x045a4a8c> (a com.xxxx.video.common.data.DataBaseHelper)
at com.xxxx.video.common.data.DataBaseORM.<init>(DataBaseORM.java:46)
at com.xxxx.video.common.data.DataBaseORM.getInstance(DataBaseORM.java:53)
locked <0x017095d5> (a java.lang.Class<com.xxxx.video.common.data.DataBaseORM>)

Large Binder Payloads

1
2
3
4
5
07-2104:43:21.573 1000 148812756E Binder  : Unreasonably large binder replybuffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling1size388568(data:1,32,7274595)
07-2104:43:21.573 1000 148812756E Binder : android.util.Log$TerribleFailure: Unreasonably large binder replybuffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling1size388568(data:1,32,7274595)
07-2104:43:21.607 1000 1488 2951E Binder : Unreasonably large binder replybuffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling1size211848(data:1,23,7274595)
07-2104:43:21.607 1000 1488 2951E Binder : android.util.Log$TerribleFailure: Unreasonably large binder replybuffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling1size211848(data:1,23,7274595)
07-2104:43:21.662 1000 1488 6258E Binder : Unreasonably large binder replybuffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling1size259300(data:1,33,7274595)

Binder Communication Failure

1
2
3
4
07-2106:04:35.580<6>[32837.690321] binder:1698:2362transaction failed29189/-3,size100-0line3042
07-2106:04:35.594<6>[32837.704042] binder:1765:4071transaction failed29189/-3,size76-0line3042
07-2106:04:35.899<6>[32838.009132] binder:1765:4067transaction failed29189/-3,size224-8line3042
07-2106:04:36.018<6>[32838.128903] binder:1765:2397transaction failed29189/-22,size348-0line2916

ANR Related Information Sharing

  1. Xigua Video Stability Governance System Construction 1: Tailor Principle and Practice
  2. Xigua Video Stability Governance System Construction 2: Raphael Principle and Practice
  3. Xigua Video Stability Governance System Construction 3: Sliver Principle and Practice
  4. Xigua Freeze & ANR Optimization Governance and Monitoring System Construction
  5. Toutiao ANR Optimization Practice Series - Design Principle and Influencing Factors
  6. Toutiao ANR Optimization Practice Series - Monitoring Tools and Analysis Ideas
  7. Toutiao ANR Optimization Practice Series - Example Analysis Collection
  8. Toutiao ANR Optimization Practice Series - Barrier Causes Main Thread Deadlock
  9. Toutiao ANR Optimization Practice Series - Goodbye, SharedPreference Waiting
  10. Android ANR | Principle Analysis and Common Cases

Reference

  1. https://duanqz.github.io/2015-10-12-ANR-Analysis#1-%E6%A6%82%E8%A7%88
  2. https://duanqz.github.io/2015-10-12-ANR-Analysis
  3. http://gityuan.com/2016/12/02/app-not-response/
  4. http://gityuan.com/2017/01/01/input-anr/
  5. https://xiaozhuanlan.com/topic/5097486132

About Me && Blog

Below is a personal introduction and related links. I look forward to communicating with you all!

  1. Blogger Personal Introduction: Contains personal WeChat and WeChat group links.
  2. Blog Content Navigation: A navigation of personal blog content.
  3. Excellent Blog Articles Collected and Organized by Individuals - Must-Know for Android Performance Optimization: Welcome everyone to recommend yourself and recommend (WeChat private chat is fine)
  4. Android Performance Optimization Knowledge Planet: Welcome to join, thanks for support~

One person can go faster, a group of people can go further

CATALOG
  1. 1. Common Causes of ANR
    1. 1.1. Issues Within the Current Process
    2. 1.2. Issues in Remote Processes or the System
  2. 2. ANR Case Studies
    1. 2.1. ANR Case: Toutiao - Deadlock
    2. 2.2. ANR Case: ANR caused by Freezing
    3. 2.3. ANR Case: Broadcast Timeout causing ANR
    4. 2.4. ANR Case: Launcher - Input ANR
  3. 3. Anomalous Log Examples
    1. 3.1. Main Thread Database Operation
    2. 3.2. Large Binder Payloads
    3. 3.3. Binder Communication Failure
  4. 4. ANR Related Information Sharing
  5. 5. Reference
  6. 6. About Me && Blog