Android Performance

Android ANR Series 2: ANR Analysis Methodology and Key Logs

Word count: 6.5kReading time: 40 min
2025/02/08
loading

This is the second article in the Android App ANR series, focusing on ANR analysis methodology and key logs. The series includes:

  1. Android App ANR Series 1: Understanding Android ANR Design Philosophy
  2. Android App ANR Series 2: ANR Analysis Methodology and Key Logs
  3. Android App ANR Series 3: ANR Case Studies

ANR (Application Not Responding) - a simple definition that encompasses much of Android’s system design philosophy.

First, ANR falls within the application domain. This differs from SNR (System Not Responding), which reflects issues where the system process (system_server) loses responsiveness, while ANR explicitly confines the problem to applications. SNR is ensured by the Watchdog mechanism (details can be found in Watchdog mechanism and problem analysis); ANR is ensured by the message handling mechanism. Android implements a sophisticated mechanism at the system layer to detect ANR, with the core principle being message scheduling and timeout handling.

Second, the ANR mechanism is primarily implemented at the system layer. All ANR-related messages are scheduled by the system process (system_server) and then dispatched to application processes for actual processing. Simultaneously, the system process designs different timeout limits to track message processing. Once an application mishandles a message, the timeout limit takes effect: it collects system states such as CPU/IO usage, process function call stacks, and reports to the user that a process is not responding (ANR dialog; some ROMs don’t display the ANR dialog but directly crash to the home screen).

Third, ANR issues are essentially performance problems. The ANR mechanism actually imposes restrictions on the application’s main thread, requiring it to complete the most common operations (starting services, processing broadcasts, handling input) within specified time limits. If processing times out, the main thread is considered to have lost the ability to respond to other operations. Time-consuming operations on the main thread, such as intensive CPU computations, heavy I/O, complex UI layouts, etc., all reduce the application’s responsiveness.

Finally, some ANR problems are very difficult to analyze. Sometimes due to underlying system influences, message scheduling fails, and the problematic scenario is hard to reproduce. Such ANR issues often require significant time to understand system behaviors, going beyond the scope of the ANR mechanism itself. Some ANR problems are hard to investigate because there are many factors causing system instability, such as memory fragmentation caused by Linux Kernel bugs, hardware damage, etc. Such low-level causes often leave ANR problems untraceable, and these aren’t application issues at all, wasting much time for application developers. If you’ve worked on entire system development and maintenance, you’ll deeply understand this. Therefore, I cannot guarantee that understanding all content in this chapter will solve every ANR problem. If you encounter very difficult ANR issues, I suggest talking to friends working on Framework, drivers, and kernel, or if the problem is just a one-in-a-hundred-thousand phenomenon that doesn’t affect normal program operation, I’d suggest ignoring it.

– From duanqz

ANR Analysis Methodology

ANR problems mainly stem from two causes: application-side issues and system-side anomalies. When analyzing ANR problems, the most important task is to determine which cause is responsible (though there are some gray areas, such as poorly written code that doesn’t manifest under normal conditions but quickly surfaces when the system has problems).

General ANR Analysis Steps:

  1. Check EventLog for specific ANR time (search for am_anr) to see if it matches the ANR log, determining whether the ANR log is valid. If the ANR log is valid, analyze it to extract useful information: pid, tid, deadlocks, etc. When facing ANR problems, we need to question whether the trace before us is the original crime scene. If there is a lot of information output when ANR occurs, and CPU and I/O resources were tight at that time, the log output timestamp might be delayed by 10 to 20 seconds. So we sometimes need to be vigilant. However, normally, the am_anr output time in EventLog is the earliest and closest to the actual ANR time.
  2. Check MainLog (Android Log) or SystemLog for ANR details (search for “ANR in”), extracting effective information:
    1. Time when ANR occurred
    2. Process that printed the ANR log
    3. Process where ANR occurred
    4. Reason for ANR
    5. CPU load
    6. Memory load
    7. CPU usage statistics time period
    8. CPU usage rate of each process
      1. Total CPU usage rate
      2. Page fault counts
        1. xxx minor indicates page faults in cache, which can be understood as the process performing memory access
        2. xxx major indicates page faults in memory, which can be understood as the process performing I/O operations
    9. CPU usage summary
  3. Combine MainLog (Android Log) and EventLog to extract all useful information within the CPU start and end time points into a file:
    1. Collect key operations, such as unlock, app installation, screen on/off, app launch, etc.
    2. Collect exceptions and system key logs:
      1. System slowdown: such as Slow operation, Slow dispatch, Slow delivery, dvm_lock_sample
      2. Process changes: am_kill, am_proc_died, lowmemorykiller, ANR, app launch relationships, etc.
      3. System information: cpu info, meminfo, binder info (whether full), iowait (whether too high)
    3. Collect all key thread running conditions and thread priorities of the ANR process
    4. Based on the key information file extracted in step 4, further understand the system’s situation and state at that time (recommend using vscode or notepad++ with global search for clues), such as:
      1. Was it in low memory frequently killing processes?
      2. First unlock after reboot with system busy?
      3. Multiple app launches in short time causing system busy?
      4. Or application’s own logic waiting?
  4. If still unclear, add logs to reproduce.

Distinguishing Between Application Problems and System Problems

First, Analyze Whether It’s an Application Problem

The key to analyzing application problems is to understand what the user was doing at that time and what role the application played during this user operation, then proceed with further analysis:

  1. Analyze whether there are time-consuming operations in key component lifecycles that might not be exposed normally but surface when system load increases (suggest adding corresponding logs in key lifecycle functions for easier debugging).
  2. Analyze whether extreme situations occurred causing application logic to be time-consuming, such as large amounts of data processing or import, too many threads running simultaneously, etc. (check application’s CPU / IO usage).
  3. Analyze whether deadlocks exist.
  4. Analyze whether waiting for binder return.
  5. Analyze whether MainThread and RenderThread have abnormalities in Trace files.
  6. Analyze whether MainThread and WorkerThread have waiting relationships in Trace files.

Analyze System State

  1. Check CPU usage (CPU usage rate and CPU load), see if system-related processes or threads like SystemServer, lowmemorykiller, HeapTaskDeamon, Audio, SurfaceFlinger occupy high resources.
  2. Check whether significant IO situations exist, check IO load:
    1. faults: 118172 minor (page faults in cache).
    2. major (page faults in memory).
  3. Check whether system is in low memory:
    1. Check dumpsys meminfo results to see if in low memory.
    2. Check kernel log for frequent lowmemorykiller.
    3. Check event log for frequent applications killed by system low memory policy.
    4. Check kswapd0.
  4. Whether application is frozen: Application in D state, ANR occurs. If the last operation is refriger, then the application was frozen. Normally caused by power optimization; check if there are xxxHansManager : unfreeze logs around; or in Systrace’s Kernel Callstack containing: {kernel callsite when blocked:: "\_\_refrigerator+0xe4/0x198"}.
  5. Check for system anomalies: For instance, custom features causing system business, failing to respond to application Binder requests in time. This requires analyzing SystemServer logs in the Log to see if there are abnormal log outputs.

Continue Analysis: Can Application Resolve It or System Resolve It?

  1. If the ANR problem is transferred to the system side, there is a high probability it’s unsolvable by the app.
  2. If the application code is normal, and there are no extreme usage scenarios or data processing, purely caused by the system or other Apps, then it can be considered a system issue.
  3. If the application code itself has certain issues, which are not exposed in non-extreme scenarios or non-system anomalies, then the application needs to consider mitigation plans.

ANR Log Content

Logs can be captured via test tools, obtained from online platforms, or captured manually:

1
adb bugreport log.zip

ANR Trace Log contains the following content:

Thread Details

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x72c8bbf8 self=0xb400007b0ec10800
| sysTid=5991 nice=-10 cgrp=default sched=0/0 handle=0x7b95f61500
| state=S schedstat=( 807053249 267562324 1494 ) utm=63 stm=17 core=3 HZ=100
| stack=0x7fcccd9000-0x7fcccdb000 stackSize=8192KB
| held mutexes=
native: #00 pc 00000000000c6418 /apex/com.android.runtime/lib64/bionic/libc.so (__epoll_pwait+8)
native: #01 pc 0000000000019a9c /system/lib64/libutils.so (android::Looper::pollInner(int)+184)
native: #02 pc 000000000001997c /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+112)
native: #03 pc 0000000000114310 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:339)
at android.os.Looper.loop(Looper.java:198)
at android.app.ActivityThread.main(ActivityThread.java:8142)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1006)

Ideally, the meaning of some fields in the Trace above:

Thread Stack

Below is the call stack corresponding to the thread:

1
2
3
4
5
6
7
8
9
10
11
native: #00 pc 00000000000c6418 /apex/com.android.runtime/lib64/bionic/libc.so (__epoll_pwait+8)
native: #01 pc 0000000000019a9c /system/lib64/libutils.so (android::Looper::pollInner(int)+184)
native: #02 pc 000000000001997c /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+112)
native: #03 pc 0000000000114310 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:339)
at android.os.Looper.loop(Looper.java:198)
at android.app.ActivityThread.main(ActivityThread.java:8142)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1006)

Divided into Java and Native. Generally speaking, nativePollOnce is waiting for a Message, which is normal.

Thread State

There is a correspondence between the state in Thread.java and the state in Thread.cpp. It can be seen that the former is more generalized and easier to understand, oriented towards Java users; while the latter is more detailed, oriented towards the internal environment of the virtual machine. The thread states displayed in traces.txt are defined in Thread.cpp. The complete correspondence is as follows:

art/runtime/thread_state.h

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// State stored in our C++ class Thread.
// When we refer to "a suspended state", or when function names mention "ToSuspended" or
// "FromSuspended", we mean any state other than kRunnable, i.e. any state in which the thread is
// guaranteed not to access the Java heap. The kSuspended state is merely one of these.
enum ThreadState {
// Java
// Thread.State JDWP state
kTerminated = 66, // TERMINATED TS_ZOMBIE Thread.run has returned, but Thread* still around
kRunnable, // RUNNABLE TS_RUNNING runnable
kTimedWaiting, // TIMED_WAITING TS_WAIT in Object.wait() with a timeout
kSleeping, // TIMED_WAITING TS_SLEEPING in Thread.sleep()
kBlocked, // BLOCKED TS_MONITOR blocked on a monitor
kWaiting, // WAITING TS_WAIT in Object.wait()
kWaitingForLockInflation, // WAITING TS_WAIT blocked inflating a thin-lock
kWaitingForTaskProcessor, // WAITING TS_WAIT blocked waiting for taskProcessor
kWaitingForGcToComplete, // WAITING TS_WAIT blocked waiting for GC
kWaitingForCheckPointsToRun, // WAITING TS_WAIT GC waiting for checkpoints to run
kWaitingPerformingGc, // WAITING TS_WAIT performing GC
kWaitingForDebuggerSend, // WAITING TS_WAIT blocked waiting for events to be sent
kWaitingForDebuggerToAttach, // WAITING TS_WAIT blocked waiting for debugger to attach
kWaitingInMainDebuggerLoop, // WAITING TS_WAIT blocking/reading/processing debugger events
kWaitingForDebuggerSuspension, // WAITING TS_WAIT waiting for debugger suspend all
kWaitingForJniOnLoad, // WAITING TS_WAIT waiting for execution of dlopen and JNI on load code
kWaitingForSignalCatcherOutput, // WAITING TS_WAIT waiting for signal catcher IO to complete
kWaitingInMainSignalCatcherLoop, // WAITING TS_WAIT blocking/reading/processing signals
kWaitingForDeoptimization, // WAITING TS_WAIT waiting for deoptimization suspend all
kWaitingForMethodTracingStart, // WAITING TS_WAIT waiting for method tracing to start
kWaitingForVisitObjects, // WAITING TS_WAIT waiting for visiting objects
kWaitingForGetObjectsAllocated, // WAITING TS_WAIT waiting for getting the number of allocated objects
kWaitingWeakGcRootRead, // WAITING TS_WAIT waiting on the GC to read a weak root
kWaitingForGcThreadFlip, // WAITING TS_WAIT waiting on the GC thread flip (CC collector) to finish
kNativeForAbort, // WAITING TS_WAIT checking other threads are not run on abort.
kStarting, // NEW TS_WAIT native thread started, not yet ready to run managed code
kNative, // RUNNABLE TS_RUNNING running in a JNI native method
kSuspended, // RUNNABLE TS_RUNNING suspended by GC or debugger
};

In the Trace file, the last line of the thread name identifies the current state of the thread.

Normal Main Thread Trace

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
"main" prio=5 tid=1 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x72c8bbf8 self=0xb400007b0ec10800
| sysTid=5991 nice=-10 cgrp=default sched=0/0 handle=0x7b95f61500
| state=S schedstat=( 807053249 267562324 1494 ) utm=63 stm=17 core=3 HZ=100
| stack=0x7fcccd9000-0x7fcccdb000 stackSize=8192KB
| held mutexes=
native: #00 pc 00000000000c6418 /apex/com.android.runtime/lib64/bionic/libc.so (__epoll_pwait+8)
native: #01 pc 0000000000019a9c /system/lib64/libutils.so (android::Looper::pollInner(int)+184)
native: #02 pc 000000000001997c /system/lib64/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+112)
native: #03 pc 0000000000114310 /system/lib64/libandroid_runtime.so (android::android_os_MessageQueue_nativePollOnce(_JNIEnv*, _jobject*, long, int)+44)
at android.os.MessageQueue.nativePollOnce(Native method)
at android.os.MessageQueue.next(MessageQueue.java:339)
at android.os.Looper.loop(Looper.java:198)
at android.app.ActivityThread.main(ActivityThread.java:8142)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1006)

Abnormal Main Thread Trace

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 dsCount=0 flags=1 obj=0x70f65400 self=0xe28dae00
| sysTid=22002 nice=-10 cgrp=default sched=0/0 handle=0xe9674474
| state=S schedstat=( 1943159901 290647362 1661 ) utm=159 stm=34 core=7 HZ=100
| stack=0xff041000-0xff043000 stackSize=8192KB
| held mutexes=
at com.facebook.cache.disk.DiskStorageCache.e(DiskStorageCache.java:3)
- waiting to lock <0x0e57c91f> (a java.lang.Object) held by thread 89
at com.xxx.community.util.imageloader.FrescoImageLoader.a(FrescoImageLoader.java:18)
at com.xxx.community.util.imageloader.FrescoImageLoader$2$1.run(FrescoImageLoader.java:2)
at android.os.Handler.handleCallback(Handler.java:938)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:254)
at android.app.ActivityThread.main(ActivityThread.java:8142)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1006)

CPU Usage Analysis

Searching for the keyword ANR in reveals the CPU usage within a period of time before the ANR. The parsing is as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
ActivityManager: ANR in com.xxx.launcher (com.xxx.launcher/.Launcher)(Process Name)
ActivityManager: PID: 5991(Process PID)
ActivityManager: Reason: Input dispatching timed out (6a6083a com.xxx.launcher/com.xxx.launcher.Launcher (server) is not responding. Waited 5001ms for FocusEvent(hasFocus=true))(Reason)
ActivityManager: Parent: com.xxx.launcher/.Launcher
ActivityManager: Load: 15.29 / 5.19 / 1.87(Load indicates CPU load for 1 min, 5 min, 15 min)
ActivityManager: ----- Output from /proc/pressure/memory -----(Memory Pressure)
ActivityManager: somavg10=1.35 avg60=0.31 avg300=0.06 total=346727
ActivityManager: full avg10=0.00 avg60=0.00 avg300=0.00 total=34803
ActivityManager: ----- End output from /proc/pressure/memory -----

// CPU usage within 13s
ActivityManager: CPU usage from 0ms to 13135ms later (2020-09-09 02:09:54.942 to 2020-09-09 02:10:08.077):
ActivityManager: 191%(Total CPU Usage) 1948/system_server: 72%(User Space Usage) user + 119%(Kernel Space Usage) kernel / faults: 78816 minor 9 major
ActivityManager: 10% 2218/android.bg: 3.6% user + 6.6% kernel
ActivityManager: 30% 5991/com.xxx.launcher: 23% user + 6.4% kernel / faults: 118172 minor(Minor page faults) 2 major(Major Page faults)
ActivityManager: 16% 6174/launcher-loader: 13% user + 2.8% kernel
ActivityManager: 3.9% 5991/m.xxx.launcher: 3.1% user + 0.8% kernel
ActivityManager: 20% 6549/com.xxx.xxx: 16% user + 3.7% kernel / faults: 3541 minor
ActivityManager: 10% 6889/DBCacheManager: 8.7% user + 1.2% kernel
ActivityManager: 9.4% 6942/DefaultDispatch: 7.1% user + 2.2% kernel

// CPU usage of each thread of each process within 1s
ActivityManager: CPU usage from 246ms to 1271ms later (2020-09-09 02:09:55.188 to 2020-09-09 02:09:56.213):
ActivityManager: 290% 1948/system_server: 114% user + 176% kernel / faults: 9353 minor
ActivityManager: 32% 5159/LockSettingsSer: 29% user + 2.9% kernel
ActivityManager: 25% 8661/AnrConsumer: 8.8% user + 16% kernel
ActivityManager: 44% 5991/com.xxx.launcher: 37% user + 7.4% kernel / faults: 5756 minor
ActivityManager: 16% 6174/launcher-loader: 13% user + 3.7% kernel
ActivityManager: 14% 5991/m.xxx.launcher: 14% user + 0% kernel
ActivityManager: 37% 6549/com.xxx.xxx: 28% user + 9.3% kernel / faults: 153 minor
ActivityManager: 37% 6942/DefaultDispatch: 28% user + 9.3% kernel
ActivityManager: 20% 5962/com.android.phone: 14% user + 5.5% kernel / faults: 1345 minor
ActivityManager: 11% 5962/m.android.phone: 7.4% user + 3.7% kernel

CPU Load

1
ActivityManager: Load: 15.29 / 5.19 / 1.871

The three numbers after Load mean the average system load in 1 minute, 5 minutes, and 15 minutes respectively. When the CPU is completely idle, the average load is 0; when the CPU workload is saturated, the average load is 1. The Load helps to judge whether the system load is too heavy.

There is a vivid metaphor: Imagine the CPU as a bridge with only one lane. All vehicles must pass through this lane. A system load of 0 means there are no cars on the bridge. A system load of 0.5 means half of the bridge section has cars. A system load of 1.0 means all sections of the bridge have cars, that is, the bridge is already “full”. A system load of 2.0 means there are too many vehicles, the bridge is already full (100%), and there are clearly as many vehicles waiting to get on the bridge. The traffic capacity of the bridge is the maximum workload of the CPU; the vehicles on the bridge are the waiting processes.

The rule of thumb is:

  1. When the system load is consistently greater than 0.7, you must verify it, where is the problem, to prevent the situation from getting worse.
  2. When the system load is consistently greater than 1.0, you must find a solution to bring this value down.
  3. When the system load reaches 5.0, it indicates that your system has serious problems.

Modern mobile phones use multi-core CPU architectures. Eight cores are common, meaning the CPU processing capacity is multiplied by 8. The running time of each core can be obtained from the following file: /sys/devices/system/cpu/cpu%d/cpufreq/stats/time_in_state, where %d represents the CPU core. The file records the running time of the CPU at various frequencies from boot to reading the file, unit: 10 ms.

Memory Load

1
2
3
4
ActivityManager: ----- Output from /proc/pressure/memory -----(Memory Pressure)
ActivityManager: somavg10=1.35 avg60=0.31 avg300=0.06 total=346727
ActivityManager: full avg10=0.00 avg60=0.00 avg300=0.00 total=34803
ActivityManager: ----- End output from /proc/pressure/memory -----

Memory load is obtained from /proc/pressure/memory.

avg10, avg60, avg300 represent the percentage of blocking time within the time period of 10s, 60s, and 300s respectively. total is the total accumulated time in milliseconds.

The some line represents the percentage of time that at least one task is blocked on a certain resource. The full line represents the percentage of time that all non-idle tasks are blocked at the same time. During this period, the CPU is completely wasted, which will bring serious performance problems. Let’s take IO some and full as an example. Suppose that in a 60-second time period, the system has two tasks. The running status within the 60-second cycle is shown in the figure below:

The red shaded part indicates that the task enters the blocked state due to waiting for IO resources. The part where Task A and Task B are blocked at the same time is full, accounting for 16.66%; the part where at least one task is blocked (including the part where only Task B is blocked) is some, accounting for 50%.

some and full are the sum of the proportion of blocking time within a certain period of time. The blocking time is not necessarily continuous, as shown in the figure below:

For details, please refer to the article by Kernel Craftsman: https://blog.csdn.net/feelabclihu/article/details/105534140

IO Load

1
ActivityManager:   30% 5991/com.xxx.launcher: 23% user + 6.4% kernel / faults: 118172 minor(cache page faults) 2 major(memory page faults)
  1. Minor: Page faults in high-speed cache, referring to Minor Page Fault (MnPF). When disk data is loaded into memory, the kernel will issue an MnPF message when reading it again. When a file is read and written for the first time, there will be many MPFs. After being cached into memory, accessing MPF again will be very few, and MnPF will increase instead. This is the result of the caching technology adopted by the kernel to reduce inefficient disk I/O operations. It can be understood as the process performing memory access.
  2. Major: Page faults in memory, referring to Major Page Fault (MPF). When the kernel reads data, it will search the CPU’s high-speed cache and physical memory successively. If not found, it will issue an MPF message requesting to load data into memory. It can be understood as the process performing IO operations.

If there are a large number of major, it means that the IO operation load was relatively high at that time.

Process Load

1
ActivityManager:   30% 5991/com.xxx.launcher: 23% user + 6.4% kernel / faults: 118172 minor(Minor page faults) 2 major(Major Page faults)
  1. 23% user: cpu usage in user mode
  2. 6.4% kernel: cpu usage in kernel mode

CPU Anomalous Process

  1. SystemServer cpu usage is high: the overall system operation will be slow.
  2. kswapd0 cpu usage is high: the overall system operation will be slow, causing various ANRs. Transfer the problem to “Memory Optimization” and ask them to optimize.
  3. logd CPU usage is high: causes system lag and ANR because the operation of each process outputting LOG is blocked and executes extremely slowly.
  4. Vold CPU usage is too high: causes system lag and ANR, please verify with the colleague responsible for storage first.
  5. Application itself CPU usage is high: high probability of the application’s own problem.
  6. Application is in D state, ANR occurs: If the last operation is refriger, then the application is frozen. Normally, it is caused by power consumption optimization. You can check whether there is a Log like xxxHansManager : unfreeze before and after; or in Systrace, the Kernel Callstack shows: {kernel callsite when blocked:: "__refrigerator+0xe4/0x198"}.
  7. “+” before CPU usage rate: Some processes have a “+” sign before their CPU usage rate, such as cat and zygote64, indicating that these processes did not exist in the last CPU statistics time slice, but ran in this CPU statistics time slice. Similarly, there is a “-“ sign, indicating that these processes died during the two CPU statistics time slices.

Key System Log Introduction

Application Freezing

1
xxxHansManager : freeze uid: 10245 package: com.tencent.mm reason: LcdOff

If the freezing logic has a Bug, it will also cause the application to generate an ANR. This line of Log is relatively simple, mainly the frozen process information and the reason for freezing.

ActivityManager : Slow operation

When AMS performs key tasks, if the task takes longer than 50 ms, it will print the corresponding Log.

frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

1
2
3
4
5
6
7
void checkTime(long startTime, String where) {
long now = SystemClock.uptimeMillis();
if ((now-startTime) > 50) {
// If we are taking more than 50ms, log about it.
Slog.w(TAG, "Slow operation: " + (now-startTime) + "ms so far, now at " + where);
}
}

The corresponding Log is as follows. If such Logs are printed frequently in the system, it means that the system is currently in a relatively stuck state, and system factors must be considered during analysis.

1
2
3
ActivityManager: Slow operation: 138ms so far, now at startProcess: done updating battery stats
ActivityManager: Slow operation: 138ms so far, now at startProcess: building log message
ActivityManager: Slow operation: 138ms so far, now at startProcess: starting to update pids map

Looper : Slow dispatch

1
Looper  : Slow dispatch took 418ms main h=android.app.ActivityThread$H c=android.app.-$$Lambda$LoadedApk$ReceiverDispatcher$Args$_BumDX2UKsnxLVrE6UJsJZkotuA@e68bdc4 m=0

Looper : Slow delivery

1
Looper  : Slow delivery took 361ms android.ui h=com.android.server.am.ActivityManagerService$UiHandler c=null m=53

Looper:Slow Looper

1
W/Looper: Slow Looper main: Activity com.androidperformance.memoryfix/.MainActivity is 439ms late (wall=0ms running=0ms ClientTransaction{ callbacks=[android.app.servertransaction.TopResumedActivityChangeItem] }) because of 3 msg, msg 2 took 268ms (seq=2 running=207ms runnable=15ms late=1ms h=android.app.ActivityThread$H w=110), msg 3 took 171ms (seq=3 running=140ms runnable=5ms io=1ms late=268ms h=android.app.ActivityThread$H w=159)

onTrimMemory

1
ClockTag AlarmClockApplication: onTrimMemory: 80

dvm_lock_sample

When the blocked time of a thread waiting for a lock exceeds the threshold (for example: 500ms), the current lock holding status is output.

1
dvm_lock_sample:[system_server,1,Binder_9,1500,ActivityManagerService.java,6403,-,1448,0]
  1. system_server: Binder_9 executes to line 6403 of ActivityManagerService.java, and has been waiting for the AMS lock.
  2. “-“ means that the lock is held by the same file, that is, the lock is held by line 1448 of the same file code, causing the Binder_9 thread to be blocked for 1500ms.

binder_sample

binder_sample: Monitors the binder transaction duration of the main thread of each process. When it exceeds the threshold (for example: 500ms), the corresponding target call information is output.

1
6628  6628 I binder_sample: [android.view.accessibility.IAccessibilityManager,6,2010,com.xxx.community,100]
  1. The process is 6628, main thread 6628
  2. Executing android.view.accessibility.IAccessibilityManager interface
  3. The corresponding method code = 6 (i.e., TRANSACTION_addAccessibilityInteractionConnection)
  4. The time spent is 2010 ms
  5. The package where the block is located is com.xxx.community
  6. The last parameter is the sample ratio

Find the corresponding interface function, such as which function corresponds to Code = 6 in IAccessibilityManager in the above example. You can search for FIRST_CALL_TRANSACTION in cs.android.com, click call, and then view the corresponding IAccessibilityManager file in the out directory (scroll all the way down until you can search for IAccessibilityManager).

Among them, the corresponding static final int TRANSACTION_addAccessibilityInteractionConnection = (android.os.IBinder.FIRST_CALL_TRANSACTION + 5)

Long monitor contention

1
16809 14216 W system_server: Long monitor contention with owner InputDispatcher (17039) at android.content.res.Configuration com.android.server.wm.ActivityTaskManagerService.getGlobalConfigurationForPid(int)(ActivityTaskManagerService.java:1066) waiters=0 in boolean com.android.server.wm.WindowProcessController.hasActivities() for 141ms

art/runtime/monitor.cc

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
std::string Monitor::PrettyContentionInfo(const std::string& owner_name,
pid_t owner_tid,
ArtMethod* owners_method,
uint32_t owners_dex_pc,
size_t num_waiters) {
Locks::mutator_lock_->AssertSharedHeld(Thread::Current());
const char* owners_filename;
int32_t owners_line_number = 0;
if (owners_method != nullptr) {
TranslateLocation(owners_method, owners_dex_pc, &owners_filename, &owners_line_number);
}
std::ostringstream oss;
oss << "monitor contention with owner " << owner_name << " (" << owner_tid << ")";
if (owners_method != nullptr) {
oss << " at " << owners_method->PrettyMethod();
oss << "(" << owners_filename << ":" << owners_line_number << ")";
}
oss << " waiters=" << num_waiters;
return oss.str();
}

Binder Thread

When the thread pool of processes such as system_server is used up and there are no idle threads, binder communication is in a starvation state. If the starvation state exceeds a certain threshold, information is output.

1
E IPCThreadState: binder thread pool (31 threads) starved for 120 ms

am_kill

1
am_kill : [0,18555,com.sina.weibo.image,945,remove task]1

In the sentence above, remove task refers to the reason why this application was killed.

The following are the Reasons corresponding to various killed situations. Sometimes we need to see the reason why the application was killed to judge whether the system is normal or the user’s operation steps:

force-stop

Abnormal Process Kill

Active Process Kill

Schedueled Process Kill

Other Process Kill

All the process kills introduced above call the ProcessRecord.kill() method, which inevitably outputs the corresponding EventLog. So what other scenarios of killing processes will not output log?

  1. Process.killProcess(int pid) // Can kill any specified process, or send signal directly
  2. adb shell kill -9 pid // Can kill any specified process
  3. Killed by lmk directly

That is to say, if a process is killed without log output, it may be killed by directly calling kill or sending a signal, or perhaps killed by lmk.

am_proc_died

1
am_proc_died: [0,13210,com.xxx.gallery3d,935,19]

am_anr

1
am_anr : [0,8769,com.android.updater,952680005,Broadcast of Intent { act=android.intent.action.BOOT_COMPLETED flg=0x9000010 cmp=com.android.updater/.BootCompletedReceiver (has extras) }]
  1. Process pid: 8769
  2. Process name: com.android.updater
  3. The type of ANR occurred is: Specific class or reason for BroadcastTimeout: { act=android.intent.action.BOOT_COMPLETED flg=0x9000010 cmp=com.android.updater/.BootCompletedReceiver (has extras) }

Lowmemorykiller

The log of lowmemorykiller is in the kernel log.

1
2
lowmemorykiller: Kill 'com.heytap.themestore' (15545), uid 10136, oom_adj 995 to free 105068kB
lowmemorykiller: Reclaimed 105068kB at oom_adj 606

Killing applications through the lowmemorykiller mechanism:

  1. com.heytap.themestore process name
  2. 15545: PID
  3. 10136: UID
  4. 995: oom_adj
  5. 105068kB: released memory
  6. 606: min_adj

Blocked msg

Internally added Log, when the execution time of Message exceeds 1.5s, the following Log will be printed. From the following Log, we can see:

  1. The Message that timed out, and the execution time of this Message
  2. The Message blocked by this Message

From this Log, you can observe whether your Message execution blocked other Messages or your Message was blocked by other Messages.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
E ANR_LOG : >>> msg's executing time is too long
E ANR_LOG : Blocked msg = { when=-32s683ms what=110 target=android.app.ActivityThread$H obj=AppBindData{appInfo=ApplicationInfo{bd8d51e com.android.contacts}} } , cost = 32436 ms
E ANR_LOG : >>>Current msg List is:
E ANR_LOG : Current msg <1> = { when=-32s672ms what=140 target=android.app.ActivityThread$H arg1=5 }
E ANR_LOG : Current msg <2> = { when=-32s671ms what=114 target=android.app.ActivityThread$H obj=CreateServiceData{token=android.os.BinderProxy@f7611ff className=com.android.contacts.xxxAppServicesManagerClient packageName=com.android.contacts intent=null} }
E ANR_LOG : Current msg <3> = { when=-32s671ms what=121 target=android.app.ActivityThread$H obj=BindServiceData{token=android.os.BinderProxy@f7611ff intent=Intent { cmp=com.android.contacts/.xxxAppServicesManagerClient }} }
E ANR_LOG : Current msg <4> = { when=-31s658ms what=1 target=android.os.AsyncTask$InternalHandler obj=android.os.AsyncTask$AsyncTaskResult@75e517c }
E ANR_LOG : Current msg <5> = { when=-29s750ms what=140 target=android.app.ActivityThread$H arg1=10 }
E ANR_LOG : Current msg <6> = { when=-29s103ms what=118 target=android.app.ActivityThread$H obj={1.0 460mcc3mnc [zh_CN] ldltr sw360dp w360dp h622dp 480dpi nrml long port finger -keyb/v/h -nav/h appBounds=Rect(0, 0 - 1080, 1920) s.10mThemeChanged = 0mThemeChangedFlags = 0mFlipFont = 0}
E ANR_LOG : Current msg <7> = { when=-28s370ms what=118 target=android.app.ActivityThread$H obj={1.0 460mcc11mnc [zh_CN] ldltr sw360dp w360dp h622dp 480dpi nrml long port finger -keyb/v/h -nav/h appBounds=Rect(0, 0 - 1080, 1920) s.11mThemeChanged = 0mThemeChangedFlags = 0mFlipFont = 0} }
E ANR_LOG : Current msg <8> = { when=-27s821ms what=122 target=android.app.ActivityThread$H obj=BindServiceData{token=android.os.BinderProxy@f7611ff intent=Intent { cmp=com.android.contacts/.xxxAppServicesManagerClient }} }
E ANR_LOG : Current msg <9> = { when=-27s821ms what=116 target=android.app.ActivityThread$H obj=android.os.BinderProxy@f7611ff }
E ANR_LOG : Current msg <10> = { when=-27s654ms what=114 target=android.app.ActivityThread$H obj=CreateServiceData{token=android.os.BinderProxy@e23cf1b className=com.android.contacts.xxxAppServicesManagerClient packageName=com.android.contacts intent=null} }
E ANR_LOG : >>>CURRENT MSG DUMP OVER<<<

Common Scenario Logs Introduction

Screen Off Log

The key is power_screen_broadcast_done, screen_toggled and power_screen_state.

1
2
3
4
5
6
7
8
9
10
11
12
1810  2470 I intercept_power: [ACTION_DOWN,1,0]
1810 2470 I intercept_power: [ACTION_UP,0,0]
1810 2470 I power_sleep_requested: 0

1810 1810 I power_screen_state: [0,2,0,0,1254]
1810 1810 I screen_toggled: 0 // 0 means screen off, 1 means screen on
1810 1810 I power_screen_broadcast_send: 1
1810 1825 I wm_set_keyguard_shown: [1,0,0,setKeyguardShown] // (keyguardShowing|1),(aodShowing|1),(keyguardGoingAway|1),(Reason|3)

2768 2768 I sysui_status_bar_state: [1,1,0,0,0,1]

1810 1810 I power_screen_broadcast_done: [0,611,1] // 0 means screen off, 611 is screen off time

Screen On Log

The key is power_screen_broadcast_done, screen_toggled and power_screen_state.

Where power_screen_state:

  1. offOrOn|1|5
  2. becauseOfUser|1|5
  3. totalTouchDownTime|2|3
  4. touchCycles|1|1
  5. latency|1|3
1
2
3
4
5
6
1810  2470 I intercept_power: [ACTION_DOWN,1,0]
1810 2470 I intercept_power: [ACTION_UP,1,0]
1810 1810 I screen_toggled: 1 // 1 means screen on
1810 1810 I power_screen_broadcast_send: 1
1810 1810 I power_screen_state: [1,0,0,0,319]
1810 1810 I power_screen_broadcast_done: [1,633,1] // 1 means screen on, 633 is screen on time

Unlock Complete Log

Values of wm_set_keyguard_shown:

  1. (keyguardShowing|1): Whether Keyguard is showing
  2. (aodShowing|1): Whether aodShowing is showing
  3. (keyguardGoingAway|1): Keyguard disappearing
  4. (Reason|3): Reason
1
2
3
4
5
6
7
8
9
10
 1810  5711 I wm_set_keyguard_shown: [1,0,1,keyguardGoingAway]
// Launcher Resume
1810 5711 I wm_set_resumed_activity: [0,com.xxx.launcher/.Launcher,resumeTopActivityInnerLocked]
1810 5711 I wm_resume_activity: [0,93093883,63,com.xxx.launcher/.Launcher]
20615 20615 I wm_on_restart_called: [93093883,com.xxx.launcher.Launcher,performRestartActivity]
20615 20615 I wm_on_start_called: [93093883,com.xxx.launcher.Launcher,handleStartActivity]
20615 20615 I wm_on_resume_called: [93093883,com.xxx.launcher.Launcher,RESUME_ACTIVITY]
20615 20615 I wm_on_top_resumed_gained_called: [93093883,com.xxx.launcher.Launcher,topWhenResuming]

1810 5711 I wm_set_keyguard_shown: [0,0,1,setKeyguardShown] // keyguard not showing, aod not showing, keyguardGoingAway, reason is called setKeyguardShown to show desktop

Notification Panel Pull Down Log

1
2
3
4
5
6
2768  2768 I sysui_multi_action: [757,803,799,panel_open,802,1]
2768 2768 I sysui_multi_action: [757,1328,758,4,1326,29,1327,0,1329,0]
1810 3004 I notification_visibility: [-1|android|55|null|1000,1,457483196,457483196,0,4]
1810 3004 I notification_visibility: [0|com.android.systemui|10005|null|10132,1,97168545,97168545,0,2]
1810 3004 I notification_visibility: [0|com.android.systemui|2147483647|ranker_group|10132|ranker_group,1,97168236,97168236,0,1]
1810 3004 I notification_visibility: [-1|com.android.systemui|10006|null|10132,1,82586063,82586063,0,0]

Notification Panel Collapse Log

1
2
3
4
5
2768  2768 I sysui_multi_action: [757,111,758,2]
1810 5473 I notification_visibility: [-1|android|55|null|1000,0,457546516,457546516,0,4]
1810 5473 I notification_visibility: [0|com.android.systemui|10005|null|10132,0,97231865,97231865,0,2]
1810 5473 I notification_visibility: [0|com.android.systemui|2147483647|ranker_group|10132|ranker_group,0,97231557,97231557,0,1]
1810 5473 I notification_visibility: [-1|com.android.systemui|10006|null|10132,0,82649384,82649384,0,0]

App Launch Log

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// Create Task
1810 5473 I wm_task_created: [100,-1]
1810 5473 I wm_stack_created: 100
1810 5473 I wm_create_task: [0,100]

// Create Activity
1810 5473 I wm_create_activity: [0,231348670,100,com.androidperformance.memoryfix/.MainActivity,android.intent.action.MAIN,NULL,NULL,270532608]

// Launcher goes through pause flow
1810 5473 I wm_pause_activity: [0,93093883,com.xxx.launcher/.Launcher,userLeaving=true]
0615 20615 I wm_on_top_resumed_lost_called: [93093883,com.xxx.launcher.Launcher,topStateChangedWhenResumed]
0615 20615 I wm_on_paused_called: [93093883,com.xxx.launcher.Launcher,performPause]
1810 3720 I wm_add_to_stopping: [0,93093883,com.xxx.launcher/.Launcher,makeInvisible]

// Start Activity
1810 2045 I am_proc_start: [0,18803,10263,com.androidperformance.memoryfix,pre-top-activity,{com.androidperformance.memoryfix/com.androidperformance.memoryfix.MainActivity}]
1810 3428 I am_proc_bound: [0,18803,com.androidperformance.memoryfix]
1810 3428 I wm_restart_activity: [0,231348670,100,com.androidperformance.memoryfix/.MainActivity]
1810 3428 I wm_set_resumed_activity: [0,com.androidperformance.memoryfix/.MainActivity,minimalResumeActivityLocked]
8803 18803 I wm_on_create_called: [231348670,com.androidperformance.memoryfix.MainActivity,performCreate]
8803 18803 I wm_on_start_called: [231348670,com.androidperformance.memoryfix.MainActivity,handleStartActivity]
8803 18803 I wm_on_resume_called: [231348670,com.androidperformance.memoryfix.MainActivity,RESUME_ACTIVITY]
8803 18803 I wm_on_top_resumed_gained_called: [231348670,com.androidperformance.memoryfix.MainActivity,topStateChangedWhenResumed]
1810 2034 I wm_activity_launch_time: [0,231348670,com.androidperformance.memoryfix/.MainActivity,471]

// Launcher goes through stop flow
1810 1978 I wm_stop_activity: [0,93093883,com.xxx.launcher/.Launcher]
0615 20615 I wm_on_stop_called: [93093883,com.xxx.launcher.Launcher,STOP_ACTIVITY_ITEM]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// App launch from Launcher, focus change: Launcher => null => App
WindowManager: Changing focus from Window{b0416d7 u0 com.xxx.launcher/com.xxx.launcher.Launcher} to null,diplayid=0
WindowManager: Changing focus from null to Window{10f5145 u0 com.android.settings/com.android.settings.Settings},diplayid=0

// Return to Launcher from App, focus change: App => null => Launcher
WindowManager: Changing focus from Window{10f5145 u0 com.android.settings/com.android.settings.Settings} to null,diplayid=0
WindowManager: Changing focus from null to Window{b0416d7 u0 com.xxx.launcher/com.xxx.launcher.Launcher},diplayid=0

// Enter Lockscreen from App interface: focus change: App => null => Lockscreen
WindowManager: Changing focus from Window{10f5145 u0 com.android.settings/com.android.settings.Settings} to null,diplayid=0
WindowManager: Changing focus from null to Window{82e5f30 u0 NotificationShade},diplayid=0

// Unlock from Lockscreen interface to App, focus change: Lockscreen => App
WindowManager: Changing focus from Window{82e5f30 u0 NotificationShade} to Window{10f5145 u0 com.android.settings/com.android.settings.Settings},diplayid=0

ANR Related Information Sharing

  1. Xigua Video Stability Governance System Construction 1: Tailor Principle and Practice
  2. Xigua Video Stability Governance System Construction 2: Raphael Principle and Practice
  3. Xigua Video Stability Governance System Construction 3: Sliver Principle and Practice
  4. Xigua Freeze & ANR Optimization Governance and Monitoring System Construction
  5. Toutiao ANR Optimization Practice Series - Design Principle and Influencing Factors
  6. Toutiao ANR Optimization Practice Series - Monitoring Tools and Analysis Ideas
  7. Toutiao ANR Optimization Practice Series - Example Analysis Collection
  8. Toutiao ANR Optimization Practice Series - Barrier Causes Main Thread Deadlock
  9. Toutiao ANR Optimization Practice Series - Goodbye, SharedPreference Waiting
  10. Android ANR | Principle Analysis and Common Cases

Reference

  1. https://duanqz.github.io/2015-10-12-ANR-Analysis#1-%E6%A6%82%E8%A7%88
  2. https://duanqz.github.io/2015-10-12-ANR-Analysis
  3. http://gityuan.com/2016/12/02/app-not-response/
  4. http://gityuan.com/2017/01/01/input-anr/
  5. https://xiaozhuanlan.com/topic/5097486132

About Me && Blog

Below is a personal introduction and related links. I look forward to communicating with you all!

  1. Blogger Personal Introduction: Contains personal WeChat and WeChat group links.
  2. Blog Content Navigation: A navigation of personal blog content.
  3. Excellent Blog Articles Collected and Organized by Individuals - Must-Know for Android Performance Optimization: Welcome everyone to recommend yourself and recommend (WeChat private chat is fine)
  4. Android Performance Optimization Knowledge Planet: Welcome to join, thanks for support~

One person can go faster, a group of people can go further

CATALOG
  1. 1. ANR Analysis Methodology
    1. 1.1. General ANR Analysis Steps:
    2. 1.2. Distinguishing Between Application Problems and System Problems
      1. 1.2.1. First, Analyze Whether It’s an Application Problem
      2. 1.2.2. Analyze System State
    3. 1.3. Continue Analysis: Can Application Resolve It or System Resolve It?
  2. 2. ANR Log Content
    1. 2.1. Thread Details
    2. 2.2. Thread Stack
    3. 2.3. Thread State
      1. 2.3.1. Normal Main Thread Trace
      2. 2.3.2. Abnormal Main Thread Trace
    4. 2.4. CPU Usage Analysis
    5. 2.5. CPU Load
    6. 2.6. Memory Load
    7. 2.7. IO Load
    8. 2.8. Process Load
    9. 2.9. CPU Anomalous Process
  3. 3. Key System Log Introduction
    1. 3.1. Application Freezing
    2. 3.2. ActivityManager : Slow operation
    3. 3.3. Looper : Slow dispatch
    4. 3.4. Looper : Slow delivery
    5. 3.5. Looper:Slow Looper
    6. 3.6. onTrimMemory
    7. 3.7. dvm_lock_sample
    8. 3.8. binder_sample
    9. 3.9. Long monitor contention
    10. 3.10. Binder Thread
    11. 3.11. am_kill
      1. 3.11.1. force-stop
      2. 3.11.2. Abnormal Process Kill
      3. 3.11.3. Active Process Kill
      4. 3.11.4. Schedueled Process Kill
      5. 3.11.5. Other Process Kill
    12. 3.12. am_proc_died
    13. 3.13. am_anr
    14. 3.14. Lowmemorykiller
    15. 3.15. Blocked msg
  4. 4. Common Scenario Logs Introduction
    1. 4.1. Screen Off Log
    2. 4.2. Screen On Log
    3. 4.3. Unlock Complete Log
    4. 4.4. Notification Panel Pull Down Log
    5. 4.5. Notification Panel Collapse Log
    6. 4.6. App Launch Log
    7. 4.7. Window Focus Related Flows
  5. 5. ANR Related Information Sharing
  6. 6. Reference
  7. 7. About Me && Blog