Hi,Kyson
2 min read · Jan 30, 2021


## Overall flow

Systrace involves three parts:

1. The app process
2. The PC script
3. The Android system service

![systrace_framework](./systrace_framework.png)

## PC script

`systrace.py` delegates to `systrace\catapult\systrace\systrace\run_systrace.py`, whose `main_impl` starts and stops tracing through a controller:

```python
def main_impl(arguments):
    # ... (elided)
    controller.StartTracing()
    # ... (elided)
    controller.StopTracing()
    # ... (elided)
```

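`systrace\catapult\systrace\systrace\tracing_controller.py` starts the controller agent and the child agents.

The set of child agents is fixed. The applicable modules are selected from this list based on the arguments passed in: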

```python
AGENT_MODULES = [android_process_data_agent, android_cgroup_agent,
                 atrace_agent, atrace_from_file_agent, atrace_process_dump,
                 ftrace_agent, walt_agent]
```
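The selection step can be sketched as follows. This is a simplified, hypothetical model, not catapult's code. It assumes each module exposes a `try_create_agent(options)` hook that returns `None` when the options don't apply to it. `AgentModule`, `AgentWithConfig` and `create_agents_with_config` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AgentModule:
    """Hypothetical stand-in for one entry of AGENT_MODULES."""
    name: str
    # Returns an agent object, or None if the options don't enable this module.
    try_create_agent: Callable[[dict], Optional[object]]


@dataclass
class AgentWithConfig:
    agent: object
    config: dict


def create_agents_with_config(options: dict,
                              modules: List[AgentModule]) -> List[AgentWithConfig]:
    """Keep only the modules that produce an agent for the given options."""
    result = []
    for module in modules:
        agent = module.try_create_agent(options)
        if agent is not None:
            result.append(AgentWithConfig(agent, options))
    return result


# Illustrative modules: atrace runs only when categories are requested,
# the process-data agent always runs, walt only when explicitly asked for.
modules = [
    AgentModule("atrace_agent",
                lambda opts: "atrace" if opts.get("categories") else None),
    AgentModule("android_process_data_agent",
                lambda opts: "process_data"),
    AgentModule("walt_agent",
                lambda opts: "walt" if opts.get("walt") else None),
]
```

With `{"categories": ["gfx"]}` only the atrace and process-data agents survive. With empty options only the process-data agent does.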

```python
def StartTracing(self):
    # The controller agent must start first; otherwise give up.
    if not self._controller_agent.StartAgentTracing(
            self._controller_config,
            timeout=self._controller_config.timeout):
        print('Unable to start controller tracing agent.')
        return False

    # Start every child agent and remember the ones that succeeded.
    succ_agents = []
    for agent_and_config in self._child_agents_with_config:
        agent = agent_and_config.agent
        config = agent_and_config.config
        if agent.StartAgentTracing(config,
                                   timeout=self._controller_config.timeout):
            succ_agents.append(agent)
    # ... (remainder elided)
```
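Here is a self-contained sketch of the same start-up logic, using a hypothetical `FakeAgent` class and `start_tracing` function. The controller agent must start, or nothing else runs. Each child agent is started independently, and only the ones that succeed are kept.

```python
from typing import List, Optional


class FakeAgent:
    """Hypothetical agent whose start either succeeds or fails."""

    def __init__(self, name: str, ok: bool = True):
        self.name = name
        self.ok = ok

    def StartAgentTracing(self, config, timeout=None) -> bool:
        return self.ok


def start_tracing(controller_agent: FakeAgent,
                  child_agents: List[FakeAgent],
                  config=None,
                  timeout: float = 10.0) -> Optional[List[FakeAgent]]:
    """Return the started child agents, or None if the controller failed."""
    if not controller_agent.StartAgentTracing(config, timeout=timeout):
        print('Unable to start controller tracing agent.')
        return None
    succ_agents = []
    for agent in child_agents:
        if agent.StartAgentTracing(config, timeout=timeout):
            succ_agents.append(agent)
        else:
            print('Agent %s not started.' % agent.name)
    return succ_agents
```

Failing child agents are simply skipped, so one misbehaving agent does not abort the whole capture.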

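`self._controller_agent.StartAgentTracing` appears to do little more than record a log. The real work happens in the individual agents.

`android_process_data_agent` takes a snapshot of the running processes with this command: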

```
ps -A -o USER,PID,PPID,VSIZE,RSS,WCHAN,ADDR=PC,S,NAME,COMM && ps -AT -o USER,PID,TID,CMD
```
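The thread listing from the second `ps` command is plain text, so turning it into a PID-to-threads map is straightforward. Below is a minimal sketch using a hypothetical helper, `parse_ps_threads`. It assumes the `USER PID TID CMD` column order requested above and whitespace-separated fields. The sample output is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def parse_ps_threads(output: str) -> Dict[int, List[int]]:
    """Parse `ps -AT -o USER,PID,TID,CMD` output into {pid: [tid, ...]}."""
    threads: Dict[int, List[int]] = defaultdict(list)
    lines = output.strip().splitlines()
    for line in lines[1:]:            # skip the header row
        fields = line.split(None, 3)  # CMD may itself contain spaces
        if len(fields) < 3:
            continue
        pid, tid = int(fields[1]), int(fields[2])
        threads[pid].append(tid)
    return dict(threads)


sample = """USER     PID   TID CMD
root       1     1 init
u0_a123 4242  4242 com.example.app
u0_a123 4242  4250 RenderThread
"""
```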

`atrace_agent` starts atrace asynchronously on the device:

```python
@py_utils.Timeout(tracing_agents.START_STOP_TIMEOUT)
def StartAgentTracing(self, config, timeout=None):
    # Build the atrace command line from the config and requested categories.
    self._tracer_args = _construct_atrace_args(config, self._categories)
    # Run `atrace ... --async_start` on the device over adb.
    self._device_utils.RunShellCommand(
        self._tracer_args + ['--async_start'], check_return=True)
    return True
```

This runs the `atrace` command, located at `/system/bin/atrace` on the device:

```python
def _construct_atrace_args(config, categories):
    ATRACE_BASE_ARGS = ['atrace']
    atrace_args = ATRACE_BASE_ARGS[:]
    # ... flags derived from config and categories are appended here (elided)
    return atrace_args
```
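The full argument list that ends up on the device can be sketched with a simplified, hypothetical builder. The `TraceConfig` fields are assumptions. The flags themselves are standard atrace options: `-a` for the app, `-b` for the buffer size in KB, and `--async_start`. Categories are passed as positional arguments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TraceConfig:
    """Hypothetical, trimmed-down version of the options systrace collects."""
    categories: List[str] = field(default_factory=list)
    app: Optional[str] = None        # package name passed to `atrace -a`
    buffer_kb: Optional[int] = None  # buffer size passed to `atrace -b`


ATRACE_BASE_ARGS = ['atrace']


def construct_atrace_args(config: TraceConfig) -> List[str]:
    args = ATRACE_BASE_ARGS[:]       # copy so the base list is never mutated
    if config.buffer_kb:
        args += ['-b', str(config.buffer_kb)]
    if config.app:
        args += ['-a', config.app]
    args += config.categories        # categories are positional arguments
    return args


cfg = TraceConfig(categories=['gfx', 'view'], app='com.example.app', buffer_kb=4096)
start_cmd = construct_atrace_args(cfg) + ['--async_start']
```

On the host this list is handed to `adb shell`. The matching stop step uses `--async_stop`.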

atrace itself runs on the Android device. Source: https://cs.android.com/android/platform/superproject/+/master:frameworks/native/cmds/atrace/atrace.rc

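## Android system service

`atrace.rc` mainly performs initialization when the device boots. This is mostly setting permissions on the tracing files, switching tracing off initially, and so on.

The core logic is in `frameworks/native/cmds/atrace/atrace.cpp`. Its `main` sets up the various kinds of tracing and then starts the trace.

One thing to note: `g_debugAppCmdLine` holds the app to debug. Set it (with `-a`) to the package name of the app you want to trace.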

```cpp
int main(int argc, char **argv)
{
    // ... (declarations elided)
    for (;;) {
        // ... ret = getopt_long(...) (elided)
        switch (ret) {
            case 'a':
                g_debugAppCmdLine = optarg;
                break;
            // ... other options elided
        }
    }
    // ...
    if (traceStart) {
        ok &= setUpUserspaceTracing();
    }

    if (ok && traceStart && !onlyUserspace) {
        ok &= setUpKernelTracing();
        ok &= setUpVendorTracing();
        ok &= startTrace();
    }

    if (ok && traceStart) {
        if (!traceStream && !onlyUserspace) {
            printf("capturing trace...");
            fflush(stdout);
        }
        // ... (elided)
    }
    // ...
}
```
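The `case 'a'` branch is ordinary `getopt` option parsing. The same idea can be sketched with Python's `getopt` module. Only a few of atrace's options are modeled here (`-a`, `-b`, `-t`, plus the `--async_start` long option). `parse_atrace_args` and its dictionary keys are hypothetical names that loosely mirror atrace's globals.

```python
import getopt
from typing import Dict, List


def parse_atrace_args(argv: List[str]) -> Dict[str, object]:
    """Parse a small subset of atrace's command line (illustrative only)."""
    # gnu_getopt allows options to appear after positional categories.
    opts, categories = getopt.gnu_getopt(argv, 'a:b:t:', ['async_start'])
    parsed: Dict[str, object] = {
        'debug_app_cmdline': '',   # plays the role of g_debugAppCmdLine
        'async': False,
        'categories': categories,  # remaining positional arguments
    }
    for opt, value in opts:
        if opt == '-a':
            parsed['debug_app_cmdline'] = value
        elif opt == '-b':
            parsed['buffer_kb'] = int(value)
        elif opt == '-t':
            parsed['seconds'] = int(value)
        elif opt == '--async_start':
            parsed['async'] = True
    return parsed
```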

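`g_debugAppCmdLine` is used during setup. It is merged with the core system services into a single comma-separated package list. That list is stored in system properties, where the tracing code reads it later (`Trace_nativeGetEnabledTags`).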

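```cpp
static bool setUpUserspaceTracing()
{
    bool ok = true;
    // ... (tag setup and coreServicesTagEnabled computation elided)

    // Merge the app given with -a and the core services into one list.
    std::string packageList(g_debugAppCmdLine);
    if (coreServicesTagEnabled) {
        if (!packageList.empty()) {
            packageList += ",";
        }
        packageList += android::base::GetProperty(k_coreServicesProp, "");
    }
    ok &= setAppCmdlineProperty(&packageList[0]);
    // ... (elided)
    return ok;
}
```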
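Here is a rough sketch of the merge above and of what `setAppCmdlineProperty` does with the result: it splits the list on commas and stores one property per entry. The property names `debug.atrace.app_number` and `debug.atrace.app_<n>` are from my recollection of AOSP and should be treated as an assumption. The sketch returns a dict instead of calling the real property API, and the core-services value used in the test is made up.

```python
from typing import Dict


def merge_package_list(debug_app: str, core_services: str,
                       core_services_enabled: bool) -> str:
    """Same merge as setUpUserspaceTracing: the -a app first, then core services."""
    package_list = debug_app
    if core_services_enabled:
        if package_list:
            package_list += ','
        package_list += core_services
    return package_list


def app_cmdline_properties(package_list: str) -> Dict[str, str]:
    """Assumed layout: one property per package plus a count property."""
    packages = [p for p in package_list.split(',') if p]
    props = {'debug.atrace.app_%d' % i: p for i, p in enumerate(packages)}
    props['debug.atrace.app_number'] = str(len(packages))
    return props
```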

`startTrace` simply writes `1` to the `tracing_on` file (`static const char* k_tracingOnPath = "tracing_on";`), which is the ftrace on/off switch.
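Here is a minimal sketch of that switch using hypothetical helpers, `set_tracing_on` and `is_tracing_on`. It assumes the tracing directory is passed in. On a device this would be the tracefs directory (for example `/sys/kernel/tracing`), and writing there needs root. The test points it at a temporary directory instead.

```python
import os


def set_tracing_on(tracing_dir: str, enable: bool) -> None:
    """Write '1' (enable=True, as startTrace does) or '0' to <tracing_dir>/tracing_on."""
    path = os.path.join(tracing_dir, 'tracing_on')
    with open(path, 'w') as f:
        f.write('1' if enable else '0')


def is_tracing_on(tracing_dir: str) -> bool:
    """Read the switch back; ftrace reports '1' when tracing is enabled."""
    with open(os.path.join(tracing_dir, 'tracing_on')) as f:
        return f.read().strip() == '1'
```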

## App process

`android\os\Trace.java`

```java
public static void beginSection(@NonNull String sectionName) {
    if (isTagEnabled(TRACE_TAG_APP)) {
        if (sectionName.length() > MAX_SECTION_NAME_LEN) {
            throw new IllegalArgumentException("sectionName is too long");
        }
        nativeTraceBegin(TRACE_TAG_APP, sectionName);
    }
}
```

In release builds `isTagEnabled(TRACE_TAG_APP)` is false by default, so it has to be forced on here.
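`isTagEnabled` boils down to a bitmask test against the set of enabled tags. Here is a sketch of that check. `TRACE_TAG_APP` (`1 << 12`) and `TRACE_TAG_VIEW` (`1 << 3`) match the constants in `android.os.Trace`. The `is_tag_enabled` helper and the masks used in the test are illustrative.

```python
TRACE_TAG_VIEW = 1 << 3
TRACE_TAG_APP = 1 << 12


def is_tag_enabled(enabled_tags: int, tag: int) -> bool:
    """True if any bit of `tag` is set in the enabled-tags mask."""
    return (enabled_tags & tag) != 0
```

In other words, "forcing the tag on" means getting the `TRACE_TAG_APP` bit into the mask the process sees.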

`frameworks/base/core/jni/android_os_Trace.cpp`

```cpp
static void android_os_Trace_nativeTraceBegin(JNIEnv* env, jclass,
        jlong tag, jstring nameStr) {
    withString(env, nameStr, [tag](char* str) {
        atrace_begin(tag, str);
    });
}
```

`system/core/libcutils/include/cutils/trace.h`

```cpp
#define ATRACE_BEGIN(name) atrace_begin(ATRACE_TAG, name)

static inline void atrace_begin(uint64_t tag, const char* name)
{
    if (CC_UNLIKELY(atrace_is_tag_enabled(tag))) {
        void atrace_begin_body(const char*);
        atrace_begin_body(name);
    }
}
```

`system/core/libcutils/trace-dev.cpp`

```cpp
void atrace_begin_body(const char* name)
{
    WRITE_MSG("B|%d|", "%s", name, "");
}
```

```cpp
#define WRITE_MSG(format_begin, format_end, name, value) { \
    char buf[ATRACE_MESSAGE_LENGTH] __attribute__((uninitialized)); \
    int pid = getpid(); \
    int len = snprintf(buf, sizeof(buf), format_begin "%s" format_end, pid, \
        name, value); \
    if (len >= (int) sizeof(buf)) { \
        /* Given the sizeof(buf), and all of the current format buffers, \
         * it is impossible for name_len to be < 0 if len >= sizeof(buf). */ \
        int name_len = strlen(name) - (len - sizeof(buf)) - 1; \
        /* Truncate the name to make the message fit. */ \
        ALOGW("Truncated name in %s: %s\n", __FUNCTION__, name); \
        len = snprintf(buf, sizeof(buf), format_begin "%.*s" format_end, pid, \
            name_len, name, value); \
    } \
    write(atrace_marker_fd, buf, len); \
}
```
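The formatting and truncation logic of `WRITE_MSG` translates directly into a few lines of Python, which makes the arithmetic easy to check. This is a sketch only, with hypothetical helpers `format_trace_msg` and `begin_msg`. The 1024-byte default stands in for `ATRACE_MESSAGE_LENGTH`, so treat that value as an assumption. Lengths are counted in characters rather than bytes, and the functions return the string instead of writing it to `atrace_marker_fd`.

```python
def format_trace_msg(format_begin: str, format_end: str, pid: int,
                     name: str, value: str, buf_size: int = 1024) -> str:
    """Build a message like WRITE_MSG does, truncating the name so the result
    fits in a C buffer of buf_size bytes (including the terminating NUL)."""
    msg = (format_begin % pid) + name + (format_end % value)
    if len(msg) >= buf_size:
        # Same formula as the macro: drop the overflow plus 1 byte for the NUL.
        name_len = len(name) - (len(msg) - buf_size) - 1
        msg = (format_begin % pid) + name[:name_len] + (format_end % value)
    return msg


def begin_msg(pid: int, name: str, buf_size: int = 1024) -> str:
    # atrace_begin_body calls WRITE_MSG("B|%d|", "%s", name, "")
    return format_trace_msg("B|%d|", "%s", pid, name, "", buf_size)
```

A 2000-character name with pid 1 produces a 2004-character message. The formula keeps 2000 - 980 - 1 = 1019 characters of the name, so the final message is 1023 characters, exactly what fits next to the NUL in a 1024-byte buffer.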

The `atrace_marker_fd` file descriptor comes from here:

`system/core/libcutils/trace-dev.cpp`

```cpp
static void atrace_init_once()
{
    atrace_marker_fd = open("/sys/kernel/tracing/trace_marker", O_WRONLY | O_CLOEXEC);
    if (atrace_marker_fd == -1) {
        atrace_marker_fd = open("/sys/kernel/debug/tracing/trace_marker", O_WRONLY | O_CLOEXEC);
    }

    if (atrace_marker_fd == -1) {
        ALOGE("Error opening trace file: %s (%d)", strerror(errno), errno);
        atrace_enabled_tags = 0;
    } else {
        atrace_enabled_tags = atrace_get_property();
    }
}
```
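The open-with-fallback pattern in `atrace_init_once` is easy to reproduce. Here is a sketch with a hypothetical `open_trace_marker` helper. The candidate paths are a parameter so the code can run off-device. On Android they would be the two `trace_marker` paths above, and opening them requires the right permissions. Returning `-1` mirrors the C code, which then sets `atrace_enabled_tags` to 0.

```python
import os
from typing import Sequence

TRACE_MARKER_PATHS = (
    "/sys/kernel/tracing/trace_marker",        # tracefs mount point
    "/sys/kernel/debug/tracing/trace_marker",  # tracefs under debugfs
)


def open_trace_marker(paths: Sequence[str] = TRACE_MARKER_PATHS) -> int:
    """Return a write-only fd for the first path that opens, or -1 on failure."""
    flags = os.O_WRONLY | getattr(os, "O_CLOEXEC", 0)
    for path in paths:
        try:
            return os.open(path, flags)
        except OSError:
            continue  # try the next candidate, like the C fallback
    return -1
```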

So recording an app trace event really just means writing a line of text to the `trace_marker` file.
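On the app side, then, a begin/end pair amounts to two writes to that file. Here is a sketch as a hypothetical Python context manager, `trace_section`, that writes to any file-like object. The `B|<pid>|<name>` format comes from `atrace_begin_body` above. The end-record format `E|<pid>` is an assumption based on recent AOSP versions. On a real `trace_marker` each `write` becomes its own record, whereas the `StringIO` stand-in used in the test simply concatenates them.

```python
import io
import os
from contextlib import contextmanager
from typing import Optional


@contextmanager
def trace_section(marker, name: str, pid: Optional[int] = None):
    """Write a begin record, run the body, then write an end record."""
    pid = os.getpid() if pid is None else pid
    marker.write("B|%d|%s" % (pid, name))  # same text atrace_begin_body writes
    try:
        yield
    finally:
        marker.write("E|%d" % pid)         # assumed end-record format


buf = io.StringIO()                        # stand-in for the trace_marker file
with trace_section(buf, "inflate", pid=1234):
    pass
```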

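## Why the overhead is low

Each ATrace event costs roughly 1–10 µs. That is about one JNI call, one string formatting step, and the round trip of one system call (the `write`).

A purely user-space recording approach, by contrast, would typically look like this: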

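```java
public static void testMethod() {
    long startTime = System.nanoTime();
    // ... method body ...
    long endTime = System.nanoTime();
    // Record the method's start and end times. This may require creating objects,
    // and supporting multiple threads means calling Thread.currentThread().
    // record() is a hypothetical helper.
    record("testMethod", startTime, endTime);
}
```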

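`System.nanoTime()` alone costs a JNI call and a `__clock_gettime` system call. Add the supporting data structures, classes and multithreading support, and the overhead grows further.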
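A quick back-of-the-envelope calculation puts the 1–10 µs per-event figure in perspective against a 60 fps frame budget of about 16.7 ms. It counts two events (begin and end) per section. The workload of 200 sections per frame is made up, not a measurement. At that rate, tracing costs about 0.4 ms per frame at 1 µs per event and about 4 ms at 10 µs per event. `tracing_cost_ms` is a hypothetical helper.

```python
FRAME_BUDGET_MS = 1000 / 60  # about 16.67 ms per frame at 60 fps


def tracing_cost_ms(sections: int, per_event_us: float) -> float:
    """Cost of `sections` begin/end pairs, i.e. 2 events per section, in ms."""
    return sections * 2 * per_event_us / 1000


for per_event_us in (1, 10):
    cost = tracing_cost_ms(200, per_event_us)
    print("%2d us/event -> %.1f ms (%.0f%% of a frame)"
          % (per_event_us, cost, 100 * cost / FRAME_BUDGET_MS))
```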

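## Automated testing

Open a shell on the device and capture a trace for a specific app (here `com.xunmeng.pinduoduo`), writing it to `/data/trace.txt`:

```
adb shell
atrace -o /data/trace.txt -a com.xunmeng.pinduoduo
```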
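For automation, the same command can be driven from the host. Here is a minimal sketch with hypothetical helpers `build_atrace_cmd` and `run_atrace`. It assumes `adb` is on `PATH` and a device is connected. `-t` sets the capture duration in seconds. Writing under `/data` may require root.

```python
import subprocess
from typing import List, Optional


def build_atrace_cmd(package: str, out_path: str = "/data/trace.txt",
                     seconds: Optional[int] = None,
                     categories: Optional[List[str]] = None) -> List[str]:
    """Build the host-side `adb shell atrace ...` command line."""
    cmd = ["adb", "shell", "atrace", "-o", out_path, "-a", package]
    if seconds is not None:
        cmd += ["-t", str(seconds)]
    cmd += categories or []
    return cmd


def run_atrace(package: str, **kwargs) -> None:
    # Blocks until atrace exits; fetch the result afterwards with `adb pull`.
    subprocess.run(build_atrace_cmd(package, **kwargs), check=True)
```

Keeping command construction separate from execution makes the command line easy to log or unit-test without a device attached.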

What the individual ftrace files mean: https://hotttao.github.io/2020/01/03/linux_perf/03_ftrace/
