## Overall flow

The flow has three parts:

1. The app process
2. The PC script
3. The Android system service
![systrace_framework](./systrace_framework.png)
## PC script
`systrace.py` is set up to run `systrace\catapult\systrace\systrace\run_systrace.py`:
```python
def main_impl(arguments):
    ...
    controller.StartTracing()
    ...
    controller.StopTracing()
```
systrace\catapult\systrace\systrace\tracing_controller.py
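This file starts the controller agent and the child agents.
The set of child agents is fixed; the modules that actually run are selected from this list according to the arguments passed in:

    AGENT_MODULES = [android_process_data_agent, android_cgroup_agent,
                     atrace_agent, atrace_from_file_agent, atrace_process_dump,
                     ftrace_agent, walt_agent]
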
```python
def StartTracing(self):
    # The controller agent must start successfully before anything else.
    if not self._controller_agent.StartAgentTracing(
            self._controller_config,
            timeout=self._controller_config.timeout):
        print 'Unable to start controller tracing agent.'
        return False
    # Start each child agent and remember the ones that succeeded.
    succ_agents = []
    for agent_and_config in self._child_agents_with_config:
        agent = agent_and_config.agent
        config = agent_and_config.config
        if agent.StartAgentTracing(config,
                                   timeout=self._controller_config.timeout):
            succ_agents.append(agent)
```
`self._controller_agent.StartAgentTracing` appears to only write a log entry; the real work happens in the individual agents.
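The control flow above can be sketched in a few lines. This is only an illustration, not catapult's code: `FakeAgent` and `start_all` are hypothetical names, and the fake agents do nothing except report whether they "started".

```python
from dataclasses import dataclass, field


@dataclass
class FakeAgent:
    """Hypothetical stand-in for a tracing agent."""
    name: str
    will_start: bool = True
    started: bool = field(default=False, init=False)

    def StartAgentTracing(self, config, timeout=None):
        # A real agent would launch its tracer here.
        self.started = self.will_start
        return self.will_start


def start_all(controller, children, timeout=10):
    """Start the controller agent first, then every child agent.

    Returns the child agents that started, or None when the controller
    agent itself fails (the child agents are then left untouched).
    """
    if not controller.StartAgentTracing(None, timeout=timeout):
        return None
    succ_agents = []
    for agent, config in children:
        if agent.StartAgentTracing(config, timeout=timeout):
            succ_agents.append(agent)
    return succ_agents


controller = FakeAgent("controller")
children = [(FakeAgent("atrace"), {}), (FakeAgent("walt", will_start=False), {})]
print([a.name for a in start_all(controller, children)])  # prints ['atrace']
```

A child agent that fails to start is simply left out of the result, while a controller failure aborts the whole start.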
`android_process_data_agent` takes a process snapshot with this command:
```
"ps -A -o USER,PID,PPID,VSIZE,RSS,WCHAN,ADDR=PC,S,NAME,COMM" \
"&& ps -AT -o USER,PID,TID,CMD"
```
atrace_agent
```python
@py_utils.Timeout(tracing_agents.START_STOP_TIMEOUT)
def StartAgentTracing(self, config, timeout=None):
    ...
    self._tracer_args = _construct_atrace_args(config,
                                               self._categories)
    # Start atrace on the device asynchronously.
    self._device_utils.RunShellCommand(
        self._tracer_args + ['--async_start'], check_return=True)
    return True
```
This executes the `atrace` command, whose path on the device is `/system/bin/atrace`:
```python
def _construct_atrace_args(config, categories):
    ...
    ATRACE_BASE_ARGS = ['atrace']
    atrace_args = ATRACE_BASE_ARGS[:]
    ...
    return atrace_args
```
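The elided body of `_construct_atrace_args` is not reproduced here. As a rough sketch of what such a builder might look like, the code below maps a hypothetical `AtraceConfig` (its field names are my own, not catapult's) onto atrace's `-z`, `-t`, `-b`, and `-a` options and then appends the categories.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AtraceConfig:
    """Hypothetical config object; field names are illustrative only."""
    compress: bool = False
    trace_time_s: Optional[int] = None
    buf_size_kb: Optional[int] = None
    app_name: Optional[str] = None


def build_atrace_args(config: AtraceConfig, categories: List[str]) -> List[str]:
    args = ["atrace"]
    if config.compress:
        args.append("-z")                          # compress the trace dump
    if config.trace_time_s:
        args += ["-t", str(config.trace_time_s)]   # trace duration in seconds
    if config.buf_size_kb:
        args += ["-b", str(config.buf_size_kb)]    # trace buffer size in KB
    if config.app_name:
        args += ["-a", config.app_name]            # enable app-level tracing
    return args + list(categories)


args = build_atrace_args(AtraceConfig(app_name="com.example.app"), ["sched", "gfx"])
print(args + ["--async_start"])
```

The final `--async_start` is appended by the caller, as in `StartAgentTracing` above, so the same argument list can be reused for the stop and dump steps.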
`atrace` runs on the Android device. Source: https://cs.android.com/android/platform/superproject/+/master:frameworks/native/cmds/atrace/atrace.rc
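## Android system service
`atrace.rc` mainly performs initialization when the Android device boots: setting file permissions, turning the trace switches off initially, and so on.
The important part is `frameworks/native/cmds/atrace/atrace.cpp`.
After configuring the various trace settings, it starts tracing.
One point to note: `g_debugAppCmdLine` (set by the `-a` option) names the app to debug, and it must be set to the package name of the app under investigation.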
```cpp
int main(int argc, char **argv)
{
    ...
    for (;;) {
        switch(ret) {
            case 'a':
                // -a: command line (package name) of the app to trace
                g_debugAppCmdLine = optarg;
                break;
        }
    }
    ...
    if (traceStart) {
        ok &= setUpUserspaceTracing();
    }
    if (ok && traceStart && !onlyUserspace) {
        ok &= setUpKernelTracing();
        ok &= setUpVendorTracing();
        ok &= startTrace();
    }
    if (ok && traceStart) {
        if (!traceStream && !onlyUserspace) {
            printf("capturing trace...");
            fflush(stdout);
        }
    }
    ...
}
```
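`g_debugAppCmdLine` is consumed during setup: it is merged with the system's core services into a single list of package names and stored in system properties, where the tracing code reads it later (`Trace_nativeGetEnabledTags`).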
```cpp
static bool setUpUserspaceTracing()
{
    ...
    std::string packageList(g_debugAppCmdLine);
    if (coreServicesTagEnabled) {
        if (!packageList.empty()) {
            packageList += ",";
        }
        packageList += android::base::GetProperty(k_coreServicesProp, "");
    }
    ok &= setAppCmdlineProperty(&packageList[0]);
    ...
    return ok;
}
```
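The merge step is easy to misread because of the conditional comma, so here is the same logic as a small Python sketch. `merge_package_list` is a hypothetical name, and the core-service names in the example are made up for illustration.

```python
def merge_package_list(debug_app_cmdline: str, core_services: str,
                       core_services_tag_enabled: bool) -> str:
    """Combine the -a app list with the core services, comma separated."""
    package_list = debug_app_cmdline
    if core_services_tag_enabled:
        # Only add a separator when there is already something in the list.
        if package_list:
            package_list += ","
        package_list += core_services
    return package_list


print(merge_package_list("com.example.app", "system_server,surfaceflinger", True))
print(merge_package_list("", "system_server,surfaceflinger", True))
print(merge_package_list("com.example.app", "system_server,surfaceflinger", False))
```

With the core-services tag disabled, only the app passed via `-a` ends up in the list.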
`startTrace` simply writes `1` to the file named by `static const char* k_tracingOnPath = "tracing_on";`, which is the ftrace on/off mechanism.
## App process
android\os\Trace.java
```java
public static void beginSection(@NonNull String sectionName) {
    if (isTagEnabled(TRACE_TAG_APP)) {
        if (sectionName.length() > MAX_SECTION_NAME_LEN) {
            throw new IllegalArgumentException("sectionName is too long");
        }
        nativeTraceBegin(TRACE_TAG_APP, sectionName);
    }
}
```
`isTagEnabled(TRACE_TAG_APP)` is `false` by default in release builds, so the tag has to be force-enabled here.
frameworks/base/core/jni/android_os_Trace.cpp
```cpp
static void android_os_Trace_nativeTraceBegin(JNIEnv* env, jclass,
        jlong tag, jstring nameStr) {
    withString(env, nameStr, [tag](char* str) {
        atrace_begin(tag, str);
    });
}
```
system/core/libcutils/include/cutils/trace.h
```cpp
#define ATRACE_BEGIN(name) atrace_begin(ATRACE_TAG, name)
static inline void atrace_begin(uint64_t tag, const char* name)
{
    if (CC_UNLIKELY(atrace_is_tag_enabled(tag))) {
        void atrace_begin_body(const char*);
        atrace_begin_body(name);
    }
}
```
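The guard in `atrace_begin` is what keeps disabled tags cheap. A minimal sketch of the idea, assuming the enabled tags are held as a bitmask and that the app tag is bit 12 (neither detail is shown in the excerpt above); `is_tag_enabled` and `begin` are hypothetical names:

```python
TRACE_TAG_APP = 1 << 12  # assumed bit position for the app tag


def is_tag_enabled(enabled_tags: int, tag: int) -> bool:
    """True when the tag's bit is set in the enabled-tags mask."""
    return (enabled_tags & tag) != 0


def begin(enabled_tags: int, tag: int, name: str, events: list) -> None:
    # Mirrors the guard in atrace_begin: nothing is formatted or written
    # unless the tag is enabled.
    if is_tag_enabled(enabled_tags, tag):
        events.append(name)


events = []
begin(0, TRACE_TAG_APP, "skipped", events)
begin(TRACE_TAG_APP, TRACE_TAG_APP, "recorded", events)
print(events)  # prints ['recorded']
```

When tracing is off, the cost of a `begin` call is one mask test.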
system/core/libcutils/trace-dev.cpp
```cpp
void atrace_begin_body(const char* name)
{
    WRITE_MSG("B|%d|", "%s", name, "");
}
```
```cpp
#define WRITE_MSG(format_begin, format_end, name, value) { \
    char buf[ATRACE_MESSAGE_LENGTH] __attribute__((uninitialized)); \
    int pid = getpid(); \
    int len = snprintf(buf, sizeof(buf), format_begin "%s" format_end, pid, \
        name, value); \
    if (len >= (int) sizeof(buf)) { \
        /* Given the sizeof(buf), and all of the current format buffers, \
         * it is impossible for name_len to be < 0 if len >= sizeof(buf). */ \
        int name_len = strlen(name) - (len - sizeof(buf)) - 1; \
        /* Truncate the name to make the message fit. */ \
        ALOGW("Truncated name in %s: %s\n", __FUNCTION__, name); \
        len = snprintf(buf, sizeof(buf), format_begin "%.*s" format_end, pid, \
            name_len, name, value); \
    } \
    write(atrace_marker_fd, buf, len); \
}
```
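The truncation arithmetic in the macro is easier to follow with concrete numbers. The sketch below reproduces it in Python for the begin event only. `format_begin_message` is a hypothetical helper, the 1024 value for `ATRACE_MESSAGE_LENGTH` is an assumption, and Python counts characters where the C code counts bytes, so the two agree only for ASCII names.

```python
ATRACE_MESSAGE_LENGTH = 1024  # assumed buffer size


def format_begin_message(pid: int, name: str,
                         limit: int = ATRACE_MESSAGE_LENGTH) -> str:
    """Build "B|<pid>|<name>", shortening the name so the message fits."""
    msg = f"B|{pid}|{name}"
    if len(msg) >= limit:
        # Same arithmetic as the macro: drop the overflow plus one more
        # character, since snprintf reserves a byte for the trailing NUL.
        name_len = max(len(name) - (len(msg) - limit) - 1, 0)
        msg = f"B|{pid}|{name[:name_len]}"
    return msg


print(format_begin_message(42, "draw"))
print(len(format_begin_message(1234, "x" * 2000)))
```

For pid 1234 the prefix `B|1234|` takes 7 characters, so a 2000-character name is cut to 1016 characters and the message ends up 1023 long, one short of the buffer size.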
The `atrace_marker_fd` file descriptor comes from here:
system/core/libcutils/trace-dev.cpp
```cpp
static void atrace_init_once()
{
    atrace_marker_fd = open("/sys/kernel/tracing/trace_marker", O_WRONLY | O_CLOEXEC);
    if (atrace_marker_fd == -1) {
        atrace_marker_fd = open("/sys/kernel/debug/tracing/trace_marker", O_WRONLY | O_CLOEXEC);
    }
    if (atrace_marker_fd == -1) {
        ALOGE("Error opening trace file: %s (%d)", strerror(errno), errno);
        atrace_enabled_tags = 0;
    } else {
        atrace_enabled_tags = atrace_get_property();
    }
}
```
So recording an app trace event really just means writing a log line to the `trace_marker` file.
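To make the "one write per event" idea concrete, here is a small sketch that pairs a begin and an end write around a block of work. `RecordingMarker` is a hypothetical stand-in for the `trace_marker` fd so the example runs anywhere, and the `E|<pid>` end-event layout is an assumption that mirrors the begin format shown above.

```python
import os
from contextlib import contextmanager


class RecordingMarker:
    """Hypothetical stand-in for the trace_marker fd; keeps each write."""

    def __init__(self):
        self.writes = []

    def write(self, message: str) -> None:
        self.writes.append(message)


@contextmanager
def trace_section(marker, name: str):
    pid = os.getpid()
    marker.write(f"B|{pid}|{name}")   # begin event, same layout as WRITE_MSG
    try:
        yield
    finally:
        marker.write(f"E|{pid}")      # assumed end-event layout


marker = RecordingMarker()
with trace_section(marker, "loadConfig"):
    pass  # traced work goes here
print(marker.writes)
```

The `finally` clause keeps begin and end events balanced even when the traced block raises.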
## Why the overhead is low
Each ATrace event costs roughly 1-10 µs: about one JNI method call, one string concatenation, and one system-call round trip.
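By contrast, a recorder implemented entirely in user space is typically written like this:

    public static void testMethod() {
        long startTime = System.nanoTime();
        ...
        long endTime = System.nanoTime();
        // Record the method's start and end times. This may require allocating an object,
        // and multi-thread support needs Thread.currentThread().
        record("testMethod", startTime, endTime);
    }
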
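`System.nanoTime()` itself incurs one JNI call and one `__clock_gettime` system call; add the supporting data structures, objects, and multi-thread handling, and the cost grows further.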
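A rough Python analogue of the user-space approach shows where the extra bookkeeping comes from. `UserSpaceRecorder` and `sample_method` are hypothetical names, and the sketch illustrates the structure only, not Android's actual costs.

```python
import threading
import time


class UserSpaceRecorder:
    """Keeps (name, thread id, start, end) tuples in memory."""

    def __init__(self):
        self._lock = threading.Lock()
        self.records = []

    def record(self, name: str, start_ns: int, end_ns: int) -> None:
        # Thread safety and the per-event tuple are costs that the
        # trace_marker path does not pay in user space.
        with self._lock:
            self.records.append((name, threading.get_ident(), start_ns, end_ns))


recorder = UserSpaceRecorder()


def sample_method() -> None:
    start = time.perf_counter_ns()   # first clock read
    sum(range(1000))                 # placeholder workload
    end = time.perf_counter_ns()     # second clock read
    recorder.record("sample_method", start, end)


sample_method()
print(len(recorder.records))  # prints 1
```

Every event here needs two clock reads, a lock acquisition, and an allocation, before any of it is written out anywhere.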
## Automated testing
    adb shell
    atrace -o /data/trace.txt -a com.xunmeng.pinduoduo

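
For scripted runs, the same steps can be expressed as argument lists for `adb`. This is a sketch under assumptions: `atrace_capture_commands` is a hypothetical helper, the `adb pull` step and the local file name are my additions, and nothing is executed here.

```python
from typing import List


def atrace_capture_commands(package: str,
                            device_path: str = "/data/trace.txt",
                            local_path: str = "trace.txt") -> List[List[str]]:
    """Return the adb invocations for capturing and fetching a trace."""
    return [
        # Same atrace call as above, run through `adb shell` in one step.
        ["adb", "shell", "atrace", "-o", device_path, "-a", package],
        # Copy the trace file from the device to the host.
        ["adb", "pull", device_path, local_path],
    ]


for cmd in atrace_capture_commands("com.xunmeng.pinduoduo"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with `adb` installed and a device attached.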
Meaning of each ftrace file: https://hotttao.github.io/2020/01/03/linux_perf/03_ftrace/