| |
| # Timeline |
| |
| ## Intro |
| |
| The timeline is at the center of the redaction system. It provides an |
| efficient method to find which package a thread/process belongs to. |
| |
| The timeline allows queries to be connected to time. Without this, there's a |
| significant privacy conern because a pid can be recycled. Just because the pid |
| is excluded from redaction before time T, doesn't mean it should be redacted |
| after time T. |
| |
| ## General Structure |
| |
| The timeline uses an event-based pattern using two events: |
| |
| - __Open Event:__ Marks the begining of a pid's new lifespan. |
| - __Close Event:__ Marks the end of a pids's lifespan. |
| |
| An event-based structure (compared to a span-based structure) is used as it is |
| better suited to handle errors/issues in the underlying data. For example, if a |
| pid doesn't explictly ends before being reused (e.g. two back-to-back open |
| events), the event-based structure "just works". |
| |
| Open events contain the thread's full state. The close event only contains the |
| information needed to reference the thread's previous event. |
| |
| ```c++ |
| struct Open { |
| uint64_t ts; |
| int32_t pid; |
| int32_t ppid; |
| uint64_t uid; |
| }; |
| |
| struct Close { |
| uint64_t ts; |
| int32_t pid; |
| }; |
| ``` |
| |
| The vast majory of threads will have one event, an open event provided by the |
| `ProcessTree`. For some threads, they will have multiple open (`ProcessTree`, |
| `NewTask`) and close events (`ProcFree`) in alternating order. |
| |
| ## Query |
| |
| ```c++ |
| struct Slice { |
| int32_t pid; |
| uint64_t uid; |
| }; |
| |
| class Timeline { |
| Slice Query(uint64_t ts, int32_t pid) const; |
| }; |
| |
| ``` |
| |
| Events, regardless of type, are stored in contiguous memory and are ordered |
| first by pid and second by time. This is done to allow events to be found |
| via a binary search. |
| |
| The vast majory of threads will have one event, the open event. Some threads |
| may have close and re-open events. |
| |
| To handle a query, |
| |
| 1. Use a binary search to find the lower bound for `pid` (the first instance of |
| `pid`) |
| 1. Scan forward to find the last event before `ts` (for `pid`) |
| |
| If an event was found: |
| |
| ```c++ |
| if (e.type == kOpen && uid != 0) |
| return Slice(pid, e.uid); |
| |
| // The pid is active, check the parent for a uid. |
| if (e.type == kOpen && uid == 0) |
| return Query(ts, e.ppid); |
| |
| return Slice(pid, kNoPackage); |
| ``` |
| |
| If `pid` does not have an immediate package (`uid`), the parent must be |
| searched. The parent-child tree is short, so the recursive search will be |
| relatively short. To minimize this even more, a union-find operation is applied |
| because any queries can be made. |
| |
| __Simple runtime overview:__ |
| |
| Initialization: |
| |
| - $sort + union\ find$ |
| |
| - $nlogn + mlogn$ |
| - where $n=events$ |
| - and $m=approx\ average\ depth$ |
| |
| Query: |
| |
| - $P(p) = m_p * (logn + e_p)$ |
| - where $m_p=\ distance\ from\ pid\ to\ uid$ |
| - and $n=events$ |
| - and $e_p=number\ of\ events\ for\ process\ p$ |
| |
| - Because of the union-find in initialization, $m_p \to 0$ |
| |
| To further reduce the runtime, the search domain is reduces by remove all open |
| events for $pids$ that don't connect to a target $uid$. By removing open events, |
| and close events, there are two advantages: |
| |
| 1. Removing open events are safe and simple. By removing open events, those pids |
| can never be marked by active. Keeping the close events effectively reminds the |
| system that the pid is not active. |
| |
| 1. The number of open events exceeds the number of close events. Removing open |
| events will have a greater effect on the number of events. |
| |
| __Example:__ |
| |
| |Name|Value|Notes| |
| |-|-|-| |
| |tids|3666|Total number of threads.| |
| |freed threads|5|Number of threads that were freed.| |
| |reused threads|0|No threads were used more than one time.| |
| |process tids|64|Total number of threads connected to the target process.| |
| |
| After initialization, there would only be 64 open events and 5 close events. |
| This means that every uid lookup would be $logn\ |\ n=64 = 6$. Finding the uid |
| given a pid is one of the most common operations during redaction because uid |
| determines if something needs to be redacted. |