of the Thesis
Versatile File System Tracing with Tracefs
by Akshat Aranya
Master of Science in Computer Science
Stony Brook University
2004

File system traces have been used for years to analyze user behavior and system software behavior, leading to advances in file system and storage technologies. Existing traces, however, are difficult to use because they were captured for a specific use and cannot be changed, they often miss vital information for others to use, they become stale as time goes by, and they cannot be easily distributed due to user privacy concerns. Other forms of traces (block level, NFS level, or system-call level) all contain one or more deficiencies, limiting their usefulness to a wider range of studies.

We developed Tracefs, a thin stackable file system for capturing file system traces in a portable manner. Tracefs can capture uniform traces for any file system, without modifying the file systems being traced. Tracefs can capture traces at various degrees of granularity: by users, groups, processes, files and file names, file operations, and more; it can transform trace data into aggregate counters, compressed, checksummed, encrypted, or anonymized streams; and it can buffer and direct the resulting data to various destinations (e.g., sockets or disks). Our modular and extensible design allows for uses beyond traditional file system traces: Tracefs can wrap around other file systems for debugging as well as for feeding user activity data into an Intrusion Detection System.

We have implemented and evaluated a prototype Tracefs on Linux. Our evaluation shows a highly versatile system with small overheads.
12 Figures and Tables
Figure 2.1: Architecture of Tracefs as a stackable file system. Tracefs intercepts operations and invokes hooks into one or more tracers before passing the operations to the underlying file system.
Figure 2.3: Directed acyclic graph representing the trace condition:
Figure 2.5: An example of a trace message. Each message contains a message identifier, a length field, and multiple arguments. The highest bit of the argument identifier indicates that the argument has a length field.
Figure 3.1: The file name cache for tracing based on file names and extensions. The first level table maps a name to an inode number table that stores inode numbers with that name. An input filter has a direct reference to the inode number table.
Figure 3.2: The asynchronous filter for performing stream transformations asynchronously. A separate kernel thread is used to transform the stream and write traces to stable storage.
Figure 4.1: Execution times for an Am-Utils build. Each group of bars represents an output filter configuration under LIGHT, MEDIUM, and FULL tracing. The leftmost bar in each group shows the execution time for Ext3.
Table 4.1: File system and asynchronous filter configurations for evaluating the performance of the asynchronous filter.
Figure 4.2: Execution times for Postmark. Each group of bars represents an output filter configuration under LIGHT, MEDIUM, and FULL tracing. The leftmost bar in each group shows execution times for Ext3.
Figure 4.3: Trace file sizes and creation rates for Postmark and Am-Utils benchmarks. Each bar shows values for FULL, MEDIUM, and LIGHT tracing. The left half depicts file sizes and the right half depicts trace file creation rates.
Figure 4.4: Execution times for Postmark and OpenClose. Bars show system, user, and elapsed times for all combinations of asynchronous filter and trace file writes.
Figure 4.5: Elapsed and system times for Postmark with Ext3 and Tracefs with multiple processes.
Figure 4.6: Trace file anonymization rates and increase in file sizes for different portions of traces anonymized. The Y1 axis shows the scale for file sizes whereas the Y2 axis shows the scale for the rate of anonymization.
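The trace message layout described in the caption of Figure 2.5 (a message identifier, a length field, and a sequence of arguments whose identifier's highest bit marks a variable-length argument) can be illustrated with a small user-space sketch. This is not the Tracefs implementation: the field widths (2-byte message identifier, 2-byte lengths, 1-byte argument identifier), the little-endian byte order, the FIXED_ARGS table, and the function names are all assumptions chosen for illustration.

```python
import struct

# Assumption: argument identifiers are one byte wide, so the "highest bit" is 0x80.
HAS_LENGTH = 0x80

# Hypothetical fixed-size arguments: identifier -> struct format of the value.
FIXED_ARGS = {
    0x01: "<I",  # e.g. a 32-bit user id
    0x02: "<Q",  # e.g. a 64-bit inode number
}


def encode_message(msg_id, args):
    """Encode (arg_id, value) pairs; bytes values become variable-length arguments."""
    body = b""
    for arg_id, value in args:
        if isinstance(value, bytes):
            # Variable-length argument: set the highest bit, then emit a length field.
            body += struct.pack("<BH", arg_id | HAS_LENGTH, len(value)) + value
        else:
            # Fixed-size argument: the size is implied by the identifier.
            body += struct.pack("<B", arg_id) + struct.pack(FIXED_ARGS[arg_id], value)
    # Message header: identifier plus the length of everything that follows.
    return struct.pack("<HH", msg_id, len(body)) + body


def decode_message(buf):
    """Decode one message produced by encode_message."""
    msg_id, length = struct.unpack_from("<HH", buf, 0)
    pos, end = 4, 4 + length
    args = []
    while pos < end:
        (arg_id,) = struct.unpack_from("<B", buf, pos)
        pos += 1
        if arg_id & HAS_LENGTH:
            (n,) = struct.unpack_from("<H", buf, pos)
            pos += 2
            args.append((arg_id & 0x7F, buf[pos:pos + n]))
            pos += n
        else:
            fmt = FIXED_ARGS[arg_id]
            (value,) = struct.unpack_from(fmt, buf, pos)
            pos += struct.calcsize(fmt)
            args.append((arg_id, value))
    return msg_id, args


example = encode_message(7, [(0x01, 1000), (0x03, b"/etc/passwd")])
decoded = decode_message(example)
```

Marking variable-length arguments with a bit in the identifier lets a reader skip arguments it does not understand, while fixed-size arguments avoid the cost of a length field on every value.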