Computer forensics is a complex and often misunderstood practice. The examiner is presented with massive volumes of data and must work within the limits of computer processing capabilities. Contrary to prime-time television shows, there is no “Magic” button, and there is no instant feedback. Proper computer investigation is a multi-step, time- and labor-intensive process.
This article is a primer on the computer forensics process and on what examiners generally go through to conduct an investigation. It is meant to provide a basic understanding of how a computer stores, accesses, and processes data so that you, the client, have a clearer sense of the work and time necessary for a full and thorough forensic investigation. It is by no means complete: every investigation is different, and this article does not attempt to cover every eventuality. For example, a malware or network compromise examination may be far more surgical in its approach, meaning that many of the “machine-time intensive” functions are not necessary. Treat it as a guideline only. It starts from the point at which an examiner receives the computer, hard drive, or media in question, whether in the examiner’s lab or on site, and it covers three phases: the Acquisition Phase, the Analysis Phase, and the Reporting Phase.
In the Acquisition Phase, the examiner performs an Evidentiary Intake of all components provided, including photographs, a catalogue of all items, and the start of an evidence chain of custody log that follows the evidence throughout its life in the examination process. The computer hard drive needs to be “copied” in a special way so as to preserve the evidentiary integrity of its contents. The proper term is a “bit for bit, forensically sound image”, not a “copy” or “mirror”. A copy (or mirror) is not the same thing as an image, and if the drive is merely copied or mirrored, it can damage your case before you get started.
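As a rough illustration of what a chain of custody log records, the sketch below models an evidence item with an append-only list of events. The class names, fields, and the identifier `ITEM-001` are hypothetical and chosen only for illustration; real labs use their own forms and systems.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CustodyEvent:
    """One hand-off or action recorded against an evidence item."""
    timestamp: datetime
    handler: str
    action: str


@dataclass
class EvidenceItem:
    """A single catalogued item and its running chain of custody log."""
    item_id: str          # hypothetical lab-assigned identifier
    description: str
    log: list[CustodyEvent] = field(default_factory=list)

    def record(self, handler: str, action: str) -> None:
        # Entries are only ever appended; earlier entries are never edited.
        self.log.append(CustodyEvent(datetime.now(timezone.utc), handler, action))


drive = EvidenceItem("ITEM-001", "320 GB laptop hard drive")
drive.record("Examiner A", "Received from client and photographed")
drive.record("Examiner A", "Removed from computer for imaging")

for event in drive.log:
    print(event.timestamp.isoformat(), event.handler, event.action)
```

The point of the structure is that every action on the item leaves a dated entry naming who handled it, from intake onward.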
In every case, the examiner should be provided with the entire computer, not just the hard drive. Very important information is stored on a chip inside the computer (not on the hard drive) and is instrumental in the investigation. It must be captured and catalogued separately from the hard drive, which is commonly done while the hard drive is removed from the computer.
Once the hard drive is removed from the computer, it is connected to specialized hardware to start the actual acquisition. The drive may be connected to a specialized computer built for the purpose, or directly to an acquisition device that performs the task. In either case, the process has to follow specific protocols so as not to damage or destroy the integrity of the evidence. An acquisition device is the preferred method because of its speed, but it is not always feasible, for example with older hard drives, or with servers or network-connected devices that cannot be taken offline for the acquisition phase.
So how long does this acquisition take? This is an often-asked question because in some cases the window for acquisition is very limited, such as when an employee will only be gone for a few hours or the activity will disrupt other work. In all cases, the answer is “it depends”, and it depends on a number of factors that the examiner cannot control: the type of formatting on the subject drive, the rotational speed of the drive, data density and volume, drive interface and construction, and the tools used. With a hardware acquisition device as mentioned above, a general rule of thumb is 4-5 GB of data per minute to create the forensic image. At that rate, a 320 GB hard drive will take roughly 65-80 minutes to image, and a 1 terabyte hard drive will take roughly 3.5 to 4.25 hours. These are just imaging times. Immediately following the imaging process, a verification process has to be performed to ensure the integrity of the evidence collected. Verification takes roughly as long as the initial imaging and needs to be performed during the acquisition process.
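Verification is commonly done by comparing cryptographic hash values, although this article does not name a specific method. The sketch below assumes SHA-256 and uses a small temporary file as a stand-in for a forensic image; the function names are illustrative. It also shows why verification takes about as long as imaging: the entire image has to be read a second time.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so a large image never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_is_verified(acquisition_hash: str, image_path: str) -> bool:
    """Re-read the finished image and compare it with the hash recorded at acquisition."""
    return sha256_of_file(image_path) == acquisition_hash.lower()


# Demo with a small temporary file standing in for a forensic image.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"stand-in for image data" * 1000)
    demo_path = tmp.name

recorded_hash = sha256_of_file(demo_path)            # hash noted during imaging
verified = image_is_verified(recorded_hash, demo_path)
mismatch = image_is_verified("0" * 64, demo_path)    # a hash that cannot match
os.remove(demo_path)

print("verified:", verified)
print("mismatch detected:", not mismatch)
```

If even one bit of the image differed from what was read during acquisition, the two hash values would not match.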
In the case of an older hard drive, or a problematic hard drive with structural problems, imaging may only proceed at a rate of 1 GB per minute. The same rate (1 GB/min) can be expected during a live acquisition of a machine that cannot be turned off. Hard drives in external USB enclosures must be removed from the enclosure (which can be destructive in some cases); otherwise, acquisition over USB will take longer.
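The rules of thumb above reduce to simple division. The sketch below is only an estimate under stated assumptions: a steady imaging rate, 1 TB treated as 1000 GB, and verification taking as long as imaging. The function names and scenario labels are illustrative.

```python
def imaging_minutes(size_gb: float, rate_gb_per_min: float) -> float:
    """Minutes to image a drive at a steady rate (a rule-of-thumb figure)."""
    return size_gb / rate_gb_per_min


def acquisition_minutes(size_gb: float, rate_gb_per_min: float) -> float:
    """Imaging plus verification, assuming verification takes as long as imaging."""
    return 2 * imaging_minutes(size_gb, rate_gb_per_min)


# (label, size in GB, rate in GB per minute); 1 TB is treated as 1000 GB here.
scenarios = [
    ("320 GB, hardware imager, fast", 320, 5),
    ("320 GB, hardware imager, slow", 320, 4),
    ("1 TB, hardware imager, fast", 1000, 5),
    ("1 TB, older drive or live acquisition", 1000, 1),
]

for label, size_gb, rate in scenarios:
    image_min = imaging_minutes(size_gb, rate)
    total_hours = acquisition_minutes(size_gb, rate) / 60
    print(f"{label}: image {image_min:.0f} min, with verification {total_hours:.1f} h")
```

Actual times vary with the factors listed earlier, so figures like these are a planning aid and not a commitment.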
Many times an examiner has shown up at a location only to find that the computer has more than one hard drive in it, or is a server with multiple drives in it. Obviously this seriously impacts the acquisition timeline, not to mention the cost.
Finally, during the acquisition phase, if it is a live acquisition, or the computer is found to be on when the examiner arrives, it is highly recommended that the computer’s RAM (Random Access Memory) be acquired. The time required depends on the amount of RAM in the computer, but it can take anywhere from a few minutes to more than an hour, and it cannot be done concurrently with the hard drive acquisition.
As you can see, from start to finish, the acquisition phase can take the better part of an entire day just by itself before any analysis is started!
The Analysis Phase is usually the longest phase in the process, although depending on findings, the Reporting Phase can actually take longer.
Much of the activity performed in this phase is predicated on what the case parameters are. Understand that there are also a great many tools at a skilled examiner’s disposal with which to perform tasks. This paper attempts to be tool agnostic, and speak merely to the process, and not the specific techniques or tools.
To better understand what the examiner faces, it behooves a potential client to have a basic understanding of how an operating system works. This paper is focused on the Windows operating system, but generally speaking the process applies to any operating system. The purpose of this paper is not to fully train the reader in the technical functionality of Windows, but rather to understand some of its complexities. For that reason, it is not necessarily important to understand the terminologies or data repositories, but rather be aware of the volumes of information that get parsed that the average client may not be aware of.
Windows by itself is a complicated and highly technical living beast. Add to this the fact that there are basically two architectures of Windows (32-bit and 64-bit, which function in vastly different ways), multiple editions of each version (Windows 7 Home, Windows 7 Professional, Windows 7 Ultimate, etc.), and many different versions (Windows 95, 98, 2000, XP, Vista, 7, 8, etc.). As if that weren’t enough, consider the vagaries of all the different programs (and their types and versions) that a user may install, and it becomes easy to see why a computer forensic examination is a highly complex undertaking. And these are only the parts that an average user sees and knows about.
Generally speaking, Windows contents (system only, not considering user data) can be classified as Visible and Hidden. Although these two areas don’t specifically contain user-created data, they DO contain a vast amount of data ABOUT a user, their activities, preferences, and habits. These areas provide as much or more data relevant to an investigation than the user files themselves. For example, the fact that a user-created Word document exists is not always that important, because the client usually already knows about it. It is the “under the hood” workings that tell the examiner about the file, such as the “who”, “what”, “when”, “where”, “how”, and sometimes even “why”. For the most part, this data is not visible to (or even known to) the average user.
Here, then, is a brief list of the types of “behind the scenes” data that may exist but is not visible to the average user. This is certainly not an exhaustive list; if anything, it is a small subset of the available data that an examiner may have to review.
Temporary Internet Folders
Browser Cache Data
Most Recently Used Lists
System Log Files
Virtual Memory Files
Although this may initially not look like a very long list, it is so vast that an examiner will generally spend most of his or her time in these areas. As well, on most computers, there are multiple copies of all of the above from different snapshots in time. Below is just a very small list of the types of information that may be found in the above areas.
Files that have been deleted
All USB devices ever connected
Files/folders that have been exfiltrated from a computer
When devices have been connected
CDs/DVDs that may have been burned
Lists of recently used programs and the files they have accessed
Programs that have been installed and uninstalled
Attempts at data destruction/hiding
Program settings that can show knowledge of an act
What programs start when the computer starts
How many times a program has ever been run
Wi-Fi connection points
Hidden email and other accounts
Geographic location of where photos were taken
What particular user performed a task
Determining each of the items listed above is a separate task, and each can take time (sometimes hours or more) to complete, or to place the data in a state from which the answer can be determined.
The first technical step in starting the Analysis Phase is getting the data into a state in which it can be searched or have data extracted from it. This can take many forms depending on how an examiner performs their tasks. A very common first step is to perform a function commonly called “recover folders”. This automatically “undeletes” any files on the computer that were deleted but whose Master File Table entry is still intact. Once complete, the next step commonly performed is “indexing the case”. This indexing is a very complex set of functions in which one or more programs perform a number of tasks to prepare the data for examination. Just some of the steps include the extraction and parsing of compound files such as Registry hives and their components for each user, not to mention the same for every instance of these found in System Restore Points and Volume Shadow Copies. The same applies to log files and other system artifact repositories.
Many cases involve user-created documents, and clients will need search terms run against them to see if any are responsive. Clients typically think this is a simple search, like looking for something in Google. Nothing could be further from the truth. Files such as Microsoft Office files, PDF documents, compressed archives such as RAR and ZIP, TIFF files, email files, Internet cache files, etc., reside on the computer in proprietary, compressed, or otherwise encoded container formats, and some are encrypted as well. Unless these files are first expanded (decoded, decompressed, or decrypted as needed) and mounted, search terms will miss most or all of their contents.
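To see why a raw search falls short, consider a ZIP archive as a stand-in for any compound file. In the minimal sketch below, the search term and file name are made up for illustration. The term is present in the document inside the archive, yet a byte-for-byte search of the archive as stored should miss it, and the term only becomes searchable once the archive is expanded.

```python
import io
import zipfile

term = b"merger"
document_text = b"Notes on the confidential merger plan. " * 40

# Build a small ZIP archive in memory as a stand-in for a compound file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes.txt", document_text)
raw_bytes = buffer.getvalue()

# Search the archive exactly as it is stored: the text inside is compressed.
found_raw = term in raw_bytes

# Search again after expanding each member, as case pre-processing would.
with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
    found_expanded = any(term in archive.read(name) for name in archive.namelist())

print("raw search hit:", found_raw)
print("expanded search hit:", found_expanded)
```

Forensic tools do the equivalent for every compound file they recognize, which is a large part of why pre-processing takes so long.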
The indexing portion itself goes above and beyond everything that has already been discussed. It is performed once all compound files (Registry, Office documents, PDF, email, etc.) have been expanded and mounted. This process then crawls through the hard drive from one end to the other and creates an index of every single word, phrase, or human-readable string contained within. Once complete, this allows searches to be run against the index using search terms. Again, this process depends on the software and methodologies used, and the description here is meant as a general guideline.
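The idea behind an index can be sketched in a few lines. This is a toy version that assumes the text has already been extracted from each file; the file names and contents are invented for illustration, and commercial forensic tools are far more elaborate. It maps every word to the set of files that contain it, so that a later search is a quick lookup instead of another pass over the whole drive.

```python
import re
from collections import defaultdict


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the names of the documents that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index


def search(index: dict[str, set[str]], term: str) -> list[str]:
    """Return the documents containing the term, in alphabetical order."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical extracted text, keyed by file name.
documents = {
    "report.docx": "Quarterly invoice summary for the Acme account",
    "email_0042.txt": "Please resend the invoice before Friday",
    "notes.pdf": "Meeting notes, nothing about billing",
}

index = build_index(documents)
print(search(index, "Invoice"))
print(search(index, "payroll"))
```

Building the index is the slow part because every file must be read once; the searches that follow are fast by comparison.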
All of the above needs to be performed before an examiner can start working with the data. It should be clear by now that with the size of hard drives today, this is going to be a time consuming task, measured typically in days or weeks, and not hours. In fact, depending on the size of the dataset, the software used, and the goals of the investigation, it can take 2-7 days PER HARD DRIVE just to get the data into a format for the examiner to start working with it. Again, specific caveats apply.
Fortunately, many of the above are automated functions that the examiner “sets and forgets”. Once the functions are complete, the examination can start, but in the interim, in a best-case scenario, 3-4 days or more may have passed since acquisition of the data. That best case assumes a lab that has nothing else to do. Successful labs normally have numerous cases ongoing at any given time, so in most cases a given file does not progress 24/7 until completion.
Once all of the above has been completed, the heavy lifting begins. This is where the examiner starts to use experience and technical expertise to answer the client’s questions. Search term hit extraction is a common function in many cases, and is typically not onerous as it pertains to resident data. However, in cases where data has been deleted and must be manually recovered, this can be cost- and labor-prohibitive. For example, most forms of email cannot be recovered automatically once they are deleted. There is no special script or program that can recover them, because they do not contain a file signature (an important component for automated recovery). As such, they need to be located, identified, extracted, and given a name and extension that will allow for identification and let the end user read the extraction simply by clicking on it.
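To illustrate what a file signature makes possible, the sketch below carves JPEG images out of a block of raw bytes by looking for the format's well-known start and end markers. It is a deliberately simplified version, and real carving tools also deal with fragmentation, embedded thumbnails, and false matches. The trailing email fragment in the sample data has no such marker, which is why an approach like this cannot find it.

```python
JPEG_HEADER = b"\xff\xd8\xff"   # bytes that begin a JPEG file
JPEG_FOOTER = b"\xff\xd9"       # bytes that end a JPEG file


def carve_jpegs(blob: bytes) -> list[bytes]:
    """Return every byte run that starts with a JPEG header and ends at the next footer."""
    carved = []
    start = blob.find(JPEG_HEADER)
    while start != -1:
        end = blob.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header with no footer: nothing more to carve
        stop = end + len(JPEG_FOOTER)
        carved.append(blob[start:stop])
        start = blob.find(JPEG_HEADER, stop)
    return carved


# Made-up stand-in for unallocated space: padding, one fake "image", an email fragment.
unallocated = (
    b"\x00" * 16
    + JPEG_HEADER + b"\xe0fake image data" + JPEG_FOOTER
    + b"\x00" * 8
    + b"From: someone\r\nSubject: hello\r\n"
)

carved = carve_jpegs(unallocated)
print("images carved:", len(carved))
```

The email text is still sitting in the sample data, but nothing marks where it begins or ends, so locating and extracting it falls to the examiner.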
Imagine for a moment finding 20,000 “hits” on a particular search parameter in unallocated file space. Suppose, in the case of email, that a skilled examiner can perform the extraction manually at a rate of 6 per minute (10 seconds per email to identify, extract, convert, name, and move to the next one). The 20,000 extractions would then take over 55 hours of non-stop work without a break. For this reason, it is important for clients to be realistic about what they are searching for. Asking to recover all emails from or to firstname.lastname@example.org, where this is the email address of the user of the computer you are investigating, is an unrealistic request, but one that labs get frequently.
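The arithmetic behind that estimate is shown below. The 8-hour working day used for the second figure is an assumption added here for scale and is not part of the estimate above.

```python
def manual_extraction_hours(hits: int, extractions_per_minute: float) -> float:
    """Hours of uninterrupted work needed to extract every hit by hand."""
    return hits / extractions_per_minute / 60


hours = manual_extraction_hours(20_000, 6)
working_days = hours / 8   # assumes an 8-hour working day with no interruptions

print(f"{hours:.1f} hours of non-stop extraction")
print(f"about {working_days:.0f} working days for one examiner")
```

Halving the number of hits halves the time, which is why narrowing the search terms up front matters so much.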
This is by no means the end of the Analysis Phase. It is very possible that, for various case-specific reasons, more automated searches need to be performed, each taking many hours to a few days. It is easy to see, then, how a case that was initially quoted at 7 days can turn into 14 or more. An examiner can only give an average completion time. There is simply no way to know at the outset what the examiner may face, and no one at any prior stage can assist with that determination. In fact, a very common reason for cases to take longer than anticipated is that the client had their IT department investigate the case first. Mostly, this just destroys or alters evidence, making it harder and far more time-consuming for the examiner to determine what was happening on the computer, when, and by whom.
The Reporting Phase is usually the most important, but most neglected, phase, and the report is an integral part of the overall engagement. A proper report will be clear and concise, free of embellishment and guessing, and understandable to any layperson. In many cases, the report cannot be submitted in printed form. Take, for example, a complex spreadsheet. A spreadsheet that is many columns wide cannot be presented properly on paper, and a printed spreadsheet cannot show the underlying formulas. Another example would be a case that involves the identification of 500 pictures. It is much easier to provide an electronic report in which the user can click to be taken to the photos than to print that much data. A proper report takes time, and can stretch into many hours of examiner time.
One of the most important (and often unperformed) tasks is for the client to make the necessary time to sit with the examiner in order to best determine the ultimate goal. In many cases, a client will give the examiner a one or two sentence goal, a list of search terms, and then send them off hunting. This only creates a great deal of wasted time, and client frustration at how long a case might take.
In conclusion, a client must be realistic about what is possible, understanding of unforeseen changes to work direction and timeline, and receptive to information that was unexpected. A computer forensic examiner cannot fabricate data. If you trust your examiner, and they are proficient at their science, you must sometimes accept that either there is no evidence left to put forth, or that initial theories were wrong. At the end of the day, the data does not lie. It merely sits there waiting to be interpreted.