GitHub Actions: Fix File Storage In Pre-Steps For Reliability
Hey guys! Let's dive into a crucial aspect of building robust GitHub Actions: proper file storage locations, especially when dealing with pre-steps. We'll explore why $GITHUB_WORKSPACE isn't the ideal spot for temporary files and how $RUN_TEMP can save the day. Plus, we'll dissect a real-world log example to understand how these concepts play out in practice. So, grab your favorite coding beverage, and let's get started!
Understanding the Problem: Why Not $GITHUB_WORKSPACE?
When developing GitHub Actions, it's super tempting to use the $GITHUB_WORKSPACE environment variable as a convenient place to store files generated during the workflow. After all, it's the directory where your repository is checked out, which makes it seem like the perfect spot for temporary data. However, there's a major caveat: the $GITHUB_WORKSPACE directory is wiped clean by the checkout action. If you store anything there during a pre-step, it will be gone before your main action gets a chance to use it. This can lead to unexpected errors, lost data, and a generally frustrating experience. In short, avoid $GITHUB_WORKSPACE for any data that has to survive a checkout. Think of it as a clean slate that's refreshed every time the checkout action runs.
The primary reason for this behavior is to ensure a clean and consistent environment for each job execution. By clearing $GITHUB_WORKSPACE, GitHub Actions prevents potential conflicts and ensures that each job starts with a predictable state. This is particularly important for workflows that involve complex build processes or multiple steps that depend on each other. So, while $GITHUB_WORKSPACE is excellent for accessing your repository's files, it's not designed to be a persistent storage location.
Imagine you're building a sophisticated testing workflow. Your pre-step might involve setting up a testing environment, generating configuration files, or downloading dependencies. If you store these crucial components in $GITHUB_WORKSPACE, the subsequent checkout action will obliterate them, leaving your tests high and dry. This not only defeats the purpose of the pre-step but also introduces a tricky debugging scenario, as the cause of the failure might not be immediately apparent. To prevent these headaches, you need a reliable storage solution that survives the checkout action's cleanup.
Furthermore, storing files directly in $GITHUB_WORKSPACE can lead to issues with workflow isolation. Each job in a GitHub Actions workflow should ideally operate in its own isolated environment to prevent interference. If different jobs or steps within a job start writing to the same location, you risk data corruption and unpredictable behavior. This is especially critical for parallel workflows where multiple jobs run concurrently. Using a dedicated temporary storage location like $RUN_TEMP helps maintain this isolation, ensuring that each step has its own private space to work with. So, by sidestepping $GITHUB_WORKSPACE for temporary files, you're not just avoiding data loss; you're also promoting a more robust and maintainable workflow.
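To see the failure mode without running a workflow, here is a minimal local simulation. It does not call the real checkout action: simulate_checkout_clean is a hypothetical stand-in that simply empties a directory, which is the behavior this post attributes to checkout. The sandbox directory names are made up for the demo.

```shell
#!/bin/sh
# Local simulation only: nothing here talks to GitHub.
set -e

sandbox="$(mktemp -d)"
workspace="$sandbox/workspace"   # stands in for $GITHUB_WORKSPACE
temp_dir="$sandbox/temp"         # stands in for the runner temp directory
mkdir -p "$workspace/mitmproxy-traffic" "$temp_dir/mitmproxy-traffic"

# A "pre-step" writes the same file to both locations.
echo "12345" > "$workspace/mitmproxy-traffic/mitmdump.pid"
echo "12345" > "$temp_dir/mitmproxy-traffic/mitmdump.pid"

# Hypothetical stand-in for the cleanup that checkout performs on the workspace.
simulate_checkout_clean() {
  rm -rf "$1"
  mkdir -p "$1"
}
simulate_checkout_clean "$workspace"

# Report which copy of the PID file is still there.
for dir in "$workspace" "$temp_dir"; do
  if [ -f "$dir/mitmproxy-traffic/mitmdump.pid" ]; then
    echo "kept: $dir"
  else
    echo "lost: $dir"
  fi
done
```

Running it reports the workspace copy as lost and the temp copy as kept, which is the same pattern the log later in this post shows.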
The Solution: Embracing $RUN_TEMP
So, where should you store your temporary files then? Enter $RUN_TEMP, the hero of our story! This environment variable points to a directory specifically meant for temporary files generated during a run. (In GitHub's documentation the runner-provided variable is spelled RUNNER_TEMP; this post writes it as $RUN_TEMP.) Unlike $GITHUB_WORKSPACE, $RUN_TEMP is not touched by the checkout action, so your pre-step data stays safe and sound until the job ends. It's like having a dedicated temporary storage locker for your workflow: a reliable place to stash files without worrying about them disappearing unexpectedly. Using $RUN_TEMP (or a subdirectory within it) is the recommended practice for storing temporary data in GitHub Actions.
To effectively use $RUN_TEMP, you'll typically create a subdirectory within it to keep your files organized and prevent naming conflicts with other actions or steps. This subdirectory acts as a dedicated workspace for your action, providing a clear separation of concerns. For instance, if your action is called mitmproxy-logger-action, you might create a directory named $RUN_TEMP/mitmproxy-traffic to store traffic logs and related files. This approach not only keeps your temporary data tidy but also makes it easier to clean up when the workflow completes.
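Here's a minimal sketch of that subdirectory setup. The fallback chain is an assumption added so the snippet also runs outside a workflow: it tries $RUN_TEMP as spelled in this post, then RUNNER_TEMP (the name GitHub documents for runners), then the local temp directory.

```shell
#!/bin/sh
set -e

# Resolve the temp base. RUN_TEMP is the name used in this post; RUNNER_TEMP is
# the variable GitHub documents for runners. The /tmp fallback is for local runs only.
temp_base="${RUN_TEMP:-${RUNNER_TEMP:-${TMPDIR:-/tmp}}}"

# One subdirectory per action keeps files from colliding with other steps.
action_dir="$temp_base/mitmproxy-traffic"
mkdir -p "$action_dir"

# Record where later steps can find the traffic file.
echo "$action_dir/traffic_$(date +%Y%m%d_%H%M%S).mitm" > "$action_dir/traffic_file_path.txt"
echo "Using $action_dir"
```

Keeping everything under one named directory also means a later step only has to know a single path.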
The great thing about $RUN_TEMP is that the runner cleans it up automatically: the runner's temp directory is emptied at the beginning and end of each job. This eliminates the need for manual cleanup steps, preventing your runner's disk from filling up with stale temporary files. However, it also means that files in $RUN_TEMP are only guaranteed to last for the duration of the job. If you need to keep data longer, or hand it to another job, you'll need a more persistent storage solution, such as GitHub Actions artifacts or external storage services.
When implementing $RUN_TEMP in your actions, consider using robust file path construction techniques to avoid potential issues with different operating systems or runner configurations. Employing path manipulation tools provided by your scripting language (like os.path.join in Python or path.join in Node.js) can help ensure that your file paths are correctly formed, regardless of the underlying platform. This cross-platform compatibility is crucial for creating actions that work seamlessly across various runner environments. By adopting $RUN_TEMP and best practices for file handling, you can build more reliable and portable GitHub Actions that gracefully handle temporary data.
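Shell scripts have no built-in equivalent of os.path.join, so a small helper can fill that role. The join_path function below is a hypothetical helper written for this post, not part of any GitHub tooling, and it only normalizes the slash between two segments.

```shell
#!/bin/sh
# join_path BASE NAME: hypothetical helper that joins two path segments
# with exactly one slash between them.
join_path() {
  base="${1%/}"   # drop one trailing slash from the base
  name="${2#/}"   # drop one leading slash from the child
  printf '%s/%s\n' "$base" "$name"
}

# Both calls print the same path despite the stray slashes.
join_path "/tmp/run/" "mitmproxy-traffic"
join_path "/tmp/run" "/mitmproxy-traffic"
```

This avoids doubled or missing slashes when the base directory comes from an environment variable whose exact form you don't control.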
Log Analysis: A Real-World Example
Let's dissect a real-world log snippet from a GitHub Actions run to see how these concepts come into play. The log provides valuable insights into the execution of a mitmproxy-logger-action, highlighting potential pitfalls and best practices. We'll examine the log step-by-step, focusing on the file storage aspects and the implications of using different directories.
2025-08-03T06:49:34.7368597Z Starting mitmproxy logger...
2025-08-03T06:49:34.7371236Z Installing mitmproxy...
This initial section of the log shows the action starting up and installing mitmproxy, a powerful tool for intercepting and analyzing network traffic. This installation process likely involves downloading and extracting files, which need to be stored somewhere temporarily. The crucial question is: where are these files being stored? If they're going into $GITHUB_WORKSPACE, we know there's a potential problem looming.
2025-08-03T06:49:44.1927603Z Starting mitmdump on 127.0.0.1:8080
2025-08-03T06:49:44.1929130Z Traffic will be saved to: /home/runner/work/mitmproxy-logger-action-testing/mitmproxy-logger-action-testing/mitmproxy-traffic/traffic_20250803_064944.mitm
Aha! This log line reveals the file storage location: /home/runner/work/mitmproxy-logger-action-testing/mitmproxy-logger-action-testing/mitmproxy-traffic/traffic_20250803_064944.mitm. Notice that this path falls directly under $GITHUB_WORKSPACE. This is a red flag. The traffic logs, which are the core output of this action, are being stored in a directory that will be wiped clean by the subsequent checkout action. This is precisely the scenario we want to avoid.
2025-08-03T06:49:46.2168374Z ##[group]Run actions/checkout@v4
Here comes the culprit! The checkout action is running, and as we know, it will clear the contents of $GITHUB_WORKSPACE. Any traffic logs generated before this point are effectively lost. This explains the warnings we'll see later in the log.
2025-08-03T06:49:47.7564556Z ##[warning]PID file not found at: /home/runner/work/mitmproxy-logger-action-testing/mitmproxy-logger-action-testing/mitmproxy-traffic/mitmdump.pid
2025-08-03T06:49:47.7573856Z ##[warning]Traffic file path not found at: /home/runner/work/mitmproxy-logger-action-testing/mitmproxy-logger-action-testing/mitmproxy-traffic/traffic_file_path.txt
These warnings confirm our suspicion. The action is trying to locate the PID file and traffic file path, but they're gone, victims of the checkout action's cleanup. This highlights the critical importance of using a storage location like $RUN_TEMP that checkout leaves alone. The action should have stored these files in a subdirectory under $RUN_TEMP to ensure they survive the checkout action.
2025-08-03T06:49:48.5716672Z Starting mitmproxy cleanup and artifact upload...
2025-08-03T06:49:48.5719789Z Looking for PID file at: /home/runner/work/mitmproxy-logger-action-testing/mitmproxy-logger-action-testing/mitmproxy-traffic/mitmdump.pid
2025-08-03T06:49:48.5721005Z No PID available. Checking if traffic directory exists...
2025-08-03T06:49:48.5721798Z Traffic directory does not exist: /home/runner/work/mitmproxy-logger-action-testing/mitmproxy-logger-action-testing/mitmproxy-traffic
2025-08-03T06:49:48.5723001Z Checking for any traffic files in directory...
2025-08-03T06:49:48.5723696Z No traffic file found. Creating an empty one for completeness...
This section further emphasizes the issue. The cleanup process can't find the traffic directory because it was wiped out by the checkout action. As a workaround, the action creates an empty traffic file, but this is clearly not the intended behavior. The valuable traffic logs are lost, rendering the action largely ineffective. This log analysis underscores the critical need to store temporary files in $RUN_TEMP rather than $GITHUB_WORKSPACE in pre-steps.
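One way to catch this kind of mistake early is to check, before writing anything, whether the chosen directory sits inside the workspace. The sketch below uses a hypothetical helper, is_inside, and hard-codes the paths from the log above for illustration; in a real action the workspace path would come from the environment. The `::warning::` prefix is the workflow command GitHub Actions uses to surface warnings.

```shell
#!/bin/sh
# is_inside PARENT CHILD: succeeds when CHILD is PARENT or lives beneath it.
# Hypothetical helper; it compares strings, so pass absolute, normalized paths.
is_inside() {
  parent="${1%/}"
  case "$2" in
    "$parent"|"$parent"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Paths taken from the log above, hard-coded here for illustration.
workspace="/home/runner/work/mitmproxy-logger-action-testing/mitmproxy-logger-action-testing"
traffic_dir="$workspace/mitmproxy-traffic"

if is_inside "$workspace" "$traffic_dir"; then
  echo "::warning::$traffic_dir is inside the workspace and may be removed by checkout"
fi
```

A check like this turns a silent data loss into a visible warning at the moment the directory is chosen.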
Best Practices and Takeaways
Okay, guys, let's recap the key takeaways and best practices to ensure your GitHub Actions are file-storage savvy:
- Never store persistent data in $GITHUB_WORKSPACE in pre-steps. Remember, it's a temporary zone that gets wiped clean by the checkout action.
- Embrace $RUN_TEMP as your go-to location for temporary files that need to survive the checkout action for the rest of the job.
- Create subdirectories within $RUN_TEMP to organize your action's files and avoid naming conflicts. Think $RUN_TEMP/your-action-name.
- Use robust file path construction techniques to ensure cross-platform compatibility. Leverage path manipulation tools provided by your scripting language.
- Review your action's logs carefully to identify potential file storage issues. Warnings about missing files are often a telltale sign of problems.
By following these guidelines, you'll be well-equipped to build reliable and robust GitHub Actions that handle file storage like pros. So, go forth and create awesome workflows, guys! Remember, a little planning around file storage can save you a whole lot of headaches down the road.
Now that we've thoroughly diagnosed the issue with storing temporary files in $GITHUB_WORKSPACE, let's outline a practical approach to fixing the mitmproxy-logger-action. The goal is to refactor the action to use $RUN_TEMP for storing traffic logs and related files, ensuring that these files persist throughout the workflow run and are available for subsequent steps. This involves modifying the action's scripts to create a dedicated subdirectory within $RUN_TEMP, store the traffic logs in this directory, and update the cleanup process to correctly locate and archive the logs.
The first step is to modify the start.sh script, which is responsible for starting mitmdump and configuring the traffic logging. We need to change the script to create a subdirectory within $RUN_TEMP and instruct mitmdump to store the traffic logs in this new location. This can be achieved by adding a few lines of code to the script:
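# RUN_TEMP is the name used in this post; fall back to the runner's documented RUNNER_TEMP.
RUN_TEMP="${RUN_TEMP:-$RUNNER_TEMP}"
mkdir -p "$RUN_TEMP/mitmproxy-traffic"
traffic_file="$RUN_TEMP/mitmproxy-traffic/traffic_$(date +%Y%m%d_%H%M%S).mitm"
# Run mitmdump in the background, sending stdout and stderr to a log file.
mitmdump --listen-host "${LISTEN_HOST}" --listen-port "${LISTEN_PORT}" -w "${traffic_file}" &> "$RUN_TEMP/mitmproxy-traffic/mitmdump.log" &
# Record the PID and the traffic file path for the cleanup step.
echo $! > "$RUN_TEMP/mitmproxy-traffic/mitmdump.pid"
echo "${traffic_file}" > "$RUN_TEMP/mitmproxy-traffic/traffic_file_path.txt"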
This snippet does the following:
- Creates the directory $RUN_TEMP/mitmproxy-traffic if it doesn't already exist, using mkdir -p. This ensures that the directory is created even if the parent directories don't exist.
- Constructs the full path to the traffic file using $RUN_TEMP and the current date and time. This generates a unique filename for each workflow run.
- Starts mitmdump, instructing it to save traffic to the constructed file path. The output and errors are redirected to a log file within the same directory.
- Saves the PID of the mitmdump process to a file within the $RUN_TEMP directory. This PID is crucial for later stopping the process during cleanup.
- Writes the full path to the traffic file to a text file, also within the $RUN_TEMP directory. This allows other steps in the workflow to easily access the traffic file.
Next, we need to update the action's main script to properly set the outputs. The action should now read the traffic file path from the traffic_file_path.txt file in the $RUN_TEMP directory and set the traffic-file output accordingly. This ensures that subsequent steps in the workflow can access the traffic file.
traffic_file="$(cat "$RUN_TEMP/mitmproxy-traffic/traffic_file_path.txt")"
echo "traffic-file=${traffic_file}" >> "$GITHUB_OUTPUT"
This snippet reads the traffic file path from the text file and sets the traffic-file output variable, making it available to other steps.
Finally, the cleanup process needs to be adjusted to correctly locate and archive the traffic logs. The script should now look for the PID file and traffic file within the $RUN_TEMP/mitmproxy-traffic directory and archive the traffic logs from this location. This involves modifying the cleanup script to use the new file paths and ensuring that the traffic logs are correctly compressed and encrypted before being uploaded as an artifact.
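The post doesn't show the cleanup script itself, so here is a rough sketch of just the "find the PID file and stop the process" part under the new paths. It is not the action's real cleanup code: a background sleep stands in for mitmdump so the snippet can run anywhere, the temp-directory fallback chain is an assumption for local runs, and compression, encryption, and artifact upload are left out.

```shell
#!/bin/sh
set -e

# RUN_TEMP as spelled in this post, then the documented RUNNER_TEMP, then a local fallback.
temp_base="${RUN_TEMP:-${RUNNER_TEMP:-${TMPDIR:-/tmp}}}"
traffic_dir="$temp_base/mitmproxy-traffic"
pid_file="$traffic_dir/mitmdump.pid"
mkdir -p "$traffic_dir"

# Demo only: start a stand-in background process where mitmdump would be.
sleep 60 &
echo $! > "$pid_file"

# Cleanup logic: stop the recorded process if it is still running.
if [ -s "$pid_file" ]; then
  pid="$(cat "$pid_file")"
  # kill -0 only checks that the process exists; it sends no signal.
  if kill -0 "$pid" 2>/dev/null; then
    kill "$pid"
    # Reap it when it is our own child (as in this demo); harmless otherwise.
    wait "$pid" 2>/dev/null || true
    echo "Stopped process $pid"
  fi
  rm -f "$pid_file"
else
  echo "No PID file at $pid_file"
fi
```

Because the PID file now lives outside the workspace, this branch actually finds it, instead of falling through to the "No PID available" path seen in the log.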
By implementing these changes, the mitmproxy-logger-action will be significantly more robust and reliable. The use of $RUN_TEMP ensures that the traffic logs persist throughout the workflow run, preventing data loss and enabling effective traffic analysis. This practical approach demonstrates how understanding the nuances of GitHub Actions file storage can lead to better action design and a smoother workflow experience.
In summary, moving temporary file storage from $GITHUB_WORKSPACE to $RUN_TEMP, and updating the scripts that read and write those files, is essential for the mitmproxy-logger-action to work. With that change, the traffic logs survive the checkout action and remain available for analysis and debugging. This underscores the importance of following GitHub Actions' best practices for temporary file management when building stable, reliable workflows.