Today’s objective was to containerize the SEC filing analysis application to ensure a consistent and reproducible runtime environment. This process involved not only creating the necessary Docker artifacts but also refactoring the application’s configuration to adhere to best practices for handling secrets and environment-specific variables.
## Initial Containerization and Refactoring
The first phase of the work focused on establishing the foundational components for containerization and improving the project’s production readiness.
The key commits included:
- **Addition of `Dockerfile` and `docker-compose.yml`**: Standard files were created to define the application's image and orchestrate the service stack, respectively. The `Dockerfile` is based on a Python image, installs dependencies from `requirements.txt`, and defines the container's entry point (a sketch follows this list).
- **Dependency Management for Production**: The `psycopg2-binary` package was added to `requirements.txt`. This is a critical step in moving from a development-time database like SQLite to a production-grade PostgreSQL database, which the Docker environment is configured to use.
- **Configuration Handling**: Environment variable management within the application's Python code (`main.py`, `database.py`) was refactored to rely on a centralized configuration system, anticipating the need to inject variables from the Docker environment.
- **Source Control Hygiene**: The `.gitignore` file was updated to exclude a broader range of files and artifacts, such as IDE configurations, environment files (`.env`), and cache directories, keeping the repository clean.
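For reference, a minimal `Dockerfile` matching that description might look like the sketch below. The Python version, the `uvicorn`/`main:app` entry point, and the port are illustrative assumptions, not the project's confirmed values.

```dockerfile
# Sketch only: base image, entry point, and port are assumptions.
FROM python:3.11-slim

WORKDIR /app

# Copy and install dependencies first so this layer is cached
# across code-only rebuilds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application source.
COPY . .

# Assumes an ASGI app object in main.py served by uvicorn.
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```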
## The Core Challenge: Propagating Secrets to the Container
Upon the initial `docker-compose up`, the application failed with an error indicating a missing API key for the OpenAI/OpenRouter service.
**Investigation:**
The root cause was identified quickly: the API credentials were hardcoded in a `launch.json` file, which is used exclusively by the VS Code debugger. These variables are therefore not present in the runtime environment when the application is executed directly or via Docker.
**Solution:**
The standard and most secure practice is to decouple secrets and configuration from the application code and development tools. The following steps were executed to resolve the issue:
1. **Centralize Secrets in `.env`**: All environment-specific variables, including the existing `DATABASE_URL` and the new LLM credentials (`LLM_MODEL`, `OPENAI_API_KEY`, `OPENAI_BASE_URL`), were consolidated into a `.env` file at the project root. This file is explicitly excluded from version control by `.gitignore`.
2. **Update `docker-compose.yml`**: While Docker Compose automatically loads variables from a `.env` file in the same directory, it is best practice to pass them to the service explicitly for clarity and control. The `web` service definition in `docker-compose.yml` was updated to pass the LLM-related environment variables to the container.
3. **Update Application Configuration (`config.py`)**: The application's Pydantic-based settings module (`config.py`) was extended. The `Settings` class was modified to load `LLM_MODEL`, `OPENAI_API_KEY`, and `OPENAI_BASE_URL` from the environment, making them available to the application logic in a structured and validated manner. (Sketches of all three pieces follow this list.)
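To make the pattern concrete, here is a hedged sketch of each piece. The variable names come from the text above; everything else (placeholder values, the shape of the compose file, the exact Pydantic version) is assumed for illustration.

```bash
# .env (kept out of version control) -- all values are placeholders
DATABASE_URL=postgresql://user:password@db:5432/app
LLM_MODEL=some-provider/some-model
OPENAI_API_KEY=sk-...
OPENAI_BASE_URL=https://openrouter.ai/api/v1
```

```yaml
# docker-compose.yml (excerpt) -- only the environment passthrough
# for the `web` service is shown; Compose interpolates ${...} from .env.
services:
  web:
    build: .
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - LLM_MODEL=${LLM_MODEL}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENAI_BASE_URL=${OPENAI_BASE_URL}
```

```python
# config.py -- sketch of the extended settings module.
# Assumes Pydantic v2 with the pydantic-settings package; on Pydantic v1
# the import would be `from pydantic import BaseSettings` instead.
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Existing setting.
    DATABASE_URL: str

    # New LLM credentials, loaded and validated from the environment.
    LLM_MODEL: str
    OPENAI_API_KEY: str
    OPENAI_BASE_URL: str

    # Fall back to a local .env file when running outside Docker.
    model_config = SettingsConfigDict(env_file=".env")


settings = Settings()
```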
This three-step process (`.env` -> `docker-compose.yml` -> `config.py`) establishes a robust and conventional pattern for managing configuration that works seamlessly across local development and containerized environments.
## A Note on Local Environment Troubleshooting
A secondary challenge encountered during this process was related to the local Docker setup on Ubuntu. Initial commands failed with `permission denied... docker.sock`, which evolved into `protocol not available` after switching contexts.
This is a common issue for users of Docker Desktop on Linux. The CLI was initially trying to connect to the traditional Docker daemon socket (`/var/run/docker.sock`), which was not running. The correct target was the Docker Desktop VM, managed via the `desktop-linux` context.
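The fix on the CLI side is a standard context switch, using the stock Docker CLI commands:

```bash
# List available contexts; Docker Desktop on Linux registers `desktop-linux`.
docker context ls

# Point the CLI at the Docker Desktop VM rather than /var/run/docker.sock.
docker context use desktop-linux
```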
The persistent `protocol not available` error indicated that the Docker Desktop backend service itself had entered a faulty state. The resolution was to perform a Clean / Purge data operation from the Docker Desktop troubleshooter. This action resets the Docker engine, deleting all images and containers, but it resolves the underlying data corruption.
## Conclusion
The primary goal of containerizing the application was achieved. The key takeaway is the critical importance of a well-defined configuration management strategy; as is so often the case in software development, getting there involved unexpected challenges and a fair amount of debugging along the way.