fix: Sanitize database credentials in error messages #11621
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Use the checkbox below for a quick retry:
WalkthroughThis change adds database URL sanitization to prevent credential exposure in error messages. A new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/lfx/tests/unit/utils/test_database_url_sanitization.py`:
- Around line 125-135: The test
test_should_handle_malformed_url_with_regex_fallback uses weak assertions that
always pass because it uses "or" with "***" present; update the assertions to
require both that the original credentials are not present and that the mask is
present (use "and" logic) so that sanitize_database_url(url) is verified to
remove credentials and also to include the mask; additionally replace the
placeholder credentials with distinctive strings (e.g.,
"s3cr3tuser"/"s3cr3tpass") to avoid accidental substring matches against
host/path components when checking the sanitized result.
🧹 Nitpick comments (1)
src/lfx/src/lfx/utils/util_strings.py (1)
37-50: Type hint doesn't reflectNonehandling.The function signature declares
url: str, but the implementation handlesNoneat line 49 (and the test file exercises this path). Consider updating the signature tourl: str | Noneto match the actual contract.Proposed fix
-def sanitize_database_url(url: str) -> str: +def sanitize_database_url(url: str | None) -> str | None:
dkaushik94
left a comment
There was a problem hiding this comment.
Changes look good except the check for username presence only.
| from sqlalchemy.engine import make_url | ||
|
|
||
| parsed_url = make_url(url) | ||
| if parsed_url.username: |
There was a problem hiding this comment.
@Cristhianzl
This is a brittle check since systems often use no username but with password auth. For example possibly:
postgresql://:password123@localhost/db
This would end up failing this check but we still would want to mask the password.
Additionally, this case also fails the RegEx check since we are enforcing the username should be at least 1 character long.
We can improve the regex, but I think our use case is pretty contained provided we aren't extending this utility function to work with all kinds of datastore connection strings (that'd add more edge cases we would need to handle in case make_url fails.).
There was a problem hiding this comment.
A quick improvement with Gemini came up with:
def sanitize_database_url(url: str) -> str:
if not url:
return url
# Strategy 1: SQLAlchemy (The Gold Standard)
try:
from sqlalchemy.engine import make_url
parsed_url = make_url(url)
# Mask if there is a username OR a password
if parsed_url.username or parsed_url.password:
return str(parsed_url.set(username="***", password="***"))
return str(parsed_url)
except (ImportError, Exception):
# Strategy 2: urllib.parse (Standard Library Fallback)
try:
p = urlparse(url)
if p.username or p.password:
# Rebuild netloc without credentials
port_str = f":{p.port}" if p.port else ""
new_netloc = f"***:***@{p.hostname}{port_str}"
return urlunparse(p._replace(netloc=new_netloc))
return url
except Exception:
# Final Fallback: Defensive Regex
# This pattern is more aggressive, catching the :// and everything up to the @
return re.sub(r"(://).*?@", r"\1***:***@", url)
Codecov Report✅ All modified and coverable lines are covered by tests. ❌ Your project status has failed because the head coverage (41.60%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #11621 +/- ##
==========================================
+ Coverage 34.95% 36.42% +1.46%
==========================================
Files 1506 1570 +64
Lines 71881 76690 +4809
Branches 10692 11638 +946
==========================================
+ Hits 25124 27931 +2807
- Misses 45425 47184 +1759
- Partials 1332 1575 +243
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
* sanitize error from db * [autofix.ci] apply automated fixes * improve regex --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
OBJECTIVE: Prevent sensitive database credentials (username, password) from being exposed in application logs when DATABASE_URL is misconfigured.
CHANGES:
Before:
After:
Summary by CodeRabbit
New Features
Tests