feat(sentry_integration): exec app in subprocess #217
Changes from all commits: 4ee905c, 3f12aa6, 66d6b43, af519f1, 1bc38a8, cb48162, 5c66cad, deb4dae, f9f0837, feee6e3, 98b300e, 6bf947c, 24a9d0a, 1fa578d, eabbfc3
**`sentry_streams/sentry_streams/runner.py`**

```diff
@@ -1,10 +1,13 @@
 import importlib
 import json
 import logging
-from typing import Any, Optional, cast
+import multiprocessing
+import sys
+from typing import Any, Mapping, Optional, cast

 import click
 import jsonschema
+import sentry_sdk
 import yaml

 from sentry_streams.adapters.loader import load_adapter
```
@@ -60,37 +63,47 @@ def iterate_edges( | |
| step_streams[branch_name] = next_step_stream[branch_name] | ||
|
|
||
|
|
||
| def _load_pipeline(application: str) -> Pipeline[Any]: | ||
| """ | ||
| Worker function that runs in a separate process to load the pipeline. | ||
| Returns the Pipeline object directly, or raises an exception on error. | ||
|
|
||
| Customer code exceptions are allowed to propagate naturally so that the customer's | ||
| Sentry SDK (if initialized) can capture them. | ||
| """ | ||
| import contextlib | ||
|
|
||
| pipeline_globals: dict[str, Any] = {} | ||
|
|
||
| with contextlib.redirect_stdout(sys.stderr): | ||
| with open(application, "r") as f: | ||
| exec(f.read(), pipeline_globals) | ||
|
|
||
| if "pipeline" not in pipeline_globals: | ||
| raise ValueError("Application file must define a 'pipeline' variable") | ||
|
|
||
| pipeline = cast(Pipeline[Any], pipeline_globals["pipeline"]) | ||
| return pipeline | ||
|
|
||
|
|
||
| def load_runtime( | ||
| name: str, | ||
| log_level: str, | ||
| adapter: str, | ||
| config: str, | ||
| segment_id: Optional[str], | ||
| application: str, | ||
| environment_config: Mapping[str, Any], | ||
victoria-yining-huang marked this conversation as resolved.

**Rust caller uses incompatible old function signature** — High Severity
```diff
 ) -> Any:

     logging.basicConfig(
         level=log_level,
         format="%(asctime)s - %(levelname)s - %(message)s",
         datefmt="%Y-%m-%d %H:%M:%S",
     )
+    with multiprocessing.Pool(processes=1) as pool:
+        pipeline: Pipeline[Any] = pool.apply(_load_pipeline, (application,))
+
```
**Pipeline with lambdas fails subprocess pickling** — High Severity
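For illustration, a minimal sketch of the failure mode using a stand-in class rather than the real `Pipeline` (the names below are made up): a `multiprocessing.Pool` worker sends its return value back to the parent by pickling it, and a pipeline object that holds a lambda cannot be pickled.

```python
# Stand-in example only; FakePipeline and build are not real sentry_streams APIs.
import multiprocessing
import pickle


class FakePipeline:
    def __init__(self, transform):
        # Holding a lambda makes the whole object unpicklable.
        self.transform = transform


def build(_path: str) -> FakePipeline:
    return FakePipeline(transform=lambda msg: msg.upper())


if __name__ == "__main__":
    try:
        pickle.dumps(build("app.py"))
    except Exception as exc:
        print(f"direct pickling fails: {exc}")

    with multiprocessing.Pool(processes=1) as pool:
        try:
            # The worker builds the object fine, but fails when pickling it
            # to send the result back to the parent process.
            pool.apply(build, ("app.py",))
        except Exception as exc:
            print(f"pool.apply fails too: {exc}")
```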
```diff
-
-    pipeline_globals: dict[str, Any] = {}
-
-    with open(application) as f:
-        exec(f.read(), pipeline_globals)
-
-    with open(config, "r") as config_file:
-        environment_config = yaml.safe_load(config_file)
-
-    config_template = importlib.resources.files("sentry_streams") / "config.json"
-    with config_template.open("r") as file:
-        schema = json.load(file)
-
-    try:
-        jsonschema.validate(environment_config, schema)
-    except Exception:
-        raise
+    validate_all_branches_have_sinks(pipeline)

     metric_config = environment_config.get("metrics", {})
     if metric_config.get("type") == "datadog":
```
@@ -114,8 +127,6 @@ def load_runtime( | |
| metric_config = {} | ||
|
|
||
| assigned_segment_id = int(segment_id) if segment_id else None | ||
| pipeline: Pipeline[Any] = pipeline_globals["pipeline"] | ||
|
**Missing KeyError handling for datadog metrics config** — Medium Severity

Additional Locations (1)
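As a hedged illustration of the point (the actual datadog settings keys are not visible in this diff, so `host` and `port` below are assumptions): indexing into a partially filled metrics section raises `KeyError`, while `.get()` with defaults, or an explicit check, fails more gracefully.

```python
# Hypothetical keys for illustration; the real datadog metrics schema is not shown here.
metric_config = {"type": "datadog"}  # deployment config omitted "host" and "port"

# Brittle: raises KeyError when a sub-key is missing.
# host = metric_config["host"]

# Defensive alternative: defaults, or an explicit error naming the missing key.
host = metric_config.get("host", "127.0.0.1")
port = int(metric_config.get("port", 8125))
```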
```diff
-    validate_all_branches_have_sinks(pipeline)
     runtime: Any = load_adapter(adapter, environment_config, assigned_segment_id, metric_config)
     translator = RuntimeTranslator(runtime)
```
**Collaborator:** I think there is a use case we did not consider, and it is an issue. Though if we changed that and sent config file mistakes to product, the system would be wrong anyway, as streaming owns the config file content. Do we have a plan to address this issue?

**Contributor (Author):** I have not addressed this issue yet. The configuration YAML file should have its own validation step, and it has not been built yet. In any case, as you said, it will be a problem if this step is the first place an invalid deployment config file errors out.

**Member:** @fpacifici, I'm not sure what the course of action here is, though. A mismatch between the deployment config and the pipeline settings could be either team's responsibility. If a team renames its steps and does not update the config file in advance, then we should not have to react to that. I think we should probably try to prevent this issue in CI if it becomes a big one.

**Collaborator:** I think we do not have a precise story for how to manage this use case in a CI/CD environment. I think this should be discussed in a follow-up.
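One possible shape for the CI idea discussed above, sketched with assumed structure (a `steps` mapping in the deployment config and a way to list the pipeline's step names; neither is defined in this PR):

```python
# Hypothetical CI check; the "steps" key and the step-name accessor are assumptions.
from typing import Any, Iterable, Mapping


def check_config_matches_pipeline(
    environment_config: Mapping[str, Any], pipeline_step_names: Iterable[str]
) -> None:
    # Fail fast when the deployment config refers to steps the pipeline no longer defines.
    configured = set(environment_config.get("steps", {}))
    unknown = configured - set(pipeline_step_names)
    if unknown:
        raise ValueError(f"Deployment config references unknown pipeline steps: {sorted(unknown)}")
```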
```diff
@@ -124,6 +135,54 @@ def load_runtime(
     return runtime


+def load_runtime_with_config_file(
+    name: str,
+    log_level: str,
+    adapter: str,
+    config: str,
+    segment_id: Optional[str],
+    application: str,
+) -> Any:
+    """Load runtime from a config file path, returning the runtime object without calling run()."""
+    with open(config, "r") as f:
+        environment_config = yaml.safe_load(f)
+
+    config_template = importlib.resources.files("sentry_streams") / "config.json"
+    with config_template.open("r") as file:
+        schema = json.load(file)
+
+    jsonschema.validate(environment_config, schema)
+
+    sentry_sdk_config = environment_config.get("sentry_sdk_config")
+    if sentry_sdk_config:
+        sentry_sdk.init(dsn=sentry_sdk_config["dsn"])
+
+    return load_runtime(name, log_level, adapter, segment_id, application, environment_config)
+
+
+def run_with_config_file(
+    name: str,
+    log_level: str,
+    adapter: str,
+    config: str,
+    segment_id: Optional[str],
+    application: str,
+) -> None:
+    """
+    Load runtime from config file and run it. Used by the Python CLI.
+
+    NOTE: This function is separate from load_runtime_with_config_file() for a reason:
+    - load_runtime_with_config_file() returns the runtime WITHOUT calling .run()
+    - This allows the Rust CLI (run.rs) to call .run() itself
+    - Do NOT combine these functions - it would break the Rust CLI which needs to
+      control when .run() is called
+    """
+    runtime = load_runtime_with_config_file(
+        name, log_level, adapter, config, segment_id, application
+    )
+    runtime.run()
+
+
 @click.command()
 @click.option(
     "--name",
```

```diff
@@ -179,8 +238,7 @@ def main(
     segment_id: Optional[str],
     application: str,
```
**Click option name mismatches function parameter name** — High Severity

Additional Locations (1)
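To illustrate the failure mode the bot is pointing at (the option and parameter names below are examples, not the actual ones in this PR): click derives the keyword argument it passes from the option flag, so a flag and its function parameter must stay in sync.

```python
# Example only; not the actual options defined in runner.py.
import click


@click.command()
@click.option("--adapter")
def main(adapter_name: str) -> None:
    # click calls main(adapter=...), which this signature does not accept -> TypeError.
    click.echo(adapter_name)
```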
```diff
 ) -> None:
-    runtime = load_runtime(name, log_level, adapter, config, segment_id, application)
-    runtime.run()
+    run_with_config_file(name, log_level, adapter, config, segment_id, application)


 if __name__ == "__main__":
```
**Rust CLI (`run.rs`)**

```diff
@@ -33,28 +33,22 @@ pub struct PyConfig {

 #[derive(Parser, Debug)]
 pub struct RuntimeConfig {
     /// The name of the Sentry Streams application
     #[arg(short, long)]
     pub name: String,

     /// The name of the Sentry Streams application
     #[arg(short, long)]
     pub log_level: String,

     /// The name of the adapter
     #[arg(short, long)]
-    pub adapter_name: String,
+    pub adapter: String,

     /// The deployment config file path. Each config file currently corresponds to a specific pipeline.
     #[arg(short, long)]
-    pub config_file: OsString,
+    pub config: OsString,

     /// The segment id to run the pipeline for
     #[arg(short, long)]
     pub segment_id: Option<String>,

     /// The name of the application
-    pub application_name: String,
+    pub application: String,
 }

 pub fn run(args: Args) -> Result<(), Box<dyn std::error::Error>> {
```

```diff
@@ -76,14 +70,14 @@ pub fn run(args: Args) -> Result<(), Box<dyn std::error::Error>> {
     let runtime: Py<PyAny> = traced_with_gil!(|py| {
         let runtime = py
             .import("sentry_streams.runner")?
-            .getattr("load_runtime")?
+            .getattr("load_runtime_with_config_file")?
             .call1((
                 runtime_config.name,
                 runtime_config.log_level,
-                runtime_config.adapter_name,
-                runtime_config.config_file,
+                runtime_config.adapter,
+                runtime_config.config,
                 runtime_config.segment_id,
-                runtime_config.application_name,
```
**Rust calls `load_runtime_with_config_file`**
```diff
+                runtime_config.application,
             ))?
             .unbind();
         PyResult::Ok(runtime)
```
```diff
@@ -0,0 +1,9 @@
+import sentry_sdk
+
+# Initialize customer's Sentry SDK in the subprocess
+sentry_sdk.init(dsn="https://customer@example.com/123")
+
+from sentry_streams.pipeline import streaming_source
+
+# Intentionally not defining 'pipeline' variable
+my_pipeline = streaming_source(name="test", stream_name="test-stream")
```
```diff
@@ -0,0 +1,4 @@
+from sentry_streams.pipeline import streaming_source
+from sentry_streams.pipeline.pipeline import DevNullSink
+
+pipeline = streaming_source(name="test", stream_name="test-stream").sink(DevNullSink("test-sink"))
```
```diff
@@ -0,0 +1,4 @@
+sentry_sdk_config:
+  dsn: "https://platform@example.com/456"
+metrics:
+  type: dummy
```


**Platform Sentry not initialized in subprocess** — Medium Severity

The platform's Sentry SDK is initialized with the `streaming_platform_config` DSN in `run_with_config_file` (main process) but not in the `_load_pipeline` subprocess where customer code executes. Exceptions occurring during pipeline loading in the subprocess won't be captured by the platform's Sentry, only propagating as `CalledProcessError` to the main process. This contradicts the config schema, which provides platform-level DSN configuration, suggesting platform Sentry should monitor both processes.

Additional Locations (1)

sentry_streams/sentry_streams/runner.py#L157-L160
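One possible direction, sketched here as an assumption rather than as what the PR does: pass the platform DSN down to the worker so that failures while exec'ing customer code are also reported to the platform's Sentry. The function name, the `platform_dsn` parameter, and the capture-and-re-raise flow are all illustrative.

```python
# Illustrative sketch only; not part of this PR.
from typing import Any, Optional

import sentry_sdk


def _load_pipeline_with_platform_sentry(application: str, platform_dsn: Optional[str]) -> Any:
    if platform_dsn:
        sentry_sdk.init(dsn=platform_dsn)  # platform-side monitoring inside the subprocess

    pipeline_globals: dict[str, Any] = {}
    try:
        with open(application, "r") as f:
            exec(f.read(), pipeline_globals)
    except Exception:
        sentry_sdk.capture_exception()  # no-op if the SDK was never initialized
        raise  # still propagate so the parent process sees the failure

    if "pipeline" not in pipeline_globals:
        raise ValueError("Application file must define a 'pipeline' variable")
    return pipeline_globals["pipeline"]
```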