Merged
Changes from all commits
31 commits
c15e331 · mohamedawnallah · Aug 16, 2025 · sdks/python: properly make milvus as extra dependency
316a41f · mohamedawnallah · Aug 20, 2025 · sdks/python: update image requirements
3ca5394 · mohamedawnallah · Aug 20, 2025 · .github: trigger postcommit python
21262a6 · mohamedawnallah · Aug 16, 2025 · sdks/python: fix linting issues
1655458 · mohamedawnallah · Aug 16, 2025 · sdks/python: fix formatting issues
6ed396d · mohamedawnallah · Aug 20, 2025 · .github: trigger beam postcommit python
b80508f · mohamedawnallah · Aug 25, 2025 · Merge remote-tracking branch 'upstream/master' into properlyAddMilvus…
6bcb214 · mohamedawnallah · Aug 25, 2025 · sdks/python: revert milvus version in itests
1e9f9fa · mohamedawnallah · Aug 25, 2025 · sdks/python: update image requirements
4da18cf · mohamedawnallah · Aug 25, 2025 · trigger_files: trigger postcommit python
0cee2e0 · dependabot[bot] · Aug 26, 2025 · Bump github.com/docker/go-connections from 0.5.0 to 0.6.0 in /sdks (#…
bba4f93 · chamikaramj · Aug 26, 2025 · Add the readme link to new YAML examples (#35941)
fa5f7d1 · dependabot[bot] · Aug 26, 2025 · Bump google.golang.org/api from 0.247.0 to 0.248.0 in /sdks (#35969)
4c5cada · Abacn · Aug 27, 2025 · Remove mysql-connector-python dependency (#35932)
81e4db7 · kristynsmith · Aug 27, 2025 · Fix typos and update test implementation from #35656 (#35958)
cfd07be · liferoad · Aug 27, 2025 · feat(mongodb): upgrade MongoDB Java driver to version 5.5.0 (#35946)
7f90455 · dependabot[bot] · Aug 27, 2025 · Bump github.com/aws/aws-sdk-go-v2/credentials in /sdks (#35974)
4c97993 · dependabot[bot] · Aug 27, 2025 · Bump google.golang.org/grpc from 1.74.2 to 1.75.0 in /sdks (#35971)
62cbf83 · shunping · Aug 27, 2025 · Override localhost endpoint when a worker is running in docker on mac…
2bcca48 · liferoad · Aug 27, 2025 · fix(parquetio): handle missing nullable fields in row conversion (#35…
061191f · dependabot[bot] · Aug 27, 2025 · Bump cloud.google.com/go/storage from 1.56.0 to 1.56.1 in /sdks (#35980)
a7e2ac3 · shunping · Aug 27, 2025 · [Prism] Fix segv when docker container self-terminated. (#35977)
09beeaa · derrickaw · Aug 27, 2025 · add a jinja % include/import pipeline example to docs (#35931)
4549214 · dependabot[bot] · Aug 27, 2025 · Bump github.com/aws/aws-sdk-go-v2/config from 1.31.2 to 1.31.3 in /sd…
8c6ff9a · ksobrenat32 · Aug 27, 2025 · Add a security GCP log analyzer (#35922)
a7ec1ae · ahmedabu98 · Aug 27, 2025 · update py containers (#35982)
ac00807 · derrickaw · Aug 27, 2025 · [YAML]: add import jinja pipeline example (#35945)
bf16b25 · mohamedawnallah · Aug 25, 2025 · workflows: capture DinD tests in PreCommit Py Coverage workflow
984c0a0 · mohamedawnallah · Aug 25, 2025 · workflows: temporarily removing `ubuntu-latest` till resolving deps
4c8c06e · mohamedawnallah · Aug 26, 2025 · workflows: add `matrix.os` label to `beam_PreCommit_Python_Coverage`
cadea03 · mohamedawnallah · Aug 27, 2025 · Merge remote-tracking branch 'upstream/master' into properlyAddMilvus…
2 changes: 1 addition & 1 deletion .github/trigger_files/beam_PostCommit_Python.json
@@ -1,5 +1,5 @@
 {
   "comment": "Modify this file in a trivial way to cause this test suite to run.",
-  "modification": 33
+  "modification": 27
 }
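Trigger files like this one only need a trivial edit to start the associated suite. The following minimal Python sketch shows one such edit, rewriting the integer `modification` field; `bump_trigger_file` is a hypothetical helper, not part of Beam, and it assumes the file is plain JSON with an integer `modification` key.

```python
import json
from pathlib import Path


def bump_trigger_file(path):
    """Increment the "modification" counter in a trigger JSON file.

    Hypothetical helper: any trivial change re-triggers the suite; bumping
    the integer is simply one convenient way to make that change.
    """
    trigger = Path(path)
    data = json.loads(trigger.read_text())
    data["modification"] = int(data.get("modification", 0)) + 1
    # Two-space indentation and a trailing newline, like the original file.
    trigger.write_text(json.dumps(data, indent=2) + "\n")
    return data["modification"]
```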

77 changes: 77 additions & 0 deletions .github/workflows/beam_Infrastructure_SecurityLogging.yml
@@ -0,0 +1,77 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# This workflow works with the GCP security log analyzer to
# generate weekly security reports and initialize log sinks

name: GCP Security Log Analyzer

on:
  workflow_dispatch:
  schedule:
    # Once a week at 9:00 AM on Monday
    - cron: '0 9 * * 1'
  push:
    paths:
      - 'infra/security/config.yml'

# This allows a subsequently queued workflow run to interrupt previous runs
concurrency:
  group: '${{ github.workflow }} @ ${{ github.sha || github.head_ref || github.ref }}-${{ github.event.schedule || github.event.sender.login }}'
  cancel-in-progress: true

#Setting explicit permissions for the action to avoid the default permissions which are `write-all` in case of pull_request_target event
permissions:
  contents: read

jobs:
  beam_GCP_Security_LogAnalyzer:
    name: GCP Security Log Analysis
    runs-on: [self-hosted, ubuntu-20.04, main]
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.13'

      - name: Install Python dependencies
        working-directory: ./infra/security
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Setup gcloud
        uses: google-github-actions/setup-gcloud@v2

      - name: Initialize Log Sinks
        if: github.event_name == 'push' || github.event_name == 'workflow_dispatch'
        working-directory: ./infra/security
        run: python log_analyzer.py --config config.yml initialize

      - name: Generate Weekly Security Report
        if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
        working-directory: ./infra/security
        env:
          SMTP_SERVER: smtp.gmail.com
          SMTP_PORT: 465
          EMAIL_ADDRESS: ${{ secrets.ISSUE_REPORT_SENDER_EMAIL_ADDRESS }}
          EMAIL_PASSWORD: ${{ secrets.ISSUE_REPORT_SENDER_EMAIL_PASSWORD }}
          EMAIL_RECIPIENT: "dev@beam.apache.org"
        run: python log_analyzer.py --config config.yml generate-report --dry-run
37 changes: 27 additions & 10 deletions .github/workflows/beam_PreCommit_Python_Coverage.yml
@@ -58,35 +58,45 @@ env:

 jobs:
   beam_PreCommit_Python_Coverage:
-    name: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
-    runs-on: [self-hosted, ubuntu-20.04, highmem]
+    name: ${{ matrix.job_name }} (${{ matrix.job_phrase }} ${{ matrix.python_version }}) (${{ join(matrix.os, ', ') }})
+    runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
       matrix:
         job_name: [beam_PreCommit_Python_Coverage]
         job_phrase: [Run Python_Coverage PreCommit]
+        python_version: ['3.9']
+        # Run on both self-hosted and GitHub-hosted runners.
+        # Some tests (marked require_docker_in_docker) can't run on Beam's
+        # self-hosted runners due to Docker-in-Docker environment constraint.
+        # These tests will only execute on ubuntu-latest (GitHub-hosted).
+        # Context: https://github.com/apache/beam/pull/35585
+        # Temporary removed the ubuntu-latest env till resolving deps issues.
+        os: [[self-hosted, ubuntu-20.04, highmem]]
     timeout-minutes: 180
     if: |
       github.event_name == 'push' ||
       github.event_name == 'pull_request_target' ||
       (github.event_name == 'schedule' && github.repository == 'apache/beam') ||
       github.event_name == 'workflow_dispatch' ||
-      github.event.comment.body == 'Run Python_Coverage PreCommit'
+      startswith(github.event.comment.body, 'Run Python_Coverage PreCommit 3.')
     steps:
       - uses: actions/checkout@v4
       - name: Setup repository
         uses: ./.github/actions/setup-action
         with:
-          comment_phrase: ${{ matrix.job_phrase }}
+          comment_phrase: ${{ matrix.job_phrase }} ${{ matrix.python_version }}
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }})
+          github_job: ${{ matrix.job_name }} (${{ matrix.job_phrase }} ${{ matrix.python_version }}) (${{ join(matrix.os, ', ') }})
       - name: Setup environment
         uses: ./.github/actions/setup-environment-action
         with:
           java-version: default
-          python-version: default
+          python-version: ${{ matrix.python_version }}
       - name: Start DinD
         uses: ./.github/actions/dind-up-action
         id: dind
+        if: contains(matrix.os, 'self-hosted')
         with:
           # Enable all the new features
           cleanup-dind-on-start: "true"
@@ -97,9 +107,9 @@ jobs:
           export-gh-env: "true"
       - name: Run preCommitPyCoverage
         env:
-          DOCKER_HOST: ${{ steps.dind.outputs.docker-host }}
+          DOCKER_HOST: ${{ contains(matrix.os, 'self-hosted') && steps.dind.outputs.docker-host || '' }}
           TOX_TESTENV_PASSENV: "DOCKER_*,TESTCONTAINERS_*,TC_*,BEAM_*,GRPC_*,OMP_*,OPENBLAS_*,PYTHONHASHSEED,PYTEST_*"
-          TESTCONTAINERS_HOST_OVERRIDE: ${{ env.DIND_IP }}
+          TESTCONTAINERS_HOST_OVERRIDE: ${{ contains(matrix.os, 'self-hosted') && env.DIND_IP || '' }}
           TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE: "/var/run/docker.sock"
           TESTCONTAINERS_RYUK_DISABLED: "false"
           TESTCONTAINERS_RYUK_CONTAINER_PRIVILEGED: "true"
@@ -110,6 +120,12 @@ jobs:
         uses: ./.github/actions/gradle-command-self-hosted-action
         with:
           gradle-command: :sdks:python:test-suites:tox:py39:preCommitPyCoverage
+          arguments: |
+            -Pposargs="${{
+              contains(matrix.os, 'self-hosted') &&
+              '-m (not require_docker_in_docker)' ||
+              '-m require_docker_in_docker'
+            }}"
       - uses: codecov/codecov-action@v3
         with:
           flags: python
@@ -118,7 +134,7 @@ jobs:
         uses: actions/upload-artifact@v4
         if: failure()
         with:
-          name: Python Test Results
+          name: Python ${{ matrix.python_version }} Test Results (${{ join(matrix.os, ', ') }})
           path: '**/pytest*.xml'
       - name: Publish Python Test Results
         env:
@@ -129,4 +145,5 @@ jobs:
           commit: '${{ env.prsha || env.GITHUB_SHA }}'
           comment_mode: ${{ github.event_name == 'issue_comment' && 'always' || 'off' }}
           files: '**/pytest*.xml'
-          large_files: true
+          large_files: true
+          check_name: "Python ${{ matrix.python_version }} Test Results (${{ join(matrix.os, ', ') }})"
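The matrix-dependent expressions in this workflow are easy to misread. The Python sketch below approximates how GitHub Actions resolves them for a given `matrix.os` entry, modeling the documented behavior of `contains()`, `join()`, `startsWith()` and the `cond && a || b` idiom. It is an illustration only, not part of Beam; the helper names and the sample DinD host and IP values in the usage are hypothetical.

```python
def gha_contains(search, item):
    # contains(): element match for arrays, substring match for strings;
    # GitHub documents both as case-insensitive.
    if isinstance(search, list):
        return any(str(x).lower() == str(item).lower() for x in search)
    return str(item).lower() in str(search).lower()


def gha_join(value, separator=","):
    # join(): concatenates array elements; a plain string is returned unchanged.
    if isinstance(value, list):
        return separator.join(str(v) for v in value)
    return str(value)


def is_coverage_comment(body):
    # startsWith() is also case-insensitive.
    return body.lower().startswith("run python_coverage precommit 3.")


def coverage_job_settings(os_labels, dind_host, dind_ip, python_version="3.9"):
    """Resolve the per-runner values the workflow derives from matrix.os."""
    self_hosted = gha_contains(os_labels, "self-hosted")
    return {
        # `cond && a || b` falls through to b whenever a is falsy,
        # so an empty DinD output also yields ''.
        "DOCKER_HOST": (self_hosted and dind_host) or "",
        "TESTCONTAINERS_HOST_OVERRIDE": (self_hosted and dind_ip) or "",
        "posargs": ("-m (not require_docker_in_docker)" if self_hosted
                    else "-m require_docker_in_docker"),
        "check_name": f"Python {python_version} Test Results ({gha_join(os_labels, ', ')})",
    }


if __name__ == "__main__":
    print(coverage_job_settings(["self-hosted", "ubuntu-20.04", "highmem"],
                                "tcp://dind:2375", "10.0.0.2"))
    print(coverage_job_settings("ubuntu-latest", "tcp://dind:2375", "10.0.0.2"))
```

The `-m` expressions are pytest marker selections, so tests decorated with `@pytest.mark.require_docker_in_docker` run only on runners whose labels do not include `self-hosted`, and all other tests run only on self-hosted runners.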
@@ -840,7 +840,9 @@ class BeamModulePlugin implements Plugin<Project> {
         log4j2_log4j12_api : "org.apache.logging.log4j:log4j-1.2-api:$log4j2_version",
         mockito_core : "org.mockito:mockito-core:4.11.0",
         mockito_inline : "org.mockito:mockito-inline:4.11.0",
-        mongo_java_driver : "org.mongodb:mongo-java-driver:3.12.11",
+        mongo_java_driver : "org.mongodb:mongodb-driver-sync:5.5.0",
+        mongo_bson : "org.mongodb:bson:5.5.0",
+        mongodb_driver_core : "org.mongodb:mongodb-driver-core:5.5.0",
         nemo_compiler_frontend_beam : "org.apache.nemo:nemo-compiler-frontend-beam:$nemo_version",
         netty_all : "io.netty:netty-all:$netty_version",
         netty_handler : "io.netty:netty-handler:$netty_version",
84 changes: 84 additions & 0 deletions infra/security/README.md
@@ -0,0 +1,84 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# GCP Security Analyzer

This document describes a security log analyzer for Google Cloud Platform (GCP) resources. The analyzer strengthens security monitoring of our GCP environment by capturing security-sensitive events, such as IAM policy changes and service account key creation or deletion, and reporting them in a weekly email summary.

## How It Works

1. **Log Sinks**: The system uses [GCP Log Sinks](https://cloud.google.com/logging/docs/export/configure_export_v2) to capture specific security-related log entries. These sinks are configured to filter for events like IAM policy changes or service account key creation.
2. **Log Storage**: The filtered logs are routed to a dedicated Google Cloud Storage (GCS) bucket for persistence and analysis.
3. **Report Generation**: A scheduled job runs weekly, executing the `log_analyzer.py` script.
4. **Email Alerts**: The script analyzes the logs from the past week, compiles a summary of security events, and sends a report to a configured email address.
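Step 4 can be sketched in Python as follows. This is a minimal illustration, not the code in `log_analyzer.py`: it assumes the exported entries have already been read from the bucket into dicts shaped like Cloud Audit Log records (`timestamp`, `protoPayload.methodName`, `protoPayload.authenticationInfo.principalEmail`), with UTC timestamps. `parse_utc`, `summarize_week` and `format_report` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def parse_utc(timestamp):
    # Keep "YYYY-MM-DDTHH:MM:SS" and drop fractional seconds and the "Z",
    # assuming every timestamp is expressed in UTC.
    return datetime.strptime(timestamp[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def summarize_week(entries, now=None):
    """Count entries from the last 7 days by (methodName, principalEmail)."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=7)
    counts = Counter()
    for entry in entries:
        if parse_utc(entry["timestamp"]) < cutoff:
            continue  # older than one week
        payload = entry.get("protoPayload", {})
        method = payload.get("methodName", "unknown")
        principal = payload.get("authenticationInfo", {}).get("principalEmail", "unknown")
        counts[(method, principal)] += 1
    return counts


def format_report(counts):
    """Render one line per (method, principal) pair, most frequent first."""
    if not counts:
        return "No security events in the past week."
    return "\n".join(
        f"{n:4d}  {method}  by {principal}"
        for (method, principal), n in counts.most_common()
    )
```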

## Configuration

The behavior of the log analyzer is controlled by a `config.yml` file. Here’s an overview of the configuration options:

- `project_id`: The GCP project ID where the resources are located.
- `bucket_name`: The name of the GCS bucket where logs will be stored.
- `logging`: Configures the script's logging through two keys, `level` (for example, `DEBUG`) and `format` (a Python `logging` format string).
- `sinks`: A list of log sinks to be created. Each sink has the following properties:
  - `name`: A unique name for the sink.
  - `description`: A brief description of what the sink monitors.
  - `filter_methods`: A list of GCP API methods to include in the filter (e.g., `SetIamPolicy`).
  - `excluded_principals`: A list of service accounts or user emails to exclude from monitoring, such as CI/CD service accounts.
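A minimal sketch of loading these options into typed objects, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `SinkConfig`, `AnalyzerConfig` and `parse_config`, as well as the choice of which keys are required, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SinkConfig:
    name: str
    description: str = ""
    filter_methods: List[str] = field(default_factory=list)
    excluded_principals: List[str] = field(default_factory=list)


@dataclass
class AnalyzerConfig:
    project_id: str
    bucket_name: str
    sinks: List[SinkConfig]
    logging: Dict[str, Any] = field(default_factory=dict)


def parse_config(raw):
    """Validate a dict loaded from config.yml and return an AnalyzerConfig."""
    for key in ("project_id", "bucket_name", "sinks"):
        if key not in raw:
            raise ValueError(f"config is missing required key: {key}")
    sinks = []
    for item in raw["sinks"]:
        sink = SinkConfig(
            name=item["name"],
            description=item.get("description", ""),
            filter_methods=list(item.get("filter_methods") or []),
            excluded_principals=list(item.get("excluded_principals") or []),
        )
        # A sink without methods would match nothing useful, so reject it early.
        if not sink.filter_methods:
            raise ValueError(f"sink {sink.name!r} has no filter_methods")
        sinks.append(sink)
    return AnalyzerConfig(
        project_id=raw["project_id"],
        bucket_name=raw["bucket_name"],
        sinks=sinks,
        logging=dict(raw.get("logging") or {}),
    )
```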

### Example Configuration (`config.yml`)

```yaml
project_id: your-gcp-project-id
bucket_name: your-log-storage-bucket

sinks:
  - name: iam-policy-changes
    description: Monitors changes to IAM policies.
    filter_methods:
      - "SetIamPolicy"
    excluded_principals:
      - "ci-cd-account@your-project.iam.gserviceaccount.com"
```
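Each sink's `filter_methods` and `excluded_principals` ultimately have to become a Cloud Logging query. The sketch below shows one way to build it; `build_sink_filter` is hypothetical, and it uses exact `=` matches on the audit-log fields `protoPayload.methodName` and `protoPayload.authenticationInfo.principalEmail`. The real script may match differently, for example with the Logging query language's `:` (has) operator for substring matches.

```python
def build_sink_filter(filter_methods, excluded_principals=()):
    """Build a Cloud Logging query that matches any listed audit-log method
    and drops entries made by the excluded principals."""
    if not filter_methods:
        raise ValueError("at least one filter method is required")
    # Any of the listed methods may match...
    methods = " OR ".join(f'protoPayload.methodName="{m}"' for m in filter_methods)
    clauses = [f"({methods})"]
    # ...but none of the excluded principals may be the caller.
    for principal in excluded_principals:
        clauses.append(f'NOT protoPayload.authenticationInfo.principalEmail="{principal}"')
    return " AND ".join(clauses)
```

For the example configuration above, `build_sink_filter(["SetIamPolicy"], ["ci-cd-account@your-project.iam.gserviceaccount.com"])` returns `(protoPayload.methodName="SetIamPolicy") AND NOT protoPayload.authenticationInfo.principalEmail="ci-cd-account@your-project.iam.gserviceaccount.com"`.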

## Usage

The `log_analyzer.py` script provides two main commands for managing the security analyzer.
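The two commands, together with the `--dry-run` flag that the scheduled workflow passes to `generate-report`, suggest a command-line interface shaped roughly like this `argparse` sketch. It is an assumption about the layout, not the actual parser in `log_analyzer.py`; the help strings are illustrative.

```python
import argparse


def build_parser():
    """Hypothetical CLI layout matching the invocations shown in this README."""
    parser = argparse.ArgumentParser(prog="log_analyzer.py")
    parser.add_argument("--config", required=True, help="Path to config.yml")
    commands = parser.add_subparsers(dest="command", required=True)
    commands.add_parser("initialize", help="Create or update the log sinks")
    report = commands.add_parser("generate-report", help="Build and send the weekly report")
    # Assumed semantics: build the report without emailing it.
    report.add_argument("--dry-run", action="store_true", help="Do not send email")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```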

### Initializing Sinks

To create or update the log sinks in GCP based on your `config.yml` file, run the following command:

```bash
python log_analyzer.py --config config.yml initialize
```

This command ensures that the log sinks are correctly configured to capture the desired security events.
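A minimal sketch of what `initialize` might do for each configured sink, assuming the `google-cloud-logging` client library (this README does not say which library `log_analyzer.py` uses). `ensure_sink` is a hypothetical helper; the client is passed in so the create-or-update logic can be exercised without GCP credentials.

```python
def ensure_sink(client, name, filter_, bucket_name):
    """Create a log sink routed to a GCS bucket, or update it if it exists.

    `client` is assumed to behave like google.cloud.logging.Client: its
    sink() method builds a local Sink object exposing exists(), create()
    and update().
    """
    destination = f"storage.googleapis.com/{bucket_name}"
    sink = client.sink(name, filter_=filter_, destination=destination)
    if sink.exists():
        # update() overwrites the server-side filter and destination
        # with the values set on this local Sink object.
        sink.update()
        return "updated"
    sink.create()
    return "created"


# Example wiring with the real library (requires credentials; not run here):
# from google.cloud import logging as cloud_logging
# client = cloud_logging.Client(project="your-gcp-project-id")
# ensure_sink(client, "iam-policy-changes",
#             'protoPayload.methodName="SetIamPolicy"', "your-log-storage-bucket")
```

Whichever way a sink is created, its writer identity (exposed as `Sink.writer_identity` in that library) must be allowed to write objects to the bucket before any exported logs arrive.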

### Generating Weekly Reports

To generate and send the weekly security report, run this command:

```bash
python log_analyzer.py --config config.yml generate-report
```

This command is intended to run as a scheduled GitHub Actions workflow, which automates delivery of the weekly security report.
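The email step can be sketched with the standard library's `smtplib` and `email` modules, reading the environment variables that the `beam_Infrastructure_SecurityLogging.yml` workflow sets (`SMTP_SERVER`, `SMTP_PORT`, `EMAIL_ADDRESS`, `EMAIL_PASSWORD`, `EMAIL_RECIPIENT`). Port 465 implies implicit TLS, hence `SMTP_SSL`. The dry-run behavior shown (print instead of send) is an assumption about what `--dry-run` does, and `build_report_email`, `send_report` and the subject line are hypothetical.

```python
import os
import smtplib
from email.message import EmailMessage


def build_report_email(body, sender, recipient, subject="Weekly GCP security report"):
    """Wrap the report text in a plain-text email message."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send_report(body, dry_run=False, env=None):
    """Send the report over SMTP with implicit TLS, or only print it on a dry run."""
    env = os.environ if env is None else env
    msg = build_report_email(body, env["EMAIL_ADDRESS"], env["EMAIL_RECIPIENT"])
    if dry_run:
        # Hypothetical dry-run behavior: show the message instead of sending it.
        print(msg)
        return msg
    with smtplib.SMTP_SSL(env["SMTP_SERVER"], int(env["SMTP_PORT"])) as server:
        server.login(env["EMAIL_ADDRESS"], env["EMAIL_PASSWORD"])
        server.send_message(msg)
    return msg
```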



43 changes: 43 additions & 0 deletions infra/security/config.yml
@@ -0,0 +1,43 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

project_id: testing-me-460223

# Logging
logging:
  level: DEBUG
  format: "[%(asctime)s] %(levelname)s: %(message)s"

# gcloud storage bucket
bucket_name: "testing-me-460223-tfstate"

# GCP Log sinks
sinks:
  - name: iam-policy-changes
    description: Monitors changes to IAM policies, excluding approved CI/CD service accounts.
    filter_methods:
      - "SetIamPolicy"
    excluded_principals:
      - beam-github-actions@apache-beam-testing.iam.gserviceaccount.com
      - github-self-hosted-runners@apache-beam-testing.iam.gserviceaccount.com

  - name: sa-key-management
    description: Monitors creation and deletion of service account keys.
    filter_methods:
      - "google.iam.admin.v1.IAM.CreateServiceAccountKey"
      - "google.iam.admin.v1.IAM.DeleteServiceAccountKey"
    excluded_principals:
      - beam-github-actions@apache-beam-testing.iam.gserviceaccount.com
      - github-self-hosted-runners@apache-beam-testing.iam.gserviceaccount.com