
Cirrus Processing Terraform Module

Deploy and manage Cirrus processing infrastructure on AWS with Terraform.

Overview

This module creates the AWS resources that comprise Cirrus processing components - feeders, batch compute environments, workflows, and their associated tasks - via a flexible interface that hides much of the underlying resource complexity.

Components are configured via YAML definition files and can be deployed together or split across separate terraform states, depending on your needs:

  • Single deployment - manage all processing components in one terraform state. Suitable for smaller deployments.
  • Partitioned by component type - manage feeders, compute environments, and workflows in separate states or repositories. Reduces blast radius; for example, isolating compute environments prevents accidental redeployment of batch infrastructure when iterating on workflow definitions.
  • Partitioned by component instance - each feeder, compute environment, or workflow in its own deployment. Enables rapid iteration with focused changes.
  • Any combination of the above - the only requirement is that the names of your component instances do not collide.

Note: using YAML definitions is recommended but not strictly required. The component modules (feeder, compute, and workflow) can be targeted directly if an HCL interface is preferred.

Getting Started

Prerequisites

  • Your target AWS account and region have an active cirrus-core deployment. This module queries SSM for Cirrus deployment parameters.
  • You have a VPC, subnets, and one or more security groups already deployed.

Usage

  1. Create a directory named cirrus (convention, not requirement) at the root of your terraform deployment repository

  2. Under this new directory, create subdirectories for whichever components you'll be deploying. Any of these may be omitted (or set to null in the next step) to skip that component type entirely:

    • feeders: for feeder definitions
    • compute: for batch compute environment definitions
    • workflows: for workflow definitions
  3. Add the module to your terraform configuration, pointing the *_definitions_dir inputs at the directories created above:

    module "cirrus_processing" {
      source = "git::https://github.com/Element84/filmdrop-aws-tf-modules.git//modules/cirrus-processing?ref=main"
    
      feeder_definitions_dir   = "cirrus/feeders"
      compute_definitions_dir  = "cirrus/compute"
      workflow_definitions_dir = "cirrus/workflows"
    
      # These aren't being used yet - you'll update them later as needed:
      feeder_definitions_variables   = {}
      compute_definitions_variables  = {}
      workflow_definitions_variables = {}
    
      ... other required variables ...
    }
  4. The module now knows where to glob for definition.yaml files at terraform runtime. The next step is to create some definitions.

Note: The directory paths used in this guide (cirrus/feeders, cirrus/compute, cirrus/workflows) are conventions, not requirements. You may use any paths you like as long as the module input variables point to them.

Example Configurations

Usage Patterns

Example module invocations are provided in examples/usage-patterns/ showing different ways to structure deployments depending on your needs. These examples focus on how to invoke the module, not on definition content - see Definition Examples for example definition.yaml files.

Definition Examples

Example definition.yaml files are provided in examples/definitions/ for each component type. Each file is annotated with usage notes and a corresponding *_definitions_variables snippet.

Feeders

Compute

Workflows

State Machine Snippets

Copy-paste starting points for individual states within a workflow's state-machine.json.

Cirrus Task Infrastructure vs STAC Task Code

This module configures the infrastructure component for cirrus feeders, compute, tasks, and workflows. It does not build STAC task code into ZIP archives or Docker images; it only creates the AWS infrastructure and resources necessary to deploy existing artifacts into your Cirrus AWS environment(s).

The code component for your STAC tasks and feeders must already exist in a different repository, and that repository is responsible for building and pushing your versioned artifacts to somewhere this repository's deployment IAM Role has permission to access, such as:

  • Docker images for Cirrus tasks and feeders should be in an AWS Elastic Container Registry (ECR) accessible by the deployment role.
  • ZIP files for Cirrus tasks or feeders should be in an S3 Bucket accessible by the deployment role.

You must do one of the following for each Cirrus task or feeder you'll be deploying:

  • identify an existing STAC task or feeder repository that produces one of the artifacts listed above, or
  • create your own repository (see cirrus-task-example).

Once you've done this, you're ready to continue.

How Cirrus Terraform YAML Configuration Works

YAML is used because terraform cannot read an arbitrary number of HCL files into a list of objects; splitting configuration objects into individual YAML files improves readability and maintainability. It also enables a single configuration set to be reused across environments via parameterization.

The configuration files you'll be writing are YAML representations of the expected input HCL objects for each of the following:

  • feeder module's feeder_config input variable
  • compute module's compute_config input variable
  • workflow module's workflow_config input variable

These YAML files are automatically decoded into HCL via the yaml-def-loader submodule.

During the execution of any terraform command that reads input configurations, such as terraform plan or terraform apply, the yaml-def-loader submodule will glob the specified definition directories to gather the YAML configuration files you've created according to this guide into collections of HCL-equivalent objects. The HCL objects are then passed into the applicable modules.

Those modules will then provision the appropriate set of AWS resources for that configuration.
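
For intuition, the loading pattern is roughly equivalent to the following HCL sketch. This is an illustration only, not the submodule's actual implementation - the real loader also applies the template-variable interpolation described below:

locals {
  # Glob every definition.yaml beneath the configured feeders directory.
  feeder_definition_paths = fileset(path.root, "${var.feeder_definitions_dir}/*/definition.yaml")

  # Decode each YAML file into an HCL-equivalent object.
  feeder_configs = [
    for p in local.feeder_definition_paths :
    yamldecode(file("${path.root}/${p}"))
  ]
}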

Template Variables

Templating in YAML Definitions and State Machine JSONs

Configurations for the component modules may have slight variations across deployment environments; for example, a task in a dev environment will likely be writing to a different S3 bucket than the same task in prod. Because of this, Cirrus YAML definitions support environment-specific template variables via the yaml-def-loader submodule. This templating mechanism also applies to workflow state machine JSON files, making it possible to parameterize resource ARNs and other values that vary by environment directly within the state machine definition.

Consider the following workflow definition.yaml snippet that allows a task to write to a specific bucket:

# ...
tasks:
  - name: task-example
    type: batch
    role_statements:
      - sid: AllowWriteDataBucket
        effect: Allow
        actions:
          - s3:ListBucket
          - s3:PutObject
        resources:
          - arn:aws:s3:::my-dev-bucket # hardcoded - bad
          - arn:aws:s3:::my-dev-bucket/* # hardcoded - bad
# ...

If you want to make the target bucket environment-specific, you should replace the bucket name with a terraform interpolation sequence (${...}) instead:

# ...
tasks:
  - name: task-example
    type: batch
    role_statements:
      - sid: AllowWriteDataBucket
        effect: Allow
        actions:
          - s3:ListBucket
          - s3:PutObject
        resources:
          - arn:aws:s3:::${my-workflow.task-example.data_bucket} # templated - good
          - arn:aws:s3:::${my-workflow.task-example.data_bucket}/* # templated - good
# ...

Given the example above, you now need to configure what the yaml-def-loader module should replace ${my-workflow.task-example.data_bucket} with. This is configured in the input variable file specific to each environment, e.g., inputs/dev/cirrus.tfvars, inputs/prod/cirrus.tfvars, etc. You will need to open each of those input variable files, find the workflow_definitions_variables map, and add an entry for your template variable(s).

For example, an inputs/dev/cirrus.tfvars file could look something like:

workflow_definitions_variables = {
  my-workflow = {
    task-example = {
      data_bucket = "my-dev-bucket"
    }
    ...
  }
  ...
}

And the inputs/prod/cirrus.tfvars file could look something like:

workflow_definitions_variables = {
  my-workflow = {
    task-example = {
      data_bucket = "my-prod-bucket"
    }
    ...
  }
  ...
}

And so on for each environment.

The same pattern applies to all definition types. For each interpolation sequence in a YAML definition (and state machine JSON, for workflows), add the corresponding entry to feeder_definitions_variables, compute_definitions_variables, or workflow_definitions_variables as appropriate for each environment-specific input variable file.

Templating Task Resource ARNs in State Machine JSONs

Each state in your workflow's state machine will typically reference a Cirrus task's associated AWS resource ARN, such as a lambda function ARN or batch job queue and job definition ARNs. Since these AWS resources are managed by terraform, their ARNs are not immediately available to you without deploying them first and then hardcoding their values into the state machine JSON. This is not ideal.

Instead, template variables are used in place of hardcoded ARNs. These template variables are interpolation sequences that will always have the following form:

${tasks.TASK-NAME.TASK-TYPE.TASK-ATTR}

Where:

tasks     : static namespace for Cirrus task outputs
TASK-NAME : name of the Cirrus task
TASK-TYPE : one of [lambda, batch]
TASK-ATTR : one of [function_arn, job_definition_arn, job_queue_arn]

For example:

${tasks.my-example-task.lambda.function_arn}
${tasks.my-other-task.batch.job_queue_arn}
${tasks.my-other-task.batch.job_definition_arn}

Unlike the normal user-provided template variables defined in workflow_definitions_variables, you do not need to provide the values these template variables get interpolated into. The workflow terraform module will use these variables as a lookup into a task's output resources and replace each variable with the proper resource's ARN at terraform runtime.
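
For example, a lambda task state in a state-machine.json might reference its function ARN like so (the state and task names here are illustrative):

"MyExampleTask": {
  "Type": "Task",
  "Resource": "${tasks.my-example-task.lambda.function_arn}",
  "Next": "NextState"
}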

Batch task states also use pre-batch and post-batch lambdas that are automatically created for you, referenced via the builtin namespace:

${builtin.CIRRUS_PRE_BATCH_LAMBDA_ARN}
${builtin.CIRRUS_POST_BATCH_LAMBDA_ARN}

You do not need to define these as tasks in your workflow definition.

Template Variable Considerations

  • All Cirrus SSM parameters are accessible as predefined builtin variables. You don't need to add entries in your variable files for these. Examples:
    • ${builtin.CIRRUS_DATA_BUCKET}
    • ${builtin.CIRRUS_PAYLOAD_BUCKET}
    • ${builtin.CIRRUS_PRE_BATCH_LAMBDA_ARN}
    • ${builtin.CIRRUS_POST_BATCH_LAMBDA_ARN}
    • ${builtin.CIRRUS_EVENT_DB_AND_TABLE}
    • ${builtin.CIRRUS_PROCESS_QUEUE_URL}
    • ${builtin.CIRRUS_PROCESS_QUEUE_ARN}
    • ${builtin.CIRRUS_WORKFLOW_EVENT_TOPIC_ARN}
    • ${builtin.CIRRUS_LOG_LEVEL}
    • ${builtin.CIRRUS_STATE_DB}
    • ${builtin.CIRRUS_BASE_WORKFLOW_ARN}
    • ${builtin.CIRRUS_PREFIX}
    • ${builtin.CIRRUS_CLI_IAM_ARN}
  • Organize namespace keys consistently to avoid configuration errors. Common conventions include:
    • Using a shared key for cross-definition values (e.g., a shared ECR registry hostname or API URL) if you are deploying multiple instances of a component.
    • Having definition-specific namespaces (e.g., my-workflow, my-compute) for per-definition variables.
  • The templated value should be a primitive type (string, number, bool). Avoid complex or nested types as they can easily cause issues during templating.

Public vs Private Subnets

When designing Cirrus workflows and tasks, be mindful of where you place compute. Improper configuration can result in significant unexpected data transfer charges or internet accessibility issues.

The following table outlines behavior between resource and subnet types:

| Resource Type | Subnet Type | Public IP? | Internet/ECR Access | Additional Cost Concern |
| --- | --- | --- | --- | --- |
| EC2 Instance | Public | Yes | OK (via IGW) | Public IPv4 charge (~$3.60/mo); standard egress rates apply¹ |
| Fargate Task | Public | Yes | OK (via IGW) | Public IPv4 charge (~$3.60/mo); standard egress rates apply¹ |
| Standard Lambda | N/A | AWS managed | OK (managed) | Standard Lambda rates; standard egress rates apply¹ |
| VPC Lambda | Public | No | FAIL (no public IP) | N/A |
| Any VPC Resource | Private | No | OK (via NAT) | NAT processing (~$0.045/GB in + out)¹ ², plus standard egress |

¹ Standard AWS data transfer out charges (~$0.09/GB) apply to all configurations. Inbound data transfer from the internet is free for all resource types.

² NAT processing charges are incurred in addition to standard data transfer out charges, not instead of them.

While deploying compute resources into private subnets is generally preferred for security, it introduces significant cost considerations for data-intensive processing.

NAT gateways provide internet access for resources in private subnets by routing outbound traffic through a managed gateway in a public subnet. These gateways charge ~$0.045/GB for data processed, regardless of direction. For tasks that are routinely working with large data volumes, consider placing them in a public subnet instead; this bypasses NAT entirely. If you do, just ensure the security groups you specify are sufficiently locked down on ingress to prevent exposure.

Note that VPC-attached lambdas cannot be assigned public IPs, so placing them in a public subnet does not grant internet access; VPC lambdas that need internet access must use private subnets with a NAT gateway (or rely on VPC endpoints).

Always attach an S3 gateway endpoint. It's free, routes traffic over the AWS private network, and bypasses NAT regardless of subnet type. Other interface endpoints can be used as needed, though they have a small cost.
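
If you manage the VPC in terraform, the endpoint is a single resource. A minimal sketch, assuming hypothetical vpc_id, region, and route_table_ids inputs:

resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"

  # Associate the route tables used by the subnets running Cirrus compute.
  route_table_ids = var.route_table_ids
}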

Creating Cirrus Workflows

Cirrus workflows are pipelines of one or more Cirrus tasks run via AWS Step Functions state machines. A state machine is defined by a JSON document written in the Amazon States Language. See the AWS Step Functions documentation for a deeper technical dive.

Creating a Workflow

  1. Determine a name for your workflow.
    • Use lowercase alphanumerics and hyphens.
    • As a best practice, do not use workflow, cirrus, stac, or task in the name unless they mean something other than "this is a workflow for Cirrus STAC tasks".
  2. Create a new directory at cirrus/workflows/<your-workflow-name>
  3. Create a definition.yaml in your workflow directory.
  4. Create a state-machine.json in your workflow directory.
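
After step 4, the workflow directory should look like this:

cirrus/workflows/<your-workflow-name>/
├── definition.yaml       # workflow and inline task configuration
└── state-machine.json    # AWS Step Functions state machine definition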

Adding Tasks to a Workflow

Tasks are defined inline in the tasks: list within the workflow's definition.yaml. Task infrastructure is specific to the containing workflow.

Choosing Lambda or Batch

Each Cirrus task's infrastructure must be configured as either a lambda function or a batch job; the two types are mutually exclusive. Consider the following when determining the appropriate type for your task:

  • Lambda:
    • Pros: easier to configure, easier to manage, shorter startup times, and generally cheaper.
    • Cons: limited run duration, limited vCPU, limited memory, limited storage (without resorting to EFS, which this module does not currently support), and more. See the AWS Lambda quotas documentation for the full set of limitations.
    • Format: requires your STAC task to be packaged as a lambda-compatible ZIP archive or Docker image.
    • Suitable for: simple utility tasks or operations that are guaranteed to always complete within the lambda limits.
  • Batch:
    • Pros: unbounded run duration with a wide range of vCPU, GPU, memory, and storage options.
    • Cons: increased configuration complexity, increased operational overhead, potentially longer task startup times, and generally more expensive (can be mitigated).
    • Format: requires your STAC task to be packaged as a Docker image that implements the stac-task CLI interface as the container entrypoint.
    • Suitable for: operations that are long-running or have consistently high resource requirements that lambda cannot provide.

Regardless of which type you choose, it can always be changed later.

Lambda Tasks

  1. Determine a name for your task.
    • Use lowercase alphanumerics and hyphens.
    • As a best practice, do not use lambda, cirrus, stac, or task in the name unless they mean something other than "this is a lambda Cirrus STAC task".
  2. Add a task entry to the tasks: list of your workflow's definition.yaml, specifying type: lambda and the lambda: configuration block.
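
A minimal entry might look like the following sketch - the task name is a placeholder, and the keys inside lambda: are elided because they vary; see the definition examples above for the supported options:

tasks:
  - name: my-lambda-task   # placeholder name
    type: lambda
    lambda:
      # ... lambda configuration as shown in examples/definitions/ ...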

Batch Tasks

Batch tasks require a compute environment to be configured first. If you already have one applicable to your target task, note its name. Otherwise, see Batch Compute Environments below to create one, then return here. Prefer reusing compute environments when possible - the default quota is 50 per region.

  1. Determine a name for your task.
    • Use lowercase alphanumerics and hyphens.
    • As a best practice, do not use batch, cirrus, stac, or task in the name unless they mean something other than "this is a batch Cirrus STAC task".
  2. Add a task entry to the tasks: list of your workflow's definition.yaml, specifying type: batch and the batch: configuration block. Reference the compute environment by its definition name via batch.compute_name.
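
A minimal entry might look like the following sketch (task and compute names are placeholders):

tasks:
  - name: my-batch-task        # placeholder name
    type: batch
    batch:
      compute_name: my-compute # must match an existing compute definition's name
      # ... other batch options as shown in examples/definitions/ ...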

Batch Compute Environments

Batch compute environments are provisioned independently from workflows so they can be shared across multiple workflows.

  1. Determine a name for your compute environment.
    • Use lowercase alphanumerics and hyphens.
    • As a best practice, do not use cirrus, stac, or task in the name unless they mean something other than "this is compute for a Cirrus STAC task".
  2. Create a new directory at cirrus/compute/<your-compute-name>
  3. Create a definition.yaml in your compute directory.
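
The resulting layout mirrors the workflow convention:

cirrus/compute/<your-compute-name>/
└── definition.yaml    # batch compute environment configuration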

Configuring a Workflow State Machine

Basic Structure

All workflows share some common structure within the state machine definition:

  • A Comment provides a short description of the workflow
  • At least one task State defined
  • StartAt set to the first state in the state machine
  • At least one state with End: true representing a successful completion of the workflow
  • A State of type Fail to which all workflow states will go on fatal error
    • Doing so necessitates that each state properly catches all errors and defines the Fail state as the next step in the event of an error (with some exceptions, like batch tasks, which are part of a larger block of connected states - see Batch Tasks in Workflows)
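
Putting these elements together, a minimal skeleton might look like the following (state and task names are illustrative):

{
  "Comment": "Runs a single example task, then succeeds or fails",
  "StartAt": "MyExampleTask",
  "States": {
    "MyExampleTask": {
      "Type": "Task",
      "Resource": "${tasks.my-example-task.lambda.function_arn}",
      "Catch": [
        {
          "ErrorEquals": ["States.ALL"],
          "Next": "Failure"
        }
      ],
      "End": true
    },
    "Failure": {
      "Type": "Fail"
    }
  }
}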

Sample State Snippets

State machine definitions often contain some level of boilerplate, especially for State objects. The State Machine Snippets referenced above provide concrete examples of lambda and batch task states.

Whenever you need to create a State object in your state machine JSON:

  1. Copy/paste an appropriate state from one of the examples above.
  2. Indent appropriately.
  3. Replace the task name with your task's name throughout.
  4. Decide whether the state is terminal: keep End: true for a terminal state, or keep Next and replace the next state name with the correct following state.
  5. Update the Retry and Catch blocks to something suitable for your target Cirrus task. The blocks in these snippets are best-practice starting points, but you may need to extend or modify them to better suit your needs.
  6. Modify remaining configuration as necessary.

Batch Tasks in Workflows

Batch task states are more complex than lambda task states because AWS Batch's SubmitJob API only accepts flat key/value string pairs as job parameters - it has no mechanism to pass a structured JSON payload directly the way Lambda does. As a result, Cirrus payloads must be staged to S3 first and passed to the batch job as an S3 URL. Each batch task must therefore be wrapped in a Parallel block with three states:

  1. A pre-batch lambda that writes the Cirrus payload to S3 and passes the resulting S3 URL to the batch job.
  2. The batch job itself, which processes the data and writes its output payload to a different S3 location.
  3. A post-batch lambda that reads the output payload from S3 and passes it forward in the workflow.

The Parallel block is only used to ensure an all-or-nothing execution of the three tasks; it does not actually run other tasks in parallel (unless you configure it to do so).

Convenience implementations of the pre-batch and post-batch lambdas are provided automatically and referenced via the builtin namespace (see above). These cover the standard Cirrus payload handoff pattern; if your use case requires different behavior, you can substitute your own lambda implementations by creating them as tasks.
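
Structurally, a batch task's Parallel block looks roughly like the following sketch. Names are illustrative and the payload-URL plumbing between the three states is omitted - copy the batch state snippet from the examples above for real use:

"MyBatchTask": {
  "Type": "Parallel",
  "Branches": [
    {
      "StartAt": "PreBatch",
      "States": {
        "PreBatch": {
          "Type": "Task",
          "Resource": "${builtin.CIRRUS_PRE_BATCH_LAMBDA_ARN}",
          "Next": "RunBatchJob"
        },
        "RunBatchJob": {
          "Type": "Task",
          "Resource": "arn:aws:states:::batch:submitJob.sync",
          "Parameters": {
            "JobName": "my-batch-task",
            "JobQueue": "${tasks.my-batch-task.batch.job_queue_arn}",
            "JobDefinition": "${tasks.my-batch-task.batch.job_definition_arn}"
          },
          "Next": "PostBatch"
        },
        "PostBatch": {
          "Type": "Task",
          "Resource": "${builtin.CIRRUS_POST_BATCH_LAMBDA_ARN}",
          "End": true
        }
      }
    }
  ],
  "Next": "NextState"
}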

See the cirrus-geo batch tasks documentation for more information.

Workflow Best Practices

See the cirrus-geo workflow documentation.
