Deploy and manage Cirrus processing infrastructure on AWS with Terraform.
This module creates the AWS resources that comprise Cirrus processing components - feeders, batch compute environments, workflows, and their associated tasks - via a flexible interface that reduces resource complexity.
Components are configured via YAML definition files and can be deployed together or split across separate terraform states, depending on your needs:
- Single deployment - manage all processing components in one terraform state. Suitable for smaller deployments.
- Partitioned by component type - manage feeders, compute environments, and workflows in separate states or repositories. Reduces blast radius; for example, isolating compute environments prevents accidental redeployment of batch infrastructure when iterating on workflow definitions.
- Partitioned by component instance - each feeder, compute environment, or workflow in its own deployment. Enables rapid iteration with focused changes.
- Any combination of the above - the only requirement is that the names of your component instances do not collide.
Note: using YAML definitions is suggested but not strictly necessary. The component modules
(feeder, compute, and workflow) can be targeted directly if an HCL interface is preferred.
- Your target AWS account and region have an active cirrus-core deployment. This module will query SSM for Cirrus deployment parameters.
- You have a VPC, subnets, and one or more security groups already deployed.
- Create a directory named `cirrus` (convention, not requirement) at the root of your terraform
  deployment repository.
- Under this new directory, create subdirectories for whichever components you'll be deploying.
  Any of these may be omitted (or set to `null` in the next step) to skip that component type
  entirely:
  - `feeders`: for feeder definitions
  - `compute`: for batch compute environment definitions
  - `workflows`: for workflow definitions
- Add the module to your terraform configuration, pointing the `*_definitions_dir` inputs at the
  directories created above:

  ```hcl
  module "cirrus_processing" {
    source = "git::https://github.com/Element84/filmdrop-aws-tf-modules.git//modules/cirrus-processing?ref=main"

    feeder_definitions_dir   = "cirrus/feeders"
    compute_definitions_dir  = "cirrus/compute"
    workflow_definitions_dir = "cirrus/workflows"

    # These aren't being used yet - you'll update them later as needed:
    feeder_definitions_variables   = {}
    compute_definitions_variables  = {}
    workflow_definitions_variables = {}

    # ... other required variables ...
  }
  ```

- The module now knows where to glob for `definition.yaml` files at terraform runtime. The next
  step is to create some definitions.
Note: The directory paths used in this guide (cirrus/feeders, cirrus/compute,
cirrus/workflows) are conventions, not requirements. You may use any paths you like as long as the
module input variables point to them.
Example module invocations are provided in examples/usage-patterns/ showing different ways to
structure deployments depending on your needs. These examples focus on how to invoke the module, not
on definition content - see Definition Examples for example definition.yaml files.
- examples/usage-patterns/monolith/ - all component types in a single invocation; simplest starting point
- examples/usage-patterns/single-component-type/ - deploy only one component type (e.g., compute) while another team owns the rest
- examples/usage-patterns/single-component-instance/ - deploy exactly one workflow; simplest standalone deployment
- examples/usage-patterns/private-and-public-subnets/ - split compute across public and private subnet tiers
- examples/usage-patterns/multi-region/ - deploy identical compute and workflows to two regions via aliased providers
Example definition.yaml files are provided in examples/definitions/ for each component type. Each
file is annotated with usage notes and a corresponding *_definitions_variables snippet.
- examples/definitions/feeders/basic-feeder/ - minimal SNS trigger; good starting point for most feeders
- examples/definitions/feeders/complex-feeder/ - multiple S3 and SNS triggers, SQS customization, role statements
- examples/definitions/compute/fargate-basic/ - minimal Fargate serverless; simplest starting point
- examples/definitions/compute/ec2-spot-processing/ - EC2 Spot with launch template, instance type selection, fair-share job queue
- examples/definitions/workflows/simple-lambda-example/ - single Lambda task
- examples/definitions/workflows/batch-workflow-example/ - Batch task on Fargate with container properties, role statements, and pre/post-batch structure
Copy-paste starting points for individual states within a workflow's state-machine.json.
- examples/state-machine-snippets/state-machine-lambda-state-snippet.txt - Lambda task state with retry and error handling
- examples/state-machine-snippets/state-machine-batch-state-snippet.txt - Batch task Parallel block with pre-batch, batch, and post-batch states
This module configures the infrastructure component for cirrus feeders, compute, tasks, and workflows. It does not build STAC task code into ZIP archives or Docker images; it only creates the AWS infrastructure and resources necessary to deploy existing artifacts into your Cirrus AWS environment(s).
The code component for your STAC tasks and feeders must already exist in a different repository, and that repository is responsible for building and pushing your versioned artifacts to a location this repository's deployment IAM role has permission to access, such as:
- Docker images for Cirrus tasks and feeders should be in an AWS Elastic Container Registry (ECR) accessible by the deployment role.
- ZIP files for Cirrus tasks or feeders should be in an S3 Bucket accessible by the deployment role.
You must do one of the following for each Cirrus task or feeder you'll be deploying:
- identify an existing STAC task or feeder repository that produces one of the artifacts listed above; or
- create your own repository (see cirrus-task-example).
Once you've done this, you're ready to continue.
YAML is used because terraform cannot read an arbitrary number of HCL files into a list of objects; splitting configuration objects into individual YAML files improves readability and maintainability. It also enables a single configuration set to be abstracted via parameterization.
The configuration files you'll be writing are YAML representations of the expected input HCL objects for each of the following:

- the `feeder` module's `feeder_config` input variable
- the `compute` module's `compute_config` input variable
- the `workflow` module's `workflow_config` input variable
These YAML files are automatically decoded into HCL via the yaml-def-loader submodule.
During the execution of any terraform command that reads input configurations, such as
terraform plan or terraform apply, the yaml-def-loader submodule will glob the specified
definition directories to gather the YAML configuration files you've created according to this guide
into collections of HCL-equivalent objects. The HCL objects are then passed into the applicable
modules.
Those modules will then provision the appropriate set of AWS resources for that configuration.
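Conceptually, the loader does the equivalent of terraform's built-in `fileset`/`yamldecode` functions applied over each definition directory. This is a simplified illustration of the mechanism, not the yaml-def-loader submodule's actual implementation:

```hcl
locals {
  # Glob every definition.yaml under the configured workflow directory...
  workflow_definition_files = fileset(var.workflow_definitions_dir, "**/definition.yaml")

  # ...and decode each one into an HCL-equivalent object.
  workflow_configs = [
    for f in local.workflow_definition_files :
    yamldecode(file("${var.workflow_definitions_dir}/${f}"))
  ]
}
```

The resulting objects must satisfy the same type constraints as the corresponding `*_config` input variables, which is why each YAML file must mirror that HCL schema.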
Configurations for the component modules may have slight variations across deployment environments;
for example, a task in a dev environment will likely be writing to a different S3 bucket than the
same task in prod. Because of this, Cirrus YAML definitions support environment-specific template
variables via the yaml-def-loader submodule. This templating mechanism also applies to workflow
state machine JSON files, making it possible to parameterize resource ARNs and other values that
vary by environment directly within the state machine definition.
Consider the following workflow definition.yaml snippet that allows a task to write to a specific
bucket:
```yaml
# ...
tasks:
  - name: task-example
    type: batch
    role_statements:
      - sid: AllowWriteDataBucket
        effect: Allow
        actions:
          - s3:ListBucket
          - s3:PutObject
        resources:
          - arn:aws:s3:::my-dev-bucket    # hardcoded - bad
          - arn:aws:s3:::my-dev-bucket/*  # hardcoded - bad
# ...
```

If you want to make the target bucket environment-specific, you should replace the bucket name with
a terraform interpolation sequence (`${...}`) instead:
```yaml
# ...
tasks:
  - name: task-example
    type: batch
    role_statements:
      - sid: AllowWriteDataBucket
        effect: Allow
        actions:
          - s3:ListBucket
          - s3:PutObject
        resources:
          - arn:aws:s3:::${my-workflow.task-example.data_bucket}    # templated - good
          - arn:aws:s3:::${my-workflow.task-example.data_bucket}/*  # templated - good
# ...
```

Given the example above, you now need to configure what the yaml-def-loader module should replace
${my-workflow.task-example.data_bucket} with. This is configured in the input variable file
specific to each environment, e.g., inputs/dev/cirrus.tfvars, inputs/prod/cirrus.tfvars, etc.
You will need to open each of those input variable files, find the workflow_definitions_variables
map, and add an entry for your template variable(s).
For example, an inputs/dev/cirrus.tfvars file could look something like:
```hcl
workflow_definitions_variables = {
  my-workflow = {
    task-example = {
      data_bucket = "my-dev-bucket"
    }
    ...
  }
  ...
}
```

And the `inputs/prod/cirrus.tfvars` file could look something like:
```hcl
workflow_definitions_variables = {
  my-workflow = {
    task-example = {
      data_bucket = "my-prod-bucket"
    }
    ...
  }
  ...
}
```

And so on for each environment.
The same pattern applies to all definition types. For each interpolation sequence in a YAML
definition (and state machine JSON, for workflows), add the corresponding entry to
feeder_definitions_variables, compute_definitions_variables, or workflow_definitions_variables
as appropriate for each environment-specific input variable file.
Each state in your workflow's state machine will typically reference a Cirrus task's associated AWS resource ARN, such as a lambda function ARN or batch job queue and job definition ARNs. Since these AWS resources are managed by terraform, their ARNs are not immediately available to you without deploying them first and then hardcoding their values into the state machine JSON. This is not ideal.
Instead, template variables are used in place of hardcoded ARNs. These template variables are interpolation sequences that will always have the following form:
```
${tasks.TASK-NAME.TASK-TYPE.TASK-ATTR}
```

Where:

- `tasks`: static namespace for Cirrus task outputs
- `TASK-NAME`: name of the Cirrus task
- `TASK-TYPE`: one of [`lambda`, `batch`]
- `TASK-ATTR`: one of [`function_arn`, `job_definition_arn`, `job_queue_arn`]

For example:

```
${tasks.my-example-task.lambda.function_arn}
${tasks.my-other-task.batch.job_queue_arn}
${tasks.my-other-task.batch.job_definition_arn}
```
Unlike the normal user-provided template variables defined in workflow_definitions_variables, you
do not need to provide the values these template variables get interpolated into. The workflow
terraform module will use these variables as a lookup into a task's output resources and replace
each variable with the proper resource's ARN at terraform runtime.
Batch task states also use pre-batch and post-batch lambdas that are automatically created
for you, referenced via the `builtin` namespace:

```
${builtin.CIRRUS_PRE_BATCH_LAMBDA_ARN}
${builtin.CIRRUS_POST_BATCH_LAMBDA_ARN}
```
You do not need to define these as tasks in your workflow definition.
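Putting these together, a minimal state machine invoking a single lambda task might look like the following. This is a structural sketch only (the state and task names are placeholders), and it omits the retry and error handling you'd want in practice; see the snippets in examples/state-machine-snippets/ for production-ready starting points:

```json
{
  "Comment": "Minimal single-lambda workflow (illustrative only)",
  "StartAt": "my-example-task",
  "States": {
    "my-example-task": {
      "Type": "Task",
      "Resource": "${tasks.my-example-task.lambda.function_arn}",
      "End": true
    }
  }
}
```

At terraform runtime, the workflow module replaces the `${tasks...}` interpolation with the actual lambda function ARN before creating the state machine.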
- All Cirrus SSM parameters are accessible as predefined `builtin` variables. You don't need to add entries in your variable files for these. Examples:
  - `${builtin.CIRRUS_DATA_BUCKET}`
  - `${builtin.CIRRUS_PAYLOAD_BUCKET}`
  - `${builtin.CIRRUS_PRE_BATCH_LAMBDA_ARN}`
  - `${builtin.CIRRUS_POST_BATCH_LAMBDA_ARN}`
  - `${builtin.CIRRUS_EVENT_DB_AND_TABLE}`
  - `${builtin.CIRRUS_PROCESS_QUEUE_URL}`
  - `${builtin.CIRRUS_PROCESS_QUEUE_ARN}`
  - `${builtin.CIRRUS_WORKFLOW_EVENT_TOPIC_ARN}`
  - `${builtin.CIRRUS_LOG_LEVEL}`
  - `${builtin.CIRRUS_STATE_DB}`
  - `${builtin.CIRRUS_BASE_WORKFLOW_ARN}`
  - `${builtin.CIRRUS_PREFIX}`
  - `${builtin.CIRRUS_CLI_IAM_ARN}`
- Organize namespace keys consistently to avoid configuration errors. Common conventions include:
  - Using a `shared` key for cross-definition values (e.g., a shared ECR registry hostname or API URL) if you are deploying multiple instances of a component.
  - Having definition-specific namespaces (e.g., `my-workflow`, `my-compute`) for per-definition variables.
- The templated value should be a primitive type (string, number, bool). Avoid complex or nested types as they can easily cause issues during templating.
When designing Cirrus workflows and tasks, be mindful of where you place compute. Improper configuration can result in significant unexpected data transfer charges or in internet accessibility issues.
The following table outlines behavior between resource and subnet types:
| Resource Type | Subnet Type | Public IP? | Internet/ECR Access | Additional Cost Concern |
|---|---|---|---|---|
| EC2 Instance | Public | Yes | OK (via IGW) | Public IPv4 Charge (~$3.60/mo); standard egress rates apply¹ |
| Fargate Task | Public | Yes | OK (via IGW) | Public IPv4 Charge (~$3.60/mo); standard egress rates apply¹ |
| Standard Lambda | N/A | AWS Managed | OK (Managed) | Standard Lambda rates; standard egress rates apply¹ |
| VPC Lambda | Public | No | FAIL (No Public IP) | N/A |
| Any VPC Resource | Private | No | OK (via NAT) | NAT Processing (~$0.045/GB in + out)¹ ², plus standard egress |
¹ Standard AWS data transfer out charges (~$0.09/GB) apply to all configurations. Inbound data transfer from the internet is free for all resource types.
² NAT processing charges are incurred in addition to standard data transfer out charges, not instead of them.
While deploying compute resources into private subnets is generally preferred for security, it introduces significant cost considerations for data-intensive processing.
NAT gateways provide internet access for resources in private subnets by routing outbound traffic through a managed gateway in a public subnet. These gateways charge ~$0.045/GB for data processed, regardless of direction. For tasks that are routinely working with large data volumes, consider placing them in a public subnet instead; this bypasses NAT entirely. If you do, just ensure the security groups you specify are sufficiently locked down on ingress to prevent exposure.
Note that VPC-attached lambdas gain nothing from public subnets: they cannot be assigned public IPs, so they cannot reach the internet through an internet gateway. If a VPC lambda needs internet access, place it in a private subnet behind a NAT gateway.
Always attach an S3 gateway endpoint. It's free, routes traffic over the AWS private network, and bypasses NAT regardless of subnet type. Other interface endpoints can be used as needed, though they have a small cost.
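If your networking stack does not already include one, an S3 gateway endpoint takes only a few lines of terraform. This is a generic sketch outside the scope of this module; the `var.*` references are placeholders for your own networking configuration:

```hcl
# Gateway endpoint that routes S3 traffic over the AWS private network,
# bypassing NAT processing charges for S3 transfers in private subnets.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id                  # placeholder: your VPC ID
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids # placeholder: route tables for your subnets
}
```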
Cirrus workflows are pipelines of one or more Cirrus tasks run via AWS State Machines. An AWS state machine is defined by a JSON file that uses the AWS States Language. See the AWS State Machine documentation for a deeper technical dive.
- Determine a name for your workflow.
  - Use lowercase alphanumerics and hyphens.
  - As a best practice, do not use `workflow`, `cirrus`, `stac`, or `task` in the name unless they mean something other than "this is a workflow for Cirrus STAC tasks".
- Create a new directory at `cirrus/workflows/<your-workflow-name>`.
- Create a `definition.yaml` in your workflow directory.
  - See the workflow module's `workflow_config` input for a description and list of the available workflow and task arguments. Remember that you will be writing the YAML equivalent of this HCL schema.
  - See examples/definitions/workflows/ for example workflow definitions.
  - Identify your tasks and define them in the `tasks:` list. See Adding Tasks to a Workflow below for details.
  - If you add any template variables, update `workflow_definitions_variables` for each environment-specific input variable file. See Template Variables in YAML Definitions and State Machine JSONs above for more information.
- Create a `state-machine.json` in your workflow directory.
  - See Configuring a Workflow State Machine below to get started.
  - See examples/definitions/workflows/ for example state machine definitions.
  - If you add any template variables, update `workflow_definitions_variables` for each environment-specific input variable file.
Tasks are defined inline in the tasks: list within the workflow's definition.yaml. Task
infrastructure is specific to the containing workflow.
Each Cirrus task's infrastructure must be configured as either a lambda function or a batch job; the two types are mutually exclusive. Consider the following when determining the appropriate type for your task:
- Lambda:
- Pros: easier to configure, easier to manage, shorter startup times, and generally cheaper.
- Cons: limited run duration, limited vCPU, limited memory, limited storage (without resorting to EFS, which this module does not currently support), and more. Full set of limitations here.
- Format: requires your STAC task to be packaged as a lambda-compatible ZIP archive or Docker image.
- Suitable for: simple utility tasks or operations that are guaranteed to always complete within the lambda limits.
- Batch:
- Pros: unbounded run duration with a wide range of vCPU, GPU, memory, and storage options.
- Cons: increased configuration complexity, increased operational overhead, potentially longer task startup times, and generally more expensive (can be mitigated).
- Format: requires your STAC task to be packaged as a Docker image that implements the stac-task CLI interface as the container entrypoint.
- Suitable for: operations that are long-running or have consistently high resource requirements that lambda cannot provide.
Regardless of which type you choose, it can always be changed later.
- Determine a name for your task.
  - Use lowercase alphanumerics and hyphens.
  - As a best practice, do not use `lambda`, `cirrus`, `stac`, or `task` in the name unless they mean something other than "this is a lambda Cirrus STAC task".
- Add a task entry to the `tasks:` list of your workflow's `definition.yaml`, specifying `type: lambda` and the `lambda:` configuration block.
  - See the workflow module's `workflow_config` input for a description and list of the available task arguments. Remember that you will be writing the YAML equivalent of this HCL schema.
  - See examples/definitions/workflows/simple-lambda-example/ for an example workflow definition with a lambda task.
  - If you add any template variables, update `workflow_definitions_variables` for each environment-specific input variable file. See Template Variables in YAML Definitions and State Machine JSONs above for more information.
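As an illustrative sketch only (the task name is a placeholder, the `role_statements` are hypothetical, and the lambda-specific fields are elided; consult the `workflow_config` schema and the simple-lambda-example for the actual supported arguments), a lambda task entry has this general shape:

```yaml
tasks:
  - name: my-example-task
    type: lambda
    lambda:
      # lambda-specific arguments go here; see the workflow module's
      # workflow_config input for the supported fields
    role_statements:
      # hypothetical statement granting read access to the Cirrus data bucket
      - sid: AllowReadDataBucket
        effect: Allow
        actions:
          - s3:GetObject
        resources:
          - arn:aws:s3:::${builtin.CIRRUS_DATA_BUCKET}/*
```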
Batch tasks require a compute environment to be configured first. If you already have one applicable to your target task, note its name. Otherwise, see Batch Compute Environments below to create one, then return here. Prefer reusing compute environments when possible - the default quota is 50 per region.
- Determine a name for your task.
  - Use lowercase alphanumerics and hyphens.
  - As a best practice, do not use `batch`, `cirrus`, `stac`, or `task` in the name unless they mean something other than "this is a batch Cirrus STAC task".
- Add a task entry to the `tasks:` list of your workflow's `definition.yaml`, specifying `type: batch` and the `batch:` configuration block. Reference the compute environment by its definition name via `batch.compute_name`.
  - See the workflow module's `workflow_config` input for a description and list of the available task arguments. Remember that you will be writing the YAML equivalent of this HCL schema.
  - See examples/definitions/workflows/batch-workflow-example/ for an example workflow definition with a batch task.
  - If you add any template variables, update `workflow_definitions_variables` for each environment-specific input variable file. See Template Variables in YAML Definitions and State Machine JSONs above for more information.
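As an illustrative sketch only (the names are placeholders and the batch-specific fields are elided; consult the `workflow_config` schema and the batch-workflow-example for the actual supported arguments), a batch task entry has this general shape:

```yaml
tasks:
  - name: my-batch-task
    type: batch
    batch:
      compute_name: my-compute  # name of an existing compute environment definition
      # remaining batch arguments (container properties, role statements, etc.)
      # go here; see the workflow module's workflow_config input for the
      # supported fields
```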
Batch compute environments are provisioned independently from workflows so they can be shared across multiple workflows.
- Determine a name for your compute environment.
  - Use lowercase alphanumerics and hyphens.
  - As a best practice, do not use `cirrus`, `stac`, or `task` in the name unless they mean something other than "this is compute for a Cirrus STAC task".
- Create a new directory at `cirrus/compute/<your-compute-name>`.
- Create a `definition.yaml` in your compute directory.
  - See the compute module's `compute_config` input for a description and list of the available arguments. Remember that you will be writing the YAML equivalent of this HCL schema.
  - See examples/definitions/compute/ for example compute definitions.
  - If you add any template variables, update `compute_definitions_variables` for each environment-specific input variable file. See Template Variables in YAML Definitions and State Machine JSONs above for more information.
All workflows share some common structure within the state machine definition:
- A `Comment` that provides a short description of the workflow
- At least one task `State` defined
- `StartAt` set to the first state in the state machine
- At least one state with `End: true` representing a successful completion of the workflow
- A `State` of type `Fail` to which all workflow states will go on fatal error
  - This necessitates that each state properly catches all errors and defines the `Fail` state as the next step in the event of an error (with some exceptions, like batch tasks, which are part of a larger block of connected states - see Batch Tasks in Workflows)
State machine definitions often contain some level of boilerplate, especially for State objects.
See the following for concrete examples of lambda and batch task states:
- Lambda task state: examples/state-machine-snippets/state-machine-lambda-state-snippet.txt
- Batch task state: examples/state-machine-snippets/state-machine-batch-state-snippet.txt
Whenever you need to create a State object in your state machine JSON:
- Copy/paste an appropriate state from one of the examples above.
- Indent appropriately.
- Replace the task name with your task's name throughout.
- Choose whether the state is terminal by keeping `Next` or `End`; if `Next`, you will need to replace the next state name with the correct following state.
- Update the `Retry` and `Catch` blocks to something suitable for your target Cirrus task. The blocks in these snippets are best-practice starting points, but you may need to extend or modify them to better suit your needs.
- Modify remaining configuration as necessary.
Batch task states are more complex than lambda task states because AWS Batch's SubmitJob API
only accepts flat key/value string pairs as job parameters - it has no mechanism to pass a
structured JSON payload directly the way Lambda does. As a result, Cirrus payloads must be staged
to S3 first and passed to the batch job as an S3 URL. Each batch task must therefore be wrapped
in a Parallel block with three states:
- A pre-batch lambda that writes the Cirrus payload to S3 and passes the resulting S3 URL to the batch job.
- The batch job itself, which processes the data and writes its output payload to a different S3 location.
- A post-batch lambda that reads the output payload from S3 and passes it forward in the workflow.
The Parallel block is only used to ensure an all-or-nothing execution of the three tasks; it does
not actually run other tasks in parallel (unless you configure it to do so).
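Structurally, the Parallel wrapper for a batch task looks something like the following state-level fragment. This is a heavily abridged sketch with placeholder names (`my-batch-task`, `NextStateOrFail`); in particular, the retry/catch blocks and the job parameters that hand the S3 payload URL between the three states are omitted, so copy the full batch snippet from examples/state-machine-snippets/ rather than building from this sketch:

```json
"my-batch-task": {
  "Type": "Parallel",
  "Branches": [
    {
      "StartAt": "my-batch-task-pre-batch",
      "States": {
        "my-batch-task-pre-batch": {
          "Type": "Task",
          "Resource": "${builtin.CIRRUS_PRE_BATCH_LAMBDA_ARN}",
          "Next": "my-batch-task-batch"
        },
        "my-batch-task-batch": {
          "Type": "Task",
          "Resource": "arn:aws:states:::batch:submitJob.sync",
          "Parameters": {
            "JobName": "my-batch-task",
            "JobQueue": "${tasks.my-batch-task.batch.job_queue_arn}",
            "JobDefinition": "${tasks.my-batch-task.batch.job_definition_arn}"
          },
          "Next": "my-batch-task-post-batch"
        },
        "my-batch-task-post-batch": {
          "Type": "Task",
          "Resource": "${builtin.CIRRUS_POST_BATCH_LAMBDA_ARN}",
          "End": true
        }
      }
    }
  ],
  "Next": "NextStateOrFail"
}
```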
Convenience implementations of the pre-batch and post-batch lambdas are provided automatically
and referenced via the `builtin` namespace (see above). These cover the standard Cirrus payload
handoff pattern; if your use case requires different behavior, you can substitute your own lambda
implementations by creating them as tasks.
See the cirrus-geo batch tasks documentation for more information.
See the cirrus-geo workflow documentation.