Write to S3 #99
Conversation
This change replaces the use of the Snowflake DB for event data by writing it to an S3 bucket instead. Each file is named with a random UUID1 and organized into directories keyed by UTC date.
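The key layout described above can be sketched as follows. This is a minimal illustration, not the PR's actual code; the `events` prefix, the `.json` extension, and the bucket name in the comment are assumptions.

```python
import uuid
from datetime import datetime, timezone

def build_event_key(prefix: str = "events") -> str:
    """Build an S3 key: a UTC-date directory plus a random UUID1 filename.

    The "events" prefix and ".json" extension are illustrative assumptions.
    """
    utc_date = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    return f"{prefix}/{utc_date}/{uuid.uuid1()}.json"

# Writing the event would then look roughly like (not executed here):
# import boto3
# boto3.client("s3").put_object(
#     Bucket="fideslog-events",  # hypothetical bucket name
#     Key=build_event_key(),
#     Body=event_json,
# )
```

UUID1 embeds a timestamp and node ID, so filenames are effectively collision-free even across concurrent writers.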
Heyo @RobertKeyser ! Wanted to get this in front of you a little early (just doing some further validation) so you could flag anything that looks a little off as far as the deployment goes. Let me know if you need any info or have any questions - thanks!

@seanpreston I'll need some help in here writing some secrets to the repo, I think you might have the power 🪄 @RobertKeyser I think the S3 env vars are what I am most concerned about as far as naming etc. for the deployment to work 🤞🏽

@SteveDMurphy I understand the changes here are to write out to S3 instead of Snowflake directly, but where is the updated code that describes when/how Snowflake is pulling this data in?
Planning on doing that as part of ethyca-analytics - I also briefly considered Snowpipe before remembering the analytics setup. The initial focus was on stopping the bleeding of a constantly running Snowflake warehouse, though.
Good point; the event itself will contain the metadata we need to sort by time, so it won't matter if it sits around for a bit.
That's the plan! I'm also planning on having the events in a subdirectory by UTC date so we can load once per day and delete the directory 🤞🏽 it sounds good in my head, at least 😅
Makes sense to me! Although S3 is cheap enough that I don't think we'd need to worry about that for a long time :) |
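The load-once-per-day-then-delete idea above amounts to computing the prefix for the previous (completed) UTC day and operating on everything under it. A minimal sketch, assuming the same hypothetical `events/` prefix as the PR description:

```python
from datetime import date, timedelta

def day_prefix(day: date, prefix: str = "events") -> str:
    """Key prefix covering one UTC day's worth of event files."""
    return f"{prefix}/{day.isoformat()}/"

def previous_day_prefix(today: date) -> str:
    """Prefix for the completed previous day, ready to load then delete."""
    return day_prefix(today - timedelta(days=1))

# A daily job could then (sketch, not executed here):
# 1. list objects under previous_day_prefix(date.today()) with
#    s3.list_objects_v2(Bucket=..., Prefix=...),
# 2. load them into Snowflake, and
# 3. remove them with s3.delete_objects(...).
```

Operating only on the previous day's prefix avoids racing with writers still appending to the current day's directory.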
@SteveDMurphy I'm running the fides release in a bit but will circle back and fully review this later today :)
@ThomasLaPiana I'm planning on getting back into this with your first pass review comments today, thanks! 💥 @RobertKeyser I think we might still need the prod s3 info for deployment but would love any help reviewing the deployment changes as well 🙌🏽 Ops ticket for reference -> https://ethyca.atlassian.net/browse/OPS-208 |
ThomasLaPiana
left a comment
LGTM!
RobertKeyser
left a comment
I think we’ll need to change a few things:
- remove the `FIDESLOG__STORAGE_AWS_SECRET_ACCESS_KEY` and `FIDESLOG__STORAGE_AWS_ACCESS_KEY_ID` env vars from being set
- add a `taskRoleArn` that's the same as the `executionRoleArn`
The credentials should get automatically set: https://aws.amazon.com/premiumsupport/knowledge-center/ecs-fargate-access-aws-services/
More specifically, https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html
I think what that means is that you need to make the request (the curl in the docs) to get the AWS creds on the fly since it uses a role instead of a full user.
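For context, the "curl in the docs" hits the fixed container-credentials endpoint (`169.254.170.2`) at a relative path ECS injects via the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` environment variable when a task role is configured. A minimal sketch of fetching those credentials manually (the AWS SDKs, including boto3, do this automatically):

```python
import json
import os
import urllib.request

# Fixed link-local address of the ECS container credentials endpoint.
CREDS_HOST = "http://169.254.170.2"

def container_credentials_url() -> str:
    """Build the URL a task container GETs to obtain temporary AWS creds.

    AWS_CONTAINER_CREDENTIALS_RELATIVE_URI is set by ECS when the task
    definition includes a taskRoleArn.
    """
    relative_uri = os.environ["AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"]
    return CREDS_HOST + relative_uri

def fetch_credentials() -> dict:
    """Fetch temporary credentials (only works inside a running ECS task)."""
    with urllib.request.urlopen(container_credentials_url()) as resp:
        return json.load(resp)
```

In practice the app shouldn't need any of this directly: with the task role set and the static keys removed, boto3 resolves credentials from this endpoint on its own.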
Closes #10
Uses a `boto3` `session`.

To test:
1. Modify the endpoint of the SDK to use localhost instead

You should then be able to see the events you are creating listed in the S3 bucket.