@Kami
Member

Kami commented Aug 3, 2015

This pull request allows users to specify arbitrary action data files when creating an action using the API. Those data files are written to disk inside the pack's "actions/" sub-directory.

To avoid requiring the user to also specify the pack meta-data, this action only allows users to write data files for the packs which already exist on disk.

An example use case is allowing st2flow to automatically create and write files such as work-flow definitions and files with graph node coordinates on the server.

This functionality builds on the existing API for creating actions and it's fully backward compatible.

Every time a data file is written to disk, we also dispatch an internal trigger. Users can then use this trigger + a StackStorm rule to automatically add the file to version control or similar. Eventually we will also need to figure out the whole DB and VCS sync issue (something we have talked about for a long time), but I avoided this in this PR since that will be a bigger change which needs to be done systematically and needs more thought.

In addition to that, I have also discovered and fixed a potential security issue inside the get_entry_point_abs_path function (eb1a454). If st2api was running as a privileged user and an action's entry_point attribute pointed to an absolute location outside the pack directory, a user could read an arbitrary file on disk using the "entry_points" API endpoint.

This change now also implies that the entry_point needs to point to a file inside the pack directory (previously we didn't enforce that). I personally believe this is the right thing to do (users should include related scripts inside the pack directory), but if you don't agree, please post your arguments here. An entry point pointing to a file outside the pack directory opens us up to a whole range of potential security issues.
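To illustrate the containment check described above, here is a minimal sketch. It is not the actual get_entry_point_abs_path or get_pack_resource_file_abs_path code; the function name `resolve_inside_pack` and its signature are hypothetical, and it assumes Python 3.

```python
import os


def resolve_inside_pack(pack_dir, file_path):
    """Return the absolute path of file_path, or raise if it escapes pack_dir.

    Hypothetical helper for illustration only, not the StackStorm function.
    """
    pack_root = os.path.realpath(pack_dir)
    # os.path.join discards pack_root when file_path is absolute, so an
    # absolute entry point is also caught by the containment check below.
    candidate = os.path.realpath(os.path.join(pack_root, file_path))
    if os.path.commonpath([pack_root, candidate]) != pack_root:
        raise ValueError("%r resolves outside the pack directory" % file_path)
    return candidate
```

Resolving the path with realpath before comparing means that both "../" sequences and symlinks pointing outside the pack are rejected, rather than only absolute paths.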

Example API Payload

The "POST actions" payload is the same as before; the only change is that it can now also contain an optional data_files attribute.

{
...
    "data_files": [
        {
            "file_path": "workflows/my_wf_1.py",
            "content": "aaaa"
        },
        {
            "file_path": "random_action.py",
            "content": "bbb"
        },
        {
            "file_path": "misc/st2flow_coordinates.yaml",
            "content": "ccc"
        }
    ]
}
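For reference, a client could assemble the payload above roughly as follows. This is a sketch only: the helper name `build_action_payload` is hypothetical, and the action fields other than data_files are passed through as opaque placeholders because the payload above elides them.

```python
import json


def build_action_payload(action_fields, data_files):
    """Merge the existing action attributes with the optional data_files list.

    action_fields: dict with the usual "POST actions" attributes (elided above).
    data_files: iterable of (file_path, content) tuples.
    """
    payload = dict(action_fields)
    payload["data_files"] = [
        {"file_path": file_path, "content": content}
        for file_path, content in data_files
    ]
    return payload


payload = build_action_payload(
    {},  # placeholder for the regular action attributes
    [
        ("workflows/my_wf_1.py", "aaaa"),
        ("random_action.py", "bbb"),
        ("misc/st2flow_coordinates.yaml", "ccc"),
    ],
)
body = json.dumps(payload, indent=4)  # request body for "POST actions"
```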

TODO

  • Allow user to retrieve arbitrary data file inside the pack actions directory (needed by st2flow)
  • Agree on the new "entry_point" requirement
  • Update affected tests

Kami added 12 commits August 3, 2015 14:11
…via the

API.

Files specified in this attribute are written to the disk inside the pack
directory to which the action belongs to.
…nd make

them immutable / set a default value inside action parameters.

Conflicts:
	st2api/st2api/controllers/v1/actions.py
EntryPointController which uses that function.

Make sure the entry point file is located inside the pack directory and use
get_pack_resource_file_abs_path to prevent directory traversal attacks.
…d outside

this function.

Also fix a typo and update affected tests.
@manasdk
Contributor

manasdk commented Aug 3, 2015

Is st2api the right place to write to the filesystem? The reason I ask is we end up baking in the requirement to always have access to the filesystem and also have access to all content from st2api. Depending on our deployment strategy these might be incorrect assumptions.

questions -

  • If st2api only has access to DB and RabbitMQ then where would we need to do this file writing for this to work?
  • This is action specific. How would we solve the same problem for rules? We already have an issue with rules that are created from API and not committed to the filesystem.

@Kami
Member Author

Kami commented Aug 3, 2015

@manasdk

Is st2api the right place to write to the filesystem? The reason I ask is we end up baking in the requirement to always have access to the filesystem and also have access to all content from st2api. Depending on our deployment strategy these might be incorrect assumptions.

That's a good question and it's something we don't have a good answer (or solution) for right now.

In an ideal world, for HA and reliability purposes, all the content would be available on all the servers. That's kinda what our code base assumes in some places right now (we have an existing entry point API controller which allows users to read content of the entry point file from disk using the API, etc.), but we don't have the whole "content distribution" story and most importantly VCS integration and syncing story fleshed out yet.

In short - for the long term, we need to flesh out the whole content distribution and VCS integration story, but that's something which needs to be done systematically and it's more invasive.

I'm also fine with tackling this "big picture" story right now since it's something we have been postponing forever, but I'm not sure how much I can scope it down to unblock @enykeev ASAP.

@enykeev
Member

enykeev commented Aug 4, 2015

$ http 172.168.60.10:9101/packs/views/files/core/

2015-08-03 22:21:14,914 139762846157072 DEBUG log [-] No version specified in URL. Will use default controller.
2015-08-03 22:21:14,915 139762846157072 INFO log [-] GET /packs/views/files/core/ with filters={} (remote_addr='172.168.60.1',method='GET',filters={},path='/packs/views/files/core/')
2015-08-03 22:21:14,915 139762846157072 ERROR log [-] API call failed: get_one() takes exactly 4 arguments (2 given)
Traceback (most recent call last):
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 625, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 531, in invoke_controller
    result = controller(*args, **kwargs)
  File "/mnt/st2/st2common/st2common/models/api/base.py", line 174, in callfunction
    result = f(*args, **kwargs)
TypeError: get_one() takes exactly 4 arguments (2 given) (_exception_data={},_exception_class='TypeError',_exception_message='get_one() takes exactly 4 arguments (2 given)')
2015-08-03 22:21:14,916 139762846157072 ERROR log [-] Traceback (most recent call last):
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 625, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 531, in invoke_controller
    result = controller(*args, **kwargs)
  File "/mnt/st2/st2common/st2common/models/api/base.py", line 174, in callfunction
    result = f(*args, **kwargs)
TypeError: get_one() takes exactly 4 arguments (2 given)
Traceback (most recent call last):
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 625, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 531, in invoke_controller
    result = controller(*args, **kwargs)
  File "/mnt/st2/st2common/st2common/models/api/base.py", line 174, in callfunction
    result = f(*args, **kwargs)
TypeError: get_one() takes exactly 4 arguments (2 given)
2015-08-03 22:21:14,917 139762846157072 INFO log [-] GET /packs/views/files/core/ result={
    "faultstring": "Internal Server Error"
} (remote_addr='172.168.60.1',method='GET',result='{\n    "faultstring": "Internal Server Error"\n}',status_code='500 Internal Server Error',path='/packs/views/files/core/')

We need to make sure we return meaningful error messages on API calls.

We also need a way to list all the files related to the action or otherwise how are we going to find all the files we uploaded along with the action.


$ http 172.168.60.10:9101/packs/views/files/packs/action/pack_mgmt/virtualenv_setup_prerun.py

2015-08-03 22:26:41,669 139762846157072 DEBUG log [-] No version specified in URL. Will use default controller.
2015-08-03 22:26:41,670 139762846157072 INFO log [-] GET /packs/views/files/packs/action/pack_mgmt/virtualenv_setup_prerun.py with filters={} (remote_addr='172.168.60.1',method='GET',filters={},path='/packs/views/files/packs/action/pack_mgmt/virtualenv_setup_prerun.py')
2015-08-03 22:26:41,671 139762846157072 ERROR log [-] API call failed: get_one() takes exactly 4 arguments (5 given)
Traceback (most recent call last):
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 625, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 531, in invoke_controller
    result = controller(*args, **kwargs)
  File "/mnt/st2/st2common/st2common/models/api/base.py", line 174, in callfunction
    result = f(*args, **kwargs)
TypeError: get_one() takes exactly 4 arguments (5 given) (_exception_data={},_exception_class='TypeError',_exception_message='get_one() takes exactly 4 arguments (5 given)')
2015-08-03 22:26:41,672 139762846157072 ERROR log [-] Traceback (most recent call last):
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 625, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 531, in invoke_controller
    result = controller(*args, **kwargs)
  File "/mnt/st2/st2common/st2common/models/api/base.py", line 174, in callfunction
    result = f(*args, **kwargs)
TypeError: get_one() takes exactly 4 arguments (5 given)
Traceback (most recent call last):
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 625, in __call__
    self.invoke_controller(controller, args, kwargs, state)
  File "/mnt/st2/virtualenv/local/lib/python2.7/site-packages/pecan/core.py", line 531, in invoke_controller
    result = controller(*args, **kwargs)
  File "/mnt/st2/st2common/st2common/models/api/base.py", line 174, in callfunction
    result = f(*args, **kwargs)
TypeError: get_one() takes exactly 4 arguments (5 given)
2015-08-03 22:26:41,673 139762846157072 INFO log [-] GET /packs/views/files/packs/action/pack_mgmt/virtualenv_setup_prerun.py result={
    "faultstring": "Internal Server Error"
} (remote_addr='172.168.60.1',method='GET',result='{\n    "faultstring": "Internal Server Error"\n}',status_code='500 Internal Server Error',path='/packs/views/files/packs/action/pack_mgmt/virtualenv_setup_prerun.py')

No sub-folder support.

@Kami
Member Author

Kami commented Aug 4, 2015

@enykeev

We need to make sure we return meaningful error messages on API calls.

I agree.

I'm really annoyed by this error and have been for a long time (I made some quick attempts to fix it in the past, but it caused too many test failures and I didn't wanna spend too much time on it back then so I didn't proceed with a fix).

There are two problems which cause this:

  1. Kwargs abuse in the method signatures
  2. jsexpose decorator which still tries to call the controller method even if the number of arguments in the method signature doesn't match
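The second problem can be sketched as follows: check the call against the method signature first, and return a client error instead of letting the TypeError surface as a 500. This is not the jsexpose code; `checked_expose`, the response dict shape and `FilesController` are hypothetical, and the sketch uses Python 3's inspect.signature rather than the Python 2.7 seen in the traceback.

```python
import functools
import inspect


def checked_expose(func):
    """Reject calls whose arguments do not fit func's signature."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            # bind() raises TypeError on missing or extra arguments
            # without actually calling the controller method.
            signature.bind(*args, **kwargs)
        except TypeError as e:
            return {"status": 400, "faultstring": "Invalid request: %s" % e}
        return func(*args, **kwargs)

    return wrapper


class FilesController(object):
    """Hypothetical controller mirroring the 4-argument get_one above."""

    @checked_expose
    def get_one(self, pack_name, resource_type, file_name):
        return {"status": 200, "file": "/".join([pack_name, resource_type, file_name])}
```

With this in place, both failing requests above (too few and too many path components) would produce a 400 with a descriptive message instead of "Internal Server Error".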

Fixing both of those requires a bunch of code changes and testing it and making sure it doesn't break stuff so I would rather do it in a separate PR.

We also need a way to list all the files related to the action or otherwise how are we going to find all the files we uploaded along with the action.

I thought we might be able to avoid this since it requires bigger model changes, but it looks like we can't.

One solution would be to store a list of all the files inside a pack on the Pack model (we can't really do it on per-action basis since we only really know about the metadata and entry-point file, but entry point file can potentially depend and require other files from a pack).

For existing packs, we would populate this when we register the packs (this functionality was added recently).
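A rough sketch of how that file list could be collected at registration time is shown below. The function name `list_pack_files` is hypothetical and this is not the actual registration code; it only illustrates walking the pack directory and recording paths relative to the pack root.

```python
import os


def list_pack_files(pack_dir):
    """Return a sorted list of all file paths inside pack_dir, relative to it."""
    result = []
    for dirpath, _dirnames, filenames in os.walk(pack_dir):
        for name in filenames:
            abs_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(abs_path, pack_dir)
            # Store paths with forward slashes so they are stable across platforms.
            result.append(rel_path.replace(os.sep, "/"))
    return sorted(result)
```

Storing relative paths keeps the list valid even if the packs base directory differs between servers.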

And after the info is there, I would add a new /packs/views/files/ endpoint which returns a list (and possibly also content) of all the files in the pack. @enykeev does this work for you?

@enykeev
Member

enykeev commented Aug 4, 2015

Kwargs abuse in the method signatures

You can actually fix that by abusing method signature with kwargs a little bit more =)

we can't really do it on per-action basis since we only really know about the metadata and entry-point file, but entry point file can potentially depend and require other files from a pack

You are thinking too far ahead. What I need is to be able to see whether the action has coordinates file associated to it. I can probably use convention for that, but I don't see why we can't keep data_files in the action model, strip content property and return it on Action GET.

@Kami
Member Author

Kami commented Aug 4, 2015

You can actually fix that by abusing method signature with kwargs a little bit more =)

Haha, yeah.

You are thinking too far ahead. What I need is to be able to see whether the action has coordinates file associated to it. I can probably use convention for that, but I don't see why we can't keep data_files in the action model, strip content property and return it on Action GET.

Yeah, I'm trying to create something which also works for other resources, can be re-used later on and is not a one off "hack".

So while we could keep data files on the action model, I would prefer not to do it since this is an API only thing. Pack.files field would also be populated when registering content so it's not an API only thing and better aligns with our future plans for that.

So would an API endpoint which returns a list of all the files in a pack (+ optionally content) and an additional API endpoint which allows you to retrieve the content of a single file work for now (bearing in mind that you would use some kind of convention for the coordinates file name, and you can always just request /packs/foo/files/actions/misc/coordinates.yaml or similar)?

Kami added 6 commits August 4, 2015 12:58
Note: This list is populated when running register content script when the pack
is registered.
get_pack_resource_file_abs_path.

Also update existing function to use this new one.
…les with

content of all the files inside a pack.

Also add a new file controller which allows user to retrieve content of a single
file.
@Kami Kami changed the title [WIP] Allow user to include arbitrary data files when creating a new action using the API Allow user to include arbitrary data files when creating a new action using the API Aug 4, 2015
@Kami
Member Author

Kami commented Aug 4, 2015

@enykeev I pushed some changes which I believe implement what you need.

We now store a list of files inside a pack on the PackDB model. In addition to that, there are now two new API endpoints:

  1. GET /packs/views/files/<pack name> -> Retrieve content of all the files inside the pack.
  2. GET /packs/views/file/<pack name>/<file path> -> Retrieve the content of a particular file inside the pack.

The second option should work for you if you follow some convention for naming the file with coordinates (e.g. /packs/views/file/<pack name>/actions/misc/coordinates.yaml or whatever).
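As an illustration of how a client might address the two endpoints, here is a small sketch that only builds the URLs. The helper names are hypothetical, the base URL is taken from the requests earlier in this thread, and it assumes Python 3.

```python
from urllib.parse import quote


def pack_files_url(base_url, pack_name):
    """URL for the 'files' endpoint: content of all files inside the pack."""
    return "%s/packs/views/files/%s" % (base_url.rstrip("/"), quote(pack_name, safe=""))


def pack_file_url(base_url, pack_name, file_path):
    """URL for the 'file' endpoint: content of a single file inside the pack."""
    return "%s/packs/views/file/%s/%s" % (
        base_url.rstrip("/"),
        quote(pack_name, safe=""),
        quote(file_path.lstrip("/")),  # quote() keeps "/" so sub-folders survive
    )


coordinates_url = pack_file_url(
    "http://172.168.60.10:9101", "core", "actions/misc/coordinates.yaml"
)
```

Note that the two paths differ by a single character (file vs. files), so keeping them behind helpers like these reduces the chance of mistyping one for the other.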

In addition to that, I still need to do some cleanup and other work, but except renaming data_files to files nothing visible should change for you.

@dzimine

dzimine commented Aug 5, 2015

I second @manasdk on "Is st2api the right place to write to the filesystem?".

I see this as a good enough implementation to get us going;
Is it also a good point to talk about the right content management model? Put the source control question aside: let's at least figure out what is a reasonable way to keep the DB and file system in sync.

Options:

  • DB and File System always in sync (DZ: IMO a non-starter, just wrong to expect them to never go out of sync, and offers no control points)
  • File->DB explicit (reload-content); DB->File - sync - what we get with THIS implementation.
  • Both ways explicit - "load content", "dump content".

What are the operational and implementation pros/cons of these?

@lakshmi-kannan
Contributor

I second @manasdk on "Is st2api the right place to write to the filesystem?".

I see this as a good enough implementation to get us going;

Just noting that I read them as contradicting statements. Unfortunately, this doesn't look like iterative development in this particular case. More detailed comments below.

Perhaps this is a late in the game kind of comment but we really need to spend some time understanding our long term approach here.

VCS and distributed content is becoming a high priority, with some serious users already trying it out. So not thinking about how the solution would fit in that scenario will probably push us into a tight corner. It might turn out that updating content from the API is not possible at all in such a situation. So I am going to insist we talk about the design at least before merging this PR. In my mind that discussion is a blocker.

I'd have preferred these API changes were put in /experimental/ as opposed to V1. This is something we need to be careful about in the future. Even though there are no backward incompatible changes, putting it in V1 means we are willing to support problems and bug reports. I'd prefer some baking time for this critical feature.

@manasdk
Contributor

manasdk commented Aug 5, 2015

So I am going to insist we talk about the design at least before merging this PR. In my mind that discussion is a blocker.

+1 to designing.

Member

Ok, I see the problem with content_type here and there's little we can do unless we want to override pecan.expose. Still, one character difference in the public API will cost some people a few hours of debugging, sooner or later.

Member Author

Not exactly sure what you mean with that.

The file one returns text/plain. Do you want me to return a content type which corresponds to the file extension, or something else (e.g. application/octet-stream)?

It would also help if I knew how you are going to use this controller - are you going to directly include the file from the API?

Member

What I mean is that we have file and files, and it's quite easy to miss or mistype. It would be better to have a single controller that outputs either a JSON list of all files or the content of a single file, depending on the presence of file_path_components. I get that it is not possible at the moment; I'm just stating that it would result in debugging problems for someone later.
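The single-controller idea could look roughly like the sketch below. It is framework-free on purpose: the content_type limitation with pecan.expose mentioned in this thread is not addressed, and `get_pack_files` and the in-memory `pack_files` dict are hypothetical stand-ins for the real controller and storage.

```python
import json


def get_pack_files(pack_files, *file_path_components):
    """Return (content_type, body) for a pack's file listing or a single file.

    pack_files: dict mapping relative file path -> file content.
    """
    if not file_path_components:
        # No path given: return a JSON list of all file paths in the pack.
        return "application/json", json.dumps(sorted(pack_files))
    file_path = "/".join(file_path_components)
    if file_path not in pack_files:
        raise KeyError("No such file in pack: %s" % file_path)
    # Path given: return the raw content of that one file.
    return "text/plain", pack_files[file_path]
```

The response content type changes depending on the arguments, which is exactly the part that would need jsexpose / pecan changes in the real API.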

Member Author

Ah, I see.

That's actually what I was planning to do first, but then I decided it's clearer if we have two separate paths.

In any case, I'm also OK with having a single path - I understand your concern with content_type there though (would probably require some jsexpose / pecan hacks to get it to work).

@Kami Kami mentioned this pull request Aug 10, 2015
@lakshmi-kannan
Contributor

Given that we have a plan for FileSystem backend now, I am ok with merging this PR. @Kami and I had a discussion and he convinced me that it is a lot of work to move the API to /exp/. Requires a lot of refactoring if we don't want duplicated code. So I am fine with the code in /v1/ apis. It's not ideal. cc: @manasdk

Kami added a commit that referenced this pull request Aug 14, 2015
…iles

Allow user to include arbitrary data files when creating a new action using the API
@Kami Kami merged commit fc8b7f1 into master Aug 14, 2015
@Kami Kami deleted the create_action_include_data_files branch August 14, 2015 12:28

I think we shouldn't advertise it beyond using it in the UI yet?

Member Author

I don't think it's a big deal putting it in the changelog (we usually write a release announcement blog post where we more prominently announce new features, so as long as we don't put it there we are fine), but if you think otherwise I'm also fine with removing it.

6 participants