Skip to content

Conversation

@loganasherjones
Copy link
Contributor

This pull request allows no database_path to be set in the configuration. Instead, defaulting back to a basic in-memory cache. See #11 for more details. The implementation feels pretty naive and makes me think I've overlooked something, so I'd love some feedback.

I've added test for most of the parts that I've touched. I wasn't exactly sure this was the testing style you wanted. Technically the database_test.py is more of an integration test than a unit test. We can take it out if you'd like.

The pull request is not quite finished yet as I have not updated any documentation, but I wanted to get your opinion before I updated docs and continued writing tests. I'd really like to do the following:

  • Update docs
  • Refactor the connection code in database.py (mostly just create a contextmanager for the connect portions
  • Write more tests


# ----------------------------------------------------------------------
def __init__(self, path):
def __init__(self, path, event_ttl=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi!

Should you set self._event_ttl = event_ttl?

from datetime import datetime, timedelta


class MemoryCache(object):
Copy link
Contributor

@d1skort d1skort Sep 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...
maybe good idea would be create abstract base class and inherit MemoryCache and DatabaseCache?
what do you think?

@d1skort
Copy link
Contributor

d1skort commented Sep 5, 2017

Hi! I really like it and need this feature a lot ;)

Can I help you?

@loganasherjones
Copy link
Contributor Author

loganasherjones commented Sep 5, 2017

Hey! I appreciate the feedback. Definitely missed the event_ttl. I'll fix that tonight.

I also think a base class would be a good idea, but I didn't want to modify a lot of the code without checking in with the maintainer.

@loganasherjones
Copy link
Contributor Author

Okay, this has all the changes I wanted to implement. Let me know what you all think.

Copy link
Owner

@eht16 eht16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo, thank you a lot.
The changes are great (except for the very few remarks I had).

You even updated the docs, great.

The tests look fine, I don't mind to have some integrations as well.
Often they are even more useful than simple unittests and since
there were no tests at all before, yay :).

The context manager in database.py is sweet, I never liked the always
repeating code there but never had the idea of using a context managere here.
Thanks.

One minor issue, I'd like to have the method seperators also in the new code
(the # -----... lines above method definitions.
I realize these might look a bit weird to you but I've got too used to them and
would like the code consistent across the modules.
But I can add them on my own after merging as well.

-----------

By default, you do not need to provide a :code:`database_path` to the :code:`AsynchronousLogstashHandler`.
There are a couple of things you should keep in mind should you choose to go down this path.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a native speaker but I think replacing "should" by "if" would make it more readable.

docs/usage.rst Outdated
# If you don't want to write to a SQLite database, then you do
# not have to specify a database_path.
# NOTE: Messages are lost between process restarts.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a few more words to the NOTE like "without a SQLite database", just to make clear this NOTE refers to the case with the in-memory database.

events = []
for event in self._cache.values():
if not event['pending_delete']:
events.append(event)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong: I think here is the pending_delete marker missing, like event['pending_delete'] = True, otherwise events will never be marked as pending_delete.

While this marker is less relevant for the in-memory cache, I think it would make sense to maintain it here anyway since also the rest of the in-memory cache maintains it as well and it should not be that relevant on performance.


def requeue_queued_events(self, events):
for event in events:
self._cache[event['id']]['pending_delete'] = False
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While not very likely, maybe a try-except would be useful here if self._cache does not contain event['id']. Just to prevent breaking the whole process and instead log a warning or so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello. I'm happy to implement these changes, but I have a question about "logging the warning". I'm not exactly sure how we should be logging a warning. We could print to STDOUT if you'd like, but using a logger would probably just end up in another enqueued message, which could cause this problem again.

What are your thoughts?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other errors, the logging framework is used as well even if the logged message could not be sent directly (e.g. on network errors). But using the logging framework enables other handlers as well to be fired (e.g. a mail handler, a stderr handler or anything else). So even if we are unable to send our events, other handlers might inform the user about the problem.

We have LogProcessingWorker._safe_log to more or less safely log a message: use the logging framework except we are in the process of a shutdown, then just print to stderr.
We could move this method into utils.py to make it a more easy accesible function.

In general, I don't like basic stderr (or stdout) logging very much. I often work on projects where there is no stderr/stdout because the application is a daemon. IMO it should be just a last resort if everything else failed.


def _delete_events(self, ids_to_delete):
for event_id in ids_to_delete:
self._cache.pop(event_id)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar as for requeue_queued_events maybe catch KeyError of the key is not in the cache or provide a default to .pop().


@abc.abstractmethod
def add_event(self, event):
"""Add the event to the cache
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding a comment here stating that this is the method which is called from other threads while all other cache/database methods are called from the log processing worker thread.

But I can do as well later on.

@loganasherjones
Copy link
Contributor Author

Okay, I just fixed all the documentation issues you pointed out. Updated the tests. I'm still unsure about what to do if they request an event to be removed that is not in the cache. It will no longer throw a KeyError though. Let me know what you think!

Copy link
Owner

@eht16 eht16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with your changes now.

Regarding the logging of the potentially missing event: as said above, I would use the logging framework anyway.

If you like, I can make the sugessted changes myself after the merge. If you like to do, feel free.

Otherwise, it would be cool if you'd like to squash some of the commits so the history keeps clean.
But it's not a big deal.

Otherwise, I'm fine to merge.

@loganasherjones
Copy link
Contributor Author

okay, I've added logging messages to the error conditions we talked about. I want to make sure you like the way I've done this before I squash all the commits.

In addition, I believe squashing these commits means re-writing the history of my fork. Just want to confirm you're okay with that before I go ahead. I don't care at all.

@eht16
Copy link
Owner

eht16 commented Oct 12, 2017

Woohoo, nice.

About the squashing: yes, I'm aware of re-writing the history. Only if you want, if not, it's also alright and I would merge the PR.

@loganasherjones
Copy link
Contributor Author

Okay! All commits have been squashed into a single commit. Let me know if you need anything else!

@eht16 eht16 merged commit f54671d into eht16:master Oct 14, 2017
@eht16
Copy link
Owner

eht16 commented Oct 14, 2017

Just merged your changes. Thanks a lot!

@eht16 eht16 mentioned this pull request Oct 21, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants