Adding in-memory cache & TTL capabilities #12

loganasherjones · 2017-09-04T04:00:25Z

This pull request allows database_path to be omitted from the configuration. In that case, the handler falls back to a basic in-memory cache. See #11 for more details. The implementation feels pretty naive and makes me think I've overlooked something, so I'd love some feedback.

I've added tests for most of the parts that I've touched. I wasn't exactly sure whether this was the testing style you wanted. Technically, database_test.py is more of an integration test than a unit test. We can take it out if you'd like.

The pull request is not quite finished yet as I have not updated any documentation, but I wanted to get your opinion before I updated docs and continued writing tests. I'd really like to do the following:

Update docs
Refactor the connection code in database.py (mostly just create a contextmanager for the connect portions)
Write more tests

d1skort · 2017-09-05T11:43:52Z

logstash_async/database.py


    # ----------------------------------------------------------------------
-    def __init__(self, path):
+    def __init__(self, path, event_ttl=None):


Hi!

Should you set self._event_ttl = event_ttl?
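
A minimal sketch of the point raised here, assuming the constructor only needs to remember its arguments. The class name DatabaseCache and the attribute name _database_path are placeholders for illustration; only the signature and the self._event_ttl assignment come from this thread.

```python
class DatabaseCache(object):
    """Illustrative stand-in; not the actual class from database.py."""

    def __init__(self, path, event_ttl=None):
        # Attribute name is an assumption made for this sketch.
        self._database_path = path
        # The assignment asked about in the review: without it, the
        # event_ttl argument is accepted but silently ignored.
        self._event_ttl = event_ttl
```

With the assignment in place, later methods can read self._event_ttl to decide whether an event has expired.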

d1skort · 2017-09-05T11:49:10Z

logstash_async/memory_cache.py

+from datetime import datetime, timedelta
+
+
+class MemoryCache(object):


Hmm...
Maybe it would be a good idea to create an abstract base class that both MemoryCache and DatabaseCache inherit from?
What do you think?
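
One way this suggestion could look, as a rough sketch. The base class name Cache is an assumption; add_event and requeue_queued_events are method names that appear in this thread, and the bodies are simplified for illustration. A DatabaseCache would implement the same two methods on top of SQLite.

```python
import abc


class Cache(abc.ABC):
    """Hypothetical common interface for the event caches."""

    @abc.abstractmethod
    def add_event(self, event):
        """Add the event to the cache."""

    @abc.abstractmethod
    def requeue_queued_events(self, events):
        """Make the given events available for sending again."""


class MemoryCache(Cache):
    """In-memory variant: events live in a dict keyed by event id."""

    def __init__(self):
        self._cache = {}

    def add_event(self, event):
        self._cache[event['id']] = event

    def requeue_queued_events(self, events):
        for event in events:
            self._cache[event['id']]['pending_delete'] = False
```

Because the methods are marked abstract, instantiating Cache directly, or a subclass that forgets one of them, raises TypeError, so both implementations are kept in step.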

d1skort · 2017-09-05T11:49:30Z

Hi! I really like it and need this feature a lot ;)

Can I help you?

loganasherjones · 2017-09-05T11:54:29Z

Hey! I appreciate the feedback. Definitely missed the event_ttl. I'll fix that tonight.

I also think a base class would be a good idea, but I didn't want to modify a lot of the code without checking in with the maintainer.

loganasherjones · 2017-09-06T01:25:12Z

Okay, this has all the changes I wanted to implement. Let me know what you all think.

eht16

Woohoo, thank you a lot.
The changes are great (except for the very few remarks I had).

You even updated the docs, great.

The tests look fine, I don't mind having some integration tests as well.
Often they are even more useful than simple unit tests and since
there were no tests at all before, yay :).

The context manager in database.py is sweet. I never liked the constantly
repeated code there but never had the idea of using a context manager here.
Thanks.
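
For readers who have not seen the pattern, a context manager for the connect portions might look roughly like this. It is a sketch using only the standard sqlite3 and contextlib modules; the class name, the _connect method name and the commit/rollback policy are assumptions, not the code merged in this PR.

```python
import sqlite3
from contextlib import contextmanager


class DatabaseCache(object):
    """Illustrative stand-in for the class in database.py."""

    def __init__(self, path):
        self._database_path = path

    @contextmanager
    def _connect(self):
        # Open the connection once, so callers do not repeat this code.
        connection = sqlite3.connect(self._database_path)
        try:
            yield connection
            connection.commit()      # commit if the block succeeded
        except Exception:
            connection.rollback()    # undo partial work, then re-raise
            raise
        finally:
            connection.close()       # always release the connection
```

Each method can then write "with self._connect() as connection:" instead of repeating the open, commit and close steps.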

One minor issue: I'd like to have the method separators in the new code as well
(the # -----... lines above method definitions).
I realize these might look a bit weird to you, but I've gotten too used to them and
would like the code to be consistent across the modules.
But I can add them on my own after merging as well.

eht16 · 2017-10-08T13:37:05Z

docs/persistence.rst

+-----------
+
+By default, you do not need to provide a :code:`database_path` to the :code:`AsynchronousLogstashHandler`.
+There are a couple of things you should keep in mind should you choose to go down this path.


I'm not a native speaker but I think replacing "should" by "if" would make it more readable.

eht16 · 2017-10-08T13:38:32Z

docs/usage.rst

+
+  # If you don't want to write to a SQLite database, then you do
+  # not have to specify a database_path.
+  # NOTE: Messages are lost between process restarts.


Maybe add a few more words to the NOTE like "without a SQLite database", just to make clear this NOTE refers to the case with the in-memory database.

eht16 · 2017-10-08T13:42:40Z

logstash_async/memory_cache.py

+        events = []
+        for event in self._cache.values():
+            if not event['pending_delete']:
+                events.append(event)


Correct me if I'm wrong: I think the pending_delete marker is missing here, like event['pending_delete'] = True. Otherwise events will never be marked as pending_delete.

While this marker is less relevant for the in-memory cache, I think it would make sense to maintain it here anyway, since the rest of the in-memory cache maintains it as well and it should not have much impact on performance.
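
A sketch of the loop with the marker set, assuming events are dicts stored in self._cache by id. The method name get_queued_events is an assumption, since the hunk above does not show it.

```python
class MemoryCache(object):
    """Illustrative stand-in for the class in memory_cache.py."""

    def __init__(self):
        self._cache = {}  # event id -> event dict

    def get_queued_events(self):  # method name is an assumption
        events = []
        for event in self._cache.values():
            if not event['pending_delete']:
                # Mark the event so it is not handed out a second time
                # while it is still being sent.
                event['pending_delete'] = True
                events.append(event)
        return events
```

With the marker set, a second call returns nothing until the events are either deleted or requeued.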

eht16 · 2017-10-08T13:48:02Z

logstash_async/memory_cache.py

+
+    def requeue_queued_events(self, events):
+        for event in events:
+            self._cache[event['id']]['pending_delete'] = False


While not very likely, maybe a try-except would be useful here in case self._cache does not contain event['id']. That would keep the whole process from breaking; we could log a warning or so instead.
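
As a sketch, the guarded version could look like the following. The module-level logger and the wording of the warning are assumptions; how the warning should actually be emitted is the subject of the replies below.

```python
import logging

logger = logging.getLogger(__name__)  # logger choice is an assumption


class MemoryCache(object):
    """Illustrative stand-in for the class in memory_cache.py."""

    def __init__(self):
        self._cache = {}

    def requeue_queued_events(self, events):
        for event in events:
            try:
                self._cache[event['id']]['pending_delete'] = False
            except KeyError:
                # A missing event should not break the whole worker.
                logger.warning(
                    'Could not requeue event %s: not in cache',
                    event.get('id'))
```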

Hello. I'm happy to implement these changes, but I have a question about "logging the warning". I'm not exactly sure how we should be logging a warning. We could print to STDOUT if you'd like, but using a logger would probably just end up in another enqueued message, which could cause this problem again.

What are your thoughts?

For other errors, the logging framework is used as well even if the logged message could not be sent directly (e.g. on network errors). But using the logging framework enables other handlers as well to be fired (e.g. a mail handler, a stderr handler or anything else). So even if we are unable to send our events, other handlers might inform the user about the problem.

We have LogProcessingWorker._safe_log to more or less safely log a message: it uses the logging framework unless we are in the process of a shutdown, in which case it just prints to stderr.
We could move this method into utils.py to make it a more easily accessible function.

In general, I don't like basic stderr (or stdout) logging very much. I often work on projects where there is no stderr/stdout because the application is a daemon. IMO it should be just a last resort if everything else failed.
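
The behaviour described for _safe_log could be sketched as a standalone function like this. The function name, its signature, the shutting_down flag and the logger name are hypothetical and chosen for illustration; the real method lives on LogProcessingWorker and may differ.

```python
import logging
import sys


def safe_log(log_level, message, shutting_down=False):
    """Log via the logging framework, or to stderr during shutdown."""
    if shutting_down:
        # Last resort: handlers may already be torn down at this point.
        sys.stderr.write('{}: {}\n'.format(log_level.upper(), message))
    else:
        # Going through the logging framework lets other handlers
        # (mail, stderr, ...) report the problem as well.
        logger = logging.getLogger('safe_log_demo')  # hypothetical name
        getattr(logger, log_level)(message)
```

A call such as safe_log('warning', 'event not found in cache') then reaches every configured handler, while the stderr path is only used when nothing else is available.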

eht16 · 2017-10-08T13:51:53Z

logstash_async/memory_cache.py

+
+    def _delete_events(self, ids_to_delete):
+        for event_id in ids_to_delete:
+            self._cache.pop(event_id)


Similar to requeue_queued_events: maybe catch the KeyError if the key is not in the cache, or provide a default to .pop().
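
The second option is the shorter one. A sketch, assuming self._cache is a plain dict:

```python
class MemoryCache(object):
    """Illustrative stand-in for the class in memory_cache.py."""

    def __init__(self):
        self._cache = {}

    def _delete_events(self, ids_to_delete):
        for event_id in ids_to_delete:
            # With a default, pop() returns None for an unknown id
            # instead of raising KeyError.
            self._cache.pop(event_id, None)
```

The trade-off is that a missing id now passes silently; the try-except form is the one to pick if the case should be logged.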

eht16 · 2017-10-08T13:56:20Z

logstash_async/cache.py

+
+    @abc.abstractmethod
+    def add_event(self, event):
+        """Add the event to the cache


What do you think about adding a comment here stating that this is the method which is called from other threads, while all other cache/database methods are called from the log processing worker thread?

But I can do that later on as well.
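
Such a note could sit in the docstring, for example as sketched below. The lock is an assumption added here to make the threading point concrete; nothing in this thread says the actual implementation uses one.

```python
import threading


class MemoryCache(object):
    """Illustrative stand-in for the class in memory_cache.py."""

    def __init__(self):
        self._cache = {}
        self._lock = threading.Lock()  # assumption, not from the PR

    def add_event(self, event):
        """Add the event to the cache.

        Note: this is the method called from other threads (the ones
        emitting log records). All other cache methods are called from
        the log processing worker thread.
        """
        with self._lock:
            self._cache[event['id']] = event
```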

loganasherjones · 2017-10-09T17:22:42Z

Okay, I just fixed all the documentation issues you pointed out. Updated the tests. I'm still unsure about what to do if they request an event to be removed that is not in the cache. It will no longer throw a KeyError though. Let me know what you think!

eht16

I'm happy with your changes now.

Regarding the logging of the potentially missing event: as said above, I would use the logging framework anyway.

If you like, I can make the suggested changes myself after the merge. If you'd like to do it yourself, feel free.

It would also be cool if you'd squash some of the commits so the history stays clean.
But it's not a big deal.

Otherwise, I'm fine to merge.

loganasherjones · 2017-10-10T00:57:47Z

Okay, I've added logging messages to the error conditions we talked about. I want to make sure you like the way I've done this before I squash all the commits.

In addition, I believe squashing these commits means re-writing the history of my fork. Just want to confirm you're okay with that before I go ahead. I don't care at all.

eht16 · 2017-10-12T22:02:02Z

Woohoo, nice.

About the squashing: yes, I'm aware that it rewrites the history. Do it only if you want to; if not, that's fine too and I'll merge the PR as is.

loganasherjones · 2017-10-14T03:46:20Z

Okay! All commits have been squashed into a single commit. Let me know if you need anything else!

eht16 · 2017-10-14T15:07:23Z

Just merged your changes. Thanks a lot!

d1skort reviewed Sep 5, 2017


eht16 reviewed Oct 8, 2017


eht16 approved these changes Oct 9, 2017


Added in-memory cache/TTL capabilities (Issue#11)

a16eb84

loganasherjones force-pushed the master branch from 98d7700 to a16eb84 Compare October 14, 2017 03:45

eht16 merged commit f54671d into eht16:master Oct 14, 2017

eht16 mentioned this pull request Oct 21, 2017

Memory cache with timeout #11

Closed
