Conversation
zverok
left a comment
There was a problem hiding this comment.
Adding to what said in comments, changes without tests are unverifiable :)
daru-io.gemspec
Outdated
|
|
||
| spec.add_development_dependency 'guard-rspec' if RUBY_VERSION >= '2.2.5' | ||
|
|
||
| spec.add_runtime_dependency 'request-log-analyzer', '~> 1.13.4' |
There was a problem hiding this comment.
Note that no other file format gem is required unconditionally: daru-io relies on user installing the proper gems. Otherwise, anybody who wants daru-io just for CSV import, will install TON of other gems for pdf, excel and whatnot. For example, look here: https://github.com/SciRuby/daru-io/blob/master/lib/daru/io/importers/excelx.rb#L12 — we have optional_gem construct for requiring something ONLY when it is used.
lib/daru/io/importers/rails.rb
Outdated
| module IO | ||
| module Importers | ||
| class Rails < Base | ||
| Daru::DataFrame.register_io_module :read_rails, self |
There was a problem hiding this comment.
read_rails_log is better name, read_rails is too vague.
lib/daru/io/importers/rails.rb
Outdated
| def initialize | ||
| require 'request_log_analyzer' | ||
| RequestLogAnalyzer::Source::LogParser.class_eval do | ||
| def initialize(format, options = {}) |
There was a problem hiding this comment.
Better extract it all into importers/rails_logs/request_log_analyzer_patch.rb, create there module RailsLogAnalyzerPatch, and here just include that into RailsLogAnalyzer.
There was a problem hiding this comment.
The current initialize,parse and read methods are going to be the same for all the importers that will be added. Shall I instead create module RequestLogAnalyzerPatch in importers/request_log_analyzer.rb? Then I will pass :rails3, :apache and :amazon_s3 to the module in rails.rb, apache.rb and amazon_s3.rb.
lib/daru/io/importers/rails.rb
Outdated
| end | ||
| end | ||
| end | ||
| end |
There was a problem hiding this comment.
In fact, here you are copy-pasting and slightly changing code from other gem. It is totally unreliable: if the gem changes a slightest bit, you'll never follow this with your code. Why do you need this rewriting?
There was a problem hiding this comment.
I will try to come up with a single method instead of changing the code.
|
Please review the changes so that I can start the documentation and fix rubocop tests. |
zverok
left a comment
There was a problem hiding this comment.
Also, please don't ignore Rubocop's suggestions.
lib/daru/io/importers/base.rb
Outdated
| new.read(path) | ||
| end | ||
|
|
||
| def self.parse_log(file,format) |
There was a problem hiding this comment.
Why do you think it is appropriate to add method parse_log, related to a very specific task, to Base importer?.. All code related to some importer should be totally local to this importer: either in importers/rails_log.rb, or in importers/rails_log/request_log_analyzer_patch.rb (the latter is preferable, as well as extracting patch to a module, instead of doing class_eval)
There was a problem hiding this comment.
parse_log is a helper method as apache and amazon_s3 importers will also use it. Making this part local will later require adding the same code for the remaining two importers. I feel the method is not that specific to be ineligible for the base class.
There was a problem hiding this comment.
I feel the method is not that specific to be ineligible for the base class.
In fact, it is. The base class provides the common base all importers need, not "some 3 of 20 importers may need that method, so we'll just drop it here". It requires specific external gem. If several different importers will need it, the module with this "patch" could be stored in, like importers/shared/request_log_analyzer_patch.rb, and required/included where it is necessary.
lib/daru/io/importers/rails_log.rb
Outdated
| def call() | ||
| ind = [:method, :path, :ip, :timestamp, :line_type, :lineno, :source, | ||
| :controller, :action, :format, :params, :rendered_file, | ||
| :partial_duration, :status, :duration, :view, :db] |
There was a problem hiding this comment.
That definitely should be a constant.
lib/daru/io/importers/rails_log.rb
Outdated
| row = [] | ||
| ind.each do |attr| | ||
| row << hash[attr] | ||
| end |
There was a problem hiding this comment.
That should be map, at least.
But if I am reading it correctly, Hash#values_at should be totally enough.
lib/daru/io/importers/rails_log.rb
Outdated
| # initializes the instance variables @path and @file_data to nil | ||
| def initialize | ||
| @path = nil | ||
| @file_data = nil |
There was a problem hiding this comment.
Why do you need this vars here?
lib/daru/io/importers/rails_log.rb
Outdated
| include RequestLogAnalyzerPatch | ||
| Daru::DataFrame.register_io_module :read_rails_log, self | ||
|
|
||
| # initializes the instance variables @path and @file_data to nil |
There was a problem hiding this comment.
Comments should not just retell the code, code should speak for itself.
For such simple methods, comments aren't necessary at all.
lib/daru/io/importers/rails_log.rb
Outdated
| # @example Reading from plaintext file | ||
| # instance = Daru::IO::Importers::RailsLog.read("rails_test.log") | ||
| def read(path) | ||
| @path = path |
lib/daru/io/importers/rails_log.rb
Outdated
| # instance = Daru::IO::Importers::RailsLog.read("rails_test.log") | ||
| def read(path) | ||
| @path = path | ||
| @file_data = RequestLogAnalyzerPatch.parse_log(@path,:rails3) |
There was a problem hiding this comment.
Why :rails3, couldn't it be another log format?..
There was a problem hiding this comment.
Having the log format as another argument (with maybe :rails3 as default) would be a more generic way to do this.
| @@ -0,0 +1,73 @@ | |||
| module RequestLogAnalyzerPatch | |||
| require 'request_log_analyzer' | |||
There was a problem hiding this comment.
This file now is required always when daru-io is required. It should be made optional_gem in importer's initialize, like other importers do.
| @@ -0,0 +1,73 @@ | |||
| module RequestLogAnalyzerPatch | |||
There was a problem hiding this comment.
When I proposed this module name, I've meant the following: You define ONLY parse_hash in that module, and then, in importer (not here in separate method) you do the following:
parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format))
parser.extend(RequestLogAnalyzerPatch) # only this instance has method replaced
parser.parse_hash(path)This way it is easy to see what you "inserting" into the third-party gem, and keep it local.
| # @example Reading from rails log file | ||
| # format = RequestLogAnalyzer::FileFormat.load(:rails3) | ||
| # instance = RequestLogAnalyzer::Source::LogParser.new(format) | ||
| # list = instance.parse_hash('test_rails.log') |
There was a problem hiding this comment.
This is internal to implementation, so does NOT need any comment about "how it can be used".
Instead, the comment that is absolutely necessary here is: explain that this method only copy-pasted (from where exactly; from which version exactly) and this and that lines changed (with "how it was, why it was changed" in comments). This way, it would be possible to follow request_log_analyzer's development, if it would change the method significantly.
|
|
||
| private | ||
|
|
||
| def attempt_sqlite3_connection(db) |
There was a problem hiding this comment.
db is totally good name. Please add this to Naming/UncommunicativeMethodParamName: cop's AllowedNames: settings.
In general, if Rubocop is starting to fail on code unrelated to your PR, it is better to notify maintainer (@athityakumar) than judge yourself.
| order: %i[method path ip timestamp line_type lineno source | ||
| controller action format params rendered_file | ||
| partial_duration status duration view db], | ||
| :'timestamp.to_a' => [20180312174118], # rubocop:disable Style/NumericLiterals |
There was a problem hiding this comment.
Don't disable here, just do what Rubocop suggests.
lib/daru/io/importers/rails_log.rb
Outdated
| @file_data.each do |hash| | ||
| data.add_row(INDEX.map { |attr| hash[attr] }) | ||
| end | ||
| data |
There was a problem hiding this comment.
I think df is more readable than data, and has either been consistently named so in other Importers or rather just returned a Daru::DataFrame directly.
lib/daru/io/importers/rails_log.rb
Outdated
| # # 1 GET / 127.0.0.1 2018022716 completed 12 /home/roh Rails... | ||
| # # ... ... ... ... ... ... ... ... ... | ||
| def call | ||
| data = Daru::DataFrame.new({},index: INDEX).transpose |
There was a problem hiding this comment.
Um, INDEX is actually supposed to be the order of the DataFrame and has been weirdly named so just due to transpose operation? 🤔
lib/daru/io/importers/rails_log.rb
Outdated
| data = Daru::DataFrame.new({},index: INDEX).transpose | ||
| @file_data.each do |hash| | ||
| data.add_row(INDEX.map { |attr| hash[attr] }) | ||
| end |
There was a problem hiding this comment.
Can't the whole transpose + add_row logic be replaced by Daru::DataFrame.rows(array_of_arrays, order: ORDER)?
There was a problem hiding this comment.
It can be, but then the array_of_arrays will need further processing as all vectors don't have the same length due to missing attributes in some of them.
|
@zverok @athityakumar please review and suggest changes for rubocop failure. |
|
How about adding importers of all three formats in a single file? It would eliminate the additional separate module for the patch. The log.rb file will look like this module Importers
class Log < Base
Daru::Dataframe.register_io_module :read_log, self
ORDER_RAILS = %i[...].freeze
ORDER_APACHE = %i[...].freeze
ORDER_S3 = %i[...].freeze
...
def read(path, format: :rails3) # default format is :rails3
case format
when :rails3 then order = ORDER_RAILS
when :apache then order = ORDER_APACHE
when :amazon_s3 then order = ORDER_S3
else raise ArgumentError, 'Invalid format provided.'
...
@file_data = parser.parse_hash(path, order)
self
end
...
private
module RequestLogAnalyzerPatch
...
def parse_hash(file, order)
...
parsed_list[i] = order.map ...
...
end
end
end
endIn the above method, I will have need to modify the line containing |
That may work. Just the code can be simpler: ORDERS = {
rails: %i[...].freeze,
apache: %i[...].freeze,
s3: %i[...].freeze
}
# and then...
order = ORDERS.fetch(format) |
lib/daru/io/importers/rails_log.rb
Outdated
| Daru::DataFrame.register_io_module :read_rails_log, self | ||
|
|
||
| # Checks for required gem dependencies of RailsLog importer | ||
| # and requires the patch for request-log-analyzer gem |
There was a problem hiding this comment.
Just remove the comment. Please don't repeat what method code says clearly (even if other importers do that :)).
lib/daru/io/importers/rails_log.rb
Outdated
|
|
||
| # Reads data from a rails log file | ||
| # | ||
| # @!method self.read(path) |
There was a problem hiding this comment.
Please document format parameter too.
lib/daru/io/importers/rails_log.rb
Outdated
| # instance = Daru::IO::Importers::RailsLog.read("rails_test.log") | ||
| def read(path, format: :rails3) | ||
| parser = RequestLogAnalyzer::Source::LogParser.new(RequestLogAnalyzer::FileFormat.load(format)) | ||
| parser.extend(RequestLogAnalyzerPatch) |
There was a problem hiding this comment.
This means, that RequestLogAnalyzerPatch should look like this:
module RequestLogAnalyzerPatch
# look, no other nested modules!
def parse_hash(file)
end
endThis way, when you do this extend, parse_hash becomes redefined.
| # parse_io and parse_line of the LogParser class. Assigns all the necessary instance | ||
| # variables defined in the above specified methods. Creates a request for each line of | ||
| # the file stream and stores the hash of parsed information in raw_list. Each element of | ||
| # parsed_list is an array of one parsed entry in log file. |
There was a problem hiding this comment.
Ugh, now, I see a problem here (I thought that LogParser already has method parse_hash, you just fixing something in it):
combine the methods parse_file, parse_io and parse_line of the LogParser class
Why???? What's the point of combining three methods in one 40-lines mess?.. Why can't you use original methods as they were, just calling them?
There was a problem hiding this comment.
parse_file returns nil and so does parse_io. That's why I have tried fixing these methods in the earlier commits instead of making a new single method. But then it was a lot of rewriting. It was a hard time finding points to tap out the parsed hash from the gem as it is only meant for CLI.
There was a problem hiding this comment.
Ugh. You should have spend a bit more time on understanding LogParser logic. From reading the code, I believe what you need is (without ANY patching) just call LogParser#each_request:
Reads the input, which can either be a file, sequence of files or STDIN to parse lines specified in the FileFormat. This lines will be combined into Request instances, that will be yielded.
...and, as it is aliased as each, and Enumerable is included, ALL you really need is:
RequestLogAnalyzer::Source::LogParser
.new(RequestLogAnalyzer::FileFormat.load(format))
.map { converting Request into line for dataframe }...and that would be completely it.
|
@zverok please review |
lib/daru/io/importers/log.rb
Outdated
| # # ... ... ... ... ... ... ... ... ... | ||
| def call | ||
| Daru::DataFrame.rows(@file_data, order: ORDERS.fetch(@format) | ||
| .map { |attr| attr == :path ? :resource_path : attr }) |
There was a problem hiding this comment.
I'd say RENAME_FIELDS.fetch(attr, attr) with
RENAME_FIELDS = {
path: :resource_path
}It would be more regular (even if for now used only for one field).
|
(Just turn off the |
|
Merged! Congrats 🎉 |
This PR integrates the gem
request-log-analyzerto convert rails logs toDaru::DataFrame.