Commit ed03db5

Improve performance of collate & reduce memory consumption
We used to read all files into memory, which gets to be too much if someone runs a massively parallel CI: 400 * 10MB is still ~4GB of pure file size, and we do a lot more with it.

This breaks the interface of ResultMerger.merge_and_store, but it's not intended as a public interface. Will leave a note in the Changelog anyhow.

What is attempted at the top level is perhaps easier to see when looking at the spike code: https://github.com/simplecov-ruby/simplecov/compare/collate-plus-plus?expand=1

The changes go further than just not reading all files in at once: during the merge process we also operate on the raw file structure as opposed to creating SimpleCov::Result instances. Creating a SimpleCov::Result comes with a lot of overhead, notably reading in all source files, so that's even worse done ~400 times in a large code base.

There's more optimization potential for cases like these, which I'll open a ticket about, but notably:

* Potentially don't create SimpleCov::Result at all until we really produce results (just dump the raw coverage, more or less)
* Allow running without a formatter, as only the last run really needs the formatter
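The one-file-at-a-time idea can be sketched in plain Ruby. This is a simplified illustration, not SimpleCov's actual code: the file names, the merge rule (summing hit counts, keeping `nil` for irrelevant lines), and the helper names are assumptions for the sketch.

```ruby
require "json"
require "tmpdir"

# Merge two line-coverage arrays: nil means "not relevant", numbers are hit counts.
def merge_lines(a, b)
  a.zip(b).map { |x, y| x.nil? && y.nil? ? nil : (x || 0) + (y || 0) }
end

# Merge two file => line-coverage hashes, combining lines on conflicts.
def merge_files(merged, coverage)
  merged.merge(coverage) { |_file, a, b| merge_lines(a, b) }
end

Dir.mktmpdir do |dir|
  # Two fake CI resultsets, written as separate JSON files.
  File.write(File.join(dir, "a.json"), JSON.dump("foo.rb" => [1, 0, nil]))
  File.write(File.join(dir, "b.json"), JSON.dump("foo.rb" => [2, 1, nil]))

  # Only one report is read, parsed and folded in per iteration, so memory
  # stays bounded by a single report plus the running merge.
  merged = Dir[File.join(dir, "*.json")].sort.reduce({}) do |memo, path|
    merge_files(memo, JSON.parse(File.read(path)))
  end

  p merged # => {"foo.rb"=>[3, 1, nil]}
end
```

The old approach is the opposite: `flat_map` over all paths parses every report before any merging starts, keeping all of them alive at once.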
1 parent 0fe63fd commit ed03db5

File tree

11 files changed: +166 additions, -124 deletions


.rubocop.yml

Lines changed: 1 addition & 0 deletions
@@ -125,6 +125,7 @@ Metrics/MethodLength:
 
 Metrics/ModuleLength:
   Description: Avoid modules longer than 100 lines of code.
+  Max: 300
   Exclude:
     - "lib/simplecov.rb"

lib/simplecov.rb

Lines changed: 3 additions & 7 deletions
@@ -81,17 +81,13 @@ def start(profile = nil, &block)
   # information about coverage collation
   #
   def collate(result_filenames, profile = nil, &block)
-    raise "There's no reports to be merged" if result_filenames.empty?
+    raise "There are no reports to be merged" if result_filenames.empty?
 
     initial_setup(profile, &block)
 
-    results = result_filenames.flat_map do |filename|
-      # Re-create each included instance of SimpleCov::Result from the stored run data.
-      Result.from_hash(JSON.parse(File.read(filename)) || {})
-    end
-
     # Use the ResultMerger to produce a single, merged result, ready to use.
-    @result = ResultMerger.merge_and_store(*results)
+    # TODO: Did/does collate ignore old results? It probably shouldn't, right?
+    @result = ResultMerger.merge_and_store(*result_filenames)
 
     run_exit_tasks!
   end
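The fold that replaces the `flat_map` can be illustrated standalone. In the new `merge_results`, the memo is a `[command_names, coverage]` pair seeded from the first report; the sketch below mirrors that shape with hypothetical data and a deliberately trivial merge rule (real coverage merging combines per-line counts):

```ruby
# Memo shape: [command_names, coverage]. Ruby's parameter destructuring
# unpacks both pairs directly in the method signature.
def merge_pair((names_a, cov_a), (names_b, cov_b))
  [names_a + names_b, cov_a.merge(cov_b)]
end

reports = [
  [["rspec-job-1"], {"foo.rb" => {"lines" => [1, 0]}}],
  [["rspec-job-2"], {"bar.rb" => {"lines" => [nil, 2]}}]
]

# shift consumes the first report to seed the reduce, matching the diff's
# `initial_memo = valid_results(file_paths.shift)` pattern.
initial_memo = reports.shift
command_names, coverage = reports.reduce(initial_memo) { |memo, report| merge_pair(memo, report) }

p command_names.sort.join(", ") # => "rspec-job-1, rspec-job-2"
p coverage.keys.sort            # => ["bar.rb", "foo.rb"]
```

Seeding with `shift` is also why the empty-list guard in `collate` matters: with zero filenames there would be nothing to seed the memo with.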

lib/simplecov/configuration.rb

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ module SimpleCov
   # defined here are usable from SimpleCov directly. Please check out
   # SimpleCov documentation for further info.
   #
-  module Configuration # rubocop:disable Metrics/ModuleLength
+  module Configuration
     attr_writer :filters, :groups, :formatter, :print_error_status
 
     #

lib/simplecov/result.rb

Lines changed: 1 addition & 30 deletions
@@ -26,7 +26,7 @@ class Result
   # Initialize a new SimpleCov::Result from given Coverage.result (a Hash of filenames each containing an array of
   # coverage data)
   def initialize(original_result, command_name: nil, created_at: nil)
-    result = adapt_result(original_result)
+    result = original_result
     @original_result = result.freeze
     @command_name = command_name
     @created_at = created_at
@@ -72,10 +72,6 @@ def to_hash
     }
   end
 
-  def time_since_creation
-    Time.now - created_at
-  end
-
   # Loads a SimpleCov::Result#to_hash dump
   def self.from_hash(hash)
     hash.map do |command_name, data|
@@ -85,31 +81,6 @@ def self.from_hash(hash)
 
   private
 
-  # We changed the format of the raw result data in simplecov, as people are likely
-  # to have "old" resultsets lying around (but not too old so that they're still
-  # considered we can adapt them).
-  # See https://github.com/simplecov-ruby/simplecov/pull/824#issuecomment-576049747
-  def adapt_result(result)
-    if pre_simplecov_0_18_result?(result)
-      adapt_pre_simplecov_0_18_result(result)
-    else
-      result
-    end
-  end
-
-  # pre 0.18 coverage data pointed from file directly to an array of line coverage
-  def pre_simplecov_0_18_result?(result)
-    _key, data = result.first
-
-    data.is_a?(Array)
-  end
-
-  def adapt_pre_simplecov_0_18_result(result)
-    result.transform_values do |line_coverage_data|
-      {"lines" => line_coverage_data}
-    end
-  end
-
   def coverage
     keys = original_result.keys & filenames
     Hash[keys.zip(original_result.values_at(*keys))]
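The removed `time_since_creation` is replaced in ResultMerger by a check on the raw hash's `"timestamp"` field. A standalone sketch of that check (method names mirror the diff; the 48-hour constant stands in for `SimpleCov.merge_timeout`, whose default is 172800 seconds, hard-coded here as an assumption):

```ruby
MERGE_TIMEOUT = 172_800 # 48 hours in seconds, mirroring SimpleCov's default merge_timeout

# Works directly on the parsed resultset entry instead of a SimpleCov::Result,
# so no Result object (and no source-file reading) is needed to filter stale runs.
def within_merge_timeout?(data)
  Time.now - Time.at(data.fetch("timestamp")) < MERGE_TIMEOUT
end

fresh = {"timestamp" => Time.now.to_i, "coverage" => {}}
stale = {"timestamp" => Time.now.to_i - 3 * 24 * 60 * 60, "coverage" => {}}

p within_merge_timeout?(fresh) # => true
p within_merge_timeout?(stale) # => false
```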

lib/simplecov/result_merger.rb

Lines changed: 101 additions & 52 deletions
@@ -19,81 +19,110 @@ def resultset_writelock
         File.join(SimpleCov.coverage_path, ".resultset.json.lock")
       end
 
-      # Loads the cached resultset from JSON and returns it as a Hash,
-      # caching it for subsequent accesses.
-      def resultset
-        @resultset ||= begin
-          data = stored_data
-          if data
-            begin
-              JSON.parse(data) || {}
-            rescue StandardError
-              {}
-            end
-          else
-            {}
-          end
+      def merge_and_store(*file_paths)
+        result = merge_results(*file_paths)
+        store_result(result) if result
+        result
+      end
+
+      def merge_results(*file_paths)
+        # It is intentional here that files are only read in and parsed one at a time.
+        #
+        # In big CI setups you might deal with 100s of CI jobs and each one producing Megabytes
+        # of data. Reading them all in easily produces Gigabytes of memory consumption which
+        # we want to avoid.
+        #
+        # For similar reasons a SimpleCov::Result is only created in the end as that'd create
+        # even more data especially when it also reads in all source files.
+        initial_memo = valid_results(file_paths.shift)
+
+        command_names, coverage = file_paths.reduce(initial_memo) do |memo, file_path|
+          merge_coverage(memo, valid_results(file_path))
         end
+
+        SimpleCov::Result.new(coverage, command_name: Array(command_names).sort.join(", "))
       end
 
-      # Returns the contents of the resultset cache as a string or if the file is missing or empty nil
-      def stored_data
-        synchronize_resultset do
-          return unless File.exist?(resultset_path)
+      def valid_results(file_path)
+        parsed = parse_file(file_path)
+        valid_results = parsed.select { |_command_name, data| within_merge_timeout?(data) }
+        command_plus_coverage = valid_results.map { |command_name, data| [[command_name], adapt_result(data.fetch("coverage"))] }
+
+        # one file itself _might_ include multiple test runs
+        merge_coverage(*command_plus_coverage)
+      end
 
-          data = File.read(resultset_path)
-          return if data.nil? || data.length < 2
+      def parse_file(path)
+        data = read_file(path)
+        parse_json(data)
+      end
 
-          data
-        end
+      def read_file(path)
+        return unless File.exist?(path)
+
+        data = File.read(path)
+        return if data.nil? || data.length < 2
+
+        data
       end
 
-      # Gets the resultset hash and re-creates all included instances
-      # of SimpleCov::Result from that.
-      # All results that are above the SimpleCov.merge_timeout will be
-      # dropped. Returns an array of SimpleCov::Result items.
-      def results
-        results = Result.from_hash(resultset)
-        results.select { |result| result.time_since_creation < SimpleCov.merge_timeout }
+      def parse_json(content)
+        return {} unless content
+
+        JSON.parse(content) || {}
+      rescue StandardError
+        warn "[SimpleCov]: Warning! Parsing JSON content of resultset file failed"
+        {}
       end
 
-      def merge_and_store(*results)
-        result = merge_results(*results)
-        store_result(result) if result
-        result
+      def within_merge_timeout?(data)
+        time_since_result_creation(data) < SimpleCov.merge_timeout
       end
 
-      # Merge two or more SimpleCov::Results into a new one with merged
-      # coverage data and the command_name for the result consisting of a join
-      # on all source result's names
-      def merge_results(*results)
-        parsed_results = JSON.parse(JSON.dump(results.map(&:original_result)))
-        combined_result = SimpleCov::Combine::ResultsCombiner.combine(*parsed_results)
-        result = SimpleCov::Result.new(combined_result)
-        # Specify the command name
-        result.command_name = results.map(&:command_name).sort.join(", ")
-        result
+      def time_since_result_creation(data)
+        Time.now - Time.at(data.fetch("timestamp"))
+      end
+
+      def merge_coverage(*results)
+        return results.first if results.size == 1
+
+        results.reduce do |(memo_command, memo_coverage), (command, coverage)|
+          # timestamp is dropped here, which is intentional
+          merged_coverage = SimpleCov::Combine::ResultsCombiner.combine(memo_coverage, coverage)
+          merged_command = memo_command + command
+
+          [merged_command, merged_coverage]
+        end
       end
 
       #
-      # Gets all SimpleCov::Results from cache, merges them and produces a new
+      # Gets all SimpleCov::Results stored in resultset, merges them and produces a new
      # SimpleCov::Result with merged coverage data and the command_name
       # for the result consisting of a join on all source result's names
       #
+      # TODO: Maybe put synchronization just around the reading?
       def merged_result
-        merge_results(*results)
+        synchronize_resultset do
+          merge_results(resultset_path)
+        end
+      end
+
+      def read_resultset
+        synchronize_resultset do
+          parse_file(resultset_path)
+        end
       end
 
       # Saves the given SimpleCov::Result in the resultset cache
       def store_result(result)
         synchronize_resultset do
           # Ensure we have the latest, in case it was already cached
-          clear_resultset
-          new_set = resultset
+          new_resultset = read_resultset
+          # FIXME
           command_name, data = result.to_hash.first
-          new_set[command_name] = data
+          new_resultset[command_name] = data
           File.open(resultset_path, "w+") do |f_|
-            f_.puts JSON.pretty_generate(new_set)
+            f_.puts JSON.pretty_generate(new_resultset)
           end
         end
         true
@@ -116,9 +145,29 @@ def synchronize_resultset
         end
       end
 
-      # Clear out the previously cached .resultset
-      def clear_resultset
-        @resultset = nil
+      # We changed the format of the raw result data in simplecov, as people are likely
+      # to have "old" resultsets lying around (but not too old so that they're still
+      # considered we can adapt them).
+      # See https://github.com/simplecov-ruby/simplecov/pull/824#issuecomment-576049747
+      def adapt_result(result)
+        if pre_simplecov_0_18_result?(result)
+          adapt_pre_simplecov_0_18_result(result)
+        else
+          result
+        end
+      end
+
+      # pre 0.18 coverage data pointed from file directly to an array of line coverage
+      def pre_simplecov_0_18_result?(result)
+        _key, data = result.first
+
+        data.is_a?(Array)
+      end
+
+      def adapt_pre_simplecov_0_18_result(result)
+        result.transform_values do |line_coverage_data|
+          {"lines" => line_coverage_data}
+        end
      end
    end
  end
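The `adapt_result` logic moved here from `SimpleCov::Result` can be demonstrated standalone. Pre-0.18 resultsets mapped each file directly to a line-coverage array; 0.18+ wraps that array under a `"lines"` key (leaving room for branch coverage alongside it). The method names below are shortened stand-ins for the ones in the diff:

```ruby
# Old format: file => array of line hits. New format: file => {"lines" => [...]}.
def pre_0_18?(result)
  _key, data = result.first
  data.is_a?(Array)
end

def adapt(result)
  return result unless pre_0_18?(result)

  # transform_values (Ruby 2.4+) rewrites every value, keeping the file keys.
  result.transform_values { |line_coverage| {"lines" => line_coverage} }
end

old_format = {"lib/foo.rb" => [1, 0, nil]}
p adapt(old_format) # => {"lib/foo.rb"=>{"lines"=>[1, 0, nil]}}

new_format = {"lib/foo.rb" => {"lines" => [1, 0, nil]}}
p adapt(new_format).equal?(new_format) # => true (already adapted, returned as-is)
```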
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# some comment
+puts "wargh"
+puts "wargh 1"
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# some comment
+puts "wargh"
+puts "wargh 2"

spec/fixtures/parallel_tests.rb

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+# foo
+puts "foo"
+# bar
+puts "bar"
