Merged
116 commits
6d7199f
changes
Feb 9, 2019
3048d13
changes
Feb 9, 2019
7bd5eac
changes
Feb 9, 2019
6f62c6c
changes
Feb 9, 2019
109ee9d
changes
Feb 9, 2019
8d585fd
changes
Feb 9, 2019
f6b1e02
changes
Feb 12, 2019
c19eb1d
changes
Feb 12, 2019
fd3cfb5
changes
Feb 12, 2019
8e16e32
changes
Feb 12, 2019
9453d70
changes
Feb 15, 2019
4514780
chg
Feb 15, 2019
fc8a1ed
changes for kubelet health
Feb 16, 2019
0e28a4e
changes
Feb 16, 2019
627de84
changes to include message
Feb 20, 2019
4efb7ac
Merge ci_feature into node-health-perf
r-dilip Feb 26, 2019
14ab446
First iteration of health monitor signals
r-dilip Mar 8, 2019
b8dcc52
Fixed Bugs for NotifyInstantly Monitor
r-dilip Mar 9, 2019
1a2d8ce
Health and Input plugins, logs cleaned up
r-dilip Mar 9, 2019
e0c431e
Hooking up input and filter plugins to out_oms_api plugin
r-dilip Mar 11, 2019
654c7c9
1. Tag Changes 2. Adding Health Monitor Configuration 3. Added Agent …
r-dilip Mar 12, 2019
c5e739f
Merge branch 'ci_feature' into dilipr/node-health-perf
r-dilip Mar 12, 2019
b60ee71
Fix Base Container.data, include kube-system containers, fix input pl…
r-dilip Mar 12, 2019
99b7fe1
More fixes to config, process kube-system
r-dilip Mar 12, 2019
fd5bbf6
Adding in_kube_health
r-dilip Mar 12, 2019
8b9fd53
Merging after pulling
r-dilip Mar 12, 2019
e667406
Send Node_name parameter to reduceSignal for node level monitors
r-dilip Mar 12, 2019
31a3931
Fix Typo in method invocation
r-dilip Mar 14, 2019
c380a5e
1. Added pod_status monitor (unused), 2. Removed processing for conta…
r-dilip Mar 20, 2019
c61a3e6
Merge branch 'dilipr/kubeHealth' of https://github.com/Microsoft/Dock…
r-dilip Mar 20, 2019
b68572f
Fix issue when pods are created since last kube api
r-dilip Mar 29, 2019
b80366e
Remove duplicate plugin entry from container.conf
r-dilip Apr 11, 2019
bac932d
Merging after fixing conflicts from ci_feature
r-dilip Apr 11, 2019
c6d0fee
Updating Agent Version in fluent-bit config
r-dilip Apr 11, 2019
42957df
Updating Agent Version
r-dilip Apr 12, 2019
8ab3ee8
Fix Error when Pods dont have a controller
r-dilip Apr 17, 2019
a8837f8
Add Telemetry for plugin start
r-dilip Apr 26, 2019
f6e9c0e
Merge from ci_feature
r-dilip Apr 26, 2019
147d688
Change getMonitorInstanceId method signature
r-dilip Apr 30, 2019
80a5d36
Remove references to HealthMonitorRecord struct in code
r-dilip May 1, 2019
d7a71d5
Rake
May 7, 2019
a18eb83
Running Ruby tests
r-dilip May 7, 2019
c7a0a50
Merge branch 'dilipr/rubyTest' into dilipr/healthModelAggregation
r-dilip May 7, 2019
5895951
Working Version for Health Model Builder on the agent
r-dilip May 22, 2019
8892458
Calculate old and new states for Aggregate and Unit Monitors
r-dilip May 22, 2019
81df39f
Remove Controller Name from labels and details, use Deployment/Daemon…
r-dilip May 23, 2019
deda155
Change label namespaces, remove ClusterName from records sent, send d…
r-dilip May 29, 2019
a57b0b4
Configuration Split for Monitors
r-dilip May 29, 2019
9dbc7a8
working version for 2 pods before naming changes
r-dilip May 30, 2019
0f9f5d4
Working Model Builder version after name changes, TODO: test on the a…
r-dilip May 30, 2019
2f7be02
E2E working version for health model aggregation TODO: Missing Signal…
r-dilip May 30, 2019
0f210f5
Change pod-aggregator to workload-name, remove node monitor hierarchy…
r-dilip Jun 5, 2019
b89b107
Refactor signal reduction logic
r-dilip Jun 13, 2019
adb8f94
Missing Pod signals/Node Signals send none or unknown based on the in…
r-dilip Jun 14, 2019
876bb3c
serialization and deserialization of state
r-dilip Jun 14, 2019
7c459c4
Working cadvisor_health_node filter
r-dilip Jun 17, 2019
497c26a
working version E2E with state serialization and deserialization
r-dilip Jun 18, 2019
f3520fe
adding source, health config to base_container.data
r-dilip Jun 18, 2019
bc57eb2
Container conf changes, permissions for log files etc.
r-dilip Jun 18, 2019
ded1867
Merge branch 'ci_feature' into dilipr/refactorSignalReduction
r-dilip Jun 18, 2019
88621c7
Reinstate run_interval that was removed accidentally
r-dilip Jun 18, 2019
966a0b1
Remove single sample flip configs, fixed details.to_json bug, pass in…
r-dilip Jun 19, 2019
5edf616
Remove unnecessary logging
r-dilip Jun 19, 2019
09063ba
Fix Aggregation logic for 'percentage' agg algorithm monitors
r-dilip Jun 20, 2019
aba0d17
Scale up Scale down bugs fixed, sending none signal on first occurenc…
r-dilip Jun 21, 2019
24b0479
Enable state initialization, fix bug where records are always sent th…
r-dilip Jun 21, 2019
d3d267a
Fix percentage agg algorithm state calculation
r-dilip Jun 22, 2019
d0f4a7b
Fix the bug where if signal is unknown state, its state is not update…
r-dilip Jun 22, 2019
990f70c
fix compute percentage bug when value is in warning state
r-dilip Jun 22, 2019
275fcf3
Update state_transition_time to current time whenever state change ha…
r-dilip Jun 22, 2019
2901e99
Update missing signal state to be the instance state for correct rollup
r-dilip Jun 24, 2019
bd7cf0a
1. Remove some unnecessary logging
r-dilip Jun 25, 2019
23fa7a2
Removing calls to kube api since they are not required as of now. Wil…
r-dilip Jun 25, 2019
1697f40
Send telemetry for cluster level state changes
r-dilip Jun 27, 2019
ec65d49
Testing Rake
r-dilip Jul 8, 2019
d0a62d3
First Round of Tests
r-dilip Jul 17, 2019
2e50407
added integration tests for aks and aks-engine
r-dilip Jul 17, 2019
8af3554
committing missing renamed file
r-dilip Jul 17, 2019
02bce13
Fix base_Container.data
r-dilip Jul 17, 2019
0d4ae84
Added test_helpers.rb
r-dilip Jul 17, 2019
60384df
Fix ruby 1.9 issue where __dir__is not recognized
r-dilip Jul 17, 2019
ce8c748
moving some methods into health_monitor_helpers, so that unit tests c…
r-dilip Jul 17, 2019
c70cfe7
Changed references to health_monitor_helpers
r-dilip Jul 17, 2019
603ab25
Fixing ruby incompatibility errors
r-dilip Jul 17, 2019
338b752
Dont load health_monitor_utils
r-dilip Jul 17, 2019
dd8dfef
Dumm commit to force pull
r-dilip Jul 17, 2019
c161fc1
remove non existent file from base_container.data, update Makefile
r-dilip Jul 17, 2019
142a5a5
Updated tomlparser.rb to handle agent_settings for health_model
r-dilip Jul 30, 2019
d415f07
Fixing merge conflicts from ci_feature
r-dilip Jul 30, 2019
d9f2e4e
Toggle health plugins based on Feature flag
r-dilip Jul 30, 2019
7b09fcf
Added health_monitor_helpers, and fixed log
r-dilip Jul 30, 2019
fa5e31d
Send start telemetry only if health model is enabled
r-dilip Aug 1, 2019
041e20f
After merge and stash pop
r-dilip Aug 23, 2019
c46d311
Merge remote-tracking branch 'origin/ci_feature' into dilipr/containe…
r-dilip Aug 28, 2019
5998151
Testing image creation
r-dilip Aug 28, 2019
5ae7d0c
fixing merge issues from ci_feature
r-dilip Aug 29, 2019
68f04e7
Committing for smoke test
r-dilip Aug 29, 2019
3d655c1
Fixed references, fixed the issue where aggregator is not initialized…
r-dilip Aug 29, 2019
d491a39
add node_condition changes to provide state per condition type, confi…
r-dilip Aug 30, 2019
db46c1b
Deleting inventory files
r-dilip Aug 31, 2019
b0a562a
Removing json file
r-dilip Aug 31, 2019
64102f9
Unit Tests
r-dilip Sep 4, 2019
5178a25
Container CPU Memory
r-dilip Sep 20, 2019
73b725a
1. Health Forward
r-dilip Sep 26, 2019
838a12e
Merging from ci_feature
r-dilip Sep 26, 2019
ace23ea
Merging with Tag Changes from Dupe Perf Fix
r-dilip Sep 28, 2019
cc0bca5
Fix failing unit test
r-dilip Sep 28, 2019
fcdcf23
Added Log init
r-dilip Sep 28, 2019
37dfa93
removing feature flag check, and revert config parser changes
r-dilip Sep 30, 2019
61dfdaf
Missing config for master_node_pool, add pod name, state and containe…
r-dilip Oct 2, 2019
819cbca
Fix the following bugs
r-dilip Oct 3, 2019
1334b30
Added Operator, changed thresholds to be pass %, added retries when k…
r-dilip Oct 3, 2019
06dd657
Merge from ci_feature before merging back
r-dilip Oct 3, 2019
4a1a079
Use env variable instead of hardcoded PORT
r-dilip Oct 3, 2019
7f44b09
send when an agg monitor details change, but state did not change
r-dilip Oct 7, 2019
63de09e
Merge branch 'ci_feature' into dilipr/send_on_details_change
r-dilip Oct 7, 2019
3 changes: 2 additions & 1 deletion source/code/plugin/filter_health_model_builder.rb
@@ -184,7 +184,8 @@ def filter_stream(tag, es)
             all_monitors.each{|monitor_instance_id, monitor|
                 if monitor.is_aggregate_monitor
                     @state.update_state(monitor,
-                        @provider.get_config(monitor.monitor_id)
+                        @provider.get_config(monitor.monitor_id),
+                        true
                     )
                 end

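The call-site change above works because the third parameter of `update_state` is declared with a default value, so existing two-argument callers keep working. A minimal sketch of that behavior, assuming only the signature shown in this diff; the class name `StateSketch` and the symbol arguments are hypothetical stand-ins, not code from the repository:

```ruby
# Hypothetical stand-in for HealthMonitorState; only the method signature
# mirrors the diff, the body is illustrative.
class StateSketch
  def update_state(monitor, monitor_config, is_aggregate_monitor = false)
    # Return the flag so the effect of the default value is visible.
    is_aggregate_monitor
  end
end

state = StateSketch.new

# Existing two-argument call: the flag falls back to false.
unit_call = state.update_state(:unit_monitor, {})

# Call shape introduced in filter_health_model_builder.rb: the flag is passed explicitly.
agg_call = state.update_state(:aggregate_monitor, {}, true)
```

Here `unit_call` is `false` and `agg_call` is `true`, so only the aggregate path opts in to the new details check.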
25 changes: 21 additions & 4 deletions source/code/plugin/health/health_monitor_state.rb
@@ -57,10 +57,11 @@ def initialize_state(deserialized_state)
         2. if there is a "consistent" state change for monitors
         3. if the signal is stale (> 4hrs)
         4. If the latest state is none
+        5. If an aggregate monitor has a change in its details, but no change in state
 =end
         def update_state(monitor, #UnitMonitor/AggregateMonitor
-            monitor_config #Hash
-            )
+            monitor_config, #Hash
+            is_aggregate_monitor = false)
             samples_to_keep = 1
             monitor_instance_id = monitor.monitor_instance_id
             log = HealthMonitorHelpers.get_log_handle
@@ -76,12 +77,13 @@ def update_state(monitor, #UnitMonitor/AggregateMonitor
                 samples_to_keep = monitor_config['ConsecutiveSamplesForStateTransition'].to_i
             end

+            deleted_record = {}
             if @@monitor_states.key?(monitor_instance_id)
                 health_monitor_instance_state = @@monitor_states[monitor_instance_id]
                 health_monitor_records = health_monitor_instance_state.prev_records #This should be an array

                 if health_monitor_records.size == samples_to_keep
-                    health_monitor_records.delete_at(0)
+                    deleted_record = health_monitor_records.delete_at(0)
                 end
                 health_monitor_records.push(monitor.details)
                 health_monitor_instance_state.prev_records = health_monitor_records
@@ -106,7 +108,6 @@ def update_state(monitor, #UnitMonitor/AggregateMonitor
                 @@monitor_states[monitor_instance_id] = health_monitor_instance_state
             end

-
             # update old and new state based on the history and latest record.
             # TODO: this is a little hairy. Simplify

@@ -142,6 +143,10 @@ def update_state(monitor, #UnitMonitor/AggregateMonitor
                     @@first_record_sent[monitor_instance_id] = true
                     health_monitor_instance_state.should_send = true
                     set_state(monitor_instance_id, health_monitor_instance_state)
+                elsif agg_monitor_details_changed?(is_aggregate_monitor, deleted_record, health_monitor_instance_state.prev_records[0])
+                    health_monitor_instance_state.should_send = true
+                    set_state(monitor_instance_id, health_monitor_instance_state)
+                    log.debug "#{monitor_instance_id} condition: agg monitor details changed should_send #{health_monitor_instance_state.should_send}"
                 end
             # latest state is different that last sent state
             else
@@ -212,5 +217,17 @@ def is_state_change_consistent(health_monitor_records, samples_to_check)
             end
             return true
         end
+
+        def agg_monitor_details_changed?(is_aggregate_monitor, last_sent_details, latest_details)
+            log = HealthMonitorHelpers.get_log_handle
+            if !is_aggregate_monitor
+                return false
+            end
+            if latest_details['details'] != last_sent_details['details']
+                log.info "Last Sent Details #{JSON.pretty_generate(last_sent_details)} \n Latest Details: #{JSON.pretty_generate(latest_details)}"
+                return true
+            end
+            return false
+        end
     end
 end
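
The new check compares the record evicted from the sample window with the record at the front of the window after the push. A minimal sketch of that flow for a window of one sample, using plain hashes in place of monitor records. The `'state'` and `'pods_ready'` keys and the helper name `details_changed?` are hypothetical. The two guard clauses are also an assumption and are not in the PR: as written in the diff, `deleted_record` stays `{}` until the window is full, so `last_sent_details['details']` would be `nil` and the comparison would report a change whenever that branch is reached.

```ruby
# Sketch only: plain hashes stand in for the monitor records.
def details_changed?(is_aggregate_monitor, previous_record, latest_record)
  return false unless is_aggregate_monitor
  # Assumption (not in the PR): with no evicted record there is nothing to compare.
  return false if previous_record.nil? || previous_record.empty?
  return false if latest_record.nil?
  previous_record['details'] != latest_record['details']
end

samples_to_keep = 1 # default when ConsecutiveSamplesForStateTransition is not set

# Same state, different details: the case condition 5 is meant to catch.
window = [{ 'state' => 'pass', 'details' => { 'pods_ready' => 3 } }]
latest = { 'state' => 'pass', 'details' => { 'pods_ready' => 2 } }

# Evict the oldest record once the window is full, then append the latest one.
evicted = window.size == samples_to_keep ? window.delete_at(0) : {}
window.push(latest)

changed_for_aggregate = details_changed?(true, evicted, window[0])
changed_for_unit      = details_changed?(false, evicted, window[0])
changed_on_first_seen = details_changed?(true, {}, latest)
```

With these inputs `changed_for_aggregate` is `true`, while the unit-monitor and first-sample cases are `false`. Note that `window[0]` is the latest record only when the window holds a single sample; with a larger `ConsecutiveSamplesForStateTransition` it is the oldest retained record.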