Merged
74 commits
3c5b46d
Updatng release history
vishiy Aug 1, 2018
d31f588
fixing the plugin logs for emit stream
Aug 1, 2018
11fd5f6
updating log message
Aug 5, 2018
87a9cf8
Remove Log Processing from fluentd configuration
r-dilip Aug 16, 2018
308be41
Remove plugin references from base_container.data
r-dilip Aug 16, 2018
5bee0af
Merge pull request #124 from Microsoft/dilipr/fluentdConfigUpdates
r-dilip Aug 30, 2018
bcd1a3f
Dilipr/fluent bit log processing (#126)
r-dilip Sep 14, 2018
b02f2ec
Dilipr/glide updates (#127)
r-dilip Sep 14, 2018
e01c678
containerID="" for pull issues
vishiy Sep 17, 2018
b0ba22d
Using KubeAPI for getting image,name. Adding more logs (#129)
r-dilip Sep 18, 2018
9783419
Dilipr/mark comments (#130)
r-dilip Sep 27, 2018
8e35b73
Rashmi/segfault latest (#132)
rashmichandrashekar Sep 27, 2018
4b63021
Adding a missed null check (#135)
rashmichandrashekar Sep 27, 2018
8b964fd
reusing some variables (#136)
rashmichandrashekar Sep 28, 2018
938c2ed
Rashmi/cjson delete null check (#138)
rashmichandrashekar Sep 28, 2018
fbfdf11
updating log level to debug for some provider workflows (#139)
rashmichandrashekar Oct 3, 2018
d426066
Fixing CPU Utilization and removing Fluent-bit filters (#140)
r-dilip Oct 4, 2018
c2cabab
Minor tweaks 1. Remove some logging 2. Added more Error Handling 3. C…
r-dilip Oct 9, 2018
32567db
* Change FluentBit flush interval to 30 secs (from 5 secs)
vishiy Oct 10, 2018
afc981d
Container Log Telemetry
r-dilip Oct 12, 2018
4b958dd
Fixing an issue with Send Init Event if Telemetry is not initialized …
r-dilip Oct 12, 2018
510ef9f
PR feedback
r-dilip Oct 12, 2018
684c39b
PR feedback
r-dilip Oct 12, 2018
e165275
Sending an event every 5 mins(Heartbeat) (#146)
r-dilip Oct 15, 2018
eecb5db
Merge branch 'ci_feature_prod' into ci_feature
vishiy Oct 16, 2018
cfe1ca9
PR feedback to cleanup removed workflows
vishiy Oct 16, 2018
892b51c
updating agent version for telemetry
vishiy Oct 16, 2018
9c83160
updating agent version
vishiy Oct 17, 2018
f0b5a61
Telemetry Updates (#149)
r-dilip Oct 25, 2018
a58998e
Changes to send omsagent/omsagent-rs kubectl logs to App Insights (#159)
r-dilip Oct 30, 2018
4c2da9f
Rashmi/fluentd docker inventory (#160)
rashmichandrashekar Nov 5, 2018
6698fcd
Fix Telemetry Bug -- Initialize Telemetry Client after Initializing a…
r-dilip Nov 8, 2018
ad6bb93
Fix kube events memory leak due to yaml serialization for > 5k events…
vishiy Nov 12, 2018
eff92df
Setting Timeout for HTTP Client in PostDataHelper in outoms go plugi…
r-dilip Nov 14, 2018
9893e36
Vishwa/perftelemetry 2 (#165)
vishiy Nov 16, 2018
4f3c898
environment variable fix (#166)
rashmichandrashekar Nov 27, 2018
5e16467
Fixing a bug where we were crashing due to container statuses not pre…
vishiy Nov 27, 2018
b482b1e
Updating title
vishiy Nov 29, 2018
d75ba89
updating right versions for last release
vishiy Nov 29, 2018
cbd815c
Updating the break condition to look for end of response (#168)
rashmichandrashekar Nov 29, 2018
d0d5bf7
updating AgentVersion for telemetry
vishiy Nov 29, 2018
bfe27e5
Updating readme for latest release changes
vishiy Nov 29, 2018
5677560
Merge branch 'ci_feature_prod' into ci_feature
vishiy Nov 29, 2018
a621f88
Changes - (#173)
vishiy Dec 17, 2018
c9cf4fd
Rashmi/kubenodeinventory (#174)
rashmichandrashekar Dec 17, 2018
df6f122
Get cpuusage from usageseconds (#175)
vishiy Dec 20, 2018
dac9931
Rashmi/kubenodeinventory (#176)
rashmichandrashekar Dec 21, 2018
04cc1a8
Rashmi/kubenodeinventory (#178)
rashmichandrashekar Dec 26, 2018
5883f53
Fixing an issue on the cpurate metric, which happens for the first ti…
vishiy Dec 26, 2018
191f328
Rashmi/kubenodeinventory (#180)
rashmichandrashekar Dec 28, 2018
7e52e8c
Exclude docker containers from container inventory (#181)
rashmichandrashekar Jan 7, 2019
f0591f9
Exclude pauseamd64 containers from container inventory (#182)
rashmichandrashekar Jan 8, 2019
99e8813
Merge branch 'ci_feature_prod' into ci_feature
vishiy Jan 9, 2019
4782435
Update agent version
vishiy Jan 9, 2019
23bcc41
Updating readme for the latest release
vishiy Jan 9, 2019
51d5e93
Fix indentation in kube.conf and update readme (#184)
rashmichandrashekar Jan 11, 2019
decf86a
updating agent tag
rashmichandrashekar Jan 11, 2019
a1b35db
Get Pods for current Node Only (#185)
r-dilip Jan 29, 2019
22649ba
changes for container node inventory fixed type (#186)
rashmichandrashekar Jan 30, 2019
61e2eaf
Fix for mooncake (disable telemetry optionally) (#191)
vishiy Feb 13, 2019
30dff41
CustomMetrics to ci_feature (#193)
r-dilip Feb 15, 2019
f1b0cd2
add ContainerNotRunning column to KubePodInventory
bragi92 Jan 24, 2019
616a803
merge pr feedback: update name to ContainerStatusReason
bragi92 Jan 24, 2019
c33ca34
Zero Fill for Missing Pod Phases, Change Namespace Dimension to Kuber…
r-dilip Feb 19, 2019
2651750
No Retries for non 404 4xx errors (#196)
r-dilip Feb 20, 2019
195bc33
Update agent version for telemetry
vishiy Feb 20, 2019
59d6c61
Update readme for upcoming (ciprod01202019) release
vishiy Feb 20, 2019
0189bc0
fix readme formatting
vishiy Feb 20, 2019
8221d2d
fix formatting for readme
vishiy Feb 20, 2019
30aa305
fix formatting for readme
vishiy Feb 20, 2019
f401116
fix readme
vishiy Feb 20, 2019
a2f45af
fix readme
vishiy Feb 21, 2019
759dbb5
fix agent version for telemetry
vishiy Feb 21, 2019
8bff5f9
Merge branch 'ci_feature_prod' into ci_feature
vishiy Feb 21, 2019
18 changes: 17 additions & 1 deletion README.md
@@ -11,6 +11,22 @@ additional questions or comments.

Note: The agent versions below have dates (ciprod<mmddyyyy>), which indicate the agent build dates (not release dates)

### 01/20/2019 - Version microsoft/oms:ciprod02202019
- Container logs enrichment optimization
  * Get container metadata only for containers on the current node (previously for the whole cluster)
- Update Fluent Bit 0.13.7 => 0.14.4
  * This fixes the escaping issue in the container logs
- Mooncake cloud support for agent (AKS only)
  * Ability to disable agent telemetry
  * Ability to onboard and ingest to Mooncake cloud
- Add & populate 'ContainerStatusReason' column in KubePodInventory
- Alertable (custom) metrics (to AzureMonitor - only for AKS clusters)
  * Cpuusagenanocores & % metric
  * MemoryWorkingsetBytes & % metric
  * MemoryRssBytes & % metric
  * Podcount by node, phase & namespace metric
  * Nodecount metric

### 01/09/2018 - Version microsoft/oms:ciprod01092019
- Omsagent - 1.8.1.256 (nov 2018 release)
- Persist fluentbit state between container restarts
@@ -25,7 +41,7 @@ Note : The agent version(s) below has dates (ciprod<mmddyyyy>), which indicate t
- Agent telemetry - ContainerLogsAgentSideLatencyMs
- Agent telemetry - PodCount
- Agent telemetry - ControllerCount
-- Agent telemetry - K8S Version
+- Agent telemetry - K8S Version
- Agent telemetry - NodeCoreCapacity
- Agent telemetry - NodeMemoryCapacity
- Agent telemetry - KubeEvents (exceptions)
24 changes: 24 additions & 0 deletions installer/conf/container.conf
@@ -23,6 +23,14 @@
log_level debug
</source>

#custom_metrics_mdm filter plugin
<filter mdm.cadvisorperf**>
type filter_cadvisor2mdm
custom_metrics_azure_regions eastus,southcentralus,westcentralus,westus2,southeastasia,northeurope,westEurope
metrics_to_collect cpuUsageNanoCores,memoryWorkingSetBytes,memoryRssBytes
log_level info
</filter>

<match oms.containerinsights.containerinventory**>
type out_oms
log_level debug
@@ -52,3 +60,19 @@
retry_wait 30s
max_retry_wait 9m
</match>

<match mdm.cadvisorperf**>
type out_mdm
log_level debug
num_threads 5
buffer_chunk_limit 20m
buffer_type file
buffer_path %STATE_DIR_WS%/out_mdm_cdvisorperf*.buffer
buffer_queue_limit 20
buffer_queue_full_action drop_oldest_chunk
flush_interval 20s
retry_limit 10
retry_wait 30s
max_retry_wait 9m
retry_mdm_post_wait_minutes 60
</match>
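The `buffer_queue_limit 20` and `buffer_queue_full_action drop_oldest_chunk` settings above cap how much data is held while the MDM endpoint is unreachable: once the queue is full, the oldest chunk is discarded to make room for the newest. A minimal Ruby sketch of that policy follows. It is an illustration only, not fluentd's buffer implementation, and `BoundedChunkQueue` is a hypothetical name.

```ruby
# Hypothetical illustration of a drop-oldest bounded queue; not fluentd's code.
class BoundedChunkQueue
  attr_reader :dropped

  def initialize(limit)
    raise ArgumentError, "limit must be positive" unless limit.positive?

    @limit = limit
    @chunks = []
    @dropped = 0
  end

  # Enqueue a chunk; when the queue is already full, discard the oldest first.
  def push(chunk)
    if @chunks.size >= @limit
      @chunks.shift
      @dropped += 1
    end
    @chunks.push(chunk)
    self
  end

  # Return a copy of the queued chunks, oldest first.
  def to_a
    @chunks.dup
  end
end

queue = BoundedChunkQueue.new(3)
(1..5).each { |n| queue.push("chunk-#{n}") }
puts queue.to_a.inspect
puts "dropped: #{queue.dropped}"
```

The trade-off is that recent metrics win over old ones during a long outage, which suits alerting data where stale points have little value.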
26 changes: 24 additions & 2 deletions installer/conf/kube.conf
@@ -47,6 +47,12 @@
log_level debug
</source>

<filter mdm.kubepodinventory** mdm.kubenodeinventory**>
type filter_inventory2mdm
custom_metrics_azure_regions eastus,southcentralus,westcentralus,westus2,southeastasia,northeurope,westEurope
log_level info
</filter>

<match oms.containerinsights.KubePodInventory**>
type out_oms
log_level debug
@@ -119,8 +125,8 @@
max_retry_wait 9m
</match>

-<match oms.api.ContainerNodeInventory**>
-type out_oms_api
+<match oms.containerinsights.ContainerNodeInventory**>
+type out_oms
log_level debug
buffer_chunk_limit 20m
buffer_type file
@@ -146,3 +152,19 @@
retry_wait 30s
max_retry_wait 9m
</match>

<match mdm.kubepodinventory** mdm.kubenodeinventory** >
type out_mdm
log_level debug
num_threads 5
buffer_chunk_limit 20m
buffer_type file
buffer_path /var/opt/microsoft/omsagent/6bb1e963-b08c-43a8-b708-1628305e964a/state/out_mdm_*.buffer
buffer_queue_limit 20
buffer_queue_full_action drop_oldest_chunk
flush_interval 20s
retry_limit 10
retry_wait 30s
max_retry_wait 9m
retry_mdm_post_wait_minutes 60
</match>
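The retry settings above (`retry_wait 30s`, `max_retry_wait 9m`, `retry_limit 10`) describe a capped exponential backoff. As a rough sketch, assuming the wait simply doubles after each failed flush and is capped at `max_retry_wait` (the real plugin may add jitter, which is ignored here), the schedule could be computed as below. `retry_waits` is a hypothetical helper, not part of fluentd or this agent.

```ruby
# Hypothetical helper: capped exponential backoff, in seconds, without jitter.
def retry_waits(retry_wait:, max_retry_wait:, retry_limit:)
  (0...retry_limit).map do |attempt|
    # Double the base wait on every attempt, but never exceed the cap.
    [retry_wait * (2**attempt), max_retry_wait].min
  end
end

waits = retry_waits(retry_wait: 30, max_retry_wait: 9 * 60, retry_limit: 10)
puts waits.inspect
puts "total seconds before giving up: #{waits.sum}"
```

Under these assumptions the cap is reached on the sixth attempt, so most of the retry budget is spent at the 9-minute ceiling.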
2 changes: 1 addition & 1 deletion installer/conf/td-agent-bit.conf
@@ -28,5 +28,5 @@
EnableTelemetry true
TelemetryPushIntervalSeconds 300
Match oms.container.log.*
-AgentVersion ciprod01092019
+AgentVersion ciprod02202019

14 changes: 14 additions & 0 deletions installer/datafiles/base_container.data
@@ -36,13 +36,19 @@ MAINTAINER: 'Microsoft Corporation'
/opt/microsoft/omsagent/plugin/in_cadvisor_perf.rb; source/code/plugin/in_cadvisor_perf.rb; 644; root; root
/opt/microsoft/omsagent/plugin/in_kube_services.rb; source/code/plugin/in_kube_services.rb; 644; root; root
/opt/microsoft/omsagent/plugin/in_kube_nodes.rb; source/code/plugin/in_kube_nodes.rb; 644; root; root
/opt/microsoft/omsagent/plugin/filter_inventory2mdm.rb; source/code/plugin/filter_inventory2mdm.rb; 644; root; root
/opt/microsoft/omsagent/plugin/CustomMetricsUtils.rb; source/code/plugin/CustomMetricsUtils.rb; 644; root; root


/opt/microsoft/omsagent/plugin/ApplicationInsightsUtility.rb; source/code/plugin/ApplicationInsightsUtility.rb; 644; root; root
/opt/microsoft/omsagent/plugin/ContainerInventoryState.rb; source/code/plugin/ContainerInventoryState.rb; 644; root; root
/opt/microsoft/omsagent/plugin/DockerApiClient.rb; source/code/plugin/DockerApiClient.rb; 644; root; root
/opt/microsoft/omsagent/plugin/DockerApiRestHelper.rb; source/code/plugin/DockerApiRestHelper.rb; 644; root; root
/opt/microsoft/omsagent/plugin/in_containerinventory.rb; source/code/plugin/in_containerinventory.rb; 644; root; root

/opt/microsoft/omsagent/plugin/out_mdm.rb; source/code/plugin/out_mdm.rb; 644; root; root
/opt/microsoft/omsagent/plugin/filter_cadvisor2mdm.rb; source/code/plugin/filter_cadvisor2mdm.rb; 644; root; root

/opt/microsoft/omsagent/plugin/lib/application_insights/version.rb; source/code/plugin/lib/application_insights/version.rb; 644; root; root
/opt/microsoft/omsagent/plugin/lib/application_insights/rack/track_request.rb; source/code/plugin/lib/application_insights/rack/track_request.rb; 644; root; root
/opt/microsoft/omsagent/plugin/lib/application_insights/unhandled_exception.rb; source/code/plugin/lib/application_insights/unhandled_exception.rb; 644; root; root
@@ -170,6 +176,14 @@ touch /var/opt/microsoft/docker-cimprov/log/kubernetes_perf_log.txt
chmod 666 /var/opt/microsoft/docker-cimprov/log/kubernetes_perf_log.txt
chown omsagent:omiusers /var/opt/microsoft/docker-cimprov/log/kubernetes_perf_log.txt

touch /var/opt/microsoft/docker-cimprov/log/filter_cadvisor2mdm.log
chmod 666 /var/opt/microsoft/docker-cimprov/log/filter_cadvisor2mdm.log
chown omsagent:omiusers /var/opt/microsoft/docker-cimprov/log/filter_cadvisor2mdm.log

touch /var/opt/microsoft/docker-cimprov/log/filter_inventory2mdm.log
chmod 666 /var/opt/microsoft/docker-cimprov/log/filter_inventory2mdm.log
chown omsagent:omiusers /var/opt/microsoft/docker-cimprov/log/filter_inventory2mdm.log

mv /etc/opt/microsoft/docker-cimprov/container.conf /etc/opt/microsoft/omsagent/sysconf/omsagent.d/container.conf
chown omsagent:omsagent /etc/opt/microsoft/omsagent/sysconf/omsagent.d/container.conf

47 changes: 25 additions & 22 deletions source/code/go/src/plugins/oms.go
@@ -77,15 +77,15 @@ var (

// DataItem represents the object corresponding to the json that is sent by fluentbit tail plugin
type DataItem struct {
-LogEntry string `json:"LogEntry"`
-LogEntrySource string `json:"LogEntrySource"`
-LogEntryTimeStamp string `json:"LogEntryTimeStamp"`
-LogEntryTimeOfCommand string `json:"TimeOfCommand"`
-ID string `json:"Id"`
-Image string `json:"Image"`
-Name string `json:"Name"`
-SourceSystem string `json:"SourceSystem"`
-Computer string `json:"Computer"`
+LogEntry string `json:"LogEntry"`
+LogEntrySource string `json:"LogEntrySource"`
+LogEntryTimeStamp string `json:"LogEntryTimeStamp"`
+LogEntryTimeOfCommand string `json:"TimeOfCommand"`
+ID string `json:"Id"`
+Image string `json:"Image"`
+Name string `json:"Name"`
+SourceSystem string `json:"SourceSystem"`
+Computer string `json:"Computer"`
}

// ContainerLogBlob represents the object corresponding to the payload that is sent to the ODS end point
@@ -137,7 +137,10 @@ func updateContainerImageNameMaps() {
_imageIDMap := make(map[string]string)
_nameIDMap := make(map[string]string)

-pods, err := ClientSet.CoreV1().Pods("").List(metav1.ListOptions{})
+listOptions := metav1.ListOptions{}
+listOptions.FieldSelector = fmt.Sprintf("spec.nodeName=%s", Computer)
+pods, err := ClientSet.CoreV1().Pods("").List(listOptions)

if err != nil {
message := fmt.Sprintf("Error getting pods %s\nIt is ok to log here and continue, because the logs will be missing image and Name, but the logs will still have the containerID", err.Error())
Log(message)
@@ -244,31 +247,31 @@ func PostDataHelper(tailPluginRecords []map[interface{}]interface{}) int {
if val, ok := imageIDMap[containerID]; ok {
stringMap["Image"] = val
} else {
-Log("ContainerId %s not present in Map ", containerID)
+Log("ContainerId %s not present in Image Map ", containerID)
}

if val, ok := nameIDMap[containerID]; ok {
stringMap["Name"] = val
} else {
-Log("ContainerId %s not present in Map ", containerID)
+Log("ContainerId %s not present in Name Map ", containerID)
}


dataItem := DataItem{
-ID: stringMap["Id"],
-LogEntry: stringMap["LogEntry"],
-LogEntrySource: stringMap["LogEntrySource"],
-LogEntryTimeStamp: stringMap["LogEntryTimeStamp"],
-LogEntryTimeOfCommand: start.Format(time.RFC3339),
-SourceSystem: stringMap["SourceSystem"],
-Computer: Computer,
-Image: stringMap["Image"],
-Name: stringMap["Name"],
+ID: stringMap["Id"],
+LogEntry: stringMap["LogEntry"],
+LogEntrySource: stringMap["LogEntrySource"],
+LogEntryTimeStamp: stringMap["LogEntryTimeStamp"],
+LogEntryTimeOfCommand: start.Format(time.RFC3339),
+SourceSystem: stringMap["SourceSystem"],
+Computer: Computer,
+Image: stringMap["Image"],
+Name: stringMap["Name"],
}

dataItems = append(dataItems, dataItem)
loggedTime, e := time.Parse(time.RFC3339, dataItem.LogEntryTimeStamp)
-if e!= nil {
+if e != nil {
message := fmt.Sprintf("Error while converting LogEntryTimeStamp for telemetry purposes: %s", e.Error())
Log(message)
SendException(message)
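The change to `updateContainerImageNameMaps` narrows the pod list to the agent's own node by setting a `spec.nodeName` field selector. For comparison, here is the same filter expressed as a raw Kubernetes API request path, sketched in Ruby (the language of the agent's fluentd plugins). `pods_on_node_path` and the node name are hypothetical and used only for illustration.

```ruby
require "uri"

# Hypothetical helper: build the API path that lists only the pods
# scheduled on one node, using a spec.nodeName field selector.
def pods_on_node_path(node_name)
  raise ArgumentError, "node name must not be empty" if node_name.to_s.empty?

  # encode_www_form percent-encodes the '=' inside the selector value.
  query = URI.encode_www_form(fieldSelector: "spec.nodeName=#{node_name}")
  "/api/v1/pods?#{query}"
end

puts pods_on_node_path("aks-node-0")
```

Filtering on the server keeps the response proportional to the pods on one node rather than to the whole cluster, which is the point of the optimization noted in the README.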
5 changes: 5 additions & 0 deletions source/code/go/src/plugins/telemetry.go
@@ -120,6 +120,11 @@ func InitializeTelemetryClient(agentVersion string) (int, error) {
}

TelemetryClient = appinsights.NewTelemetryClient(string(decIkey))
telemetryOffSwitch := os.Getenv("DISABLE_TELEMETRY")
if strings.Compare(strings.ToLower(telemetryOffSwitch), "true") == 0 {
Log("Appinsights telemetry is disabled \n")
TelemetryClient.SetIsEnabled(false)
}

CommonProperties = make(map[string]string)
CommonProperties["Computer"] = Computer
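The switch in this hunk turns telemetry off only when `DISABLE_TELEMETRY` equals `true`, compared case-insensitively; an unset or empty variable leaves telemetry enabled. A small Ruby sketch of the same check follows. `telemetry_disabled?` is a hypothetical helper, and passing the environment in as a hash is only there to make it easy to try.

```ruby
# Hypothetical helper mirroring the DISABLE_TELEMETRY check.
# `env` defaults to the process environment but accepts any hash-like object.
def telemetry_disabled?(env = ENV)
  # to_s turns a missing (nil) value into "", so no separate nil check is needed.
  env["DISABLE_TELEMETRY"].to_s.downcase == "true"
end

puts telemetry_disabled?("DISABLE_TELEMETRY" => "True")
puts telemetry_disabled?({})
```

Because anything other than `true` keeps telemetry on, a typo in the variable's value fails open rather than silently disabling reporting.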
28 changes: 25 additions & 3 deletions source/code/plugin/ApplicationInsightsUtility.rb
@@ -61,9 +61,16 @@ def initializeUtility()
@@CustomProperties['AgentVersion'] = ENV[@@EnvAgentVersion]
@@CustomProperties['ControllerType'] = ENV[@@EnvControllerType]
encodedAppInsightsKey = ENV[@@EnvApplicationInsightsKey]
-if !encodedAppInsightsKey.nil?
+
+#Check if telemetry is turned off
+telemetryOffSwitch = ENV['DISABLE_TELEMETRY']
+if telemetryOffSwitch && !telemetryOffSwitch.nil? && !telemetryOffSwitch.empty? && telemetryOffSwitch.downcase == "true".downcase
+$log.warn("AppInsightsUtility: Telemetry is disabled")
+@@Tc = ApplicationInsights::TelemetryClient.new
+elsif !encodedAppInsightsKey.nil?
decodedAppInsightsKey = Base64.decode64(encodedAppInsightsKey)
@@Tc = ApplicationInsights::TelemetryClient.new decodedAppInsightsKey

end
rescue => errorStr
$log.warn("Exception in AppInsightsUtility: initializeUtility - error: #{errorStr}")
@@ -91,7 +98,7 @@ def sendHeartBeatEvent(pluginName)
end
end

-def sendCustomMetric(pluginName, properties)
+def sendLastProcessedContainerInventoryCountMetric(pluginName, properties)
begin
if !(@@Tc.nil?)
@@Tc.track_metric 'LastProcessedContainerInventoryCount', properties['ContainerCount'],
@@ -105,6 +112,21 @@ def sendCustomMetric(pluginName, properties)
end
end

def sendCustomEvent(eventName, properties)
begin
if @@CustomProperties.nil? || @@CustomProperties.empty?
initializeUtility()
end
if !(@@Tc.nil?)
@@Tc.track_event eventName, :properties => @@CustomProperties
@@Tc.flush
$log.info("AppInsights Custom Event #{eventName} sent successfully")
end
rescue => errorStr
$log.warn("Exception in AppInsightsUtility: sendCustomEvent - error: #{errorStr}")
end
end

def sendExceptionTelemetry(errorStr)
begin
if @@CustomProperties.nil? || @@CustomProperties.empty?
@@ -132,7 +154,7 @@ def sendTelemetry(pluginName, properties)
end
@@CustomProperties['Computer'] = properties['Computer']
sendHeartBeatEvent(pluginName)
-sendCustomMetric(pluginName, properties)
+sendLastProcessedContainerInventoryCountMetric(pluginName, properties)
rescue => errorStr
$log.warn("Exception in AppInsightsUtility: sendTelemetry - error: #{errorStr}")
end
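The send methods in this file initialize the utility lazily when the shared properties are missing or empty. A minimal sketch of that guard follows, written so the `nil?` test runs first, since `nil` does not respond to `empty?`. `LazyProperties` and its placeholder contents are hypothetical stand-ins for the utility's class-level state.

```ruby
# Hypothetical stand-in for the utility's lazily initialized properties.
class LazyProperties
  attr_reader :init_calls

  def initialize
    @properties = nil
    @init_calls = 0
  end

  # Populate the properties on first use and return them afterwards.
  def ensure_initialized
    # Test nil? before empty?: calling empty? on nil raises NoMethodError.
    if @properties.nil? || @properties.empty?
      @properties = { "AgentVersion" => "example" } # placeholder value
      @init_calls += 1
    end
    @properties
  end
end

props = LazyProperties.new
props.ensure_initialized
props.ensure_initialized
puts "initialized #{props.init_calls} time(s)"
```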
26 changes: 26 additions & 0 deletions source/code/plugin/CustomMetricsUtils.rb
@@ -0,0 +1,26 @@
#!/usr/local/bin/ruby
# frozen_string_literal: true

class CustomMetricsUtils
def initialize
end

class << self
def check_custom_metrics_availability(custom_metric_regions)
aks_region = ENV['AKS_REGION']
aks_resource_id = ENV['AKS_RESOURCE_ID']
if aks_region.to_s.empty? || aks_resource_id.to_s.empty?
return false # This also covers the AKS-Engine scenario: AKS_REGION/AKS_RESOURCE_ID are not set for AKS-Engine; only ACS_RESOURCE_NAME is set
end

custom_metrics_regions_arr = custom_metric_regions.split(',')
custom_metrics_regions_hash = custom_metrics_regions_arr.map {|x| [x.downcase,true]}.to_h

if custom_metrics_regions_hash.key?(aks_region.downcase)
true
else
false
end
end
end
end
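`check_custom_metrics_availability` decides whether custom metrics are sent: both AKS variables must be set, and the cluster's region must appear in the configured list, compared case-insensitively. Here is a standalone sketch of that decision with an explicit early return. `custom_metrics_available?` and the hash-based environment are hypothetical and exist only to make the logic easy to exercise.

```ruby
# Hypothetical standalone version of the region check.
# `env` defaults to the process environment but accepts any hash-like object.
def custom_metrics_available?(custom_metric_regions, env = ENV)
  aks_region = env["AKS_REGION"].to_s
  aks_resource_id = env["AKS_RESOURCE_ID"].to_s

  # An explicit return is needed here: a bare `false` inside the `if`
  # would be discarded and execution would continue.
  return false if aks_region.empty? || aks_resource_id.empty?

  regions = custom_metric_regions.split(",").map { |r| r.strip.downcase }
  regions.include?(aks_region.downcase)
end

regions = "eastus,southcentralus,westEurope"
aks_env = { "AKS_REGION" => "westeurope", "AKS_RESOURCE_ID" => "example-id" }
puts custom_metrics_available?(regions, aks_env)
puts custom_metrics_available?(regions, {})
```

Lowercasing both sides is what lets a mixed-case entry such as `westEurope` in the config match the region reported by the cluster.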