Merged
62 commits
04826d0
Add in pv metrics from cadvisor
Aug 18, 2020
e0fbdef
Merge branch 'ci_dev' into grwehner/pv
Aug 19, 2020
a459794
changed to send only pv usage & add kube-system toggle config
Aug 20, 2020
0ec8ef9
variable name fixes
Aug 21, 2020
fb8a214
Added kube-system config
Aug 24, 2020
0b2f9dc
mdm filter
Aug 24, 2020
1bad74f
add pv_used_bytes to mdm filter metrics conf
Aug 24, 2020
7068629
filter fixes
Aug 24, 2020
94348cd
more filter fixes
Aug 24, 2020
58230fd
end statement fix
Aug 24, 2020
f68c04a
log fixes
Aug 25, 2020
46c1b50
all pv records to mdm
Aug 25, 2020
db24b0f
different mdm generator method
Aug 25, 2020
9d6874f
out_mdm log path
Aug 26, 2020
cdf96a0
try to get out_mdm logging path
Aug 26, 2020
c902df6
pv metric now sending to ME
Aug 26, 2020
0f41269
add in threshold condition
Aug 27, 2020
d4148cc
constants and consistent naming
Aug 27, 2020
1d6cee6
comments and code cleanup
Aug 27, 2020
357914a
remove container name, add pod name/uid
Aug 31, 2020
9377262
log fixes and constant change
Aug 31, 2020
ee14b2b
naming fix
Aug 31, 2020
c1d46e8
cleanup
Aug 31, 2020
130e5d7
add pvUsedBytes as metric to collect
gracewehner Aug 31, 2020
d0f8d58
more cleanup
Aug 31, 2020
7cce941
Merge branch 'grwehner/pv' of https://github.com/microsoft/Docker-Pro…
Aug 31, 2020
0e7593b
Merge remote-tracking branch 'origin/ci_dev' into grwehner/pv
Aug 31, 2020
f0885e4
boolean fix
Aug 31, 2020
62b84ba
set threshold to 60
gracewehner Sep 1, 2020
9da59bb
add pv inventory fluent plugin structure
gracewehner Sep 1, 2020
9163a90
structure fixes
gracewehner Sep 2, 2020
6f705b7
send as insights metrics
gracewehner Sep 2, 2020
2e1a2cd
include in pod inventory
gracewehner Sep 2, 2020
8fca127
add check that pvUsedBytes is a configured metric to collect
gracewehner Sep 3, 2020
da0a34d
code review feedback changes
gracewehner Sep 4, 2020
68404bf
after testing changes
gracewehner Sep 4, 2020
4505b45
whitespace fix
gracewehner Sep 4, 2020
c08054b
variable name fix
gracewehner Sep 4, 2020
c88b9ab
naming changes
gracewehner Sep 8, 2020
d2c6c4a
Merge remote-tracking branch 'origin/grwehner/pv' into grwehner/pv-in…
gracewehner Sep 9, 2020
4447cdd
make call for pv instead of pvc
gracewehner Sep 9, 2020
dc3351d
disk info and telemetry
gracewehner Sep 9, 2020
4d5e228
logging and pvcs in pod inventory
gracewehner Sep 9, 2020
8312caf
parsing fixes
gracewehner Sep 9, 2020
610cc71
pv inventory in pod inventory and more telemetry
gracewehner Sep 9, 2020
7c4a547
cleanup and add logging for kube api response size
gracewehner Sep 9, 2020
a578ba3
payload investigation
gracewehner Sep 10, 2020
4578b80
getting more response size info
gracewehner Sep 10, 2020
bc6b8cc
use continuation token, get rid of mdm path
gracewehner Sep 10, 2020
7545214
use kubepvinventory path
gracewehner Sep 10, 2020
4ae76a1
add back in parse_and_emit
gracewehner Sep 10, 2020
a9218a1
additions for PV Type
gracewehner Sep 14, 2020
bc4a943
updated schema, sending to insights metrics for testing
gracewehner Sep 14, 2020
948bf9a
bug fixes
gracewehner Sep 18, 2020
75d1e75
Merge remote-tracking branch 'origin/ci_dev' into grwehner/pv-inventory
gracewehner Oct 5, 2020
9016f14
route to new LA table
gracewehner Oct 5, 2020
cc8c023
refactoring
gracewehner Oct 6, 2020
062306a
add back in pv type list
gracewehner Oct 7, 2020
b1ff023
after testing fixes
gracewehner Oct 7, 2020
59108b4
comments and rescues
gracewehner Oct 7, 2020
677d938
remove extra logging
gracewehner Oct 7, 2020
db84280
fix variable naming
gracewehner Oct 7, 2020
23 changes: 23 additions & 0 deletions build/linux/installer/conf/kube.conf
@@ -16,6 +16,14 @@
custom_metrics_azure_regions eastus,southcentralus,westcentralus,westus2,southeastasia,northeurope,westeurope,southafricanorth,centralus,northcentralus,eastus2,koreacentral,eastasia,centralindia,uksouth,canadacentral,francecentral,japaneast,australiaeast,eastus2,westus,australiasoutheast,brazilsouth,germanywestcentral,northcentralus,switzerlandnorth
</source>

#Kubernetes Persistent Volume inventory
<source>
type kubepvinventory
tag oms.containerinsights.KubePVInventory
run_interval 60
log_level debug
</source>
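Review note on `run_interval`: the plugin's `run_periodic` loop adds the interval to its next scheduled time and, when that time has already passed, waits one second and restarts the schedule from the current time. A minimal sketch of that calculation, assuming a standalone helper named `next_wait` (hypothetical, not part of this PR) and fixed timestamps in place of `Time.now`:

```ruby
# Hypothetical helper, not part of the plugin: returns the next scheduled
# run time and the number of seconds to wait, following run_periodic.
def next_wait(next_time_to_run, run_interval, now)
  next_time_to_run += run_interval
  if next_time_to_run <= now
    # Behind schedule: wait one second and restart the schedule from now.
    [now, 1]
  else
    [next_time_to_run, next_time_to_run - now]
  end
end

start = Time.at(0)
on_time = next_wait(start, 60, Time.at(10))  # next run at t=60, wait 50 seconds
late = next_wait(start, 60, Time.at(200))    # already late, wait 1 second
```

With `run_interval 60`, a slow `enumerate` call delays the following run instead of causing several back-to-back runs to catch up.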

#Kubernetes events
<source>
type kubeevents
@@ -98,6 +106,21 @@
max_retry_wait 5m
</match>

<match oms.containerinsights.KubePVInventory**>
type out_oms
log_level debug
num_threads 5
buffer_chunk_limit 4m
buffer_type file
buffer_path %STATE_DIR_WS%/state/out_oms_kubepv*.buffer
buffer_queue_limit 20
buffer_queue_full_action drop_oldest_chunk
flush_interval 20s
retry_limit 10
retry_wait 5s
max_retry_wait 5m
</match>

<match oms.containerinsights.KubeEvents**>
type out_oms
log_level debug
1 change: 1 addition & 0 deletions build/linux/installer/datafiles/base_container.data
@@ -22,6 +22,7 @@ MAINTAINER: 'Microsoft Corporation'
/opt/microsoft/omsagent/plugin/filter_container.rb; source/plugins/ruby/filter_container.rb; 644; root; root

/opt/microsoft/omsagent/plugin/in_kube_podinventory.rb; source/plugins/ruby/in_kube_podinventory.rb; 644; root; root
/opt/microsoft/omsagent/plugin/in_kube_pvinventory.rb; source/plugins/ruby/in_kube_pvinventory.rb; 644; root; root
/opt/microsoft/omsagent/plugin/in_kube_events.rb; source/plugins/ruby/in_kube_events.rb; 644; root; root
/opt/microsoft/omsagent/plugin/KubernetesApiClient.rb; source/plugins/ruby/KubernetesApiClient.rb; 644; root; root

24 changes: 24 additions & 0 deletions kubernetes/omsagent.yaml
@@ -21,6 +21,7 @@ rules:
"nodes/proxy",
"namespaces",
"services",
"persistentvolumes"
]
verbs: ["list", "get", "watch"]
- apiGroups: ["apps", "extensions", "autoscaling"]
@@ -67,6 +68,14 @@ data:
custom_metrics_azure_regions eastus,southcentralus,westcentralus,westus2,southeastasia,northeurope,westeurope,southafricanorth,centralus,northcentralus,eastus2,koreacentral,eastasia,centralindia,uksouth,canadacentral,francecentral,japaneast,australiaeast,eastus2,westus,australiasoutheast,brazilsouth,germanywestcentral,northcentralus,switzerlandnorth
</source>

#Kubernetes Persistent Volume inventory
<source>
type kubepvinventory
tag oms.containerinsights.KubePVInventory
run_interval 60
log_level debug
</source>

#Kubernetes events
<source>
type kubeevents
@@ -149,6 +158,21 @@ data:
max_retry_wait 5m
</match>

<match oms.containerinsights.KubePVInventory**>
type out_oms
log_level debug
num_threads 5
buffer_chunk_limit 4m
buffer_type file
buffer_path %STATE_DIR_WS%/state/out_oms_kubepv*.buffer
buffer_queue_limit 20
buffer_queue_full_action drop_oldest_chunk
flush_interval 20s
retry_limit 10
retry_wait 5s
max_retry_wait 5m
</match>

<match oms.containerinsights.KubeEvents**>
type out_oms
log_level debug
4 changes: 4 additions & 0 deletions source/plugins/ruby/constants.rb
@@ -77,13 +77,17 @@ class Constants
OMSAGENT_ZERO_FILL = "omsagent"
KUBESYSTEM_NAMESPACE_ZERO_FILL = "kube-system"
VOLUME_NAME_ZERO_FILL = "-"
PV_TYPES =["awsElasticBlockStore", "azureDisk", "azureFile", "cephfs", "cinder", "csi", "fc", "flexVolume",
"flocker", "gcePersistentDisk", "glusterfs", "hostPath", "iscsi", "local", "nfs",
"photonPersistentDisk", "portworxVolume", "quobyte", "rbd", "scaleIO", "storageos", "vsphereVolume"]

#Telemetry constants
CONTAINER_METRICS_HEART_BEAT_EVENT = "ContainerMetricsMdmHeartBeatEvent"
POD_READY_PERCENTAGE_HEART_BEAT_EVENT = "PodReadyPercentageMdmHeartBeatEvent"
CONTAINER_RESOURCE_UTIL_HEART_BEAT_EVENT = "ContainerResourceUtilMdmHeartBeatEvent"
PV_USAGE_HEART_BEAT_EVENT = "PVUsageMdmHeartBeatEvent"
PV_KUBE_SYSTEM_METRICS_ENABLED_EVENT = "CollectPVKubeSystemMetricsEnabled"
PV_INVENTORY_HEART_BEAT_EVENT = "KubePVInventoryHeartBeatEvent"
TELEMETRY_FLUSH_INTERVAL_IN_MINUTES = 10
KUBE_STATE_TELEMETRY_FLUSH_INTERVAL_IN_MINUTES = 15
ZERO_FILL_METRICS_INTERVAL_IN_MINUTES = 30
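Review note on `PV_TYPES`: the new plugin uses this list to find which volume source key is set on a PV spec, relying on a PV having exactly one. A minimal sketch of that lookup, assuming a shortened sample list and a helper named `detect_pv_type` (both hypothetical, for illustration only):

```ruby
# Shortened, illustrative list; the real one is Constants::PV_TYPES.
PV_TYPES_SAMPLE = ["azureDisk", "azureFile", "nfs"].freeze

# Hypothetical helper: returns the first volume source key present in the
# spec, or nil when the spec is missing or no listed type matches.
def detect_pv_type(spec, types = PV_TYPES_SAMPLE)
  return nil if spec.nil?
  types.find { |type| !spec[type].nil? }
end

# Made-up spec for illustration.
spec = {
  "capacity" => { "storage" => "5Gi" },
  "nfs" => { "server" => "10.0.0.4", "path" => "/exports" },
}
matched = detect_pv_type(spec)
unmatched = detect_pv_type({})
```

A volume type that is absent from the list yields `nil`, which the plugin's telemetry counts under "empty".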
253 changes: 253 additions & 0 deletions source/plugins/ruby/in_kube_pvinventory.rb
@@ -0,0 +1,253 @@
module Fluent
class Kube_PVInventory_Input < Input
Plugin.register_input("kubepvinventory", self)

@@hostName = (OMS::Common.get_hostname)

def initialize
super
require "yaml"
require "yajl/json_gem"
require "yajl"
require "time"
require_relative "KubernetesApiClient"
require_relative "ApplicationInsightsUtility"
require_relative "oms_common"
require_relative "omslog"
require_relative "constants"

# Response size is around 1500 bytes per PV
@PV_CHUNK_SIZE = "5000"
@pvTypeToCountHash = {}
end

config_param :run_interval, :time, :default => 60
config_param :tag, :string, :default => "oms.containerinsights.KubePVInventory"

def configure(conf)
super
end

def start
if @run_interval
@finished = false
@condition = ConditionVariable.new
@mutex = Mutex.new
@thread = Thread.new(&method(:run_periodic))
@@pvTelemetryTimeTracker = DateTime.now.to_time.to_i
end
end

def shutdown
if @run_interval
@mutex.synchronize {
@finished = true
@condition.signal
}
@thread.join
end
end

def enumerate
begin
pvInventory = nil
telemetryFlush = false
@pvTypeToCountHash = {}
currentTime = Time.now
batchTime = currentTime.utc.iso8601

continuationToken = nil
$log.info("in_kube_pvinventory::enumerate : Getting PVs from Kube API @ #{Time.now.utc.iso8601}")
continuationToken, pvInventory = KubernetesApiClient.getResourcesAndContinuationToken("persistentvolumes?limit=#{@PV_CHUNK_SIZE}")
$log.info("in_kube_pvinventory::enumerate : Done getting PVs from Kube API @ #{Time.now.utc.iso8601}")

if (!pvInventory.nil? && !pvInventory.empty? && pvInventory.key?("items") && !pvInventory["items"].nil? && !pvInventory["items"].empty?)
parse_and_emit_records(pvInventory, batchTime)
else
$log.warn "in_kube_pvinventory::enumerate:Received empty pvInventory"
end

# If we receive a continuation token, make calls, process and flush data until we have processed all data
while (!continuationToken.nil? && !continuationToken.empty?)
continuationToken, pvInventory = KubernetesApiClient.getResourcesAndContinuationToken("persistentvolumes?limit=#{@PV_CHUNK_SIZE}&continue=#{continuationToken}")
if (!pvInventory.nil? && !pvInventory.empty? && pvInventory.key?("items") && !pvInventory["items"].nil? && !pvInventory["items"].empty?)
parse_and_emit_records(pvInventory, batchTime)
else
$log.warn "in_kube_pvinventory::enumerate:Received empty pvInventory"
end
end

# Setting this to nil so that we don't hold memory until GC kicks in
pvInventory = nil

# Send PV telemetry every 10 minutes
timeDifference = (DateTime.now.to_time.to_i - @@pvTelemetryTimeTracker).abs
timeDifferenceInMinutes = timeDifference / 60
if (timeDifferenceInMinutes >= Constants::TELEMETRY_FLUSH_INTERVAL_IN_MINUTES)
telemetryFlush = true
end

# Flush AppInsights telemetry once all the processing is done
if telemetryFlush == true
telemetryProperties = {}
telemetryProperties["CountsOfPVTypes"] = @pvTypeToCountHash
ApplicationInsightsUtility.sendCustomEvent(Constants::PV_INVENTORY_HEART_BEAT_EVENT, telemetryProperties)
@@pvTelemetryTimeTracker = DateTime.now.to_time.to_i
end

rescue => errorStr
$log.warn "in_kube_pvinventory::enumerate:Failed in enumerate: #{errorStr}"
$log.debug_backtrace(errorStr.backtrace)
ApplicationInsightsUtility.sendExceptionTelemetry(errorStr)
end
end # end enumerate

def parse_and_emit_records(pvInventory, batchTime = Time.now.utc.iso8601)
currentTime = Time.now
emitTime = currentTime.to_f
eventStream = MultiEventStream.new

begin
records = []
pvInventory["items"].each do |item|

# Node, pod, & usage info can be found by joining with pvUsedBytes metric using PVCNamespace/PVCName
record = {}
record["CollectionTime"] = batchTime
record["ClusterId"] = KubernetesApiClient.getClusterId
record["ClusterName"] = KubernetesApiClient.getClusterName
record["PVName"] = item["metadata"]["name"]
record["PVStatus"] = item["status"]["phase"]
record["PVAccessModes"] = item["spec"]["accessModes"].join(', ')
record["PVStorageClassName"] = item["spec"]["storageClassName"]
record["PVCapacityBytes"] = KubernetesApiClient.getMetricNumericValue("memory", item["spec"]["capacity"]["storage"])
record["PVCreationTimeStamp"] = item["metadata"]["creationTimestamp"]

# Optional values
pvcNamespace, pvcName = getPVCInfo(item)
type, typeInfo = getTypeInfo(item)
record["PVCNamespace"] = pvcNamespace
record["PVCName"] = pvcName
record["PVType"] = type
record["PVTypeInfo"] = typeInfo

records.push(record)

# Record telemetry
if type == nil
type = "empty"
end
if (@pvTypeToCountHash.has_key? type)
@pvTypeToCountHash[type] += 1
else
@pvTypeToCountHash[type] = 1
end
end

records.each do |record|
if !record.nil?
wrapper = {
"DataType" => "KUBE_PV_INVENTORY_BLOB",
"IPName" => "ContainerInsights",
"DataItems" => [record],
}
eventStream.add(emitTime, wrapper) if wrapper
end
end

router.emit_stream(@tag, eventStream) if eventStream

rescue => errorStr
$log.warn "Failed in parse_and_emit_records for in_kube_pvinventory: #{errorStr}"
$log.debug_backtrace(errorStr.backtrace)
ApplicationInsightsUtility.sendExceptionTelemetry(errorStr)
end
end

def getPVCInfo(item)
begin
if !item["spec"].nil? && !item["spec"]["claimRef"].nil?
claimRef = item["spec"]["claimRef"]
pvcNamespace = claimRef["namespace"]
pvcName = claimRef["name"]
return pvcNamespace, pvcName
end
rescue => errorStr
$log.warn "Failed in getPVCInfo for in_kube_pvinventory: #{errorStr}"
$log.debug_backtrace(errorStr.backtrace)
ApplicationInsightsUtility.sendExceptionTelemetry(errorStr)
end

# No PVC or an error
return nil, nil
end

def getTypeInfo(item)
begin
if !item["spec"].nil?
(Constants::PV_TYPES).each do |pvType|

# PV is this type
if !item["spec"][pvType].nil?

# Get additional info if azure disk/file
typeInfo = {}
if pvType == "azureDisk"
azureDisk = item["spec"]["azureDisk"]
typeInfo["DiskName"] = azureDisk["diskName"]
typeInfo["DiskUri"] = azureDisk["diskURI"]
elsif pvType == "azureFile"
typeInfo["FileShareName"] = item["spec"]["azureFile"]["shareName"]
end

# Can only have one type: return right away when found
return pvType, typeInfo

end
end
end
rescue => errorStr
$log.warn "Failed in getTypeInfo for in_kube_pvinventory: #{errorStr}"
$log.debug_backtrace(errorStr.backtrace)
ApplicationInsightsUtility.sendExceptionTelemetry(errorStr)
end

# No matches from list of types or an error
return nil, {}
end


def run_periodic
@mutex.lock
done = @finished
@nextTimeToRun = Time.now
@waitTimeout = @run_interval
until done
@nextTimeToRun = @nextTimeToRun + @run_interval
@now = Time.now
if @nextTimeToRun <= @now
@waitTimeout = 1
@nextTimeToRun = @now
else
@waitTimeout = @nextTimeToRun - @now
end
@condition.wait(@mutex, @waitTimeout)
done = @finished
@mutex.unlock
if !done
begin
$log.info("in_kube_pvinventory::run_periodic.enumerate.start #{Time.now.utc.iso8601}")
enumerate
$log.info("in_kube_pvinventory::run_periodic.enumerate.end #{Time.now.utc.iso8601}")
rescue => errorStr
$log.warn "in_kube_pvinventory::run_periodic: enumerate Failed to retrieve PV inventory: #{errorStr}"
ApplicationInsightsUtility.sendExceptionTelemetry(errorStr)
end
end
@mutex.lock
end
@mutex.unlock
end

end # Kube_PVInventory_Input
end # module
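Review note on the chunked listing in `enumerate`: the plugin requests PVs with a `limit` and keeps calling with the returned `continue` token until none comes back. A minimal sketch of that loop, assuming an in-memory `PAGES` table and a `fetch_page` helper (both hypothetical) standing in for `KubernetesApiClient.getResourcesAndContinuationToken`:

```ruby
# Hypothetical stand-in for the Kube API: pages keyed by the continuation
# token that requests them (nil means the first request).
PAGES = {
  nil => ["tok1", { "items" => ["pv-a", "pv-b"] }],
  "tok1" => ["tok2", { "items" => ["pv-c"] }],
  "tok2" => [nil, { "items" => ["pv-d"] }],
}.freeze

# Hypothetical fetcher returning [continuation_token, response_hash],
# the same shape the call in enumerate returns.
def fetch_page(token)
  PAGES.fetch(token)
end

# Yields each non-empty page of items until no continuation token remains.
def each_page
  token = nil
  loop do
    token, page = fetch_page(token)
    items = page.nil? ? nil : page["items"]
    yield items unless items.nil? || items.empty?
    break if token.nil? || token.empty?
  end
end

seen = []
each_page { |items| seen.concat(items) }
```

The sketch folds the first request and the follow-up requests into one loop, while the plugin issues the first call separately. Either way, each page is processed as it arrives, so only one page of results is held at a time.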