Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 31 additions & 12 deletions source/plugins/ruby/kubernetes_container_inventory.rb
Original file line number Diff line number Diff line change
Expand Up @@ -193,25 +193,41 @@ def obtainContainerEnvironmentVars(containerId)
$log.info("KubernetesContainerInventory::obtainContainerEnvironmentVars @ #{Time.now.utc.iso8601}")
envValueString = ""
begin
unless @@containerCGroupCache.has_key?(containerId)
isCGroupPidFetchRequired = false
if !@@containerCGroupCache.has_key?(containerId)
isCGroupPidFetchRequired = true
else
cGroupPid = @@containerCGroupCache[containerId]
if cGroupPid.nil? || cGroupPid.empty?
isCGroupPidFetchRequired = true
@@containerCGroupCache.delete(containerId)
elsif !File.exist?("/hostfs/proc/#{cGroupPid}/environ")
isCGroupPidFetchRequired = true
@@containerCGroupCache.delete(containerId)
end
end

if isCGroupPidFetchRequired
$log.info("KubernetesContainerInventory::obtainContainerEnvironmentVars fetching cGroup parent pid @ #{Time.now.utc.iso8601} for containerId: #{containerId}")
Dir["/hostfs/proc/*/cgroup"].each do |filename|
begin
if File.file?(filename) && File.foreach(filename).grep(/#{containerId}/).any?
if File.file?(filename) && File.exist?(filename) && File.foreach(filename).grep(/#{containerId}/).any?
Copy link
Member

@vishiy vishiy Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What has 'more than 40 containers in a pod' do with this ? How does this fix take care of that ?

Copy link
Contributor Author

@ganga1980 ganga1980 Jan 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What has 'more than 40 containers in a pod' do with this ? How does this fix take care of that ?

if the pod has more than ~40 containers, then cproc process was deleted and created new process and this happening for the pods which are high number of containers. Very first time we are able to get envvar from the cproc pid and there after that process gone so we are getting empty env vars . This was reproducible with pod ~40 containers. Fix to refetch cproc pid if the file cproc doesnt exist any more and use parent process and also added logic to ignore the cproc pids which are non-numeric.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you saying pids have changed for the same container between last time & current ? are the containers restarting ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you saying pids have changed for the same container between last time & current ? are the containers restarting ?

correct. This reproducible with large pods (>40 containers also, pod has same container image across all the containers) and no container/pod restart. This was the yaml which deployed to dev cluster and used the same https://raw.githubusercontent.com/microsoft/Docker-Provider/ci_dev/test/scenario/yamls/many-containers-in-pod.yaml for repro and validating the fix with different containers. Fix to refetch cgroup id and also use parent id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont know what this 40 is about. But based on what you are saying, for PIds that are reused, you will be picking up wrong envs due to wrong mappin in the cache. I will approve this PR for now, but i am not sure whats the issue here and we should understand the root cause of it.

# file full path is /hostfs/proc/<cGroupPid>/cgroup
cGroupPid = filename.split("/")[3]
if @@containerCGroupCache.has_key?(containerId)
tempCGroupPid = @@containerCGroupCache[containerId]
if tempCGroupPid > cGroupPid
cGroupPid = filename.split("/")[3]
if is_number?(cGroupPid)
if @@containerCGroupCache.has_key?(containerId)
tempCGroupPid = @@containerCGroupCache[containerId]
if tempCGroupPid.to_i > cGroupPid.to_i
@@containerCGroupCache[containerId] = cGroupPid
end
else
@@containerCGroupCache[containerId] = cGroupPid
end
else
@@containerCGroupCache[containerId] = cGroupPid
end
end
end
rescue SystemCallError # ignore Error::ENOENT,Errno::ESRCH which is expected if any of the container gone while we read
end
end
rescue SystemCallError # ignore Error::ENOENT,Errno::ESRCH which is expected if any of the container gone while we read
end
end
end
cGroupPid = @@containerCGroupCache[containerId]
if !cGroupPid.nil? && !cGroupPid.empty?
Expand Down Expand Up @@ -341,5 +357,8 @@ def deleteCGroupCacheEntryForDeletedContainer(containerId)
ApplicationInsightsUtility.sendExceptionTelemetry(error)
end
end
def is_number?(value)
true if Integer(value) rescue false
end
end
end