LLAMA Probe - Lamoid Grazer #24

Open
driftingaway86 wants to merge 32 commits into scline:master from driftingaway86:lamoid-grazer

driftingaway86 commented Oct 29, 2021

Lamoid Grazer

Go application that manages the Reflector and Collector processes. It also checks the collector's current running config against the latest one and reloads the collector when an MD5 comparison detects a change.

I tried to keep external packages to a minimum; for the most part this is a fairly lightweight process/configuration manager for the services.

Housekeeping

  • Updated .gitignore to not upload local build artifacts
  • Fixed some typos in README.MD
  • I wasn't sure whether you wanted the build.sh script to try a push after building, so I removed that step.
  • Updated Dockerfile to use entry point and the new process manager.
  • Moved old content to legacy directory.

Testing

Testing was done by compiling the binary and rebuilding the probe container. The main loop was temporarily modified to simulate a config change so that the process manager's behavior and reload speed could be observed.

The main goal was to send SIGHUP to the collector to force the reload.

Startup

  • Loads Environment Vars
  • Starts Reflector
  • Registers with LLAMA Server
  • Starts Collector
2021/10/29 05:48:48 [LAMOID]: Starting Reflector
2021/10/29 05:48:48 [REFLECTOR-PID]: 12
2021/10/29 05:48:48 [LAMOID-REGISTER]: Registering With LLAMA Server....
2021/10/29 05:48:48 Beginning reflection on: [::]:8100
2021/10/29 05:48:48 [LAMOID-REGISTER]: Registering Process Completed
2021/10/29 05:48:48 [LAMOID-REGISTER]: Response Status: 200 OK
2021/10/29 05:48:58 [LAMOID]: Starting Collector
2021/10/29 05:48:58 [COLLECTOR-PID]: 19

Config Reload

  • Once a configuration change is detected, SIGHUP is sent to the collector process (which is spawned by the Go code) so that it reloads.
    I have not verified whether the collector actually reads the new config or just reuses a cached copy.
2021/10/29 05:49:28 [LAMOID-INFO]: New Config - Reloading Collector
2021/10/29 05:49:30 Summarizing results
2021/10/29 05:49:30 Found 6 results to summarize
2021/10/29 05:49:30 Summarization complete
2021/10/29 05:49:38 [LAMOID-INFO]: Updating Registration
2021/10/29 05:49:38 [LAMOID-REGISTER]: Registering With LLAMA Server....
2021/10/29 05:49:38 [LAMOID-REGISTER]: Registering Process Completed
2021/10/29 05:49:38 [LAMOID-REGISTER]: Response Status: 200 OK
2021/10/29 05:49:38 [LAMOID-INFO]: Writing New Config
2021/10/29 05:49:40 Summarizing results
2021/10/29 05:49:40 Found 6 results to summarize
2021/10/29 05:49:40 Summarization complete

Config Compare

This is hard to test without spinning up my own API server locally, but for the most part it's just writing both configs to the local node and doing an MD5 sum on the files to check whether they match. Based on that, a bool is returned to the main grazing loop.

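// ValidateConfig reports whether the newly fetched config and the current
// running config have the same MD5 hash.
func (g *LamoidEnv) ValidateConfig() bool {
	// Write the config returned by GrazeConfig to the temp file and make
	// sure the temp file is removed when this method returns.
	g.WriteTempConfig(g.GrazeConfig())
	defer os.Remove("tmp-config.yaml")

	newConfig := md5.Sum(g.ReadConfig("tmp-config.yaml"))
	currentConfig := md5.Sum(g.ReadConfig("config.yaml"))

	log.Printf("[NEW-CONFIG]: Hash - %s", fmt.Sprint(newConfig))
	log.Printf("[OLD-CONFIG]: Hash - %s", fmt.Sprint(currentConfig))

	// md5.Sum returns a [16]byte array, so the hashes compare directly with ==.
	return newConfig == currentConfig
}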

TODOs:

This is in a usable state but I wanted to point out some improvements that will be submitted in the near future.

  • Refactor HTTP Client usage
  • Clean up functions that don't need to be methods and move them somewhere else.
  • CLI Flag to control config check interval
  • Unit Testing
  • Documentation of methods and types could be better.
  • Be less comedic with the naming.....

🥂 :shipit: 🦀

@driftingaway86
Author

@scline I added retry logic that prevents the process manager from killing the entire container. Now it just waits until it can reach the LLAMA server.

2021/11/23 20:25:49 [LAMOID]: Starting Reflector
2021/11/23 20:25:49 [REFLECTOR-PID]: 12
2021/11/23 20:25:49 [LAMOID-INIT]: Waiting for Llama Server....
2021/11/23 20:25:49 [LAMOID-REGISTER]: Performing Registration with LLAMA Server http://llama.packetpals.com:8105/
2021/11/23 20:25:49 Beginning reflection on: [::]:8100
2021/11/23 20:25:49 [LAMOID-REGISTER]: Regestiering Process Completed
2021/11/23 20:25:49 [LAMOID-REGISTER]: Response Status: 200 OK
2021/11/23 20:25:59 [LAMOID]: Starting Collector
2021/11/23 20:25:59 [COLLECTOR-PID]: 19
2021/11/23 20:25:59 Setting up collector
2021/11/23 20:25:59 Loading collector config
2021/11/23 20:25:59 Setting up tag set
2021/11/23 20:25:59 Setting up test runners
2021/11/23 20:25:59 Setting up summarizer
2021/11/23 20:25:59 Setting up 2 result handlers
2021/11/23 20:25:59 Setting up API
2021/11/23 20:25:59 Collector setup complete
2021/11/23 20:25:59 Starting Collector
2021/11/23 20:25:59 All Collector components running
2021/11/23 20:26:00 Starting ticker for Summarizer at 10s intervals
2021/11/23 20:26:10 Summarizing results
2021/11/23 20:26:10 Found 7 results to summarize
2021/11/23 20:26:10 Summarization complete
2021/11/23 20:26:20 Summarizing results
2021/11/23 20:26:20 Found 7 results to summarize
2021/11/23 20:26:20 Summarization complete
2021/11/23 20:26:30 Summarizing results
2021/11/23 20:26:30 Found 7 results to summarize
2021/11/23 20:26:30 Summarization complete
2021/11/23 20:26:40 Summarizing results
2021/11/23 20:26:40 Found 7 results to summarize
2021/11/23 20:26:40 Summarization complete
2021/11/23 20:26:50 Summarizing results
2021/11/23 20:26:50 Found 7 results to summarize
2021/11/23 20:26:50 Summarization complete
2021/11/23 20:26:59 [LAMOID-INFO]: Polling Config
2021/11/23 20:26:59 [NEW-CONFIG]: Hash - [212 29 140 217 143 0 178 4 233 128 9 152 236 248 66 126]
2021/11/23 20:26:59 [OLD-CONFIG]: Hash - [212 29 140 217 143 0 178 4 233 128 9 152 236 248 66 126]
2021/11/23 20:26:59 [LAMOID-REGISTER]: Performing Registration with LLAMA Server http://llama.packetpals.com:8105/
2021/11/23 20:26:59 [LAMOID-REGISTER]: Regestiering Process Completed
2021/11/23 20:26:59 [LAMOID-REGISTER]: Response Status: 200 OK
2021/11/23 20:27:00 Summarizing results
2021/11/23 20:27:00 Found 7 results to summarize
2021/11/23 20:27:00 Summarization complete
2021/11/23 20:27:10 Summarizing results
2021/11/23 20:27:10 Found 7 results to summarize
2021/11/23 20:27:10 Summarization complete
2021/11/23 20:27:20 Summarizing results
2021/11/23 20:27:20 Found 7 results to summarize
2021/11/23 20:27:20 Summarization complete
2021/11/23 20:27:30 Summarizing results
2021/11/23 20:27:30 Found 7 results to summarize
2021/11/23 20:27:30 Summarization complete
2021/11/23 20:27:40 Summarizing results
2021/11/23 20:27:40 Found 7 results to summarize
2021/11/23 20:27:40 Summarization complete
2021/11/23 20:27:50 Summarizing results
2021/11/23 20:27:50 Found 7 results to summarize
2021/11/23 20:27:50 Summarization complete
2021/11/23 20:27:59 [LAMOID-INFO]: Polling Config
2021/11/23 20:28:00 Summarizing results
2021/11/23 20:28:00 Found 7 results to summarize
2021/11/23 20:28:00 Summarization complete
2021/11/23 20:28:04 [LAMOID-CLIENT]: There was a problem making a request to LLAMA Server, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:04 [CONFIG-ERROR]: There was and Error getting the config, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:09 [LAMOID-CLIENT]: There was a problem making a request to LLAMA Server, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:09 [CONFIG-ERROR]: There was and Error getting the config, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:10 Summarizing results
2021/11/23 20:28:10 Found 7 results to summarize
2021/11/23 20:28:10 Summarization complete
2021/11/23 20:28:14 [LAMOID-CLIENT]: There was a problem making a request to LLAMA Server, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:14 [CONFIG-ERROR]: There was and Error getting the config, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:19 [LAMOID-CLIENT]: There was a problem making a request to LLAMA Server, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:19 [CONFIG-ERROR]: There was and Error getting the config, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:20 Summarizing results
2021/11/23 20:28:20 Found 7 results to summarize
2021/11/23 20:28:20 Summarization complete
2021/11/23 20:28:24 [LAMOID-CLIENT]: There was a problem making a request to LLAMA Server, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:24 [CONFIG-ERROR]: There was and Error getting the config, Get "http://llama.packetpals.com:8105/api/v1/config/scline?llamaport=8100": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021/11/23 20:28:24 [NEW-CONFIG]: Hash - [212 29 140 217 143 0 178 4 233 128 9 152 236 248 66 126]
2021/11/23 20:28:24 [OLD-CONFIG]: Hash - [212 29 140 217 143 0 178 4 233 128 9 152 236 248 66 126]
2021/11/23 20:28:24 [LAMOID-REGISTER]: Performing Registration with LLAMA Server http://llama.packetpals.com:8105/
2021/11/23 20:28:24 [LAMOID-REGISTER]: Regestiering Process Completed
2021/11/23 20:28:24 [LAMOID-REGISTER]: Response Status: 200 OK
....
^C

You should be able to see where I killed my local connection to test this.

@scline
Owner

scline commented Nov 24, 2021

One thing I recently added: when the interval is changed on the server/config, we kill -9 the collector. This is because -HUP does not make the collector re-evaluate its interval (bug?). Since the server is the one handing out the interval, no changes to the registration are required on the probe's end.

We also expose this value via the API at http://<server_host>/api/v1/interval. It reports the value as a plain string/integer rather than JSON.
https://github.com/scline/llama-sd/blob/master/llama-server/src/app.py#L104

Right now I simply grep the config in the entrypoint of the probe to locate the interval:
https://github.com/scline/llama-sd/blob/master/llama-probe/entrypoint.sh#L111
https://github.com/scline/llama-sd/blob/master/llama-probe/entrypoint.sh#L125

@driftingaway86
Author

Ahh, a hard value to check for!

That makes comparing the config much easier! Let me see if I can cook something up with that. I can also kill and restart the collector instead of sending the HUP signal.

@driftingaway86
Author

@scline So if the interval changes, we can kill the collector and reload the config?

@scline
Owner

scline commented Nov 24, 2021

I think -HUP vs -9 makes only a small difference in impact. Honestly, we should be able to always do a full -9 reload on any configuration change. We should not expect large numbers of probe deregistrations that would make config changes rampant.

Likely easier logic that way, what are your thoughts on that?

@driftingaway86
Author

Can always add both and force -9 based on another config value. When would you even need to do -9?

@scline
Owner

scline commented Nov 25, 2021 via email
