Skip to content

Make 8.0.0 Core available. New protocol and register removed.#4599

Merged
wu-sheng merged 80 commits intomasterfrom
v8-core
Apr 10, 2020
Merged

Make 8.0.0 Core available. New protocol and register removed.#4599
wu-sheng merged 80 commits intomasterfrom
v8-core

Conversation

@wu-sheng
Copy link
Copy Markdown
Member

This would be a huge pull request, relating to protocol, agent and backend. 8.x would be an new core without back forward compatibility.
After the experiences of removing endpoint_inventory, I found out this
strategy is successful.
Especially, we totally get rid of register, so I want to do more.

SkyWalking 8.0.0
First, it is already unexpected for me, we have to move to 8.0.0 so
quickly, but after the discussion with @hanahmily , and thinking about this for several days, I think
we have to.

The key chances are following

  1. Remove service, service instance, and network address register. The old
    register protocols are totally going to be removed.
  2. The agent doesn't need to do register anymore. Service name and Service
    Instance name are generated by the agent itself, but the extra information,
    such as IP, hostname, language, should report to backend separately.
  3. Service Traffic should be added just like the endpoint traffic but keep
    the time bucket as we need accurate service name in the given duration
  4. Service Instance Traffic should be added too, with external information,
    such as language, hostname.
  5. Trace context propagation context should be changed to accept string in
    service instance name, endpoint name and network address. This could ease
    the agent logic, but also, requires changes in all language agent and
    plugin test tool,
  6. Trace report protocol requires to change too, in order to adopt the
    string.
  7. e2e tests have to ignore PHP and LUA at first, and remove the 6.x
    compatibility test(doesn't support anymore).

The benefits we will get are

  1. Don't worry about the inventory(s) that has been deleted randomly by end
    users. (We received a lot of issue reports about this)
  2. The upgrade could be easier erasing the whole storage and reboot the new
    one. (Users don't feel comfortable about upgrade)
  3. No hot-reboot case in the agent side
  4. No cache of network address register information in the agent.
  5. No service and service instance cache in the OAP
  6. No register lock in the OAP
  7. No file buffer mechanism in the OAP too, same as no register happens.

In my mind, I think this totally break upgrade is super meaningful and will
be good change. Even we break many things, they are easy to follow.
@mrproliu I think by following this, we need
to change the collaboration header to sw8 :) As no 7.1.0 release will
happen.

Link to mail list, https://lists.apache.org/thread.html/rda36fa8d191fc5750fc793993c69e14a917138bf97f9c2a461b72811%40%3Cdev.skywalking.apache.org%3E

@wu-sheng wu-sheng added the core feature Core and important feature. Sometimes, break backwards compatibility. label Mar 31, 2020
@wu-sheng wu-sheng added this to the 8.0.0 milestone Mar 31, 2020
@wu-sheng wu-sheng requested a review from a team March 31, 2020 15:35
…tanceProperties as a new source. 3. Remove Month as downsamping and query step.
@wu-sheng
Copy link
Copy Markdown
Member Author

wu-sheng commented Apr 1, 2020

The codes level key changes are

  1. Set up traffic for service and service instance. Service Inventory and Instance Inventory are removed.
  2. Add ServiceInstanceProperties as a new source.
  3. Remove Month as downsamping and query step.
  4. InfluxDB implementation doesn't depend on H2/MySQL, there is no metadb concept anymore. @dmsolr . I just made InfluxDB compiling passed, you need to add metadata query implementation.
  5. Service Instance Update source is created for properties and heartbeat update.

@wu-sheng
Copy link
Copy Markdown
Member Author

wu-sheng commented Apr 1, 2020

@wu-sheng
Copy link
Copy Markdown
Member Author

wu-sheng commented Apr 4, 2020

@apache/skywalking-committers , I just made the project package passed. Now we could move on the core tests.

  1. I will continue working on local testing including H2, MySQL and ElasticSearch 6.
  2. @dmsolr Please help on agent plugin tests and InfluxDB storage implementation.
  3. @arugal Please help on go2sky project
  4. @heyanlong Please help on PHP project
  5. @ElderJames Please help on .net project
  6. @ascrutae Please help on nodejs project.
  7. @kezhenxu94 Please help on testing and new e2e setup
  8. @mrproliu Please help on backend testing, especially profiling and LUA agent test.
  9. @JaredTan95 Please help on backend testing.
  10. @hanahmily Please run tests on mesh case after the above are done.
  11. @TinyAllen @Fine0830 Please remove the month step support when UI time selector choose more than one month. In this case, still set to days, and Max days should be 60 days. The month step has been removed from the GraphQL.

@kezhenxu94
Copy link
Copy Markdown
Member

Mark it as "Ready for review" as the test cases should be fixed now, and move to next step if anyone will review the code

@kezhenxu94 kezhenxu94 marked this pull request as ready for review April 9, 2020 11:04
@wu-sheng wu-sheng changed the title [WIP] 8.0.0 Core Changes Make 8.0.0 Core available. New protocol and register removed. Apr 9, 2020
exitSpan = parentSpan;
} else {
final int parentSpanId = parentSpan == null ? -1 : parentSpan.getSpanId();
if (StringUtil.isEmpty(remotePeer)) {
Copy link
Copy Markdown
Member Author

@wu-sheng wu-sheng Apr 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deletion actually is not related to this PR. The lazy peer initialization is another feature :) @kezhenxu94

@wu-sheng
Copy link
Copy Markdown
Member Author

wu-sheng commented Apr 9, 2020

Summary

This pull request is open for 10 days, and I worked on this 2 weeks ago. Thanks everyone to help making this PR and tests work now.

Here are the list of changes we made in this pull request. Top 3 are the principle changes, others are following these principles.

  1. New agent and mesh report protocol.
  2. New agent header protocol.
  3. Service register, instance register and network address register have been removed permanently.
  4. Service traffic, instance traffic and network alias metrics are added to replace the service, instance and network address inventory.
  5. Register process has been removed.
  6. Metrics stream process supports insert only mode, especially for traffic entities.
  7. Metrics stream process supports no-downsampling mode for traffic entities and network alias.
  8. Remove all register mechanism and cache in the java agent.
  9. Remove MONTH step in GraphQL query.
  10. Update UI to remove MONTH step query, the max query range is 60 days now.
  11. Simplify the TTL to metrics and record. And the unit has been formatted in Day unit. No specific TTL for ElasticSearch storage.
  12. Buffer mechanism of trace receiver and mesh receiver has been removed due to no register.
  13. New service id, instance id and endpoint id rules, including service relation, instance relation and endpoint relation id rules.
  14. Java agent support keep tracing mode, meaning, agent generating tracing context even the backend is unconnected/unavailable.
  15. Plugin test tool up to date, in order to support new protocol.
  16. Plugin tests expected data files updated.
  17. E2E tests updated.
  18. Telemetry of Grafana config has been merged into one.
  19. Documentation updates.
  20. [TBD] InfluxDB storage implementation is not available, need @dmsolr to fix later, in order to reduce the master change block by this PR.

FYI @apache/skywalking-committers

mrproliu
mrproliu previously approved these changes Apr 9, 2020
Copy link
Copy Markdown
Contributor

@mrproliu mrproliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a test with the profile and alarm module, it works well. And test Nginx agent on the local environment, it works well too.

dmsolr
dmsolr previously approved these changes Apr 9, 2020
Copy link
Copy Markdown
Member

@dmsolr dmsolr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tested java agent and agent plugin. It looks good to me.

@wu-sheng
Copy link
Copy Markdown
Member Author

@kezhenxu94 Are you going to do code level review?

@kezhenxu94
Copy link
Copy Markdown
Member

@kezhenxu94 Are you going to do code level review?

I can do another round of review recently, but that’s not a blocker to merge this, it looks good to me generally, just merge it if necessary

@wu-sheng
Copy link
Copy Markdown
Member Author

@kezhenxu94 I am adjusting some documentations, after that, I will merge this.

@wu-sheng wu-sheng dismissed stale reviews from dmsolr and mrproliu via 0b637a3 April 10, 2020 02:09
kezhenxu94
kezhenxu94 previously approved these changes Apr 10, 2020
Copy link
Copy Markdown
Contributor

@hanahmily hanahmily left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@heyanlong heyanlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PHP test success

Copy link
Copy Markdown
Member

@rainbend rainbend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go2sky test success

@wu-sheng wu-sheng merged commit 6fe2041 into master Apr 10, 2020
@wu-sheng wu-sheng deleted the v8-core branch April 10, 2020 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core feature Core and important feature. Sometimes, break backwards compatibility.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants