Skip to content

Replace rapidjson with nlohmann/json#14467

Merged
mattklein123 merged 25 commits intoenvoyproxy:mainfrom
asraa:cleanup-json
Feb 14, 2021
Merged

Replace rapidjson with nlohmann/json#14467
mattklein123 merged 25 commits intoenvoyproxy:mainfrom
asraa:cleanup-json

Conversation

@asraa
Copy link
Contributor

@asraa asraa commented Dec 17, 2020

Commit Message: Replace rapidjson with nlohmann/json
Additional Description:

  • Schema validation is not supported by the core nlohmann/json library. They endorse an option here json schema validator nlohmann/json#305 but schema validation is not used anywhere in Envoy core.
  • Added runtime option envoy.reloadable_features.remove_legacy_json. It is disabled by default. I will enable the new parser in a separate PR.
  • There is one usage in core to parse metadata in the HeaderFormatter:
    Json::ObjectSharedPtr parsed_params = Json::Factory::loadFromString(std::string(json));
    This could be replaced pretty easily with protobuf's parser in a later PR
  • Originally I had an implementation that shared the SAX parser between the two implementations, but RapidJson and nlohmann/json have to handle tracking line numbers differently so I split them. (RapidJSON makes it easy to subclass the stream implementation, but nlohmann/json doesn't.)
  • The major difference between the two json_internal.cc implementations is the custom stream class (see json_loader.cc line 243-280. It has a reference to the handler to update line numbers)

Risk Level: Low for core, maybe medium for those using JSON on the dataplane.
Release Notes: Added release notes about runtime feature
Testing:

Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Dec 17, 2020
@repokitteh-read-only
Copy link

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).

🐱

Caused by: #14467 was opened by asraa.

see: more, trace.

Signed-off-by: Asra Ali <asraa@google.com>
@alyssawilk
Copy link
Contributor

@asraa I'll let you tag a reviewer once you've got CI sorted, SG?

@asraa
Copy link
Contributor Author

asraa commented Dec 17, 2020

Yep! This is still in DRAFT mode, I'll move it out of draft mode when I finish some test TODOs (benchmarking)

Comment on lines +448 to +461
com_github_nlohmann_json = dict(
project_name = "nlohmann JSON",
project_desc = "Fast JSON parser/generator for C++",
project_url = "https://nlohmann.github.io/json",
version = "3.9.1",
sha256 = "4cf0df69731494668bdd6460ed8cb269b68de9c19ad8c27abc24cd72605b2d5b",
strip_prefix = "json-{version}",
urls = ["https://github.com/nlohmann/json/archive/v{version}.tar.gz"],
# This will be a replacement for rapidJSON used in extensions and may also be a fast
# replacement for protobuf JSON.
use_category = ["controlplane", "dataplane_core"],
release_date = "2020-07-06",
cpe = "cpe:2.3:a:json_project:json:*",
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning on keeping it as a runtime switchable options in extensions -- although I am down to remove :)

@jmarantz
Copy link
Contributor

/wait

strip_prefix = "json-{version}",
urls = ["https://github.com/nlohmann/json/archive/v{version}.tar.gz"],
# This will be a replacement for rapidJSON used in extensions and may also be a fast
# replacement for protobuf JSON.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

asraa added 4 commits January 5, 2021 10:17
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
asraa added 5 commits January 7, 2021 14:37
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Contributor Author

asraa commented Jan 11, 2021

There is an existing clang-tidy issue about a potential memory leak in the constructor of rapidjson::SchemaDoument

rapidjson::SchemaDocument schema_document_for_validator(schema_document);

I couldn't find related issues, tests in rapidjson support this construction. Im not sure I want to spend too much time on a rapidjson issue

external/com_github_tencent_rapidjson/include/rapidjson/schema.h:1546:39: error: Potential memory leak [clang-analyzer-unix.Malloc,-warnings-as-errors]
        CreateSchemaRecursive(&root_, PointerType(), document, document);
                                      ^
source/common/json/json_internal_legacy.cc:511:3: note: Taking false branch
  if (schema_document.Parse<0>(schema.c_str()).HasParseError()) {
  ^
source/common/json/json_internal_legacy.cc:517:29: note: Calling constructor for 'GenericSchemaDocument<rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >, rapidjson::CrtAllocator>'
  rapidjson::SchemaDocument schema_document_for_validator(schema_document);
                            ^
external/com_github_tencent_rapidjson/include/rapidjson/schema.h:1535:14: note: Field 'allocator_' is null
        if (!allocator_)
             ^
external/com_github_tencent_rapidjson/include/rapidjson/schema.h:1535:9: note: Taking true branch
        if (!allocator_)
        ^
external/com_github_tencent_rapidjson/include/rapidjson/schema.h:1539:24: note: 'uri' is null
        uri_.SetString(uri ? uri : noUri, uriLength, *allocator_);
                       ^
external/com_github_tencent_rapidjson/include/rapidjson/schema.h:1539:24: note: '?' condition is false
external/com_github_tencent_rapidjson/include/rapidjson/schema.h:1541:46: note: Calling 'CrtAllocator::Malloc'
        typeless_ = static_cast<SchemaType*>(allocator_->Malloc(sizeof(SchemaType)));
                                             ^
external/com_github_tencent_rapidjson/include/rapidjson/internal/../allocators.h:79:13: note: 'size' is 336
        if (size) //  behavior of malloc(0) is implementation defined.
            ^
external/com_github_tencent_rapidjson/include/rapidjson/internal/../allocators.h:79:9: note: Taking true branch
        if (size) //  behavior of malloc(0) is implementation defined.
        ^
external/com_github_tencent_rapidjson/include/rapidjson/internal/../allocators.h:80:20: note: Memory is allocated
            return std::malloc(size);
                   ^
external/com_github_tencent_rapidjson/include/rapidjson/schema.h:1541:46: note: Returned allocated memory
        typeless_ = static_cast<SchemaType*>(allocator_->Malloc(sizeof(SchemaType)));
                                             ^
external/com_github_tencent_rapidjson/include/rapidjson/schema.h:1546:39: note: Potential memory leak
        CreateSchemaRecursive(&root_, PointerType(), document, document);
                                      ^

@asraa asraa marked this pull request as ready for review January 11, 2021 15:22
asraa added 5 commits January 12, 2021 10:03
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Contributor Author

asraa commented Jan 13, 2021

Only failing existing clang-tidy. Would appreciate any comment! (especially around the new way of holding line_number_)

Base automatically changed from master to main January 15, 2021 23:02
asraa added 2 commits January 20, 2021 10:19
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Contributor Author

asraa commented Jan 20, 2021

Would it be helpful if I displayed the diff between the old/new json loader?

The full diff is here https://gist.github.com/asraa/67c51ac7ab48570e471a744fb4c02068

The main differences are:

  • renaming: SAX callbacks are named in nlohmann as start_object instead of StartObject and have slightly different signatures (all unused params though), asRapidJsonDocument is renamed to asJsonDocument.
  • The new Json ObjectHandler holds on directly to an updating line_number_ instead of holding on to a LineCountingStringStream with a line number member
  • The new custom JSON "stream" that counts line numbers is implemented as a custom container that increments line numbers whenever the character at a position is a newline
  • nlohmann/json had no API to push back items onto a json document with an allocator parameter
  • No schema validation
  • Slight differences in error messaging formatting. nlohmann/json will also give you the token at the error.

asraa added 2 commits January 20, 2021 10:30
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Contributor Author

asraa commented Jan 25, 2021

ping @jmarantz this is ready.
clang-tidy will fail on an existing rapidJSON issue (see #14467 (comment)).

@@ -0,0 +1,668 @@
#include "common/json/json_internal.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: this file is all new code, and so needs to be reviewed line-by-line? This is not a copy of of something existing?

Copy link
Contributor Author

@asraa asraa Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of, the diff is above (between the legacy and this one). They have some major differences (listed above) that made it difficult to inherit from the same SAX callbacks.

@@ -0,0 +1,22 @@
#pragma once

#include <list>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list is not used in this header.

@@ -0,0 +1,22 @@
#pragma once

#include <list>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list is not used in this file.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore the style nits if this is not really code you edited.

Basically this looks good. Will need main merge.

* dispatcher: supports a stack of `Envoy::ScopeTrackedObject` instead of a single tracked object. This will allow Envoy to dump more debug information on crash.
* http: added support for :ref:`:ref:`preconnecting <envoy_v3_api_msg_config.cluster.v3.Cluster.PreconnectPolicy>`. Preconnecting is off by default, but recommended for clusters serving latency-sensitive traffic, especially if using HTTP/1.1.
* http: change frame flood and abuse checks to the upstream HTTP/2 codec to ON by default. It can be disabled by setting the `envoy.reloadable_features.upstream_http2_flood_checks` runtime key to false.
* json: introduced new JSON parser (https://github.com/nlohmann/json) to replace RapidJSON. The new parser is enabled by default. To revert to the legacy RapidJSON parser, enable the runtime feature `envoy.reloadable_features.legacy_json`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider making the initial check-in leave the existing default, and then do another PR to flip the default. That way if a problem is discovered, we won't need to roll back the whole PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like that idea very much. done

// Extract position information (line, column, token) in the error.
auto start = error.find("line");
auto npos = error.find(":");
error_position_ = absl::StrCat(error.substr(start, npos - start), ", token ", token);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to see some checking of these find() results against std::npos or absl::npos. And while we're at it, let's not also have a local called npos :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, i added a reference, there should always be a ": error string" but I found in the documentation that position information may not be present, so that's handled now.

Thank you!

checkType(Type::Object);
auto value_itr = value_.object_value_.find(name);
if (value_itr == value_.object_value_.end() || !value_itr->second->isType(Type::Boolean)) {
throw Exception(fmt::format("key '{}' missing or not a boolean from lines {}-{}", name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's our general strategy to avoid this occurring during request processing?

Copy link
Contributor Author

@asraa asraa Feb 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one usage of loadFromString in the core dataplane code here, as noted in the description I will swap this with protobuf's json parser in a future PR

There is one usage in core to parse metadata in the HeaderFormatter:

These get[String/Boolean] methods that can throw are largely unused, but there are <10 usages in filters that I will replace with those with defaults that don't throw in that PR.
#14893

auto value_itr = value_.object_value_.find(name);
if (value_itr != value_.object_value_.end()) {
return getDouble(name);
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we prefer the form

if (exceptional-condition) {
  return exceptional-value;
}
return normal value;

But I can't find a style-guide ref for that. THere's a few different instances of that in this file. Field::getDouble() above is an example where you are doing it the way I think we prefer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existing style, fixed in the legacy code as well.

Signed-off-by: Asra Ali <asraa@google.com>
asraa added 2 commits February 1, 2021 12:47
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Contributor Author

asraa commented Feb 9, 2021

@envoyproxy/senior-maintainers Could you PTAL? This adds the code for a new json parser, does not flip the switch.

@mattklein123 mattklein123 self-assigned this Feb 14, 2021
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM at a high level. Very excited to see this moving forward!

@mattklein123 mattklein123 merged commit 280654b into envoyproxy:main Feb 14, 2021
alyssawilk added a commit that referenced this pull request Nov 8, 2021
We added the code to switch json libraries over in #14467
The removal is tracked by #4705 and smoke tested (#18451)

Risk Level: Medium for folks using json
Testing: n/a
Docs Changes: n/a
Release Notes: inline
Fixes #18451
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deps Approval required for changes to Envoy's external dependencies

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove RapidJSON from Envoy

7 participants