Some fixes and tests for spaces/non-ASCII chars in datasource names by jon-wei · Pull Request #6761 · apache/druid

jon-wei · 2018-12-20T00:44:00Z

This PR fixes URL encoding issues found in some internal API accesses when datasource names contain spaces or non-ASCII characters, and changes the indexing integration tests to use such datasource names.

Not fixed in this PR:

The "indexer" tab in the coordinator console has issues with such datasource names. I am not familiar with that code and haven't looked into it much.
S3 and HDFS deep storage have some URL encoding issues as well. I am working on another patch that will address those and add tests for those situations.

There may be other issues beyond what I mentioned above.

nishantmonu51 · 2019-01-02T07:01:16Z

+
+    try {
+      // application/x-www-form-urlencoded encodes spaces as "+", but we use this to encode non-form
+      // data as well, so replace "+" with "%20".


nit: add this comment to javadoc for this method.

Added this to method javadoc
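
For readers following the thread, the "+" to "%20" replacement discussed above can be sketched as below. This is a minimal illustration, not the actual Druid source: the class name `UrlEncodeSketch` is hypothetical, and only the standard `java.net.URLEncoder` API is assumed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical stand-in for the helper under review; not the actual Druid code.
final class UrlEncodeSketch
{
  /**
   * Percent-encodes a string as UTF-8. URLEncoder emits application/x-www-form-urlencoded
   * output, which writes a space as "+". Because the result is also used for non-form
   * data such as URL path segments, "+" is replaced with "%20".
   */
  static String urlEncode(String s)
  {
    try {
      // A literal "+" in the input has already become "%2B" at this point,
      // so every "+" still present in the encoded output stands for a space.
      return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }
    catch (UnsupportedEncodingException e) {
      // Every JVM is required to support UTF-8, so this should be unreachable.
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println(urlEncode("aaa bbb")); // prints aaa%20bbb
    System.out.println(urlEncode("fff+ggg")); // prints fff%2Bggg
  }
}
```

The replacement is safe only because it runs after encoding: at that stage a "+" can no longer be a literal plus sign.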

nishantmonu51 · 2019-01-02T07:04:10Z

+    Assert.assertEquals(s1, "aaa%20bbb");
+
+    String s2 = StringUtils.urlEncode("fff+ggg");
+    Assert.assertEquals(s2, "fff%2Bggg");


Also do a decode and verify that the original string is read back?

Added decode checks
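
A round-trip check of the kind requested here might look like the sketch below. It is an assumption-laden illustration rather than the test added in the PR: the class `UrlRoundTripSketch` and its methods are hypothetical, it uses plain Java instead of JUnit so it stays self-contained, and the Cyrillic sample name is made up (written with Unicode escapes so the source file itself stays ASCII).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical round-trip check; not the test that was added in the PR.
final class UrlRoundTripSketch
{
  static String encode(String s)
  {
    try {
      // Form encoding writes spaces as "+"; convert them to "%20" after encoding.
      return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }
    catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  static String decode(String s)
  {
    try {
      // URLDecoder turns "%20" back into a space and "%2B" back into "+".
      return URLDecoder.decode(s, "UTF-8");
    }
    catch (UnsupportedEncodingException e) {
      throw new IllegalStateException(e);
    }
  }

  // True when decoding the encoded form yields the original string.
  static boolean roundTrips(String original)
  {
    return original.equals(decode(encode(original)));
  }

  public static void main(String[] args)
  {
    // Made-up sample names: a space, a literal plus, and Cyrillic text with a space.
    String[] samples = {"aaa bbb", "fff+ggg", "\u0434\u0430\u043d\u043d\u044b\u0435 test"};
    for (String sample : samples) {
      if (!roundTrips(sample)) {
        throw new AssertionError("Round trip failed for: " + sample);
      }
    }
    System.out.println("all samples round-trip");
  }
}
```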

nishantmonu51 · 2019-01-02T07:10:15Z

-              "namespace",
              "page",
-              "regionIsoCode",
-              "regionName",


Why remove these dimensions?

The full story behind this change is:

This file is used only by NestedQueryPushDownTest

This test wasn't previously being run by Travis (see the change in ci/travis_script_integration_part2.sh)

When I started running the test locally and in Travis, I got test timeout errors because ingesting this task's data took too long, so I shrank the input data by removing columns

The tests run group by (channel, user) queries, so I kept the "page" dimension in there to preserve some level of query time aggregation

nishantmonu51 · 2019-01-02T07:13:12Z

+      @Override
+      public String getExtraDatasourceNameSuffix()
+      {
+        return extraDatasourceNameSuffix;


Do you have a use case for making the suffix configurable?
If not, we can just put the suffix in the test datasource name itself. I believe that would simplify the changes, and each test could choose its own datasource name format.

My thinking was that the non-ASCII characters could make the tests fail for people who do not have the proper locales set up on their system. I had to tweak locale settings on the Ubuntu containers used in the integration tests (the IT setup uses a shared folder too, so the host machine needs to be configured as well). People may not need such characters in their own development or use cases, so I felt it would be nice to have a way to disable them in the tests.

Another reason for making it a "global" setting like this is that I wanted to easily test support across all the use cases covered by the IT suite (e.g., to test support for some other characters across the board, I can just edit this property instead of changing each test individually).
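
The idea of a single, globally configurable suffix could be sketched roughly as follows. Only the getter name `getExtraDatasourceNameSuffix` comes from the diff above; the `TestConfig` interface, the `fullDatasourceName` helper, the system property name, and the sample datasource name are all hypothetical and are not taken from the Druid integration test code.

```java
// Hypothetical sketch of a globally configurable datasource name suffix.
final class DatasourceSuffixSketch
{
  // Stand-in for the integration test config; only the getter name is from the PR.
  interface TestConfig
  {
    String getExtraDatasourceNameSuffix();
  }

  // Appends the configured suffix; a null or empty suffix leaves the name unchanged.
  static String fullDatasourceName(String baseName, TestConfig config)
  {
    String suffix = config.getExtraDatasourceNameSuffix();
    return (suffix == null || suffix.isEmpty()) ? baseName : baseName + suffix;
  }

  // Reads the suffix from a made-up system property so that it can be switched off
  // on machines without the required locale setup.
  static TestConfig fromSystemProperty()
  {
    final String suffix = System.getProperty("sketch.extraDatasourceNameSuffix", "");
    return () -> suffix;
  }

  public static void main(String[] args)
  {
    // A suffix containing a space and a Cyrillic letter (U+0434), written as an escape.
    TestConfig withSpecialChars = () -> " \u0434";
    TestConfig disabled = () -> "";

    System.out.println(fullDatasourceName("my_datasource", withSpecialChars));
    System.out.println(fullDatasourceName("my_datasource", disabled));
  }
}
```

With this shape, each test builds its datasource name through one helper, so changing the property changes the characters exercised by every test at once.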

jon-wei · 2019-01-09T00:23:46Z

@nishantmonu51 Merged with master and fixed conflicts. Can you take another look?

clintropolis

LGTM 👍

jon-wei added the Bug label Dec 20, 2018

jon-wei force-pushed the dsname-kafka branch 2 times, most recently from 543bf1b to 578f86b Compare December 20, 2018 00:51

jon-wei added 2 commits December 19, 2018 18:52

Fixes and tests for spaces/non-ASCII datasource names

72a3e67

Some unit test fixes

6e45599

jon-wei force-pushed the dsname-kafka branch from f4f1728 to 6e45599 Compare December 20, 2018 03:11

Fix ITRealtimeIndexTaskTest

6f0f16a

jon-wei force-pushed the dsname-kafka branch from 9950b00 to 6f0f16a Compare December 20, 2018 20:10

jon-wei added 2 commits December 20, 2018 12:40

Checkstyle

be39228

TeamCity

0877e4a

gianm assigned clintropolis Dec 20, 2018

fjy added this to the 0.14.0 milestone Dec 21, 2018

ciukstar mentioned this pull request Dec 21, 2018

Kafka Indexing Service - Issue when data-source name is in Cyrillic #6718

Closed

nishantmonu51 reviewed Jan 2, 2019

jon-wei added 2 commits January 2, 2019 19:21

PR comments

830b6c1

Merge master

2206779

clintropolis approved these changes Jan 9, 2019

fjy merged commit 8537a77 into apache:master Jan 15, 2019

jihoonson mentioned this pull request Apr 20, 2019

Fix encoded taskId check in chatHandlerResource #7520

Merged

This was referenced Nov 26, 2019

ensure chill URI escaping is done in all the places #8941

Open

S3 input source #8903

Merged

jon-wei mentioned this pull request Apr 29, 2020

Segments cannot be loaded from HDFS deep storage when datasource name has special characters #9788

Open

fjy merged 7 commits into apache:master from jon-wei:dsname-kafka
