Some fixes and tests for spaces/non-ASCII chars in datasource names#6761
fjy merged 7 commits into apache:master
try {
  // application/x-www-form-urlencoded encodes spaces as "+", but we use this to encode non-form
  // data as well, so replace "+" with "%20".
nit: add this comment to javadoc for this method.
Added this to method javadoc
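For readers following the thread, here is a minimal sketch of what such a method and its javadoc could look like. It assumes the helper is built on `java.net.URLEncoder`; the class name `UrlEncodeSketch` is hypothetical and this is not necessarily the code merged in the PR.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical class name, used only for this illustration.
final class UrlEncodeSketch
{
  /**
   * Encodes a string for use in a URL. URLEncoder produces
   * application/x-www-form-urlencoded output, which writes spaces as "+".
   * Because the result is also used for non-form data, every "+" in the
   * encoded output is replaced with "%20".
   */
  static String urlEncode(String s)
  {
    try {
      // A literal "+" in the input is already encoded as "%2B" at this point,
      // so the only "+" characters left in the output stand for spaces.
      return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }
    catch (UnsupportedEncodingException e) {
      // UTF-8 is required to be supported by every Java runtime.
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args)
  {
    System.out.println(urlEncode("aaa bbb"));
    System.out.println(urlEncode("fff+ggg"));
  }
}
```

The two inputs in `main` are the ones used in the test snippet quoted in this review.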
Assert.assertEquals(s1, "aaa%20bbb");

String s2 = StringUtils.urlEncode("fff+ggg");
Assert.assertEquals(s2, "fff%2Bggg");
Also do a decode and verify that the original string is read back?
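The round-trip check suggested here could be sketched as follows. This assumes the encoder is a `URLEncoder`-based helper that rewrites "+" as "%20" and that decoding goes through `java.net.URLDecoder`; the class and method names are hypothetical and do not come from the PR.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical class name, used only for this illustration.
final class UrlRoundTripSketch
{
  // Assumed encoder: form-style encoding with "+" rewritten as "%20".
  static String urlEncode(String s)
  {
    try {
      return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
    }
    catch (UnsupportedEncodingException e) {
      throw new RuntimeException(e);
    }
  }

  // Assumed decoder: URLDecoder understands both "%20" and "%2B".
  static String urlDecode(String s)
  {
    try {
      return URLDecoder.decode(s, "UTF-8");
    }
    catch (UnsupportedEncodingException e) {
      throw new RuntimeException(e);
    }
  }

  // True when decoding the encoded form gives back the original string.
  static boolean roundTrips(String original)
  {
    return original.equals(urlDecode(urlEncode(original)));
  }

  public static void main(String[] args)
  {
    System.out.println(roundTrips("aaa bbb"));
    System.out.println(roundTrips("fff+ggg"));
  }
}
```

The round trip is safe because the encoded output never contains a raw "+", which `URLDecoder` would otherwise turn back into a space.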
"namespace",
"page",
"regionIsoCode",
"regionName",
Why remove these dimensions?
The full story behind this change is:
- This file is used only by NestedQueryPushDownTest
- This test wasn't previously being run by Travis (see the change in ci/travis_script_integration_part2.sh)
- When I started running the test locally and in travis, I was getting test timeout errors because ingesting this task's data took too long, so I shrunk the input data by removing columns
- The tests run group by (channel, user) queries, so I kept the "page" dimension in there to preserve some level of query time aggregation
@Override
public String getExtraDatasourceNameSuffix()
{
  return extraDatasourceNameSuffix;
Do you have a use case for making the suffix configurable?
If not, we can just put the suffix in the test datasource name itself. I believe that would simplify the changes, and each test could choose its own datasource name format.
My thinking was that the non-ASCII characters could make the tests fail for people who do not have the proper locales set up on their system. I had to tweak locale settings on the ubuntu containers used in the integration tests (IT setup uses a shared folder too so the host machine needs to be configured as well), and people may not have any need for such characters in their own development/use cases, so I felt it would be nice to have a way to disable the use of such characters in the tests.
Another reason for making it a "global" thing like this is because I wanted to easily test support across all the use cases being tested in the IT suite (e.g., if I wanted to test support for some other characters across the board, I can just edit this property instead of changing each test individually)
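To illustrate the "global" approach described above, here is a minimal sketch of a configurable suffix. It assumes the suffix is read from a `java.util.Properties` object; the property key, class name, and `fullDatasourceName` helper are all hypothetical and are not taken from the PR.

```java
import java.util.Properties;

// Hypothetical class name, used only for this illustration.
final class DatasourceSuffixSketch
{
  // Hypothetical property key; the real configuration key may differ.
  static final String SUFFIX_KEY = "example.test.extraDatasourceNameSuffix";

  private final String extraDatasourceNameSuffix;

  DatasourceSuffixSketch(Properties props)
  {
    // An absent property means "no suffix", which keeps datasource names
    // plain ASCII on machines without the required locale setup.
    this.extraDatasourceNameSuffix = props.getProperty(SUFFIX_KEY, "");
  }

  String getExtraDatasourceNameSuffix()
  {
    return extraDatasourceNameSuffix;
  }

  // Every test builds its datasource name through this one method, so
  // changing the property changes the names used across the whole suite.
  String fullDatasourceName(String baseName)
  {
    return baseName + extraDatasourceNameSuffix;
  }

  public static void main(String[] args)
  {
    Properties props = new Properties();
    props.setProperty(SUFFIX_KEY, " \u00e9\u00e8");
    System.out.println(new DatasourceSuffixSketch(props).fullDatasourceName("example_datasource"));
  }
}
```

With this shape, trying a different set of characters across the suite means editing one property value rather than each test.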
@nishantmonu51 Merged with master and fixed conflicts, can you take another look?
This PR fixes URL encoding issues found in some internal API accesses when datasource names contain spaces or non-ASCII characters, and changes the indexing integration tests to use such datasource names.
Not fixed in this PR:
There may be other issues beyond what I mentioned above.