
2.3.1 Release Proposal#1908

Merged
janl merged 26 commits into 2.3.x from 2.3.1-draft on Feb 17, 2019

Conversation

@janl
Member

@janl janl commented Feb 7, 2019

Mainly to get a CI build status for this set of cherry-picked commits between 2.3.0 and master.

nickva and others added 19 commits February 7, 2019 10:49
This avoids needlessly making cross-cluster fabric:update_docs(Db, [], Opts)
calls.
Fix function_clause error on invalid DB security objects when the
request body of the PUT /db/_security endpoint is not valid JSON.

Closes #1384
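As a reference point, a minimal Python sketch (hypothetical host, credentials, and database name) of a well-formed PUT to the `_security` endpoint; the fix above is about rejecting a malformed (non-JSON) body cleanly instead of crashing with a function_clause error:

```python
import json
import urllib.request

# Hypothetical host/credentials/db; the security object shape follows
# the documented CouchDB format of admins/members names and roles.
security = {
    "admins":  {"names": ["bob"], "roles": []},
    "members": {"names": [], "roles": ["readers"]},
}

req = urllib.request.Request(
    "http://admin:secret@127.0.0.1:5984/mydb/_security",
    data=json.dumps(security).encode(),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
# urllib.request.urlopen(req)  # with this fix, a non-JSON body yields a
#                              # proper error response, not a server crash
```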
Previously `end_time` was generated by converting the start_time to universal
time, then passing that to `httpd_util:rfc1123_date/1`. However, `rfc1123_date/1`
also translates its argument from local to UTC time; that is, it expects its
input to be in local time.

Fixes #1841
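The double-conversion pitfall can be sketched with a Python stand-in (made-up fixed offset; the real code is Erlang's `httpd_util:rfc1123_date/1`, which assumes its input is local time):

```python
from datetime import datetime, timedelta

LOCAL_UTC_OFFSET = timedelta(hours=2)  # pretend the server runs at UTC+2

def rfc1123_from_local(dt_local):
    # Like rfc1123_date/1: assumes LOCAL input, shifts to UTC, formats.
    return (dt_local - LOCAL_UTC_OFFSET).strftime("%a, %d %b %Y %H:%M:%S GMT")

start_local = datetime(2019, 2, 7, 12, 0, 0)  # 10:00 UTC

# Buggy: converting to universal time first means the formatter
# subtracts the offset a second time.
buggy = rfc1123_from_local(start_local - LOCAL_UTC_OFFSET)
# Fixed: pass the local time straight through.
fixed = rfc1123_from_local(start_local)
```

Here `buggy` comes out as 08:00 GMT (off by the offset) while `fixed` is the correct 10:00 GMT.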
There was a subtle bug when opening specific revisions in
fabric_doc_open_revs due to a race condition between updates being
applied across a cluster.

The underlying cause was the stemming that happens after a document has
been updated more than revs_limit times, combined with concurrent
reads to a node that had not yet made the update. To illustrate, let's
consider a document A which has a revision history from `{N, RevN}` to
`{N+1000, RevN+1000}` (assuming revs_limit is the default 1000). From a
single node's perspective, when an update comes in we add the new
revision and stem the oldest one. The revisions on the node would then
be `{N+1, RevN+1}` to `{N+1001, RevN+1001}`.

The bug exists when we attempt to open revisions on a different node
that has yet to apply the new update. In this case
fabric_doc_open_revs could be called with `{N+1000, RevN+1000}`. This
results in a response from fabric_doc_open_revs that includes two
different `{ok, Doc}` results instead of the expected one instance. The
reason for this is that one document has revisions `{N+1, RevN+1}` to
`{N+1000, RevN+1000}` from the node that has applied the update, while
the node without the update responds with revisions `{N, RevN}` to
`{N+1000, RevN+1000}`.

To rephrase that, a node that has applied an update can end up returning
a revision path that contains `revs_limit - 1` revisions while a node
without the update returns all `revs_limit` revisions. This slight
change in the path prevented the responses from being properly combined
into a single response.
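The mismatch can be sketched with a toy model in Python (made-up revision tuples, revs_limit shrunk to 3 for brevity):

```python
REVS_LIMIT = 3  # CouchDB's default is 1000; reduced here for brevity

def apply_update(path, new_rev):
    """Prepend the new revision and stem anything beyond REVS_LIMIT."""
    return ([new_rev] + path)[:REVS_LIMIT]

# Both nodes start with the same revision path (newest first).
node_a = [(3, "r3"), (2, "r2"), (1, "r1")]
node_b = [(3, "r3"), (2, "r2"), (1, "r1")]

# Node A applies an update; node B has not seen it yet.
node_a = apply_update(node_a, (4, "r4"))

# Opening rev (3, "r3") now yields different ancestor paths:
path_a = node_a[node_a.index((3, "r3")):]  # 2 revs - stemmed
path_b = node_b[node_b.index((3, "r3")):]  # all 3 revs
assert path_a != path_b  # the responses no longer combine into one result
```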

This bug has existed for many years. However, read repair effectively
prevents it from being a significant issue by immediately fixing the
revision history discrepancy. This was discovered due to the recent bug
in read repair during a mixed cluster upgrade to a release including
clustered purge. In this situation we end up crashing the design
document cache which then leads to all of the design document requests
being direct reads which can end up causing cluster nodes to OOM and
die. The conditions require a significant number of design document
edits coupled with already significant load to those modified design
documents. The most direct example observed was a cluster that had a
significant number of filtered replications in and out of the cluster.
This server admin-only endpoint forces an n-way sync of all shards
across all nodes on which they are hosted.

This can be useful for an administrator adding a new node to the
cluster, after updating _dbs so that the new node hosts an existing db
with content, to force the new node to sync all of that db's shards.

Users may want to bump their `[mem3] sync_concurrency` value to a
larger figure for the duration of the shards sync.

Closes #1807
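A hedged sketch of invoking the new endpoint from Python (hypothetical host, credentials, and database name; assumes the endpoint is exposed as `POST /{db}/_sync_shards` and, per the above, is server-admin only):

```python
import urllib.request

# Hypothetical host/credentials/db; endpoint path is an assumption
# based on the description above - check the release docs.
base = "http://admin:secret@127.0.0.1:5984"
db = "mydb"

req = urllib.request.Request(f"{base}/{db}/_sync_shards", method="POST")
# urllib.request.urlopen(req)  # uncomment to actually trigger the n-way sync
```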
It has a fix that reverts the user socket buffer size to 8192 and also
allows setting this buffer value directly (not necessarily
via `{recbuf, ...}`).

Fixes #1810

Warning:

2.19.0 blacklists a series of OTP releases: 21.2, 21.2.1, 21.2.2
This is done via a runtime check of the ssl application version.

The blacklist seems valid, as there is a bug which prevents data from
being delivered on TLS sockets. That could affect either the CouchDB
server side (chttpd) or the replication client side (ibrowse).
This restricts _purge and _purged_infos_limit to server admins
in terms of the security level required to run them.

Fixes #1799
This commit introduces a new option `snooze_period_ms` (measured in
milliseconds), and deprecates `snooze_period` while still supporting it
for obvious legacy reasons.
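A sketch of the corresponding ini configuration, assuming this option lives in the smoosh channel settings (channel name shown is the default ratio_dbs channel; value is illustrative):

```ini
; snooze_period (seconds) still works but is deprecated in favor of
; snooze_period_ms (milliseconds).
[smoosh.ratio_dbs]
snooze_period_ms = 30000
```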
The Makefile target builds a python3 venv at .venv and installs
black if possible. Since black is Python 3.6 and up only, we
skip the check on systems with an older Python 3.x.
@janl
Member Author

janl commented Feb 7, 2019

needs apache/couchdb-documentation#392

@janl
Member Author

janl commented Feb 7, 2019

and apache/couchdb-fauxton#1180

@wohali
Member

wohali commented Feb 7, 2019

Hi @janl ,

c5d9cfe (#1766) <-- you missed this one
(you already pulled 33e3625)

Would you consider also cherry-picking these minor fixes? These are all small in scope but moderate in covering some corner cases, especially upgrade-related, so I think they'd be good fits for a patch release.

c68863a (#1808)
c6b095b (#1860, fixes mixed-cluster situation)
17f05b7 (#1874, reported by user in Slack/IRC)

#1794 which is:

#1824 which is:

@janl
Member Author

janl commented Feb 8, 2019

Heya @wohali most of these are in. Can you double check and dedupe? :)

@wohali
Member

wohali commented Feb 8, 2019

@janl updated to dedupe, sorry about that, was going off the wrong info

@janl
Member Author

janl commented Feb 8, 2019

@wohali thanks for the dedupe, sorry it wasn’t clearer what was included already.

c6b095b (#1860, fixes mixed-cluster situation)

I may have read things wrong when skimming, but I assumed this was only relevant past the partitioned databases commit. Happy to reconsider if this is generally useful cc @davisp.

17f05b7 (#1874, reported by user in Slack/IRC)

I had ruled dep ups for other than critical things to be out of scope for a .1, but on reread I do agree we should do this one.

#1794 which is: …

felt a bit risky for a .1, but happy to include.

#1824 which is: …

Also thought this was moving around things too much for a .1, but am happy to be convinced otherwise (cc @nickva)


Any I didn’t comment on I agree on adding. Will do so over the weekend while Fauxton gets into shape.

@wohali
Member

wohali commented Feb 8, 2019

@jan thanks. The other we should think about is #1803, but @jaydoane may need help.

@nickva
Contributor

nickva commented Feb 8, 2019

@janl

#1824 which is: …
Also thought this was moving around things too much for a .1, but am happy to be convinced otherwise (cc @nickva)

Most of the changes in the PR were a code move, copying the streams logic into its own fabric module in a separate commit:

19048fd

The main logic was here:

41757cd

I think it would mostly affect larger clusters with many requests timing out and being cleaned up improperly, so they'd leak their rexi workers. The others it might affect are smaller embedded systems with restrictive resources (a low max_dbs_open value). But it's maybe not as critical for average CouchDB deployments, and it's a bug that's been there for years, so I can see keeping it back to reduce the .1 commit set.

Oh and thank you for helping with 2.3.1!

davisp and others added 2 commits February 12, 2019 12:36
This enables backwards compatibility with nodes still running the old
version of fabric_rpc when a cluster is upgraded to master. This has no
effect once all nodes are upgraded to the latest version.
This fixes the inability to set config keys with regex symbols in them
This adds an API call for looking up a single design doc regardless of
whether the database is clustered or not.
@janl
Member Author

janl commented Feb 12, 2019

#1766 and #1824 end up being non-trivial merges, so I’ll leave those out for now.

I’ve added everything else.

The underlying clustered _all_docs call can cause significant extra load
during compaction.
@janl
Member Author

janl commented Feb 12, 2019

#1803 doesn’t look ready yet. cc @jaydoane @iilyak

I won’t have time to review it, but if it lands in master in the next ~48 hours, I can hold 2.3.1 until then.

@jaydoane
Contributor

@janl #1803 has landed in master

This ensures that admin password hashes are the same on all nodes when
passwords are set directly on each node rather than through the
coordinator node.
@janl
Member Author

janl commented Feb 17, 2019

@jaydoane merged, thanks!

@janl janl merged commit d8c29c4 into 2.3.x Feb 17, 2019