
Improve simplebridge data plane performance #56

Merged
sebymiano merged 2 commits into polycube-network:master from sebymiano:pr/fix_simplebridge_performance
Feb 7, 2019

Conversation

@sebymiano
Collaborator

This PR introduces a large performance improvement in the Simplebridge service by removing some bottlenecks from the data path, namely the call to bpf_ktime_get_ns and the bpf_map_update helper used to refresh the timestamp.

In fact, after some tests, I discovered that the bpf_map_update call introduces a significant performance degradation (more evident in multi-core tests, due to the lock required by the update), while updating the value directly after the lookup does not affect the overall performance.
With this commit, the multi-core forwarding throughput of the bridge increased from 3.5 Mpps (with 64-byte packets) to ~11.5 Mpps (with XDP and the redirect_map changes proposed in PR #52).
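
For reference, a minimal sketch of the new learning path in the BCC-style eBPF C already used by the service; the fwd_entry field names and the helper name below are illustrative, not copied verbatim from the code:

struct fwd_entry {
  u32 port;
  u64 timestamp;
} __attribute__((packed));

BPF_TABLE("hash", __be64, struct fwd_entry, fwdtable, 1024);

static __always_inline void learn_src_mac(__be64 src_mac, u32 in_port, u64 now) {
  struct fwd_entry *entry = fwdtable.lookup(&src_mac);
  if (entry) {
    /* Known MAC: refresh port and timestamp in place through the pointer
     * returned by the lookup, avoiding the locked bpf_map_update path. */
    entry->port = in_port;
    entry->timestamp = now;
  } else {
    /* Unknown MAC: only here do we pay for a real map update. */
    struct fwd_entry new_entry = {.port = in_port, .timestamp = now};
    fwdtable.update(&src_mac, &new_entry);
  }
}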

@mauriciovasquezbernal
Contributor

mauriciovasquezbernal commented Feb 5, 2019

Thanks @sebymiano for this interesting PR.

It is interesting to notice that in the previous version we were using the bpf_map_update helper in all cases, even when an entry was already present for that MAC address. Hash maps are designed to be very fast on reads, since no locks are taken in that case, but writes do require locks; this is one of the reasons why performance was not scaling on multi-core systems.

I still have a doubt about possible race conditions. In the previous approach there were none, because bpf_map_update allocates a new node, copies the data into it, and then uses RCU to make it visible to the readers; in other words, concurrent readers never see a partially updated version of the map entry.
In this version the writer and the readers operate on the same element (the pointer returned by bpf_map_lookup). Is it possible that a reader sees a partially updated version of the timestamp? In other words, are 64-bit stores atomic in eBPF?

Besides that, I think a 64-bit counter is not needed: a 32-bit counter of seconds would last for ~100 years, and the whole value structure could then fit in a single cache line. Another question: should we move some of this logic to polycubed, i.e. have a single thread that updates a map shared by all cubes? I don't want to end up with N different threads whose only job is updating a counter.
Maybe we can provide a wrapper for this in polycube, something like pcn_get_timestamp().
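
As an illustration of that idea, a hypothetical pcn_get_timestamp() could reduce to a plain map read on the datapath side, with a single polycubed thread refreshing the entry periodically; the map and helper below are assumptions, not an existing polycube API:

BPF_TABLE("array", int, uint64_t, pcn_timestamp, 1);

static __always_inline uint64_t pcn_get_timestamp() {
  int zero = 0;
  /* The control plane refreshes this slot (e.g. once per second), so the
   * datapath never calls bpf_ktime_get_ns() nor takes a write lock. */
  uint64_t *ts = pcn_timestamp.lookup(&zero);
  return ts ? *ts : 0;
}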

} __attribute__((packed));

BPF_TABLE("hash", __be64, struct fwd_entry, fwdtable, 1024);
BPF_TABLE_SHARED("percpu_array", int, uint64_t, timestamp, 1);
Contributor

Does this table need to be SHARED?

Contributor

Could it be just a standard array?

We are not updating it from the datapath, so there should be no advantage in using a percpu map.

Contributor

What if a control plane thread updates the timestamp every second? During the update the lock would be per-CPU, not global.

On the other hand, it is also true that the control plane thread updates the values of all CPUs at the same time, locking them all. It could also be a matter of cache efficiency if we use a plain array instead of a percpu array.

I think it would be interesting to test whether there is a performance difference between the percpu and non-percpu versions.
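
Something like the following two declarations could be compared, reusing the BCC macros already shown in the diff (the map names are illustrative):

/* Per-CPU variant: every CPU holds its own copy; a control-plane update
 * writes one slot per CPU. */
BPF_TABLE_SHARED("percpu_array", int, uint64_t, timestamp_percpu, 1);

/* Plain-array variant: a single shared slot; since the datapath only
 * reads it, there is no write contention to avoid in the first place. */
BPF_TABLE("array", int, uint64_t, timestamp_plain, 1);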

Contributor

In the particular case of arrays there are no locks on the update.

AFAIK percpu maps matter when the datapath updates elements, for example a counter: each CPU has its own copy and can update it without atomic or synchronized operations. When the map is only read by the datapath there is no difference.

@mbertrone mbertrone left a comment
Contributor

Thanks Sebastiano for this PR!

It is interesting to notice the cost of a hash table update, especially on multi-core architectures, where most of the time is spent stalled waiting for the lock on the hash table bucket to be released.

I'll try to address some comments from Mauricio.

  • eBPF has a 64-bit architecture, so u64 updates are atomic.
  • Agree on having a pcn function to get such a timestamp.


@sebymiano sebymiano force-pushed the pr/fix_simplebridge_performance branch 2 times, most recently from ee3be91 to aa76132 on February 5, 2019 10:44
@mauriciovasquezbernal mauriciovasquezbernal force-pushed the pr/fix_simplebridge_performance branch 4 times, most recently from 71ddab3 to d951865 on February 6, 2019 17:09
This commit removes the bpf_ktime() helper from the Simplebridge data
path, which was used to age out old entries from the filtering database.
This helper introduces a performance degradation that can be avoided by
using a custom thread that updates a percpu map with the current
timestamp, in the same way it is done in the Iptables service.

Signed-off-by: Sebastiano Miano <mianosebastiano@gmail.com>
This commit removes the bpf_map_update from the simplebridge standard
path.
In the learning phase, the filtering database is first checked with the
src MAC address of the received packet; if the entry is not present, the
learning is performed, otherwise we just need to update the timestamp (or
the src port). In the latter case, the value of the entry is updated
directly through the pointer returned by the lookup, instead of calling
the update helper.
After some tests, I discovered that the update helper introduces a
significant performance degradation, while updating the value directly
does not affect the overall performance.
With this commit, the multi-core forwarding throughput of the bridge
increased from 3.5 Mpps (with 64-byte packets) to ~11.5 Mpps.

Signed-off-by: Sebastiano Miano <mianosebastiano@gmail.com>
@mauriciovasquezbernal mauriciovasquezbernal force-pushed the pr/fix_simplebridge_performance branch from d951865 to 0531903 on February 6, 2019 18:14
@acloudiator
Contributor

I'll try to address some comments from Mauricio.

  • eBPF has a 64-bit architecture, so u64 updates are atomic.
  • Agree on having a pcn function to get such a timestamp.

@mbertrone @sebymiano Do we plan to hold this PR until the above are addressed?

@sebymiano
Collaborator Author

I think we are ready to merge it, @mauriciovasquezbernal what do you think?

@mauriciovasquezbernal mauriciovasquezbernal left a comment
Contributor

LGTM.

We can merge it now and open an issue for implementing the time_get_sec() support directly in polycube in the future.
