@ProjectMoon

New version of #1450, opened against the 4.8 branch.

Original Description
This pull request fixes a concurrency issue when disabling static NAT on a bunch of IPs simultaneously. Under the old behavior, executing multiple disable requests would result in invalid IP associations being sent to the virtual router. This commit changes the behavior to apply an IP association for only the IP being added/released, which means that it is impossible for the virtual router to receive invalid data.

This was tested against a virtual router running on KVM and VMware. It would be nice to have some input on how this change could affect redundant routers and other static NAT providers.

@ProjectMoon changed the title from "Enable/disable static NAT associates only relevant IPs." to "CLOUDSTACK-9317: Enable/disable static NAT associates only relevant IPs." on Aug 2, 2016
@blueorangutan

RPM packages built and available at: http://packages.shapeblue.com/cloudstack/custom/github-1623.


for (StaticNat snat : staticNats) {
    userIps.add(_ipAddressDao.findById(snat.getSourceIpAddressId()));
}
Contributor

Executing queries in a for loop is unnecessarily expensive. Could you please refactor this for loop into a new method on IPAddressDao that retrieves a list of IP addresses for a list of IP address IDs?

Author

This makes sense, yes. Better to have one database query than many.
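
To illustrate the point about one query versus many, here is a minimal, self-contained sketch. Everything in it is hypothetical: `IpRecord`, `InMemoryIpDao`, and the batch method name `listByIds` are stand-ins invented for this example and are not the real IPAddressDao API. The in-memory map only simulates a database, and `queryCount` counts simulated round trips.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these types are hypothetical stand-ins, not CloudStack classes.
class BatchIpLookupSketch {

    /** Minimal stand-in for an IP address row. */
    static final class IpRecord {
        final long id;
        final String address;

        IpRecord(long id, String address) {
            this.id = id;
            this.address = address;
        }
    }

    /** In-memory DAO that counts each simulated database round trip. */
    static final class InMemoryIpDao {
        private final Map<Long, IpRecord> rows = new LinkedHashMap<>();
        int queryCount = 0;

        void save(IpRecord record) {
            rows.put(record.id, record);
        }

        /** One round trip per call, as in the loop under review. */
        IpRecord findById(long id) {
            queryCount++;
            return rows.get(id);
        }

        /** Assumed batch method: one round trip for the whole list (think WHERE id IN (...)). */
        List<IpRecord> listByIds(List<Long> ids) {
            queryCount++;
            List<IpRecord> found = new ArrayList<>();
            for (Long id : ids) {
                IpRecord record = rows.get(id);
                if (record != null) {
                    found.add(record);
                }
            }
            return found;
        }
    }

    public static void main(String[] args) {
        InMemoryIpDao dao = new InMemoryIpDao();
        dao.save(new IpRecord(1L, "203.0.113.10"));
        dao.save(new IpRecord(2L, "203.0.113.11"));
        dao.save(new IpRecord(3L, "203.0.113.12"));
        List<Long> ids = Arrays.asList(1L, 2L, 3L);

        // Per-ID lookups: the query count grows with the number of IDs.
        List<IpRecord> looped = new ArrayList<>();
        for (Long id : ids) {
            looped.add(dao.findById(id));
        }
        System.out.println("loop queries:  " + dao.queryCount);

        // Batch lookup: a single query regardless of list size.
        dao.queryCount = 0;
        List<IpRecord> batched = dao.listByIds(ids);
        System.out.println("batch queries: " + dao.queryCount + ", rows: " + batched.size());
    }
}
```

In a real DAO the batch method would presumably translate to a single `IN` query; how that is best expressed with CloudStack's own search builders is outside the scope of this sketch.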

@jburwell
Contributor

jburwell commented Aug 4, 2016

@ProjectMoon have you run the test_redundant_router_cleanups, test_redundant_router_services, and test_redundant_router_upgrades test cases for this PR?

Also, is there a JIRA ticket associated with this change?

@ProjectMoon
Author

JIRA ticket is CLOUDSTACK-9317. Have not run the Marvin tests, but will do so.

@yadvr
Member

yadvr commented Aug 5, 2016

@blueorangutan kick

@blueorangutan

A Trillian-Jenkins job has been kicked to build packages and start testing. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian repo: http://packages.shapeblue.com/cloudstack/pr/1623

@jburwell
Contributor

If possible, I would like to get this fix into 4.8.2.0/4.9.1.0, but I will be cutting the RC shortly.

@ProjectMoon could you please amend your commit message to include the JIRA ticket ID and an explanation of the motivation for the change?

@rhtyd @borisstoyanov can we push this PR through Trillian with smoke tests and the following test suites:

  • test_redundant_router_cleanups
  • test_redundant_router_services
  • test_redundant_router_upgrades

@murali-reddy Do you think we need to run any other VR test cases for proper regression testing?

@blueorangutan

Trillian test result (trillian-pr1623-38-vmware-55u3-cs48):
Test completed. 37 look ok, 12 have errors

Test Result Time (s)
test_createRegion Success 0.035
test_DeployVmAntiAffinityGroup_in_project Success 267.303
test_01_router_internal_basic Success 0.606
test_02_router_internal_adv Success 1.027
test_03_restart_network_cleanup Success 140.944
test_04_restart_network_wo_cleanup Success 5.607
test_05_router_basic Success 0.029
test_06_router_advanced Success 0.046
test_07_stop_router Success 25.243
test_08_start_router Success 120.828
test_09_reboot_router Success 130.835
test_DeployVmAntiAffinityGroup Success 196.684
test_01_scale_vm Skipped 66.441
test_deploy_vgpu_enabled_vm Skipped 0.004
test_deploy_vm_from_iso Success 843.661
test_01_sys_vm_start Success 0.148
test_02_sys_template_ready Success 0.099
test_00_deploy_vm_root_resize Success 6.498
test_01_deploy_vm_root_resize Success 6.235
test_02_deploy_vm_root_resize Success 6.266
test_deployvm_firstfit Success 206.520
test_deployvm_userconcentrated Success 121.187
test_deployvm_userdispersing Success 55.752
test_01_create_service_offering Success 0.108
test_02_edit_service_offering Success 0.094
test_03_delete_service_offering Success 0.045
test_04_change_offering_small Success 97.309
test_deployvm_userdata Success 161.303
test_deployvm_userdata_post Success 20.425
test_01_create_disk_offering Success 0.113
test_02_create_sparse_type_disk_offering Success 0.075
test_04_create_fat_type_disk_offering Success 0.076
test_02_edit_disk_offering Success 0.048
test_03_delete_disk_offering Success 0.042
test_01_snapshot_root_disk Success 222.219
test_UpdateConfigParamWithScope Success 0.165
ContextSuite context=TestDedicateGuestVlanRange>:setup Error 0.000
test_01_list_sec_storage_vm Success 0.157
test_02_list_cpvm_vm Success 0.139
test_03_ssvm_internals Success 3.896
test_04_cpvm_internals Success 1.199
test_05_stop_ssvm Success 204.223
test_06_stop_cpvm Success 181.887
test_07_reboot_ssvm Success 158.813
test_08_reboot_cpvm Success 156.708
test_09_destroy_ssvm Success 233.966
test_10_destroy_cpvm Success 236.990
test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 Failure 441.264
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 Failure 546.859
test_03_vpc_internallb_haproxy_stats_on_all_interfaces Error 441.481
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Error 516.759
test_01_create_template Success 146.055
test_CreateTemplateWithDuplicateName Success 251.738
test_02_edit_template Success 90.149
test_03_delete_template Success 5.145
test_04_extract_template Success 10.191
test_05_template_permissions Success 0.040
test_06_copy_template Skipped 0.000
test_07_list_public_templates Success 0.025
test_08_list_system_templates Success 0.056
test_01_create_iso Success 66.332
test_02_edit_iso Success 0.126
test_03_delete_iso Success 95.195
test_04_extract_Iso Success 5.192
test_05_iso_permissions Success 0.048
test_06_copy_iso Skipped 0.000
test_07_list_default_iso Success 0.039
test_01_create_lb_rule_src_nat Success 187.843
test_02_create_lb_rule_non_nat Success 187.338
test_assign_and_removal_lb Success 133.652
login_test_saml_user Success 22.090
test_nic_secondaryip_add_remove Success 192.737
test_advZoneVirtualRouter Success 0.020
test_deploy_vm Success 0.018
test_deploy_vm_multiple Success 283.325
test_01_stop_vm Success 10.132
test_02_start_vm Success 20.214
test_03_reboot_vm Success 5.124
test_06_destroy_vm Success 5.108
test_07_restore_vm Success 0.102
test_08_migrate_vm Success 76.170
test_09_expunge_vm Success 125.236
test_10_attachAndDetach_iso Success 72.134
test_network_acl Success 201.902
test_delete_account Success 278.007
test_01_port_fwd_on_src_nat Success 111.800
test_02_port_fwd_on_non_src_nat Success 55.628
test_public_ip_admin_account Success 40.272
test_public_ip_user_account Success 10.259
test_reboot_router Success 625.125
test_releaseIP Success 238.215
test_network_rules_acquired_public_ip_1_static_nat_rule Success 124.311
test_network_rules_acquired_public_ip_2_nat_rule Success 61.609
test_network_rules_acquired_public_ip_3_Load_Balancer_Rule Success 66.731
test_01_test_vm_volume_snapshot Error 156.243
test_01_create_vm_snapshots Success 161.736
test_02_revert_vm_snapshots Success 194.222
test_03_delete_vm_snapshots Success 275.205
test_vm_nic_adapter_vmxnet3 Skipped 0.000
test_01_nic Success 868.217
test_01_create_volume Success 520.190
test_02_attach_volume Success 58.866
test_03_download_attached_volume Success 20.328
test_04_delete_attached_volume Success 15.258
test_05_detach_volume Success 100.311
test_06_download_detached_volume Success 60.581
test_07_resize_fail Skipped 10.258
test_08_resize_volume Skipped 10.193
test_09_delete_detached_volume Success 30.919
test_extendPhysicalNetworkVlan Error 0.022
test_UpdateStorageOverProvisioningFactor Success 0.123
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Success 1213.876
test_02_redundant_VPC_default_routes Success 579.689
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Success 759.643
test_04_rvpc_network_garbage_collector_nics Success 865.118
test_05_rvpc_multi_tiers Success 662.118
test_isolate_network_password_server Failure 166.672
test_createPortablePublicIPAcquire Success 15.658
test_createPortablePublicIPRange Success 15.391
test_01_VPC_nics_after_destroy Success 637.734
test_02_VPC_default_routes Success 393.037
test_01_primary_storage_iscsi Skipped 0.028
test_01_primary_storage_nfs Success 38.355
test_01_vpc_privategw_acl Failure 101.960
test_02_vpc_privategw_static_routes Failure 207.667
test_03_vpc_privategw_restart_vpc_cleanup Failure 242.957
test_04_rvpc_privategw_static_routes Failure 434.113
test_01_redundant_vpc_site2site_vpn Error 692.909
ContextSuite context=TestRVPCSite2SiteVpn>:teardown Error 698.010
test_01_vpc_remote_access_vpn Error 0.064
test_01_vpc_site2site_vpn Error 486.664
test_dedicatePublicIpRange Error 0.298
test_create_pvlan_network Success 5.216
test_01_quota Error 0.033
test_02_quota Error 0.029
test_03_quota Error 0.028
test_04_quota Error 0.029
test_05_quota Error 0.027
test_06_quota Error 0.032
test_07_quota Error 0.031
test_01_reset_vm_on_reboot Success 25.316
test_01_updatevolumedetail Success 0.122
test_router_dhcphosts Failure 166.785
ContextSuite context=TestRouterDHCPHosts>:teardown Error 187.075
test_02_routervm_iptables_policies Failure 231.410
test_01_single_VPC_iptables_policies Failure 371.973
test_01_isolate_network_FW_PF_default_routes_egress_true Success 258.519
test_02_isolate_network_FW_PF_default_routes_egress_false Success 268.553
test_01_RVR_Network_FW_PF_SSH_default_routes_egress_true Failure 410.379
test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false Failure 361.440
test_03_RVR_Network_check_router_state Success 369.925

Trillian env - trillian-pr1623-38-vmware-55u3-cs48, Job ID 38
Hypervisor: vmware-55u3 (x2), Advanced Zone
Mgmt host os - 6
Marvin logs at: http://packages.shapeblue.com/cloudstack/pr/1623/trillian/trillian-pr1623-38-vmware-55u3-cs48

@ProjectMoon
Author

It's been amended and rebased. I still haven't had time to add a new DAO method to the IP address DAO. I can work on that now though. When is the RC being cut?

@jburwell
Contributor

@ProjectMoon I am trying to cut the 4.8.2.0 RC ASAP. This PR looks like it is not going to make it given the test failures. Do you mind re-pointing it to 4.9? We will make this PR a high priority for 4.9.2.0.

@ProjectMoon
Author

Yeah. Unfortunately I have not had any time to work on the pull requests lately. When I have time I will reopen it against 4.9.

@jburwell
Contributor

@ProjectMoon I am extending the date for 4.8 to 25 Sept 2016. Therefore, if we can get the test failure fixed, we can get it into 4.8.2.0, 4.9.1.0, and 4.10.0.0. Also, could you please investigate the Travis and Jenkins build failures?

@yadvr
Member

yadvr commented Oct 21, 2016

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-77

@ProjectMoon
Author

This one hasn't been rebased in quite a while, so the packages produced will be rather old. Going to rebase this against latest 4.8.

@yadvr
Member

yadvr commented Nov 20, 2016

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-207

The previous behavior would send a list of all IPs with a state of
add/revoke. This caused concurrency issues when multiple IPs were
being added or removed, and could leave networks with the wrong IP
addresses assigned to the devices. By assigning and removing only the
IP being changed per request, the concurrency issue is removed and no
real performance loss occurs.
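
The difference between the two behaviors described in this commit message can be sketched with a simplified model. This is an assumption-laden illustration, not the CloudStack code: the `State` enum, both method names, and the IP addresses are hypothetical, and the "payload" is just a list of address strings standing in for whatever is actually sent to the virtual router.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified model, not the CloudStack implementation.
class IpAssocPayloadSketch {

    /** Hypothetical states; the commit message only speaks of "add/revoke". */
    enum State { STABLE, ADD, REVOKE }

    /** Previous behavior as described: include every IP currently marked add/revoke. */
    static List<String> payloadForAllPending(Map<String, State> ips) {
        List<String> payload = new ArrayList<>();
        for (Map.Entry<String, State> entry : ips.entrySet()) {
            if (entry.getValue() != State.STABLE) {
                payload.add(entry.getKey());
            }
        }
        return payload;
    }

    /** New behavior as described: include only the IP this request is changing. */
    static List<String> payloadForRequest(String requestedIp) {
        List<String> payload = new ArrayList<>();
        payload.add(requestedIp);
        return payload;
    }

    public static void main(String[] args) {
        Map<String, State> ips = new LinkedHashMap<>();
        ips.put("203.0.113.10", State.REVOKE); // being released by request A
        ips.put("203.0.113.11", State.REVOKE); // being released by request B at the same time
        ips.put("203.0.113.12", State.STABLE);

        System.out.println("request A, all pending: " + payloadForAllPending(ips));
        System.out.println("request A, single IP:   " + payloadForRequest("203.0.113.10"));
    }
}
```

In this model, request A's payload under the old behavior also carries the IP that request B is handling, which is the kind of overlap the commit message blames for the wrong associations; the per-request payload cannot overlap by construction.
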
@ProjectMoon
Author

Updated to latest 4.8.

@yadvr
Member

yadvr commented Nov 23, 2016

@ProjectMoon thanks
@murali-reddy can you help review this?
@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-231

@yadvr
Member

yadvr commented Nov 24, 2016

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-420)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 28011 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1623-t420-kvm-centos7.zip
Test completed. 42 look ok, 1 have error(s)

Test Result Time (s) Test File
test_05_rvpc_multi_tiers Failure 255.12 test_vpc_redundant.py
ContextSuite context=TestVPCRedundancy>:teardown Error 565.30 test_vpc_redundant.py
test_01_vpc_site2site_vpn Success 202.68 test_vpc_vpn.py
test_01_vpc_remote_access_vpn Success 76.67 test_vpc_vpn.py
test_01_redundant_vpc_site2site_vpn Success 299.83 test_vpc_vpn.py
test_02_VPC_default_routes Success 344.28 test_vpc_router_nics.py
test_01_VPC_nics_after_destroy Success 604.80 test_vpc_router_nics.py
test_04_rvpc_network_garbage_collector_nics Success 1490.00 test_vpc_redundant.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Success 639.50 test_vpc_redundant.py
test_02_redundant_VPC_default_routes Success 810.54 test_vpc_redundant.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Success 1364.84 test_vpc_redundant.py
test_09_delete_detached_volume Success 16.35 test_volumes.py
test_08_resize_volume Success 16.01 test_volumes.py
test_07_resize_fail Success 21.15 test_volumes.py
test_06_download_detached_volume Success 15.66 test_volumes.py
test_05_detach_volume Success 100.31 test_volumes.py
test_04_delete_attached_volume Success 10.25 test_volumes.py
test_03_download_attached_volume Success 15.42 test_volumes.py
test_02_attach_volume Success 74.83 test_volumes.py
test_01_create_volume Success 721.35 test_volumes.py
test_deploy_vm_multiple Success 366.43 test_vm_life_cycle.py
test_deploy_vm Success 0.03 test_vm_life_cycle.py
test_advZoneVirtualRouter Success 0.02 test_vm_life_cycle.py
test_10_attachAndDetach_iso Success 27.28 test_vm_life_cycle.py
test_09_expunge_vm Success 125.26 test_vm_life_cycle.py
test_08_migrate_vm Success 41.44 test_vm_life_cycle.py
test_07_restore_vm Success 0.26 test_vm_life_cycle.py
test_06_destroy_vm Success 126.82 test_vm_life_cycle.py
test_03_reboot_vm Success 126.19 test_vm_life_cycle.py
test_02_start_vm Success 10.26 test_vm_life_cycle.py
test_01_stop_vm Success 40.48 test_vm_life_cycle.py
test_CreateTemplateWithDuplicateName Success 116.36 test_templates.py
test_08_list_system_templates Success 0.05 test_templates.py
test_07_list_public_templates Success 0.07 test_templates.py
test_05_template_permissions Success 0.12 test_templates.py
test_04_extract_template Success 5.26 test_templates.py
test_03_delete_template Success 5.14 test_templates.py
test_02_edit_template Success 90.18 test_templates.py
test_01_create_template Success 156.90 test_templates.py
test_10_destroy_cpvm Success 161.83 test_ssvm.py
test_09_destroy_ssvm Success 163.73 test_ssvm.py
test_08_reboot_cpvm Success 132.03 test_ssvm.py
test_07_reboot_ssvm Success 133.76 test_ssvm.py
test_06_stop_cpvm Success 162.50 test_ssvm.py
test_05_stop_ssvm Success 134.06 test_ssvm.py
test_04_cpvm_internals Success 1.17 test_ssvm.py
test_03_ssvm_internals Success 3.38 test_ssvm.py
test_02_list_cpvm_vm Success 0.12 test_ssvm.py
test_01_list_sec_storage_vm Success 0.13 test_ssvm.py
test_01_snapshot_root_disk Success 11.32 test_snapshots.py
test_04_change_offering_small Success 210.02 test_service_offerings.py
test_03_delete_service_offering Success 0.04 test_service_offerings.py
test_02_edit_service_offering Success 0.09 test_service_offerings.py
test_01_create_service_offering Success 0.29 test_service_offerings.py
test_02_sys_template_ready Success 0.12 test_secondary_storage.py
test_01_sys_vm_start Success 0.17 test_secondary_storage.py
test_09_reboot_router Success 45.57 test_routers.py
test_08_start_router Success 41.15 test_routers.py
test_07_stop_router Success 10.62 test_routers.py
test_06_router_advanced Success 0.06 test_routers.py
test_05_router_basic Success 0.06 test_routers.py
test_04_restart_network_wo_cleanup Success 5.89 test_routers.py
test_03_restart_network_cleanup Success 91.34 test_routers.py
test_02_router_internal_adv Success 1.15 test_routers.py
test_01_router_internal_basic Success 0.63 test_routers.py
test_router_dhcphosts Success 348.17 test_router_dhcphosts.py
test_01_updatevolumedetail Success 0.10 test_resource_detail.py
test_01_reset_vm_on_reboot Success 171.74 test_reset_vm_on_reboot.py
test_createRegion Success 0.11 test_regions.py
test_create_pvlan_network Success 5.60 test_pvlan.py
test_dedicatePublicIpRange Success 1.31 test_public_ip_range.py
test_04_rvpc_privategw_static_routes Success 667.24 test_privategw_acl.py
test_03_vpc_privategw_restart_vpc_cleanup Success 564.57 test_privategw_acl.py
test_02_vpc_privategw_static_routes Success 464.09 test_privategw_acl.py
test_01_vpc_privategw_acl Success 116.21 test_privategw_acl.py
test_01_primary_storage_nfs Success 36.26 test_primary_storage.py
test_createPortablePublicIPRange Success 15.27 test_portable_publicip.py
test_createPortablePublicIPAcquire Success 15.62 test_portable_publicip.py
test_isolate_network_password_server Success 87.15 test_password_server.py
test_UpdateStorageOverProvisioningFactor Success 0.17 test_over_provisioning.py
test_extendPhysicalNetworkVlan Success 15.53 test_non_contigiousvlan.py
test_01_nic Success 759.98 test_nic.py
test_releaseIP Success 356.34 test_network.py
test_reboot_router Success 437.86 test_network.py
test_public_ip_user_account Success 10.61 test_network.py
test_public_ip_admin_account Success 40.34 test_network.py
test_network_rules_acquired_public_ip_3_Load_Balancer_Rule Success 67.51 test_network.py
test_network_rules_acquired_public_ip_2_nat_rule Success 62.15 test_network.py
test_network_rules_acquired_public_ip_1_static_nat_rule Success 124.43 test_network.py
test_delete_account Success 345.77 test_network.py
test_02_port_fwd_on_non_src_nat Success 56.08 test_network.py
test_01_port_fwd_on_src_nat Success 112.51 test_network.py
test_nic_secondaryip_add_remove Success 236.12 test_multipleips_per_nic.py
login_test_saml_user Success 29.14 test_login.py
test_assign_and_removal_lb Success 135.14 test_loadbalance.py
test_02_create_lb_rule_non_nat Success 189.02 test_loadbalance.py
test_01_create_lb_rule_src_nat Success 210.45 test_loadbalance.py
test_07_list_default_iso Success 0.09 test_iso.py
test_05_iso_permissions Success 0.09 test_iso.py
test_04_extract_Iso Success 6.41 test_iso.py
test_03_delete_iso Success 95.13 test_iso.py
test_02_edit_iso Success 0.09 test_iso.py
test_01_create_iso Success 42.23 test_iso.py
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Success 401.55 test_internal_lb.py
test_03_vpc_internallb_haproxy_stats_on_all_interfaces Success 219.33 test_internal_lb.py
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 Success 625.53 test_internal_lb.py
test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 Success 518.04 test_internal_lb.py
test_dedicateGuestVlanRange Success 10.30 test_guest_vlan_range.py
test_UpdateConfigParamWithScope Success 0.34 test_global_settings.py
test_04_create_fat_type_disk_offering Success 0.28 test_disk_offerings.py
test_03_delete_disk_offering Success 0.16 test_disk_offerings.py
test_02_edit_disk_offering Success 0.43 test_disk_offerings.py
test_02_create_sparse_type_disk_offering Success 0.38 test_disk_offerings.py
test_01_create_disk_offering Success 0.89 test_disk_offerings.py
test_deployvm_userdispersing Success 21.42 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_userconcentrated Success 45.96 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_firstfit Success 50.81 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_userdata_post Success 56.43 test_deploy_vm_with_userdata.py
test_deployvm_userdata Success 56.41 test_deploy_vm_with_userdata.py
test_02_deploy_vm_root_resize Success 7.61 test_deploy_vm_root_resize.py
test_01_deploy_vm_root_resize Success 7.14 test_deploy_vm_root_resize.py
test_00_deploy_vm_root_resize Success 239.89 test_deploy_vm_root_resize.py
test_deploy_vm_from_iso Success 234.26 test_deploy_vm_iso.py
test_DeployVmAntiAffinityGroup Success 97.00 test_affinity_groups.py
test_03_delete_vm_snapshots Skipped 0.00 test_vm_snapshots.py
test_02_revert_vm_snapshots Skipped 0.00 test_vm_snapshots.py
test_01_test_vm_volume_snapshot Skipped 0.00 test_vm_snapshots.py
test_01_create_vm_snapshots Skipped 0.00 test_vm_snapshots.py
test_06_copy_template Skipped 0.00 test_templates.py
test_01_scale_vm Skipped 0.00 test_scale_vm.py
test_01_primary_storage_iscsi Skipped 0.04 test_primary_storage.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_deploy_vgpu_enabled_vm Skipped 0.01 test_deploy_vgpu_enabled_vm.py

@yadvr
Member

yadvr commented Nov 25, 2016

@abhinandanprateek @murali-reddy and others -- can we have a review of this? Thanks.

@yadvr
Member

yadvr commented Nov 28, 2016

@abhinandanprateek @murali-reddy ping

@ProjectMoon
Author

Closed in favor of #1908
