@kiwiflyer
Contributor

No description provided.

@kiwiflyer
Contributor Author

I've manually tested this on 4.7.1 and also on master.

Prior to the patch, the following exception was thrown:

2016-03-03 12:00:13,204 INFO c.c.u.d.T.Transaction (logid:) Is Data Base High Availiability enabled? Ans : true
2016-03-03 12:00:13,239 INFO c.c.u.d.T.Transaction (logid:) The slaves configured for Cloud Data base is/are : localhost,localhost
2016-03-03 12:00:13,303 ERROR c.c.u.d.Merovingian2 (logid:) Unable to get a new db connection
java.sql.SQLException: Invalid load balancing strategy 'com.cloud.utils.db.StaticStrategy'.
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
at com.mysql.jdbc.Util.loadExtensions(Util.java:602)
at com.mysql.jdbc.LoadBalancingConnectionProxy.<init>(LoadBalancingConnectionProxy.java:280)
at com.mysql.jdbc.FailoverConnectionProxy.<init>(FailoverConnectionProxy.java:67)
at com.mysql.jdbc.NonRegisteringDriver.connectFailover(NonRegisteringDriver.java:433)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:346)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:215)
at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:75)
at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1188)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at com.cloud.utils.db.TransactionLegacy.getStandaloneConnectionWithException(TransactionLegacy.java:202)
at com.cloud.utils.db.Merovingian2.<init>(Merovingian2.java:68)
at com.cloud.utils.db.Merovingian2.createLockMaster(Merovingian2.java:88)
at com.cloud.server.LockMasterListener.<init>(LockMasterListener.java:33)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
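
The exception message and the `Util.loadExtensions` frame suggest that the driver tried to load the strategy class by name and could not see it from its class loader. That is an inference from the trace, not something stated in this thread. As a hedged illustration (this is neither the patch nor the driver's actual code), the sketch below checks whether a class name can be loaded through a given class loader. `StrategyLoadCheck` and `isLoadable` are hypothetical names.

```java
public class StrategyLoadCheck {

    /**
     * Returns true if the named class can be found through the given
     * class loader. The class is located but not initialized.
     */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or one of its dependencies) is not visible here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name copied from the exception message above.
        String strategy = "com.cloud.utils.db.StaticStrategy";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        System.out.println(strategy + " loadable: " + isLoadable(strategy, loader));
    }
}
```

Running a check like this from the same class loader that loads the JDBC driver would show whether the two classes can see each other.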

After the patch, the logs show the following:

2016-03-03 14:53:38,437 INFO c.c.u.d.T.Transaction (logid:) Is Data Base High Availiability enabled? Ans : true
2016-03-03 14:53:38,587 INFO c.c.u.d.T.Transaction (logid:) The slaves configured for Cloud Data base is/are : localhost,localhost
2016-03-03 14:53:39,332 DEBUG c.c.u.d.ConnectionConcierge (logid:) Registering a database connection for LockMaster1
2016-03-03 14:53:39,333 INFO c.c.u.d.Merovingian2 (logid:) Cleaning up locks for 2484520639032
2016-03-03 14:53:39,360 INFO c.c.u.d.Merovingian2 (logid:) Released 0 locks for 2484520639032
2016-03-03 14:53:39,447 INFO o.a.c.s.l.CloudStackExtendedLifeCycle (logid:) Running system integrity checker com.cloud.upgrade.DatabaseUpgradeChecker@71323382
2016-03-03 14:53:39,447 INFO c.c.u.DatabaseUpgradeChecker (logid:) Grabbing lock to check for database upgrade.
2016-03-03 14:53:39,634 DEBUG c.c.u.d.VersionDaoImpl (logid:) Checking to see if the database is at a version before it was the version table is created
2016-03-03 14:53:39,692 INFO c.c.u.DatabaseUpgradeChecker (logid:) DB version = 4.9.0 Code Version = 4.9.0-SNAPSHOT
2016-03-03 14:53:39,693 INFO c.c.u.DatabaseUpgradeChecker (logid:) DB version and code version matches so no upgrade needed.

@remibergsma
Contributor

Hi @kiwiflyer, did you test this without MySQL HA as well? I doubt it will work without mysql-connector-java. Also, does this most likely need to be fixed on other distributions too?

@kiwiflyer
Contributor Author

Hi Remi,

Yes, we tested it with HA disabled. David is going to upload the logs for reference.

The mysql-connector is also referenced in the catalina.properties common.loader for Tomcat.
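
As a rough way to confirm that kind of reference, the sketch below reads the `common.loader` entry from properties text and reports whether any element mentions the connector. It is only an illustration: `CommonLoaderCheck` is a hypothetical helper, and the sample paths in `main` are made up rather than taken from a real installation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class CommonLoaderCheck {

    /** Returns true if the common.loader value contains a mysql-connector entry. */
    public static boolean mentionsConnector(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String loader = props.getProperty("common.loader", "");
        // common.loader is a comma-separated list of paths.
        for (String entry : loader.split(",")) {
            if (entry.trim().contains("mysql-connector")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical sample content; real paths differ per distribution.
        String sample = "common.loader=${catalina.base}/lib,"
                + "/usr/share/java/mysql-connector-java.jar\n";
        System.out.println("connector referenced: " + mentionsConnector(sample));
    }
}
```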

-Si

@dmabry
Contributor

dmabry commented Mar 7, 2016

@remibergsma, we loaded the management server with this patch, and it appears to work fine on 4.7.1 with db.ha.enable=false and db.cloud.slaves left blank. See the logs below:

2016-03-07 13:09:47,721 INFO  [c.c.u.d.T.Transaction] (localhost-startStop-1:null) (logid:) Is Data Base High Availiability enabled? Ans : false
2016-03-07 13:09:47,867 DEBUG [c.c.u.d.ConnectionConcierge] (localhost-startStop-1:null) (logid:) Registering a database connection for LockMaster1
2016-03-07 13:09:47,867 INFO  [c.c.u.d.Merovingian2] (localhost-startStop-1:null) (logid:) Cleaning up locks for 233845178587501
2016-03-07 13:09:47,873 INFO  [c.c.u.d.Merovingian2] (localhost-startStop-1:null) (logid:) Released 0 locks for 233845178587501
2016-03-07 13:09:47,897 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle] (localhost-startStop-1:null) (logid:) Running system integrity checker com.cloud.upgrade.DatabaseUpgradeChecker@6fd77398
2016-03-07 13:09:47,898 INFO  [c.c.u.DatabaseUpgradeChecker] (localhost-startStop-1:null) (logid:) Grabbing lock to check for database upgrade.
2016-03-07 13:09:47,942 DEBUG [c.c.u.d.VersionDaoImpl] (localhost-startStop-1:null) (logid:) Checking to see if the database is at a version before it was the version table is created
2016-03-07 13:09:47,960 INFO  [c.c.u.DatabaseUpgradeChecker] (localhost-startStop-1:null) (logid:) DB version = 4.7.1 Code Version = 4.7.1
2016-03-07 13:09:47,961 INFO  [c.c.u.DatabaseUpgradeChecker] (localhost-startStop-1:null) (logid:) DB version and code version matches so no upgrade needed.
2016-03-07 13:09:47,961 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle] (localhost-startStop-1:null) (logid:) Configuring CloudStack Components
2016-03-07 13:09:47,963 INFO  [o.a.c.s.l.CloudStackExtendedLifeCycle] (localhost-startStop-1:null) (logid:) Done Configuring CloudStack Components
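
For reference, the two settings used in this test (`db.ha.enable` and `db.cloud.slaves`) can be read as in the sketch below. This is an assumption-laden illustration, not how CloudStack itself parses its configuration; `DbHaSettings` is a hypothetical class, and only the two property names come from this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class DbHaSettings {
    public final boolean haEnabled;
    public final List<String> slaves;

    private DbHaSettings(boolean haEnabled, List<String> slaves) {
        this.haEnabled = haEnabled;
        this.slaves = slaves;
    }

    /** Parses properties text; a missing or blank slave list yields an empty list. */
    public static DbHaSettings parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        boolean enabled = Boolean.parseBoolean(props.getProperty("db.ha.enable", "false").trim());
        List<String> slaves = new ArrayList<>();
        for (String host : props.getProperty("db.cloud.slaves", "").split(",")) {
            if (!host.trim().isEmpty()) {
                slaves.add(host.trim());
            }
        }
        return new DbHaSettings(enabled, slaves);
    }

    public static void main(String[] args) {
        // Mirrors the non-HA test described above.
        DbHaSettings s = parse("db.ha.enable=false\ndb.cloud.slaves=\n");
        System.out.println("HA enabled: " + s.haEnabled + ", slaves: " + s.slaves);
    }
}
```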

@kiwiflyer
Contributor Author

@remibergsma Regarding other distributions: yes, this might also be broken on Ubuntu. We don't use Ubuntu, so it would be good to have someone who can test it chime in.

@Slair1
Contributor

Slair1 commented Mar 9, 2016

Thanks guys. FYI, I applied these changes to my 4.8 source, running on CentOS 7, and it fixed the problem for me as well. Thanks again!

@Slair1
Contributor

Slair1 commented Mar 10, 2016

It looks like the cloudstack-usage server has a similar problem. It continuously restarts if db.ha.enable = true.

I got it to work in my environment with the change in this PR: #1433

@kiwiflyer
Contributor Author

@Slair1 @dmabry @remibergsma

Could I get some review on this when you get a chance? I'd like to get this PR moving.

@Slair1
Contributor

Slair1 commented Apr 29, 2016

This LGTM. We have been running this in our environment for over a month (ACS 4.8). This PR fixes the issue we had with cloudstack-management not starting when SQL HA is enabled.

On a related note, we may have another issue somewhere else. When CloudStack has failed over to the slave SQL server, we cannot launch a console or reboot a system VM from the web interface. @kiwiflyer, have you tested those two actions when failed over to the slave SQL server? I just noticed this on our side yesterday while doing maintenance.

@kiwiflyer
Contributor Author

@Slair1 Are you failing over an ACS management server in addition to the MySQL instance?
If so, make sure your load balancer is also moving the traffic to the backup server. Host communication occurs on port 8250.
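
A quick way to verify that port 8250 is reachable through the load balancer is a plain TCP connect test. The sketch below is a generic check and not part of CloudStack; `PortCheck` is a hypothetical name, and the host defaults to `localhost` unless one is passed as an argument.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    public static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host.
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the load balancer address as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = 8250; // host communication port mentioned above
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

A successful connect only shows that something is listening on the port; it does not prove the traffic reaches the intended management server.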

@Slair1
Contributor

Slair1 commented Apr 29, 2016

@kiwiflyer No, we were not failing over our ACS mgmt servers. We still had both up and functional and were able to log directly into both (and of course through our load-balancer). The load-balancing of the ACS mgmt servers works great. I tried opening console windows through both mgmt servers.

When I run netstat on the mgmt servers while attempting to open a console, I see them trying to contact the offline MySQL server. Most other functions we have tested work fine, and netstat shows a slew of established connections to the slave MySQL server (as it should). But for some reason, when we attempt to open a console, it doesn't work and we see the mgmt server try to connect to the offline MySQL server.

As soon as the primary MySQL server is back online, the console windows immediately work.

@kiwiflyer
Contributor Author

Interesting. I'll go digging into the code. I'm not that familiar with the console proxy (yet).

@yadvr
Member

yadvr commented May 2, 2016

tag:needlove

@Slair1
Contributor

Slair1 commented May 2, 2016

@kiwiflyer and @rhtyd I've been troubleshooting the proxy issue, and it turned out to be a problem in my environment! I have things worked out and everything works fine.

@dmabry
Contributor

dmabry commented May 6, 2016

I ran this in our lab and tested the failover; it works as expected. LGTM

@swill
Contributor

swill commented May 6, 2016

@Slair1 is that confirmation that this code worked for you? It is a bit unclear what you were communicating. Thx.

@Slair1
Contributor

Slair1 commented May 6, 2016

@swill Yes, it is confirmation that the code worked for me.

@swill
Contributor

swill commented May 6, 2016

@Slair1 perfect, thanks for confirming. 👍

@dmabry
Contributor

dmabry commented May 6, 2016

tag:mergeready

@yadvr
Member

yadvr commented May 6, 2016

LGTM (code review only)

@swill
Contributor

swill commented May 6, 2016

Ok, I will get CI run against this one to make sure that nothing else is broken. This is ready pending the CI run...

@swill
Contributor

swill commented May 9, 2016

CI RESULTS

Tests Run: 85
  Skipped: 0
   Failed: 2
   Errors: 0
 Duration: 9h 09m 47s

Summary of the problem(s):

FAIL: Test redundant router internals
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_routers_network_ops.py", line 483, in test_02_RVR_Network_FW_PF_SSH_default_routes_egress_false
    "Attempt to retrieve google.com index page should be successful once rule is added!"
AssertionError: Attempt to retrieve google.com index page should be successful once rule is added!
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_7L4ZQC/results.txt
FAIL: test_04_rvpc_privategw_static_routes (integration.smoke.test_privategw_acl.TestPrivateGwACL)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 277, in test_04_rvpc_privategw_static_routes
    self.performVPCTests(vpc_off)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 324, in performVPCTests
    self.check_pvt_gw_connectivity(vm1, public_ip_1, vm2.nic[0].ipaddress)
  File "/data/git/cs1/cloudstack/test/integration/smoke/test_privategw_acl.py", line 559, in check_pvt_gw_connectivity
    "Ping to outside world from VM should be successful"
AssertionError: Ping to outside world from VM should be successful
----------------------------------------------------------------------
Additional details in: /tmp/MarvinLogs/test_network_7L4ZQC/results.txt

Associated Uploads

/tmp/MarvinLogs/DeployDataCenter__May_07_2016_07_10_28_C71Y5A:

/tmp/MarvinLogs/test_network_7L4ZQC:

/tmp/MarvinLogs/test_vpc_routers_RRH1O6:

Uploads will be available until 2016-07-09 02:00:00 +0200 CEST

Comment created by upr comment.

@swill
Contributor

swill commented May 9, 2016

I think this one is ready. The failures are things that periodically fail in my environment and are unrelated to this code. Thanks...

@asfgit merged commit c22659d into apache:master on May 11, 2016
asfgit pushed a commit that referenced this pull request May 11, 2016
Addresses CLOUDSTACK-9300 where the MySQL HA StaticStrategy class fai

* pr/1428:
  Addresses CLOUDSTACK-9300 where the MySQL HA StaticStrategy class fails to load successfully

Signed-off-by: Will Stevens <williamstevens@gmail.com>
aaronhurt pushed a commit to myENA/cloudstack that referenced this pull request Jul 12, 2016
Addresses CLOUDSTACK-9300 where the MySQL HA StaticStrategy class fails to load successfully