Bxp Release 10
From All n One's bxp software Wixi
1 Overview
In December 2016, All n One signed with Sungard for a further 3 years, including a technical infrastructure refresh. This refresh would draw on lessons from a number of past experiences and also allow the service to expand into new capabilities.
bxp Release 10 went live on Sunday 26th of March 2017. The primary focus was to
- replace the infrastructure's physical servers,
- upgrade the server operating systems and
- perform a database engine upgrade.
2 Background
2.1 Primary Notes
In June 2016, All n One retained the skills of Jason Ryan for the Infrastructure department. Infrastructure was previously managed by the Frameworks and then the Security departments, so this move represented a dedicated focus and effort on ensuring and improving equipment and service availability.
Ignoring other elements of the bxp infrastructure, we start with the following.
Web1 and Web2 are Windows Server 2008 R2 servers running IIS. bxp's functionality is delivered through these servers.
Currently a component, MailBee, is used to retrieve emails from an inbox. https://www.afterlogic.com/mailbee/objects
The web servers communicate securely with Db1 using MyODBC connections over SSL.
Data is copied nightly from Db1 to Db2, meaning that in the event of a catastrophic failure of Db1, Db2 can take over very quickly.
The speed of takeover varies with staff, experience and circumstance, and so All n One are seeking to improve things on a number of fronts.
Db1 and Db2 were running MySQL 5.6 Community Server edition. The list of companies using MySQL is long and distinguished. https://www.mysql.com/customers/
2.2 Replication
When chatting with Denis Creighton and the technical team in FEXCO (who use MySQL in-house as well as bxp), we discovered that MySQL 5.7 had some distinct advantages over 5.6.
The primary one was that replication functionality was available in the Community Server edition rather than only in the Cluster server version. This functionality had also been extended to work on the MyISAM data table format.
These changes were recent and significant enough to warrant exploring the upgrade.
http://dev.mysql.com/doc/refman/5.7/en/replication.html
The master / slave functionality now means that Db2 is kept up to date in real time and can serve as a database server. It is also now capable of responding to queries, allowing it to become operational rather than just a backup server waiting in the wings.
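Before live queries are sent to a replica, it helps to confirm that replication is actually keeping up. The sketch below interprets one row of `SHOW SLAVE STATUS` output from MySQL 5.7. The `Slave_IO_Running`, `Slave_SQL_Running` and `Seconds_Behind_Master` columns are part of that output; the function name and the 5 second lag threshold are illustrative assumptions, not bxp settings.

```python
from typing import Any, Mapping


def replica_is_healthy(status: Mapping[str, Any], max_lag_seconds: int = 5) -> bool:
    """Decide whether a replica is fit to serve reads.

    `status` is one row of SHOW SLAVE STATUS as a dict (for example from a
    dictionary cursor). `max_lag_seconds` is an assumed threshold.
    """
    # Both replication threads must be running.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    # Seconds_Behind_Master is NULL (None) when the lag cannot be determined.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False
    return int(lag) <= max_lag_seconds
```

A monitoring job could call this periodically and stop routing queries to Db2 whenever it returns `False`.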
2.3 Operational challenges
As bxp has grown we have bumped into some specific areas of operational challenge. We have met most of them, but some have been long-standing and are being addressed.
Service growth has seen the number of requests to our servers more than double in a year.
| Type | 2015 | 2016 |
|---|---|---|
| Total Hits | 110,815,438 | 263,223,969 |
| Visitor Hits | 110,787,219 | 263,199,942 |
| Spider Hits | 28,219 | 24,027 |
| Average Hits per Day | 303,603 | 719,191 |
| Average Hits per Visitor | 7,899.8 | 22,740.6 |
| Cached Requests | 45,995,799 | 147,889,230 |
| Failed Requests | 41,191 | 342,970 |
| Total Page Views | 45,580,189 | 97,936,690 |
| Average Page Views per Day | 124,877 | 267,586 |
| Average Page Views per Visitor | 3,250.2 | 8,461.8 |
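The headline ratios can be derived directly from the table. The short sketch below (variable and function names are illustrative) recomputes the daily averages and adds two figures the table does not list: the year-on-year growth factor and the failed-request share. The daily averages match the table when rounded down and when 2016 is treated as a leap year. Traffic grew by a factor of roughly 2.4, while the failed-request share rose from about 0.04% to about 0.13% of all hits.

```python
import calendar

# Figures copied from the table above.
total_hits = {2015: 110_815_438, 2016: 263_223_969}
failed_requests = {2015: 41_191, 2016: 342_970}


def days_in_year(year: int) -> int:
    return 366 if calendar.isleap(year) else 365


def hits_per_day(year: int) -> int:
    # Whole hits per day, rounded down, using the true year length.
    return total_hits[year] // days_in_year(year)


def failure_rate_percent(year: int) -> float:
    # Failed requests as a percentage of all hits in that year.
    return 100 * failed_requests[year] / total_hits[year]


growth = total_hits[2016] / total_hits[2015]

for year in (2015, 2016):
    print(year, hits_per_day(year), f"{failure_rate_percent(year):.3f}%")
print(f"growth factor: {growth:.2f}")
```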
2.4 Success Stories
On June 16th 2016, we rolled out bxp R9 SER1. This resulted in a massive operational headache which tested our business continuity plans. The resulting operational challenges, whilst frustrating for our clients, were met and demonstrated our ability to withstand difficult circumstances and still deliver service.
Following that incident, a number of clients implemented checking mechanisms, all of which now verify the availability and reliability of the bxp service.
Also as a result of that challenge, the CPU and RAM of the key servers were upgraded. From an administrative point of view, this meant capacity planning was implemented seamlessly for clients.
We have added more and more functionality to bxp, especially in the area of “outcome capabilities”, with no impact on server performance.
We have rolled out more functionality and delivered far more results than ever before, with no user-observed slowdown.
2.5 IIS and Memory Cycling
Internet Information Services (IIS) is Microsoft's tried and tested web server. According to various sources it has 32% of the web server market, not far behind Apache.
| Feature | IIS | Apache |
|---|---|---|
| Supported OS | Windows | Linux, Unix, Windows, Mac OS |
| User support & fixes | Corporate support | Community Support |
| Cost | Free, but bundled with Windows | Completely free |
| Development | Closed, proprietary | Open source |
| Security | Excellent | Good |
| Performance | Good | Good |
| Market Share | 32% | 42% |
One line in particular for IIS, “Closed, proprietary”, poses a challenge. IIS on both our web servers has the ability to recycle memory. https://technet.microsoft.com/en-us/library/cc753179(v=ws.10).aspx
When we attempt to use this feature alongside MailBee, IIS doesn’t recycle. In fact, it causes IIS to become unavailable. Whilst this does not happen very often, it is an unacceptable state.
The solution is a scheduled, timed restart of IIS using DOS instead of the inbuilt recycle.
This harder “kill it and restart” approach has led to more hamsters for clients pulling large reports, i.e. if you’re in the middle of generating a long report and the restart kicks in, it kills the report generation. We monitor these very closely under our Mint hamsters.
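As a rough illustration of the scheduled restart, the sketch below wraps the standard `iisreset /restart` command. It is not the actual bxp script: the function name is hypothetical, and the `run` parameter exists only so the function can be exercised without touching a real server.

```python
import subprocess
from typing import Callable, List, Sequence


def restart_iis(run: Callable[[Sequence[str]], object] = subprocess.check_call) -> List[str]:
    """Stop and restart IIS on the local machine.

    `iisreset /restart` is the standard IIS command-line tool. `run` is
    injectable so the function can be tested without a Windows server.
    """
    command = ["iisreset", "/restart"]
    run(command)  # raises if the command exits with a non-zero status
    return command
```

Running this at a fixed quiet hour (for example through Windows Task Scheduler) gives the timed restart described above. It does not wait for in-flight reports, which is exactly the trade-off noted.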
2.6 Speed of Recovery
Whilst the high availability and high reliability of the equipment in use is good for everyone, ISO 27001 planning does not accept leaving a risk in place where a better option can be found.
Also, as a team of perfectionists, the operations team at All n One want to be able to provide seamless failover if possible.
For this reason we need to find a better solution for Db2, and “replication” seems to be that solution.
We had recently looked at MongoDB as a solution, but given recent security challenges, our internal thinking is that it is better to stick with the solution we know.
http://www.information-age.com/major-security-alert-40000-mongodb-databases-left-unsecured-internet-123459001/
2.7 KeyStats and Large Data Set reports
In database terminology, an OLTP (online transaction processing) server is one such as that behind a till at Tesco. Every time an item is scanned, beep, an item is added to the database. This means that there are a lot of small, but very frequent, additions to the database.
When you run a report, the table containing the data is locked to allow the totals to be calculated. Even though this can be done quickly, when the dataset becomes very large things can slow down. Imagine all the tills in Tesco becoming unavailable while someone runs a report for last year on all the sales of strawberries.
Instead, the solution for Tesco is to move all the data into a data warehouse. This data warehouse can then have advanced reporting applied. OLAP (online analytical processing) refers to reporting tools built specifically for analysing data sets.
When KeyStats was built, it did not anticipate the datasets for some clients becoming so large. One client runs reports every 5 minutes on a dataset with more than 2 million entries. Whilst the servers can do this processing, before fixes were put in place the servers were performing this report 10 times per second, with an updated report being requested before previous ones had even been served. This is why KeyStat mirroring was introduced.
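One general way to stop the same expensive report being rebuilt for every request is to keep the last result and reuse it until it is stale. The sketch below is a minimal, single-threaded illustration of that idea. It is not necessarily how KeyStat mirroring is implemented, and the class name, the `clock` parameter and the 300 second lifetime in the usage note are assumptions.

```python
import time
from typing import Callable, Dict, Hashable, Tuple


class ReportCache:
    """Serve a stored report until it is older than `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, object]] = {}

    def get(self, key: Hashable, build: Callable[[], object]) -> object:
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]      # still fresh: no database work
        result = build()          # the expensive query runs only here
        self._store[key] = (now, result)
        return result
```

With `ReportCache(ttl_seconds=300)`, a report requested 10 times per second would hit the database once every 5 minutes instead of on every request.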
3 Physical Servers
The physical infrastructure changed from
[Diagram: previous infrastructure]
to
[Diagram: new infrastructure]
Key Notes
- The bxp API is available only through the firewall, but for ease of diagramming it is drawn differently above.
- USB1 is conceptual, representing off-site data removal procedures. Ports are restricted and BitLocker is required on any connection.
- Db2 will become an active server, rather than a passive backup server.
- The hard drives in the Db servers are being changed from 7200 rpm drives to solid-state drives, which will see throughput increase more than 10-fold.
- The cost of rental will of course increase for All n One, but this reinvestment reflects our commitment to quality service delivery.
- A new web server (W3) will be introduced, not as a public-facing server but as a processing server for output that is costly to produce.
- CPU-heavy calculations and advanced data processing tasks need to run on a server that won’t affect the OLTP nature of our web servers, W1 and W2. W3 allows this.
- Previously, if we’d used Db1, the tables would lock and the tills would have been unusable. Instead, W3 can talk to Db2, which is kept up to date in real time. This means the tills aren’t affected and yet we can do very memory-intensive operations.
- This change addresses KeyStats generation speed issues, as they will be created on W3.
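The split between transactional and reporting work described in these notes can be sketched as a simple routing rule. This is an illustration only: the host names are placeholders, and the enum and function names are hypothetical rather than part of bxp.

```python
from enum import Enum


class Workload(Enum):
    TRANSACTIONAL = "transactional"   # small, frequent reads and writes
    REPORTING = "reporting"           # long, read-only analytical queries


# Hypothetical host names standing in for Db1 and Db2.
PRIMARY_HOST = "db1.example.internal"
REPLICA_HOST = "db2.example.internal"


def host_for(workload: Workload, replica_available: bool = True) -> str:
    """Pick the database host for a piece of work."""
    if workload is Workload.REPORTING and replica_available:
        return REPLICA_HOST
    # Writes always go to the primary; reports fall back to it
    # only when the replica cannot be used.
    return PRIMARY_HOST
```

Keeping the rule in one place means a reporting job never has to know which server is currently safe to query.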
4 Server Operating Systems
The operating system driving the servers was previously Windows Server 2008 R2, which has been upgraded to Windows Server 2012 R2.
- Windows Server 2008 R2 SP1 has an end of life of 14 January 2020
- Windows Server 2012 R2 has an end of life of 10 January 2023
http://www.mirazon.com/a-comprehensive-guide-to-microsoft-end-of-support/
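To put those dates in context, the short calculation below measures the supported lifetime remaining at the Release 10 go-live date under each operating system. It uses only the dates quoted in this document, and the variable names are illustrative. The upgrade adds just under three years of supported life.

```python
from datetime import date

go_live = date(2017, 3, 26)        # bxp Release 10 go-live
eol_2008_r2 = date(2020, 1, 14)    # Windows Server 2008 R2 SP1, as quoted above
eol_2012_r2 = date(2023, 1, 10)    # Windows Server 2012 R2, as quoted above

remaining_before = (eol_2008_r2 - go_live).days
remaining_after = (eol_2012_r2 - go_live).days
extra_days = remaining_after - remaining_before

print(remaining_before, remaining_after, extra_days)
```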
The reason for not selecting Windows Server 2016 was that its release date was September 26, 2016. At the time of planning, the risk of early adoption was considered too high. Supporting software, primarily MySQL, was not approved for Windows Server 2016 compatibility during planning.
All n One have always positioned ourselves as Late Majority for security reasons.
5 Database engine upgrade
The MySQL engine was upgraded from v5.6 to v5.7. https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html
This change also saw the introduction of live data replication.
https://dev.mysql.com/doc/refman/5.6/en/replication.html
Server replication downtime was reduced from a 4-hour turnaround to a 180-second swap-over in worst-case scenarios.
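The swap-over itself amounts to choosing the first database server that still answers. A minimal sketch, assuming a health probe is supplied by the caller, follows. The function name and the host names in the usage note are hypothetical, and the real procedure may involve more steps than this.

```python
from typing import Callable, Optional, Sequence


def first_available(hosts: Sequence[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first host, in priority order, that passes the health probe."""
    for host in hosts:
        if is_up(host):
            return host
    return None  # nothing reachable: escalate to manual recovery
```

Called as `first_available(["db1", "db2"], probe)`, this returns `"db1"` while it is healthy and `"db2"` as soon as the probe for Db1 fails.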
6 Schedule notes
All n One are operationally dependent on Sungard implementing their solution. As soon as firm dates are supplied, All n One can supply those to you, our clients.
We’re not resting on our laurels though, and have put together some very guesstimated timelines.
| Date | Notes |
|---|---|
| 2017-01-14 | MySQL 5.7 installed on Db2 and confirmed working |
| 2017-01-21 | Demonstration instance of bxp will be rerouted to Db2 in a live environment |
| 2017-01-28 | Db1 upgrade to MySQL 5.7 |
| 2017-02-04 | Live replication from Db1 to Db2 and change to all instances to allow live swap in the event of Db1 being unavailable. |
| 2017-02-05 | Test failure of Db1, to simulate transfer |
| 2017-02-11 | Live setup of new infrastructure |
| 2017-02-18 | Migration of services from current infrastructure to new infrastructure (Actually moved 26th March 2017 due to staff availability) |
| 2017-02-25 | KeyStat generation and all scheduled processes to be migrated to Web3 |
| 2017-03-04 | Migration of Mailbee to W3 process |
If the new live server infrastructure becomes available earlier or later, this schedule will alter.
