Bxp Release 10

From All n One's bxp software Wixi

1 Overview

In December 2016, All n One signed with Sungard for a further 3 years and a technical infrastructure refresh. This refresh would draw on a number of experiences and also allow the expansion of the service to new capabilities.


bxp Release 10 went live on Sunday 26th of March 2017. The primary focus was to

  • replace the infrastructure's physical servers,
  • upgrade the server operating systems and
  • upgrade the database engine.


2 Background

2.1 Primary Notes

In June 2016, All n One retained the skills of Jason Ryan for the Infrastructure department. Infrastructure was previously managed by the Frameworks department and then by the Security department, so this move represented a dedicated focus on ensuring and improving equipment and service availability.


Ignoring other elements of the bxp infrastructure, we start with the following.


Web1 and Web2 are Windows Server 2008 R2 servers running IIS. bxp's functionality is delivered through these servers.


A component called MailBee is currently used to retrieve emails from an inbox. https://www.afterlogic.com/mailbee/objects


The web servers communicate securely with Db1 over MyODBC connections using SSL.


Data is copied nightly from Db1 to Db2, meaning that in the event of a catastrophic failure of Db1, Db2 can take over very quickly.


The speed of take over varies with staff, experience and circumstance and so All n One are seeking to improve things on a number of fronts.


Db1 and Db2 were running MySQL 5.6 Community Server edition. The list of companies using MySQL is long and distinguished. https://www.mysql.com/customers/


2.2 Replication

When chatting with Denis Creighton and the technical team in FEXCO (who use MySQL in-house as well as bxp), we discovered that MySQL 5.7 had some distinct advantages over 5.6.


The primary one is that replication functionality is available in the Community Server edition rather than only in the Cluster server version. This functionality has also been extended to work on the MyISAM data table format.


These changes were recent enough to warrant exploring the upgrade. http://dev.mysql.com/doc/refman/5.7/en/replication.html


The master / slave functionality means that Db2 is now kept up to date in real time. It is also capable of responding to queries, making it an operational server rather than just a backup server waiting in the wings.
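Whether Db2 is fit to answer queries depends on how closely it is tracking Db1. The following is a minimal sketch of such a check, assuming one row of MySQL 5.7's SHOW SLAVE STATUS output has already been fetched into a dictionary. The function name and the 30 second threshold are illustrative assumptions and not part of bxp.

```python
from typing import Any, Mapping


def replica_is_healthy(status: Mapping[str, Any],
                       max_lag_seconds: int = 30) -> bool:
    """Decide whether a replica is close enough to the master to serve reads.

    `status` is one row of SHOW SLAVE STATUS as a column -> value mapping.
    The 30 second default is an assumed threshold for illustration only.
    """
    # Both replication threads must be running on the replica.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    # Seconds_Behind_Master is NULL (None) when the lag cannot be determined.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False
    return int(lag) <= max_lag_seconds
```

A monitoring job could call this periodically and raise an alert when it returns False.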

2.3 Operational challenges

As bxp has grown we have bumped into some specific areas of operational challenge. We have met most of them but some have been long standing and are being addressed.


Service growth has seen the number of requests to our servers more than double in a year.

Type | 2015 | 2016
Total Hits | 110,815,438 | 263,223,969
Visitor Hits | 110,787,219 | 263,199,942
Spider Hits | 28,219 | 24,027
Average Hits per Day | 303,603 | 719,191
Average Hits per Visitor | 7,899.8 | 22,740.6
Cached Requests | 45,995,799 | 147,889,230
Failed Requests | 41,191 | 342,970
Total Page Views | 45,580,189 | 97,936,690
Average Page Views per Day | 124,877 | 267,586
Average Page Views per Visitor | 3,250.2 | 8,461.8
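
The scale of the change can be worked out directly from the table. The short sketch below uses only figures copied from the table. The function names are illustrative, and the day counts (365 and 366) are an assumption that happens to reproduce the "Average Hits per Day" row.

```python
# Figures copied from the table above.
total_hits = {2015: 110_815_438, 2016: 263_223_969}
failed_requests = {2015: 41_191, 2016: 342_970}
days_in_year = {2015: 365, 2016: 366}  # 2016 was a leap year


def growth_ratio(series: dict) -> float:
    """Return the 2016 value divided by the 2015 value."""
    return series[2016] / series[2015]


def failure_rate(year: int) -> float:
    """Return failed requests as a fraction of total hits for a year."""
    return failed_requests[year] / total_hits[year]


def hits_per_day(year: int) -> int:
    """Return average hits per day, rounded down as in the table."""
    return total_hits[year] // days_in_year[year]


if __name__ == "__main__":
    print(f"Total hits grew {growth_ratio(total_hits):.2f}x")
    print(f"Failed requests grew {growth_ratio(failed_requests):.2f}x")
    for year in (2015, 2016):
        print(f"{year}: {hits_per_day(year):,} hits per day, "
              f"{failure_rate(year):.3%} of hits failed")
```

Total hits grew by a factor of roughly 2.4, while failed requests grew faster than traffic, from roughly 0.04% of hits in 2015 to roughly 0.13% in 2016.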


2.4 Success Stories

On June 16th 2016, we rolled out bxp R9 SER1. This resulted in a massive operational headache which tested our business continuity plans. The resulting operational challenges, whilst frustrating for our clients, were met and demonstrated the ability to withstand difficult circumstances and still deliver service.


Following that incident, a number of clients implemented checking mechanisms, all of which now verify the availability and reliability of the bxp service.


Also as a result of that challenge, the CPU and RAM of the key servers were upgraded. From an administrative point of view, this meant capacity planning was implemented seamlessly for clients. We have added more and more functionality to bxp, especially in the area of “outcome capabilities”, with no impact on server performance.


We rolled out more functionality and delivered far more results than ever before, with no slowdown observed by users.


2.5 IIS and Memory Cycling

Internet Information Server (IIS) is Microsoft's tried and tested web server. According to various sources, it has 32% of the web server market, not far behind Apache.

Feature | IIS | Apache
Supported OS | Windows | Linux, Unix, Windows, Mac OS
User support & fixes | Corporate support | Community support
Cost | Free, but bundled with Windows | Completely free
Development | Closed, proprietary | Open source
Security | Excellent | Good
Performance | Good | Good
Market Share | 32% | 42%

One line in particular for IIS “Closed, proprietary” poses a challenge. IIS on both our web servers has the ability to memory recycle. https://technet.microsoft.com/en-us/library/cc753179(v=ws.10).aspx


When we attempt to use this service while MailBee is running, IIS doesn’t recycle. In fact, it causes IIS to become unavailable. Whilst this does not happen very often, it is an unacceptable state.


Our solution is a scheduled, timed restart of IIS using DOS instead of the inbuilt recycle.


This harder “kill it and restart” approach has led to more hamsters for clients pulling large reports: if you’re in the middle of generating a long report and the restart kicks in, it kills the report generation. We monitor these very closely under our Mint hamsters.
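
As an illustration only, a timed restart of this kind could be wrapped as below. This is a sketch under assumptions and not the script in use at All n One: it relies on the standard `iisreset /restart` command, and the function name and the injectable `run` parameter are hypothetical conveniences so the logic can be exercised on a machine without IIS.

```python
import subprocess
from typing import Callable, Sequence

# `iisreset /restart` stops and then restarts IIS on the local machine.
IIS_RESTART_COMMAND = ("iisreset", "/restart")


def restart_iis(run: Callable[[Sequence[str]], int] = subprocess.call) -> bool:
    """Restart IIS and report whether the command exited cleanly.

    `run` receives the command as a list of arguments and returns its
    exit code; it defaults to subprocess.call and can be replaced in tests.
    """
    exit_code = run(list(IIS_RESTART_COMMAND))
    return exit_code == 0
```

A scheduler such as Windows Task Scheduler would then invoke the script at a quiet hour, which reduces but does not remove the chance of interrupting a long report.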


2.6 Speed of Recovery

Whilst the high availability and high reliability of the equipment in use is good for everyone, ISO 27001 planning does not accept leaving a risk in place where a better option can be found.


Also, as a team of perfectionists, the operations team at All n One want to be able to provide seamless failover if possible.


For this reason we need to find a better solution for Db2, and “replication” seems to be that solution.


We had looked at MongoDB recently as a solution but, given recent security challenges, our internal thinking is that it is better to stick to the solution we know. http://www.information-age.com/major-security-alert-40000-mongodb-databases-left-unsecured-internet-123459001/


2.7 KeyStats and Large Data Set reports

In database terminology, an OLTP (online transaction processing) server is one such as the server behind a till at Tesco. Every time an item is scanned, beep, an item is added to the database. This means that there are a lot of small but very frequent additions to the database.


When you run a report, the table containing the data is locked to allow the totals to be calculated. Even though this can be done quickly, things can slow down when the dataset becomes very large. Imagine all the tills in Tesco becoming unavailable while someone runs a report for last year on all the sales of strawberries.


Instead, the solution for Tesco is to move all the data into a data warehouse. This data warehouse can then have advanced reporting applied. OLAP, or online analytical processing, refers to specific reporting tools for analysing data sets.


When KeyStats was built, it didn’t anticipate the datasets for some clients becoming so large. One client runs reports every 5 minutes on a dataset with more than 2 million entries. The servers can do this processing but, before fixes were put in place, they were performing this report 10 times per second, with an updated report being requested before previous ones had even been served. This is why KeyStat mirroring was introduced.
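
The general idea of not rebuilding a report more often than it is needed can be sketched as follows. This is not the KeyStat mirroring implementation, which is not described here. It is a minimal illustration of throttling, and the class name, parameters and injectable clock are all assumptions.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ThrottledReport(Generic[T]):
    """Serve a cached report, rebuilding it at most once per interval."""

    def __init__(self, build: Callable[[], T], min_interval_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._build = build
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._cached: Optional[T] = None
        self._built_at: Optional[float] = None

    def get(self) -> Optional[T]:
        # The lock makes concurrent callers wait for a single build
        # instead of each starting their own.
        with self._lock:
            now = self._clock()
            stale = (self._built_at is None
                     or now - self._built_at >= self._min_interval)
            if stale:
                self._cached = self._build()
                self._built_at = now
            return self._cached
```

With a 300 second interval, ten requests per second would trigger one rebuild every five minutes rather than ten rebuilds every second.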


3 Physical Servers

The physical infrastructure changed from

BER8SER2Infrastructure.png

to

bxpRelease10Infrastructurev1-0.png


Key Notes

  • The bxp API is available only through the firewall but, for ease of diagramming, is drawn differently above.
  • USB1 is conceptual, for off-site data removal procedures. Ports are restricted and BitLocker is required on any connection.
  • Db2 will become an active server, rather than a passive backup server.
  • The hard drives in the Db servers are being changed from 7200 rpm drives to Solid State Drives, which will see throughput increase more than tenfold.
  • The cost of rental of course will increase for All n One, but this reinvestment reflects our commitment to quality service delivery.
  • A new web server (W3) will be introduced, not as a public-facing server but as a processing server for output that is costly to generate.
  • W3 will perform high-CPU calculations and advanced data processing tasks, so this work sits on a server that won’t affect the OLTP nature of our web servers, W1 and W2.
  • Previously, if we’d used Db1, the tables would lock and the tills would still have been unusable. Instead, we’re able to allow W3 to chat to Db2, which is being kept up to date in real time. This means the tills aren’t affected and yet we can do very memory-intensive operations.
  • This change addresses KeyStats generation speed issues, as they will be created on W3.
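
The split described in these notes, with reporting reads going to Db2 and everything else staying on Db1, can be sketched as a small routing rule. The host names, the function name and the list of read-only statement types are assumptions for illustration and do not describe bxp's actual connection handling.

```python
WRITE_HOST = "db1"   # assumed host name for the master
REPORT_HOST = "db2"  # assumed host name for the replica

# Statement types treated as read-only in this sketch.
_READ_ONLY_KEYWORDS = ("select", "show", "explain")


def choose_host(sql: str, is_report: bool) -> str:
    """Send read-only reporting queries to the replica, all else to the master."""
    words = sql.split(None, 1)
    first_word = words[0].lower() if words else ""
    read_only = first_word in _READ_ONLY_KEYWORDS
    if is_report and read_only:
        return REPORT_HOST
    return WRITE_HOST
```

Keeping writes and ordinary page requests on Db1 means a long report on Db2 cannot lock the tables that the web servers depend on.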

4 Server Operating Systems

The operating system driving the servers was previously Windows Server 2008 R2, which has been upgraded to Windows Server 2012 R2.


  • Windows 2008 R2 SP1 has an end of life Jan/14/2020
  • Windows 2012 R2 has an end of life Jan/10/2023

http://www.mirazon.com/a-comprehensive-guide-to-microsoft-end-of-support/


Windows Server 2016 was not selected because it was only released on September 26, 2016. At the time of planning, the risk of early adoption was considered too high. Supporting software, primarily MySQL, was not approved for Windows Server 2016 compatibility during planning.


All n One have always positioned ourselves as Late Majority adopters for security reasons.

Release10Adoptersv1-0.png


5 Database engine upgrade

The MySQL engine was upgraded from v5.6 to v5.7. https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html


This change also saw the introduction of live data replication. https://dev.mysql.com/doc/refman/5.6/en/replication.html


Downtime for swapping over to the replicated server was reduced from a 4 hour turnaround to a 180 second swap over in worst case scenarios.
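
A swap over of this kind can be pictured as trying Db1 first and falling back to Db2. The sketch below is an assumption-laden illustration and not the mechanism bxp uses: `connect` stands in for whatever database driver call is in use, and real drivers may raise their own error types rather than OSError.

```python
from typing import Callable, Optional, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_with_failover(hosts: Sequence[str],
                          connect: Callable[[str], Conn]) -> Conn:
    """Try each host in order and return the first successful connection."""
    last_error: Optional[Exception] = None
    for host in hosts:
        try:
            return connect(host)
        except OSError as error:
            # Covers refused, timed out and unreachable connections.
            # A real driver may need its own exception type listed here.
            last_error = error
    raise ConnectionError(
        f"no database host reachable: {list(hosts)}") from last_error
```

Called with the hosts in the order Db1 then Db2, the function only reaches Db2 when the connection to Db1 fails.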


6 Schedule notes

All n One are operationally dependent on Sungard implementing their solution. As soon as firm dates are supplied, All n One can supply those to you, our clients.


We’re not resting on our laurels though, and have drawn up the following provisional, estimated timelines.


Date Notes
2017-01-14 MySQL 5.7 installed on Db2 and confirmed working
2017-01-21 Demonstration instance of bxp will be rerouted to Db2 in a live environment
2017-01-28 Db1 upgrade to MySQL 5.7
2017-02-04 Live replication from Db1 to Db2, and a change to all instances to allow a live swap in the event of Db1 being unavailable
2017-02-05 Test failure of Db1, to simulate transfer
2017-02-11 Live setup of new infrastructure
2017-02-18 Migration of services from current infrastructure to new infrastructure (Actually moved 26th March 2017 due to staff availability)
2017-02-25 KeyStat generation and all scheduled processes to be migrated to Web3
2017-03-04 Migration of MailBee to W3 process

If the new live server infrastructure becomes available earlier or later, this schedule will change.