Lenovo ThinkSystem SR650 Server – Universal Soldier

In this whitepaper, we will introduce you to one of the world’s best-selling servers, the Lenovo ThinkSystem SR650, and review its load test results.

The Lenovo ThinkSystem SR650 is a 2U two-socket rack server suitable for a wide range of workloads in small, medium, and large enterprises: DBMS, virtualization and cloud, virtual desktop infrastructure (VDI), various enterprise applications, business intelligence, and big data.


Speed, reliability, security and ease of management

The SR650 has many features to enhance your productivity. This is achieved primarily through the second-generation Intel Xeon Scalable processor family, with up to 28 cores per processor. The server supports two processors, for a total of up to 56 cores.

The server has 24 DIMM slots and supports up to 3 TB of RAM per server, at memory speeds of up to 2933 MHz.

The SR650 disk subsystem offers flexible and scalable internal storage: up to 24× 2.5-inch plus 2× 3.5-inch drives for performance-optimized configurations, or up to 14× 3.5-inch drives for capacity-optimized configurations, with a wide choice of SAS/SATA HDDs and SSDs and PCIe NVMe SSDs. An important plus is the AnyBay design, which allows SAS, SATA, or NVMe PCIe drives to share the same drive bays.


For power and cooling, the SR650 has redundant hot-swappable power supplies and hot-swappable fans on board.

The rear of the server can accommodate up to five PCIe adapters, two of which can be used for graphics adapters to build GPU-accelerated VDI.

Additionally, a LOM (LAN on Motherboard) card can add four more 1 Gb or 10 Gb Ethernet ports.

The SR650 is also extremely convenient to modify and maintain: the server cover can be removed without a screwdriver, which saves the administrator significant time.

Powerful management features based on the XClarity management software suite simplify both local and remote administration of the SR650.

The XClarity Controller provides advanced management, monitoring and alerting functionality.

The XClarity Controller continuously monitors system parameters and automatically raises alerts and triggers recovery actions in the event of a failure.
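For illustration, the controller exposes a standards-based Redfish REST API, so system health can also be polled programmatically. Below is a minimal sketch; the host address, credentials, and polling interval are placeholder assumptions, not values from the test.

```python
# Minimal sketch: polling server health through the Redfish REST API
# exposed by the XClarity Controller. Host, credentials, and the check
# interval are placeholders; adapt them to your environment.
import time
import requests

XCC_HOST = "https://10.0.0.10"   # hypothetical XCC address
AUTH = ("ADMIN", "password")     # hypothetical credentials

def system_health(session: requests.Session) -> str:
    # /redfish/v1/Systems is a standard Redfish collection.
    systems = session.get(f"{XCC_HOST}/redfish/v1/Systems", verify=False).json()
    member = systems["Members"][0]["@odata.id"]
    system = session.get(f"{XCC_HOST}{member}", verify=False).json()
    return system["Status"]["Health"]   # e.g. "OK", "Warning", "Critical"

with requests.Session() as s:
    s.auth = AUTH
    while True:
        health = system_health(s)
        if health != "OK":
            print(f"ALERT: system health is {health}")  # hook alerting here
        time.sleep(60)
```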

Built-in XClarity Provisioning Manager simplifies system setup, configuration, and updates.

To manage a large fleet of servers, Lenovo XClarity Administrator is supported, providing comprehensive centralized management from anywhere: not only from a computer, but also via the Lenovo XClarity mobile application.


For integration with other popular management systems, Lenovo XClarity Integrator is provided, supporting VMware vCenter and Microsoft System Center; it extends XClarity Administrator functionality into virtualization management tools and lets the administrator deploy and manage infrastructure from start to finish.

Trial by fire: load test results

In March 2019, the SR650 underwent load testing under the TPC Benchmark E (TPC-E) as a server for a heavily loaded MS SQL Server DBMS.

About the TPC-E benchmark

TPC Benchmark E (TPC-E) is an online transaction processing (OLTP) workload. It is a mixed load of read and update-intensive transactions that mimic complex OLTP application environments. The database schema, data populations, transactions, and test implementation rules were designed to provide a broad overview of the workloads in modern OLTP systems. The benchmark examines a wide range of system components associated with such environments, which are characterized by:

  • Simultaneous execution of several transaction types of varying complexity;
  • A balanced mix of disk I/O and processor utilization;
  • Transactional integrity;
  • A mix of uniform and non-uniform data access;
  • Databases with realistic content, consisting of many tables with a wide variety of sizes, attributes, and relationships between them.

The TPC-E test simulates the OLTP workload of a brokerage company. The benchmark focuses on a central database that executes transactions associated with the company’s customer accounts. To keep the performance characteristics of the database system measurable, the benchmark does not attempt to measure the complex flow of data between the multiple application systems that might exist in a real environment.

The various types of transactions simulate a company’s interactions with its customers and business partners. Different types of transactions have different execution time requirements.

The benchmark defines:

  • Two transaction categories to simulate customer-to-business and business-to-business (i.e., partner-to-partner) interactions;
  • Several transactions for each type of transaction;
  • Different execution profiles for each type of transaction;
  • A special combination of execution times for all defined transactions.

For example, a database will concurrently execute transactions generated by systems that interact with customers, along with transactions that are generated by systems that interact with financial markets as well as administrative systems. The benchmarking system will interact with a set of applications that simulate different sources of transactions.
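As an illustration of how such a driver behaves, here is a hypothetical sketch of a load generator picking transaction types according to a weighted mix. The weights and transaction names below are illustrative placeholders, not the official TPC-E mix.

```python
# Illustrative sketch of how a TPC-E-style driver mixes transaction types.
# The weights below are hypothetical placeholders, not the official TPC-E
# mix; each "transaction" is a stub standing in for a real DB call.
import random

TRANSACTION_MIX = {            # type -> weight (hypothetical)
    "Trade-Order":       10,
    "Trade-Result":      10,
    "Trade-Status":      19,
    "Customer-Position": 13,
    "Market-Watch":      18,
    "Security-Detail":   14,
    "Broker-Volume":      5,
    "Other":             11,
}

def next_transaction() -> str:
    """Pick the next transaction type according to the weighted mix."""
    types = list(TRANSACTION_MIX)
    weights = list(TRANSACTION_MIX.values())
    return random.choices(types, weights=weights, k=1)[0]

# Each simulated client thread would loop, picking and executing
# transactions of different types and complexity concurrently:
for _ in range(5):
    print(next_transaction())
```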

The TPC-E performance metric is a measure of “business throughput”: it reflects the number of completed business (trade) transactions processed per second, alongside the cost of one transaction in dollars. Multiple transaction types are used to simulate transaction-processing business activity, and each transaction is subject to a response-time requirement. The performance metric for the benchmark is expressed in transactions per second-E (tpsE).
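For a concrete sense of the metric, here is a small worked example of how tpsE and price/performance are derived. The measurement window, transaction count, and total system price below are hypothetical placeholders chosen to land near the SR650’s published figures.

```python
# A worked sketch of the tpsE metric: the number of completed business
# transactions divided by the measurement interval, plus the
# price/performance figure. All inputs are hypothetical placeholders.
measurement_seconds = 2 * 60 * 60          # a 2-hour steady-state window
transactions_completed = 50_490_216        # hypothetical count

tpsE = transactions_completed / measurement_seconds
print(f"{tpsE:.2f} tpsE")                  # ~7012.53 tpsE

total_system_price = 638_000.00            # hypothetical total cost, USD
print(f"${total_system_price / tpsE:.2f} per tpsE")
```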

Test bed configuration

DBMS client server: Lenovo ThinkSystem SR650:

  • 2× Xeon Platinum 8168, 2.7 GHz (2 CPUs / 48 cores / 96 threads)
  • 96 GB RAM
  • 2× 300 GB SAS HDD in RAID-1

DBMS Server: Lenovo ThinkSystem SR650:

  • 2× Xeon Platinum 8260, 2.7 GHz (2 CPUs / 56 cores / 112 threads)
  • 1536 GB RAM
  • 2× 800 GB SAS SSD in RAID-1
  • 6× 800 GB SAS SSD in RAID-10
  • 4× Lenovo Storage D1224 (12 Gb/s SAS disk shelves; 74× 800 GB SAS SSDs configured into five RAID groups of two types: four 17-disk RAID-5 groups and one 6-disk RAID-10 group)

The servers are interconnected by four 10 GbE links.
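As a back-of-the-envelope check on the external storage, here is the usable capacity under standard RAID overhead assumptions (RAID-5 loses one disk per group; RAID-10 loses half):

```python
# Back-of-the-envelope usable capacity of the external storage, assuming
# standard RAID overheads (RAID-5 loses one disk per group; RAID-10 loses
# half). Drive size is the 800 GB figure from the configuration above.
drive_gb = 800

raid5_groups, raid5_disks = 4, 17
raid10_disks = 6

raid5_usable = raid5_groups * (raid5_disks - 1) * drive_gb   # 51,200 GB
raid10_usable = (raid10_disks // 2) * drive_gb               #  2,400 GB

print(f"RAID-5 usable:  {raid5_usable:,} GB")
print(f"RAID-10 usable: {raid10_usable:,} GB")
print(f"Total external: {raid5_usable + raid10_usable:,} GB")
```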

A more detailed test bed configuration is shown below.

Detailed data on the hardware configuration, software, and testing methodology can be found in the TPC-E report, which is publicly available: tpc.org/4084

Test results

Test results consist of three groups of tests:

  • Normal operating mode
  • Data availability
  • Disaster recovery

Normal operating mode

Testing in normal mode consists of two stages. In the first stage, the system is “warmed up” to bring it to a steady-state workload; measuring that steady state is the goal of the test.

In the steady-state benchmark results, the SR650 sustains just over 7,000 tpsE.

The results of normal-mode testing are shown in the graph below.

Data availability

When measuring data availability, various operations are performed that simulate disk subsystem failures.

Data availability is demonstrated by showing that the application can sustain database operations with full data access after the permanent, fatal failure of any single disk containing database tables, recovery log data, or database metadata.

Data availability tests are performed by disabling disks that store various types of data while monitoring application access to the data.

Below are the types of disk arrays that store different types of data.

During the data availability test, the following steps were taken:

  • A disk failure was induced in the database log array (the disk was physically removed from the server).
  • After 5 minutes, a second disk failure was triggered in the same way, this time in the tempdb array.
  • After another 5 minutes, a third disk failure was induced, this time in the array holding the DBMS data itself.

Since all the arrays are protected by various RAID levels, data access never stopped, and apart from a short-term performance dip there was no effect.

A few minutes later, three new disks were installed in succession to replace the “failed” ones, and the array rebuild process began. The rebuild dramatically reduced performance. This is normal behavior: until all datasets are fully rebuilt, some of the I/O resources are spent on rebuilding rather than on DBMS operations.
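The effect can be illustrated with a toy model in which a fixed I/O budget is split between rebuild traffic and DBMS traffic. The rebuild share below is an assumed figure for illustration, not a measurement from the test.

```python
# Toy model of why throughput drops during a RAID rebuild: a fixed I/O
# budget is split between rebuild traffic and DBMS traffic. The numbers
# are hypothetical and purely illustrative.
nominal_tps = 7000          # steady-state throughput before the failure
rebuild_io_share = 0.30     # fraction of I/O consumed by the rebuild (assumed)

degraded_tps = nominal_tps * (1 - rebuild_io_share)
print(f"Throughput during rebuild: ~{degraded_tps:.0f} tpsE")  # ~4900 tpsE
```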

Below is a graph for testing data availability.

Disaster recovery

The final test, disaster recovery, covers restoring the system as a whole after a serious failure that completely takes down the database server. Disaster recovery is considered successful when the workload returns to its nominal ~7,000 tpsE.

The following steps were taken to test disaster recovery:

  • All power cords were pulled from the DBMS server, causing it to shut down immediately. All contents of main memory and server caches were lost. All of the RAID controllers inside the server were running without cache batteries, so all disk controller cache contents were lost as well.
  • The power cables were reconnected and the DBMS server was powered on.
  • All data and log files for tempdb were removed.
  • SQL Server was started and automatically began restoring the database. The timestamp of the first message in the SQL Server error log associated with the tpce database is considered the start of database recovery.
  • The “Recovery complete” message in the SQL Server error log marks the end of database recovery. In total, the database recovery process took just over 15 minutes.
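The same measurement can be reproduced from the error log. Here is a hedged sketch; the log path and the exact message texts are placeholders that vary by SQL Server version and installation.

```python
# Sketch of measuring database recovery time the way the report describes:
# the first ERRORLOG message mentioning the tpce database marks the start,
# and the "Recovery complete" message marks the end. The log path and exact
# message texts are placeholders; match them to your SQL Server version.
from datetime import datetime

LOG_PATH = r"C:\Program Files\Microsoft SQL Server\MSSQL\Log\ERRORLOG"

def parse_ts(line: str):
    # ERRORLOG lines normally start with "YYYY-MM-DD hh:mm:ss.ff";
    # continuation lines without a timestamp are skipped.
    try:
        return datetime.strptime(line[:22], "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        return None

start = end = None
with open(LOG_PATH, encoding="utf-16") as log:   # ERRORLOG is often UTF-16
    for line in log:
        ts = parse_ts(line)
        if ts is None:
            continue
        if start is None and "tpce" in line:
            start = ts
        if "Recovery complete" in line:
            end = ts
            break

if start and end:
    print(f"Database recovery took {end - start}")
```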

Because there was a time gap between the end of database recovery and the recovery of all applications, and a number of transactions had to be restarted rather than simply resumed, those transactions began executing only after the database had been restored (see the red line on the graph); this took about another 10 minutes.

Thus, the end of disaster recovery is the complete restoration of the workload of all applications, i.e., the point on the graph where the blue and red lines reach the nominal value of ~7,000 tpsE.

Totals:

  • Database recovery time: 00:15:33.
  • Application recovery time: 00:10:06.
  • Full disaster recovery time: 00:25:39.
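As a quick sanity check, the two recovery phases sum exactly to the reported total:

```python
# Sanity check: the two recovery phases sum to the reported total.
from datetime import timedelta

database_recovery = timedelta(minutes=15, seconds=33)
application_recovery = timedelta(minutes=10, seconds=6)

total = database_recovery + application_recovery
print(total)   # 0:25:39, matching the reported full disaster recovery time
```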
The summary of the report, broken down by transaction type, is presented below:

The final result, with the tpsE score and the cost per transaction, is presented below:

The result of 7012.53 tpsE at a cost of $90.99 per transaction took second place in the TPC-E Top Performance Results, behind only the top-end Lenovo ThinkSystem SR860 V2 ( tpc.org/tpce/results/tpce_perf_results5.asp?resulttype=all ), as well as third place in the TPC-E Top Price/Performance Results, where the SR860 V2 again holds first place and a competitor’s solution holds second.

This is a very respectable result. The bottom line is a powerful, flexible, manageable, and reliable server that, like other Lenovo products, is also competitively priced. It is this combination of qualities that has made the Lenovo ThinkSystem SR650 the best-selling Lenovo server in the world. You can request the Lenovo ThinkSystem SR650 server via the link.
