Should I Configure VMs with Multiple Sockets or Multiple Cores per Socket? How NUMA Configuration Affects VM Performance

Socket or Core Assignment in a Virtual Machine:

There is always confusion in students' minds about whether to assign more sockets or more cores to a VM during virtual machine creation.
Surprisingly, most students don't even notice this option while assigning vCPUs to a VM. A few have wondered whether it really matters to assign more cores instead of more sockets: does the socket or core configuration make any difference to VM performance?

In this article, I'll try my best to answer all of these questions, and your feedback will be much appreciated.
Before digging into the VMware implementation of virtual sockets and cores, let's first clear up the basics of sockets vs. cores.

What Is a Socket?
A socket is the physical connector on the motherboard into which a physical processor fits. A socket can contain multiple cores.

What Is a Core?
A core is a physical processing unit within a physical processor that actually performs the computational work, or code execution. It is a complete, private set of
registers, execution units, and queues required to execute programs.

[Image: Socket vs. Core]

In earlier days, we had one core per socket, which meant a single processing unit performed all computational tasks.

1 Socket = 1 Core = 1 Physical CPU

Drawbacks of a Single-Core Processor
A single-core configuration does not work well in a multithreaded environment: all threads are executed sequentially because only one core is available on the physical processor.
As a result, the system becomes slower.

Evolution of Multi-Core Processors to Support Multithreaded Environments
To support multithreaded environments and increase system performance, CPU manufacturers added additional "cores", or central processing units, to a single physical processor. For example, a quad-core CPU has four central processing units, or "cores", on a single chip.
These four cores appear as 4 CPUs to the operating system, and multithreaded applications can be scheduled simultaneously on these 4 cores. As a result, different processes run on each core at the same time, which speeds up the system and provides a multitasking execution environment.
1 Socket = Multiple Cores = Multiple Processing Units

Socket Limitations of General-Purpose Operating Systems
Some operating systems are hard-limited to run on a fixed number of CPUs. The OS vendor restricts the operating system to a limited number of physical CPUs, even when more physical CPUs are available, because of the operating system's socket limitations.
For example, Windows Server 2003 Standard Edition is limited to 4 CPUs. If we install this operating system on an 8-socket physical box, it will run on only 4 of the CPUs.
The catch is that these OS vendors restrict the number of physical CPUs (sockets), not the number of logical CPUs (cores).

How CPU Vendors Resolved the Socket Limitation
CPU vendors started adding more cores per socket, and operating systems took advantage of multi-core CPUs. For example, if we install Windows Server 2003 Standard Edition on a dual-socket, quad-core system (2 sockets * 4 cores = 8 physical CPUs),
the operating system can schedule instructions on all 8 physical CPUs: only 2 sockets are in use, but the number of cores per socket has increased, so the OS socket limitation is never hit.
Likewise, if a general-purpose OS has a limit of 2 sockets but an application needs 8 PCPUs,
a multi-core implementation can expose 8 PCPUs to the application using a 2-socket, quad-core system:
2 Sockets * 4 Cores = 8 PCPUs.
I hope you now understand what OS socket limitations are and how they were overcome in the physical world.

How VMware Addressed Socket Limitations at the Virtual Machine Guest OS Level

[Image: VirtualSocketBlogImage06]

Prior to vSphere 4.1, there were no separate socket or core options for vCPU assignment to a VM. The only option available was “Number of logical processors”, which internally translated to the number of sockets.

Prior to 4.1, the VMkernel by default created 1 core per socket for each vCPU assigned to the guest OS.
For example, when a vSphere admin needed to create a VM with 4 vCPUs, he specified “Number of logical processors -> 4”, and the VMkernel created 4 virtual sockets with 1 core assigned to each.
Below are examples of vCPU assignment to a guest OS prior to ESXi 4.1:
16 vCPU -> 16 Socket * 1 Core
10 vCPU -> 10 Socket * 1 Core
8 vCPU -> 8 Socket * 1 Core
6 vCPU -> 6 Socket * 1 Core
This one-core-per-socket implementation meant that some operating systems could use only a few of the assigned vCPUs, even when more vCPUs were available, because of the guest OS socket limitations described above.

Like the physical world, VMware also implemented multiple cores per socket to overcome guest OS socket limitations. A virtual machine running Windows 2003 Standard Edition configured with 1 virtual socket and 8 cores per socket now allows the operating system to utilize all 8 vCPUs.
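To make the sockets * cores arithmetic concrete, here is a minimal Python sketch that, given a vCPU count and the maximum number of sockets a guest OS can use, derives a sockets * cores-per-socket layout. The .vmx settings named in the comment (numvcpus and cpuid.coresPerSocket) are, to my understanding, how this layout is expressed in the configuration file; treat that mapping as an assumption rather than a definitive reference.

# Minimal sketch: pick a sockets x cores-per-socket layout so that a guest OS
# limited to `socket_limit` sockets can still use every vCPU.
# Assumption: in the .vmx file this layout corresponds to the numvcpus and
# cpuid.coresPerSocket settings.
def layout(total_vcpus, socket_limit):
    sockets = min(total_vcpus, socket_limit)
    while total_vcpus % sockets:              # keep the split even
        sockets -= 1
    return sockets, total_vcpus // sockets    # (virtual sockets, cores per socket)

print(layout(8, 1))   # (1, 8): 1 socket * 8 cores, as in the Windows 2003 example
print(layout(8, 4))   # (4, 2): 4 sockets * 2 cores each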

Just to show you how it works, I initially configured a VM with 8 vCPUs, with each core presented as a single socket.

[Image: VM configured with 8 virtual sockets, 1 core per socket]

When reviewing the CPU configuration inside the guest OS, Task Manager shows only 4 vCPUs because of the guest OS socket limitation.

[Image: Guest OS Task Manager showing 4 CPUs]

I then reconfigured the machine with 8 vCPUs as 1 socket and 8 cores per socket.

[Image: VM reconfigured with 1 virtual socket, 8 cores per socket]

After powering on the virtual machine, the guest OS sees all 8 vCPUs.

[Image: Guest OS showing 8 CPUs]

Does Multiple Cores per Socket Affect Virtual Machine Performance?
Now we understand how guest OS socket limitations are overcome by assigning more cores per virtual socket in the VMware world. But the question remains: does it make any difference to virtual machine performance to use more sockets vs. more cores?

In brief, the answer is NO.
There is no performance difference between using virtual sockets or virtual cores when assigning vCPUs to a virtual machine. However, the above statement is only true as long as the total size of the virtual machine does not exceed a physical NUMA node.
As soon as vNUMA is used, cores per socket can have a real impact.
Why don't virtual sockets and virtual cores have a real impact on virtual machines? Why does vNUMA make the socket configuration matter while pNUMA does not? I'll be covering all the details about pNUMA and vNUMA in my upcoming articles.

I hope this article helps you understand the socket vs. core configuration made during vCPU assignment to a virtual machine.

Please don’t forget to share your comments and rating for this article.

 

In my last blog, we discussed virtual sockets, virtual cores, guest OS socket limitations, and how VMware addresses these limitations in a vSphere environment.

Let me ask you the same question again that I posted in my last blog:

Question:
Below are the setup details:

ESX server configuration: 2 sockets * 2 cores per socket
VM1 configuration: 1 socket * 4 cores per socket
VM2 configuration: 4 sockets * 1 core per socket
Assumption: the guest operating systems have no socket limitations.
Both VMs are running CPU- and memory-intensive workloads.

The question is which VM will perform better and why?

Answer: Assuming you all guessed correctly based on our earlier discussion, both VMs will perform equally. In other words, the number of sockets and the number of cores allocated does not impact VM performance at all. There is no performance impact from using virtual sockets or virtual cores.

Why VM performance is not impacted by the virtual socket or core allocation:

The VM is unaffected because of the power of the abstraction layer. Virtual sockets and virtual cores are logical entities defined by the VMkernel as part of the vCPU configuration at the VM level. When an operating system runs, the guest OS detects the hardware layout presented to the virtual machine, namely the number of sockets and cores available at the guest OS level, and schedules instructions accordingly. For example, in the case of a guest OS socket limitation, it will exercise more cores rather than more sockets.

As I said, the scope of virtual sockets and virtual cores is limited to the guest OS level. The VMkernel schedules a VMM process for every vCPU assigned to a virtual machine.
From the VMkernel's perspective, the vCPU count is simply cores per socket * number of sockets. In the scenario above, VM1 requires 1 * 4 = 4 vCPUs,
and VM2 requires 4 * 1 = 4 vCPUs.
In conclusion, from the VMkernel's perspective, both VMs require the same number of vCPUs regardless of the number of sockets or cores per socket allocated to the virtual machine.

Virtual Sockets & Virtual Core scope is limited to Guest OS level. At VMKernel level, Total no of sockets & Cores gets translated into no of vCPUs which gets mapped to Physical CPUs done by CPU scheduler.

Let's explore the example of a 2-virtual-socket, 2-virtual-core configuration.

[Image: 2 virtual sockets * 2 virtual cores configuration]

The light blue box shows the configuration the virtual machine presents to the guest OS. For each vCPU, the VMkernel schedules a VMM world, and when a CPU instruction leaves the virtual machine it gets picked up by the corresponding vCPU VMM world. The socket configuration is transparent to the VMkernel.

There is another twist to the story: if your VM is configured with more than 8 vCPUs, the number of virtual sockets does impact virtual machine performance, because vNUMA gets activated.

In vSphere 5.0, vNUMA is enabled by default on VMs with more than 8 vCPUs. When vNUMA is active, the VMkernel presents the physical NUMA topology (NUMA clients and NUMA nodes) directly to the guest OS so that it can make better scheduling decisions.

In such vNUMA scenarios, virtual machine performance depends directly on the number of sockets presented to the guest OS, because the guest creates its NUMA nodes based on the socket count it sees. A socket layout that does not line up with the physical NUMA topology can therefore hurt performance.
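As a rough illustration of the behavior described above, here is a simplified Python sketch. The more-than-8-vCPUs threshold is the vSphere 5.0 default mentioned in the text (it is configurable), and treating the guest NUMA node count as equal to the virtual socket count is a simplification of this article's description, not an exact statement of the scheduler's logic.

# Simplified sketch of the behavior described above.
# Assumptions: vNUMA is exposed for VMs with more than 8 vCPUs (the default),
# and the guest builds one NUMA node per virtual socket it is presented with.
def vnuma_active(num_vcpus, threshold=8):
    return num_vcpus > threshold

def guest_numa_nodes(num_vcpus, cores_per_socket):
    sockets = num_vcpus // cores_per_socket
    return sockets if vnuma_active(num_vcpus) else 1

print(guest_numa_nodes(16, 1))    # 16 sockets -> 16 guest NUMA nodes
print(guest_numa_nodes(16, 16))   # 1 socket   -> 1 guest NUMA node
print(guest_numa_nodes(4, 1))     # below the threshold -> vNUMA not exposed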

Let’s Deep-Dive into NUMA Architecture Concepts.

 

WHAT IS NUMA?

Definition from WikiPedia: 
“Non-Uniform Memory Access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to a processor. Under NUMA, a processor can access its own local memory faster than non-local memory, that is, memory local to another processor or memory shared between processors.”

NUMA architecture is a shared memory architecture that describes the placement of main memory modules with respect to processors in a multiprocessor system.

“Ignorance of NUMA can result in application performance issues.”

Background Of NUMA Architecture:

 

UMA ( Uniform Memory Access)

Perhaps the best way to understand NUMA is to compare it with its cousin UMA, or Uniform Memory Access. In the UMA memory architecture, all processors access shared memory through a bus (or another type of interconnect) as seen in the following diagram:

[Image: UMA architecture]

UMA gets its name from the fact that each processor must use the same shared bus to access memory, resulting in a memory access time that is uniform across all processors. Note that access time is also independent of data location within memory. That is, access time remains the same regardless of which shared memory module contains the data to be retrieved.

 

NUMA ( Non-Uniform Memory Access)

In the NUMA shared memory architecture, each processor has its own local memory module that it can access directly with a distinctive performance advantage. At the same time, it can also access any memory module belonging to another processor using a shared bus (or some other type of interconnect) as seen in the diagram below:

[Image: NUMA architecture]

Why NUMA is better than UMA

In NUMA, as its name implies, non-uniform memory access means the memory access time varies with the location of the data being accessed.
If the data resides in local memory, access is fast.
If the data resides in remote memory, access is slower.
The advantage of the NUMA architecture, as a hierarchical shared memory scheme, is its potential to improve average access time through the introduction of fast, local memory.
[Image: NUMA nodes with local and remote memory access]
In conclusion, NUMA stands for Non-Uniform Memory Access, which translates into a variance in memory access latencies. Both AMD Opteron and Intel Nehalem are NUMA architectures. A processor and its memory form a NUMA node. Access to memory within the same NUMA node is considered local access; access to memory belonging to another NUMA node is considered remote access.

How NUMA Nodes Get Created

NUMA node creation is based on the number of sockets, and the memory for each NUMA node is calculated by dividing the total system memory by the number of NUMA nodes.
Suppose a physical system is configured with 4 sockets * 4 cores per socket and 12 GB of total memory.

In this case, total NUMA nodes created = 4
Memory allocated to each NUMA node = 12/4 = 3 GB
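A minimal sketch of this calculation, assuming one NUMA node per socket and an even memory split (illustrative only):

# One NUMA node per socket; system memory divided evenly across the nodes.
def numa_layout(sockets, cores_per_socket, total_mem_gb):
    return [{"node": n,
             "cores": cores_per_socket,
             "local_mem_gb": total_mem_gb / sockets}
            for n in range(sockets)]

for node in numa_layout(4, 4, 12):
    print(node)    # 4 nodes, 4 cores and 3.0 GB of local memory each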

Case Study 1: OS Is Not NUMA Aware

A physical system is configured with 4 sockets * 4 cores per socket and 12 GB of memory.
A multithreaded SQL application, along with some general-purpose applications, runs on the OS installed on this system.
Since the OS is not NUMA aware, in the worst case the four threads of the SQL application can be scheduled on four different cores in four different NUMA nodes. In that case, a lot of data will be accessed from remote memory over the interconnect link, which increases memory latency and reduces overall application performance.
Refer to the diagram below:

[Image: Worst-case thread placement across NUMA nodes (non-NUMA-aware OS)]

Case Study 2: OS Is NUMA Aware

Since the OS is NUMA aware and has a complete view of the NUMA nodes of the physical system, it will try its best to schedule multiple threads of the same application within a single NUMA node, avoiding remote memory access and using that node's local memory as much as it can for better performance.
In this example, all four threads of the SQL application are scheduled on four cores of a single NUMA node, as decided by the NUMA-aware CPU scheduler of the OS. Since all the threads access the local memory assigned to that NUMA node, no data is fetched from remote memory, which improves overall application performance.

Refer to the diagram below:

[Image: Thread placement within a single NUMA node (NUMA-aware OS)]
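To put a rough number on the difference between the two case studies, here is a back-of-the-envelope Python sketch. The latency figures are purely illustrative assumptions, not measurements from any particular CPU.

# Illustrative average memory latency for the two thread placements above.
LOCAL_NS = 100    # assumed latency of a local-node access
REMOTE_NS = 180   # assumed latency of a remote access over the interconnect

def avg_latency_ns(fraction_remote):
    return (1 - fraction_remote) * LOCAL_NS + fraction_remote * REMOTE_NS

# Case 1: threads spread across 4 nodes -> roughly 3 of 4 accesses are remote
print(avg_latency_ns(0.75))   # 160.0 ns on average
# Case 2: threads packed into a single node -> accesses stay local
print(avg_latency_ns(0.0))    # 100.0 ns on average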

That is why NUMA plays such an important role and can seriously influence the performance of memory-intensive workloads.

I hope this article helps you understand the basics of the NUMA architecture and how NUMA influences workload performance.

In my upcoming articles, I will cover a few more details about NUMA with respect to the ESXi environment, such as:

How does the ESXi NUMA scheduler work? How is pNUMA different from vNUMA? How does vCPU sizing impact the NUMA scheduler in an ESXi environment? How do you read NUMA stats using the esxtop command?

Please feel free to post your queries; I would be happy to answer them. And please don't forget to leave comments or feedback about this article.

http://www.govmlab.com/should-i-configure-vms-with-multiple-socket-or-multiple-core-per-socket/

http://www.govmlab.com/how-numa-configuration-affects-vm-performance/

 

10 Virtual Machine Files every admin needs to know

It's been more than 10 years working in the virtualization industry, and I haven't had a single day without dealing with virtual machines. The virtual machine is one of the most important and critical entities in the entire datacenter.
If virtual machines are not working or become inaccessible, every other issue, whether networking, storage, clustering, etc., becomes a second priority.

I thought of sharing my knowledge about virtual machines on this blog.
I know many of you are already aware of these files and their importance, but those who are new to virtualization and VMware technology, and want to understand a virtual machine and its bundle of strange files, might find this blog useful.

I'll cover virtual machine basics and anatomy, the files that make up a virtual machine, and their options.

 

WHAT IS A VIRTUAL MACHINE

I have seen many IT professionals and admins refer to a Windows or Linux operating system running on an ESXi host as a “virtual machine”. In the true sense, that is not 100% correct.
Let me ask you one question: what do we call a physical machine that has all the required hardware but no OS installed on it?
The answer is: we call it a physical machine or physical host.

Similarly, in a virtual environment, a virtual machine refers to a software container made up of virtual hardware pieces required to run an instance of a guest operating system such as Windows or Linux.

“A virtual machine is just a software container made up of different virtual hardware components that runs the instance of the guest OS installed in that container.”

 

WHAT KIND OF HARDWARE MAKES UP A VM

By default, VMware ESXi presents the following generic hardware to a VM:

  • BIOS
  • Intel Motherboard
  • BusLogic, LSI Logic, or VMware Paravirtual (PVSCSI) SCSI controller
  • CD-ROM and Floppy Drive
  • AMD or INTEL CPU depending upon the underlying hardware
  • Intel E1000 or VMXNET3 network adapter
  • Standard VGA adapter
  • USB controller

[Image: Virtual machine hardware]

Just like a physical machine, a VM is a VM before the installation of a guest operating system (the term “guest operating system” denotes an OS installed in a VM).

From the ESXi host's perspective, a virtual machine is just a bunch of files stored on supported storage.

 

FILES THAT MAKE UP A VM

[Image: Virtual machine files (.vmx, .vmdk, and others)]

  • VMX file – VM configuration File
  • VMDK files – Contain the disk geometry information (descriptor)
  • Flat-VMDK files – Contain the actual disk data, including the OS files
  • VSWP File – Swap Memory File
  • VMSD – Snapshot Descriptor File
  • NVRAM – BIOS information
  • VMSS – Memory information of Suspended VM
  • .log – VM logging information
  • VMSN – Memory State of Snapshot VM
  • delta.vmdk – Disk state of Snapshot VM
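For readers who like to poke around a datastore, here is a small Python helper that maps the files in a VM's folder to the roles listed above. It is only a convenience sketch, and the directory path in the usage comment is hypothetical.

# Group the files in a VM's directory by the roles listed above.
from pathlib import Path

ROLES = {
    ".vmx": "VM configuration",
    ".vmdk": "virtual disk (descriptor, flat, or delta)",
    ".vswp": "swap file",
    ".vmsd": "snapshot descriptor",
    ".nvram": "BIOS settings",
    ".vmss": "suspended-state memory",
    ".vmsn": "snapshot memory state",
    ".log": "VM log",
}

def inventory(vm_dir):
    for f in sorted(Path(vm_dir).iterdir()):
        print(f.name, "->", ROLES.get(f.suffix.lower(), "other"))

# inventory("/vmfs/volumes/datastore1/web01")   # hypothetical path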

VMX FILE:

The .vmx file is the configuration file of a virtual machine. All the virtual hardware that resides in a VM is defined by this file. For example, here are entries covering the guest OS, memory, and network configuration of a VM:

guestOS = "windows8srv-64"
memSize = "3072"
ethernet0.virtualDev = "e1000e"
ethernet0.networkName = "VLAN-25"
ethernet0.generatedAddress = "00:50:56:8b:7f:44"

Each time we create a virtual machine using the New Virtual Machine wizard, the .vmx file is built from the answers given about the guest OS, size, name, network, and disk configuration.
Whenever we edit the virtual machine settings, this file is updated to reflect those changes. (A small parsing sketch follows the list below.)

This file contains details such as:

  • No of processors
  • Memory Size
  • No Of Network Adapters
  • Network driver details
  • MAC address
  • SCSI Controller
  • SCSI driver
  • VC UUID
  • BIOS UUID
  • Virtual Machine Hardware Version
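Since the .vmx file is just plain key = "value" text, a few lines of Python are enough to read it. This is a minimal sketch; the datastore path in the usage comment is hypothetical.

# Minimal sketch: read a .vmx file (plain key = "value" lines) into a dict.
def read_vmx(path):
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip().strip('"')
    return config

# Example (hypothetical path):
# cfg = read_vmx("/vmfs/volumes/datastore1/web01/web01.vmx")
# print(cfg.get("memSize"), cfg.get("ethernet0.networkName"))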

VMDK File

This is the virtual machine disk descriptor file, which describes the geometry of the virtual disk drive. It is a header file that contains only configuration information and pointers to the flat .vmdk file. The VMDK descriptor holds hard drive information such as disk sectors, number of cylinders, heads, and the adapter type:

ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "5221"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"

FLAT VMDK FILE

The flat .vmdk file contains the actual data of the virtual hard disk. Naturally, this means that the VMDK descriptor file is typically very small, only a few KB, while the flat VMDK file can be as large as the virtual disk size defined at the time of VM creation.
For example, a 40 GB virtual disk can mean a 40 GB flat .vmdk file.
The descriptor VMDK file is plain text and human readable, but the flat VMDK file is a binary file that cannot be read directly.

VSWP FILE

This is the virtual machine's swap file, and its size equals the memory assigned to the VM in its settings minus any memory reservation. For example, a VM configured with 4 GB of virtual memory and no reservation gets a 4 GB .vswp file.
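A tiny sketch of that sizing rule, assuming no other overrides are in play:

# .vswp sizing as described above: configured memory minus memory reservation.
def vswp_size_gb(configured_mem_gb, reservation_gb=0):
    return max(configured_mem_gb - reservation_gb, 0)

print(vswp_size_gb(4))      # 4 -> a 4 GB swap file when nothing is reserved
print(vswp_size_gb(4, 4))   # 0 -> fully reserved memory needs no swap file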

VMSD FILE

This is the snapshot descriptor file, used to store metadata about the snapshots of a virtual machine. It holds the snapshot names and references to each snapshot's memory state and disk state.

VMSS FILE

This file gets created when a virtual machine enters the suspended state; it preserves the memory contents of the VM at the moment it was suspended. When the VM resumes, the memory contents preserved in this file are written back to host memory.

VMSN FILE

This file is used to store the exact state of the virtual machine when a snapshot was taken. Using this snapshot file, the virtual machine's memory state can be saved and later switched back to the state at the time the snapshot was taken.

DELTA-VMDK File

These virtual disk files get created when we take a snapshot of a VM. They are also known as redo logs or differencing disk files. A delta file is created each time a snapshot is taken, and it contains the incremental disk changes made since that snapshot.

NVRAM FILE

This file contains the BIOS settings of the virtual machine.

LOG FILE

These are the virtual machine log files; they capture the VM's logging information and are widely used for troubleshooting. The current log file is always named vmware.log, and up to six older files are retained with a number at the end of their names (vmware-1.log, vmware-2.log, and so on).

 

http://www.govmlab.com/10-virtual-machine-files-every-admin-needs-to-know/

 

Full/Para-Virtualization

Full virtualization using binary translation

 

Full virtualization is a technique that provides a complete simulation of the underlying hardware. Certain protected instructions must be trapped and handled by the VMM (Virtual Machine Monitor), because the guest OS believes that it owns the hardware when in fact the hardware is shared through the VMM. To overcome this, binary translation is employed, which translates the guest kernel code so that instructions that cannot be virtualized are replaced with new instructions that have the same effect on the virtual hardware. Another technique used in full virtualization is direct execution, in which user-level code is executed directly on the processor so that higher performance can be achieved.

[Image: x86 privilege rings with binary translation (Figure 1)]

A result of this approach is that the guest OS is fully abstracted from the underlying hardware by the virtualization layer; the guest OS therefore does not know that it is being virtualized and needs no modification (Figure 1). Full virtualization is the only one of these server virtualization techniques that requires no hardware or operating system assistance, because the VMM translates the guest's kernel instructions while allowing user-level applications to run unmodified at native speed.

Advantages

  • Full virtualization provides complete isolation of the virtual machines
  • Operating systems can be installed without any modification
  • Provides near-native CPU and memory performance
  • It offers flexibility because many different operating systems and versions from different vendors  can be installed and run.
  • Because the guest OS remains unmodified, migration and portability is very easy.

Disadvantages

  • Requires the correct combination of hardware and software elements
  • Performance can be affected because of the trapping and binary translation of protected x86 instructions.

Example 

  • VMware
  • Hyper-V

OS assisted virtualization or paravirtualization

Paravirtualization is a virtualization technique in which the guest OS is modified so that it can communicate with the hypervisor (VMM). In paravirtualization, the kernel of the OS is modified to replace instructions that cannot be virtualized with hypercalls that communicate directly with the virtualization layer hypervisor (VMware, 2007b). The hypervisor also provides hypercall interfaces for other critical kernel operations such as memory management and interrupt handling. In this technique some, but not all, of the underlying hardware is simulated.

[Image: The paravirtualization approach to x86]

In contrast to full virtualization, the guest OS in paravirtualization knows that it is being virtualized, and it achieves greater performance than full virtualization because the guest OS communicates directly with the hypervisor, so the overheads needed for emulation are reduced.

Advantages

  • Easier to implement than full virtualization where no hardware assistance is available.
  • Greater performance because overheads from emulation are reduced.

 Disadvantages

  • Modification required for the guest OS
  • The modification of the guest OS results in poor portability and compatibility.

Example 

  • Citrix-XEN Server

Hardware Assisted Virtualization

[Image: Hardware-assisted virtualization]

First-generation enhancements include Intel Virtualization Technology (VT-x) and AMD's AMD-V, both of which target privileged instructions with a new CPU execution mode that allows the VMM to run in a new root mode below ring 0. As depicted in the figure above, privileged and sensitive calls are set to automatically trap to the hypervisor, removing the need for either binary translation or paravirtualization. The guest state is stored in Virtual Machine Control Structures (VT-x) or Virtual Machine Control Blocks (AMD-V).

Example 

  • Intel-VT
  • AMD-V

 

http://www.govmlab.com/news-section-2/

 

Why Virtualization?

In traditional x86 environments, there was a rigid 1:1 mapping between hardware, an instance of an operating system, and a single software application. That rigid model led to tremendous under-utilization of hardware resources. The industry statistic is that in this traditional model, servers are utilized at only 5-15%. This is a huge problem for companies: a very large pool of resources stays idle most of the time.

But the story doesn't end there. Server sprawl and the associated under-utilization of resources ripple through the entire environment: server sprawl means not only wasted investment in hardware, but also unsustainable power, cooling, and real estate costs. This tremendous complexity makes it hard to provision new infrastructure and to respond to changing business needs. IT departments are stuck wasting cycles on repetitive tasks and don't have time to focus on what really matters.

For example, in most companies a single sysadmin can support only up to 20 servers, and the time to provision a new server is often 6-8 weeks.

The image below describes the problems faced in a physical environment:

[Image: Problems of the traditional physical environment]

 

WHAT IS VIRTUALIZATION

When people talk about Virtualization, they’re usually referring to Server Virtualization, which means partitioning one physical server into several virtual servers, or machines.

In other words, virtualization refers to technologies designed to provide a layer of abstraction between computer hardware systems and the software applications running on them.

Virtualization enables you to run multiple virtual machines on a single piece of hardware, allowing for numerous operating systems and applications to be run on a single server.

For Ex. Running Windows & Linux on the same Hardware.

 

[Image: Server virtualization]

 

  BENEFITS OF VIRTUALIZATION

Virtualization can increase IT agility, flexibility, and scalability while creating significant cost savings. Workloads get deployed faster, performance and availability increase, and operations become automated, resulting in IT that's simpler to manage and less costly to own and operate.

  • Reduce capital and operating costs.
  • Deliver high application availability.
  • Minimize or eliminate downtime.
  • Increase IT productivity, efficiency, agility and responsiveness.
  • Speed and simplify application and resource provisioning.
  • Support business continuity and disaster recovery.
  • Enable centralized management.
  • Build a true Software-Defined Data Center.

[Image: Benefits of virtualization]

[Image: Hardware utilization]

In a nutshell, virtualization is a proven way to reduce the complexity of your IT environment, which simplifies operations and ongoing maintenance. Deploying a dynamic, virtualized system considerably lowers the costs and resources currently necessary to support your existing IT network.

By consolidating existing applications onto fewer servers, your enterprise will be able to reduce capital expenditure on hardware, decrease the amount of time IT personnel spend on routine administrative tasks, and diminish electrical energy usage.

[Image: The results are transformational]

Virtualization provides built-in agility to manage today's network with increased application availability and data recoverability, while remaining nimble enough to quickly adapt to the future IT needs necessary to grow your business.

http://www.govmlab.com/whyvirtualization/