Official guide: I do not remember one. For performance troubleshooting, see: http://www.vmware.com/resources/techresources/10066
The official course is VMware vSphere: Troubleshooting V4: http://mylearn.vmware.com/mgrreg/courses.cfm?ui=www_edu&a=one&id_subject=17829
Thursday, June 3, 2010
ESX Troubleshooting
Have a look at the VMware ESX troubleshooting guide: http://www.petri.co.il/5-critical-vmware-esx-cli-network-commands.htm
And for documents:
http://download3.vmware.com/vmworld/2006/mdc9694.pdf
http://download3.vmware.com/vmworld/2006/mdc9807.pdf
http://download3.vmware.com/vmworld/2006/tac0028.pdf
http://download3.vmware.com/vmworld/2006/tac9689-b.pdf
Xtravirt has published an HA troubleshooting whitepaper: http://xtravirt.com/xd10005#
There is also a new three-day VMware course specific to troubleshooting.
Thursday, May 13, 2010
Public Vs. Private Vs. Hybrid
As if cloud computing weren’t a hard enough concept to grasp, there are gradations emerging that make the concept even more complicated. Relax. It’s a natural evolution, as the capabilities behind the cloud concept become more explicit. What’s important is to realize the potential and start planning for it.
Public cloud computing offerings are those with which you are most likely already familiar. Applications delivered over the Internet in the software-as-a-service model, and computing resources such as storage or compute cycles delivered in the infrastructure-as-a-service model, are the most common forms of public cloud computing.
A private cloud, also known as a corporate cloud, uses cloud-like infrastructure and technology, such as virtualized servers in a scalable architecture, to run applications behind the corporate firewall.
A hybrid model takes advantage of both of these structures. An organization may choose, for example, to run its e-mail system in the public cloud while keeping highly sensitive, customer-oriented applications behind the firewall.
What model you choose may depend on several factors: size of organization, IT resources, time to market (speed of implementation), and security requirements. For instance, SaaS in the public cloud provides organizations with limited resources a way to implement a needed application quickly and with low upfront costs. A private cloud, on the other hand, requires significant initial investment but offers behind-the-firewall security assurance.
Organizations of all sizes make use of infrastructure-as-a-service resources to boost capacity or support new systems. But are they making the best use of cloud computing’s potential? Not according to Doug Hauger, general manager of Microsoft’s cloud infrastructure group, as quoted in a recent InformationWeek article. “If you’ve got a hairball in your data center, and you move that hairball to an infrastructure-as-a-service, and you don’t rework it, it’s still just a hairball,” Hauger said.
Which brings us to another important point: writing applications optimized for the cloud computing architecture. That’s where platform-as-a-service comes in. Oh, I didn’t mention that there’s another form of cloud computing coming to the fore? Relax. The more the merrier, right?
Inactive VM to Activate VM On ESX
This document does not include any client-environment screenshots or related information.
First, make sure you have attempted the following steps from the GUI.
1. Connect VMware Infrastructure (VI) Client to the Virtual Center Server. Right-click on the virtual machine and click Power off.
2. Connect VI Client directly to the ESX host. Right-click on the virtual machine and click Power off.
*If this does not work, you must use the command line method.
Powering off the virtual machine using the vmware-cmd command
This procedure uses the ESX command line tool, and attempts to gracefully power off the virtual machine. It works if the virtual machine's process is running properly and is accessible. If unsuccessful, the virtual machine's process may not be running properly and may require further troubleshooting.
Connect via PuTTY
1. From the Service Console of the ESX host, run the following command: # vmware-cmd <path_to_vmx> stop
Note: <path_to_vmx> is the complete path to the virtual machine's configuration file, as determined in the previous section. To verify that the virtual machine has stopped, run the command: # vmware-cmd <path_to_vmx> getstate
2. If the graceful stop fails, from the Service Console of the ESX host, run the command: # vmware-cmd <path_to_vmx> stop hard
Again, verify that the virtual machine has stopped with: # vmware-cmd <path_to_vmx> getstate
3. If the virtual machine is still inaccessible, proceed to the Option 2 section below. (A consolidated example follows this list.)
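For illustration, here is the whole sequence against a hypothetical virtual machine whose configuration file lives at /vmfs/volumes/storage1/myvm/myvm.vmx (substitute your own path):
# vmware-cmd /vmfs/volumes/storage1/myvm/myvm.vmx stop
(attempts a graceful guest shutdown)
# vmware-cmd /vmfs/volumes/storage1/myvm/myvm.vmx getstate
getstate() = off
(if getstate still returns on, escalate to the hard stop)
# vmware-cmd /vmfs/volumes/storage1/myvm/myvm.vmx stop hard
# vmware-cmd /vmfs/volumes/storage1/myvm/myvm.vmx getstate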
Option 2
Powering off an unresponsive virtual machine on an ESX host via the command line.
Make sure you have already attempted the following before you start:
1. Connect VMware Infrastructure (VI) Client to the Virtual Center Server. Right-click on the virtual machine and click Power off.
2. Connect VI Client directly to the ESX host. Right-click on the virtual machine and click Power off. If this does not work, you must use the command line method.
Determining the virtual machine's state
1. Determine the host on which the virtual machine is running. This information is available in the virtual machine's Summary tab in the VI Client.
2. Log in as root to the ESX host using an SSH client.
3. Run the following command to verify that the virtual machine is running on this host: # vmware-cmd -l
The output of this command returns the full path to each virtual machine registered on the ESX host. Verify that the virtual machine is listed, and record the full path for use in this process. For example: /vmfs/volumes/<datastore>/<vm_directory>/<vm_name>.vmx
4. Run the following command to determine the state in which the ESX host believes the virtual machine to be operating: # vmware-cmd <path_to_vmx> getstate
If the output is getstate() = on, the VirtualCenter Server may not be communicating with the host properly; this must be addressed in order to complete the shutdown process. If the output is getstate() = off, the ESX host may be unaware that it is still running the virtual machine. This article provides additional assistance in addressing this issue. A short example follows.
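As a short example, assuming the same hypothetical myvm virtual machine, the identification steps would look like this:
# vmware-cmd -l
/vmfs/volumes/storage1/myvm/myvm.vmx
# vmware-cmd /vmfs/volumes/storage1/myvm/myvm.vmx getstate
getstate() = on
(on here means the host believes the VM is running even though it is unresponsive, so proceed with the power-off options above)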
Powering off the virtual machine while collecting diagnostic information using the vm-support script
Use the following procedure when you want to investigate the cause of the issue. This command attempts to power off the virtual machine while collecting diagnostic information. Perform these steps in order, as they are listed in order of potential impact to the system if performed incorrectly.
Perform these steps first:
1. Determine the WorldID with the command: # vm-support -x
2. Kill the virtual machine by running the following command from the home directory of the virtual machine: # vm-support -X <WorldID>
This can take upwards of 30 minutes to terminate the virtual machine. Note: This command uses several different methods to stop the virtual machine, and it waits a pre-determined amount of time for each method. The timeout can be set to 0 by adding the -d0 switch to the vm-support command. (See the example below.)
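A sketch of these two steps together (the world ID 1065, the VM name, and the exact output format are illustrative; vm-support output varies by version):
# vm-support -x
Available worlds to debug:
vmid=1065    myvm
# cd /vmfs/volumes/storage1/myvm
# vm-support -X 1065
(add -d0 to skip the per-method timeouts, e.g. vm-support -X 1065 -d0)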
If the preceding steps fail, perform the following steps for an ESX 3.x host:
1. List all running virtual machines to find the VMID of the affected virtual machine with the command: # cat /proc/vmware/vm/*/names
2. Determine the master world ID with the command: # less -S /proc/vmware/vm/####/cpu/status
3. Scroll to the right with the arrow keys until you see the group field. It appears similar to: Group vm.####
4. Run the following command to shut the virtual machine down with the group ID: # /usr/lib/vmware/bin/vmkload_app -k 9 ####
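For example, if the names file shows the affected VM as vmid 1234 and its status file shows Group vm.1065 (both IDs hypothetical), the sequence would be:
# cat /proc/vmware/vm/*/names
vmid=1234    name="myvm"
# less -S /proc/vmware/vm/1234/cpu/status
(scroll right until the group field shows something like Group vm.1065)
# /usr/lib/vmware/bin/vmkload_app -k 9 1065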
Building Virtual Machine using CLI
Steps:
1- Create a directory on a VMFS partition (mkdir directoryname).
2- Change to the directory (cd directoryname).
3- Create the .vmx file (vi vmname.vmx).
a. Use vmname.txt as a sample (refer to vi.txt for some basic commands for working with the vi editor).
b. Use either of the following to map the CD-ROM:
i. To use the server's attached CD-ROM drive, add the following line at the end of the vmname.vmx file:
ide0:0.fileName = "/dev/hda"
ii. To use an ISO image as the CD-ROM drive, add the following line at the end of the vmname.vmx file, substituting the full path to your ISO:
ide0:0.fileName = "<path_to_iso>"
4- Create vmdk file (vmkfstools -c 10g vmname.vmdk -a lsilogic)
5- Register vmx to vmware (vmware-cmd -s register vmname.vmx)
6- Power on VM (vmware-cmd vmname.vmx start)
7- Open Console (Select Create UID)
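Since vmname.txt is not reproduced here, the following is a minimal illustrative .vmx for an ESX 3.x virtual machine (a 1 GB Windows 2003 guest; every value is an example, so adjust guestOS, memsize, the ISO path, and the network name for your environment):
config.version = "8"
virtualHW.version = "4"
displayName = "vmname"
guestOS = "winnetstandard"
memsize = "1024"
scsi0.present = "true"
scsi0.virtualDev = "lsilogic"
scsi0:0.present = "true"
scsi0:0.fileName = "vmname.vmdk"
ide0:0.present = "true"
ide0:0.deviceType = "cdrom-image"
ide0:0.fileName = "/vmfs/volumes/<datastore>/<image>.iso"
ethernet0.present = "true"
ethernet0.networkName = "VM Network"
Note that scsi0.virtualDev matches the lsilogic adapter used when creating the vmdk in step 4.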
VI web access issue
Location: Noida | Division/Service Line: RIM | Project ID: WINTEL-FREESCALE-NA | Category: Wintel | Source: VMware
Statement related to the tip: VI web access issue
Description: Issue: Web service unavailable (http://tx32vc01.am.freescale.net)
Figure 1.1
The port configured for the Web Service in the Virtual Center Management Server configuration is “8080” (refer to figure 1.2).
Figure 1.2
File name: proxy.properties. Location: “C:\Program Files\VMware\Infrastructure\VirtualCenter Server\tomcat\webapps\ui\WEB-INF\classes\” on tx32vc01.
It carries “80” as the port number: “proxy.service.url = http://localhost:80/sdk”.
File name: login.properties. Location: “C:\Program Files\VMware\Infrastructure\VirtualCenter Server\tomcat\webapps\ui\WEB-INF\classes\” on tx32vc01.
It carries no port number: “login.webServiceUrl.defaultValue=http://localhost/sdk”.
The fix is also given at this link: http://vmware-blog.blogspot.com/2008/08/unable-to-access-vmware-virtual.html
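A sketch of the likely remedy (the linked post has the authoritative steps): make the port in both properties files agree with the port actually configured for the Web Service, here 8080, then restart the VirtualCenter web services. The edited lines would read:
proxy.service.url = http://localhost:8080/sdk
login.webServiceUrl.defaultValue=http://localhost:8080/sdk
(Alternatively, reconfigure the Web Service back to port 80; the point is that the configured port and both files must agree.)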
Insufficient disk space on datastore in service console
Error: “There are errors during the remediation operation. Insufficient disk space on datastore 'service console'.”
Use the following service console commands to troubleshoot this problem:
1. df
2. du -h --max-depth=1 (to check which directory is consuming the most disk space; in this example, /vmimages was consuming several GB and was the culprit)
3. cd /vmimages
4. ls
5. cd tools-isoimages
6. ls -l
7. rm RHEL4.7-X86_64-ES-DVD.iso (remember, this is where the VMware Tools images are kept, so be careful when removing ISOs from here)
8. df (root space utilization came down to 31% after deleting the RHEL4.7-X86_64-ES-DVD.iso file!)
This is how you can check other filled partitions and folders as well; a more general sketch follows.
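As a more general version of the same triage (paths and sizes are illustrative), the following walks down whichever partition df reports as full and flags oversized files before anything is deleted:
# df -h
(identify the full filesystem, e.g. / at 95%)
# du -h --max-depth=1 / 2>/dev/null
(repeat one level deeper inside the largest directory until you find the culprit)
# find /vmimages -type f -size +500000k -exec ls -lh {} \;
(lists files larger than roughly 500 MB so stray ISOs stand out)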
Tuesday, May 4, 2010
Advanced Technical Design Guide
Chapter 1 – Virtualization Overview
This book is an updated version of the original ESX Server Advanced Technical
Design Guide that was written for ESX 2.x. It has been updated, added to, and rewritten
to bring it up to speed for ESX 3.0 and Virtual Center 2.0. As was the
case with the original book, this book is designed to provide you with practical,
real-world information about the use and design of VMware ESX Server systems.
In order to study the advanced technical details, we need to ensure that
you have a good baseline understanding of VMware ESX features, VMware’s
other products, and of course virtualization concepts as we know them today.
Even if you’ve already read and understood our previous book or if you’re very
familiar with previous versions of ESX or VMware server, we still recommend
that you at least skim through this chapter so that you can familiarize yourself
with the specific terms and phrasing that we’ll use throughout this book.
As you read through this book, you should keep in mind that each chapter was
written separately and that it’s okay for you to read them out-of-order. Of
course if you’re new to VMware and virtualization technologies, the chapter
sequence will guide you through a logical progression of the components. We’ll
begin in this chapter with an overview of VMware software solutions.
Virtualization - Today's Favorite Buzz Word
Standing at the RapidApp booth at VMworld this year (VMware’s annual conference),
I had one of the attendees approach me and ask, “so do you guys
really do something with virtualization technology, or are you just another company
that throws the word virtual in front of the product and comes here?” I
laughed. It was funny, and any of you that have watched virtualization grow over
the past few years understand how funny that really is. Virtualization is the hottest
topic around the IT space right now. Virtualize your servers, virtualize your
network, virtualize your storage, virtualize your boss… well that last one isn’t
here yet, but we’re working on it.
This book deals with the market leading product ESX Server 3. In this chapter
we will give you an overview of where ESX (or as VMware marketing people
like to call the whole solution “Virtual Infrastructure 3”) fits into the server virtualization
space, the type of virtualization it does, and we will try to keep away
from all the other “virtual” products.
VMware ESX Server allows for the “virtualization” of x86-based servers. The
basic concept is that a single piece of physical hardware is used to host multiple
logical or “virtual” servers. The ability to host multiple virtual servers on a single
piece of hardware, while simple to understand in theory, can be extremely complex
in execution. It should be noted that the idea of virtual servers is nothing
new. Look to the UNIX space and you will hear terms like “Frames” and
“LPARs” and can equate those with Hosts and Virtual Machines. VMware’s
place in IT shops is driven in the x86 space, where hardware is cheap (compared
to UNIX) and generally underutilized.
When VMware technology is used in an environment, a single host allows multiple
virtual servers to share the host’s physical resources. These resources include
processor, memory, network cards, and storage. (We’ll refer to these four
physical resources as the “core four” throughout this book.) This architecture
also gives you the ability to “partition” your host to allow for the server’s resources
to be fully utilized while you migrate multiple physical servers to a single
VMware server running multiple virtual servers. (We’ll discuss partitioning and
migration strategies in much more depth throughout this book.)
So what does this mean to your environment or your business? It depends on
what your business looks like and which product from VMware you’re thinking
about using. Obviously if you’re reading this book then you’re at least contemplating
an ESX Server implementation. But even with ESX you have a number
of alternatives on how to deploy and utilize the system. Also, when investigating
VMware technology it is unfair to simply look at ESX and why it is used. You
can only decide to use ESX instead of GSX or Workstation after you really understand
what each product does and where they truly fit. Outside of the
VMware suite of products you may also have to look at other virtualization solutions,
like those based on XenSource. And to make an informed decision
about these solutions you need a better understanding of virtualization technology
for x86 servers.
Hardware Virtualization vs. Software
The word virtualization is thrown around a lot these days, and everyone claims
his way of virtualizing is better than everyone else’s. But to understand what you
are getting into you need to understand some basics. The first (and most important)
concept you need to grasp is the difference between hardware virtualization
and software. Most people in IT are aware that Intel and AMD have
released “processor” virtualization technologies for their processors. There is a
lot of talk about “moving this or that out of Ring 0 and into a Ring 1.” There is
also talk about how Intel is working on hardware hypervisors for their network
cards, and HBA manufacturers are also creating hypervisors… So what does it
all mean?
To make it simple let’s look at the processor and two major virtualization
“products.” First let’s look at XenSource (pronounced zen) and Xen based
products like Virtual Iron. Xen is basically a free virtual machine monitor (a
chunk of software that allows for multiple operating systems to run on a single
piece of hardware) that grew out of the Linux community and originally would
require a modification of the guest operating system to run properly. There is
more detail and strategy on this product in chapter 2, but for now let's just take
the basics.
The reason this modification was required was due to the x86 processor architecture
and its concept of security rings (rings 0 through 3). In virtualization the
host operating system generally will run at ring level 0, the most secure part of
the processor if you will, with Virtual Machines running at ring level 3. Of
course operating systems (like Windows or Linux) are written so that they require
full control of ring level 0. So when these OS’s are run as a guest virtual
machine they need to be modified so that the guest OS is not making ring level
0 calls (or in Windows system/kernel calls) to areas of the processor to which it
doesn’t have access. With the processor virtualization technology that has been
introduced, it is basically a hardware change to allow for a “virtual Ring 0” if
you will, labeled ‘Ring 1.’ The idea is that the manufacturer can make changes to
move the required instructions up a level (to ring 1) and then the guest OS’s are
fooled into thinking they have full control of ring 0 all the time.
Before these processors were released, using Xen required that you modify
the kernel in the guest OS and host. For Windows shops this was not an option
and slowed the adoption of Xen as a virtualization solution. With Xen out of the running,
most Windows environments adopted ESX server since it allows the VM to
execute against the processors (direct execution) without modifications to the
guest OS at all.
ESX did (and does in 3.0) what I like to think of as processor thunking. Remember
when you ran 16-bit apps on a 32-bit system, and the process for remapping
the 16-bit addresses to 32-bit addresses was termed “thunking”?
Well, I like to think of ESX doing the same thing, but on the processor side.
ESX essentially allows the guest OS to see the processor as it is. It then allows
direction execution on the processor (even to ring 0) from the guest OS. The
trick here is that the ESX kernel also runs at ring level 0, and each time a call
from the guest comes down, the kernel essentially stops executing itself
on ring 0, allows the execution of the command from the guest (there is a
binary translation involved), then sends the results back to the guest and resumes
its operations.
This may sound like a ton of overhead, but it’s not. Sure there is some, but essentially
both forms of virtualization perform about the same, and in either case
there is still a need for some type of virtual machine monitor/hypervisor to
manage the guests and their access to resources.
Which brings us to our next point: no matter what the hardware manufacturers
do (processor, NIC, HBA or Memory), there will, for the foreseeable future, be
a hypervisor like Xen or ESX. A software layer will be required to manage all
these hardware devices and their unique one-use hypervisors. The decision you
have to make is which piece of software to use. Xen and VMware ESX are the
two “big name” players with Microsoft Virtual Server following behind until
their Longhorn hypervisor is publicly available. We’ll get into the difference between
hosted and bare metal architectures in the following pages, but at least
now you can put to rest the arguments from the guy two cubes over about how
you should just wait for the hardware hypervisors to come out and explain the
reality of virtualization to him. Since this book is an ESX book we will continue
to focus on that from here out.
So what is “virtualization?”
Simply stated, VMware (the company) provides virtualization technology. All of
their products (Workstation, ESX Server, and VMware Server) work in more-
or-less the same way. There is a host computer that runs a layer of software that
provides or creates virtual machines that x86 operating systems can be installed
to. These virtual machines are really just that—complete virtual machines.
Within a virtual machine you have hardware like processors, network cards,
disks, COM ports, memory, etc. This hardware is presented through BIOS and
is configurable just like physical hardware.
If we were to jump ahead a little and look at a virtual Windows server, you
would be able to go into device manager and view the network cards, disks,
memory, etc…, just like a physical machine. As a matter of fact the whole idea
is that the virtualized operating system has no idea that it is on virtual hardware.
It just sees hardware as it normally would.
In Chapter 2 we’ll go into detail on how virtualization actually takes place with
ESX Server, but for now it’s just important to understand that from a high level
there are a few components that make up any virtualization environment:
1. A host machine/host hardware
2. Virtualization software that provides and manages the virtual environment
3. The virtual machine(s) themselves (or “VM’s”—virtual hardware
presented to the guest)
4. The guest operating system that is installed on the virtual machine
Figure 1.1: The basic virtualization model
Knowing that each of these four elements must be in place in a virtual environment
and understanding that they are distinctly different resources will allow
you to understand where different virtualization software is used. Let’s examine
each of these components a bit before moving into the breakdown of the different
virtualization technologies available from VMware and other vendors.
The Host Machine
The host machine in a virtual environment provides the resources to which the
virtual machines will eventually have access. Obviously, the more resources
available, the more virtual machines you can host. Or put more directly,
“the bigger the host machine, the more virtual machines you can run on it.”
This really makes sense if you look at Figure 1.1. In Component 1 of the diagram,
the host machine has processors, some RAM, a set of disks, and network
cards. Assuming that the host is going to use some of each of these core four
resources for its own operation, you have what’s leftover available for the virtual
machines you want to run on it.
For example, let’s assume that the host is using 10% of the available processor
for itself. That leaves 90% of the CPU available for the virtual machines. If
you’re only running a single virtual machine (VM) then this may be more than
enough. However, if you’re trying to run 30 VM’s on that host then the host’s
“extra” 90% CPU availability will probably lead to a bottleneck.
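To put illustrative numbers on that: 90% of the CPU spread across 30 VM’s averages out to 90 / 30 = 3% of the host’s processing power per VM. If each guest idles at 2% but occasionally spikes to 10%, even a handful of simultaneous spikes will exhaust the headroom, which is exactly the contention the virtualization software has to arbitrate.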
The other challenge is that since all of the VM’s are sharing the same resource
(the CPU’s in this case), how do you keep them from stepping on each other?
This is actually where Component 2 from Figure 1.1 comes in—the virtualization
software.
The Virtualization Software
The virtualization software layer provides each virtual machine access to the
host resources. It’s also responsible for scheduling the physical resources among
the various VM’s. This virtualization software is the cornerstone of the entire
virtualization environment. It creates the virtual machines for use, manages the
resources provided to the VM’s, schedules resource usage when there is contention
for a specific resource, and provides a management and configuration interface
for the VM’s.
Again, we can’t stress enough that this software is the backbone of the system.
The more robust this virtualization software is the better it is at scheduling and
sharing physical resources. This leads to more efficient virtual machines.
VMware provides three versions of this virtualization software. The first two—
“VMware Workstation” and “VMware Server”—are virtualization software
packages that install onto an existing operating system on a host computer. The
third version (“VMware ESX Server”) is a full operating system in-and-of itself.
We’ll explore some of the reasons to help you choose which product you
should use later in this chapter. The important idea here is to understand that
ESX Server is both its own operating system and also the virtualization software
(Components 1 and 2 in the model we’ve been referencing), while VMware
Server and Workstation are virtualization software packages that are installed on
and rely upon other operating systems.
The Virtual Machine
The term “virtual machine” is often incorrectly used to describe both the virtual
machine (Component 3) and the guest operating system (Component 4). For
clarity in this book we will not mix the two. The virtual machine is actually the
virtual hardware (or the combined virtual hardware and the virtual BIOS) presented
to the guest operating systems. It’s the software-based virtualization of
physical hardware. The guest operating systems that we install into these “machines”
are (in most cases) unaware that the hardware they see is virtual. All that
the guest OS knows is that it sees this type of processor, that type of network
card, this much memory, etc.
It’s important to understand that the virtual machine is not the OS but instead
the hardware and configurations that are presented to the guest OS.
The Guest Operating System
In case it’s not clear by now, the guest OS is the x86-based operating system
(Windows, Linux, Novell, DOS, whatever) that’s running on a VM. Again, understanding
that the guest OS (or “guest machine” or simply “guest”) is simply
the software (Component 4) that’s installed onto a VM (Component 3) will
make your life easier when it comes to understanding and troubleshooting your
environment.
Once all four of these components are in place you’ll have a virtual environment.
How this virtual environment performs, is managed, and the types of
functionality available in your virtual environment are all dependent on the type
of software you’re using to provide your virtual environment.
Which virtualization product should I use?
Well that’s the question isn’t it? Since this book is about ESX Server we can
probably assume that you’ve already made your decision. Then again you might
be using this book to help make your decision, so we’ll go ahead and look at the
products and how they fit into the IT world.
Most VMware folks started out using VMware Workstation (simply called
“Workstation” by VMware geeks in the community). Workstation allows us to
create virtual workstations and servers on our own PC. This lets us create small
test environments that we can use to test scripts, new software packages, upgrades,
etc. VMware Workstation is perfect for this.
Figure 1.2: Where do the VMware and other Virtualization products fit?
Of course Workstation has its limitations. Probably the biggest one is that VM’s
can only run while you’re logged into your host workstation. Log off and the
VM's shut down. Also, VMware Workstation is pretty much a local user tool,
which means that there are really no remote administration capabilities whatsoever.
These limitations keep Workstation on the desktops of developers, engineers,
and traveling salespeople who have to give multi-server demos off their
laptops. No one uses Workstation for production environments.
VMware Server (formerly called “GSX”) is a step up from Workstation.
Server is basically a software package that installs onto an existing host
operating system (either Linux or Windows Server). It offers remote management
and remote console access to the VM’s, and the various VM’s can be configured
to run as services without any console interaction required. Its limitation
is really that it has to use resources from the host hardware through the host
OS. This really limits the scalability and performance of Server.
The reason for this is that with Server, VM’s do not have direct access to the
hardware. Let’s look at an example to see how this can cause a problem. We’ll
look at memory use. Let’s assume you configure 384MB of memory for a VM
running on VMware Server for Windows. The “catch” here is that since
VMware Server is “just” another Windows application, the VM doesn’t get direct
access to 384MB of memory. Instead it requests 384MB of memory from
Windows and is dependent on the Windows scheduling mechanism.
Sure you’ll see the host’s memory utilization go up by 384MB (plus some for
overhead) when you turn on the VM, but the guest OS has to send all memory
requests to Windows. In this case you’ll have a host OS managing the “physical”
memory for the guest OS. This is on top of the guest OS managing its own
memory within the VM.
While this is just a simplified example, it points out some of the inherent limitations
with VMware Server that aren’t seen in ESX. Does this mean VMware
Server isn’t a good product? Not at all. It just means that it has limitations
stemming from the fact the virtualization software runs on top of a host OS.
VMware Server is still used in plenty of environments—especially those that
don’t require enterprise-class scalability for their VM’s, those that have a limited
number of VM’s, and those that do not require maximum performance.
VMware Server is also frequently found in corporate test labs and is used to
allow administrators to get the benefits of a “virtual” test environment without
them needing to know all the ins and outs of a full virtual server OS. Finally,
many companies use VMware Server when they don’t have the budget to buy
ESX-certified hardware or when the VMware champions can’t win the political
“everything has to run on Windows” battle.
What makes ESX different from VMware Server, Workstation, or even Microsoft’s
Virtual Server in its current revision?
VMware ESX Server is its own operating system. Unlike VMware Server or
Microsoft Virtual Server, ESX is not a software package that installs into a host
OS; ESX is the host OS. Engineered from the beginning to be nothing more
than a VM host, ESX Server is completely designed to give the VM’s the best
performance possible and to allow you (the admin) to control and shape the
way the host resources are shared and utilized.
So what does using ESX instead of VMware Server or Workstation get you?
The answer is simple: performance, more management and recovery features,
and reliability.
• Performance. ESX Server provides a level of performance for your
VM’s that simply cannot be found in VMware Server or Workstation.
It also allows for more advanced resource allocation, fine tuning
of performance, a better VM-to-processor ratio, and more advanced
resource sharing.
• Management and Recovery Features. ESX, when used with Virtual
Center, allows you to manage the load on your systems by moving
a VM to any host in a cluster without downtime. It also has an
automated mode where it will move and shift load (by moving
VM’s) to different hosts in the cluster automatically. In the event of
a server failure in the cluster, the VM’s will be restarted on the remaining
hosts and brought back online within minutes.
• Reliability. VMware publishes an ESX Server Hardware Compatibility
List (HCL). If the hardware you’re using for ESX is on the
HCL, then you can be confident that everything will work as expected.
ESX also lets you get rid of any problems that exist in the
host OS since host OS’s don’t exist with ESX.
In short if you’re looking to implement virtual machines on an enterprise level
or if you’re looking to host a lot of production servers as VM’s, then ESX is a
great choice.
A 60-second Overview of the ESX Server Architecture
An ESX Server is made up of two core components:
• The ESX Server kernel (called “VMkernel”)
• The Service Console
The term “ESX Server” is usually used to describe all of this stuff together.
Figure 1.3: ESX Server Simplified
There is quite a bit of confusion in regards to what the Service Console and the
VMkernel really are. The service console is a customized Linux 2.4 kernel based
on a Red Hat Enterprise Linux distribution. The key to understanding the Service
Console’s relationship to the kernel is that the Console is really just a high-priority
virtual machine that allows you (through tools) to interact with the VMkernel
and manage its configurations. The VMkernel on the other hand is the hypervisor
and the real guts of ESX.
In Chapter 2 we’ll go into great (and sometimes painful) detail about the
VMkernel and the Service Console. For now we just want you to understand
the basic architecture so you see how ESX is different from VMware’s other
two main products.
Referring to Figure 1.3 you can see that the service console is what allows us to
interact with this server. This operating system allows us Secure Shell access,
supports a web based management console, and allows us to manage the server.
But, the service console is not ESX itself, it does not schedule resources or
manage hardware access, and basically would be a simple Linux server if it
wasn’t for the VMkernel.
The VMkernel is what manages/schedules access to specific hardware resources
on the host. It is the VMkernel that provides the Virtual Machines into which
guest operating systems can be installed. This kernel is what makes ESX different
from the other software packages available. The VMkernel allows direct
hardware access to the core four resources. It manages memory for the VM’s,
schedules processor time for the VM’s, maintains virtual switches for VM network
connectivity and schedules access to local and remote storage.
This kernel has been specifically built for this task. Unlike Windows or Linux
hosts that have been built to be multi-purpose servers, this kernel’s whole purpose
is to share and manage access to resources. This makes it extremely light
yet extremely powerful. Overhead in VMware ESX is estimated at 3-8%, while
overhead for the host in these other OS’s is generally 10-20% and sometimes as
high as 30% depending on configurations.
The reduction in overhead basically comes from ESX being a “bare metal”
product. Unlike the technologies used in Workstation, Server, or the current
*Microsoft products, ESX makes the most of your hardware and has been built
from the ground up to provide superb VM performance. Contrast this to the
GSX, Workstation and Microsoft Virtual Server products that are really add-ons
to operating systems that are built to handle numerous tasks and are not focused
on providing high end VM performance.
*- a note here on the Microsoft virtualization product: At the time of writing
Microsoft’s publicly available virtualization product is MS Virtual Server 2005
R2. This product is an application that runs in a standard Windows 2003 Server
environment. Microsoft’s road map for virtualization includes a lightweight,
bare-metal hypervisor like ESX Server, but at this point it is a road map available
to the public and only beta code, so when comparing products we are using
currently available technology.
This book is an updated version of the original ESX Server Advanced Technical
Design Guide that was written for ESX 2.x. It has been updated, added to, and rewritten
to bring it up to speed for ESX 3.0 and Virtual Center 2.0. As was the
case with the original book, this book is designed to provide you with practical,
real-world information about the use and design of VMware ESX Server systems.
In order to study the advanced technical details, we need to ensure that
you have a good baseline understanding of VMware ESX features, VMware’s
other products, and of course virtualization concepts as we know them today.
Even if you’ve already read and understand our previous book or if you’re very
familiar with previous versions of ESX or VMware server, we still recommend
that you at least skim through this chapter so that you can familiarize yourself
with the specific terms and phrasing that we’ll use throughout this book.
As you read through this book, you should keep in mind that each chapter was
written separately and that it’s okay for you to read them out-of-order. Of
course if you’re new to VMware and virtualization technologies, the chapter
sequence will guide you through a logical progression of the components. We’ll
begin in this chapter with an overview of VMware software solutions.
Virtualization - Today's Favorite Buzz Word
Standing at the RapidApp booth at VMworld this year (VMware’s annual conference),
I had one of the attendee’s approach me and ask “so do you guys
really do something with virtualization technology, or are you just another company
that throws the word virtual in front of the product and come here?” I
laughed. It was funny and any of you that have watched virtualization grow over
the past few years understands how funny that really is. Virtualization is the hottest
topic around the IT space right now. Virtualize your servers, virtualize your
network, virtualize your storage, virtualize your boss… well that last one isn’t
here yet, but we’re working on it.
18
This book deals with the market leading product ESX Server 3. In this chapter
we will give you an overview of where ESX (or as VMware marketing people
like to call the whole solution “Virtual Infrastructure 3”) fits into the server virtualization
space, the type of virtualization it does, and we will try to keep away
from all the other “virtual” products.
VMware ESX Server allows for the “virtualization” of x86-based servers. The
basic concept is that a single piece of physical hardware is used to host multiple
logical or “virtual” servers. The ability to host multiple virtual servers on a single
piece of hardware, while simple to understand in theory, can be extremely complex
in execution. It should be noted that the idea of virtual servers is nothing
new. Look to the UNIX space and you will hear terms like “Frames” and
LPARs” and can equate those with Hosts and Virtual Machines. VMware’s
place in IT shops is driven in the x86 space, where hardware is cheap (compared
to UNIX) and generally underutilized.
When VMware technology is used in an environment, a single host allows multiple
virtual servers to share the host’s physical resources. These resources include
processor, memory, network cards, and storage (We’ll refer to these four
physical resources as the “core four” throughout this book.) This architecture
also gives you the ability to “partition” your host to allow for the server’s resources
to be fully utilized while you migrate multiple physical servers to a single
VMware server running multiple virtual servers. (We’ll discuss partitioning and
migration strategies in much more depth throughout this book.)
So what does this mean to your environment or your business? It depends on
what your business looks like and which product from VMware you’re thinking
about using. Obviously if you’re reading this book then you’re at least contemplating
an ESX Server implementation. But even with ESX you have a number
of alternatives on how to deploy and utilize the system. Also, when investigating
VMware technology it is unfair to simply look at ESX and why it is used. You
can only decide to use ESX instead of GSX or Workstation after you really understand
what each product does and where they truly fit. Outside of the
VMware suite of products you may also have to look at other virtualization solutions,
like those based on XenSource. And to make an informed decision
about these solutions you need a better understanding of virtualization technology
for x86 servers.
19
Hardware Virtualization vs. Software
The word virtualization is thrown around a lot these days, and everyone claims
his way of virtualizing is better than everyone else’s. But to understand what you
are getting into you need to understand some basics. The first (and most important)
concept you need to grasp is the difference between hardware virtualization
and software. Most people in IT are aware that Intel and AMD have
released “processor” virtualization technologies for their processor. There is a
lot of talk about “moving this or that out of Ring 0 and into a Ring 1.” There is
also talk about how Intel is working on hardware hypervisors for their network
cards, and HBA manufacturers are also creating hypervisors… So what does it
all mean?
To make it simple let’s look at the processor and two major virtualization
“products.” First let’s look at XenSource (pronounced zen) and Xen based
products like Virtual Iron. Xen is basically a free virtual machine monitor (a
chunk of software that allows for multiple operating systems to run on a single
piece of hardware) that grew out of the Linux community and originally would
require a modification of the guest operating system to run properly. There is
more detail and strategy on this product in chapter 2, but for now let's just take
the basics.
The reason this modification was required was due to the x86 processor architecture
and its concepts of security rings (ring 0 through 3). In virtualization the
host operating system generally will run at ring level 0, the most secure part of
the processor if you will, with Virtual Machines running at ring level 3. Of
course operating systems (like Windows or Linux) are written so that they require
full control of ring level 0. So when these OS’s are run as a guest virtual
machine they need to be modified so that the guest OS is not making ring level
0 calls (or in Windows system/kernel calls) to areas of the processor to which it
doesn’t have access. With the processor virtualization technology that has been
introduced, it is basically a hardware change to allow for a “virtual Ring 0” if
you will, labeled ‘Ring 1.’ The idea is that the manufacturer can make changes to
move the required instructions up a level (to ring 1) and then the guest OS’s are
fooled into thinking they have full control of ring 0 all the time.
Before these processors were released, using Xen required that you modified
the kernel in the guest OS and host. For Windows shops this was not an option
and slowed the adoption of Xen as a virtualization solution. With Xen out,
20
most Windows environments adopted ESX server since it allows the VM to
execute against the processors (direct execution) without modifications to the
guest OS at all.
ESX did (and does in 3.0) what I like to think of as processor thunking. Remember
when you ran 16 bits apps on a 32 bit system, and the process for remapping
the 16 bit addresses to 32 bit addresses was termed as “thunking.”
Well I like to think of ESX doing the same thing, but on the processor side.
ESX essentially allows the guest OS to see the processor as it is. It then allows
direction execution on the processor (even to ring 0) from the guest OS. The
trick here is that the ESX kernel also runs at ring level 0, and each time a call
from the guest comes down, the kernel essentially has stopped executing itself
on ring 0, allowing the execution of the command from the guest (there is a
binary translation involved) then sending the results back to the guest and resuming
its operations.
This may sound like a ton of overhead, but it’s not. Sure there is some, but essentially
both forms of virtualization perform about the same, and in either case
there is still a need for some type of virtual machine monitor/hypervisor to
manage the guests and their access to resources.
Which brings us to our next point, no matter what the hardware manufacturers
do (processor, NIC, HBA or Memory), there will, for the foreseeable future, be
a hypervisor like Xen or ESX. A software layer will be required to manage all
these hardware devices and their unique one-use hypervisors. The decision you
have to make is which piece of software to use. Xen and VMware ESX are the
two “big name” players with Microsoft Virtual Server following behind until
their Longhorn hypervisor is publicly available. We’ll get into the difference between
hosted and bare metal architectures in the following pages, but at least
now you can put to rest the arguments from the guy two cubes over about how
you should just wait for the hardware hypervisors to come out and explain the
reality of virtualization to him. Since this book is an ESX book we will continue
to focus on that from here out.
So what is “virtualization?”
Simply stated, VMware (the company) provides virtualization technology. All of
their products (Workstation, ESX Server, and VMware Server) work in more21
or-less the same way. There is a host computer that runs a layer of software that
provides or creates virtual machines that x86 operating systems can be installed
to. These virtual machines are really just that—complete virtual machines.
Within a virtual machine you have hardware like processors, network cards,
disks, COM ports, memory, etc. This hardware is presented through BIOS and
is configurable just like physical hardware.
If we were to jump ahead a little and look at a virtual Windows server, you
would be able to go into device manager and view the network cards, disks,
memory, etc…, just like a physical machine. As a matter of fact the whole idea
is that the virtualized operating system has no idea that it is on virtual hardware.
It just sees hardware as it normally would.
In Chapter 2 we’ll go into detail on how virtualization actually takes place with
ESX Server, but for now it’s just important to understand that from a high level
there are a few components that make up any virtualization environment:
1. A host machine/host hardware
2. Virtualization software that provides and manages the virtual environment
3. The virtual machine(s) themselves (or “VM’s”—virtual hardware
presented to the guest
4. The guest operating system that is installed on the virtual machine
22
Figure 1.1: The basic virtualization model
Knowing that each of these four elements must be in place in a virtual environment
and understanding that they are distinctly different resources will allow
you to understand where different virtualization software is used. Let’s examine
each of these components a bit before moving into the breakdown of the different
virtualization technologies available from VMware and other vendors.
The Host Machine
The host machine in a virtual environment provides the resources to which the
virtual machines will eventually have access. Obviously, the more of the resources
available the more virtual machines you can host. Or put more directly,
“the bigger the host machine, the more virtual machines you can run on it.”
This really makes sense if you look at Figure 1.1. In Component 1 of the diagram,
the host machine has processors, some RAM, a set of disks, and network
cards. Assuming that the host is going to use some of each of these core four
resources for its own operation, you have what’s leftover available for the virtual
machines you want to run on it.
For example, let’s assume that the host is using 10% of the available processor
for itself. That leaves 90% of the CPU available for the virtual machines. If
you’re only running a single virtual machine (VM) then this may be more than
enough. However, if you’re trying to run 30 VM’s on that host then the hosts
“extra” 90% CPU availability will probably lead to a bottleneck.
23
The other challenge is that since all of the VM’s are sharing the same resource
(the CPU’s in this case), how do you keep them from stepping on each other?
This is actually where Component 2 from Figure 1 comes in—the virtualization
software.
The Virtualization Software
The virtualization software layer provides each virtual machine access to the
host resources. It’s also responsible for scheduling the physical resources among
the various VM’s. This virtualization software is the cornerstone of the entire
virtualization environment. It creates the virtual machines for use, manages the
resources provided to the VM’s, schedules resource usage when there is contention
for a specific resource, and provides a management and configuration interface
for the VM’s.
Again, we can’t stress enough that this software is the backbone of the system.
The more robust this virtualization software is the better it is at scheduling and
sharing physical resources. This leads to more efficient virtual machines.
VMware provides three versions of this virtualization software. The first two—
“VMware Workstation” and “VMware Server”—are virtualization software
packages that install onto an existing operating system on a host computer. The
third version (“VMware ESX Server”) is a full operating system in-and-of itself.
We’ll explore some of the reasons to help you choose which product you
should use later in this chapter. The important idea here is to understand that
ESX Server is both its own operating system and also the virtualization software
(Components 1 and 2 in the model we’ve been referencing), while VMware
Server and Workstation are virtualization software packages that are installed on
and rely upon other operating systems.
The Virtual Machine
The term “virtual machine” is often incorrectly used to describe both the virtual
machine (Component 3) and the guest operating system (Component 4). For
clarity in this book we will not mix the two. The virtual machine is actually the
virtual hardware (or the combined virtual hardware and the virtual BIOS) presented
to the guest operating systems. It’s the software-based virtualization of
physical hardware. The guest operating systems that we install into these “machines”
are (in most cases) unaware that the hardware they see is virtual. All that
the guest OS knows is that it sees this type of processor, that type of network
card, this much memory, etc.
It’s important to understand that the virtual machine is not the OS but instead
the hardware and configurations that are presented to the guest OS.
The Guest Operating System
In case it’s not clear by now, the guest OS is the x86-based operating system
(Windows, Linux, Novell, DOS, whatever) that’s running on a VM. Again, understanding
that the guest OS (or “guest machine” or simply “guest”) is simply
the software (Component 4) that’s installed onto a VM (Component 3) will
make your life easier when it comes to understanding and troubleshooting your
environment.
Once all four of these components are in place you’ll have a virtual environment.
How this virtual environment performs, is managed, and the types of
functionality available in your virtual environment are all dependent on the type
of software you’re using to provide your virtual environment.
Which virtualization product should I use?
Well, that’s the question, isn’t it? Since this book is about ESX Server we can
probably assume that you’ve already made your decision. Then again, you might
be using this book to help make your decision, so we’ll go ahead and look at the
products and how they fit into the IT world.
Most VMware folks started out using VMware Workstation (simply called
“Workstation” by VMware geeks in the community). Workstation allows us to
create virtual workstations and servers on our own PC. This lets us create small
test environments that we can use to test scripts, new software packages, upgrades,
etc. VMware Workstation is perfect for this.
Figure 1.2: Where do the VMware and other Virtualization products fit?
Of course Workstation has its limitations. Probably the biggest one is that VM’s
can only run while you’re logged into your host workstation. Log off and the
VM's shut down. Also, VMware Workstation is pretty much a local-user tool,
which means that there are really no remote administration capabilities whatsoever.
These limitations keep Workstation on the desktops of developers, engineers,
and traveling salespeople who have to give multi-server demos off their
laptops. No one uses Workstation for production environments.
VMware Server (formerly known as “GSX Server”) is a step up from Workstation.
Server is basically a software package that installs onto an existing host
operating system (either Linux or Windows Server). It offers remote management
and remote console access to the VM’s, and the various VM’s can be configured
to run as services without any console interaction required. Its limitation
is really that it has to use resources from the host hardware through the host
OS. This really limits the scalability and performance of Server.
The reason for this is that with Server, VM’s do not have direct access to the
hardware. Let’s look at an example to see how this can cause a problem. We’ll
look at memory use. Let’s assume you configure 384MB of memory for a VM
running on VMware Server for Windows. The “catch” here is that since
VMware Server is “just” another Windows application, the VM doesn’t get direct
access to 384MB of memory. Instead it requests 384MB of memory from
Windows and is dependent on the Windows scheduling mechanism.
Sure you’ll see the host’s memory utilization go up by 384MB (plus some for
overhead) when you turn on the VM, but the guest OS has to send all memory
requests to Windows. In this case you’ll have a host OS managing the “physical”
memory for the guest OS. This is on top of the guest OS managing its own
memory within the VM.
While this is just a simplified example, it points out some of the inherent limitations
with VMware Server that aren’t seen in ESX. Does this mean VMware
Server isn’t a good product? Not at all. It just means that it has limitations
stemming from the fact the virtualization software runs on top of a host OS.
VMware Server is still used in plenty of environments—especially those that
don’t require enterprise class scalability for their VM’s, those that have a limited
numbers of VM’s, and those that do not require maximum performance.
VMware Server is also frequently found in corporate test labs and is used to
allow administrators to get the benefits of a “virtual” test environment without
them needing to know all the ins and outs of a full virtual server OS. Finally,
many companies use VMware Server when they don’t have the budget to buy
ESX-certified hardware or when the VMware champions can’t win the political
“everything has to run on Windows” battle.
What makes ESX different from VMware Server, Workstation, or even Microsoft’s
Virtual Server in its current revision?
VMware ESX Server is its own operating system. Unlike VMware Server or
Microsoft Virtual Server, ESX is not a software package that installs into a host
OS; ESX is the host OS. Engineered from the beginning to be nothing more
than a VM host, ESX Server is completely designed to give the VM’s the best
performance possible and to allow you (the admin) to control and shape the
way the host resources are shared and utilized.
So what does using ESX instead of VMware Server or Workstation get you?
The answer is simple: performance, management and recovery features, and
reliability.
• Performance. ESX Server provides a level of performance for your
VM’s that simply cannot be found in VMware Server or Workstation.
It also allows for more advanced resource allocation, fine tuning
of performance, a better VM-to-processor ratio, and more advanced
resource sharing.
• Management and Recovery Features. ESX, when used with Virtual
Center, allows you to manage the load on your systems by moving
a VM to any host in a cluster without downtime. It also has an
automated mode where it will move and shift load (by moving
VM’s) to different hosts in the cluster automatically. In the event of
a server failure in the cluster, the VM’s will be restarted on the remaining
hosts and brought back online within minutes.
• Reliability. VMware publishes an ESX Server Hardware Compatibility
List (HCL). If the hardware you’re using for ESX is on the
HCL, then you can be confident that everything will work as expected.
ESX also lets you avoid any problems that exist in the
host OS, since a separate host OS doesn’t exist with ESX.
In short, if you’re looking to implement virtual machines on an enterprise level
or if you’re looking to host a lot of production servers as VM’s, then ESX is a
great choice.
A 60-second Overview of the ESX Server Architecture
An ESX Server is made up of two core components:
• The ESX Server kernel (called “VMkernel”)
• The Service Console
The term “ESX Server” is usually used to describe these two components together.
Figure 1.3: ESX Server Simplified
There is quite a bit of confusion in regards to what the Service Console and the
VMkernel really are. The Service Console is a customized Linux 2.4 kernel based
on a Red Hat Enterprise Linux distribution. The key to understanding the Service
Console’s relationship to the kernel is that the Console is really just a high-priority
virtual machine that allows you (through tools) to interact with the VMkernel
and manage its configurations. The VMkernel, on the other hand, is the hypervisor
and the real guts of ESX.
In Chapter 2 we’ll go into great (and sometimes painful) detail about the
VMkernel and the Service Console. For now we just want you to understand
the basic architecture so you see how ESX is different from VMware’s other
two main products.
Referring to Figure 1.3 you can see that the service console is what allows us to
interact with this server. This operating system allows us Secure Shell access,
supports a web based management console, and allows us to manage the server.
But the Service Console is not ESX itself; it does not schedule resources or
manage hardware access, and it would basically be a simple Linux server if it
weren’t for the VMkernel.
The VMkernel is what manages/schedules access to specific hardware resources
on the host. It is the VMkernel that provides the Virtual Machines into which
guest operating systems can be installed. This kernel is what makes ESX different
from the other software packages available. The VMkernel allows direct
hardware access to the core 4 resources. It manages memory for the VM’s,
schedules processor time for the VM’s, maintains virtual switches for VM network
connectivity and schedules access to local and remote storage.
This kernel has been specifically built for this task. Unlike Windows or Linux
hosts that have been built to be multi-purpose servers, this kernel’s whole purpose
is to share and manage access to resources. This makes it extremely light
yet extremely powerful. Overhead in VMware ESX is estimated at 3-8%, while
overhead for the host in these other OS’s is generally 10-20% and sometimes as
high as 30% depending on configurations.
The reduction in overhead basically comes from ESX being a “bare metal”
product. Unlike the technologies used in Workstation, Server, or the current
*Microsoft products, ESX makes the most of your hardware and has been built
from the ground up to provide superb VM performance. Contrast this to the
GSX, Workstation and Microsoft Virtual Server products that are really add-ons
to operating systems that are built to handle numerous tasks and are not focused
on providing high end VM performance.
* A note here on the Microsoft virtualization product: at the time of writing,
Microsoft’s publicly available virtualization product is MS Virtual Server 2005
R2. This product is an application that runs in a standard Windows 2003 Server
environment. Microsoft’s road map for virtualization includes a lightweight,
bare-metal hypervisor like ESX Server, but at this point only the road map is
public and the code is beta, so when comparing products we are using
currently available technology.
Monday, May 3, 2010
Killing a non responsive VM
Troubleshooting (various things we have run into within our environment and how to resolve or investigate them). To open a case with VMware: 1-877-4VMWAREU
VConverter -
Conversion stops with a “disk corrupted” error around 4-6%: This is usually due to something using the disk and not allowing VConverter to access the file or bit, commonly an application lock (e.g., Symantec or SQL). Stop all non-Microsoft services and run the converter again, or use the CloneCD option (boot from CD to convert).
Conversion stops at 97%: The conversion is complete; the converter has an issue writing the final configurations to the VM. Run VConverter again and choose the Configure Machine option. This will take some time; it basically reconfigures the VM with no drivers. After the VM boots, it will start to install the rest of the drivers for NICs, processors, etc.
Killing a non-responsive VM (VM won't turn off or on via VCenter)
1.) Open WinSCP and connect to AZ84ESXDEV01 (account vmadmin; enter the password)
2.) Open a Putty shell and enter the password (uses the same account, "vmadmin")
3.) Type su - and enter the root account password
4.) Type ps -ef | grep <VM name>; the second column is your PID
5.) Type kill -9 <PID>
6.) Type ps -ef | grep <VM name> again to verify no processes remain
7.) Check the VM state again; it should now be off
8.) To power on the VM, use the VI Client or type vmware-cmd /<path to VM>/<server>.vmx start
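For reference, the whole kill sequence from the ESX service console looks roughly like this (the VM name, PID and datastore path below are placeholders for illustration):
su -
ps -ef | grep myvm01    # the second column is the PID
kill -9 12345    # replace 12345 with the PID found above
ps -ef | grep myvm01    # verify the process is gone
vmware-cmd /vmfs/volumes/datastore1/myvm01/myvm01.vmx start    # power the VM back on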
Wednesday, April 28, 2010
Virtual Infrastructure products being removed by May 2010
Virtual Infrastructure products being removed by May 2010:
ESX 3.5 versions 3.5 GA, Update 1, Update 2, Update 3 and Update 4
ESX 3.0 versions 3.0 GA, 3.0.1, 3.0.2 and 3.0.3
ESX 2.x versions 2.5.0 GA, 2.5.1, 2.5.2, 2.1.3, 2.5.3, 2.0.2, 2.1.2 and 2.5.4
Virtual Center 2.5 GA, 2.5 Update 1, 2.5 Update 2, 2.5 Update 3, 2.5 Update 4 and 2.5 Update 5
Virtual Center 2.0
Virtual Infrastructure products remaining for Extended Support:
These versions will be the baseline for ongoing support during the Extended Support phase. All subsequent patches issued will be based solely upon the releases below.
ESX 3.5 Update 5 will remain throughout the duration of Extended Support
ESX 3.0.3 Update 1 will remain throughout the duration of Extended Support
Virtual Center 2.5 Update 6 expected in early 2010
Customers may stay at a prior version, however VMware’s patch release program during Extended Support will be continued with the condition that all subsequent patches will be based on the latest baseline. In some cases where there are release dependencies, prior update content may be included with patches.
http://www.vmware.com/support/policies/lifecycle/vi/faq.html
CloudComputing
Cloud computing is a general term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). The name cloud computing was inspired by the cloud symbol that's often used to represent the Internet in flow charts and diagrams.
A cloud service has three distinct characteristics that differentiate it from traditional hosting. It is sold on demand, typically by the minute or the hour; it is elastic -- a user can have as much or as little of a service as they want at any given time; and the service is fully managed by the provider (the consumer needs nothing but a personal computer and Internet access). Significant innovations in virtualization and distributed computing, as well as improved access to high-speed Internet and a weak economy, have accelerated interest in cloud computing.
A cloud can be private or public. A public cloud sells services to anyone on the Internet. (Currently, Amazon Web Services is the largest public cloud provider.) A private cloud is a proprietary network or a data center that supplies hosted services to a limited number of people. When a service provider uses public cloud resources to create their private cloud, the result is called a virtual private cloud. Private or public, the goal of cloud computing is to provide easy, scalable access to computing resources and IT services.
Infrastructure-as-a-Service like Amazon Web Services provides virtual server instances with unique IP addresses and blocks of storage on demand. Customers use the provider's application program interface (API) to start, stop, access and configure their virtual servers and storage. In the enterprise, cloud computing allows a company to pay for only as much capacity as is needed, and bring more online as soon as required. Because this pay-for-what-you-use model resembles the way electricity, fuel and water are consumed, it's sometimes referred to as utility computing.
Platform-as-a-service in the cloud is defined as a set of software and product development tools hosted on the provider's infrastructure. Developers create applications on the provider's platform over the Internet. PaaS providers may use APIs, website portals or gateway software installed on the customer's computer. Force.com, (an outgrowth of Salesforce.com) and GoogleApps are examples of PaaS. Developers need to know that currently, there are not standards for interoperability or data portability in the cloud. Some providers will not allow software created by their customers to be moved off the provider's platform.
In the software-as-a-service cloud model, the vendor supplies the hardware infrastructure, the software product and interacts with the user through a front-end portal. SaaS is a very broad market. Services can be anything from Web-based email to inventory control and database processing. Because the service provider hosts both the application and the data, the end user is free to use the service from anywhere.
Tuesday, April 27, 2010
VMware Infrastructure 3
VMware Infrastructure 3
- ESX 3.5 and 3i (3i is embedded)
- Virtual SMP (multi processor support for VMs, up to 4)
- HA (High Availability)
- Vmotion
- DRS (Distributed Resource Scheduler)
- VMFS (VMware proprietary file system)
- VCB (Consolidated Backup)
- Update Manager
- Storage Vmotion
- Virtual Center/Server
Blueprint for becoming VCP and VI3 certified http://mylearn1.vmware.com/portals/certification
Four main disk files
- Configuration File
- Virtual Disk File
- NVRAM file
- Log File
An operating system that has been virtualized is called a “Guest” operating system
Hosted vs. Bare Metal Hypervisor Architecture
- Hosted installs and runs the virtualization layer as an application on top of an operating system
- VMware Server is a free application that can be installed on a supported Windows or Linux system and provides host-based virtualization
Hypervisor (bare metal) – installs the virtualization layer directly on a clean x86-based system.
White paper for fully understanding Full Virtualization, Paravirtualization and Hardware Assist http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf
Compatibility & System Requirements
- Supported on Intel processors (Xeon and above) or AMD Opteron (32-bit mode) processors
System Compatibility http://www.vmware.com/support/pubs/vi_pubs.html
Virtual Infrastructure components
- Virtual Center Management Server
- Virtual Center Database
- Virtual Center License Server
- Virtual Center Agent – On each managed host, software that collects, communicates and executes the actions received from the Virtual Center Server. Installed the first time any host is added to the Virtual Center Inventory
Similar product offerings
- VMware Virtual Desktop Infrastructure – using VMs on large servers as desktops for thin clients
- VMware Lab Manager
VMware Technology Network (VMTN)
http://www.vmware.com/vmtn
ESX supported boot methods
- local SCSI
- local IDE/ATA or SATA
- network boot
ESX Server Hardware Requirements
- (2) processors 1500MHz or higher x86
- (1) GB RAM (256 GB Max)
- (1) or more NICS (10Gig E supported)
- Disk Storage, local or SAN
Partitions created on Install
/boot = 100 MB
/ = 5GB (15GB in our install)
(Swap) = 544 MB (1.6GB in our install)
/var/log = 2GB recommended, 500 MB minimum (4GB in our install)
(VMcore) = 100 MB
/vmfs/volumes = Left over disk space for VMs and ISO images
Locations for storing ISO images
- VMFS datastore
- NFS datastore
- /vmimages directory on service console
New install will reformat the boot disk and install new software and config files. Upgrade will allow preservation of an existing ESX Server install and maintain all current config files and directories
Always disconnect SAN before install
Make sure the same disk (on Dell, sda) is chosen at the first warning prompt, when selecting how you want to partition the disks for the system, and when selecting the boot device.
If you select “Create a default network for virtual machines” during the install, your virtual machines will share a network adapter with the service console, which is not recommended.
Licensing
Evaluation Mode – Full featured for 60 days from install. Does not require any licensing configuration
Serial Number – Used to license 3i only, not 3.x
License Server – Centralized licensing, only (1) license file
Host License – License files stored on individual ESX server. (1) license per host. Types are foundation, standard and enterprise
NTP configuration
- For accurate graphs
- For accurate log time stamps
- For accurate time in VM systems
- Doesn’t sync to Server Hardware Clock
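As a hedged sketch, NTP on the ESX 3.x service console is typically enabled with standard Linux tooling plus the ESX firewall command (the time source name is a placeholder):
esxcfg-firewall -e ntpClient    # open the service console firewall for outbound NTP
echo "server pool.ntp.org" >> /etc/ntp.conf    # point ntpd at your time source
service ntpd restart    # restart the NTP daemon
chkconfig ntpd on    # start ntpd automatically at boot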
SSH Access
- By default, the service console does not allow the root user account to log in using an SSH client. However, it does permit normal user accounts to log in using secure shell, and you can then “su” to become the root user. This requires creating a non-root user account via the Virtual Center interface
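A minimal sketch of that login flow, assuming a hypothetical account name and host:
useradd esxuser    # on the service console as root (or create the user via the VI Client)
passwd esxuser
ssh esxuser@esx01.example.com    # from your workstation
su -    # then enter the root password to become root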
Main reasons for ESX server problems
- Defective Hardware (memory, cpu)
- Misconfiguration during install (wrong NIC chosen, wrong boot device)
- Inadequate planning
Compatibility guides http://www.vmware.com/support/pubs/vi_pubs.html
Purple Screen of Death (PSOD)
- Machine check exception (usually with CPU problem)
- NMI ECC or Parity Error (memory failures)
What to do if ESX Server Crashes
- screen grab PSOD
- Check Environment (heat, etc…)
- Check for detached hardware
- Gather logs
o Via service console run “vm-support”
o Via Virtual Center
§ File -> Export -> Export Diagnostic Data
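For example, from the service console (the output file name varies by host and date):
vm-support    # writes a compressed diagnostic bundle (.tgz) to the current directory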
The VMkernel allows the virtual machines as well as the service console access to the system’s hardware.
Networking
Virtual Switches allow access to the service console, VM network connectivity and access to IP storage
A virtual switch provides connections for VMs to communicate with each other, whether they are on the same host or on different hosts
Each Service Console port and each VMkernel port must be configured with its own IP address, netmask and gateway. There can be multiple of each, but they have to be on different networks
Default ports = 56 (64 – 8)
Default for vswitch0 = 24 (created during install)
Max ports = 1016
The MAC address of a physical NIC is not used at all. Instead, each VM’s virtual NIC has its own MAC address
Three Port Types
- Service Console Port – access to ESX Server management network
- VMkernel port – access to VMotion, iSCSI and/or NFS/NAS networks
- Virtual Machine Port Group – access to VM networks
Virtual NIC on service console is vswif0
All virtual switches are known as vSwitch#
Console command to determine PCI address of a NIC
- “esxcfg-nics –l”
Physical NICs are assigned at the virtual switch level
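A few related service-console commands for reference (the vmnic and vSwitch names are examples):
esxcfg-nics -l    # list physical NICs with PCI address, driver, and link state
esxcfg-vswitch -l    # list virtual switches, port groups, and uplinks
esxcfg-vswitch -L vmnic1 vSwitch0    # link physical NIC vmnic1 to vSwitch0 as an uplink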
Three Network policies (configurable tabs)
- Security
- Traffic shaping
- NIC teaming
- Configured at the switch and port group level
- Port group level overrides switch level
VLAN benefits
Improved Security – the switch only presents frames to those stations in the right VLANs
Improved Performance – each VLAN is its own broadcast domain
Lower cost – less hardware required for multiple LANs
ESX Server supports IEEE 802.1Q VLAN Tagging
Packets from a VM are tagged as they exit the virtual switch and are cleared (untagged) as they return to the VM
Network Policy - Security
Configured at both port group and vswitch level
Promiscuous Mode – used for IDS (intrusion detection); default is Reject
MAC Address Changes – required by clusters and some software firewalls. Default is Accept, but we set it to Reject.
Forged Transmits – when set to Reject, drops any frames the guest sends where the source address field contains a MAC address other than the assigned virtual NIC MAC address. Default is Accept, but we set it to Reject.
Network Policy – Traffic Shaping
- controlling outbound network traffic only
- Average Rate - Kbps
- Peak Rate - Kbps
- Burst Rate - KB
Average bandwidth and peak bandwidth are specified in Kbps (kilobits per second) while burst size is specified in KB (Kilobytes).
Network traffic shaping is off by default
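A quick unit-conversion example for those settings: to cap a port group at roughly 100 Mbps average with 200 Mbps peaks and bursts of about 1 MB, you would enter Average = 102400 Kbps, Peak = 204800 Kbps and Burst = 1024 KB; note the mix of kilobits per second for the rates and kilobytes for the burst size.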
Network Policy – NIC Teaming
Provides Load Balancing and Network Failure Detection
Three Load Balancing Methods
1.) vSwitch Port- Based
a. Route based on the originating Port ID (default)
b. VMs outbound traffic is mapped to a specific physical NIC
c. Simple, fast and does not require the VMkernel to examine the frame
d. Spreads out traffic evenly across the physical NICs
e. No single-NIC VM will ever get more bandwidth than can be provided by a single physical adapter
2.) Source MAC-Based
a. Each VMs outbound traffic is mapped to a specific physical NIC based on the virtual NICs MAC address
b. Low overhead
c. Compatible with all switches
d. Might not spread traffic out evenly across the physical NICs
e. No single-NIC VM will ever get more bandwidth than can be provided by a single physical adapter
3.) IP – Based
a. NIC for each outbound packet is chosen based on its source and destination IP address
b. Higher CPU overhead
c. Not compatible with all switches
d. Has better distribution of traffic across physical NICs
e. A single-NIC VM might use the bandwidth of multiple physical adapters
Detecting and Handling Network Failure
- Link state only
- Link state + beaconing
Load Balancing option – use explicit failover order or override
Rolling Failover – how a physical adapter is returned to active duty after recovering
When using explicit failover order, always use the highest order uplink from the list of Active adapters which passes failover detection criteria
If rolling failover is set to no, the adapter is returned to active duty immediately upon recovery. If rolling is set to Yes, a failed adapter is left inactive even after recovery until another currently active adapter fails.
Storage
Fibre Channel switches interconnect multiple nodes to form the “fabric” in a Fibre Channel SAN
Fibre Channel encapsulates SCSI commands
Fibre Channel SAN consists of:
- storage system
- LUN
- SP (Storage Processor(s))
- HBA (host bus adapter)
- FC switches
WWN = World Wide Name – unique, 64-bit address assigned to a Fibre Channel node
Example: 50:06:01:60:10:20:AD:87
Controlling hosts’ access to LUNs
- Soft zoning – done on a Fibre Channel switch; controls LUN visibility on a per-WWN basis
- Hard zoning – controls storage-processor visibility on a per-switch-port basis
World Wide Names (WWNs) are assigned by the manufacturer of the SAN equipment. HBAs and SPs have WWNs. WWNs are used by SAN administrators to identify your equipment for zoning purposes.
LUN masking can be done within the ESX Server, but is normally performed at the storage processor level and with newer switches at the fabric level
VMkernel disk partition addressing scheme
Example: vmhba0:0:11 or vmhba1:1:12:3
Format: vmhba<Adapter>:<Target>:<LUN>:<Partition>
vmhba – standard label that identifies a physical host bus adapter
Adapter – adapter ID assigned to each HBA
Target – SCSI target that the Storage Processor presents
LUN – logical unit number
Partition – partition on the LUN
So vmhba1:1:12:3 is adapter 1, target 1, LUN 12, partition 3.
The Fibre Channel storage adapter is recognized by the VMkernel during the boot sequence
VMkernel scans up to 256 LUNs (only first 128 during ESX install)
Rescan to find newly assigned LUNs
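From the service console this can be done per adapter, for example:
esxcfg-rescan vmhba1    # rescan this HBA for new LUNs (or use the VI Client: Storage Adapters > Rescan)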
Troubleshooting SAN connectivity http://www.vmware.com/pdf/vi3_san_design_deploy.pdf
iSCSI
You can boot an ESX server from iSCSI only with a hardware initiator
LUN masking in iSCSI works like it does in Fibre Channel. Ethernet switches do not implement zoning like Fibre Channel switches; instead, you can create zones using VLANs
iSCSI Target Naming
iqn.1992-08.com.netapp:stor1
- iqn – all start with this
- date code – specifying the year and month in which the organization registered the
domain or sub-domain name used as the naming authority string
- The organizational naming authority string which consists of a valid reversed domain or subdomain name
- optionally, a “:” followed by an assigned alias
Example: iqn.2008-06.com.tangosoftware:vmfs1
Target Discovery Methods
- Static Configuration – only available for Hardware initiators. IP address, TCP port and the iSCSI target name manually defined at hardware level
- Send Targets – uses the target’s IP address and port to query available targets
CHAP Authentication
Optional, stands for Challenge-Handshake Authentication Protocol
It is best practice to create a separate, isolated IP network for iSCSI traffic since transmitted data is unencrypted.
An isolated network is the only way VMware supports iSCSI
The software initiator is a port of the Cisco iSCSI Initiator Command Reference implementation.
The Service Console and VMkernel NICs both need access to the iSCSI storage, since the iSCSI daemon initiates the session and handles login and authentication. The actual I/O goes through the VMkernel
ESX Server does not support both hardware and software initiators running simultaneously
Main steps to setting up iSCSI
1.) Add a VMkernel port for iSCSI traffic
2.) Enable iSCSI Traffic Through the Service Console Firewall (CHAP as well if used)
3.) Enable the iSCSI initiator in the storage section
4.) Input the IP address of iSCSI SAN in the send targets under Dynamic Discovery
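A hedged service-console sketch of those steps (the IP addresses, vSwitch and port group names are placeholders; the SAN address can also be added in the VI Client under Storage Adapters > Dynamic Discovery):
esxcfg-vswitch -A iSCSI vSwitch1    # add a port group named "iSCSI" (assumes vSwitch1 already exists)
esxcfg-vmknic -a -i 10.0.0.10 -n 255.255.255.0 iSCSI    # create the VMkernel port for iSCSI traffic
esxcfg-firewall -e swISCSIClient    # open the service console firewall for software iSCSI
esxcfg-swiscsi -e    # enable the software iSCSI initiator (vmhba32)
Then add the SAN’s IP under Dynamic Discovery and rescan.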
iSCSI software initiator is identified as vmhba32
iSCSI hardware initiators are identified by bus location just like Fibre Channel HBAs.
Example: vmhba2
VMFS Datastores
Each virtual machine’s files are located in its own subdirectory
Repository for other files:
- templates
- Iso Images
VMFS volumes are addressed by a volume label, a datastore name, and a physical address
Example: vmhba1:0:0:1
VMFS volumes are accessible in the service console underneath /vmfs/volumes
Features of VMFS-3
- Distributed journaling
- Faster file system recovery, independent of volume size or number of hosts connected
- Scalable distributed locking – survives short and long SAN interruptions much better
- Support for small files- small files allocated from sub-block resource pool
32 - Max hosts accessing VMFS at same time
30720 – Max files per directory
30720 – Max files per Volume
VMware only supports a single VMFS on a single partition on a LUN
A single-LUN VMFS must be at least 1.2 GB in size, but due to a limitation of the SCSI-2 protocol, a single VMFS extent cannot exceed 2 TB in size
A VMFS volume can be >2 TB by adding extents of up to 2 TB each
VMFS can have up to 32 physical extents, for a maximum size of about 64 TB
Multipathing with Fibre Channel
- allows continued access to SAN LUNs in the event of hardware failure
- Exactly one path is active (in use) to any LUN at any time
Two Multipathing policies exist:
- MRU – Most Recently Used
- Fixed – Preferred Path
ESX Server automatically sets the multipathing policy according to the make and model of the array it detects, so it should not be changed manually
ESX Server multipathing is only supported for failover, not automatic load balancing. However, manual load balancing can also be achieved
NAS Storage and NFS Datastores
Two Key NAS protocols
- NFS (Network File System)
- SMB (Windows networking, also known as CIFS – Common Internet File System)
ESX Server supports NFS version 3 over TCP
NFS volumes are treated just like VMFS volumes in Fibre Channel or iSCSI storage, and thus support:
- VMotion
- Create virtual machines
- Boot Virtual machines
- Mount ISO files
/etc/exports – defines the systems allowed to access the shared directory on the NAS
Options used in file are:
- Name of shared directory
- Subnet(s) allowed to access the share
- “rw” – allows read and write
- “no_root_squash” – allows the VMkernel to access the NFS volume using UID 0 (root), which usually is not allowed
- “sync” – means all file writes must be committed to the disk before the write request by the client is actually completed
NFS requires a VMkernel port, just like iSCSI
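As an illustration, a matching export on the NAS side and the ESX-side mount might look like this (the share path, subnet, host name and datastore label are placeholders):
/exports/iso 192.168.10.0/24(rw,no_root_squash,sync)    # line in /etc/exports on the NFS server
esxcfg-nas -a -o nas01.example.com -s /exports/iso isoStore    # on ESX: mount the share as datastore "isoStore"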
Virtual Center installation
It is highly recommended to install the VMware License Server on the same Windows server as the VirtualCenter Server
Most critical component of Virtual Center is the Database
Security for the VirtualCenter Server is built on Windows security: Active Directory if in a domain, or the local computer if not in a domain
Default ports needed
- 27000 and 27010 (only if license server is not on same machine)
- 1521 – Oracle DB
- 1433 – SQL DB
- 443 and 80 – Web Access and SDK Clients
- 902 – Virtual Center to managed hosts
- 443 – VIClient to Virtual Center
Require Separate License
- VMotion
- HA
- DRS
Virtual Center Architecture
Core Services – Management of Virtual machines and resources, task scheduler, statistics logging, alarms and events. VM Provisioning and host and VM configuration
Distributed Services – Vmotion, HA, DRS
Additional Services/Modules – Vmware Update Manager, Converter Enterprise
Database Interface
ESX Server Management – Virtual Center Agent automatically installed when host is added to virtual center
Active Directory Interface
VI API and VI SDK – allow third party applications
Hardware Requirements
Processor = 2 GHz Intel or AMD x86
Memory = 2GB
Storage = 560 MB minimum, Recommended 2GB
Network = 10/100 (1GB recommended)
Software Requirements
- 32 bit OS only
- Windows 2000 Server SP4 w/Update Rollup1
- Windows XP Pro SP2
- Windows 2003 Server SP1
- Windows 2003 Server R2
Database Requirements
Oracle 9iR2
Oracle 10gR1 (versions 10.1.0.3 and higher)
Oracle 10gR2
SQL Server 2000 SP4 or Enterprise Ed.
SQL Server 2005 (Enterprise SP1 or SP2)
SQL Server 2005 Express – Eval/Demo only
VI Client provides a database estimation calculator in which you enter the number of hosts and virtual machines in your inventory. This will give an estimated space size needed.
Microsoft SQL
- user needs the sysadmin server role or the db_owner fixed database role
- create an ODBC connection of type DSN
- SQL Server Authentication needs to be used unless the DB is on the same server
VMware License Server
- Licenses are stored on a license server
- Single host and centralized licenses can be combined
- 14-day grace period
Two virtual center editions
Foundation – up to 3 hosts
Virtual Center – Max number of hosts
14-Day Grace Period Details
After expiration you can’t do the following things
- Power on Virtual Machine
- Use VMotion
- Use HA
- Use DRS
These things are not supported during or after the grace period
- All the above
- Add an ESX Server Host to Inventory
- Move an ESX Server Host into or out of Cluster
- Add or Remove License Keys
- Upgrade
Virtual Center communicates with ESX Server via vpxa
VI Client communicates with ESX Server via vmware-hostd
Web client connects to Web Access which is on both the Virtual Center Server and the ESX Host
443 and 902 are ports used for traffic
A Single Virtual Center Server can manage ESX Servers that are located in different geographical locations but connected by WAN link or VPN link
Virtual Center Server log files
- vpxd-#.log (10 kept, 0-9)
- vpxd-index.log
- located in C:\windows\temp\vpx
If License server does not start, it is likely due to incorrect or corrupt license file
LMTOOLS program used to manage licenses
Virtual Center Inventory Hierarchy
The root folder is either “Hosts and Clusters” or “Virtual Machines and Templates” depending on the view selected. These are the two most common views. (others are Networks view and Datastores view)
Under the root folder is one or more data center objects
A Datacenter object is the primary organizational structure in the inventory
VirtualCenter Server can contain multiple datacenters
Folders can be added to help organize inventory items, like servers by function or processor type. Don’t add too many folders or make too complex a hierarchy
Hosts that are grouped together are referred to as a cluster
VirtualCenter Server can support VMware DRS and VMware HA clusters which contain up to 32 ESX Servers
If you can’t add a host to virtual center you may need to check that vmware-hostd is running on the ESX server
Lockdown mode prevents use of the VI Client when VirtualCenter Server is installed; this is only available in 3i, not 3.x
The Scheduled Tasks tab allows you to schedule creation of VMs, etc., at any time of the day.
The Events and System Logs tabs help with troubleshooting problems
Sessions shows number of open connections to VirtualCenter Server
The Maps panel shows the relationships of items and is very helpful in verifying that VMotion criteria are configured correctly.
Guided consolidation feature uses VMware converter and will
- Discover – servers that might be good candidates for VMs
- Analyze – how good of a candidate they would make
- Consolidate – Create the VMs from the Physical machines
The Client Settings option lets you adjust the timeout values for slow WAN connections, max number of virtual machine consoles, Hint Messages and getting started tabs.
VMware Files
VM_name.vmx = Virtual Machine config file
VM_name.vmdk = File describing virtual disk characteristics
VM_name-flat.vmdk = Pre-allocated virtual disk file that contains all the data
VM_name.nvram = virtual machine BIOS
vmware.log = log file
vmware-#.log = older log files (6 kept)
VM_name.vswap = swap file
VM_name.vmsd = snapshot metadata
To browse files, choose the ESX server and go to the Summary tab. Right-click on the datastore where the files are located and choose Browse Datastore.
Virtual Hardware
ESX Server VMs lack USB and Sound Adapter Support
6 total PCI slots
5 usable PCI slots since one is Video
2 IDE controllers supported for up to 4 cd-rom drives
1,2 or 4 CPUs per VM with SMP
64 GB Memory Max per VM
Many guest OS/application combinations are not enhanced by the additional CPU. Two or four VCPU VMs should be created only in the comparatively infrequent instances where they are of benefit, not as a standard config
LSILogic or BusLogic Adapter is chosen automatically for you based on choice of guest OS.
ESX Server virtual disks are monolithic, meaning if you create a 6GB virtual disk, it creates a single file that is 6GB in size.
Virtual CD-ROM drive or floppy drive can point to either the CD-ROM drive or floppy drive located on the ESX Server, a CD ISO image (.iso) or floppy (.flp) image or even the CD-ROM or floppy drive on your local system
Ctrl + Alt + Ins is how to send Ctrl + Alt + Delete to the VM console
VMware tools should always be installed and allow:
- device drivers if necessary
- Manual connection/disconnection of some devices while powered on
- Improved mouse
- Memory Management
- Quiescing a file system
- Time Sync
- Ability to gracefully shut down machine from VirtualCenter
Most visible benefit is better video performance and that you can move your mouse pointer freely into and out of the VM console window
VMware Converter Enterprise
Can import almost any Windows OS into ESX Server
- import physical machines to VMs
- import non-ESX VMware VMs
- Import Microsoft Virtual Server 2005 VMs
- Convert 3rd-party backup or disk images to VMs
- Restore VCB images to VMs
- Export Virtual Center VMs to other non-ESX VMware formats
- Reconfigure Virtual Center VMs so they are bootable
- Customize Virtual Center VMs
Physical to Virtual = P2V
- Great for consolidating, testing and troubleshooting and disaster recovery
Windows OSes can be resized during conversion; other OSes can only be cloned as-is
Three Components
- Server
- CLI – Command line interface
- Agent
The server component is installed separately, on the same server as VirtualCenter or on a different server. It is included on the VirtualCenter CD.
The client is installed via a plugin in VirtualCenter
Hot Cloning- Four Stages
1.) Prepare source machine
a. Install enterprise agent
b. A snapshot is taken of the volumes
2.) Prepare Virtual machine on Destination
a. Create a new virtual machine on ESX server
b. Volumes copied from source machine to ESX server
3.) Completing Conversion
a. Required drivers install to allow os to boot
b. Customizations are applied
4.) Cleaning up
a. The snapshot is removed from the source
b. Agent is automatically or manually uninstalled from source
Cold Cloning – Four Stages
1.) Prepare source machine
a. The user boots with the coldclone CD image and defines conversion values
b. Source volumes copied into RAM
2.) Prepare Virtual machine on Destination
a. New virtual machine is created
b. Volumes copied from source to ESX
3.) Completing Conversion
a. Required drivers install to allow os to boot
b. Customizations are applied
4.) Cleaning up
a. User removes CD and reboots physical server
Two different cloning modes:
1.) Volume-based
a. Available for hot and cold cloning
b. Cloning is done on a block-level basis if the volume size is maintained. If a volume is resized to be smaller, cloning is done on a file-level basis
2.) Disk-based
a. Only available with cold cloning and VM imports
b. The entire disk is copied as-is; cloning transfers all sectors from all disks, preserving all volume metadata
Common problems are caused by
- insufficient privileges
- required ports not being open (445, 139, 903, 443)
- boot.ini set to read-only
Managing VMs
Cold migration moves a powered-off VM to another ESX server, or to a different datastore on the same ESX server.
It may or may not move the disk, depending on whether you have shared storage between ESX servers
Snapshots
- Useful when you need to revert repeatedly to the same state, useful in Test and Dev
A memory snapshot can be created only when the VM is powered on. By default, a snapshot of memory is taken
Snapshots include the following files
VM_name-0000#-delta.vmdk – (snapshot difference file) where # is the next number in the sequence, starting with 1
VM_name-0000#.vmdk – (snapshot description file)
VM_name-Snapshot#.vmsn – (memory state file) the size of the file is the size of the VM’s maximum memory
Snapshot manager manages the snapshots and allows the following tasks
- Delete – commits the snapshot data to the parent snapshot, then removes the selected snapshot
- Delete All – commits all the immediate snapshots before the current state icon (“You are here”) to the base disk and removes all existing snapshots for that virtual machine
- Go to – allows you to restore a particular snapshot.
Modifying Virtual Machine settings
A virtual disk can be added to the VM while powered on
The VM must be shut down for anything that would require a machine shutdown in a physical environment, like adding a NIC
A disk can be resized, but the VM must be shut down.
Raw LUNs
Allow VM clustering across boxes, or physical to virtual
Enable use of SAN management software inside guest OS
An RDM (Raw Device Mapping) allows a special file in a VMFS volume to act as a proxy for a raw device
The raw disk mapping is a special file that lives in a VMFS datastore and points to the actual SAN LUN
A Raw disk runs in one of two modes:
Physical Compatibility – allows the guest operating system to directly access the hardware and is useful if you are using SAN-aware applications in the virtual machine
Virtual Compatibility – allows the LUN to behave as if it were a virtual disk, so you can use features like snapshotting, cloning and creating templates.
Physical – you can’t clone the VM, create a template from it, or migrate it if that involves moving the disk
Raw disk mappings have the following files
Virtual compatibility mode:
- VM_name_#.vmdk
- VM_name_#.rdm.vmdk
Physical compatibility mode:
- VM_name_#.vmdk
- VM_name_#.rdmp.vmdk
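For reference, an RDM mapping file can also be created from the service console with vmkfstools; a hedged sketch (the device path and file names are placeholders):
vmkfstools -r /vmfs/devices/disks/vmhba1:0:5:0 /vmfs/volumes/datastore1/myvm/myvm_1.vmdk    # virtual compatibility RDM
vmkfstools -z /vmfs/devices/disks/vmhba1:0:5:0 /vmfs/volumes/datastore1/myvm/myvm_2.vmdk    # physical (pass-through) RDM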
General Options
If you change the display name of a virtual machine, it doesn’t rename the file names.
VMware Tools
-Customize power button actions
-When to run scripts
-Update checks and time syncs
Advanced boot options are useful to delay power-on to stagger VMs starting up, and to enable the VM to boot into the BIOS for BIOS changes
Each host or cluster can have a custom swapfile datastore location defined
Guided Consolidation
Lowers training requirements for new virtualization users
Recommended for small, simpler environments (small to medium businesses with 100 physical servers or less)
Allows new users to quickly realize the benefits from server consolidation
Can discover and analyze only Windows server-family OSes
No Agent software is involved
You can choose not to install during Virtual Center Installation
(2) services, DataCollector and VMware Converter
Data Collector service runs under the name of “VMware Capacity Planner Service”
It uses a hidden database (not user managed) when collecting data
The data collector service uses LAN Manager or Active Directory if available, and thus must have read permissions on the Active Directory.
The user also needs local Windows admin privileges in addition to read-only access on AD
Discovery of Systems is repeated periodically
- Every ½ hour – check for new servers
- Every day check for new domains
You can specify credentials on a system by system basis and a default credential that should be used on all systems
Metrics
- Metrics are collected every hour
- 10 to 12 metrics total
- Data is put into table in VirtualCenter DB
Confidence level – Based on the number of performance samples that the VC has collected. As the VC collects more performance samples, the confidence level goes up.
Data collector is agent less so nothing is installed on remote computers
If target systems are behind a firewall, incoming WMI ports need to be open (135, 137, 138, 139, and 445)
Access Control
Main components of the security model
- user/group – account with access to the Virtual Infrastructure
- Role – a set of one or more privileges
- Privilege – specifies a task a user/group is authorized to perform
- Permission – pairing of a user/group and a role
There are approximately 100 defined privileges
VirtualCenter users/groups are those that are defined either on the local windows server or in the Active Directory if on a domain
ESX server users/groups are those defined in its service console
No attempt is made to reconcile these users/groups
Privileges are the building blocks of roles
Roles are collections of privileges
Roles can and should be set to propagate down to child objects
A role is neither superior nor subordinate to another role. All roles are independent of each other
ESX default Server Roles
- No Access
- Read-Only
- Administrator
VirtualCenter default Roles
- No Access
- Read-Only
- Administrator
- Virtual Machine Administrator
- Datacenter Administrator
- Virtual Machine Power User
- Virtual Machine User
- Resource Pool Administrator
- VMware Consolidated Backup User
Best Practices paper on Managing VMware VirtualCenter Roles and Permissions. http://www.vmware.com/pdf/vi3_vc_roles.pdf
Permissions can be overridden at a lower level by adding a new permission to the same user
Scenario 1
If a user is a member of multiple groups with permissions on different objects, for each object on which the group has permissions, the same permissions apply as if granted to the user directly
Datacenter level
Group 1 = Administrator
- Greg
- Susan
Server Level
Group 2 = Read Only
- Greg
- Carla
In this scenario, Greg will have administrative permission on everything in the datacenter, except the one server that Group 2 is assigned to. There he will have Read-Only access.
Scenario 2
If a user is a member of multiple groups with permission on the same object, the user is assigned the union of privileges assigned to the groups for that object
Group1 = VM_Power_On
- Greg
- Susan
Group 2 = Take_Snapshots
- Greg
- Carla
These groups are both assigned to the Datacenter level. Greg will have both privileges on the datacenter level.
Scenario 3
Permissions defined explicitly for the user on an object take precedence over all group permissions on that same object.
Datacenter Level
Group1 = VM_Power_ON
- Greg
- Susan
Group 2 = Take_Snapshots
- Greg
- Carla
Greg = Read Only
In this scenario, even though Greg is a member of group 1 and 2, since the user was explicitly defined as read only at the datacenter level, he will only have read only access on everything in the datacenter.
Scenario 4
Permissions applied directly to an object override inherited permissions
If Greg is assigned as a VM User at the Datacenter level, but is also assigned as an Administrator on a specific VM, the end result is that Greg will have admin privileges on the one VM, since this overrides the propagated VM User permissions set at the Datacenter level
VirtualCenter Security Model – Relies on Windows user accounts, local or domain
Local Windows administrators are automatically assigned the Administrator role at the topmost level of the inventory.
ESX Server Security Model – ESX user is a service console (Linux) user account.
User accounts, roles and permissions can be configured using the VI Client connected directly to the ESX Server
ESX Server users, root and vpxuser are assigned the Administrator role at the ESX Server Level.
If lockdown mode is enabled, it will prevent the ESX root user from logging directly into the ESX Server using the VI Client. Root user still has ability to log into the ESX Server using secure shell.
Resource Management
CPU Resource settings
Limit – A cap on the consumption of CPU time by this VM, measured in MHz
Reservation – A certain number of CPU cycles reserved for this VM, measured in MHz
The VMkernel chooses which physical CPU(s) the VM runs on, and may migrate it between them
Shares – More shares means that this VM will win competitions for CPU time more often
VCPUs simultaneously scheduled – all virtual CPUs are scheduled at the same time. This
means that if you have 4 vCPUs and a reservation of 1000 MHz, each vCPU
gets 250 MHz
Memory Resource settings
Available Memory – Memory size defined when the VM was created
Limit – A cap on the consumption of physical memory by this VM, measured in MB
Reservation – A certain amount of physical memory reserved for this VM.
Shares – More shares means that this VM will win competitions for physical memory
more often
VMkernel allocates a per-VM swap file to cover each VM’s range between available memory and reservation
Memory can be reclaimed from a VM until it gets down to its reservation; ESX Server cannot take back memory from a VM once it is down to its reserved limit.
ESX Server can reclaim memory from a VM that is between its limit and its reservation. If that memory was in use, ESX relies on the per-VM swap file and memory ballooning.
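A quick worked example of the swap-file sizing above: a VM with 2048 MB of available memory and a 512 MB reservation gets a 1536 MB .vswap file (2048 - 512), since only the reservation is guaranteed to be backed by physical RAM.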
A virtual machine will power on only if its reservation can be guaranteed
CPU Shares
High: # shares = 2000 * (#of vCPUs)
Normal: # shares = 1000 * (# of vCPUs)
Low: # shares = 500 * (# of vCPUs)
Custom: # shares = user-specified value
Memory Shares
High: # shares = 20 * size of VM’s available memory
Normal: # shares = 10 * size of VM’s available memory
Low: # shares = 5 * size of VM’s available memory
Custom: # shares = user-specified value
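A worked example of the share math: a 2-vCPU VM set to High gets 2000 * 2 = 4000 CPU shares, while a 1-vCPU VM set to Normal gets 1000 * 1 = 1000 shares, so under contention the first VM is entitled to 4000 / 5000 = 80% of the contended CPU time. Shares only matter when there is contention; otherwise each VM can use up to its limit.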
Resource Pools
Used on a standalone host or in VMware DRS-enabled clusters
Provides resources for VMs and child pools
A resource pool allows you as the administrator to divide and allocate resources to VMs and other resource pools
The topmost resource pool is known as the root resource pool
A resource pool consists of the CPU and memory resources of a particular ESX Server or VMware DRS cluster
Configuring Resource Pools
Shares – Low, Normal, High, or custom
Reservations – in MHz and MB
Limits – in MHz and MB. Unlimited access by default (up to the maximum amount of resources accessible)
Expandable Reservation – Yes = VMs and sub-pools may draw from this pool’s parent
No = VMs and sub-pools may only draw from this pool, even if its parent has free resources
Virtual machines do not have expandable reservation. Expandable reservations can only be set at the resource pool level
The root resource pool is the topmost resource pool and is comprised of the sum of all MHz for all CPUs and the sum of all the installed RAM (in MB) available in the compute environment (standalone host or cluster)
Except for the root resource pool, every resource pool has a parent resource pool. A resource pool might contain child resource pools or just VMs that are powered on within it
A child resource pool is used to allocate resources from the parent resource pool for the child’s consumers
A child resource pool cannot exceed the capacity of the parent resource pool unless there is an expandable reservation on the parent and resources available high up the hierarchy
Expandable Reservations
Borrowing resources occurs recursively from the ancestors of the current resource pool.
- as long as the expandable reservation option is selected
- Offers more flexibility but less protection
Expandable reservations are not released until the VM that caused the expansion is shutdown or its reservation is reduced.
An expandable reservation could allow a rogue administrator to claim all unreserved capacity in the environment.
Expandable reservations allows a resource pool that cannot satisfy a reservation request to search through its hierarchy to find unreserved capacity to satisfy the reservation request.
Use expandable reservations carefully. A single child resource pool may use ALL of its parent’s available resources, leaving nothing directly available for other child resource pools.
VMotion
Disks aren’t moved, just state information and current memory content
State information – the current memory content and all the information that defines and identifies the virtual machine
Memory content – includes transaction data and whatever bits of the operating system and applications are in memory
The definition and identification information stored in the state includes all the data that maps to the virtual machine hardware elements, such as BIOS, devices, CPU, MAC address for Ethernet cards and so forth
Steps
1.) VMotion is initiated by the user
2.) The virtual machine's memory is copied to the destination host while users can still access the original VM. Meanwhile, changes in memory are recorded in a bitmap table
3.) After most of the VM's memory is copied from the source to the target host, the VM is quiesced, meaning the VM is taken to a state where no additional activity will occur on it. The quiesce time is the only time in the VMotion procedure in which the VM is unavailable to users, and it is minimal
4.) The VM device state and the memory bitmap containing the list of pages that have changed are transferred during this time
5.) Immediately after the VM is quiesced on the source host, the VM is initialized and starts running on the target host. Additionally, a RARP (reverse ARP) request notifies the subnet that the VM's MAC address is now on a new switch port
6.) Users are now accessing the VM on the new ESX Server, and the original VM on the old ESX Server is deleted
A virtual machine’s entire network identity, including MAC and IP address is preserved across a VMotion
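The iterative pre-copy in steps 2 through 4 can be sketched in Python roughly as follows (page counts, dirty rates and thresholds are invented for illustration):

import random

def vmotion_precopy(pages, max_rounds=5):
    dirty = set(pages)                      # round 1: copy all memory
    for _ in range(max_rounds):
        to_copy, dirty = dirty, set()
        for page in to_copy:
            # ...copy page to the target host here...
            if random.random() < 0.05:      # the running guest re-dirties some
                dirty.add(page)             # pages, which the bitmap records
        if len(dirty) < 10:                 # small enough: quiesce and finish
            break
    return dirty                            # sent during the brief quiesce

left = vmotion_precopy(range(10000))
print(len(left), "pages left for the quiesce phase")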
Reasons for Migration Error Messages
- VM has an active connection to an internal virtual switch
- VM has an active connection to a CD-ROM or floppy device with a local image mounted
- VM has its CPU affinity set to run on one or more specific physical CPUs
- VM is in a cluster relationship with another VM
Reasons for Migration Warnings
- VM is configured with an internal virtual switch but is not connected to it
- VM is configured to access a local CD-ROM or floppy image but is not connected to it
- VM has one or more snapshots
- No guest OS heartbeats are being received
Source and destination ESX servers must have
- Visibility to all SAN LUNs (either FC or iSCSI) and NAS devices used by VM
- Gigabit Ethernet backplane
- Access to the same physical networks
- Consistently labeled virtual switch port groups
- Compatible CPUs
The vSwitch port group names have to match exactly (the match is case-sensitive)
CPU Constraints
Does not require exact match
- CPU clock speed
- Cache sizes
- Hyperthreading
- Number of Cores
- Virtualization Hardware Assist (32-bit)
Does require exact match
- Manufacturer (AMD vs. Intel)
- Family (P3, P4, Opteron)
- Presence or absence of SSE3 or SSSE3 instructions
- Virtualization Hardware Assist (64-bit Intel)
- Execution-Disabled (Nx/Xd) bit
Default values for the CPU compatibility masks are set by VMware to guarantee the stability of the virtual machines after a VMotion migration
VMware provides a CPU compatibility tool that allows you to check CPU compatibility of hosts participating in a VMotion migration
It can be made into a bootable CD as well
Maps
Use maps to verify that the source and target ESX Servers satisfy the VMotion requirements that pertain to shared datastores and networks; the map displays the relationships between the hosts, datastores and networks
DRS
Cluster – A collection of ESX Server hosts and associated VMs
DRS-enabled cluster
- Managed by VirtualCenter
- Balances virtual machine load across hosts in cluster
- Enforces resource policies accurately
- Respects placement constraints (affinity and anti-affinity rules)
A maximum of 32 hosts per cluster is supported
Automation Level
- Manual = Initial VM placement and dynamic balancing manual
- Partially Automated = Initial VM placement automated, but balancing is manual
- Fully Automated = initial placement and dynamic balancing are automatic
DRS will show recommendations for Initial VM placement or dynamic balancing when they are set to a manual state
Five Migration Thresholds
Level 1 – most conservative – Applies only five-star recommendations. This level applies recommendations that must be followed to satisfy constraints such as affinity rules and host maintenance
Level 2 – moderately conservative – Applies recommendations with four or more stars. This level includes Level 1 plus recommendations that promise a significant improvement in the cluster’s load balance.
Level 3 – midpoint (default) – Applies recommendations with three or more stars. This level includes Level 1 and 2 plus recommendations that promise a good improvement in the cluster’s load balance
Level 4 – moderately aggressive – Applies recommendations with two or more stars. This level includes Level 1-3 plus recommendations that promise a moderate improvement in the cluster’s load balance.
Level 5 – aggressive – Applies all recommendations. This level includes Level 1-4 plus recommendations that promise a slight improvement in the cluster’s load balance
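The five levels above follow a simple pattern: level N applies recommendations with at least (6 - N) stars. A one-line Python check (my reading of the notes, not an API):

def applies(level, stars):
    return stars >= 6 - level

print(applies(1, 5))  # True: most conservative takes five-star only
print(applies(3, 3))  # True: the default level takes 3+ stars
print(applies(3, 2))  # False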
DRS Placement Constraints
- Affinity and anti-affinity rules can be created to specify that certain VMs should always or never be located on the same physical server
Affinity: DRS should try to keep certain virtual machines together on the same host (for example, for performance reasons)
Anti-affinity: DRS should try to make sure that certain virtual machines are not together (for example, multiple database servers on the same system) for extra redundancy
You can customize the automation level for individual virtual machines in a DRS cluster to override the automation level set on the entire cluster
If a virtual machine is set to Disabled, Virtual Center does not migrate that virtual machine or provide migration recommendations for it.
To add an ESX Server to a DRS cluster, drag and drop it into the cluster and then use the Add Host wizard to complete the process
Best practices for DRS
- When DRS makes strong recommendations (typically 4 or 5 star) follow them
- Enable Automation
Resource pools in a DRS cluster
Resource pools can be created only on ESX standalone hosts or VMware DRS-enabled clusters. Clusters that have only VMware HA-enabled (and not VMware DRS) cannot use resource pools
A pool can reflect any organizational structure that makes sense to you such as a pool for each department or a project or a client, etc. You can associate access control and permissions to different levels in the resource hierarchy
The key to understanding and using delegation is to understand roles and their privileges. It will be very beneficial to use the VI Client to explore and gain familiarity with the privileges assigned to each role.
Monitoring cluster usage
Valid – A cluster is valid unless something happens that makes it overcommitted or invalid. In a valid cluster, there are enough resources to meet all reservations and to support all running virtual machines
Overcommitted (Yellow) – A cluster becomes overcommitted if it does not have enough capacity to satisfy the constraints it was originally configured with. One cause is an ESX Server going down, so that its resources are lost
Invalid (Red) – A cluster enabled for DRS becomes red when the tree is no longer internally consistent and does not have enough resources available. The total resources in the cluster have nothing to do with whether the cluster is yellow or red. It is possible for the cluster to be DRS red even if there are enough resources at the root level, if there is an inconsistency at a child level. For example, a DRS cluster turns red if the virtual machines in a fixed resource pool use more resources than the reservation of that resource pool allows
When adding a host to a cluster, choose to create a new resource pool for this host’s virtual machines and resource pools.
By default the resource pool created to represent the host’s resources is named “Grafted from host_name”.
Maintenance mode restricts VM operations on the host so that VMs can be shut down or VMotion’ed
Applies to both standalone hosts and clusters
Place an ESX server into maintenance mode if you are going to
- shut down the ESX Server
- add the ESX Server to a cluster
- remove the ESX Server from a cluster
When in maintenance mode, no new virtual machines can be powered on and no virtual machines will be migrated to this host
If a DRS cluster is set to fully automated level, the VMs on the server that is placed in maintenance mode will automatically be moved off that server onto the remaining host(s) in the cluster. If the DRS cluster is set to the partially automated level, the administrator has to manually move the VMs to a new host or power them down.
Resource Optimization
CPU Cycles
- Hyperthreading and load balancing performed by the VMkernel
- Owner can configure SMP (multiple procs)
- Administrator can set limits, reservations, share allocation and processor affinity
RAM
- Transparent page sharing, vmmemctl and VMkernel swap performed by the VMkernel
- VM owner determines available memory
- Administrator can set limits, reservations and share allocations
Disk Bandwidth
- Administrator sets share allocations
Network Bandwidth
- VM Owner configures virtual switch with teamed NICs
- Administrator performs traffic shaping
A "hardware execution context" (H.E.C.) is a processor's capability to schedule one thread of execution.
When a VCPU needs to be scheduled, the VMkernel maps the VCPU to a hardware execution context.
Hyperthreading provides more hardware execution contexts for VCPUs to be scheduled on, but it does not double the power of the core. If two processor requests require the same part of the CPU, with hyperthreading one will have to wait just as if there were only one core.
VMkernel dynamically schedules virtual machines and the service console
Service console always runs on the first H.E.C
For multi-VCPU, CPU-intensive VMs, the VMkernel tries to avoid scheduling their VCPUs on hardware execution contexts in the same core.
Transparent Memory Page Sharing
VMkernel detects identical pages in VMs memory and maps them to the same underlying physical page.
This doesn’t happen immediately on start up, but will kick in after a little time has elapsed
If any VM tries to modify a page that is shared, the VMkernel will create a new, private copy for that VM, and then map that page into address space of that VM only.
Balloon Driver
Vmmemctl is the balloon driver in the guest OS
It deallocates memory from selected virtual machines when RAM is scarce; this inflates the balloon. When memory is no longer scarce, the OS gets it back and the balloon deflates
By default, up to 65% of a VM’s memory can be taken away during the ballooning process, subject of course to the memory reservation setting
Each powered-on VM has its own VMkernel swap. Use of the VMkernel swap is a last resort, since performance will be noticeably slower
The size of the VMkernel swap file is determined by the difference between how much memory the virtual machine can use (its limit or, if no limit is defined, the amount configured into the virtual hardware) and how much RAM is reserved for it (its reservation)
When a VM is powered off, the VMkernel swap file is deleted. It is recreated when the VM is powered back on.
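The two sizing rules above in Python (the memory figures are made-up examples, in MB):

def vmkernel_swap_size(configured_mb, reservation_mb, limit_mb=None):
    # Swap covers the gap between what the VM may use and what is guaranteed
    usable = limit_mb if limit_mb is not None else configured_mb
    return usable - reservation_mb

def max_balloon_mb(configured_mb, reservation_mb):
    # Up to 65% can be ballooned away, but never below the reservation
    return min(int(configured_mb * 0.65), configured_mb - reservation_mb)

print(vmkernel_swap_size(2048, 512))  # 1536 MB swap file
print(max_balloon_mb(2048, 512))      # 1331 MB can be ballooned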
Monitoring VM Performance
Performance tuning methodology
- Assess performance
- Identify the limiting resource
- Make more resources available
- Benchmark Again
Don’t make casual changes to production systems
Performance graphs will show you real-time and historical usage levels of most hardware resources
You can export the graphs and tear them off to compare side by side with others.
You can control a virtual machine’s access to CPU and memory at three levels.
- Cluster level (if exists)
- Resource Pools
- Directly on the VMs
The key indicator of a virtual machine losing competition for CPU time is “CPU ready” time in its CPU resource graph. Ready time refers to the interval when a virtual machine is ready to execute instructions, but cannot because it cannot get scheduled onto a CPU. This is only available in Real Time
White paper on VMware ESX Server 3 – Ready Time Observations http://www.vmware.com/pdf/esx3_ready_time.pdf
Memory constraints can be determined by checking for high ballooning activity. This graph is also only available in real time
If you suspect that a VM is constrained by disk access
- measure the effective bandwidth between VM and the storage
- Measure the resource consumption using performance graphs
Use a tool like iometer to measure the maximum throughput via the current path to the storage.
Disk graph is real time only
The IOmeter program is also good for determining network bandwidth.
Performance based Alarms
Alarms are asynchronous notifications of changes in host or virtual-machine state
You can also configure VirtualCenter to transmit these messages to external monitoring systems
When you right-click on a virtual machine and choose Add Alarm…, the resulting window has four panels. Visit the General panel to name the alarm. Visit the Triggers panel to control which load factors are monitored and what the thresholds for the yellow and red states are. The other two tabs are Reporting and Actions
Host based alarms are similar to vm alarms, but with different triggers
Alarm reporting can be adjusted so that it alerts only after a given interval of time and does not re-alert unless enough time has passed. This can prevent flooding email, pagers, etc. with duplicate alarms.
Actions are used to send external messages or to respond to problems proactively
You can add custom alarms anywhere in the inventory
You might organize several hosts or clusters into a folder and apply an alarm to a folder
The VI Client reports changes in the host or VM state in its inventory panel
Backup Strategies
Back up files within the virtual machine as well as the bootable virtual machine itself
At the image level, perform backups periodically for Windows and Linux. For example, back up a boot disk image of a Windows virtual machine once a week.
At the file level, perform backups once a day. For example, back up files on drives D, E, and so on every night.
Although you might consider backing up the service console, it doesn’t need to be backed up as frequently as the virtual machines and their data. ESX service console can be reinstalled fairly quickly
General Guidelines for VM Backups
- Store application data in separate virtual disks from system images
- Use backup agents inside guest OSes for application data
- If windows, perform VCB file level backups
- Use full virtual machine backups for system images or plan to redeploy from templates
VMware Consolidated backup (VCB) addresses most of the problems you encounter when performing traditional backups. Consolidated Backup helps you to:
- Reduce the load on your ESX Servers by moving the backup tasks to one or more dedicated backup proxy servers
- Eliminate the need for a backup window by moving to a snapshot based backup approach
- Simplify backup administration by making the deployment of backup agents in each virtual machine you back up optional
- Back up virtual machines that are powered on
Consult the Virtual Machine Backup Guide available on the VMware web site
Virtual Machine High Availability
Three main clustering schemes
1.) Cluster-in-a-box – Two VMs clustered within one ESX server
a. Protects against operator error, application and OS crashes
2.) Cluster-across-boxes – One VM on each of two separate ESX Servers
a. Protects against operator error, application and OS crashes, and hardware failure
b. Shared storage required
3.) Cluster between physical and virtual machines
a. Low-cost N+1 redundancy
b. Shared storage required
VMware HA
- Automatic Restart of virtual machines in case of physical server failure
- Provides high availability while reducing the need for passive stand-by hardware and dedicated administrators
- Configuration and management done through VI Client
- Experimental (not supported) support for restarting failed VMs
VMware HA continuously monitors all servers in a cluster and detects server failures. An agent placed on each server maintains a "heartbeat".
- heartbeats are sent every 5 seconds
- heartbeat timeout is 15000 milliseconds or 15 seconds
VMware HA uses the heartbeat information that VMware Tools captures to determine virtual machine availability.
- VMware Tools sends a heartbeat every second
- Virtual Machine Failure Monitoring checks for a heartbeat every 20 seconds
A virtual machine will be restarted if a heartbeat is not received within a user-configurable time frame
Virtual Machine Failure Monitoring is experimental and not supported for production use. By default, Virtual Machine Failure Monitoring is disabled.
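A toy Python illustration of the two heartbeat checks above (the intervals come from the notes; the timestamps and the VM timeout are invented):

def host_failed(last_host_beat_s, now_s, timeout_s=15):
    # HA agents beat every 5 s; a host is declared failed after 15 s of silence
    return now_s - last_host_beat_s > timeout_s

def vm_failed(last_tools_beat_s, now_s, user_timeout_s=60):
    # VMware Tools beats every 1 s; Failure Monitoring checks every 20 s
    return now_s - last_tools_beat_s > user_timeout_s

print(host_failed(100, 120))  # True: 20 s without a host heartbeat
print(vm_failed(100, 130))    # False: still within the VM timeout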
Two VMware HA Prerequisites
- Each host in the cluster should have access to the virtual machines’ files and should be able to power on the VM with no problem
- Host must be configured for DNS. DNS resolution of the host’s fully qualified domain name is what VMware HA relies on.
Make sure the service console is redundant, since the heartbeats rely on the network. This can be done by adding a second service console port and vSwitch (on a separate network) or by adding at least two vmnics to the vSwitch with the one service console port
Following ports need to be open for heartbeat traffic:
- Incoming: TCP/UDP 8042-8045
- Outgoing: TCP/UDP 2050-2250
VMware HA + DRS is a reactive + proactive system
Two cluster-wide policy settings
1.) Number of host failures allowed
2.) Admission control
Admission control policies define when a VM may or may not be powered on, based on the resource reservations required and the number of host failures to tolerate
The actual spare capacity available can be monitored in the “current failover capacity” field in a VMware HA cluster’s Summary tab in the VI Client
Restart priority is set per virtual machine and should factor in dependencies. Specify High for those you want to come online first. The default priority is Medium. Choose Disabled if you don't want a server to start back up in the event of an ESX host failure.
Cluster nodes are designated as Primary or Secondary nodes. Primary nodes maintain a synchronized view of the entire cluster. There can be up to five primary nodes per cluster. Secondary nodes are managed by the primary nodes
Network failures can cause “split-brain” conditions. In such cases, hosts are unable to determine if the rest of the cluster has failed or has become unreachable.
Isolation response can be configured on an individual VM basis. Power off is the default response, but the options include Leave powered on and Use cluster setting.
Planning for Deployment
Check the compatibility guides before deploying hardware.
http://www.vmware.com/support/pubs/vi_pubs.html
Consider the peak load that virtual machines place on the "core" resources
- RAM
- CPU
- DISK
- NETWORK
Calculate the resources that each virtual machine will need in order to run
Plan the ESX service console partition sizing and whether you will boot from local disk or SAN
If SAN, make sure it is supported, LUN is available only to the ESX server for this purpose and that the HBA is configured correctly
A single Virtual Center server with minimum hardware requirements is recommended for supporting up to 20 concurrent client connections, 50 managed hosts and 1000 virtual machines.
Virtual Center can support up to 200 hosts and 2000 virtual machines
Do not use SQL Server 2005 Express in production. Use SQL Server or Oracle
Determine how you are going to organize the inventory view of Virtual Center
- Group hosts in a datacenter that are under a single administrative control
- Group hosts in a datacenter that meet VMotion requirements
- Group hosts in a cluster to form a single pool of resources
- Group VMs into folders e.g. by business unit or function
Storage Considerations
ESX Server Feature Comparison by storage type
Fibre Channel
- Can do everything
- Boot VM, Boot ESX Server, VMotion, VMFS, RDM, VMCluster, VMwareHA/DRS, VCB
iSCSI
- Can do everything except VM Cluster
NAS
- Can’t Boot ESX Server
- Can’t format as VMFS
- Can’t use VM Cluster
- Can't use RDM (Raw Device Mapping)
- Can’t use VCB
Local Storage
- Can't use VMotion
- Can't use VM Cluster
- Can't use HA or DRS
- Can't use VCB
One VMFS volume per LUN
It is best to use a LUN for one purpose at a time
NFS Considerations:
- Use “no_root_squash”
- 8 NFS mounts per ESX Server are allowed by default. This can be increased to 32
- Avoid VM swapping to NFS volumes
It is a common practice to create RAID volumes with seven disks or less. In RAID volumes consisting of more than seven disks, the overhead of parity calculation can overwhelm any performance benefit
Use preferred paths to setup your ESX Server so that various LUNs are accessed over various paths (if Active/Active SAN)
- ESX 3.5 and 3i (3i is embedded)
- Virtual SMP (multiprocessor support for VMs, up to 4)
- HA (High Availability)
- VMotion
- DRS (Distributed Resource Scheduler)
- VMFS (VMware proprietary file system)
- VCB (Consolidated Backup)
- Update Manager
- Storage VMotion
- VirtualCenter Server
Blueprint for becoming VCP and VI3 certified http://mylearn1.vmware.com/portals/certification
Four main virtual machine files
- Configuration File
- Virtual Disk File
- NVRAM file
- Log File
An operating system that has been virtualized is called a “Guest” operating system
Hosted vs. Bare Metal Hypervisor Architecture
- Hosted installs and runs the virtualization layer as an application on top of an operating system
- VMware Server is a free application that can be installed on a supported Windows or Linux system and provides host-based virtualization
Hypervisor (bare metal) – installs the virtualization layer directly on a clean x86 based system.
White paper for fully understanding Full Virtualization, Paravirtualization and Hardware Assist http://www.vmware.com/files/pdf/VMware_paravirtualization.pdf
Compatibility & System Requirements
- Supported on Intel processors, Xeon and above or AMD Opteron (32-bit mode) processors
System Compatibility http://www.vmware.com/support/pubs/vi_pubs.html
Virtual Infrastructure components
- Virtual Center Management Server
- Virtual Center Database
- Virtual Center License Server
- Virtual Center Agent – On each managed host, software that collects, communicates and executes the actions received from the Virtual Center Server. Installed the first time any host is added to the Virtual Center Inventory
Similar product offerings
- VMware Virtual Desktop infrastructure – using VMs on large server as desktops for thin clients
- VMware Lab Manager
VMware Technology Network (VMTN)
http://www.vmware.com/vmtn
ESX supported boot methods
- local SCSI
- local IDE/ATA or SATA
- network boot
ESX Server Hardware Requirements
- (2) processors 1500MHz or higher x86
- (1) GB RAM (256 GB Max)
- (1) or more NICs (10 Gigabit Ethernet supported)
- Disk Storage, local or SAN
Partitions created on Install
/boot = 100 MB
/ = 5GB (15GB in our install)
(Swap) = 544 MB (1.6GB in our install)
/var/log = 2GB recommended, 500 MB minimum (4GB in our install)
(VMcore) = 100 MB
/vmfs/volumes = Left over disk space for VMs and ISO images
Locations for storing ISO images
- VMFS datastore
- NFS datastore
- /vmimages directory on service console
New install will reformat the boot disk and install new software and config files. Upgrade will allow preservation of an existing ESX Server install and maintain all current config files and directories
Always disconnect SAN before install
Make sure the same disk (on Dell, sda) is chosen at the first warning prompt, when selecting how you want to partition the disks for the system, and when selecting the boot device.
If you select "Create a default network for virtual machines" during the install, your virtual machines will share a network adapter with the service console, which is not recommended.
Licensing
Evaluation Mode – Full featured for 60 days from install. Does not require any licensing configuration
Serial Number – Used to license 3i only, not 3.x
License Server – Centralized licensing, only (1) license file
Host License – License files stored on individual ESX server. (1) license per host. Types are foundation, standard and enterprise
NTP configuration
- For accurate graphs
- For accurate log time stamps
- For accurate time in VM systems
- Doesn’t sync to Server Hardware Clock
SSH Access
- By default, the service console does not allow the root user account to log in using an SSH client. However, it does permit normal user accounts to log in using secure shell, and you can then "su" to become the root user. This requires creating a non-root user account via the VirtualCenter interface
Main reasons for ESX Server problems
- Defective Hardware (memory, cpu)
- Misconfig during install (wrong nic chosen, wrong boot device)
- Inadequate planning
Compatibility guides http://www.vmware.com/support/pubs/vi_pubs.html
Purple Screen of Death (PSOD)
- Machine check exception (usually with CPU problem)
- NMI ECC or Parity Error (memory failures)
What to do if ESX Server Crashes
- screen grab PSOD
- Check Environment (heat, etc…)
- Check for detached hardware
- Gather logs
o Via service console run “vm-support”
o Via Virtual Center
§ File -> Export -> Export Diagnostic Data
The VMkernel allows the virtual machines as well as the service console access to the system's hardware.
Networking
Virtual Switches allow access to the service console, VM network connectivity and access to IP storage
A virtual switch provides connections for VMs to communicate with each other, whether they are on the same host or different hosts
Each service console port and each VMkernel port must be configured with its own IP address, netmask and gateway. There can be multiple of each, but they have to be on different networks
Default ports = 56 (64 – 8)
Default for vswitch0 = 24 (created during install)
Max ports = 1016
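The port arithmetic above, spelled out in Python (the inference that 8 ports are reserved by the VMkernel comes from the 64 - 8 note):

RESERVED = 8
for configured in (24 + RESERVED, 64, 1016 + RESERVED):
    print(configured, "configured ->", configured - RESERVED, "usable ports")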
The MAC address of a physical NIC is not used at all. Instead, each VM’s virtual NIC has its own MAC address
Three Port Types
- Service Console Port – access to ESX Server management network
- VMkernel port – access to VMotion, iSCSI and/or NFS/NAS networks
- Virtual Machine Port Group – access to VM networks
Virtual NIC on service console is vswif0
All virtual switches are known as vSwitch#
Console command to determine the PCI address of a NIC
- "esxcfg-nics -l"
Physical NICs are assigned at the virtual switch level
Three Network policies (configurable tabs)
- Security
- Traffic shaping
- NIC teaming
These policies are configured at the vSwitch and port group levels; port group settings override vSwitch settings
VLAN benefits
Improved security – the switch presents frames only to those stations in the right VLANs
Improved Performance – each VLAN is its own broadcast domain
Lower cost – less hardware required for multiple LANs
ESX Server supports IEEE 802.1Q VLAN Tagging
Packets from a VM are tagged as they exit the virtual switch and are cleared (untagged) as they return to the VM
Network Policy - Security
Configured at both port group and vswitch level
Promiscuous Mode – used for IDS (intrusion detection); the default is Reject
MAC Address Changes – required by clusters and some software firewalls. The default is Accept, but we set it to Reject.
Forged Transmits – when set to Reject, drops any frames the guest sends where the source address field contains a MAC address other than the assigned virtual NIC MAC address. The default is Accept, but we set it to Reject.
Network Policy – Traffic Shaping
- controlling outbound network traffic only
- Average Rate - Kbps
- Peak Rate - Kbps
- Burst Rate - KB
Average bandwidth and peak bandwidth are specified in Kbps (kilobits per second) while burst size is specified in KB (Kilobytes).
Network traffic shaping is off by default
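A quick Python unit check for the shaping parameters above: rates are in kilobits per second, burst in kilobytes (the example numbers are invented):

def burst_seconds(burst_kb, peak_kbps):
    burst_kilobits = burst_kb * 8     # KB -> Kb
    return burst_kilobits / peak_kbps

# A 1024 KB burst at an 8192 Kbps peak rate empties in 1 second
print(burst_seconds(1024, 8192))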
Network Policy – NIC Teaming
Provides Load Balancing and Network Failure Detection
Three Load Balancing Methods
1.) vSwitch Port-Based
a. Routes based on the originating port ID (default)
b. A VM's outbound traffic is mapped to a specific physical NIC
c. Simple, fast and does not require the VMkernel to examine the frame
d. Spreads traffic evenly across the physical NICs
e. No single-NIC VM will ever get more bandwidth than can be provided by a single physical adapter
2.) Source MAC-Based
a. Each VM's outbound traffic is mapped to a specific physical NIC based on the virtual NIC's MAC address
b. Low overhead
c. Compatible with all switches
d. Might not spread traffic out evenly across the physical NICs
e. No single-NIC VM will ever get more bandwidth than can be provided by a single physical adapter
3.) IP-Based
a. The NIC for each outbound packet is chosen based on the packet's source and destination IP addresses (see the sketch after this list)
b. Higher CPU overhead
c. Not compatible with all switches
d. Has better distribution of traffic across physical NICs
e. A single-NIC VM might use the bandwidth of multiple physical adapters
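A sketch of the idea behind IP-based teaming in Python: pick an uplink from a hash of the source and destination IPs (the actual hash ESX uses is not specified here):

import zlib

def choose_nic(src_ip, dst_ip, nics):
    key = (src_ip + dst_ip).encode()
    return nics[zlib.crc32(key) % len(nics)]

uplinks = ["vmnic0", "vmnic1"]
print(choose_nic("10.0.0.5", "10.0.1.9", uplinks))
print(choose_nic("10.0.0.5", "10.0.2.7", uplinks))  # may land on the other NIC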
Detecting and Handling Network Failure
- Link state only
- Link state + beaconing
Load Balancing option – use explicit failover order or override
Rolling Failover – how a physical adapter is returned to active duty after recovering
When using explicit failover order, ESX always uses the highest-order uplink from the list of active adapters that passes the failover detection criteria
If rolling failover is set to No, the adapter is returned to active duty immediately upon recovery. If rolling is set to Yes, a failed adapter is left inactive even after recovery, until another currently active adapter fails.
Storage
Fibre Channel switches interconnect multiple nodes to form the “fabric” in a Fibre Channel SAN
Fibre Channel encapsulates SCSI commands
Fibre Channel SAN consists of:
- storage system
- LUN
- SP (Storage Processor(s))
- HBA (host bus adapter)
- FC Switches
WWN = World Wide Name – a unique, 64-bit address assigned to a Fibre Channel node
Example: 50:06:01:60:10:20:AD:87
Controlling hosts’ access to LUNs
- Soft zoning – done on a Fibre Channel switch; controls LUN visibility on a per-WWN basis
- Hard zoning – controls storage-processor visibility on a per-switch-port basis
World Wide Names (WWNs) are assigned by the manufacturer of the SAN equipment. HBAs and SPs have WWNs. WWNs are used by SAN administrators to identify your equipment for zoning purposes.
LUN masking can be done within the ESX Server, but is normally performed at the storage processor level and with newer switches at the fabric level
VMkernel disk partition addressing scheme
Example: vmhba0:0:11 or vmhba1:1:12:3
vmhba – standard label that identifies a physical host bus adapter
Adapter – adapter ID assigned to each HBA
Target – SCSI target that the storage processor presents
LUN – logical unit number
Partition – partition on the LUN
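A small Python helper that unpacks the addressing scheme above (vmhba<adapter>:<target>:<lun>[:<partition>]):

def parse_vmhba(path):
    assert path.startswith("vmhba")
    parts = path[len("vmhba"):].split(":")
    fields = ["adapter", "target", "lun", "partition"]
    return dict(zip(fields, (int(p) for p in parts)))

print(parse_vmhba("vmhba0:0:11"))    # {'adapter': 0, 'target': 0, 'lun': 11}
print(parse_vmhba("vmhba1:1:12:3"))  # includes partition 3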
The Fibre Channel storage adapter is recognized by the VMkernel during the boot sequence
VMkernel scans up to 256 LUNs (only first 128 during ESX install)
Rescan to find newly assigned LUNs
Troubleshooting SAN connectivity http://www.vmware.com/pdf/vi3_san_design_deploy.pdf
iSCSI
Booting the ESX Server from iSCSI is possible only with a hardware initiator
LUN masking in iSCSI works like it does in Fibre Channel. Ethernet switches do not implement zoning like Fibre Channel switches; instead, you can create zones using VLANs
iSCSI Target Naming
iqn.1992-08.com.netapp:stor1
- iqn – all start with this
- date code – specifies the year and month in which the organization registered the domain or sub-domain name used as the naming authority string
- the organizational naming authority string, which consists of a valid reversed domain or subdomain name
- optionally, a ":" followed by an assigned alias
Example: iqn.2008-06.com.tangosoftware:vmfs1
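A Python sketch that assembles an IQN from the parts described above (the domain and alias are made-up examples):

def build_iqn(year, month, domain, alias=None):
    reversed_domain = ".".join(reversed(domain.split(".")))
    iqn = "iqn.%04d-%02d.%s" % (year, month, reversed_domain)
    return iqn + ":" + alias if alias else iqn

print(build_iqn(2008, 6, "tangosoftware.com", "vmfs1"))
# -> iqn.2008-06.com.tangosoftware:vmfs1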
Target Discover Methods
- Static Configuration – only available for Hardware initiators. IP address, TCP port and the iSCSI target name manually defined at hardware level
- Send Targets – uses the target's IP address and port to query available targets
CHAP Authentication
Optional, stands for Challenge-Handshake Authentication Protocol
It is best practice to create a separate, isolated IP network for iSCSI traffic, since transmitted data is unencrypted.
An isolated network is the only way VMware supports iSCSI
The software initiator is a port of the Cisco iSCSI Initiator Command Reference implementation.
The service console and VMkernel NICs both need access to the iSCSI storage, since the iSCSI daemon initiates the session and handles login and authentication. The actual I/O goes through the VMkernel
ESX Server does not support both hardware and software initiators running simultaneously
Main steps to setting up iSCSI
1.) Add a VMkernel port for iSCSI traffic
2.) Enable iSCSI Traffic Through the Service Console Firewall (CHAP as well if used)
3.) Enable the iSCSI initiator in the storage section
4.) Input the IP address of iSCSI SAN in the send targets under Dynamic Discovery
iSCSI software initiator is identified as vmhba32
iSCSI hardware initiators are identified by bus location just like Fibre Channel HBAs.
Example: vmhba2
VMFS Datastores
Each virtual machine’s files are located in its own subdirectory
Repository for other files:
- templates
- Iso Images
VMFS volumes are addressed by a volume label, a datastore name or a physical address
Example: vmhba1:0:0:1
VMFS volumes are accessible in the service console underneath /vmfs/volumes
Features of VMFS-3
- Distributed journaling
- Faster file system recovery, independent of volume size or number of hosts connected
- Scalable distributed locking – survives short and long SAN interruptions much better
- Support for small files – small files are allocated from a sub-block resource pool
32 - Max hosts accessing VMFS at same time
30720 – Max files per directory
30720 – Max files per Volume
VMware only supports a single VMFS on a single partition on a LUN
A single-LUN VMFS must be at least 1.2 GB in size, but due to a limitation of the SCSI-2 protocol, a VMFS partition cannot exceed 2 TB in size
A VMFS volume can grow beyond 2 TB by adding extents of up to 2 TB each
VMFS can have up to 32 physical extents for a maximum size of about 64TB
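The extent arithmetic above in one line of Python:

# 32 extents of 2 TB each gives the ~64 TB maximum volume size
print(32 * 2, "TB maximum VMFS-3 volume (approx.)")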
Multipathing with Fibre Channel
- allows continued access to SAN LUNs in the event of hardware failure
- Exactly one path is active (in use) to any LUN at any time
Two Multipathing policies exist:
- MRU – Most Recently Used
- Fixed – Preferred Path
ESX Server automatically sets the multipathing policy according to the make and model of the array it detects, so it should not be changed manually
ESX Server multipathing is only supported for failover, not automatic load balancing. However, manual load balancing can also be achieved
NAS Storage and NFS Datastores
Two Key NAS protocols
- NFS (Network File System)
- SMB (Windows networking, also known as CIFS – Common Internet File System)
ESX Server supports NFS version 3 over TCP
NFS volumes are treated just like VMFS volumes in Fibre Channel or iSCSI storage, and thus can be used to:
- VMotion
- Create virtual machines
- Boot Virtual machines
- Mount ISO files
/etc/exports – defines the systems allowed to access the shared directory on the NAS
Options used in file are:
- Name of shared directory
- Subnet(s) allowed to access the share
- "rw" – allows read and write
- “no_root_squash” – allows VMkernel to access the NFS volume using UID0 as root which usually is not allowed
- "sync" – means all file writes must be committed to the disk before the write request by the client is actually completed
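Putting those options together, a complete /etc/exports entry might look like this (the path and subnet are hypothetical examples):
/nfs/vmstore 192.168.1.0/24(rw,no_root_squash,sync)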
NFS requires a VMkernel port, just like iSCSI
Virtual Center installation
It is highly recommended to install the VMware License Server on the same Windows server as the VirtualCenter Server
Most critical component of Virtual Center is the Database
Security for the VirtualCenter Server is built on Windows security: Active Directory if in a domain, or the local computer's accounts if not in a domain
Default ports needed
- 27000 and 27010 (only if license server is not on same machine)
- 1521 – Oracle DB
- 1433 – SQL DB
- 443 and 80 – Web Access and SDK Clients
- 902 – Virtual Center to managed hosts
- 443 – VIClient to Virtual Center
Require Separate License
- VMotion
- HA
- DRS
Virtual Center Architecture
Core Services – Management of Virtual machines and resources, task scheduler, statistics logging, alarms and events. VM Provisioning and host and VM configuration
Distributed Services – Vmotion, HA, DRS
Additional Services/Modules – Vmware Update Manager, Converter Enterprise
Database Interface
ESX Server Management – Virtual Center Agent automatically installed when host is added to virtual center
Active Directory Interface
VI API and VI SDK – allow third party applications
Hardware Requirements
Processor = 2 GHz Intel or AMD x86
Memory = 2GB
Storage = 560 MB minimum, Recommended 2GB
Network = 10/100 (Gigabit recommended)
Software Requirements
- 32 bit OS only
- Windows 2000 Server SP4 w/Update Rollup1
- Windows XP Pro SP2
- Windows 2003 Server SP1
- Windows 2003 Server R2
Database Requirements
Oracle 9iR2
Oracle 10gR1 (versions 10.1.0.3 and higher)
Oracle 10gR2
SQL Server 2000 SP4 or Enterprise Ed.
SQL Server 2005 (Enterprise SP1 or SP2)
SQL Server 2005 Express – Eval/Demo only
The VI Client provides a database estimation calculator in which you enter the number of hosts and virtual machines in your inventory. This gives an estimate of the space needed.
Microsoft SQL
- user needs to be sysadmin server role or db_owner fixed database role
- create ODBC connection of type DSN
- SQL server Authentication needs to be used unless DB is on same server
VMware License Server
- Licenses are stored on a license server
- Single host and centralized licenses can be combined
- 14-day grace period
Two virtual center editions
Foundation – up to 3 hosts
Virtual Center – Max number of hosts
14-Day Grace Period Details
After expiration you can’t do the following things
- Power on Virtual Machine
- Use VMotion
- Use HA
- Use DRS
These things are not supported during or after the grace period
- All the above
- Add an ESX Server Host to Inventory
- Move an ESX Server Host into or out of Cluster
- Add or Remove License Keys
- Upgrade
Virtual Center communicates with ESX Server via vpxa
VI Client communicates with ESX Server via vmware-hostd
Web client connects to Web Access which is on both the Virtual Center Server and the ESX Host
443 and 902 are ports used for traffic
A Single Virtual Center Server can manage ESX Servers that are located in different geographical locations but connected by WAN link or VPN link
Virtual Center Server log files
- vpxd-#.log (10 kept, 0-9)
- vpxd-index.log
- located C:\windows\temp\vpx
If License server does not start, it is likely due to incorrect or corrupt license file
LMTOOLS program used to manage licenses
Virtual Center Inventory Hierarchy
The root folder is either “Hosts and Clusters” or “Virtual Machines and Templates” depending on the view selected. These are the two most common views. (others are Networks view and Datastores view)
Under the root folder is one or more data center objects
A Datacenter object is the primary organizational structure in the inventory
VirtualCenter Server can contain multiple datacenters
Folders can be added to help organize inventory items, like servers by function or processor type. Don't add too many folders or make too complex a hierarchy
Hosts that are grouped together are referred to as a cluster
VirtualCenter Server can support VMware DRS and VMware HA clusters which contain up to 32 ESX Servers
If you can’t add a host to virtual center you may need to check that vmware-hostd is running on the ESX server
Lockdown mode prevents use of VI Client when VirtualCenter server is installed, this is only available in 3i not 3.x
The Scheduled Tasks tab allows you to schedule creation of VMs, etc., at any time of day.
The events and systems log tab help with troubleshooting problems
Sessions shows number of open connections to VirtualCenter Server
The Maps panel shows the relationships between items and is very helpful in verifying that VMotion criteria are configured correctly.
Guided consolidation feature uses VMware converter and will
- Discover – servers that might be good candidates for VMS
- Analyze – how good of a candidate they would make
- Consolidate – Create the VMs from the Physical machines
The Client Settings option lets you adjust the timeout values for slow WAN connections, max number of virtual machine consoles, Hint Messages and getting started tabs.
VMware Files
VM_name.vmx = Virtual Machine config file
VM_name.vmdk = File describing virtual disk characteristics
VM_name-flat.vmdk = Pre-allocated virtual disk file that contains all the data
VM_name.nvram = Virtual machine Bios
Vmware.log = log File
Vmware-#.log = older log file (6 kept)
VM_name.vswap = swap file
VM_name.vmsd = Snap shots
To browse files, choose the ESX Server and go to the Summary tab. Right-click on the datastore where the files are located and choose Browse Datastore.
Virtual Hardware
ESX Server VMs lack USB and Sound Adapter Support
6 total PCI slots
5 usable PCI slots since one is Video
2 IDE controllers supported for up to 4 cd-rom drives
1,2 or 4 CPUs per VM with SMP
64 GB Memory Max per VM
Many guest OS/application combinations are not enhanced by the additional CPU. Two or four VCPU VMs should be created only in the comparatively infrequent instances where they are of benefit, not as a standard config
LSILogic or BusLogic Adapter is chosen automatically for you based on choice of guest OS.
ESX Server virtual disks are monolithic meaning if you create a 6GB virtual disk it creates a single file that is 6GB in size.
Virtual CD-ROM drive or floppy drive can point to either the CD-ROM drive or floppy drive located on the ESX Server, a CD ISO image (.iso) or floppy (.flp) image or even the CD-ROM or floppy drive on your local system
Ctrl + Alt + Ins is how to send Ctrl + Alt + Delete to the vm console
VMware tools should always be installed and allow:
- device drivers if necessary
- Manual connection/disconnection of some devices while powered on
- Improved mouse
- Memory Management
- Quiescing a file system
- Time Sync
- Ability to gracefully shut down machine from VirtualCenter
Most visible benefit is better video performance and that you can move your mouse pointer freely into and out of the VM console window
VMware Converter Enterprise
Can import almost any Windows OS into ESX Server
- import physical machines to VMs
- import non-ESX VMware VMs
- Import Microsoft Virtual Server 2005 VMs
- Convert 3rd-party backup or disk images to VMs
- Restore VCB images to VMs
- Export Virtual Center VMs to other non-ESX VMware formats
- Reconfigure Virtual Center VMs so they are bootable
- Customize Virtual Center VMs
Physical to Virtual = P2V
- Great for consolidating, testing and troubleshooting and disaster recovery
Windows OSes can be resized during conversion; other OSes can only be cloned as is
Three Components
- Server
- CLI – Command line interface
- Agent
Server is installed separately on the same server as virtual center or different server. Is included on VirtualCenter CD.
Client is installed via plugin in Virtual Center
Hot Cloning- Four Stages
1.) Prepare source machine
a. Install enterprise agent
b. Snap shot is taken of volumes
2.) Prepare Virtual machine on Destination
a. Create a new virtual machine on ESX server
b. Volumes copied from source machine to ESX server
3.) Completing Conversion
a. Required drivers install to allow os to boot
b. Customizations are applied
4.) Cleaning up
a. Snap shot is removed from source
b. Agent is automatically or manually uninstalled from source
Cold Cloning – Four Stages
1.) Prepare source machine
a. User boots with coldclone cd image and defines conversion values
b. Source volumes copied into RAM
2.) Prepare Virtual machine on Destination
a. New virtual machine is created
b. Volumes copied from source to ESX
3.) Completing Conversion
a. Required drivers install to allow os to boot
b. Customizations are applied
4.) Cleaning up
a. User removes CD and reboots physical server
Two different cloning modes:
1.) Volume-based
a. Available for Hot and Cold
b. Cloning is done on a block-level basis if maintaining the volume size. If the volume is resized to be smaller, cloning is done on a file-level basis
2.) Disk-based
a. Only available with cold cloning and VM imports
b. Entire disk is copied as is, cloning transfers all sectors from all disks, preserving all volume metadata
Common problems caused by
- insufficient privileges
- correct ports are not open (445, 139, 903, 443)
- boot.ini is set to read only
Managing VMs
Cold migration moves a powered off vm to another esx server or a different datastore on same esx server.
May or may not move disk depending if you have shared storage between ESX servers
Snapshots
- Useful when you need to revert repeatedly to the same state, useful in Test and Dev
A memory snapshot is created only when the VM is powered on. By default, a snapshot of memory is taken
Snapshots include the following files
VM_name-0000#-delta.vmdk – (snapshot difference file) where the number is the next number in the sequence, starting with 1
VM_name-0000#.vmdk – (snapshot description file)
VM_name-Snapshot#.vmsn – (memory state file) the size of the file is the size of the VM's maximum memory
Snapshot manager manages the snapshots and allows the following tasks
- Delete – commits the snapshot data to the parent snapshot, then removes the selected snapshot
- Delete All – commits all the immediate snapshots before the current state icon "You are here" to the base disk and removes all existing snapshots for that virtual machine
- Go to – allow you to restore a particular snapshot.
Modifying Virtual Machine settings
A virtual disk can be added to the VM while powered on
VM must be shut down for anything that would require the machine shutdown in a physical environment, like adding a NIC
A disk can be resized, but the VM must be shut down.
Raw LUNs
Allow VM clustering across boxes, or physical to virtual
Enable use of SAN management software inside guest OS
An RDM (Raw Device Mapping) allows a special file in a VMFS volume to act as a proxy for a raw device
The raw disk mapping is a special file that lives in a datastore (VMFS and NFS) and points to the actual SAN LUN
A Raw disk runs in one of two modes:
Physical Compatibility – allows the guest operating system to directly access the hardware and is useful if you are using SAN-aware applications in the virtual machine
Virtual Compatibility – allows the LUN to behave as if it were a virtual disk so you can use features like snapshoting, cloning and creating templates.
Physical mode – can't clone, create a template or migrate if it involves moving the disk
Raw disk mappings have the following files
Virtual Compatibility Mode:
- VM_name_#.vmdk
- VM_name_#.rdm.vmdk
Physical Mode:
- VM_name_#.vmdk
- VM_name_#.rdmp.vmdk
General Options
If you change the display name of a virtual machine, it doesn’t rename the file names.
VMware Tools
-Customize power button actions
-When to run scripts
-Update checks and time syncs
Advanced boot options are useful to delay power-on to stagger VMs starting up, and to enable the VM to boot into the BIOS for BIOS changes
Each host or cluster can have a custom swapfile datastore location defined
Guided Consolidation
Lowers training requirements for new virtualization user
Recommended for small, simpler environments (small to medium businesses with 100 physical servers or less)
Allows new users to quickly realize the benefits from server consolidation
Can discover and analyze only Windows server-family OSes
No Agent software is involved
You can choose not to install during Virtual Center Installation
(2) services, DataCollector and VMware Converter
Data Collector service runs under the name of “VMware Capacity Planner Service”
It uses a hidden database (not user managed) when collecting data
The data collector service uses LAN Manager or Active Directory if available, thus must have read permissions on the Active Directory.
The user also needs local Windows admin privileges in addition to read-only access on AD
Discovery of Systems is repeated periodically
- Every ½ hour – check for new servers
- Every day check for new domains
You can specify credentials on a system by system basis and a default credential that should be used on all systems
Metrics
- Metrics are collected every hour
- 10 to 12 metrics total
- Data is put into table in VirtualCenter DB
Confidence level – Based on the number of performance samples that the VC has collected. As the VC collects more performance samples, the confidence level goes up.
Data collector is agent less so nothing is installed on remote computers
If target systems are behind a firewall, incoming WMI ports need to be open (135, 137, 138, 139, and 445)
Access Control
Main components of the security model
- user/group – account with access to Virtual Infrastructure
- Role – A set of one or more privileges
- Privilege – specifies a task a user/group is authorized to perform
- Permission – pairing of a user/group and a role
There are approximately 100 defined privileges
VirtualCenter users/groups are those that are defined either on the local windows server or in the Active Directory if on a domain
ESX server users/groups are those defined in its service console
No attempt is made to reconcile these users/groups
Privileges are the building blocks of roles
Roles are collections of privileges
Roles can and should be set to propagate down to child objects
A role is neither superior nor subordinate to another role. All roles are independent of each other
ESX default Server Roles
- No Access
- Read-Only
- Administrator
VirtualCenter default Roles
- No Access
- Read-Only
- Administrator
- Virtual Machine Administrator
- Datacenter Administrator
- Virtual Machine Power User
- Virtual Machine User
- Resource Pool Administrator
- VMware Consolidated Backup User
Best Practices paper on Managing VMware VirtualCenter Roles and Permissions. http://www.vmware.com/pdf/vi3_vc_roles.pdf
Permissions can be overridden at a lower level by adding a new permission to the same user
Scenario 1
If a user is a member of multiple groups with permissions on different objects, for each object on which the group has permissions, the same permissions apply as if granted to the user directly
Datacenter level
Group 1 = Administrator
- Greg
- Susan
Server Level
Group 2 = Read Only
- Greg
- Carla
In this scenario, Greg will have administrative permission on everything in the datacenter except the one server that Group 2 is assigned to, where he will have Read-Only access.
Scenario 2
If a user is a member of multiple groups with permission on the same object, the user is assigned the union of privileges assigned to the groups for that object
Group1 = VM_Power_On
- Greg
- Susan
Group 2 = Take_Snapshots
- Greg
- Carla
These groups are both assigned to the Datacenter level. Greg will have both privileges on the datacenter level.
Scenario 3
Permissions defined explicitly for the user on an object take precedence over all group permissions on that same object.
Datacenter Level
Group1 = VM_Power_ON
- Greg
- Susan
Group 2 = Take_Snapshots
- Greg
- Carla
Greg = Read Only
In this scenario, even though Greg is a member of group 1 and 2, since the user was explicitly defined as read only at the datacenter level, he will only have read only access on everything in the datacenter.
Scenario 4
Permissions applied directly to an object override inherited permissions
If Greg is assigned as a VM User at the datacenter level but is also assigned as an Administrator on a specific VM, the end result is that Greg will have admin privileges on that one VM, since the direct assignment overrides the propagated VM User permission set at the datacenter level
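The four scenarios reduce to three precedence rules: explicit user permissions beat group permissions on the same object, group roles on the same object are unioned, and object-level assignments beat inherited ones. A Python sketch of that reading (my interpretation of the notes, not VMware's implementation):

def effective_privileges(obj, user, groups, perms, parent):
    # perms: {(object, principal): set of privileges}
    # parent: {object: parent object, or None at the root}
    node = obj
    while node is not None:
        if (node, user) in perms:                       # explicit user entry wins
            return perms[(node, user)]
        group_privs = set()
        for g in groups:
            group_privs |= perms.get((node, g), set())  # union of group roles
        if group_privs:                                 # object-level beats inherited
            return group_privs
        node = parent[node]                             # fall back to inheritance
    return set()

perms = {("Datacenter", "Group1"): {"VM_Power_On"},
         ("Datacenter", "Group2"): {"Take_Snapshots"},
         ("Datacenter", "Greg"):   {"Read_Only"}}
parent = {"VM1": "Datacenter", "Datacenter": None}
# Scenario 3: Greg's explicit Read_Only overrides both group roles
print(effective_privileges("Datacenter", "Greg", ["Group1", "Group2"], perms, parent))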
VirtualCenter Security Model – Relies on Windows user accounts, local or domain
Local windows administrators are automatically assigned the Administrator role at the topmost level of the Inventory.
ESX Server Security Model – ESX user is a service console (Linux) user account.
User accounts, roles and permissions can be configured using the VI Client connected directly to the ESX Server
ESX Server users, root and vpxuser are assigned the Administrator role at the ESX Server Level.
If lockdown mode is enabled, it will prevent the ESX root user from logging directly into the ESX Server using the VI Client. Root user still has ability to log into the ESX Server using secure shell.
Resource Management
CPU Resource settings
Limit – A cap on the consumption of CPU time by this VM, measured in MHz
Reservation – A certain number of CPU cycles reserved for this VM, measured in MHz
The VMkernel chooses which CPU(s) the VM runs on and may migrate VCPUs between CPUs
Shares – More shares means that this VM will win competitions for CPU time more often
VCPUs simultaneously scheduled – All virtual CPUs are scheduled at the same time. This means that if you have 4 VCPUs and a reservation of 1000 MHz, each VCPU gets 250 MHz
Memory Resource settings
Available Memory – Memory size defined when the VM was created
Limit – A cap on the consumption of physical memory by this VM, measured in MB
Reservation – A certain amount of physical memory reserved for this VM.
Shares – More shares means that this VM will win competitions for physical memory
more often
VMkernel allocates a per-VM swap file to cover each VM’s range between available memory and reservation
Memory can be stolen from a VM until it gets to the Reservation. ESX server can not take back memory from a machine once it is down to the Reserved limit.
ESX server can steal memory from a computer that is between the limit and the reservation. If it was being used, ESX creates a local swap file for memory ballooning.
A virtual machine will power on only if its reservation can be guaranteed
CPU Shares
High: # shares = 2000 * (#of vCPUs)
Normal: # shares = 1000 * (# of vCPUs)
Low: # shares = 500 * (# of vCPUs)
Custom: # shares = user-specified value
Memory Shares
High: # shares = 20 * size of VM’s available memory
Normal: # shares = 10 * size of VM’s available memory
Low: # shares = 5 * size of VM’s available memory
Customer: # shares = user-specified value
Resource Pools
Used on a stand alone host or VMware DRS enabled clusters
Provides resources for VMs and child ppos
A resource pool allows you as the administrator to divide and allocate resources to VMs and other resource pools
The topmost resource pool is known as the root resource pool
Resource pool consists of the CPU and memory resources of a particular ESX Server ro VMware DRS cluster
Configuring Resource Pools
Shares – Low, Normal, High, or custom
Reservations – in MHz and MB
Limits – in MHz and MB. Unlimited access by default (up to maximum amount of
resources accessible
Expandable Reservation – Yes = VMs and sub-ppols may draw from this pool’s parent
No = VMs and sub-pools may only draw from this pool, even
if its parent has free resources
Virtual machines do not have expandable reservation. Expandable reservations can only be set at the resource pool level
The root resource pool is the topmost resource pool and is comprised of the sum of all MHz for all CPUs and the sum of all the installed RAM (in MB) available in the compute environment (standalone host or cluster)
Except for the root resource pool, every resource pool has a parent resource pool. A resource pool might contain child resource pools or just VMs that are powered on within it
A child resource pool is used to allocate resources from the parent resource pool for the child’s consumers
A child resource pool cannot exceed the capacity of the parent resource pool unless there is an expandable reservation on the parent and resources available high up the hierarchy
Expandable Reservations
Borrowing resources occurs recursively from the ancestors of the current resource pool.
- as long as the expandable reservation option is selected
- Offers more flexibility but less protection
Expandable reservations are not released until the VM that caused the expansion is shutdown or its reservation is reduced.
An expandable reservation could allow a rogue administrator to claim all unreserved capacity in the environment.
Expandable reservations allows a resource pool that cannot satisfy a reservation request to search through its hierarchy to find unreserved capacity to satisfy the reservation request.
Use expandable reservations carefully. A single child resource pool may use ALL of its parent’s available resources, leaving nothing directly available for other child resource pools.
VMotion
Discs aren’t moved, just state information and current memory content
State information – the current memory content and all the information that defines and identifies the virtual machine
Memory content – includes transaction data and whatever bits of the operating system and applications are in memory
The definition and identification information stored in the state includes all the data that maps to the virtual machine hardware elements, such as BIOS, devices, CPU, MAC address for Ethernet cards and so forth
Steps
1.) VMotion is initiated by the user
2.) The virtual machine’s memory is copied to the destination host while users continue to access the original VM. Meanwhile, changes in memory are recorded in a bitmap (see the sketch after this list)
3.) After most of the VM’s memory is copied from the source to the target host, the VM is quiesced, meaning the VM is taken to a state where no additional activity will occur on it. The quiesce time is the only time in the VMotion procedure in which the VM is unavailable to users, and it is very brief
4.) The VM device state and the memory bitmap containing the list of pages that have changed are transferred during this time
5.) Immediately after the VM is quiesced on the source host, the VM is initialized and starts running on the target host. Additionally, a RARP (reverse ARP) request notifies the subnet that the VM’s MAC address is now on a new switch port
6.) Users are now accessing the VM on the new ESX Server, and the original VM on the old ESX Server is deleted at this time
A virtual machine’s entire network identity, including MAC and IP address is preserved across a VMotion
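A simplified Python illustration of the precopy described in steps 2-4: copy all memory, then keep re-copying whatever was dirtied during the previous pass until the remainder is small enough to quiesce. The page tracking here is simulated; the real mechanism lives in the VMkernel:

def vmotion_precopy(all_pages, get_dirty_pages, quiesce_threshold=16):
    """Copy memory in passes until few enough dirty pages remain to quiesce."""
    copied = 0
    to_copy = set(all_pages)              # pass 1: the VM's entire memory
    while len(to_copy) > quiesce_threshold:
        copied += len(to_copy)            # transfer this pass to the target host
        to_copy = get_dirty_pages()       # bitmap: pages dirtied while copying
    # Few pages remain: quiesce the VM, send them with the device state,
    # then resume on the target and send the RARP (steps 3-5 above).
    return copied, len(to_copy)

# Example run: 4096 pages, then two shrinking dirty-page sets.
dirty_passes = iter([set(range(100)), set(range(10))])
print(vmotion_precopy(range(4096), lambda: next(dirty_passes, set())))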
Reasons for Migration Errors
- VM has an active connection to an internal virtual switch
- VM has an active connection to a CD-ROM or floppy device with a local image mounted
- VM has its CPU affinity set to run on one or more specific physical CPUs
- VM is in a cluster relationship with another VM
Reasons for Migration Warnings
- VM is configured with an internal virtual switch but is not connected to it
- VM is configured to access a local CD-ROM or floppy image but is not connected to it
- VM has one or more snapshots
- No guest OS heartbeats are being received
Source and destination ESX servers must have
- Visibility to all SAN LUNs (either FC or iSCSI) and NAS devices used by VM
- Gigabit Ethernet backplane
- Access to the same physical networks
- Consistently labeled virtual switch port groups
- Compatible CPUs
The vSwitch port group names have to match exactly (the match is case-sensitive)
CPU Constraints
Does Not require exact match
- CPU clock speed
- Cache sizes
- Hyperthreading
- Number of Cores
- Virtualization Hardware Assist (32-bit)
Does require exact match
- Manufacturer (AMD vs. Intel)
- Family (P3, P4, Opteron)
- Presence or absence of SSE3 or SSSE3 instructions
- Virtualization Hardware Assist (64-bit Intel)
- Execution-Disabled (Nx/Xd) bit
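One way to read the two lists above: only the “exact match” attributes decide VMotion compatibility. A hedged Python sketch (attribute names are illustrative, not VMware’s actual CPU-mask fields):

# Illustrative compatibility predicate for the lists above. Only the
# attributes requiring an exact match are compared; clock speed, cache
# size, hyperthreading and core count are deliberately ignored.
MUST_MATCH = ("vendor", "family", "sse3", "ssse3", "vt_64bit", "nx_xd")

def vmotion_cpu_compatible(src, dst):
    """src/dst are dicts of CPU attributes; key names are illustrative."""
    return all(src.get(k) == dst.get(k) for k in MUST_MATCH)

host_a = {"vendor": "Intel", "family": "P4", "sse3": True,
          "ssse3": False, "vt_64bit": True, "nx_xd": True}
host_b = dict(host_a, ssse3=True)              # differs in SSSE3 support
print(vmotion_cpu_compatible(host_a, host_b))  # False: SSSE3 must match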
Default values for the CPU compatibility masks are set by VMware to guarantee the stability of the virtual machines after a VMotion migration
VMware provides a CPU compatibility tool that allows you to check CPU compatibility of hosts participating in a VMotion migration
It can be made into a bootable CD as well
Maps
Use maps to verify that the source and target ESX Servers satisfy the VMotion requirements for shared datastores and networks; the map displays the relationships between the hosts, datastores, and networks
DRS
Cluster – A collection of ESX Server hosts and associated VMs
DRS-enabled cluster
- Managed by VirtualCenter
- Balances virtual machine load across hosts in cluster
- Enforces resource policies accurately
- Respects placement constraints (affinity and anti-affinity rules)
A maximum of 32 hosts per cluster is supported
Automation Level
- Manual = Initial VM placement and dynamic balancing manual
- Partially Automated = Initial VM placement automated, but balancing is manual
- Fully Automated = initial placement and dynamic balancing are automatic
DRS will show recommendations for Initial VM placement or dynamic balancing when they are set to a manual state
Five Migration Thresholds
Level 1 – most conservative – Applies only five-star recommendations. This level applies recommendations that must be followed to satisfy constraints such as affinity rules and host maintenance
Level 2 – moderately conservative – Applies recommendations with four or more stars. This level includes Level 1 plus recommendations that promise a significant improvement in the cluster’s load balance.
Level 3 – midpoint (default) – Applies recommendations with three or more stars. This level includes Level 1 and 2 plus recommendations that promise a good improvement in the cluster’s load balance
Level 4 – moderately aggressive – Applies recommendations with two or more stars. This level includes Level 1-3 plus recommendations that promise a moderate improvement in the cluster’s load balance.
Level 5 – aggressive – Applies all recommendations. This level includes Level 1-4 plus recommendations that promise a slight improvement in the cluster’s load balance
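The level-to-stars relationship reduces to a one-line formula (level 1 applies only 5-star moves, level 5 applies 1-star and up). A small Python sketch:

# The five thresholds map directly to a minimum star rating.
def minimum_stars(migration_threshold_level):
    return 6 - migration_threshold_level

def apply_recommendations(recommendations, level):
    """recommendations: list of (name, stars); returns those DRS applies."""
    cutoff = minimum_stars(level)
    return [name for name, stars in recommendations if stars >= cutoff]

recs = [("move-vm1", 5), ("move-vm2", 3), ("move-vm3", 1)]
print(apply_recommendations(recs, 3))   # ['move-vm1', 'move-vm2']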
DRS Placement Constraints
- Affinity and anti-affinity rules specify that certain VMs should always or never run on the same physical host
Affinity: DRS should try to keep certain virtual machines together on the same host (for example, for performance reasons)
Anti-affinity: DRS should try to make sure that certain virtual machines are not together (for example, multiple database servers on the same system) for extra redundancy
You can customize the automation level for individual virtual machines in a DRS cluster to override the automation level set on the entire cluster
If a virtual machine is set to Disabled, Virtual Center does not migrate that virtual machine or provide migration recommendations for it.
To add an ESX Server to a DRS cluster, drag and drop it into the cluster and then use the Add Host wizard to complete the process
Best practices for DRS
- When DRS makes strong recommendations (typically 4 or 5 star) follow them
- Enable Automation
Resource pools in a DRS cluster
Resource pools can be created only on standalone ESX hosts or in VMware DRS-enabled clusters. Clusters that have only VMware HA enabled (and not VMware DRS) cannot use resource pools
A pool can reflect any organizational structure that makes sense to you such as a pool for each department or a project or a client, etc. You can associate access control and permissions to different levels in the resource hierarchy
The key to understanding and using delegation is to understand roles and their privileges. It will be very beneficial to use the VI Client to explore and gain familiarity with the privileges assigned to each role.
Monitoring cluster usage
Valid – A cluster is valid unless something happens that makes it overcommitted or invalid. In a valid cluster, there are enough resources to meet all reservations and to support all running virtual machines
Overcommitted (Yellow) – A cluster becomes overcommitted if it does not have enough capacity to satisfy the constraints it was originally configured with. One cause is an ESX Server going down, so that its resources are lost
Invalid (Red) – A cluster enabled for DRS becomes red when the tree is no longer internally consistent and does not have enough resources available. The total resources in the cluster have nothing to do with whether the cluster is yellow or red. It is possible for the cluster to be DRS red even if there are enough resources at the root level, if there is an inconsistency at a child level. For example, a DRS cluster turns red if the virtual machines in a fixed resource pool use more resources than the reservation of that resource pool allows
When adding a host to a cluster, choose to create a new resource pool for this host’s virtual machines and resource pools.
By default the resource pool created to represent the host’s resources is named “Grafted from host_name”.
Maintenance mode restricts VM operations on the host so that VMs can be shut down or VMotion’ed
Applies to both standalone hosts and clusters
Place an ESX Server into maintenance mode if you are going to
- shut down the ESX Server
- add the ESX Server to a cluster
- remove the ESX Server from a cluster
When in maintenance mode, no new virtual machines can be powered on and no virtual machines will be migrated to this host
If a DRS cluster is set to fully automated level, the VMs on the server that is placed in maintenance mode will automatically be moved off that server onto the remaining host(s) in the cluster. If the DRS cluster is set to the partially automated level, the administrator has to manually move the VMs to a new host or power them down.
Resource Optimization
CPU Cycles
- Hyperthreading and load balancing performed by the VMkernel
- VM owner can configure SMP (multiple vCPUs)
- Administrator can set limits, reservations, share allocation and processor affinity
RAM
- Transparent page sharing, vmmemctl (ballooning), and VMkernel swap performed by the VMkernel
- VM owner determines available memory
- Administrator can set limits, reservations and share allocations
Disk Bandwidth
- Administrator sets share allocations
Network Bandwidth
- VM Owner configures virtual switch with teamed NICs
- Administrator performs traffic shaping
A “Hardware Execution Context” (H.E.C.) is a processor’s capability to schedule one thread of execution.
When a VCPU needs to be scheduled, the VMkernel maps the VCPU to a hardware execution context.
Hyperthreading provides more hardware execution contexts for VCPUs to be scheduled on, but it does not double the power of the core. If two processor requests require the same part of the CPU, with hyperthreading one will have to wait just as if there were only one core.
VMkernel dynamically schedules virtual machines and the service console
Service console always runs on the first H.E.C
For multi-VCPU, CPU-intensive VMs, the VMkernel tries to avoid scheduling their VCPUs on hardware execution contexts in the same core.
Transparent Memory Page Sharing
The VMkernel detects identical pages in VMs’ memory and maps them to the same underlying physical page.
This doesn’t happen immediately on start up, but will kick in after a little time has elapsed
If any VM tries to modify a page that is shared, the VMkernel will create a new, private copy for that VM, and then map that page into address space of that VM only.
Balloon Driver
Vmmemctl is the balloon driver in the guest OS
It deallocates memory from selected virtual machines when RAM is scarce; this inflates the balloon. When memory is no longer scarce, the guest OS gets the memory back and the balloon deflates
By default, up to 65% of a VM’s memory can be taken away during the ballooning process, subject of course to the memory reservation setting
Each powered-on VM has its own VMkernel swap file. Use of VMkernel swap is a last resort, since performance will be noticeably degraded
The size of the VMkernel swap file is the difference between how much memory the virtual machine can use (its limit or, if no limit is defined, the amount configured into the virtual hardware) and how much RAM is reserved for it (its reservation)
When the VM is powered off, the VMkernel swap file is deleted. It is recreated when the VM is powered back on
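The swap-file rule above as a few lines of Python (parameter names are mine, not VMware’s):

# VMkernel swap file size: the gap between what the VM may use (its
# limit, or its configured memory if no limit is set) and what is
# guaranteed to it (its reservation).
def vmkernel_swap_mb(configured_mb, reservation_mb=0, limit_mb=None):
    ceiling = limit_mb if limit_mb is not None else configured_mb
    return max(ceiling - reservation_mb, 0)

print(vmkernel_swap_mb(configured_mb=4096, reservation_mb=1024))  # 3072
print(vmkernel_swap_mb(configured_mb=4096, reservation_mb=4096))  # 0: fully reserved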
Monitoring VM Performance
Performance tuning methodology
- Assess performance
- Identify the limiting resource
- Make more resources available
- Benchmark Again
Don’t make casual changes to production systems
Performance graphs show you real-time and historical usage levels of most hardware resources
You can export the graphs and tear them off to compare side by side with others.
You can control a virtual machine’s access to CPU and memory at three levels.
- Cluster level (if exists)
- Resource Pools
- Directly on the VMs
The key indicator of a virtual machine losing the competition for CPU time is “CPU ready” time in its CPU resource graph. Ready time is the interval during which a virtual machine is ready to execute instructions but cannot, because it cannot get scheduled onto a CPU. This counter is only available in real time
White paper on VMware ESX Server 3 – Ready Time Observations http://www.vmware.com/pdf/esx3_ready_time.pdf
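Ready time is easiest to judge as a percentage of the sample interval. A hedged sketch of the arithmetic; the 20-second interval matches the real-time chart, and the 10% threshold is a common rule of thumb rather than an official limit:

# Rough arithmetic for interpreting CPU ready values from the real-time
# graph: ready milliseconds accumulated over a sample interval, as a
# percentage per vCPU.
def ready_pct(ready_ms, interval_s=20, num_vcpus=1):
    return 100.0 * ready_ms / (interval_s * 1000 * num_vcpus)

sample = ready_pct(ready_ms=4000)   # 4000 ms ready in a 20 s sample = 20%
print(sample, "- investigate" if sample > 10 else "- probably fine")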
Memory constraints can be determined by checking for high ballooning activity. This graph is also only available in real time
If you suspect that a VM is constrained by disk access
- measure the effective bandwidth between VM and the storage
- Measure the resource consumption using performance graphs
Use a tool like Iometer to measure the maximum throughput via the current path to the storage.
Disk graph is real time only
The Iometer program is also good for determining network bandwidth.
Performance based Alarms
Alarms are asynchronous notifications of changes in host or virtual-machine state
You can also configure VirtualCenter to transmit these messages to external monitoring systems
When you right-click a virtual machine and choose Add Alarm…, the resulting window has four tabs. Use the General tab to name the alarm and the Triggers tab to control which load factors are monitored and what the thresholds for the yellow and red states are. The other two tabs are Reporting and Actions
Host based alarms are similar to vm alarms, but with different triggers
Alarm reporting can be adjusted so that an alarm fires only after the triggering condition persists for a given interval, and repeats no more often than a set frequency. This prevents flooding email, pagers, etc. with duplicate alarms.
Actions are used to send external messages or to respond to problems proactively
You can add custom alarms anywhere in the inventory
You might organize several hosts or clusters into a folder and apply an alarm to a folder
The VI Client reports changes in the host or VM state in its inventory panel
Backup Strategies
Backup files within the virtual machine as well as the bootable virtual machine itself
At the image level, perform backups periodically for Windows and Linux. For example, back up a boot disk image of a Windows virtual machine once a week.
At the file level, perform backups once a day. For example, back up files on drives D, E, and so on every night.
Although you might consider backing up the service console, it doesn’t need to be backed up as frequently as the virtual machines and their data; the ESX service console can be reinstalled fairly quickly
General Guidelines for VM Backups
- Store application data in separate virtual disks from system images
- Use backup agents inside guest OSes for application data
- For Windows, perform VCB file-level backups
- Use full virtual machine backups for system images or plan to redeploy from templates
VMware Consolidated backup (VCB) addresses most of the problems you encounter when performing traditional backups. Consolidated Backup helps you to:
- Reduce the load on your ESX Servers by moving backup tasks to one or more dedicated backup proxy servers
- Eliminate the need for a backup window by moving to a snapshot-based backup approach
- Simplify backup administration by making the deployment of backup agents in each virtual machine you back up optional
- Back up virtual machines that are powered on
Consult the Virtual Machine Backup Guide available on the VMware web site
Virtual Machine High Availability
Three main clustering schemes
1.) Cluster-in-a-box – Two VMs clustered within one ESX server
a. Protects against operator error, application and OS crashes
2.) Cluster-across-boxes – Two VMs, one on each of two separate ESX Servers
a. Protects against operator error, application and OS crashes, and host hardware failure
b. Shared storage required
3.) Cluster between physical and virtual machines
a. Low-cost N+1 redundancy
b. Shared storage required
VMware HA
- Automatic Restart of virtual machines in case of physical server failure
- Provides high availability while reducing the need for passive stand-by hardware and dedicated administrators
- Configuration and management done through VI Client
- Experimental (not supported) support for restarting failed VMs
VMware HA continuously monitors all servers in a cluster and detects server failures. An agent placed on each server maintains a “heartbeat”.
- heartbeats are sent every 5 seconds
- the heartbeat timeout is 15 seconds (15,000 milliseconds)
VMware HA uses the heartbeat information that VMware Tools captures to determine virtual machine availability.
- VMware Tools sends a heartbeat every second
- Virtual Machine Failure Monitoring checks for a heartbeat every 20 seconds
Virtual machine will restart if heartbeat is not received in user configurable timeframe
Virtual Machine Failure Monitoring is experimental and not supported for production use. By default, Virtual Machine Failure Monitoring is disabled.
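The two heartbeat mechanisms above, modeled as trivial timeout checks in Python (the 30-second VM timeout is just a placeholder for the user-configurable value):

# Simplified model of the heartbeat rules above: host agents send every
# 5 s with a 15 s timeout; VMware Tools sends every 1 s and Virtual
# Machine Failure Monitoring checks every 20 s against a configurable
# timeout. Inputs are seconds since the last heartbeat was received.
def host_failed(seconds_since_heartbeat, timeout_s=15):
    return seconds_since_heartbeat > timeout_s

def vm_failed(seconds_since_heartbeat, vm_timeout_s=30):
    # vm_timeout_s stands in for the user-configurable window.
    return seconds_since_heartbeat > vm_timeout_s

print(host_failed(12))   # False: still within the 15 s window
print(host_failed(16))   # True: heartbeat missed, trigger VM restarts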
Two VMware HA Prerequisites
- Each host in the cluster should have access to the virtual machines’ files and should be able to power on the VM with no problem
- Host must be configured for DNS. DNS resolution of the host’s fully qualified domain name is what VMware HA relies on.
Make sure the service console is redundant, since the heartbeats rely on the network. This can be done by adding a second service console port and vSwitch (on a separate network) or by adding at least two vmnics to the vSwitch with the one service console port
Following ports need to be open for heartbeat traffic:
- Incoming: TCP/UDP 8042-8045
- Outgoing: TCP/UDP 2050-2250
VMware HA + DRS is a reactive + proactive system
Two cluster-wide policy settings
1.) Number of host failures allowed
2.) Admission control
Admission control policies define whether or not a VM may be powered on, based on the resource reservations required and the number of host failures to tolerate
The actual spare capacity available can be monitored in the “current failover capacity” field in a VMware HA cluster’s Summary tab in the VI Client
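A deliberately over-simplified sketch of reservation-based admission control: refuse a power-on if, after setting aside the N largest hosts as potential failures, the remaining capacity cannot cover all reservations. VMware’s actual mechanism uses slot sizes; this only shows the shape of the check:

# Highly simplified admission-control check: keep enough spare capacity
# to tolerate N host failures, assuming the worst case that the largest
# hosts are the ones that fail.
def can_power_on(host_capacities_mb, reserved_mb, new_vm_reservation_mb,
                 host_failures_to_tolerate=1):
    survivors = sorted(host_capacities_mb)[:len(host_capacities_mb)
                                           - host_failures_to_tolerate]
    return reserved_mb + new_vm_reservation_mb <= sum(survivors)

hosts = [32768, 32768, 32768]          # three identical 32 GB hosts
print(can_power_on(hosts, reserved_mb=40000, new_vm_reservation_mb=8000))  # True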
Restart priority is set per virtual machine and should factor in dependencies. Specify High for the VMs you want to come online first. The default priority is Medium. Choose Disabled if you don’t want a VM to start back up in the event of an ESX host failure.
Cluster nodes are designated as Primary or Secondary nodes. Primary nodes maintain a synchronized view of the entire cluster. There can be up to five primary nodes per cluster. Secondary nodes are managed by the primary nodes
Network failures can cause “split-brain” conditions. In such cases, hosts are unable to determine if the rest of the cluster has failed or has become unreachable.
Isolation response can be configured on an individual VM basis. Power off is the default response; the other options are Leave powered on and Use cluster setting.
Planning for Deployment
Check the compatibility guides before deploying hardware.
http://www.vmware.com/support/pubs/vi_pubs.html
Consider the peak load that virtual machines place on the “core” resources
- RAM
- CPU
- DISK
- NETWORK
Calculate the resources that each virtual machine will need in order to run
Plan the ESX service console partition sizing and if you will boot from local or SAN
If SAN, make sure it is supported, LUN is available only to the ESX server for this purpose and that the HBA is configured correctly
A single Virtual Center server with minimum hardware requirements is recommended for supporting up to 20 concurrent client connections, 50 managed hosts and 1000 virtual machines.
Virtual Center can support up to 200 hosts and 2000 virtual machines
Do not use SQL Server 2005 Express for deployments of this size. Use full SQL Server or Oracle
Determine how you are going to organize the inventory view of Virtual Center
- Group hosts in a datacenter that are under a single administrative control
- Group hosts in a datacenter that meet VMotion requirements
- Group hosts in a cluster to form a single pool of resources
- Group VMs into folders e.g. by business unit or function
Storage Considerations
ESX Server Feature Comparison by storage type
Fibre Channel
- Can do everything
- Boot VM, Boot ESX Server, VMotion, VMFS, RDM, VMCluster, VMwareHA/DRS, VCB
iSCSI
- Can do everything except VM Cluster
NAS
- Can’t Boot ESX Server
- Can’t format as VMFS
- Can’t use VM Cluster
- Can’t use RDM (Raw Device Mapping)
- Can’t use VCB
Local Storage
- Can’t use VMotion
- Can’t use VMCluster
- Can’t use HA or DRS
- Can’t use VCB
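The comparison above, encoded as a small lookup table for quick “can storage type X do feature Y” checks (the feature keys are my own shorthand):

# The feature comparison above as a lookup table.
FEATURES = {
    "fc":    {"boot_vm", "boot_esx", "vmotion", "vmfs", "rdm",
              "vm_cluster", "ha_drs", "vcb"},
    "iscsi": {"boot_vm", "boot_esx", "vmotion", "vmfs", "rdm",
              "ha_drs", "vcb"},
    "nas":   {"boot_vm", "vmotion", "ha_drs"},
    "local": {"boot_vm", "boot_esx", "vmfs", "rdm"},
}

def supports(storage_type, feature):
    return feature in FEATURES[storage_type]

print(supports("nas", "vmfs"))       # False: NAS can't be formatted as VMFS
print(supports("iscsi", "vmotion"))  # True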
One VMFS volume per LUN
It is best to use a LUN for one purpose at a time
NFS Considerations:
- Use “no_root_squash”
- 8 NFS mounts per ESX Server are allowed by default. This can be increased to 32
- Avoid VM swapping to NFS volumes
It is a common practice to create RAID volumes with seven disks or less. In RAID volumes consisting of more than seven disks, the overhead of parity calculation can overwhelm any performance benefit
Use preferred paths to set up your ESX Server so that different LUNs are accessed over different paths (if the SAN is Active/Active)