We recently built a RHEL6 virtualization cluster using RedHat Cluster Suite, libvirt, and KVM. To manage our RHEL boxes, we use RHN Satellite. On our RHEL5 cluster, this works very well: we can provision VMs using Satellite and kickstart. Enter some data, clickity-click, wait 15 minutes, and you have a VM. After building our fancy new cluster on RHEL6, however, we found this functionality broken. This blog entry is a loose transcription of the aggravation that I’ve dealt with attempting to hammer out this issue with RedHat’s support. There is a fix at the end, but with any luck you won’t need to worry about that, because RedHat will fix the issue.
First, some background. This cluster consists of 3 Dell PowerEdge blades. They run RHEL6 with RedHat Cluster Suite and clustered storage. This means CLVM, GFS, cman, and rgmanager. We have them connected to our XIV over Fibre Channel. We map storage for VMs by assigning a new volume from the XIV to the cluster for each VM. So a new 17GB volume is assigned to the cluster, and we then create a clustered volume group on the new volume, which is shared across the cluster. Satellite then provisions a new VM in a logical volume within the clustered volume group. This works perfectly on RHEL5.
Here’s what we ran into after building the new cluster. When we attempted to provision a new VM into a volume group as we’ve done in the past, the script would die when attempting to verify the size of the target volume group. It looked something like this:
LANG=C vgs --noheadings -o vg_free --units g vg_provtest
File descriptor 8 (socket:[252397]) leaked on vgs invocation. Parent PID 29302: /usr/bin/python
File descriptor 9 (/var/log/yum.log) leaked on vgs invocation. Parent PID 29302: /usr/bin/python
(16.00g)
<type 'exceptions.ValueError'>
invalid literal for float(): 16.00g
So, I’m no Python programmer, but I do know my way around a number of other languages. THAT is a programming error.
So I submitted a ticket to RedHat, letting them know that I thought there was an error in their tools, and that it was affecting the use of their tools on our new cluster. This was June 3rd.
I won’t go into too many specifics, but I’ll say this. The tech that I worked with could not reproduce the issue, and he had me getting him all sorts of unrelated information, all of which I provided. Twice he asked me for a screenshot of where this failed in the VM. This error occurred as the host was provisioning storage; the VM was never created! I told him this the first time he asked, and again when he repeated his request.
A few days ago, RedHat Support asked me for some more unrelated information, to which I simply did not respond. Today, I decided to have a closer look at the problem. I found that line 1465 in /usr/lib/python-2.6/site-packages/koan/app.py was throwing the error.
freespace = int(float(freespace_str))
Here’s the block of code that surrounds it:
# check free space
args = "LANG=C vgs --noheadings -o vg_free --units g %s" % location
print args
cmd = sub_process.Popen(args, stdout=sub_process.PIPE, shell=True)
freespace_str = cmd.communicate()[0]
freespace_str = freespace_str.split("\n")[0].strip()
freespace_str = freespace_str.replace("G","") # remove gigabytes
print "(%s)" % freespace_str
freespace = int(float(freespace_str))
So, it gets the output from vgs and puts it in a string, strips the non-numeric character “G” from the string, converts the string to a float, converts the float to an int, and then assigns the int to the variable ‘freespace’. The error comes in converting the string to a float. Can you see why?
I eventually figured it out… On line 1463 the non-numeric character ‘G’ is supposed to be stripped. The problem? vgs is returning “16.00g”, not “16.00G”, so the replace matches nothing and float() is handed a string that still ends in a letter. I replaced the letter ‘G’ on line 1463 with ‘g’ and now I’m able to provision VMs.
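For what it’s worth, swapping ‘G’ for ‘g’ just moves the problem to any box where vgs prints the uppercase suffix. Here is a minimal sketch of a parse that accepts either case. It is not koan’s code: the function name parse_free_gb is made up for illustration, and it is written for Python 3, whereas the koan snippet above is Python 2.

```python
def parse_free_gb(vgs_output):
    """Return whole gigabytes from the output of `vgs --noheadings -o vg_free --units g`.

    Hypothetical helper for illustration only; not part of koan.
    """
    # Take the first line and trim surrounding whitespace, as the original code does.
    first_line = vgs_output.split("\n")[0].strip()
    # Drop a trailing unit suffix in either case ("16.00g" or "16.00G").
    number = first_line.rstrip("gG")
    # float() first because the value has a decimal part, then truncate to int.
    return int(float(number))


print(parse_free_gb("  16.00g\n"))
print(parse_free_gb("  16.00G\n"))
```

Both calls go through the same code path, so the result does not depend on which suffix vgs happens to print.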
I’ll be sending this description to RedHat support as soon as their ticket system starts working properly again.
Don’t read too much into this. I very much like RedHat, but their support… well…