Supermicro with Tylersburg chipset, stupid IPMI behavior.


============ UPDATE==============
I just want to add that this behaviour has been fixed at the IPMI firmware level.
A check-box has been added under Configuration -> LAN Select, where you can choose whether to use the dedicated IPMI LAN or the on-board LAN. The menu is still a little crippled, however, because you cannot choose which on-board port to use.
============ UPDATE==============

Recently I got my hands on a bunch of Supermicro servers with the Tylersburg chipset (the -F motherboard models with embedded IPMI) and 55xx series processors.

On the Supermicro MBs designed for 54xx and older CPUs, I got used to IPMI with a dedicated LAN port that worked as expected: the IPMI goes out via the dedicated LAN port no matter what.

I wanted to get the servers ready for delivery to the data center. All I wanted to do was set proper IPs on the IPMI and make sure I could see it. Everything else can be done remotely.

The servers arrive with a sticker showing the IP currently set for the IPMI. What I do is plug the servers into the power sockets without powering them on, connect to the IPMI, change the IP address and gateway, save, and that is all. No noise in the office, no headache.
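
For reference, the same address change can also be scripted with ipmitool from a booted Linux instead of the web interface. This is only a minimal sketch: it assumes ipmitool is installed, that the BMC uses LAN channel 1, and the addresses in the usage line are placeholders. The function only prints the commands, so you can review them before running anything.

```shell
#!/bin/sh
# Print the ipmitool commands that give the BMC a static address.
# Assumptions: LAN channel 1; the arguments are whatever your network needs.
ipmi_lan_cmds() {
    # $1 = IP address, $2 = netmask, $3 = default gateway
    chan=1
    echo "ipmitool lan set $chan ipsrc static"
    echo "ipmitool lan set $chan ipaddr $1"
    echo "ipmitool lan set $chan netmask $2"
    echo "ipmitool lan set $chan defgw ipaddr $3"
    echo "ipmitool lan print $chan"
}

# Usage (placeholder addresses): review the output first, then apply as root:
#   ipmi_lan_cmds 10.0.0.50 255.255.255.0 10.0.0.1
#   ipmi_lan_cmds 10.0.0.50 255.255.255.0 10.0.0.1 | sh
```

The final "lan print" shows the resulting address and the BMC MAC, but it does not tell you which physical port the BMC ended up on.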

With all servers with a chipset other than Tylersburg, no problem so far.
Imagine my astonishment when I realized that I could not see the IPMI. My first assumption was that the IP on the sticker was wrong. OK, not a big deal, just boot into the BIOS and change it from there. I did boot, but all I saw was that the IP address was set exactly as on the sticker, and I still could not ping the IPMI.

I tried to set it to something else, with the same result: I could not even get an ARP response from the IPMI card. This kind of problem really drives me mad.

I got my HW vendor on the phone to ask about the weird behaviour. After a few minutes of conversation, I started doing something that was against all logic. I unplugged the network cable and tried the other LAN ports on the MB, the ones that are supposed to be dedicated to the MB itself, and guess what: the IPMI was sitting on the first MB port instead of the dedicated IPMI port.

From this point on, in order to make the stupid IPMI move to the dedicated port, I had to do a cold reboot with a network cable attached to the IPMI port.

If you plug the server into the power socket without network cables connected, the IPMI is so "smart" that it goes to the wrong port again. It does not start on the dedicated IPMI LAN port but on the first port of the MB. This logic is completely stupid to me. Why should the stupid IPMI play smart? This is a server, after all. You cannot play with the remote management because you think you are smart. I want to be able to pin the IPMI LAN port down to something unchangeable. Everything else is incredibly stupid, although it may look cool to someone.

Why do I hate this? I find this behaviour unacceptable. Say, for example, you have lots of servers. All of them have IPMI cards. All the IPMI cards are connected to a private LAN which is visible only via VPN.
Now for some reason you want a cold reboot of some server. You call the datacenter support and request one.
All is fine so far, but imagine that during the cold reboot the support guy unplugs the IPMI cable by mistake, or the RJ45 connector is not seated properly, or something like this. After the cold reboot the "smart" IPMI will be pinned to the MB LAN port and, depending on the IP address set, it might become visible to the public, which is really, really not a good idea.

I thought that this was not normal behaviour and started to look for a way to disable it. However, I found a raised middle finger in the docs:

Notes:
1. If you wish to use the IPMI-dedicated LAN port for your network connections, be sure to connect an RJ45 cable to your dedicated LAN port before you activate the BMC (at first power-on or cold reset). Otherwise, the BMC will look for a shared LAN port to connect to if the IPMI-dedicated LAN cable is not detected upon BMC activation.
2. However, should you decide to use the IPMI-dedicated LAN port for a network connection, please perform a BMC cold reset or power cycle reset in order for the dedicated LAN to be detected.

And the worst thing: so far you cannot disable this stupidity.

Supermicro say that they are working on a resolution for this issue. I hope they fix it soon, so that I can choose from the BIOS which port I want the IPMI on, instead of someone else choosing for me.

Idle3 disabling, not working for some WD drives

Since I have been having trouble lately with the stupid "Green" technology from Western Digital, I checked my servers and found more GREEN drives in some of them. I tried to disable the idle3 timer on some of them, but unfortunately it just does not work for some.
For example on:
WD10EACS - 1T Standard edition
WD1000FYPS - 1T WD RE2-GP

On the above drives I was unable to disable idle3 state.

wdidle3.exe /D

wdidle3.exe reports something like "idle3 timer set to 6500ms", even though one of them is an RE edition. I suppose it uses some old firmware. I will try to get the latest firmware from WD.

The program ignores the option for disabling the timer. The best you can do in this case is to raise the timer to its maximum value:

wdidle3.exe /S255

In fact, of the HDDs I have played with, the only ones that actually agreed to disable their timer were the WD 2TB RE4 model: WD2002FYPS.
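
One way to see whether a timer change had any effect is to watch the drive's SMART Load_Cycle_Count (attribute 193), which normally goes up every time the heads park. A minimal sketch, assuming smartmontools is installed and the drive is visible to the OS directly (not hidden behind a RAID controller); /dev/sda in the usage line is only a placeholder.

```shell
#!/bin/sh
# Print the raw Load_Cycle_Count (SMART attribute 193) from `smartctl -A` output.
# Assumption: the standard smartctl attribute table, where the raw value
# is the 10th whitespace-separated column.
load_cycles() {
    awk '$1 == 193 { print $10 }'
}

# Usage (as root; /dev/sda is a placeholder):
#   smartctl -A /dev/sda | load_cycles
# Run it twice, some hours apart, and compare the two numbers.
```

If the number keeps climbing quickly on a mostly idle drive, the heads are still parking.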

Conclusion: DO NOT buy GREEN HDDs for servers. They will screw you.

DOS boot from GRUB



If you are running Linux and you are wondering how to boot DOS without using a boot floppy or CD image, keep reading :).

This article will also help those who want to flash their BIOS without making a boot floppy or CD or DVD or whatever.

You can download either FreeDOS or MSDOS or whatever DOS version you want.
All you need is a plain .img image. I will show you how to do this using FreeDOS, but it should be pretty similar for other DOS versions.

You can download a FreeDOS image from here:
http://www.fdos.org/bootdisks/autogen/FDOEM.144.gz
Or MSDOS image from here:
http://www.allbootdisks.com/

1. Download the FreeDOS image (unzip it if it is compressed) and the files you would otherwise put on a floppy.

2. Install syslinux (dos floppy bootloader):

On Red Hat based distros this is pretty straightforward:

yum install syslinux

I guess that almost all distributions have this package

3. Create a directory for DOS stuff and copy the bootloader and the disk image there. memdisk may be elsewhere on your system, and the image will be wherever you copied/unzipped it to.

mkdir /boot/dos
cp /usr/lib/syslinux/memdisk /boot/dos
cp FDOEM.144 /boot/dos

NOTE: memdisk may be located somewhere else. It is part of syslinux package, just find where it is.
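
If memdisk is not under /usr/lib/syslinux, a plain search will usually find it. A minimal sketch: the helper name is made up, and the default search root /usr is only a guess at where your distribution puts the syslinux files.

```shell
#!/bin/sh
# Print the path of every file named "memdisk" under the given directories.
# With no arguments it searches /usr (an assumption, adjust as needed).
find_memdisk() {
    [ $# -gt 0 ] || set -- /usr
    find "$@" -type f -name memdisk 2>/dev/null
}

# Usage:
#   find_memdisk              # search /usr
#   find_memdisk /usr /boot   # search several trees
# On RPM based systems, asking the package also works:
#   rpm -ql syslinux | grep memdisk
```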

If you want to add some additional programs/files to the floppy (for example a BIOS flash program and ROM file), do the next step as well; otherwise skip to step 5.

4. Create a mount point and mount the floppy image.

mkdir /media/floppy
mount -t msdos -o loop /boot/dos/FDOEM.144 /media/floppy

Note: if you use a different boot image instead of FDOEM.144, use its file name in the mount command.
Once /media/floppy is mounted, you can copy various DOS programs to that "floppy" area, so that they will be present after you reboot. Unmount it (umount /media/floppy) before rebooting, so that the changes are written to the image.
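
Step 4 can be wrapped in one small helper that mounts the image, copies the files and always unmounts again. This is a sketch only: the function names and the DRY_RUN switch are made up for illustration, the file names in the usage lines are placeholders, and the real run needs root.

```shell
#!/bin/sh
# Copy files into a DOS floppy image through a loop mount.
# With DRY_RUN set to a non-empty value, commands are printed, not executed.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

add_to_floppy() {
    # $1 = image file, $2 = mount point, remaining arguments = files to copy
    img=$1; mnt=$2; shift 2
    run mkdir -p "$mnt" || return 1
    run mount -t msdos -o loop "$img" "$mnt" || return 1
    run cp "$@" "$mnt"/
    rc=$?
    run umount "$mnt"    # always unmount so the image gets written out
    return $rc
}

# Usage (flash.exe and bios.rom are placeholder names):
#   DRY_RUN=1; add_to_floppy /boot/dos/FDOEM.144 /media/floppy flash.exe bios.rom
#   DRY_RUN=;  add_to_floppy /boot/dos/FDOEM.144 /media/floppy flash.exe bios.rom
```

Keep in mind that the image is only 1.44 MB, so cp will fail if the files do not fit.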


5. Add a boot option for the floppy image.

For Red Hat distros and derivatives the GRUB config file is:

/etc/grub.conf

For SUSE:

/boot/grub/menu.lst

Add the following entry to it:

title DOS
root (hd0,0)
kernel /boot/dos/memdisk
initrd /boot/dos/FDOEM.144
boot

Note: If you have a separate /boot partition, remove "/boot" from the paths:

title DOS
root (hd0,0)
kernel /dos/memdisk
initrd /dos/FDOEM.144
boot

Regarding the line "root (hd0,0)": take a look at the other entries in your config file and use the same numbers.
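
To see which "root" values the existing entries use, you can pull them out of the config file. A minimal sketch, assuming a GRUB legacy style file where each entry has a line of the form "root (hdX,Y)"; the helper name is made up.

```shell
#!/bin/sh
# List the distinct "root (hdX,Y)" values used in a GRUB legacy config file.
grub_roots() {
    awk '$1 == "root" { print $2 }' "$1" | sort -u
}

# Usage:
#   grub_roots /boot/grub/menu.lst
#   grub_roots /etc/grub.conf
```

If it prints more than one value, use the one from the entry you normally boot.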

Dos boot images


In my work I deal with lots of machines that do not run Microsoft OSes. However, sometimes (for example, to flash a BIOS) I need to boot DOS, which really drives me mad, because I use Linux on my laptop and all the DOS images available for download are basically .exe programs that try to create a floppy disk for you automatically. For example, this site: http://www.bootdisk.com/ has lots of boot images, but unfortunately all of them are "smart" .exe programs.

As you can imagine this is completely unsuitable in my case for 2 reasons:

1. I don't run Windows.
2. I don't have a floppy drive.

This basically renders these images pretty useless.
So my basic goal was to find a plain .img file, which can be written directly to a floppy with 'dd' or, better, booted directly from GRUB.

After some Google searching I found a site which has lots of DOS boot disks as .img files.
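
A quick sanity check tells a plain image from a "smart" .exe: a raw 1.44 MB floppy image is exactly 2880 sectors of 512 bytes. A minimal sketch, assuming a standard 1.44 MB image; the helper name is made up, and disk.img and /dev/fd0 in the usage line are placeholders.

```shell
#!/bin/sh
# Succeed only if the file has the size of a raw 1.44 MB floppy image
# (2880 sectors * 512 bytes = 1474560 bytes).
is_floppy_img() {
    [ "$(wc -c < "$1" | tr -d ' ')" -eq 1474560 ]
}

# Usage (disk.img and /dev/fd0 are placeholders):
#   is_floppy_img disk.img && dd if=disk.img of=/dev/fd0 bs=512
```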

You can download such images from this site: http://www.allbootdisks.com/

If you want to know how to boot DOS from within GRUB, take a look at this article:

http://www.poweradded.net/2009/08/dos-boot-from-grub.html

Western Digital WD RE4 2TB HDD troubles: UPDATE1

Here is a little update to my article about the WD RE4 2TB HDD.

Here is the previous post:
http://www.poweradded.net/2009/08/western-digital-wd-re4-2tb-hdd-troubles.html

After 4 days of testing with "smart" parking turned off using wdidle3.exe, it looks like the problem is gone so far.
However, I am still pissed off at the manufacturer of the controller and backplane (Supermicro for both). It is not acceptable for some stupid failure of a lousy HDD to drag the other arrays down. You cannot imagine the headache you get when you lose 14T of important data. And all this because of the stupid "smart" head parking from Western Digital, which is not suitable for servers.


P.S. You know, when I was trying to find out whether anyone had anything at least similar to my problems with these drives and typed "WD RE4 problems" into Google, I got some results for "Resident Evil 4". Quite suitable, don't you think :) ? The new drives from Western Digital: 2TB, Green Power, Resident Evil 4 (RE4).

More to come, when I get the HDDs and plug them into my chassis.

Western Digital WD RE4 2TB HDD troubles

The story of how my data got screwed when I tried to use the new 2T WD RE4 drives.

Here is the story:
We have one NAS storage box, which is a 24-port chassis from SuperMicro with an expander on the backplane:
http://supermicro.com/products/chassis/4U/846/SC846E1-R900.cfm

Raid Controller: SuperMicro H4iR
http://supermicro.com/products/accessories/addon/AOC-USAS-H4iR.cfm

3 RAID5 arrays each with 6 x 1T RE3 Western Digital HDD

The machine worked just fine for more than 7 months.
In July I got 6 x 2T WD RE4 GreenPower drives and decided to fill the last six bays with them.
All was fine: the RAID array was built with no problems, then I copied approx 5T of data onto the new array, still no problem, then I left it idle for the night.

The next morning the backup system started to use the new array, which after some time of work just went down, because 2 of the 2T HDDs failed. On top of that, one array of 1T HDDs went down, because 2 of its HDDs were also reported dead by the controller.

At this point I was almost sure that all the data was lost, because all arrays are united in a single filesystem with Linux LVM. And I was right. I marked one failed disk from each array as "online", which brought the arrays back online; however, I was unable to recover from this state, because the filesystem basically refused to repair at some point.

So I lost 14T of data because of some weird 2T + backplane + controller behavior.

I returned the disks to my vendor, who started testing with the same type of chassis and the same type of controller and, guess what, got the same crap: sudden OS reboots and other peculiar things.

After they spent a lot of time testing with different chassis (with and without expander), today they called me to say that they have probably found the reason for this shitty behavior.

They disabled the "IntelliPark" feature of the HDD, and according to their tests this miraculously solved the problem. I am still not terribly convinced, but it really looks like this is the problem.

If you take a look at this thread here, you will see that this stupid GREEN thing can do more harm than good:

http://chbits.blogspot.com/2009/07/fixing-wd-gps-drives-with-wdtler-and.html

Conclusion: For servers: DO NOT BUY GREEN HDDs, they suck. All was just fine with the NON-green RE1, RE2 and RE3, and it looks like it will also be fine with the RE4 once head parking is disabled, which makes it so evil and non-green.