Master in Linux Ubuntu Cpanel Vps
Tuesday, February 3, 2009
Resolving a syslogd failure
First, check whether sysklogd is installed:
[~]# rpm -q sysklogd
If it is not installed, install it:
[~]# yum install sysklogd
Then restart the service:
[~]# /etc/init.d/syslog restart
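To confirm it is actually logging again, you can send a test message through logger and check the log file; a quick sanity check, assuming the default /var/log/messages destination:
[~]# logger "syslog test message"
[~]# tail -n 2 /var/log/messages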
Monday, January 12, 2009
Fantastico, PHPList and blank pages
I've lately had a few clients complaining about their PHPList installs not working properly when installed through Fantastico. I tried it out myself, and it does indeed seem to be somewhat broken: usually when you go to the administration page, it simply shows an empty page, and if you view the source of that page, it contains something along these lines:
The cause usually seems to be a default Fantastico setting that is not set properly. To fix it, you will need to edit the config/config.php file of your PHPList installation and replace the following:
define("PLUGIN_ROOTDIR","/tmp");
With the following, of course replacing username with your cPanel username and making sure that the path for your account is correct (sometimes it's home2 instead of home):
define("PLUGIN_ROOTDIR","/home/username/tmp");
After a few days of research on this and a few angry clients, it's all figured out, so why not share it with other system administrators? It will save them some trouble!
A helpful application included with cPanel
cPanel may have some very annoying bugs at times, but it also includes some very useful bits that help with general system administration. On a very busy server, terminating an account with high disk usage will make the load averages go sky high; thanks to this neat little application shipped with cPanel, you can forget about freaking out over high server load. I have personally tried multiple solutions (including nice), but the load would still go high and the server would become unusable.
With every cPanel installation there is a binary located at /usr/local/cpanel/bin/cpuwatch. What cpuwatch does is execute a command while monitoring the load; if the load goes past the set limit, it will stop the process and resume it once the load averages have been below the threshold for a few seconds. The usage is very simple:
cpuwatch <maxload> <command>
maxload : system load at which throttling should commence
command : command to run under cpuwatch
-p PID : monitor and throttle the existing process PID
Another neat feature is that it can either fork a new process or attach itself to a running one. Here is an example of deleting an account over SSH with the load average threshold set to 4.0:
/usr/local/cpanel/bin/cpuwatch 4.0 /scripts/killacct username
The load average may creep past 4, but cpuwatch will pause the process whenever it crosses that limit. If you already have a process running and do not want to restart it, you can run the following command to attach cpuwatch to it; in this case, the process ID of my process is 18274:
/usr/local/cpanel/bin/cpuwatch 4.0 -p 18274
It's a very simple but very neat utility that has saved me a few times when I had to do major file operations and did not want the server to end up with high load averages. The same binary is also used when cPanel processes its logs and when the cPanel backups run.
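As another illustration (just how I would use it, not anything cPanel-specific), the same trick works for any heavy file operation, for example archiving a large home directory; the paths here are only placeholders:
/usr/local/cpanel/bin/cpuwatch 4.0 tar -czf /backup/bigaccount.tar.gz /home/bigaccount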
Migrating LVM volumes over network (using snapshots)
We run a big share of Xen virtual servers spread over multiple machines, and if you want to get the best out of Xen, I would suggest LVM (Logical Volume Manager). It makes life a lot easier, especially for those who do not run a RAID setup (we run RAID10 on all VM nodes), since you can spread a volume over multiple hard drives. I'm not going to cover setting up LVM, as there are loads of tutorials on that; I will instead cover the best way to migrate an LVM volume.
First, we need to create a snapshot of the LVM volume, since we cannot create an image of the live version. We run the following line:
lvcreate -L20G -s -n storageLV_s /dev/vGroup/storageLV
The 20G part is the size of the snapshot LV; I would suggest looking up the size of the original LV and making it the same. You can find the size of the LV with this command:
lvdisplay /dev/vGroup/storageLV
There will be an "LV Size" field; take the value from there and put it in the command. The -n switch sets the name; I usually name snapshots the same as the LV with a trailing _s for snapshot. The last argument is simply the real LV that we want to snapshot.
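To double-check that the snapshot was created, and to keep an eye on how full it gets while you image it, something like lvs can be used (a quick sanity check rather than a required step; the Data% column shows how much of the snapshot space has been used):
lvs vGroup/storageLV_s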
Afterwards, we will use dd in a slightly different way. If you use dd as a single command, it is either reading or writing at any given moment, which makes it crawl. To bypass this, we pipe a reading dd into a writing dd, so the copy runs at roughly the full speed of the slowest hard drive. To speed it up a bit more, we use a block size of 64K:
dd if=/dev/vGroup/storageLV_s conv=noerror,sync bs=64k | dd of=/migrate/storageLV_s.dd bs=64k
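Since dd is silent while it runs, it can be handy to check progress on big volumes; with GNU dd (the version shipped with most Linux distributions), sending it USR1 makes it print its transfer statistics without interrupting the copy. Note that this signals every running dd on the box, which in this case is just our two:
kill -USR1 $(pgrep -x dd)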
I won't cover the file transfer process, as there are multiple methods. If you want to use SCP, I would suggest disabling or at least weakening the encryption, as it really slows the transfer down. Our nodes usually have httpd installed, so I simply changed the configuration to listen on a different port (for security) and changed the DocumentRoot to /migrate.
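On the receiving end, the image can then simply be pulled with wget; a rough example, with 1.2.3.4 and port 8080 standing in for the source node's IP and whatever port httpd was set to listen on:
cd /migrate
wget http://1.2.3.4:8080/storageLV_s.dd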
Once you have the file on the new server, you will need to re-create the LV on the target. Run this:
lvcreate -L20G -n storageLV vGroup
Keep the same size and the same name (this time without the trailing _s, as it won't be a snapshot), with the volume group as the last argument.
The last step is to actually restore the image using dd, again with our block-size and pipe tweak for better performance:
dd if=/migrate/storageLV_s.dd conv=noerror,sync bs=64k | dd of=/dev/vGroup/storageLV bs=64k
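Once the VM is confirmed to be running fine on the new node, the snapshot on the source server is no longer needed, and the image file can be removed to free the space; roughly:
lvremove /dev/vGroup/storageLV_s
rm -f /migrate/storageLV_s.dd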
I have migrated around 16 LVs with this method without any problems: 13 of them were 20G each, two were 40G and one was 75G. Every part is fast, although I have to admit the slowest part was the file transfer. I would suggest a Gbit crossover cable, or even better a Gbit switch if you have one; if you don't, but you are right next to the server, consider using a spare USB 2.0 HDD, as it is much faster than 100Mbps Ethernet (USB 2.0 is around 480Mbps).
AACRAID based controllers timing out / aborting / SCSI hang
We've lately started using more Adaptec RAID controllers rather than 3ware. 3ware has been nothing but trouble for us: dropping hard drives, and even RAID5 arrays running slower than a regular hard drive with no RAID. Our latest issue was a server simply hitting a kernel panic under high IO; our experience with 3ware RAID controllers and Linux is terrible.
On the other side, Adaptec has been great. We've been using them for a while now and see no problems at all; there is just one small catch. Linux usually has a SCSI subsystem timeout of less than 30 seconds, which leaves a small gap between the controller timeout (35 seconds) and the Linux timeout (30 seconds). This usually brings a server to a halt for a couple of seconds (minutes in some cases) until it recovers, and errors like these are thrown to the console:
aacraid: Host adapter abort request (0,1,3,0)
aacraid: Host adapter abort request (0,1,1,0)
aacraid: Host adapter abort request (0,1,2,0)
aacraid: Host adapter abort request (0,1,1,0)
aacraid: Host adapter abort request (0,1,2,0)
aacraid: Host adapter reset request. SCSI hang ?
The method that usually works best is to increase the Linux timeout to 45 seconds to ensure that it does not fire before the RAID controller timeout. This is done per device/array:
echo '45' > /sys/block/sda/device/timeout
echo '45' > /sys/block/sdb/device/timeout
echo '45' > /sys/block/sdc/device/timeout
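If there are several arrays, a small loop applies the same value to every sd device in one go (a quick sketch; adjust the glob if some sd* devices on the box are not behind the controller):
for d in /sys/block/sd*/device/timeout; do echo '45' > $d; done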
This should be done for every device; 45 is a good number, although you can use whatever you'd like as long as it's over 35. If you are experiencing loads going sky-high for no apparent reason, this might very well be the cause; to check, you can run the following:
dmesg | grep aacraid
If you see errors like the ones above, I suggest using that small workaround. If you are still facing these problems even after the workaround, here is the checklist that Adaptec suggests:
- Check for any updated firmware for the motherboard, controller, targets and enclosure on the respective manufacturer’s web sites.
- Check per-device queue depth in SYSFS to make sure it is reasonable.
- Engage disk drive manufacturer’s technical support department to check through compatibility or drive class issues.
- Engage enclosure manufacturer’s technical support department to check through compatibility issues.
Anyhow, just like with every Linux issue, your mileage may vary; so if you know of any other fixes or have figured out another way to solve this, feel free to post it as a comment to help others.
/tmp clean-up script modification, sessions dying with PHP
It seems there was a little flaw in the script that I wrote a while ago: any PHP sessions on the server would time out and die after one hour if you ran it as an hourly cronjob. I have made a small modification to the script.
The only change is that it now deletes sess_* files that have not been accessed for 5 days, which are therefore probably just sitting there never to be used again; the rest still gets deleted, as it is failed uploads and the like that will never be used again.
#!/bin/bash
# Change directory to /tmp
cd /tmp
# Clean up trash left by Gallery2
ls | grep '[0-9].inc*' | xargs rm -fv
# Clean up PHP temp. session files
find /tmp -atime +5 -name 'sess_*' -print | xargs rm -fv
# Clean up dead vBulletin uploads
ls | grep 'vbupload*' | xargs rm -fv
# Clean up failed php uploads
ls | grep 'php*' | xargs rm -fv
# Clean up failed ImageMagick conversions.
ls | grep 'magick*' | xargs rm -fv
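To run it hourly as mentioned above, a cron entry along these lines works; /root/tmpclean.sh is just an example location, use wherever you save the script (and make it executable):
0 * * * * /root/tmpclean.sh >/dev/null 2>&1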
Thanks!
Migrating cPanel reseller accounts with root WHM access
We have a few clients who were WHM resellers and upgraded to a VPS with root WHM; however, there is no easy built-in way to migrate a WHM reseller account to a root account under WHM, so we started playing around and put together our own script. You need root access to both the source and destination servers to do this, and it looks like the fastest way to do it at the moment.
First of all, we need to get all the accounts and create a backup of each one. This is done with a couple of bash lines that parse /etc/trueuserowners, a file containing every account on the server and the account that owns it. The following loop will go over that file, pick out the reseller's accounts and back them up one by one. The best way to make sure nothing goes wrong is to ensure that /home (or /home2, or whichever /home directory you use) has enough free space, because /scripts/pkgacct creates the backups there. We also use cpuwatch so that creating the backups does not cause excessive load on the server; it will pause the process once the load averages go higher than 10.
for i in `cat /etc/trueuserowners | grep resellerusername | awk -F ':' '{ print $1; }'`;
do /usr/local/cpanel/bin/cpuwatch 10 /scripts/pkgacct $i;
done;
This usually takes a while, so here is what you can do to save time once it's partway done:
/usr/bin/rsync --delete -avvxHt --progress -e ssh /home/cpmove-* 1.2.3.4:~
This will start moving the files from the old server to the new one. We use rsync so that we can simply run it again after the full backup is done, without having to restart the whole transfer. Once it's done and all the data has been moved, another loop finishes the job. I wrote this one quickly and I think it could be cleaner; if someone has a better way to do it (regexp?), please let me know (a possible cleaner variant is sketched after the loop).
for i in `ls | grep cpmove- | awk -F '.' '{ print $1 }' | awk -F '-' '{ print $2 }'`;
do /scripts/restorepkg $i;
done;
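One possibly cleaner variant, using bash parameter expansion instead of the double awk; this assumes the archives follow the usual cpmove-username.tar.gz naming and that you run it from the directory they were copied to:
for f in cpmove-*.tar.gz;
do u=${f#cpmove-}; /scripts/restorepkg ${u%%.*};
done;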
After finishing that, you should be all done!