Navigating the Rescue Mode for Linux

This document will take you through the process of booting your Linux server into rescue mode to identify and fix the problem(s) that may be causing it to be unresponsive.

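This guide covers how to log into rescue mode, identify your disk partitions, detect physical disk problems, detect and fix file system errors, access your data, and recover your data.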

Logging into rescue mode

If your Linux dedicated server is unresponsive and fails to come online after a reboot, you can boot the server into rescue mode from the Tagadab control panel to identify and fix the problem.

  1. Once rescue mode has been started on your dedicated server, log into the system via SSH using your server's usual IP address and the root password that was set when the system was first installed (you can find this in your Tagadab control panel). You can also access the server in graphical mode using VNC if you have a VNC client installed.

    Please be aware that the rescue mode system will have a different SSH host key to your normal server. If you are using PuTTY you will see a warning like Screen 1:

    PuTTY security warning

    Screen 1

  2. Accept the warning by clicking the 'Yes' button, then log in. If you are using SSH from a Linux or Mac shell, you may need to remove the old version of the SSH key from your known hosts file before logging in. Once you have finished with rescue mode and booted your server normally, it will return to using its usual SSH host key and you will see a similar warning again.

    You should see a window similar to Screen 2 once you are logged in:

    Linux or Mac shell

    Screen 2
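For the Linux or Mac case in step 2, the stale host key can be cleared before reconnecting. The following is a minimal sketch rather than an official procedure: the function name is made up for illustration, and 203.0.113.10 is a placeholder address, so substitute your server's usual IP address.

```shell
# Forget the stored SSH host key for a server, then log in again as root.
# 203.0.113.10 in the usage line is a placeholder; use your server's IP address.
reconnect_to_rescue() {
    server_ip="$1"
    # Remove every key recorded for this host from ~/.ssh/known_hosts.
    ssh-keygen -R "$server_ip"
    # Connect again; ssh will ask you to accept the rescue system's host key.
    ssh "root@$server_ip"
}

# Usage: reconnect_to_rescue 203.0.113.10
```

The same steps apply again after you leave rescue mode, because the server then presents its usual host key once more.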


Identifying your disk partitions

  1. Identify your disk partitions before recovering your system. Get a list of all of the disks connected to the system and their partitions by running the command 'fdisk -l' as shown in Screen 3:

    Disk list

    Screen 3

  2. The exact output from this will vary depending on the number of disks in your server, the number of partitions on each disk, and whether or not your system uses software RAID. Screen 3 shows one disk (/dev/sda) that contains four partitions (numbered 1, 2, 5 and 6). The first partition (/dev/sda1) is marked as bootable, so this would be the partition mounted under /boot.

    The second partition (/dev/sda2) is an extended partition and is only used as a container for the other two partitions. It is not mountable. The third partition (/dev/sda5) is the swap space, and the fourth partition (/dev/sda6) is the root partition, normally mounted as /. If your server has two disks the output will look something like Screen 4:

    Two disks

    Screen 4

    If your system uses software RAID, it will look something like Screen 5:

    Two disks

    Screen 5

  3. If your system uses software RAID, there are additional steps you will need to take before attempting to fix disk issues or access your data. Please refer to the separate software RAID instructions in the following sections.

    If no disks are displayed (or an incorrect number of disks is displayed) then the disk(s) may have already suffered a catastrophic failure. In such an event, you will need to ask Tagadab Support to arrange for a replacement disk or server and then restore any backups.
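The listing step in this section can be wrapped in a small helper, sketched below. The function name is hypothetical, and the check of /proc/mdstat is an addition that this guide does not mention: on Linux, that file normally lists active software RAID (md) arrays, which may help you tell whether the RAID instructions apply.

```shell
# List all disks and partitions, then show software RAID arrays if present.
list_disks() {
    # Print the partition table of every disk the rescue system can see.
    fdisk -l
    # Assumption: /proc/mdstat is typically present only when the md
    # (software RAID) driver is loaded; it lists arrays and their members.
    if [ -r /proc/mdstat ]; then
        cat /proc/mdstat
    fi
}

# Usage (as root in rescue mode): list_disks
```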


Detecting physical disk problems

  1. Your disk(s) may have physical errors that cannot be corrected and would require a disk replacement. You can use the smartctl program to test the disk to see if this is the case. First, check that the disk has its SMART capability enabled with the command 'smartctl -i /dev/diskname', swapping diskname for the correct device as shown in Screen 6.

    This command should be successful as all Tagadab disks have SMART enabled. If the command fails to return information about the disk(s), a catastrophic failure may have occurred and the disk(s) will need to be replaced.

    SMART

    Screen 6

  2. Run a test on the disk using 'smartctl -t short /dev/diskname'. Further options are available (use 'man smartctl' to see them). You will see a message that the test will take around one minute to complete as shown in Screen 7:

    smart-ctl

    Screen 7

  3. After waiting a minute, use 'smartctl -a /dev/diskname' to see the results displayed as a table with the number of disk failures that have occurred over the disk's lifetime. The example in Screen 8 does not show any major errors:

    No errors

    Screen 8

  4. Look out for a high error count next to any of the errors with the type 'Pre-fail' as these may be an indication that the disk is going to fail soon. If any of your disks have this type of error, please contact Tagadab Support.
  5. Smartctl can be used on systems with multiple disks by running the above sequence of commands for each disk (not each partition).

    RAID Instructions

    There are no separate instructions required for this section.
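The three smartctl steps above can be run in sequence for one disk, as in the sketch below. The function name and the 90-second wait are assumptions chosen for illustration; the guide itself only states that the short test takes around one minute.

```shell
# Run the SMART checks described above against one whole disk (not a partition).
check_disk() {
    disk="$1"
    # Step 1: confirm the disk responds and reports its SMART capability.
    # Stop here if smartctl cannot query the disk at all.
    smartctl -i "$disk" || return 1
    # Step 2: start the short self-test; it runs on the disk itself.
    smartctl -t short "$disk"
    # Step 3: wait for the test to finish (90 seconds is an arbitrary
    # margin over the quoted one minute), then print the full report.
    sleep 90
    smartctl -a "$disk"
}

# Usage, once per disk: check_disk /dev/sda
```

In the report, review the attributes of type 'Pre-fail' as described in step 4.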


Detecting and fixing file system errors

  1. Your server may fail to boot if there are errors with the file system. You can identify and correct these errors using the fsck tool. For example, if you have seen errors in the system logs indicating partition problems on the root disk (/dev/sda6 as shown in Screen 9), you can try to correct this by running the command 'fsck /dev/sda6'. This must be done before the disk has been mounted.

    fsck

    Screen 9

  2. In Screen 9, there are a few minor errors that fsck has fixed. For more severe errors, fsck may prompt you to confirm each fix. To avoid being prompted and simply accept the default options, run the fsck command with the -a flag. Further details are available from the fsck manual (type 'man fsck').
  3. If you fix any disk errors, exit rescue mode and attempt to boot the system normally. If the system still fails to boot, or fsck cannot fix the disk errors, you may need to recover any data that you did not back up (see the section on recovering data).

    RAID Instructions

    Perform fsck on the RAID device rather than on the member partitions to check the file system on both disks simultaneously. The RAID device will likely be either /dev/md0 or /dev/md1, whichever is the larger (the smaller RAID device will be swap space). In Screen 10, minor errors have been corrected.

    Minor errors fixed

    Screen 10
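Because fsck must run before the disk is mounted, it can help to check for that first. The sketch below is one way to do so, assuming a Linux rescue system where /proc/mounts lists mounted devices; the function name is hypothetical.

```shell
# Check a file system, but only if it is not currently mounted.
check_filesystem() {
    device="$1"
    # Refuse to run if the device appears in the list of mounted devices.
    if grep -q "^$device " /proc/mounts 2>/dev/null; then
        echo "$device is mounted; unmount it before running fsck" >&2
        return 1
    fi
    # -a repairs automatically without prompting, as described in step 2.
    fsck -a "$device"
}

# Usage: check_filesystem /dev/sda6   (single disk example)
#        check_filesystem /dev/md0    (software RAID example)
```

Drop the -a flag if you would rather confirm each repair yourself.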


Accessing your data

            1. If your disks did not show any errors, or you know your system failed to boot for reasons unrelated to the disks (e.g., an incorrectly enabled firewall or an incorrectly modified grub configuration), you will need to access your disk(s) to either correct the problem or recover the data before reimaging. To do this, the disk(s) must be mounted.
            2. From earlier steps, you should have already identified the root partition. In our one disk example shown in Screen 11, it is /dev/sda6 and in our RAID example in Screen 12 it is /dev/md0. For servers with multiple disks, you may want to access the partition on the second disk, although problems that prevent a server booting will normally be on the partition mounted at /.
            3. To access the data on the root partition, create a mount point for the partition. For our one disk system, it will be created at /mnt/sda6. We then mount the disk on this mount point, and cd into the directory to view our system as shown below in Screen 11:

              Mount point

              Screen 11

              RAID Instructions

              Create a mount point at /mnt/md0, mount the RAID device here and cd into the directory as shown in Screen 12. You can now view and edit your files using standard Linux tools (such as less, cat, vi, nano).

              RAID Mount point

              Screen 12
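The mount steps for both the one disk and RAID examples follow the same pattern, sketched below. The function name is made up for illustration; the mount point is derived from the device name to match the /mnt/sda6 and /mnt/md0 examples above.

```shell
# Create a mount point under /mnt named after the device and mount it there.
mount_for_rescue() {
    device="$1"                                # e.g. /dev/sda6 or /dev/md0
    mount_point="/mnt/$(basename "$device")"   # e.g. /mnt/sda6 or /mnt/md0
    # -p avoids an error if the mount point already exists.
    mkdir -p "$mount_point" || return 1
    mount "$device" "$mount_point"
}

# Usage: mount_for_rescue /dev/sda6 && cd /mnt/sda6
#        mount_for_rescue /dev/md0  && cd /mnt/md0
```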

Chroot

            1. Use the chroot command to change the root of the rescue system to the root on the disk. This is needed if you want to use the 'passwd' program to reset one of your system passwords.
            2. Then use 'chroot mountpoint' to change the root to the partition you have mounted. In Screen 13 we used 'chroot /mnt/sda6' or 'chroot /mnt/md0'. You may see an error such as:

              chroot: failed to run command `/bin/zsh': No such file or directory

              This indicates that the zsh shell used by the rescue system is not available to run (i.e., it is not installed) on your dedicated server. In this case, modify the command to run the bash shell:

              'chroot mountpoint bash'

            3. Finally, run any remaining commands (such as passwd), and use exit to come out of the chroot.

              chroot mount point

              Screen 13
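As a non-interactive variant of the steps above, chroot can run a single command inside the mounted system instead of opening a shell. This is a sketch under assumptions: the function name is hypothetical, and it assumes the passwd program is installed and on the default PATH of the system on the disk.

```shell
# Reset a password on the installed system from rescue mode.
reset_password() {
    mount_point="$1"     # where the root partition is mounted, e.g. /mnt/sda6
    user="${2:-root}"    # account to change; defaults to root
    # Naming the command explicitly means chroot never tries to start /bin/zsh.
    # passwd runs inside the chroot, so it edits the password files on the disk.
    chroot "$mount_point" passwd "$user"
}

# Usage: reset_password /mnt/sda6
#        reset_password /mnt/md0 someuser
```

Since only one command runs, there is no shell to exit afterwards.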

Recovering your data

If you are unable to fix your server, you will need to copy any data that is not backed up before requesting a reimage from the Tagadab control panel. If you have access to another server that runs FTP or SSH, use the command line FTP or SCP tools to upload your data to that server. Otherwise, you can connect an SCP client (such as WinSCP for Windows) to the rescue mode server, navigate to the point where you mounted the disk and download the data to your local system.

SCP client

Screen 14
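If you are uploading from the rescue system to another server over SSH, the copy can look like the sketch below. The function name, the host backup.example.com and the path /srv/rescued/ are placeholders for illustration, not real Tagadab systems.

```shell
# Copy a directory from the mounted disk to another server over SSH.
copy_to_backup_server() {
    source_dir="$1"     # e.g. /mnt/sda6/home
    destination="$2"    # e.g. user@backup.example.com:/srv/rescued/
    # -r copies directories recursively; -p preserves times and modes.
    scp -r -p "$source_dir" "$destination"
}

# Usage: copy_to_backup_server /mnt/sda6/home user@backup.example.com:/srv/rescued/
```

To download instead, run scp on your local machine with the rescue mode server as the source, using the path where you mounted the disk.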

Unmount

When you have finished making changes, unmount the disk, and end rescue mode by rebooting the server from the control panel as shown in Screen 15. If necessary, reimage the server via the Tagadab control panel.

Unmount

Screen 15
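The unmount step can be sketched as follows. The function name is hypothetical; leaving the mounted directory first is included because umount generally refuses to unmount a file system that a shell is still using.

```shell
# Leave the mount point and unmount it before rebooting from the control panel.
unmount_rescue_disk() {
    mount_point="$1"    # e.g. /mnt/sda6 or /mnt/md0
    # Move out of the mounted directory so it is no longer in use.
    cd / || return 1
    umount "$mount_point"
}

# Usage: unmount_rescue_disk /mnt/sda6
```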
