Overview
--------
The diskdumputils package is installed on machines on which disk dump
is to be enabled. It loads and configures the diskdump modules so that
if the machine crashes, a memory dump is written to disk.

Supported Drivers
------------------
Disk dump works only on disks connected to adapters that are
controlled by one of the following drivers:

	RHEL3			RHEL4
	---------------------------------
	aic7xxx			aic7xxx
	aic79xx			aic79xx
	dpt_i2o
				ipr
	megaraid2		megaraid
	mptfusion		mptfusion
	sym53c8xx		sym53c8xx
	sata_promise		sata_promise
	ata_piix		ata_piix
	CCISS

Supported Kernels
------------------
Disk dump is supported in the following Red Hat kernels,
where <kernel-version> is the version containing this
diskdumputils package:

	RHEL3
		kernel-<kernel-version>.i686.rpm
		kernel-smp-<kernel-version>.i686.rpm
		kernel-hugemem-<kernel-version>.i686.rpm
		kernel-<kernel-version>.athlon.rpm
		kernel-smp-<kernel-version>.athlon.rpm
		kernel-<kernel-version>.ia64.rpm
		kernel-<kernel-version>.x86-64.rpm
		kernel-smp-<kernel-version>.x86-64.rpm
		kernel-smp-<kernel-version>.ia32e.rpm
	RHEL4
		kernel-<kernel-version>.i686.rpm
		kernel-smp-<kernel-version>.i686.rpm
		kernel-hugemem-<kernel-version>.i686.rpm
		kernel-<kernel-version>.ia64.rpm
		kernel-<kernel-version>.ppc64.rpm
		kernel-<kernel-version>.x86-64.rpm
		kernel-smp-<kernel-version>.x86-64.rpm

Setup
-----
The setup procedure is as follows.  First, select a dump device.
Either a whole device or a partition may be used.  The dump device
is formatted entirely for dump use and cannot be shared with a
file system.  A swap partition can also be used as a dump
partition.  The dump device must be large enough to hold the
whole dump, which consists of all of memory plus a header.
To determine the exact size required, refer to the output of
/proc/diskdump after the diskdump module is loaded:

  # modprobe diskdump

  # cat /proc/diskdump

  # sample_rate: 8
  # block_order: 2
  # fallback_on_err: 1
  # allow_risky_dumps: 1
  # total_blocks: 262042
  #

The total_blocks value is expressed in page-size units, so in this
example, the selected device must hold at least (262042 * 4096) =
1073324032 bytes on an i386 machine, where the page size is 4096 bytes.
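
The size calculation can be scripted.  The following is a minimal sketch,
not part of diskdumputils: the function name required_dump_bytes is
hypothetical, and the 4096-byte default is only the i386 page size, so
pass the correct page size as the first argument on other architectures.

```shell
# Hypothetical helper: read /proc/diskdump-style text on stdin and print
# the minimum dump device size in bytes.
# $1: page size in bytes (assumed to be 4096, the i386 value, if omitted).
required_dump_bytes() {
    page_size=${1:-4096}
    # Parameter lines look like "# total_blocks: 262042"; keep the number.
    blocks=$(sed -n 's/^# total_blocks: *\([0-9][0-9]*\).*$/\1/p')
    # Fail if no total_blocks line was found.
    [ -n "$blocks" ] || return 1
    echo $((blocks * page_size))
}
```

It could be run as "required_dump_bytes < /proc/diskdump" once the
diskdump module is loaded.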


Select the dump partition in /etc/sysconfig/diskdump, as in the
following example:

  -------------------
  DEVICE=/dev/sde1
  -------------------

Multiple dump partitions can be specified as a colon-separated list.
Each partition must be large enough to contain the whole dump, even if
multiple partitions are specified.
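
Because every listed partition has to satisfy the size requirement on its
own, it can help to walk the list one entry at a time.  The sketch below
only splits a colon-separated DEVICE value; the function name
list_dump_devices is hypothetical, and /dev/sdf1 in the usage note is an
illustrative device name, not one taken from a real configuration.

```shell
# Hypothetical helper: print each entry of a colon-separated DEVICE value
# on its own line so that each partition can be checked individually.
list_dump_devices() {
    printf '%s\n' "$1" | tr ':' '\n'
}
```

For example, "list_dump_devices /dev/sde1:/dev/sdf1" prints the two
partitions on separate lines, ready to be compared against the required
dump size.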

Next, format the dump partition.  The administrator needs to do
this only once:

  # service diskdump initialformat


Lastly, enable the service:

  # chkconfig diskdump on
  # service diskdump start


If /proc/diskdump exists and shows the registered dump device,
diskdump has been activated:

  # cat /proc/diskdump
  /dev/sde1 514080 1012095
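
This check can also be scripted.  The sketch below assumes, based only on
the sample output above, that each registered device appears on a line
beginning with its /dev/ path; the two numeric fields are ignored.  The
function name registered_dump_devices is hypothetical.

```shell
# Hypothetical helper: read /proc/diskdump-style text on stdin and print
# the registered dump devices (lines whose first field is a /dev/ path).
registered_dump_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }'
}
```

Run as "registered_dump_devices < /proc/diskdump"; empty output would
mean that no dump device is registered.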

Testing diskdump
----------------
To test diskdump, use Alt-SysRq-C or "echo c > /proc/sysrq-trigger".
After the dump completes, a vmcore file is created during the next
reboot sequence and saved in a directory named as follows:

  /var/crash/127.0.0.1-<date>

The vmcore file's format is the same as that created by the netdump
facility, so you can use the crash command to analyze it.

Note
----
Once setup is complete, no further action is required.
However, you should always maintain enough disk space in /var/crash.
If there is not enough space, the dump file will be only partially saved;
an incomplete dump file is named vmcore-incomplete.

Diskdump currently contains one customizable script file called
diskdump-nospace.  The diskdump-nospace script is called prior
to the creation of the vmcore file if /var/crash does not have
enough space to hold the complete dumpfile.  The script may be
customized to clean up enough space for the dump in question
to proceed.
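
How diskdump-nospace is invoked, and which arguments it receives, is not
described here, so the sketch below shows only one building block that a
customized script might use: finding the least recently modified dump
directory as a candidate for archiving or removal.  The function name
oldest_dump_dir and the "oldest first" policy are assumptions, not part
of the shipped script.

```shell
# Hypothetical helper: print the least recently modified subdirectory of
# the crash directory given as $1 (for example /var/crash).
oldest_dump_dir() {
    crash_dir=$1
    # "ls -t" sorts newest first, so the last line is the oldest entry.
    # The trailing slash in the glob restricts matches to directories.
    ls -1dt "$crash_dir"/*/ 2>/dev/null | tail -n 1
}
```

The function only reports a directory and deletes nothing; what to do
with the result is left to the administrator's policy.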

Tunable parameters
------------------
The diskdump module has the following module parameters:

block_order:	Specifies the dump-time I/O block size. Default value is 2,
		which sets the I/O block size equal to "page-size << 2", or
		16 kbytes on an i386 machine. Larger values may improve
		performance, but occupy more module memory.

sample_rate:	Determines how many blocks in the dump partition are verified
		before actual memory dumping begins. Default value is 8,
		which means one of every "1<<8" (256) blocks is verified.
		Specifying zero means all blocks in the partition are verified,
		and a negative value disables verification.

dump_level:	A memory collection level that specifies which memory pages
		will be dumped.  The default value of 0 dumps all pages of
		physical RAM into the vmcore file.  To avoid excessively
		large vmcore files, page cache pages, zero-filled pages,
		free pages, and user application pages may be excluded
		from the file.  Specifying a dump_level value from 1 to 15
		skips each memory page type that is marked with an X in
		the corresponding row of the following table:

		dump	cache	zero	free	user	description
		level	page	page	page	page
		---------------------------------------------------------
		  0					default
		  1	 X
		  2		 X
		  3	 X	 X			recommended
		  4			 X
		  5	 X		 X
		  6		 X	 X
		  7	 X	 X	 X
		  8				 X
		  9	 X			 X
		 10		 X		 X
		 11	 X	 X		 X
		 12			 X	 X
		 13	 X		 X	 X
		 14		 X	 X	 X
		 15	 X	 X	 X	 X	minimum dump size
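
Read row by row, the table behaves like a bit mask: 1 skips cache pages,
2 skips zero pages, 4 skips free pages, and 8 skips user pages, and a
dump_level is the sum of the types to skip.  The sketch below decodes a
level on that reading.  The bit-mask interpretation is inferred from the
table rather than stated by it, and the function name skipped_page_types
is hypothetical.

```shell
# Hypothetical helper: print the page types skipped at a given dump_level.
# Bit values are read off the table: 1 = cache, 2 = zero, 4 = free, 8 = user.
skipped_page_types() {
    level=$1
    skipped=""
    if [ $((level & 1)) -ne 0 ]; then skipped="$skipped cache"; fi
    if [ $((level & 2)) -ne 0 ]; then skipped="$skipped zero"; fi
    if [ $((level & 4)) -ne 0 ]; then skipped="$skipped free"; fi
    if [ $((level & 8)) -ne 0 ]; then skipped="$skipped user"; fi
    # Strip the leading space; level 0 prints an empty line (nothing skipped).
    echo "${skipped# }"
}
```

For example, "skipped_page_types 3" prints "cache zero", matching the
row marked as recommended.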

This partial dump feature provides a memory collection level that selects
how much of physical memory is dumped.  All of physical memory is
usually not required to investigate a kernel issue.  Most of physical memory
typically contains user application data, page cache memory (file data), free
memory pages, and zero-filled pages.  By skipping one or more of those page
types when creating the vmcore file, the crash dump will be significantly
smaller, and the dump procedure less time-consuming.  While the actual vmcore
file size may vary with the state of the system and the dump_level specified,
the minimum amount of data required to analyze the dump will always be captured.
However, since there may be circumstances where it will be necessary to capture
all of physical memory, it is not recommended that the dump partition be smaller
than the actual memory size of the system.

Note that this feature has some risks.  There are memory management lists
which are scanned for a page's memory attribute, so if the list has been
corrupted, the scanning process may fail.  For example, when specifying a 
dump_level from 4-7 or from 12-15, the kernel's free page linked lists are
scanned; if the list is corrupt, diskdump may hang.  Furthermore, it is
possible that a page type that has been skipped may be necessary to fully
investigate the cause of some issues.  Therefore, a memory collection level
should be selected to suit each situation.  The recommended level is 3,
because it is easy to determine whether a page is zero-filled or is a
page cache page, and because no page lists need to be traversed.

Example:

The following option line sets the I/O block size to 32 kbytes and verifies
every block in the partition.  In addition, page cache pages and zero-filled
pages are skipped by the partial dump feature.

	options diskdump 'block_order=3' 'sample_rate=0' 'dump_level=3'
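
The numbers in this example follow from the parameter descriptions:
block_order=3 gives "page-size << 3", and sample_rate=0 gives an interval
of "1<<0", that is, every block.  The sketch below redoes that arithmetic.
Both function names are hypothetical, and the page size must be supplied
by the caller (4096 on i386).

```shell
# Hypothetical helper: I/O block size in bytes for a page size ($1) and a
# block_order value ($2), computed as page_size << block_order.
io_block_bytes() {
    echo $(($1 << $2))
}

# Hypothetical helper: verification interval for a sample_rate value ($1).
# One of every (1 << sample_rate) blocks is verified, so 0 yields 1
# (every block).  A negative value disables verification; 0 is printed
# in that case as a marker for "no verification".
verify_interval() {
    if [ "$1" -lt 0 ]; then
        echo 0
    else
        echo $((1 << $1))
    fi
}
```

With a 4096-byte page, "io_block_bytes 4096 3" prints 32768 (32 kbytes),
and "verify_interval 0" prints 1.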


