Over the last couple of weeks, I have had a few mysterious crashes with Mandrake 10.1 Community Edition. When the machine became unresponsive, it would not answer pings or accept ssh connections, which typically failed with a "no route to host" response. There would be no video on the console, and even the Caps Lock key on the keyboard would not work. I would have no choice but to hit the Reset button on the machine. The logs did not yield any useful information. In fact, at the time of each crash, a few lines of junk consisting of the pattern ^@ would be written to /var/log/messages, so there was little to go on when diagnosing the problem.
Yesterday it happened again, but this time the behaviour was slightly different: when I attempted to get a remote shell on the machine, it returned a "refused ssh connection port 22" error (or something to that effect). When I looked in the logs, I saw some useful information, starting with the following:
Nov 1 08:29:24 jupiter kernel: Unable to handle kernel paging request at virtual address 07610034
The kernel dumped the state of the CPU registers and memory into the logs, but the machine was still accepting ssh connections at that point. Five minutes later, it appears to have gone off into the weeds completely, as I saw the following in the logs:
#################
Nov 1 08:34:59 jupiter kernel: ------------[ cut here ]------------
Nov 1 08:34:59 jupiter kernel: kernel BUG at include/linux/list.h:148!
Nov 1 08:34:59 jupiter kernel: invalid operand: 0000 [#2]
Nov 1 08:34:59 jupiter kernel: CPU: 0
Nov 1 08:34:59 jupiter kernel: EIP: 0060:[] Tainted: PF VLI
Nov 1 08:34:59 jupiter kernel: EFLAGS: 00010097
Nov 1 08:34:59 jupiter kernel: EIP is at shrink_cache+0x2fa/0x320
Nov 1 08:34:59 jupiter kernel: eax: ffffffff ebx: c0320494 ecx: c1482c58 edx: c12fa1d8
Nov 1 08:34:59 jupiter kernel: esi: c03204ac edi: c12fa1c0 ebp: dfe27ebc esp: dfe27e40
Nov 1 08:34:59 jupiter kernel: ds: 007b es: 007b ss: 0068
Nov 1 08:34:59 jupiter kernel: Process kswapd0 (pid: 8, threadinfo=dfe26000 task=dfe2cca0)
Nov 1 08:34:59 jupiter kernel: Stack: c03204ac c1157390 00000043 00000042 00000000 0000014d c13ef7f0 c1277760
Nov 1 08:34:59 jupiter kernel: 00000000 00000001 c128ed58 c128ec40 c128eec0 dfe27ea8 c03205a8 c0320494
Nov 1 08:34:59 jupiter kernel: 0000014d dfe27ef4 dfe27ebc c0144bb2 c0320494 00000080 dfe27f20 0000000b
Nov 1 08:34:59 jupiter kernel: Call Trace:
Nov 1 08:34:59 jupiter kernel: [] shrink_zone+0x72/0xa0
Nov 1 08:34:59 jupiter kernel: [] balance_pgdat+0x117/0x250
Nov 1 08:34:59 jupiter kernel: [] kswapd+0xc7/0xe0
Nov 1 08:34:59 jupiter kernel: [] autoremove_wake_function+0x0/0x40
Nov 1 08:34:59 jupiter kernel: [] autoremove_wake_function+0x0/0x40
Nov 1 08:34:59 jupiter kernel: [] kswapd+0x0/0xe0
Nov 1 08:34:59 jupiter kernel: [] kernel_thread_helper+0x5/0x10
Nov 1 08:34:59 jupiter kernel:
Nov 1 08:34:59 jupiter kernel: Code: 5e 5f 5d c3 0f 0b 95 00 7d fb 2d c0 e9 53 ff ff ff 0f 0b 94 00 7d fb 2d c0 e9 3b ff ff ff 0f 0b 95 00 7d fb 2d c0 e9 05 fe ff ff <0f> 0b 94 00 7d fb 2d c0 e9 ed fd ff ff 8d 55 9c 39 55 9c 0f 85
Nov 1 08:34:59 jupiter kernel: ------------[ cut here ]------------
Nov 1 08:34:59 jupiter kernel: kernel BUG at mm/vmscan.c:514!
Nov 1 08:34:59 jupiter kernel: invalid operand: 0000 [#3]
Nov 1 08:34:59 jupiter kernel: CPU: 0
Nov 1 08:34:59 jupiter kernel: EIP: 0060:[] Tainted: PF VLI
Nov 1 08:34:59 jupiter kernel: EFLAGS: 00013046
Nov 1 08:34:59 jupiter kernel: EIP is at shrink_cache+0xe1/0x320
Nov 1 08:34:59 jupiter kernel: eax: 00000000 ebx: c0320494 ecx: 00000000 edx: c12fa1d8
Nov 1 08:34:59 jupiter kernel: esi: c03204ac edi: c12fa1c0 ebp: d364dc78 esp: d364dbfc
Nov 1 08:34:59 jupiter kernel: ds: 007b es: 007b ss: 0068
Nov 1 08:34:59 jupiter kernel: Process vmware-vmx (pid: 6210, threadinfo=d364c000 task=dceeb2a0)
Nov 1 08:34:59 jupiter kernel: Stack: c03204ac c11d5ad8 00000001 00000000 00000000 00000020 d364dc14 d364dc14
Nov 1 08:34:59 jupiter kernel: 00000000 00000001 c12b75c8 c12af530 c12af490 00d8fb73 00000000 c0320494
Nov 1 08:34:59 jupiter kernel: 00000020 d364dca4 d364dc78 c0144bb2 c0320494 00000080 d364dce4 0000000c
Nov 1 08:34:59 jupiter kernel: Call Trace:
Nov 1 08:34:59 jupiter kernel: [] shrink_zone+0x72/0xa0
Nov 1 08:34:59 jupiter kernel: [] shrink_caches+0x8c/0xc0
Nov 1 08:34:59 jupiter kernel: [] try_to_free_pages+0xa8/0x180
Nov 1 08:34:59 jupiter kernel: [] __alloc_pages+0x1a4/0x330
Nov 1 08:34:59 jupiter kernel: [] do_page_cache_readahead+0xcf/0x100
Nov 1 08:34:59 jupiter kernel: [] page_cache_readahead+0xec/0x1a0
Nov 1 08:34:59 jupiter kernel: [] do_generic_mapping_read+0xa2/0x3a0
Nov 1 08:34:59 jupiter kernel: [] file_read_actor+0x0/0x110
Nov 1 08:34:59 jupiter kernel: [] __generic_file_aio_read+0x17c/0x1c0
Nov 1 08:34:59 jupiter kernel: [] file_read_actor+0x0/0x110
Nov 1 08:34:59 jupiter kernel: [] buffered_rmqueue+0xc9/0x150
Nov 1 08:34:59 jupiter kernel: [] generic_file_read+0x80/0xa0
Nov 1 08:34:59 jupiter kernel: [] pipe_readv+0x262/0x2c0
Nov 1 08:34:59 jupiter kernel: [] do_pollfd+0x98/0xa0
Nov 1 08:34:59 jupiter kernel: [] pipe_read+0x24/0x30
Nov 1 08:34:59 jupiter kernel: [] vfs_read+0x8e/0xe0
Nov 1 08:34:59 jupiter kernel: [] sys_pread64+0x45/0x70
Nov 1 08:34:59 jupiter kernel: [] sysenter_past_esp+0x52/0x79
Nov 1 08:34:59 jupiter kernel:
Nov 1 08:34:59 jupiter kernel: Code: 8b 75 84 39 73 18 0f 84 8e 00 00 00 8b 53 1c 8d 7a e8 8b 47 1c 39 f0 74 07 83 e8 18 8d 74 26 00 0f ba 72 e8 05 19 c0 85 c0 75 08 <0f> 0b 02 02 c3 4e 2e c0 8b 4a 04 39 11 0f 85 06 02 00 00 8b 02
Nov 1 08:34:59 jupiter kernel: ------------[ cut here ]------------
Nov 1 08:34:59 jupiter kernel: kernel BUG at mm/vmscan.c:514!
Nov 1 08:34:59 jupiter kernel: invalid operand: 0000 [#4]
Nov 1 08:34:59 jupiter kernel: CPU: 0
Nov 1 08:34:59 jupiter kernel: EIP: 0060:[] Tainted: PF VLI
Nov 1 08:34:59 jupiter kernel: EFLAGS: 00213046
Nov 1 08:34:59 jupiter kernel: EIP is at shrink_cache+0xe1/0x320
Nov 1 08:34:59 jupiter kernel: eax: 00000000 ebx: c0320494 ecx: 00000000 edx: c12fa1d8
Nov 1 08:34:59 jupiter kernel: esi: c03204ac edi: c12fa1c0 ebp: cbb3dda0 esp: cbb3dd24
Nov 1 08:34:59 jupiter kernel: ds: 007b es: 007b ss: 0068
Nov 1 08:34:59 jupiter kernel: Process vmware-vmx (pid: 6212, threadinfo=cbb3c000 task=cdf278c0)
Nov 1 08:34:59 jupiter kernel: Stack: c03204ac c136bfc8 00000001 00000000 00000000 00000020 cbb3dd3c cbb3dd3c
Nov 1 08:34:59 jupiter kernel: 00000000 00000001 c130a9f8 c130b7b8 c130ba38 00203046 cbb3dd78 c0320494
Nov 1 08:34:59 jupiter kernel: 00000020 cbb3ddcc cbb3dda0 c0144bb2 c0320494 00000080 cbb3de0c 0000000c
Nov 1 08:34:59 jupiter kernel: Call Trace:
Nov 1 08:34:59 jupiter kernel: [] shrink_zone+0x72/0xa0
Nov 1 08:34:59 jupiter kernel: [] shrink_caches+0x8c/0xc0
Nov 1 08:34:59 jupiter kernel: [] try_to_free_pages+0xa8/0x180
Nov 1 08:34:59 jupiter kernel: [] __alloc_pages+0x1a4/0x330
Nov 1 08:34:59 jupiter kernel: [] __get_free_pages+0x22/0x60
Nov 1 08:34:59 jupiter kernel: [] __pollwait+0x71/0xb0
Nov 1 08:34:59 jupiter kernel: [] pipe_poll+0x24/0x70
Nov 1 08:34:59 jupiter kernel: [] do_pollfd+0x98/0xa0
Nov 1 08:34:59 jupiter kernel: [] do_poll+0x57/0xd0
Nov 1 08:34:59 jupiter kernel: [] sys_poll+0x187/0x260
Nov 1 08:34:59 jupiter kernel: [] __pollwait+0x0/0xb0
Nov 1 08:34:59 jupiter kernel: [] sysenter_past_esp+0x52/0x79
#################
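A dump like the one above is easier to skim once it is boiled down to one line per BUG report: which source location tripped, and which process was running at the time. What follows is only a minimal Python sketch. The function name summarize_oopses and both regular expressions are my own, written to match just the line shapes seen in the log above; this is not a general-purpose oops parser.

```python
import re

# Matches lines such as "... kernel: kernel BUG at mm/vmscan.c:514!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# Matches lines such as "... kernel: Process kswapd0 (pid: 8, ...)"
PROC_RE = re.compile(r"Process (?P<name>\S+) \(pid: (?P<pid>\d+)")


def summarize_oopses(log_text):
    """Return one (source_file, line_no, process, pid) tuple per BUG report.

    The process fields stay None if no "Process ..." line follows the BUG line.
    """
    reports = []
    current = None
    for line in log_text.splitlines():
        bug = BUG_RE.search(line)
        if bug:
            # A new BUG line starts a new report.
            current = [bug.group("file"), int(bug.group("line")), None, None]
            reports.append(current)
            continue
        proc = PROC_RE.search(line)
        # Attach the first "Process" line seen after the BUG line.
        if proc and current is not None and current[2] is None:
            current[2] = proc.group("name")
            current[3] = int(proc.group("pid"))
    return [tuple(r) for r in reports]


if __name__ == "__main__":
    # Two pairs of lines copied from the log above.
    sample = (
        "Nov 1 08:34:59 jupiter kernel: kernel BUG at include/linux/list.h:148!\n"
        "Nov 1 08:34:59 jupiter kernel: Process kswapd0 (pid: 8, threadinfo=dfe26000 task=dfe2cca0)\n"
        "Nov 1 08:34:59 jupiter kernel: kernel BUG at mm/vmscan.c:514!\n"
        "Nov 1 08:34:59 jupiter kernel: Process vmware-vmx (pid: 6210, threadinfo=d364c000 task=dceeb2a0)\n"
    )
    for report in summarize_oopses(sample):
        print(report)
```

Run against the full log, this would list three reports: one from kswapd0 and two from vmware-vmx, all with the instruction pointer inside shrink_cache.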
I had "upgraded" to Mandrake 10.1 CE only a few weeks ago, and this coincided with the mysterious crashes. The kernel that was running was 2.6.3x, which apparently has some problems. I installed the 2.6.8.1-10 Mandrake kernel, set it as the default, and hope that fixes the problem.
During the restarts I had to do, I noticed that as fstab was being read, the kernel complained that it could not allocate swap space on /dev/hdb. I knew immediately what that was about: I had removed /dev/hdb, an 80GB drive that was not really being used for anything other than the swap partition on it, to use it in another machine, and I had forgotten to comment out the line in fstab referencing that drive. I still have a 1GB swap partition on /dev/hda, so the machine worked fine, but I wonder whether my mistake was what caused the older kernel to crash.
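A stale swap line like mine is easy to catch before the next reboot. Here is a minimal Python sketch, assuming the usual whitespace-separated fstab layout (device, mount point, type, options, ...). The function name missing_swap_devices is my own, and the partition numbers in the sample (hda1, hda5, hdb1) are hypothetical, since I only know the drives as /dev/hda and /dev/hdb.

```python
import os


def missing_swap_devices(fstab_text, exists=os.path.exists):
    """Return swap devices named in fstab text whose paths do not exist.

    Only plain paths (starting with "/") are checked; LABEL= or UUID=
    specifications are skipped because they need a separate lookup.
    The exists argument is injectable so the check can be tried out
    without touching the real /dev.
    """
    missing = []
    for raw in fstab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed entry, ignore it
        device, fs_type = fields[0], fields[2]
        if fs_type == "swap" and device.startswith("/") and not exists(device):
            missing.append(device)
    return missing


if __name__ == "__main__":
    # Hypothetical fstab contents; on a real machine read /etc/fstab instead.
    sample = (
        "/dev/hda1 / ext3 defaults 1 1\n"
        "/dev/hda5 swap swap defaults 0 0\n"
        "/dev/hdb1 swap swap defaults 0 0\n"
    )
    # Pretend only the hda drive is still installed.
    print(missing_swap_devices(sample, exists=lambda p: p.startswith("/dev/hda")))
```

On a real system the call would be missing_swap_devices(open("/etc/fstab").read()), and any device it returns is a line to comment out.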