Discovering FreeBSD in my own way

FreeBSD specific actions

GEOM Boot Problems

Post by lik » Mon Jun 15, 2009 7:49 am

How to fix boot problems caused by errors in loader.conf when using GEOM/GMIRROR

Recently, adding an entry to loader.conf caused my machine to fail at boot stage 3 (/boot/loader). The problem was compounded by three things: my boot disks were on a GEOM mirror, the FixIt CD does not load the GEOM mirror module, and I could not load it with kldload once FixIt was running.


Background

I have four SCSI disks on a dual bus. Disks 1 and 3 form a gmirror (/dev/mirror/FreeBSD) and hold my whole FreeBSD install. The other two hold data in a ZFS stripe (not relevant here). After I made an addition to /boot/loader.conf and rebooted, the bootstrap hung at /boot/defaults/loader.conf, so I presumed the entry I had just made was causing the problem.


Normally, booting from a FixIt CD, mounting /boot and editing the file would have resolved this sort of problem. However, the disc does not load the geom_mirror module into the kernel, so there is no /dev/mirror/FreeBSDs1x to mount. I was also unable to load the module manually: either kldload is unavailable on the CD or it returned an error (I forget which now).


To get around this problem I performed the following steps:

1) Break out to the loader prompt during boot from the CD (select option 6).

2) Unload all the currently loaded modules by running unload. I found this necessary because just telling the loader to load geom_mirror still didn't work; it may have conflicted with another module. All other necessary modules are loaded at a later stage anyway.

3) Tell the loader to include geom_mirror by running enable-module geom_mirror.

4) Carry on with the boot, but have it stop and ask what to mount when it is ready to mount the root filesystem, by running boot -a -v.


The boot then continues with verbose output and stops at a mountroot> prompt. Typing ? there shows a list of devices that can be mounted. Specify your root partition and its filesystem. I continued as follows:


5) Mount the gmirror by typing ufs:mirror/FreeBSDs1a

6) Because this was a bit of trial and error, I had to check my filesystems. If you are dropped to a root shell, fsck the devices: fsck /dev/mirror/FreeBSDs1a, fsck /dev/mirror/FreeBSDs1b, etc.

7) Your computer should continue to boot as it did before you edited loader.conf. Let it finish, log in, and remove your erroneous entries.

8) remove your FixIt CD and reboot. Your computer should now boot to its GEOM Mirror as before.
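
For step 7, commenting the suspect line out rather than deleting it keeps a record of what broke the boot. Here is a minimal sh sketch of that idea, assuming the bad entry is a single NAME="value" line. The function name disable_loader_entry and the variable name example_load are hypothetical and are not part of my original fix:

```shell
#!/bin/sh
# disable_loader_entry FILE NAME (hypothetical helper)
# Comments out every line in FILE that starts with "NAME=" and keeps a
# backup copy of the original as FILE.bak.
disable_loader_entry() {
    file=$1
    name=$2
    cp "$file" "$file.bak" || return 1
    # Prefix matching lines with "#"; all other lines pass through unchanged.
    awk -v name="$name" '
        index($0, name "=") == 1 { print "#" $0; next }
        { print }
    ' "$file.bak" > "$file"
}
```

It could be called as disable_loader_entry /boot/loader.conf example_load, which would leave the untouched original in /boot/loader.conf.bak.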
lik
Founder
Posts: 497
Joined: Wed Dec 15, 2010 3:21 am

Sample /etc/pf.conf

Post by lik » Wed Aug 12, 2009 6:32 am

# cat /etc/pf.conf
# $FreeBSD: src/etc/pf.conf,v 1.2.2.1 2006/04/04 20:31:20 mlaier Exp $
# $OpenBSD: pf.conf,v 1.21 2003/09/02 20:38:44 david Exp $
#
# See pf.conf(5) and /usr/share/examples/pf for syntax and examples.
# Required order: options, normalization, queueing, translation, filtering.
# Macros and tables may be defined and used anywhere.
# Note that translation rules are first match while filter rules are last match.

# Macros: define common values, so they can be referenced and changed easily.
#ext_if="ext0" # replace with actual external interface name i.e., dc0
#int_if="int0" # replace with actual internal interface name i.e., dc1
#internal_net="10.1.1.1/8"
#external_addr="192.168.1.1"

# Tables: similar to macros, but more flexible for many addresses.
#table <foo> { 10.0.0.0/8, !10.1.0.0/16, 192.168.0.0/24, 192.168.1.18 }

# Options: tune the behavior of pf, default values are given.
#set timeout { interval 10, frag 30 }
#set timeout { tcp.first 120, tcp.opening 30, tcp.established 86400 }
#set timeout { tcp.closing 900, tcp.finwait 45, tcp.closed 90 }
#set timeout { udp.first 60, udp.single 30, udp.multiple 60 }
#set timeout { icmp.first 20, icmp.error 10 }
#set timeout { other.first 60, other.single 30, other.multiple 60 }
#set timeout { adaptive.start 0, adaptive.end 0 }
#set limit { states 10000, frags 5000 }
#set loginterface none
#set optimization normal
#set block-policy drop
#set require-order yes
#set fingerprints "/etc/pf.os"

# Normalization: reassemble fragments and resolve or reduce traffic ambiguities.
#scrub in all

# Queueing: rule-based bandwidth control.
#altq on $ext_if bandwidth 2Mb cbq queue { dflt, developers, marketing }
#queue dflt bandwidth 5% cbq(default)
#queue developers bandwidth 80%
#queue marketing bandwidth 15%

# Translation: specify how addresses are to be mapped or redirected.
# nat: packets going out through $ext_if with source address $internal_net will
# get translated as coming from the address of $ext_if, a state is created for
# such packets, and incoming packets will be redirected to the internal address.
#nat on $ext_if from $internal_net to any -> ($ext_if)

# rdr: packets coming in on $ext_if with destination $external_addr:1234 will
# be redirected to 10.1.1.1:5678. A state is created for such packets, and
# outgoing packets will be translated as coming from the external address.
#rdr on $ext_if proto tcp from any to $external_addr/32 port 1234 -> 10.1.1.1 port 5678

# rdr outgoing FTP requests to the ftp-proxy
#rdr on $int_if proto tcp from any to any port ftp -> 127.0.0.1 port 8021

# spamd-setup puts addresses to be redirected into table <spamd>.
#table <spamd> persist
#no rdr on { lo0, lo1 } from any to any
#rdr inet proto tcp from <spamd> to any port smtp -> 127.0.0.1 port 8025

# Filtering: the implicit first two rules are
#pass in all
#pass out all

# block all incoming packets but allow ssh, pass all outgoing tcp and udp
# connections and keep state, logging blocked packets.
#block in log all
#pass in on $ext_if proto tcp from any to $ext_if port 22 keep state
#pass out on $ext_if proto { tcp, udp } all keep state

# pass incoming packets destined to the addresses given in table <foo>.
#pass in on $ext_if proto { tcp, udp } from any to <foo> port 80 keep state

# pass incoming ports for ftp-proxy
#pass in on $ext_if inet proto tcp from any to $ext_if port > 49151 keep state

# Alternate rule to pass incoming ports for ftp-proxy
# NOTE: Please see pf.conf(5) BUGS section before using user/group rules.
#pass in on $ext_if inet proto tcp from any to $ext_if user proxy keep state

# assign packets to a queue.
#pass out on $ext_if from 192.168.0.0/24 to any keep state queue developers
#pass out on $ext_if from 192.168.1.0/24 to any keep state queue marketing

How can I make the most of the data I see when my kernel panics?

Post by lik » Sun Aug 30, 2009 5:04 am

Here is a typical kernel panic:

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x40
fault code = supervisor read, page not present
instruction pointer = 0x8:0xf014a7e5
stack pointer = 0x10:0xf4ed6f24
frame pointer = 0x10:0xf4ed6f28
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 80 (mount)
interrupt mask =
trap number = 12
panic: page fault


When you see a message like this, it is not enough to just reproduce it and send it in. The instruction pointer value is important; unfortunately, it is also configuration dependent. In other words, the value varies depending on the exact kernel image that you are using. If you are using a GENERIC kernel image from one of the snapshots, then it is possible for somebody else to track down the offending function, but if you are running a custom kernel then only you can tell us where the fault occurred.

What you should do is this:
1. Write down the instruction pointer value. Note that the 0x8: part at the beginning is not significant in this case: it is the 0xf0xxxxxx part that we want.
2. When the system reboots, do the following:
nm -n kernel.that.caused.the.panic | grep f0xxxxxx

where f0xxxxxx is the instruction pointer value. The odds are you will not get an exact match since the symbols in the kernel symbol table are for the entry points of functions and the instruction pointer address will be somewhere inside a function, not at the start. If you do not get an exact match, omit the last digit from the instruction pointer value and try again, i.e.:
nm -n kernel.that.caused.the.panic | grep f0xxxxx

If that does not yield any results, chop off another digit. Repeat until you get some sort of output. The result will be a possible list of functions which caused the panic. This is a less than exact mechanism for tracking down the point of failure, but it is better than nothing.
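
The digit-chopping search can also be done in one pass. Because nm -n sorts symbols by address, the function containing the instruction pointer is the last code symbol whose address is not greater than it. The following is a rough sh/awk sketch of that idea, not part of the FAQ; the helper name nearest_symbol is hypothetical, and it assumes the usual three-column nm output (address, type, name):

```shell
#!/bin/sh
# nearest_symbol ADDR (hypothetical helper)
# Reads `nm -n` output (address, type, name; sorted by address) on stdin and
# prints the last text symbol whose address is <= ADDR.
nearest_symbol() {
    # Drop an optional 0x prefix and lowercase the hex digits.
    addr=$(printf '%s' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f')
    awk -v addr="$addr" '
        # Zero-pad to 16 hex digits so string comparison matches numeric order.
        function pad(s) { while (length(s) < 16) s = "0" s; return s }
        # Only text (code) symbols; undefined symbols have no address field.
        NF == 3 && $2 ~ /^[tT]$/ {
            a = tolower($1)
            if (pad(a) <= pad(addr)) best = $3 " (starts at 0x" a ")"
        }
        END { if (best == "") exit 1; print best }
    '
}
```

Usage would look like nm -n kernel.that.caused.the.panic | nearest_symbol 0xf014a7e5. The result is still only a hint: it names the function the address falls in, not the line of source.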

However, the best way to track down the cause of a panic is by capturing a crash dump, then using kgdb(1) to generate a stack trace on the crash dump.

In any case, the method is this:

1. Make sure that the following line is included in your kernel configuration file (/usr/src/sys/arch/conf/MYKERNEL):
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols

2. Change to the /usr/src directory:
cd /usr/src

3. Compile the kernel:
make buildkernel KERNCONF=MYKERNEL

4. Wait for make(1) to finish compiling.
5. Install the new kernel:
make installkernel KERNCONF=MYKERNEL

6. Reboot.
Note: If you do not use the KERNCONF make variable a GENERIC kernel will be built and installed.


The make(1) process will have built two kernels: /usr/obj/usr/src/sys/MYKERNEL/kernel and /usr/obj/usr/src/sys/MYKERNEL/kernel.debug. The first was installed as /boot/kernel/kernel, while kernel.debug can be used as the source of debugging symbols for kgdb(1).

To make sure you capture a crash dump, you need to edit /etc/rc.conf and set dumpdev to point to your swap partition (or AUTO). This will cause the rc(8) scripts to use the dumpon(8) command to enable crash dumps. You can also run dumpon(8) manually. After a panic, the crash dump can be recovered using savecore(8); if dumpdev is set in /etc/rc.conf, the rc(8) scripts will run savecore(8) automatically and put the crash dump in /var/crash.
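
As an illustration, the relevant /etc/rc.conf lines might look like the fragment below. This is a sketch rather than text from the FAQ; dumpdir is shown only to make the default destination explicit:

```shell
# /etc/rc.conf fragment (plain sh variable assignments)
dumpdev="AUTO"        # let rc(8) pick the swap device for crash dumps
dumpdir="/var/crash"  # where savecore(8) writes the recovered dump
```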

Note: FreeBSD crash dumps are usually the same size as the physical RAM of your machine. That is, if you have 512 MB of RAM, you will get a 512 MB crash dump. Therefore you must make sure there is enough space in /var/crash to hold the dump. Alternatively, you can run savecore(8) manually and have it recover the crash dump to another directory where you have more room. It is possible to limit the size of the crash dump by using options MAXMEM=N, where N is the limit on the kernel's memory usage in KB. For example, if you have 1 GB of RAM, you can limit the kernel's memory usage to 128 MB this way, so that your crash dump will be 128 MB instead of 1 GB.
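
Since MAXMEM is given in kilobytes, the 128 MB example works out to 128 * 1024 = 131072. A small sh sketch of that arithmetic follows; the helper name maxmem_option is hypothetical, and the printed line would go in the kernel configuration file:

```shell
#!/bin/sh
# maxmem_option MB (hypothetical helper)
# Prints a kernel config line capping kernel memory usage at MB megabytes.
maxmem_option() {
    # MAXMEM takes kilobytes, so multiply megabytes by 1024.
    echo "options MAXMEM=$(( $1 * 1024 ))"
}

maxmem_option 128
```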

Once you have recovered the crash dump, you can get a stack trace with kgdb(1) as follows:
kgdb /usr/obj/usr/src/sys/MYKERNEL/kernel.debug /var/crash/vmcore.0
(kgdb) backtrace

Note that there may be several screens' worth of information; ideally you should use script(1) to capture all of them. Using the unstripped kernel image with all the debug symbols should show the exact line of kernel source code where the panic occurred. Usually you have to read the stack trace from the bottom up in order to trace the exact sequence of events that led to the crash. You can also use kgdb(1) to print out the contents of various variables or structures in order to examine the system state at the time of the crash.

Tip: Now, if you are really insane and have a second computer, you can also configure kgdb(1) to do remote debugging such that you can use kgdb(1) on one system to debug the kernel on another system, including setting breakpoints, single-stepping through the kernel code, just like you can do with a normal user-mode program.

Note: If you have DDB enabled and the kernel drops into the debugger, you can force a panic (and a crash dump) just by typing panic at the ddb prompt. It may stop in the debugger again during the panic phase. If it does, type continue and it will finish the crash dump.

Taken from: FreeBSD advanced FAQ

