Situation: unfs3 server running on a small nfs server (machine named
tick (IP: 192.168.142.1; see ipstacks-config.txt for details)),
exports its /home to a machine named tack (IP: 192.168.142.2):

[root@tick ~]# cat /etc/exports
/home        tack(rw,no_subtree_check,root_squash)
/usr/local/src/mldonkey/incoming    tack(rw,root_squash)
#/var/tmp/xmule-files    tack(rw,root_squash)
[root@tick ~]# 

unfsd is run on tick like this:


olaf      1033  0.0  1.0   6912  5208 ?        Ss   06:10   0:01 /usr/sbin/unfsd -i /home/olaf/unfsd.pid -u -n 2049 -m 2049 -l 192.168.142.1 -s -p


On tack, this is mounted on /tickhome:

tick:/home on /tickhome type nfs (rw,udp,port=2049,mountport=2049,nfsvers=3,mountvers=3,addr=192.168.142.1)


The immediate problem is that a "ls /tickhome/olaf" on tack hangs
indefinetely.


When straceing the unfsd process, you can see that it receives NFS
readdir calls every few seconds, and it also (apparently) answers them
correctly with an NFS response packet.

However, when I sniff the connection using tcpdump or wireshark, it
looks like the response packets aren't valid UDP packets (at least
wireshark doesn't recognize them as such) (see screenshot in
wireshark.png). The hexdump area of the packet looks like it does
contain the correct file names of the directory to be listed, though
(I'm not a NFS protocol expert).

This leads me to believe that the ls call (or rather, the NFS client
in the kernel that's invoked by it) doesn't recognize/receive
the NFS response packets and thus hangs. The root of the problem would
be the fact that unfsd doesn't send valid UDP packets in this case.


UPDATE:

The problem was that I issued a "ip link set eth0 mtu 1400" call on
tack a few days earlier, reducing the MTU of the interface from 1500
to 1400 and thus, apparently, causing the IP stack of tick (which
recognized the changed MTU on tack?) to FRAGMENT the UDP packets sent
out by unfsd. This lead to the "incomplete" IP datagrams seen in
wireshark and thus to the problem of the NFS client on tack no longer
recognizing the fragments as UDP packets.

"ip link set eth0 mtu 1500" dealt with the problem.

I'm not sure whether the fact that unfs can't handle the smaller MTU
is to be considered a bug in unfs.

See
e.g. http://lists.apple.com/archives/darwin-development/2004/May/msg00069.html
(this is OSX though) -- according to that, the userspace would have to
find out the MTU and do the fragmentation itself -- so it is a unfs
bug?

It I understand this correctly, the sender must include an IP (and
UDP?) header in every IP fragment (not: packet) it sends. An IP
fragment is a part of an IP packet that fits into a layer 2 PDU. The
IP header contains a field "fragment offset" that specifies where in
the packet the fragment is located. Shouldn't the IP stack do all
this, even for UDP?


UPDATE2:

Looking at the strace again, it seems unfsd sent UDP packets with a
length of 4100 bytes. This would mean that UDP fragmentation should
occur even at MTU=1500. So maybe the IP stack on tick, not unfsd, is
the culprit after all? (for not finding out that tack's MTU was
smaller than 1500 -- so maybe path MTU discovery didn't work, possibly
because of some firewalling issue?)


working-ls.pcap contains the dump of a successful "ls
/tickhome/olaf/utils/" (after the MTU was changed to 1500)


UPDATE3:

Problem can be reproduced with netcat:

tack:~# ip link set eth0 mtu 1400
tack:~# nc -l -u -p 6543
(hangs, doesn't output anything)

[root@tick /etc/init.d]# ls -l ~/dead.letter 
-rw------- 1 root root 7941 2006-04-22 21:01 /root/dead.letter
[root@tick /etc/init.d]# cat ~/dead.letter | nc -u tack 6543
(hangs)

(only a fragment of the data is sent -- apparently the last 549 bytes
-- see dead.letter.udptransfer-mtu1500-to-1400.pcap)

(with mtu 1500 on tack it works as expected)


on tick (sender):

socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_DFL}, 8) = 0
alarm(0)                                = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
connect(3, {sa_family=AF_INET, sin_port=htons(6543), sin_addr=inet_addr("192.168.142.2")}, 16) = 0
rt_sigaction(SIGALRM, {SIG_IGN}, {SIG_IGN}, 8) = 0
alarm(0)                                = 0
select(16, [0 3], NULL, NULL, NULL)     = 1 (in [0])
read(0, "To: root\nSubject: Debconf: Confi"..., 8192) = 7941
write(3, "To: root\nSubject: Debconf: Confi"..., 7941) = 7941     //Sending
select(16, [0 3], NULL, NULL, NULL)     = 1 (in [0])
read(0, "", 8192)                       = 0
close(0)                                = 0
select(16, [3], NULL, NULL, NULL


on tack (receiver):

socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(3, {sa_family=AF_INET, sin_port=htons(6543), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
rt_sigaction(SIGALRM, {SIG_IGN, [ALRM], SA_RESTORER|SA_RESTART, 0x7f0336d2d1f0}, {SIG_DFL, [], 0}, 8) = 0
alarm(0)                                = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
recvfrom(3, 

//(receives nothing (local IP stack didn't deliver the received
//fragment because it's incomplete))


UPDATE4: Apparently, the problem occurs only when sending from tick to
tack. When sending from teck (Ubuntu laptop, 192.168.142.130) to cat,
UDP works even in the presence of changed MTUs. So the bridge code on
tick is the culprit? (there's a linux kernel bridge on tick that
bridges the local ethernet and WLAN devices)


For details of the IP configurations of the participating machines,
see ipstacks-config.txt.