North American Network Operators Group

MTU problems with GRE tunnels (fwd)

  • From: Jens Schweikhardt
  • Date: Mon Jul 06 11:59:54 1998

To whom it may concern,

here is an email I received in recent months, followed by
some of my observations that might be related to the problems
discussed. I have posted my observations to comp.sys.dcom.cisco
and opened a trouble ticket with Cisco's Technical Assistance Center.

# Forwarded message:
# > From merit.edu!errors-nohumans Fri Jun  5 23:44:49 1998
# > Message-Id: <3.0.3.32.19980605095358.006ebd4c@mailhost.ip-plus.net>
# > X-Sender: bridge@mailhost.ip-plus.net
# > X-Mailer: QUALCOMM Windows Eudora Pro Version 3.0.3 (32)
# > Date: Fri, 05 Jun 1998 09:53:58 +0100
# > To: nanog@merit.edu
# > From: philip bridge <bridge@ip-plus.net>
# > Subject: MTU problems with GRE tunnels
# > Mime-Version: 1.0
# > Content-Type: text/plain; charset="us-ascii"
# > Sender: owner-nanog@merit.edu
# > Content-Length: 1881
# > 
# > I'm experiencing problems with fragmentation due to Cisco GRE tunnel
# > overhead: the way I understand it, the MTU of a GRE tunnel will always be
# > less than the MTU of the underlying IP cloud (in our case 1500 bytes) due
# > to the IP encapsulation overhead. So 1500 byte packets attempting to
# > traverse the tunnel will be fragmented, or dropped if the DF bit is set, in
# > which case an ICMP message is sent back to the originating host.
# > 
# > We're trying to use GRE tunnels extensively in some fancy added-value
# > Internet services, and it seems that there is a small but significant
# > amount of application traffic out there that has problems when traversing a
# > GRE tunnel with MTU < 1500. We've seen two problems:
# > 
# > - 1500 byte packets with DF set. This is either application traffic, or MTU
# > path discovery is broken, because the same packets get sent repeatedly
# > - 1500 byte packets get fragmented, but the destination host cannot cope
# > with the fragmentation (firewall issues?)
# > 
# > We see this on a variety of platforms (from 2500 to 7507) and a variety of
# > IOS releases (11.1(18)CC, 11.1(2), 11.2(5)). Talking to another provider
# > indicates that the same problem exists with other vendors, and is having
# > the same severe impact.
# > 
# > Thinking about it, this is a problem to be expected with IP tunnels of
# > all types, but I am surprised at the extent of its influence on our
# > customers' applications (such as large emails). I do not want to overstate
# > the proportion of traffic we see with this problem - but it does seem to be
# > enough to render GRE tunnels very problematic - to say the least. But I
# > know lots of people are using GRE for this or similar applications...so
# > what am I missing here?
# > 
# > thanks in advance for help/tips
# > 
# > Phil
# > 
# > 
# > 
# > ______________________________________________________________
# > Philip Bridge	
# > ++41 31 688 8262	bridge@ip-plus.net     www.ip-plus.ch
# > PGP: DE78 06B7 ACDB CB56 CE88 6165 A73F B703
# > 
# 
# 
# -- 
# Bernhard Kroenung, Bahnhofstr 8, 36157 Ebersburg/Rhoen, Germany +49 6656 910101
# @work : bernhard@kroenung.de                              Work: +49 661 9011777
# @home : horke@Rhoen.De       @school : Bernhard.Kroenung@Informatik.FH-Fulda.De
# 
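
To put numbers on the overhead Phil describes, here is a minimal sketch
(not part of the forwarded message). It assumes an outer IPv4 header
without options (20 octets) and the RFC 1701 GRE header layout: 4 octets
base, plus 4 octets each for the checksum/offset, key and sequence number
fields when present. The name gre_payload_mtu and its option names are
made up for illustration; the 1500 is the MTU of the IP cloud mentioned
above.

```perl
use strict;
use warnings;

# Largest inner packet that fits in one outer datagram of $link_mtu octets.
sub gre_payload_mtu {
    my ($link_mtu, %opt) = @_;
    my $overhead = 20 + 4;    # outer IPv4 header (no options) + GRE base header
    $overhead += 4 if $opt{checksum} || $opt{routing};   # checksum + offset
    $overhead += 4 if $opt{key};
    $overhead += 4 if $opt{sequence};
    return $link_mtu - $overhead;
}

printf "plain GRE:    %d\n", gre_payload_mtu (1500);
printf "GRE with key: %d\n", gre_payload_mtu (1500, key => 1);
```

Under these assumptions a 1500-octet path leaves 1476 octets for the
inner packet, or 1472 with a tunnel key, so any full-size 1500-octet
packet has to be fragmented or, with DF set, dropped.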

hello, world\n

Here's something very strange I observe with GRE tunnels (the default
tunnel mode). It looks like cisco routers send IP datagrams violating RFC 791
[Internet Protocol] over GRE tunnels. In particular, the length field of
the IP header is computed incorrectly to *not* include the size of the
IP header. RFC 791 says about the length field:

<quote>

  Total Length:  16 bits

    Total Length is the length of the datagram, measured in octets,
    including internet header and data.  This field allows the length of
    a datagram to be up to 65,535 octets.  ...

</quote>
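
As a minimal sketch of the rule quoted above (separate from the script at
the end of this mail), the check below compares the Total Length field
with the number of octets in a buffer. The names total_length_ok and
make_datagram are hypothetical, and the header values are synthetic. It
assumes the buffer holds the datagram exactly as it appeared on the wire;
whether a raw socket hands the header to the application unmodified is
platform-dependent, so that is an assumption here.

```perl
use strict;
use warnings;

# True if the Total Length field (octets 2-3, network byte order)
# equals the number of octets in $datagram.
sub total_length_ok {
    my ($datagram) = @_;
    return 0 if length ($datagram) < 20;    # shorter than a minimal IP header
    my $total_length = unpack ('x2 n', $datagram);
    return $total_length == length ($datagram) ? 1 : 0;
}

# Synthetic 20-octet header followed by 8 octets of payload (made-up values).
sub make_datagram {
    my ($claimed_length) = @_;
    my $header = pack ('C C n n n C C n N N',
        0x45, 0, $claimed_length, 1, 0, 255, 47, 0,
        0xC1AEF7FE, 0xC1AEF7C1);            # 193.174.247.254 -> 193.174.247.193
    return $header . ("\0" x 8);
}

print total_length_ok (make_datagram (28)) ? "ok\n" : "mismatch\n";  # header + data
print total_length_ok (make_datagram (8))  ? "ok\n" : "mismatch\n";  # data only
```

The first call counts header plus data and passes; the second counts the
data only, which is the kind of mismatch discussed below, and fails.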

I have an application on my workstation that serves as one endpoint
of a GRE tunnel. In fact, it's such a tiny perl program that I have
appended it at the end of this mail.

Here's the tunnel config on my Cisco, which runs
IOS (tm) 4500 Software (C4500-P-M), Version 11.2(9), RELEASE SOFTWARE (fc1):

interface Tunnel2
 description GRE Test Tunnel
 ip address 10.0.0.1 255.255.255.252
 tunnel source 193.174.247.254           !another iface of this cisco
 tunnel destination 193.174.247.193      !my workstation's address
 tunnel key 42                           !optional

Let's ping the other end of the tunnel:
io#ping 10.0.0.2
 
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.0.0.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Here's what the perl tunnel endpoint outputs:
Length of received packet: 128   <<<<<<<<< Note this
version:      4
header len:   5
tos:          0
length:       108                <<<<<<<<< Note this
id:           1586
flags:        0
offset:       0
ttl:          255
protocol:     47
chksum:       16895
source:       193.174.247.254
destination:  193.174.247.193
 20 00 08 00 00 00 00 2a 45 00 00 64 01 39 00 00
 ff 01 a6 5d 0a 00 00 01 0a 00 00 02 08 00 51 68
 00 00 23 a5 00 00 00 01 9a 8b 6e b0 ab cd ab cd
 ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd
 ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd
 ab cd ab cd ab cd ab cd ab cd ab cd ab cd ab cd
 ab cd ab cd ab cd ab cd ab cd ab cd
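
For orientation, the first eight octets of the dump above are the GRE
header and the rest is the encapsulated ping. The sketch below (separate
from the script at the end of this mail; gre_unpack is a hypothetical
helper name) decodes those octets assuming the RFC 1701 layout, with the
key field present because the K bit is set and no checksum, routing or
sequence number fields.

```perl
use strict;
use warnings;

# First 12 octets of the dump: GRE header, then the start of the inner IP header.
my $octets = pack ('H*', '200008000000002a45000064');

# Decode flags/version, protocol type and (if the K bit is set) the key.
# Assumes the C, R and S bits are clear, as they are in the dump.
sub gre_unpack {
    my ($gre) = @_;
    my ($flags, $proto) = unpack ('n n', $gre);
    my $hlen = 4;
    my $key;
    if ($flags & 0x2000) {                  # K bit: a 4-octet key follows
        $key = unpack ('N', substr ($gre, $hlen, 4));
        $hlen += 4;
    }
    return ($flags, $proto, $key, $hlen);
}

my ($flags, $proto, $key, $hlen) = gre_unpack ($octets);
my $inner_length = unpack ('x2 n', substr ($octets, $hlen));

printf "flags 0x%04x, protocol type 0x%04x, key %d\n", $flags, $proto, $key;
printf "outer IP header 20 + GRE header %d + inner datagram %d = %d\n",
    $hlen, $inner_length, 20 + $hlen + $inner_length;
```

This gives key 42, protocol type 0x0800 (IP) and an inner datagram of 100
octets. 20 + 8 + 100 = 128 matches the size received on the socket, while
8 + 100 = 108 matches the value printed for the outer length field.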

Or let's try a telnet session:
io#telnet 10.0.0.2
Trying 10.0.0.2 ... 

Length of received packet: 72   <<<<<<<<< Note this
version:      4
header len:   5
tos:          0
length:       52                <<<<<<<<< Note this
id:           1591
flags:        0
offset:       0
ttl:          255
protocol:     47
chksum:       16946
source:       193.174.247.254
destination:  193.174.247.193
 20 00 08 00 00 00 00 2a 45 00 00 2c 00 00 00 00
 ff 06 a7 c9 0a 00 00 01 0a 00 00 02 52 02 00 17
 52 c8 26 04 00 00 00 00 60 02 10 c0 a8 9a 00 00
 02 04 05 98

We note that the length as reported in the IP header is
always 20 octets less than what we receive on the socket.
This leads me to the question

  Do you cisco guys read RFCs? :-)

Regards,

	Jens Schweikhardt
-- 
##   Network Operation Center,  DFN-Verein Geschäftsstelle Stuttgart   ##
## http://www.noc.dfn.de/ finger trouble@noc.dfn.de wartung@noc.dfn.de ##
##               >>>>>>  mailto:  noc@noc.dfn.de  <<<<<<               ##



Here's my perl script:

#!/usr/local/bin/perl5 -w
#
# GRE Tunnel Endpoint; prints all GRE packets received.
#
# Author: Jens Schweikhardt <schweikh@noc.dfn.de>
#
# >>> You probably need root permission to open the raw socket. <<<

use Socket qw (SOCK_RAW PF_INET);
use strict;

my $gre = 47; # Generic Routing Encapsulation
my $rbits;    # bitmask with read file descriptors for select
my $out;      # writable copy of rbits for select to clobber
my $nready;   # return value from select

unless (socket (SOCKET, &PF_INET(), &SOCK_RAW(), $gre)) {
    print STDERR "gre socket: $!\n";
    exit 1;
}
$rbits = ''; vec ($rbits, fileno SOCKET, 1) = 1;
for (;;) {
    $nready = select ($out = $rbits, undef, undef, undef);
    last unless defined $nready; # Should not happen...
    &receive_packet () if $nready; # A packet is waiting
}
close SOCKET;
exit 0;

sub receive_packet {
    my $from_msg = '';
    my $from_saddr = recv (SOCKET, $from_msg, 1500, 0);
    unless (defined $from_saddr) {
        print STDERR "recv: $!\n";
        return 0;
    }
    print "\nLength of received packet: ", length ($from_msg), "\n";
    my ($delivery_ip_version,
        $delivery_ip_ihl,
        $delivery_ip_tos,
        $delivery_ip_length,
        $delivery_ip_id,
        $delivery_ip_flags,
        $delivery_ip_offset,
        $delivery_ip_ttl,
        $delivery_ip_proto,
        $delivery_ip_chksum,
        $delivery_ip_src,
        $delivery_ip_dst,
        $delivery_ip_options,
        $delivery_ip_data
    ) = &ip_unpack ($from_msg);

    print "version:      $delivery_ip_version\n";
    print "header len:   $delivery_ip_ihl\n";
    print "tos:          $delivery_ip_tos\n";
    print "length:       $delivery_ip_length\n";
    print "id:           $delivery_ip_id\n";
    print "flags:        $delivery_ip_flags\n";
    print "offset:       $delivery_ip_offset\n";
    print "ttl:          $delivery_ip_ttl\n";
    print "protocol:     $delivery_ip_proto\n";
    print "chksum:       $delivery_ip_chksum\n";
    # Pack with 'N' (network byte order) to mirror the 'N' used in ip_unpack;
    # the native-order 'L' reverses the octets on little-endian hosts.
    printf "source:       %u.%u.%u.%u\n",
        unpack ('C4', pack ('N', $delivery_ip_src));
    printf "destination:  %u.%u.%u.%u\n",
        unpack ('C4', pack ('N', $delivery_ip_dst));
    &dump ($delivery_ip_data);
}

sub dump {
    my $len = length ($_[0]);
    if ($len > 0) {
        my @octet = split //, $_[0];
        my $i;
        for ($i = 1; $i <= $len; ++$i) {
            printf " %02x", unpack ('C', $octet[$i-1]);
            print "\n" unless $i % 16;
        }
        # Terminate a partial last row. $i is $len + 1 here, so test $len.
        print "\n" if $len % 16;
    } else {
        print " [NO DATA]\n";
    }
}

# Format of an IP packet, RFC 791.
#
sub ip_unpack {
    my $packet = shift;
    if (length ($packet) < 20) {
        print STDERR "ip packet too short: ", length ($packet), " bytes\n";
        exit 1;
    }
    my (
        $version,
        $tos,
        $length,
        $id,
        $flags,
        $ttl,
        $proto,
        $chksum,
        $src,
        $dst
    ) = unpack ('CCnnnCCnNN', $packet);
    my $ihl = $version & 017;
    $version >>= 4;
    if ($version != 4) {
        print STDERR "ip version mismatch, expected 4, got $version\n";
        exit 1;
    }
    my $offset = $flags & 017777;
    $flags >>=13;
    my $options = substr ($packet, 20, $ihl * 4 - 20);
    my $data = substr ($packet, $ihl * 4);
    return (
        $version,
        $ihl,
        $tos,
        $length,
        $id,
        $flags,
        $offset,
        $ttl,
        $proto,
        $chksum,
        $src,
        $dst,
        $options,
        $data
    );
}