Re: [PATCH] new UDPCP Communication Protocol

From: stefani
Date: Sunday, January 2, 2011 - 3:39 pm

From: Stefani Seibold <stefani@seibold.net>

Changelog:
31.12.2010 first proposal
01.01.2011 code cleanup and fixes suggested by Eric Dumazet
02.01.2011 kick away UDP-Lite support
           change spin_lock_irq into spin_lock_bh
           faster udpcp_release_sock
           base is now linux-next
02.01.2011 fix camel style
           fix coding style
           fix types in comments
           add per socket max. connection limit (protects against abuse)
           make udpcp adjustable through /proc/sys/net/ipv4/udpcp_

UDPCP is a communication protocol specified by the Open Base Station
Architecture Initiative Special Interest Group (OBSAI SIG). The
protocol is based on UDP and is designed to meet the needs of "Mobile
Communication Base Station" internal communications. It is widely used by
the major network infrastructure suppliers.

The UDPCP communication service supports the following features:

-Connectionless communication for serial mode data transfer
-Acknowledged and unacknowledged transfer modes
-Retransmission algorithm
-Checksum algorithm using Adler32
-Fragmentation of long messages (disassembly/reassembly) to match the MTU
 during transport
-Broadcasting and multicasting messages to multiple peers in unacknowledged
 transfer mode
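
For readers unfamiliar with the checksum named in the list above, here is a
minimal user-space sketch of Adler-32 as defined for zlib. It is not taken
from the patch. The function name adler32_sum is made up for illustration,
and which bytes of a UDPCP packet the checksum covers is not stated in this
mail, so the sketch just sums an arbitrary buffer.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER32_MOD 65521u	/* largest prime below 2^16 */

/*
 * Compute the Adler-32 checksum of buf[0..len).
 * Hypothetical helper for illustration; not the patch's code.
 */
uint32_t adler32_sum(const unsigned char *buf, size_t len)
{
	uint32_t a = 1;	/* running sum of bytes, starts at 1 */
	uint32_t b = 0;	/* running sum of the a values */
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % ADLER32_MOD;
		b = (b + a) % ADLER32_MOD;
	}
	/* b goes in the high 16 bits, a in the low 16 bits */
	return (b << 16) | a;
}
```

Taking the modulo on every byte keeps the sketch simple; a production
version would normally defer the modulo across a block of bytes for speed.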

UDPCP supports application level messages up to 64 KBytes (limited by the
16-bit packet data length field). Messages that are longer than the MTU will
be fragmented to fit the MTU.
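
As a rough illustration of the fragmentation arithmetic, the following
sketch computes how many packets a message needs. The function name
fragments_needed and the overhead parameter are assumptions: the per-packet
header size (IP + UDP + UDPCP header) is not given in this mail, so the
caller has to supply it. Treating an empty message as one packet is also an
assumption.

```c
#include <stddef.h>

/*
 * Number of packets needed to carry msg_len bytes when each packet can
 * hold at most (mtu - overhead) bytes of message data.
 * overhead is the per-packet header size; its real value for UDPCP is an
 * assumption the caller must provide.
 * Returns 0 if the MTU leaves no room for any data.
 */
size_t fragments_needed(size_t msg_len, size_t mtu, size_t overhead)
{
	size_t payload;

	if (mtu <= overhead)
		return 0;
	payload = mtu - overhead;

	/* Assumption: an empty message is still sent as one packet. */
	if (msg_len == 0)
		return 1;

	/* Round up: a partial last fragment still needs its own packet. */
	return (msg_len + payload - 1) / payload;
}
```

For example, with a 1500 byte MTU and a hypothetical 40 bytes of headers, a
65535 byte message needs 45 fragments.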

UDPCP provides a reliable transport service that will perform message
retransmissions in case transport failures occur.

The code is also a nice example of how to implement a UDP based protocol as
a kernel socket module.

Due to the nature of UDPCP, which has no sliding window support, latency has
a huge impact. The performance increase from implementing it as a kernel
module is about a factor of 10, because there are no context switches and
data packets or ACKs are handled in the interrupt service.

There are no side effects to the network subsystems, so I ask for it to be
merged into ...
From: Eric Dumazet
Date: Sunday, January 2, 2011 - 3:49 pm

Hmm, so 'connections' is increased, never decreased.

This seems a fatal flaw in this protocol, since a malicious user can
easily fill the list with garbage, and block regular communications.



--

From: Stefani Seibold
Date: Sunday, January 2, 2011 - 3:55 pm

You are right, there is no way to detect which connection is no longer
needed. I have not designed this protocol, so I cannot fix it.

But in our environment this will be used together with a firewall
and/or ipsec. In this case it is safe.


--

From: Jesper Juhl
Date: Sunday, January 2, 2011 - 4:04 pm

Hmm, the first thing that springs into my head as a possible band-aid
(which is probably wrong for many reasons I've not considered, so feel
free to shoot it down) is: couldn't we use a timer (set to some outrageously
high value by default and admin tunable) that would decrement
'connections' (discount dead connections) when there has not been any
activity for a huge period of time? Kill off connections that have been
idle for ages.

Not perfect, but that would at least let the system recover after a while 
if a malicious client did something nasty with many connections...
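
A user-space sketch of that idea, only to make the proposal concrete. All
names here (struct conn, struct conn_table, expire_idle, MAX_CONNS) are
hypothetical and do not come from the patch. A kernel version would need
locking and a real timer instead of an explicit "now" argument.

```c
#include <stddef.h>
#include <time.h>

#define MAX_CONNS 4	/* tiny table, for illustration only */

struct conn {
	int in_use;		/* non-zero if the slot holds a connection */
	time_t last_activity;	/* time of the last packet seen */
};

struct conn_table {
	struct conn slots[MAX_CONNS];
	size_t connections;	/* number of slots currently in use */
};

/*
 * Free every slot that has been idle for longer than idle_limit seconds
 * and decrement the connection counter accordingly.
 * Returns the number of slots freed.
 */
size_t expire_idle(struct conn_table *t, time_t now, time_t idle_limit)
{
	size_t i, freed = 0;

	for (i = 0; i < MAX_CONNS; i++) {
		struct conn *c = &t->slots[i];

		if (c->in_use && now - c->last_activity > idle_limit) {
			c->in_use = 0;
			t->connections--;
			freed++;
		}
	}
	return freed;
}
```

The caller would run expire_idle() periodically, with idle_limit being the
admin tunable value mentioned above.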


-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.
From: Stefani Seibold
Date: Monday, January 3, 2011 - 2:08 am

This will not work for two reasons:

- First, there is no way to detect a dead connection. A connection can
stay open for a very long time without data transfer.

- Second, it will not protect against an attack where all communication
slots are eaten by an attacker, so that new valid connections will not be
handled.

The only thing that is possible is to add an ioctl call which allows the
user land client to cancel connections.

But this will be, in my opinion, dead code, because white lists of trusted
addresses must be maintained and this will make upgrading an
infrastructure too complicated.


--

From: Eric Dumazet
Date: Monday, January 3, 2011 - 2:27 am

Yep, and as UDP messages can easily be spoofed, this means you need more
than a list of trusted addresses. You also need to encapsulate the thing
in a secured layer.

Stefani, your implementation has very little chance of being added to the
standard kernel, because it is not correctly layered, or documented.

Copying hundreds (thousands ?) of lines from existing code only shows
there is a design error in your proposal. It means every time we have to
make a change in this code, we'll have to do it twice.

SUNRPC uses UDP/TCP sockets, and uses callbacks to existing UDP/TCP code,
maybe you should take a look at it to implement a UDPCP stack in kernel.

For instance, a pure socket API does not seem to be the correct choice for
UDPCP, since a transmit should give a report to the user, of the frame being
delivered/acknowledged or not to/by the remote side ?

With send(), this means you have only one message in transit, no
asynchronous handling.

At least you forgot to document the API, and restrictions.



--

From: Stefani Seibold
Date: Monday, January 3, 2011 - 2:54 am

I copied about 400 of 3000 lines, which were heavily modified to meet my
needs. And I use only documented features of the Linux IP stack. So it is
normal to have duplicate code for the basics.

How can you do routing, how can you determine the MTU of the route?
These are basics. Looking into other code to see how these things are
handled is in my opinion the right way, since there are no functions
provided to do this.

Otherwise you can say the same about all the filesystem or PCI
drivers, which also do a lot in the same way. But since this is the

I have looked around the whole LINUX source code, also in the SUNRPC

This will be done through the error queue. The user client will receive

No, the messages will be queued. You can have more than a messages in

The API documentation already exists, I can provide it under
Documentation/udpcp.txt if you like.

Here is the API documentation:

Socket interface programming manual

The socket interface is a derivative of the UDP sockets. All setsockopt(),
getsockopt() and ioctl() kernel system calls which are valid for UDP
sockets should work on UDPCP sockets. There are some extensions to the
sockopt and ioctl interface for the UDPCP sockets.

Include the C header file <net/udpcp.h> to use the UDPCP socket options
and ioctl calls.

A UDPCP socket can be opened with socket(PF_INET, SOCK_DGRAM, PF_UDPCP).
All operations which are valid for UDP sockets can also be performed with
UDPCP sockets.
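
A minimal sketch of opening and binding such a socket from user space.
Since <net/udpcp.h> and PF_UDPCP exist only with this patch applied, the
protocol number is passed in as a parameter here. The helper name
open_bound_dgram is made up for illustration. On a patched system the
caller would pass PF_UDPCP; on a stock system IPPROTO_UDP gives a plain UDP
socket with the same calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a PF_INET datagram socket for the given protocol and bind it to
 * the given local port on all interfaces (port 0 lets the kernel pick).
 * Returns the file descriptor, or -1 with errno set.
 * Hypothetical helper; protocol would be PF_UDPCP with the patch applied.
 */
int open_bound_dgram(int protocol, uint16_t port)
{
	struct sockaddr_in addr;
	int fd = socket(PF_INET, SOCK_DGRAM, protocol);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(fd);	/* do not leak the descriptor on failure */
		return -1;
	}
	return fd;
}
```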

sockopt

setsockopt() and getsockopt() are declared as follows:

int getsockopt(int sockfd, int level, int optname, void *optval,
socklen_t *optlen);

int setsockopt(int sockfd, int level, int optname, const void *optval,
socklen_t optlen);
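
For options whose value is a plain int, the two calls can be wrapped as in
the sketch below. The helper names set_int_opt and get_int_opt are made up,
and whether the UDPCP options really take an int is an assumption, since
the option value type is not spelled out here. With the patch applied,
level would be SOL_UDPCP; the wrappers work the same way for standard
levels such as SOL_SOCKET.

```c
#include <sys/socket.h>

/*
 * Set an int-valued socket option.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_int_opt(int fd, int level, int optname, int value)
{
	return setsockopt(fd, level, optname, &value, sizeof(value));
}

/*
 * Read an int-valued socket option into *value.
 * Returns 0 on success, -1 with errno set on failure.
 */
int get_int_opt(int fd, int level, int optname, int *value)
{
	socklen_t len = sizeof(*value);

	return getsockopt(fd, level, optname, value, &len);
}
```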

The level parameter for the UDPCP socket is SOL_UDPCP, where the
following options are defined:

UDPCP_OPT_TRANSFER_MODE - set default transfer mode. The optval is one
of the following:

UDPCP_NOACK - no ACK for the transmitted message is required

UDPCP_ACK - an ACK for each transmitted message fragment is ...
From: Eric Dumazet
Date: Monday, January 3, 2011 - 3:39 am

Hmm, how can user land perform this task then ?

Is there an open source implementation of UDPCP ?

What are its problems ? You say it's dog slow, I really wonder why.
The UDP stack is pretty scalable these days, yet some improvements are
possible.

Why not add generic helpers if you believe you miss some

These drivers are here because of high performance on top of high
performance specs.

While UDPCP is only a layer above UDP. If the problem comes from UDP
being too slow, it'll be slow too.



--

From: Stefani Seibold
Date: Monday, January 3, 2011 - 7:08 am

Userspace is much more complicated and has more overhead than kernel space.


UDP is fast... but UDPCP depends extremely on latency due to the missing of

Maybe I don't have the knowledge, maybe I don't have the time. Getting
new API functions into Linux is much more complicated than getting a new
driver into Linux. I know what I am talking about, it took me one year to the

Because of latency. Handling UDPCP in the data_read() bh function
is much faster:
- No context switch
- Assembling a multi-fragment message is very efficient using skb buffer
chaining.
- Immediately handling an ACK or data message saves a lot of latency

Implementing it in user space is too slow, due to the context switches. The
sunrpc approach is not faster either, because it uses kernel threads, which
are not better than user space (okay, a little bit, because the MMU is not
switched).

The implementation is clean. I fixed all the issues I was asked to fix.
The protocol now has absolutely no side effects. So I ask again for merge
into linux-next.

- Stefani


--
