I/O Streams

The standard(s)

Orientation

The ISO C standard has a concept of "stream orientation": A stream can be either wide (i.e. wchar_t) or narrow (i.e. char) oriented.

A stream is created with no orientation. The first I/O operation on the stream, or a call to fwide with a non-zero orientation, sets the orientation of the stream. Thereafter, its orientation can only be changed by freopen.
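The rules above can be observed with fwide itself, since a mode of 0 only queries the orientation. The following is a minimal sketch using only ISO C calls (tmpfile, fputs, fwide); the helper names are invented for illustration and are not part of any library.

```c
#include <stdio.h>
#include <wchar.h>

/* Collapse fwide()'s result to -1 (narrow), 0 (unoriented) or 1 (wide). */
static int orientation_of(FILE *f)
{
    int o = fwide(f, 0);            /* mode 0 queries without changing anything */
    return (o > 0) - (o < 0);
}

/* Orientation of a fresh stream after one byte-oriented write.
   Returns -2 if the stream could not be created or was already oriented. */
static int orientation_after_narrow_io(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -2;
    int before = orientation_of(f); /* expected: unoriented */
    fputs("hello", f);              /* first I/O is narrow */
    int after = orientation_of(f);
    fclose(f);
    return before == 0 ? after : -2;
}

/* Orientation of a fresh stream after fwide(f, 1) followed by an
   attempt to flip it back with fwide(f, -1). Returns -2 on setup failure. */
static int orientation_after_fwide(void)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -2;
    fwide(f, 1);                    /* explicitly request wide orientation */
    int first = orientation_of(f);
    fwide(f, -1);                   /* should have no effect once oriented */
    int second = orientation_of(f);
    fclose(f);
    return first == 1 ? second : -2;
}
```

On a conforming implementation, orientation_after_narrow_io() should report a narrow stream and orientation_after_fwide() a wide one, because fwide does not alter a stream that already has an orientation.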

Functions which open streams

The standard only defines the following functions which open streams:

  • fopen
  • tmpfile
  • popen (strictly, this one is defined by POSIX rather than ISO C)

In addition, the standard I/O streams (stdin, stdout and stderr) need consideration.

On most platforms, these can be unified into one implementation, though special support for the standard I/O streams and processes may be desirable.

POSIX defines the following additional functions:

  • fdopen
  • fmemopen
  • open_wmemstream
  • open_memstream

The first one may be unified with the implementation above, but non-POSIX platforms with a POSIX emulation layer may wish to have a separate implementation for it.

The latter three definitely need a different implementation.

Existing Implementations

  • Microsoft CRT: Ignores stream orientation. All I/O is done by converting to bytes
  • glibc: Streams are oriented. Byte operations on wide streams (and vice versa) are not guaranteed to work
  • FreeBSD: Streams are oriented

Considerations

Platform wide streams

Some platforms may wish to expose streams which are natively wide, for a number of reasons. For example, the Win32 console is natively Unicode, and byte I/O to it is translated. This is an especially interesting case, because the character set used for translation of console I/O may be different from the character set preferred for other byte I/O.

Therefore, it should be possible for a platform to have wide strings sent directly to the I/O routines.

Application behaviour

Applications are built of many components; those components may not check the orientation of the stream before doing I/O. While this is nonconforming, we should attempt to accommodate it as far as possible, probably while also providing a "strict" mode which would loudly diagnose these errors.

Resolution

  • The need for differing I/O routines means we will support a "changeable backends" system, structured like C++ vtables
  • A backend declares that it supports native wide I/O by providing "wwrite" and "wread" routines, which read/write wide characters
    • If these are not provided, PDCLib will perform translation to the stream MBCS whenever they would be invoked
  • A backend is always expected to support narrow I/O
    • I/O by narrow oriented functions will always be forwarded through the narrow I/O routines
  • As an extension to the standard, PDCLib will automatically switch the orientation of a stream when
    • I/O of a different orientation is attempted, and
    • The multibyte encoding/decoding state is in the initial shift state
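The automatic orientation switch described in the last bullet could be sketched as follows. This is only an illustration under stated assumptions: the enum, the struct and prepare_io are hypothetical names (not PDCLib's), and the strict flag models the "strict" mode floated under Application behaviour. The only library call used is mbsinit from <wchar.h>.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical orientation tags, chosen to mirror the sign convention of fwide(). */
enum orient { ORIENT_NARROW = -1, ORIENT_NONE = 0, ORIENT_WIDE = 1 };

/* Hypothetical, stripped-down stream state for illustration only. */
struct stream_state {
    enum orient orientation;
    mbstate_t   shift;   /* conversion state of the stream's multibyte encoding */
    bool        strict;  /* assumed "strict" mode: never switch, just fail */
};

/* Decide whether I/O of orientation `wanted` may proceed on `s`.
   Orients an unoriented stream, and switches an oriented one only when
   the conversion state is the initial shift state. */
static bool prepare_io(struct stream_state *s, enum orient wanted)
{
    if (s->orientation == ORIENT_NONE || s->orientation == wanted) {
        s->orientation = wanted;
        return true;
    }
    if (s->strict)
        return false;            /* mismatch is an error in strict mode */
    if (!mbsinit(&s->shift))
        return false;            /* mid-character: switching would corrupt the stream */
    s->orientation = wanted;     /* the extension: silently switch orientation */
    return true;
}
```

A zero-initialised mbstate_t describes the initial conversion state, so a freshly created struct stream_state (for example `struct stream_state s = {0};`) starts unoriented and is free to switch.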

Proposed Backend Interface

/* Opaque per-stream handle; the backend decides which member it uses. */
typedef union _PDCLIB_fd_t {
    uintptr_t u;
    intptr_t  i;
    void     *p;
} _PDCLIB_fd_t;

struct _PDCLIB_file_vtbl {
    /* Narrow I/O: always provided by a backend */
    size_t (*write) (_PDCLIB_fd_t fd, const char    *restrict buf, size_t len, fpos_t *restrict pos);
    size_t (*read)  (_PDCLIB_fd_t fd, char          *restrict buf, size_t len, fpos_t *restrict pos);

    /* Wide I/O: only provided by backends with native wide support */
    size_t (*wwrite)(_PDCLIB_fd_t fd, const wchar_t *restrict buf, size_t len, fpos_t *restrict pos);
    size_t (*wread) (_PDCLIB_fd_t fd, wchar_t       *restrict buf, size_t len, fpos_t *restrict pos);

    // Other I/O functions - seek/close/etc
};

All the functions take the following parameters:

  • buf: The buffer to read into/write the contents of
  • len: The number of characters to read/write (in terms of the respective character type)
  • pos: The stream position. The stream offset and shift state should be updated as appropriate

They should return the number of characters read or written, or EOF on failure.
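To make the fallback rule concrete (translate to the stream MBCS when a backend offers no wwrite), here is a minimal sketch. It rests on several assumptions: the demo_ types are cut-down stand-ins for the proposed interface with only the write-side members, the memory-sink backend is a toy, and the shift state is passed as a separate mbstate_t because fpos_t is opaque in portable C. Only wcrtomb, memcpy and MB_LEN_MAX come from the standard library.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Stand-ins for the proposed types (hypothetical names, write side only). */
typedef union demo_fd { uintptr_t u; intptr_t i; void *p; } demo_fd_t;

struct demo_file_vtbl {
    size_t (*write) (demo_fd_t fd, const char    *restrict buf, size_t len, fpos_t *restrict pos);
    size_t (*wwrite)(demo_fd_t fd, const wchar_t *restrict buf, size_t len, fpos_t *restrict pos);
};

/* Toy narrow-only backend: fd.p points at a fixed-size memory sink. */
struct mem_sink { char data[64]; size_t used; };

static size_t mem_write(demo_fd_t fd, const char *restrict buf, size_t len,
                        fpos_t *restrict pos)
{
    struct mem_sink *m = fd.p;
    (void)pos;                              /* position tracking omitted in this sketch */
    size_t room = sizeof m->data - m->used;
    if (len > room)
        len = room;                         /* short write when the sink is full */
    memcpy(m->data + m->used, buf, len);
    m->used += len;
    return len;
}

/* Write `len` wide characters. Uses the backend's native wwrite if present,
   otherwise converts each character with wcrtomb and forwards the bytes to
   the narrow write routine. Returns the number of wide characters written. */
static size_t stream_wwrite(const struct demo_file_vtbl *vt, demo_fd_t fd,
                            const wchar_t *buf, size_t len,
                            mbstate_t *shift, fpos_t *pos)
{
    if (vt->wwrite != NULL)
        return vt->wwrite(fd, buf, len, pos);

    size_t done = 0;
    for (; done < len; done++) {
        char mb[MB_LEN_MAX];
        size_t n = wcrtomb(mb, buf[done], shift);
        if (n == (size_t)-1)
            break;                          /* character not representable in the MBCS */
        if (vt->write(fd, mb, n, pos) != n)
            break;                          /* short write; a real version must handle partial characters */
    }
    return done;
}
```

With a vtable of `{ mem_write, NULL }`, writing L"abc" through stream_wwrite in the default "C" locale leaves the three bytes "abc" in the sink. A backend that does provide wwrite would receive the wide characters untouched.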