I/O Streams
The standard(s)
Orientation
The ISO C standard has a concept of "stream orientation": a stream can be either wide (i.e. wchar_t) or narrow (i.e. char) oriented.
A stream is created with no orientation. The first I/O operation on the stream, or a call to fwide with a non-zero orientation, sets the orientation of the stream. Thereafter, its orientation can only be changed by freopen.
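The rules above can be observed through fwide itself, since a mode of 0 only queries the stream. The following is a minimal sketch; the helper names query_orientation and request_orientation are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Reduce fwide()'s result to -1 (narrow), 0 (unoriented) or 1 (wide). */
static int orientation_sign(int r)
{
    return (r > 0) - (r < 0);
}

/* Query the current orientation; a mode of 0 never changes it. */
int query_orientation(FILE *fp)
{
    assert(fp != NULL);
    return orientation_sign(fwide(fp, 0));
}

/* Request an orientation (mode > 0 wide, mode < 0 narrow).
 * The result reports the orientation the stream actually has afterwards,
 * which differs from the request if the stream was already oriented. */
int request_orientation(FILE *fp, int mode)
{
    assert(fp != NULL);
    return orientation_sign(fwide(fp, mode));
}
```

On a freshly opened stream, query_orientation should report 0; after request_orientation(fp, 1) succeeds, a later request_orientation(fp, -1) is expected to keep reporting a wide stream.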
Functions which open streams
The standard only defines the following functions which open streams:
fopen
tmpfile
popen (strictly a POSIX function rather than part of ISO C)
In addition, the standard I/O streams need consideration.
On most platforms, these can be unified into one implementation, though special support for the standard I/O streams and processes may be desirable.
POSIX defines the following additional functions:
fdopen
fmemopen
open_wmemstream
open_memstream
The first one may be unified with the common implementation, but non-POSIX platforms with a POSIX emulation layer may wish to have a separate implementation for it.
The latter three most definitely need a different implementation.
Existing Implementations
- Microsoft CRT: Ignores stream orientation. All I/O is done by converting to bytes
- glibc: Streams are oriented. Byte operations on wide streams (and vice versa) are not guaranteed to work
- FreeBSD: Streams are oriented
Considerations
Platform wide streams
Some platforms may wish to expose streams which are natively wide, for a number of reasons. For example, the Win32 console is natively Unicode, and byte I/O to it is translated. This is an especially interesting case, because the character set used for translation of console I/O may be different from the character set preferred for other byte I/O.
Therefore, it should be possible for a platform to have wide strings sent directly to the I/O routines.
Application behaviour
Applications are built of many components; those components may not check the orientation of the stream before doing I/O. While this is nonconforming, we should probably attempt to accommodate it as best as possible (probably providing a "strict" mode which would violently diagnose these errors).
Resolution
- The need for differing I/O routines means we will support a "changeable backends" system, with a structure like C++ vtables
- A backend declares that it supports native wide I/O by providing "wwrite" and "wread" routines, which read/write wide characters
- If these are not provided, PDCLib will perform translation to the stream MBCS whenever they would be invoked
- A backend is always expected to support narrow I/O
- I/O by narrow oriented functions will always be forwarded through the narrow I/O routines
- As an extension to the standard, PDCLib will automatically switch the orientation of a stream when:
  - I/O of a different orientation is performed, and
  - the multibyte encoding/decoding state is in the initial shift state
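The automatic switching rule can be sketched as a small check run before each I/O call. Everything named here (demo_orient, demo_stream, demo_prepare_io, and the strict flag) is hypothetical and only illustrates the rule; it is not PDCLib's actual internal layout. The sketch assumes mbsinit is an acceptable test for "initial shift state".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical orientation tags for this sketch. */
enum demo_orient { DEMO_UNORIENTED, DEMO_NARROW, DEMO_WIDE };

/* Hypothetical, minimal per-stream state. */
struct demo_stream {
    enum demo_orient orient;
    mbstate_t        state;   /* multibyte conversion state of the stream */
    bool             strict;  /* assumed "strict" mode: never auto-switch */
};

void demo_stream_init(struct demo_stream *s, bool strict)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);  /* a zeroed mbstate_t is an initial state */
    s->orient = DEMO_UNORIENTED;
    s->strict = strict;
}

/* Returns true if I/O of orientation `wanted` may proceed, updating the
 * stream's orientation where the rules allow it. */
bool demo_prepare_io(struct demo_stream *s, enum demo_orient wanted)
{
    assert(s != NULL && wanted != DEMO_UNORIENTED);

    if (s->orient == DEMO_UNORIENTED) {  /* first I/O sets the orientation */
        s->orient = wanted;
        return true;
    }
    if (s->orient == wanted)
        return true;
    if (s->strict)
        return false;                    /* strict mode: report the mismatch */
    if (!mbsinit(&s->state))
        return false;                    /* mid-character: switching is unsafe */
    s->orient = wanted;                  /* extension: switch automatically */
    return true;
}
```

Refusing to switch while a multibyte character is partially processed avoids splitting that character between the two kinds of I/O.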
Proposed Backend Interface
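    #include <stddef.h>  /* size_t */
    #include <stdint.h>  /* uintptr_t, intptr_t */
    #include <stdio.h>   /* fpos_t */
    #include <wchar.h>   /* wchar_t */

    /* Backend handle: an integer or pointer, whichever suits the platform. */
    typedef union _PDCLIB_fd_t {
        uintptr_t u;
        intptr_t  i;
        void     *p;
    } _PDCLIB_fd_t;

    struct _PDCLIB_file_vtbl {
        /* Narrow I/O: every backend is expected to provide these. */
        size_t (*write) (_PDCLIB_fd_t fd, const char *restrict buf, size_t len, fpos_t *restrict pos);
        size_t (*read)  (_PDCLIB_fd_t fd, char *restrict buf, size_t len, fpos_t *restrict pos);
        /* Wide I/O: provided only by backends with native wide support. */
        size_t (*wwrite)(_PDCLIB_fd_t fd, const wchar_t *restrict buf, size_t len, fpos_t *restrict pos);
        size_t (*wread) (_PDCLIB_fd_t fd, wchar_t *restrict buf, size_t len, fpos_t *restrict pos);
        // Other I/O functions - seek/close/etc
    };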
All the functions take the following parameters:
Parameter | Description |
---|---|
buf | The buffer to read into, or whose contents are to be written |
len | The number of characters to read/write (in terms of the respective character type) |
pos | The stream position. The stream offset and shift state should be updated as appropriate |
They should return the number of characters read or written, or EOF on failure.
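To illustrate how a narrow-only backend and the translation fallback from the Resolution section might fit together, here is a minimal sketch. It uses a cut-down table with only the two write hooks, a plain void pointer instead of _PDCLIB_fd_t, and a hypothetical sketch_pos structure standing in for fpos_t (whose layout is implementation-defined). It also reports failure as a short count rather than EOF, which is a simplification. All sketch_ and membuf_ names are made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for fpos_t: byte offset plus shift state. */
struct sketch_pos {
    size_t    offset;
    mbstate_t mbs;
};

/* Cut-down backend table; wwrite is NULL when there is no native wide I/O. */
struct sketch_vtbl {
    size_t (*write)(void *fd, const char *restrict buf, size_t len,
                    struct sketch_pos *restrict pos);
    size_t (*wwrite)(void *fd, const wchar_t *restrict buf, size_t len,
                     struct sketch_pos *restrict pos);
};

/* Example narrow-only backend: appends bytes to a fixed-size buffer. */
struct sketch_membuf { char data[64]; size_t used; };

static size_t membuf_write(void *fd, const char *restrict buf, size_t len,
                           struct sketch_pos *restrict pos)
{
    struct sketch_membuf *m = fd;
    size_t room = sizeof m->data - m->used;
    if (len > room)
        len = room;                  /* short write once the buffer is full */
    memcpy(m->data + m->used, buf, len);
    m->used += len;
    pos->offset += len;              /* keep the stream offset up to date */
    return len;
}

const struct sketch_vtbl sketch_membuf_vtbl = { membuf_write, NULL };

/* Library-side wide write: use the native hook if the backend has one,
 * otherwise translate each wide character to the multibyte encoding and
 * forward the bytes through the narrow hook.
 * Returns the number of wide characters written. */
size_t sketch_wide_write(const struct sketch_vtbl *vt, void *fd,
                         const wchar_t *buf, size_t len,
                         struct sketch_pos *pos)
{
    assert(vt != NULL && vt->write != NULL && pos != NULL);

    if (vt->wwrite != NULL)
        return vt->wwrite(fd, buf, len, pos);

    size_t done = 0;
    while (done < len) {
        char mb[MB_LEN_MAX];
        size_t n = wcrtomb(mb, buf[done], &pos->mbs);
        if (n == (size_t)-1)
            break;                   /* character not representable */
        if (vt->write(fd, mb, n, pos) != n)
            break;                   /* backend could not take all bytes */
        done++;
    }
    return done;
}
```

Because the fallback lives in the library rather than in each backend, a backend only needs to implement the narrow hooks to be usable from wide-oriented functions.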