Thursday, March 25, 2021

Unspecified read() behaviour if concurrently used on UTF8 glyphs

POSIX STANDARD

POSIX defines the following prototype for the read() function:

ssize_t read(int fildes, void *buf, size_t nbyte);  

The POSIX standard also states:

The read() function shall attempt to read nbyte bytes from the file associated with the open file descriptor, fildes, into the buffer pointed to by buf. The behavior of multiple concurrent reads on the same pipe, FIFO, or terminal device is unspecified.

QUESTION

My question is: how should one read, for example, UTF-8 glyphs (each 1 to 4 bytes long) that someone sends on a terminal configured as raw (one that immediately transmits whatever characters are typed)?

In other words, how do I make my C program work?
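For reference, the length of a UTF-8 sequence can be derived from its first byte, so a reader can keep calling read() until the whole glyph has arrived. The following is a minimal sketch, not a definitive implementation: `utf8_seq_len` and `read_glyph` are hypothetical names, continuation bytes are not validated, and it assumes this program is the only reader of the descriptor (with a second concurrent reader, the POSIX caveat quoted above still applies).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: number of bytes in a UTF-8 sequence, judged from its
   first byte. Returns 0 for a continuation byte or an invalid lead byte.
   (Overlong leads such as 0xC0/0xC1 are not rejected in this sketch.) */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Hypothetical helper: read exactly one UTF-8 sequence from fd into out,
   which must hold at least 5 bytes, and NUL-terminate it.
   Returns the sequence length, 0 on end of file before any byte was read,
   or -1 on a read error or malformed input. */
ssize_t read_glyph(int fd, char *out)
{
    size_t have = 0;
    size_t want = 1;  /* read the lead byte first */

    assert(out != NULL);
    while (have < want) {
        ssize_t n = read(fd, out + have, want - have);
        if (n == -1) {
            if (errno == EINTR) continue;  /* interrupted: retry */
            return -1;
        }
        if (n == 0) return have == 0 ? 0 : -1;  /* EOF mid-glyph is an error */
        if (have == 0) {
            want = utf8_seq_len((unsigned char)out[0]);
            if (want == 0) return -1;  /* stream did not start on a lead byte */
        }
        have += (size_t)n;
    }
    out[have] = '\0';
    return (ssize_t)have;
}
```

Called in a loop on a blocking descriptor, `read_glyph(port, buffer)` with a 5-byte `buffer` would return 2 for ž and 1 for an ASCII character.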

C EXAMPLE

I tried writing a simple C program:

// C headers
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

// POSIX headers
#include <fcntl.h>
#include <unistd.h>
#include <termios.h>


////////////////////////////////////////////////////////////////////////////////////////////////////
// Function's paragraphs:
//
// A: O_NOCTTY flag tells UNIX that the opened port should not become the "controlling terminal" of
//    this program. If you don't specify this then any input (such as keyboard abort signals and so
//    forth) will affect your process. Normally a user program does not want this behavior.
//
//    O_NDELAY flag tells UNIX that this program doesn't care what state the "data carrier detect"
//    (DCD) signal line is in - whether the other end of the port is up and running. If you do not
//    specify this flag, your process will be put to sleep until the DCD signal line is the space
//    voltage.
//
// B: Notify the user and return if the port wasn't opened successfully. If the port was opened
//    successfully, get the current port "status" flags and clear O_APPEND, O_DSYNC, O_NONBLOCK,
//    O_RSYNC and O_SYNC.
//
// C: Get attributes for "fd".
//
// D: Set input/output baud rate.
//
// E: Enable "c_cflag" bits:
//
//    CREAD = 1   Enable receiver,
//    CLOCAL = 1  Enable local mode.
//
// F: Unset the entire CSIZE field and then only apply flag CS8:
//
//    CSIZE = 0   Mask for all the bits that set the "data bits"
//    CS8 = 1     Enable 8 data bits.
//
// G: Unset PARENB bit and CSTOPB bit.
//
//    PARENB = 0  We select not to support "parity".
//    CSTOPB = 0  We select not to implement 2 "stop bits", but only one.
//
// H: We select "raw" input as opposed to "canonical". "raw" means that every character entered in
//    the terminal is immediately sent. In "canonical" mode we can insert and delete characters
//    until we press enter.
//
//    ICANON = 0
//    ECHO = 0
//    ECHOE = 0
//    ISIG = 0
//
// I: Disable software flow control. Therefore no characters sent over the serial line are
//    interpreted as commands.
//
// J: Immediately (TCSANOW) apply the changed attributes.
//
// Return the file descriptor for the opened terminal.
////////////////////////////////////////////////////////////////////////////////////////////////////
int open_port(const char * terminal){

    int fd;
    int fl;
    struct termios options;

    // A:
    fd = open(terminal, O_RDWR | O_NOCTTY | O_NDELAY);

    // B:
    if(fd == -1){
        printf("E: open_port(): Failed opening port %s: %s\n", terminal, strerror(errno));
        return -1;
    }
    printf("I: open_port(): Succeeded opening port %s\n", terminal);
    fl = fcntl(fd, F_GETFL);
    fcntl(fd, F_SETFL, fl & ~(O_APPEND | O_DSYNC | O_NONBLOCK | O_RSYNC | O_SYNC));

    // C:
    tcgetattr(fd, &options);

    // D:
    cfsetispeed(&options, B19200);
    cfsetospeed(&options, B19200);

    // E:
    options.c_cflag |= (CLOCAL | CREAD);

    // F:
    options.c_cflag &= ~CSIZE;
    options.c_cflag |= CS8;

    // G:
    options.c_cflag &= ~PARENB;
    options.c_cflag &= ~CSTOPB;

    // H:
    options.c_lflag &= ~(ICANON | ECHO | ECHOE | ISIG);

    // I:
    options.c_iflag &= ~(IXON | IXOFF | IXANY);

    // J:
    tcsetattr(fd, TCSANOW, &options);

    return(fd);
}


////////////////////////////////////////////////////////////////////////////////////////////////////
// Function's paragraphs:
//
// A: Notify user to pass an argument i.e. path to a port file.
// B: Open the port file and store its file descriptor in "port".
// C: Write to the "port".
// D: Function read() stops the program until it reads anything and puts it in the "buffer".
//    Read a single glyph from the "port" indefinitely until x is received. A single UTF-8 glyph
//    can take up to 4 bytes and glyphs have no '\0' at the end. Only strings have a terminating
//    '\0'!
// E: Close the port and return 0.
////////////////////////////////////////////////////////////////////////////////////////////////////
int main(int argc, char * argv[]){

    int port;
    ssize_t status;
    char buffer[5];
    int buffer_size = sizeof(buffer);
    const char * string = "WRITE SOMETHING HERE: ";
    ssize_t size;

    // A:
    if(argc < 2){
        printf("E: main(): Supply exactly one argument that points to a port\n");
        return EXIT_FAILURE;
    }

    // B:
    port = open_port(argv[1]);
    if(port == -1){
        return EXIT_FAILURE;
    }

    // C:
    size = write(port, string, strlen(string));
    if(size == -1){
        printf("E: write(): Writing failed: %s\n", strerror(errno));
        close(port);
        return EXIT_FAILURE;
    }
    else{
        printf("I: write(): Wrote %zd bytes successfully\n", size);
    }

    // D:
    while(1){
        memset(buffer, '\0', buffer_size);
        // Read at most 4 bytes so that the last byte of "buffer" stays '\0' for "%s" below.
        status = read(port, buffer, buffer_size - 1);

        if(status == -1){
            printf("E: read(): Reading failed: %s\n", strerror(errno));
            close(port);
            return EXIT_FAILURE;
        }

        if(buffer[0] == 'x') break;

        printf("I: read(): Reading succeeded! Read %zd byte(s): %s\n", status, buffer);
    }

    // E:
    close(port);
    return EXIT_SUCCESS;
}

This compiles, and I run it in terminal A, where it reads from terminal B (/dev/pts/0).

The program behaves almost as expected. When I type ASCII glyphs in terminal B, the program in terminal A reads them (using read()) about 99% of the time. But sometimes a glyph is not read and is echoed in terminal B instead.

The situation gets worse when I try to send and read UTF-8 glyphs. When I press and hold the keyboard key ž (this glyph has 2 bytes), for some reason the 1st byte is transmitted and the 2nd byte is not. Consequently the program in terminal A reads only part of the glyph, and the rest is echoed in terminal B. Sometimes ž glyphs are sent successfully and aren't displayed in terminal B.
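To see exactly which bytes of a glyph arrived, it may help to print the received bytes in hex rather than with "%s". In UTF-8, ž (U+017E) is the two bytes 0xC5 0xBE, so a read that returns only C5 or only BE means the glyph was split. Below is a minimal sketch of a hypothetical helper, `format_hex`, that is not part of the program above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the first n bytes of buf into out as
   space-separated uppercase hex, for example "C5 BE". Output that does not
   fit in outsz bytes is dropped at a whole-byte boundary. Returns out. */
char *format_hex(const unsigned char *buf, size_t n, char *out, size_t outsz)
{
    size_t used = 0;

    assert(out != NULL);
    if (outsz == 0) return out;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, outsz - used, i ? " %02X" : "%02X", buf[i]);
        if (w < 0 || (size_t)w >= outsz - used) {
            out[used] = '\0';  /* discard the partially written byte */
            break;
        }
        used += (size_t)w;
    }
    return out;
}
```

In the read loop, this could be used as `printf("%s\n", format_hex((unsigned char *)buffer, (size_t)status, text, sizeof text))` with a local `char text[16]`.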


https://stackoverflow.com/questions/66806476/unspecified-read-behaviour-if-concurrently-used-on-utf8-glyphs March 26, 2021 at 03:41AM
