C: getchar() and stdin

Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original source, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/7741930/

Date: 2020-09-02 09:53:04  Source: igfitidea

getchar() and stdin

c

Asked by ybakos

A related question is here, but my question is different.

But, I'd like to know more about the internals of getchar() and stdin. I know that getchar() just ultimately calls fgetc(stdin).

My question is about buffering, stdin and getchar() behavior. Given the classic K&R example:

#include <stdio.h>

main()
{
    int c;

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
}

It seems to me that getchar()'s behavior could be described as follows:

If there's nothing in the stdin buffer, let the OS accept user input until [enter] is pressed. Then return the first character in the buffer.

Assume the program is run and the user types "anchovies."

So, in the above code listing, the first call to getchar() awaits user input and assigns the first character in the buffer to variable c. Inside the loop, the first iteration's call to getchar() says "Hey, there's stuff in the buffer, return the next character in the buffer." But the Nth iteration of the while loop results in getchar() saying "Hey, there's nothing in the buffer, so let stdin gather what the user types."

I've spent a little time with the C source, but it seems this is more of a behavioral artifact of stdin rather than fgetc().

Am I wrong here? Thanks for your insight.

Accepted answer by Fred Foo

I know that getchar() just ultimately calls fgetc(stdin).

Not necessarily. getchar and getc might as well expand to the actual procedure of reading from a file, with fgetc implemented as

int fgetc(FILE *fp)
{
    return getc(fp);
}

Hey, there's nothing in the buffer, so let stdin gather what the user types. [...] it seems this is more of a behavioral artifact of stdin rather than fgetc().

I can only tell you what I know, and that is how Unix/Linux works. On that platform, a FILE (including the thing that stdin points to) holds a file descriptor (an int) that is passed to the OS to indicate from which input source the FILE gets data, plus a buffer and some other bookkeeping stuff.

The "gather" part then means "call the read system call on the file descriptor to fill the buffer again". This varies per implementation of C, though.

Answer by ott--

getchar()'s input is line-buffered, and the input buffer is limited, usually to 4 kB. What you see at first is the echo of each character you type. When you press ENTER, getchar() starts returning characters up to the LF (which is converted to CR-LF). If you keep pressing keys without an LF for some time, echoing stops after 4096 characters and you have to press ENTER to continue.

Answer by weibeld

The behaviour you're observing has nothing to do with C and getchar(), but with the teletype (TTY) subsystem in the OS kernel.

For this you need to know how processes get their input from your keyboard and how they write their output to your terminal window (I assume you use UNIX and the following explanations apply specifically to UNIX, i.e. Linux, macOS, etc.):

[Image: the diagram referred to below is not reproduced here]

The box entitled "Terminal" in the above diagram is your terminal window, e.g. xterm, iTerm, or Terminal.app. In the old days, terminals were separate hardware devices, consisting of a keyboard and a screen, and they were connected to a (possibly remote) computer over a serial line (RS-232). Every character typed on the terminal keyboard was sent over this line to the computer and consumed by an application that was connected to the terminal. And every character that the application produced as output was sent over the same line to the terminal, which displayed it on the screen.

Nowadays, terminals are not hardware devices anymore, but they moved "inside" the computer and became processes that are referred to as terminal emulators. xterm, iTerm2, Terminal.app, etc., are all terminal emulators.

However, the communication mechanism between applications and terminal emulators stayed the same as it was for hardware terminals. Terminal emulators emulate hardware terminals. That means, from the point of view of an application, talking to a terminal emulator today (e.g. iTerm2) works the same as talking to a real terminal (e.g. a DEC VT100) back in 1979. This mechanism was left unchanged so that applications developed for hardware terminals would still work with software terminal emulators.

So how does this communication mechanism work? UNIX has a subsystem called TTY in the kernel (TTY stands for teletype, which was the earliest form of computer terminal and didn't even have a screen, just a keyboard and a printer). You can think of TTY as a generic driver for terminals. TTY reads bytes from the port to which a terminal is connected (coming from the keyboard of the terminal), and writes bytes to this port (being sent to the display of the terminal).

There is a TTY instance for every terminal that is connected to a computer (or for every terminal emulator process running on the computer). Therefore, a TTY instance is also referred to as a TTY device (from the point of view of an application, talking to a TTY instance is like talking to a terminal device). In the UNIX manner of making driver interfaces available as files, these TTY devices are surfaced as /dev/tty* in some form; for example, on macOS they are /dev/ttys001, /dev/ttys002, etc.

An application can have its standard streams (stdin, stdout, stderr) directed to a TTY device (in fact, this is the default, and you can find out to which TTY device your shell is connected with the tty command). This means that whatever the user types on the keyboard becomes the standard input of the application, and whatever the application writes to its standard output is sent to the terminal screen (or terminal window of a terminal emulator). All this happens through the TTY device, that is, the application only communicates with the TTY device (this type of driver) in the kernel.

Now, the crucial point: the TTY device does more than just pass every input character to the standard input of the application. By default, the TTY device applies a so-called line discipline to the received characters. That means it buffers them locally, interprets delete, backspace and other line editing characters, and only passes them to the standard input of the application when it receives a carriage return or line feed, which means that the user has finished entering and editing a whole line.

That means until the user hits return, getchar() doesn't see anything in stdin. It's as if nothing had been typed so far. Only when the user hits return does the TTY device send these characters to the standard input of the application, where getchar() immediately reads them.

In that sense, there is nothing special about the behaviour of getchar(). It just immediately reads characters in stdin as they become available. The line buffering that you observe happens in the TTY device in the kernel.

Now to the interesting part: this TTY device can be configured. You can do it, for example, from a shell with the stty command. This allows you to configure almost every aspect of the line discipline that the TTY device applies to incoming characters. Or you can disable any processing whatsoever by setting the TTY device to raw mode. In this case, the TTY device forwards every received character immediately to stdin of the application without any form of editing.

If you enable raw mode in the TTY device, you will see that getchar() immediately receives every character that you type on the keyboard. The following C program demonstrates this:

#include <stdio.h>
#include <unistd.h>   // STDIN_FILENO, isatty(), ttyname()
#include <stdlib.h>   // exit()
#include <termios.h>

int main(void) {
    struct termios tty_opts_backup, tty_opts_raw;

    if (!isatty(STDIN_FILENO)) {
        printf("Error: stdin is not a TTY\n");
        exit(1);
    }
    printf("stdin is %s\n", ttyname(STDIN_FILENO));

    // Back up current TTY settings
    if (tcgetattr(STDIN_FILENO, &tty_opts_backup) != 0) {
        perror("tcgetattr");
        exit(1);
    }

    // Change TTY settings to raw mode, starting from a copy of the
    // current settings so that no field is left uninitialized
    tty_opts_raw = tty_opts_backup;
    cfmakeraw(&tty_opts_raw);
    tcsetattr(STDIN_FILENO, TCSANOW, &tty_opts_raw);

    // Read and print characters from stdin until Ctrl-C (0x03) or EOF
    int c, i = 1;
    for (c = getchar(); c != 3 && c != EOF; c = getchar()) {
        printf("%d. 0x%02x (0%02o)\r\n", i++, c, c);
    }
    if (c == 3) {
        printf("You typed 0x03 (003). Exiting.\r\n");
    }

    // Restore previous TTY settings
    tcsetattr(STDIN_FILENO, TCSANOW, &tty_opts_backup);
    return 0;
}

The program sets the current process' TTY device to raw mode, then uses getchar() to read and print characters from stdin in a loop. The characters are printed as ASCII codes in hexadecimal and octal notation. The program specially interprets the ETX character (ASCII code 0x03) as a trigger to terminate. You can produce this character on your keyboard by typing Ctrl-C.
