Why char is 1 byte in the C language

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/30166112/

Why char is of 1 byte in C language

Tags: c, char, language-lawyer

Asked by daniyalahmad

Why is a char 1 byte long in C? Why is it not 2 bytes or 4 bytes long?

What is the basic logic behind keeping it at 1 byte? I know that in Java a char is 2 bytes long. Same question for that.

Answered by Sourav Ghosh

char is 1 byte in C because the standard specifies it that way.

The most probable logic is that the (binary) representation of a char (in the standard character set) fits into 1 byte. At the time C was first developed, the most commonly available standards were ASCII and EBCDIC, which needed 7-bit and 8-bit encodings, respectively. So 1 byte was sufficient to represent the whole character set.
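
You can check both facts on your own machine (a minimal sketch; everything printed other than sizeof(char) is platform-dependent):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        /* sizeof(char) is 1 by definition in the C standard */
        printf("sizeof(char) = %zu\n", sizeof(char));
        /* CHAR_BIT (from limits.h) is the number of bits per byte, at least 8 */
        printf("CHAR_BIT     = %d\n", CHAR_BIT);
        return 0;
    }

On a typical desktop this prints 1 and 8.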

OTOH, by the time Java came into the picture, the concepts of extended character sets and Unicode were present. So, to be future-proof and support extensibility, char was given 2 bytes, capable of holding extended character set values.

Answered by Nidhoegger

Why would a char hold more than 1 byte? A char normally represents an ASCII character. Just have a look at an ASCII table: there are only 256 characters in the (extended) ASCII code, so you only need to represent the numbers 0 to 255, which comes down to 8 bits = 1 byte.
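
To see the numbers behind the characters (a small sketch; the 'A' == 65 mapping assumes an ASCII execution character set):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        /* UCHAR_MAX is 255 wherever bytes are 8 bits wide */
        printf("unsigned char range: 0 .. %u\n", (unsigned)UCHAR_MAX);

        /* a char simply stores a character's numeric code */
        char c = 'A';
        printf("'%c' is stored as %d\n", c, c); /* 65 in ASCII */
        return 0;
    }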

Have a look at an ASCII table, e.g. here: http://www.asciitable.com/

That's for C. When Java was designed, its creators anticipated that in the future 16 bits = 2 bytes would be enough to hold any character (including Unicode).

Answered by Pavel Gatnar

It is because the C language is 37 years old and there was no need to have more bytes for 1 char, as only 128 ASCII characters were used (http://en.wikipedia.org/wiki/ASCII).

Answered by arcy

When C was developed (the first book on it was published by its developers in 1972), the two primary character encoding standards were ASCII and EBCDIC, which were 7- and 8-bit encodings for characters, respectively. Memory and disk space were both greater concerns at the time; C was popularized on machines with a 16-bit address space, and using more than a byte per character in strings would have been considered wasteful.

By the time Java came along (mid 1990s), some with vision were able to perceive that a language could make use of an international standard for character encoding, and so Unicode was chosen for its definition. Memory and disk space were less of a problem by then.

Answered by John Bode

The C language standard defines a virtual machine where all objects occupy an integral number of abstract storage units made up of some fixed number of bits (specified by the CHAR_BIT macro in limits.h). Each storage unit must be uniquely addressable. A storage unit is defined as the amount of storage occupied by a single character from the basic character set¹. Thus, by definition, the size of the char type is 1.
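
This is observable with pointer arithmetic (a minimal sketch; sizeof(int) itself is implementation-defined):

    #include <stdio.h>

    int main(void)
    {
        int arr[2];
        /* sizeof counts storage units (chars), so every object's size
           is a whole number of them */
        printf("sizeof(int) = %zu storage units\n", sizeof(int));
        /* each storage unit has its own address: char pointers step
           one unit at a time */
        char *p = (char *)&arr[0];
        char *q = (char *)&arr[1];
        printf("units between arr[0] and arr[1]: %td\n", q - p); /* equals sizeof(int) */
        return 0;
    }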

Eventually, these abstract storage units have to be mapped onto physical hardware. Most common architectures use individually addressable 8-bit bytes, so char objects usually map to a single 8-bit byte.
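
For example, any object can be viewed through its storage units by casting its address to unsigned char * (a sketch; the byte values printed depend on your machine's endianness):

    #include <stdio.h>

    int main(void)
    {
        unsigned int x = 0x01020304u;
        /* any object may be inspected as an array of unsigned char */
        const unsigned char *p = (const unsigned char *)&x;
        for (size_t i = 0; i < sizeof x; i++)
            printf("byte %zu: 0x%02x\n", i, p[i]); /* order reveals endianness */
        return 0;
    }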

Usually.

Historically, native byte sizes have been anywhere from 6 to 9 bits wide. In C, the char type must be at least 8 bits wide in order to represent all the characters in the basic character set, so to support a machine with 6-bit bytes, a compiler may have to map a char object onto two native machine bytes, with CHAR_BIT being 12. sizeof (char) is still 1, so types with size N will map to 2 * N native bytes.
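
Because of this, the portable way to get a type's width in bits is to combine sizeof with CHAR_BIT rather than assume 8-bit bytes (a small sketch):

    #include <stdio.h>
    #include <limits.h>

    /* sizeof counts chars; CHAR_BIT gives bits per char */
    #define BITS_OF(type) (sizeof(type) * CHAR_BIT)

    int main(void)
    {
        printf("char: %zu bits\n", BITS_OF(char)); /* == CHAR_BIT: 8 here, 12 on the
                                                      6-bit-byte machine above */
        printf("int : %zu bits\n", BITS_OF(int));
        return 0;
    }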


¹ The basic character set consists of all 26 English letters in both upper- and lowercase, the 10 digits, punctuation and other graphic characters, and control characters such as newlines, tabs, form feeds, etc., all of which fit comfortably into 8 bits.
Answered by vmonteco

You don't need more than a byte to represent the whole ASCII table (128 characters).

But there are other C types which have more room to hold data, like the int type (typically 4 bytes) or the long double type (often 12 bytes).

All of these hold numerical values (even chars! Even if they're displayed as "letters", they're really "numbers": you can compare them, add them...).
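
For instance, ordinary comparison and arithmetic work directly on chars (a small sketch; the contiguous-letters trick assumes ASCII, where it holds):

    #include <stdio.h>

    int main(void)
    {
        char c = 'a';
        /* chars are small integers: compare them, do arithmetic on them */
        if (c >= 'a' && c <= 'z')
            c = c - 'a' + 'A'; /* uppercase: 97 - 97 + 65 = 65 */
        printf("%c\n", c);     /* prints: A */
        return 0;
    }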

These are just different standard sizes, like cm and m for length.
