How to read from a file in C++ until a character is reached, then skip some characters and continue reading

Question

Asked by ark

How do I read from a file in C++ until a particular character is reached, then skip ahead and continue reading?

My program uses some HTML syntax and generates an .htm file, so my C++ code adds the tags. When I read the .htm file back, however, I want the text without the tags.

My plan is to read the file until a '<' is encountered, skip ahead until a '>' is encountered, and then continue reading from there.

Please help and guide me with this. I am not very experienced with file input/output in C++. Thank you! :)

Answer 1

Answer by CashCow

In general, you read a file up to a particular character with std::getline by passing that character as the third argument (the delimiter). For example, to read until a '<' character you can write:

std::getline( infile, str, '<' );

You can then do the same with the '>' character.
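The following is a minimal sketch of that approach, which alternates the two getline calls. strip_tags_getline is a hypothetical helper name. std::istringstream is used only as a stand-in for the std::ifstream you would open on the .htm file. The sketch does not handle a '<' or '>' that appears inside a comment or a quoted attribute value.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, convenient for trying the helper on a string
#include <string>

// Hypothetical helper: returns the text of `in` with every <...> tag removed,
// using std::getline with '<' and '>' as the delimiters.
std::string strip_tags_getline(std::istream& in) {
    std::string out, text, tag;
    // Read text up to the next '<' (the '<' itself is consumed, not stored).
    while (std::getline(in, text, '<')) {
        out += text;
        // Read and discard the tag body up to the next '>'.
        if (!std::getline(in, tag, '>')) {
            break;  // input ended right after a '<'
        }
    }
    return out;
}
```

With a file, usage would look like `std::ifstream infile("page.htm"); std::string plain = strip_tags_getline(infile);`, where "page.htm" is a hypothetical file name.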

In your case, since you are parsing HTML, there are probably dedicated parsers for it already. XHTML is XML-compliant, but plain HTML is not, because it does not always require every tag to be closed. An XML parser will therefore not necessarily work.

The approach above also assumes that '<' and '>' never appear inside comments or quoted text. Nothing in it guarantees that, so a robust solution needs a full state machine.
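The following is a sketch of such a state machine. For simplicity, it works on an in-memory std::string. It makes two assumptions: comments are written as <!-- ... -->, and attribute values are delimited by matching single or double quotes. strip_html and the State names are hypothetical. The sketch still ignores other HTML details, such as <script> contents, and it does not decode character entities like &lt;.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes tags and <!-- comments --> using explicit states.
std::string strip_html(const std::string& html) {
    enum class State { Text, Tag, Quoted, Comment };
    State state = State::Text;
    char quote = '\0';  // the quote character that opened the current attribute value
    std::string out;
    for (std::size_t i = 0; i < html.size(); ++i) {
        char ch = html[i];
        switch (state) {
        case State::Text:
            if (html.compare(i, 4, "<!--") == 0) {
                state = State::Comment;
                i += 3;  // skip the rest of "<!--"
            } else if (ch == '<') {
                state = State::Tag;
            } else {
                out += ch;  // ordinary text is kept
            }
            break;
        case State::Tag:
            if (ch == '"' || ch == '\'') {
                quote = ch;
                state = State::Quoted;
            } else if (ch == '>') {
                state = State::Text;  // an unquoted '>' ends the tag
            }
            break;
        case State::Quoted:
            if (ch == quote) {
                state = State::Tag;  // closing quote: back to the tag body
            }
            break;
        case State::Comment:
            if (html.compare(i, 3, "-->") == 0) {
                state = State::Text;
                i += 2;  // skip the rest of "-->"
            }
            break;
        }
    }
    return out;
}
```

A '>' inside the comment or inside the quoted title below does not end anything early. Called on `a<!-- <b> x > y -->b<i title='1>0'>c</i>`, the function returns "abc".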

Answer 2

Answer by Jerry Coffin

First of all, you should be aware that doing this correctly is quite a bit trickier than you apparently think.

To answer the question exactly as you asked it, you can use istream::get to read one character at a time until you get a '<'. You can then use istream::ignore to skip characters up to and including the next '>' in the stream.
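The following is a minimal sketch of the get/ignore combination. strip_tags_get_ignore is a hypothetical name. Passing std::numeric_limits<std::streamsize>::max() as the count tells ignore to skip without a length limit, stopping only at the delimiter or at end of input.

```cpp
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: copies characters with istream::get until a '<' is
// seen, then uses istream::ignore to discard everything up to and including
// the next '>'.
std::string strip_tags_get_ignore(std::istream& in) {
    std::string out;
    char ch;
    while (in.get(ch)) {
        if (ch == '<') {
            // Skip the tag body; stops after extracting '>' or at end of input.
            in.ignore(std::numeric_limits<std::streamsize>::max(), '>');
        } else {
            out += ch;
        }
    }
    return out;
}
```

On `<p>Hello <b>world</b>!</p>` this produces "Hello world!". On `<a title="x > y">link</a>` it produces ` y">link`, because the '>' inside the quoted attribute is taken as the end of the tag.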

Getting back to the first point, however, this generally won't work correctly. In particular, a tag can contain a quoted string (such as an attribute value), and that string can in turn contain a '>' that is not the end of the tag. To have any hope of parsing the HTML correctly, you need to recognize strings inside tags. When you find one, skip across its contents rather than treating any '>' it contains as the end of the tag.
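The following is one way to apply this to the get-based loop. It assumes that attribute values are delimited by matching double or single quotes. While inside a tag, it tracks whether a quote is open and accepts only an unquoted '>' as the end of the tag. strip_tags_quoted is a hypothetical name. Comments, and unquoted attribute values that contain '>', are still not handled.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: like a get/ignore loop, but a '>' inside a quoted
// string within a tag does not end the tag.
std::string strip_tags_quoted(std::istream& in) {
    std::string out;
    char ch;
    while (in.get(ch)) {
        if (ch != '<') {
            out += ch;  // outside any tag: keep the character
            continue;
        }
        // Inside a tag: skip until a '>' that is not inside quotes.
        char quote = '\0';  // '\0' means "not inside a quoted string"
        while (in.get(ch)) {
            if (quote != '\0') {
                if (ch == quote) quote = '\0';  // closing quote
            } else if (ch == '"' || ch == '\'') {
                quote = ch;  // opening quote
            } else if (ch == '>') {
                break;  // real end of the tag
            }
        }
    }
    return out;
}
```

With this change, `<a title="x > y">link</a>` is reduced to just "link".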

Answer 3

Answer by Luchian Grigore

Here are some guidelines.

You can read the file line by line with std::getline from an std::ifstream, keeping each line in a std::string.
You can use the std::string::find() method to locate the '<' and '>' characters.
You can use the std::string::substr() method to extract substrings.
If required, you can collect the resulting strings in a std::vector.
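The following sketch puts those four pieces together. It assumes that every tag opens and closes on the same line, and it drops the rest of a line after an unterminated '<'. text_pieces is a hypothetical name. Any std::istream works, including an std::ifstream.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the non-empty runs of text between tags,
// processing the input one line at a time.
std::vector<std::string> text_pieces(std::istream& in) {
    std::vector<std::string> pieces;
    std::string line;
    while (std::getline(in, line)) {
        std::size_t pos = 0;
        while (pos < line.size()) {
            std::size_t open = line.find('<', pos);
            // Text from pos up to the next '<', or to the end of the line if none.
            std::string text = line.substr(
                pos, open == std::string::npos ? std::string::npos : open - pos);
            if (!text.empty()) pieces.push_back(text);
            if (open == std::string::npos) break;
            std::size_t close = line.find('>', open);
            if (close == std::string::npos) break;  // unterminated tag on this line
            pos = close + 1;  // continue after the '>'
        }
    }
    return pieces;
}
```

For the input "<p>Hello <b>world</b></p>" followed by "<i>bye</i>" on the next line, the vector holds "Hello ", "world" and "bye".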

You're not going to get a full implementation here, but this should be enough to get you started.

Answer 4

Answer by Scott Hunter

The following reads from standard input. To read from somewhere else, modify or replace the calls to getchar().

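#include <stdio.h>

int main(void) {
    int c = getchar();
    while ( c != EOF ) {
        while ( c != '<' && c != EOF ) {
            putchar(c);      /* character outside a tag: here, echo it */
            c = getchar();
        }
        while ( c != '>' && c != EOF ) {
            c = getchar();   /* character inside a tag: here, skip it */
        }
        if ( c == '>' ) {
            c = getchar();   /* consume the closing '>' so it is not treated as text */
        }
    }
    return 0;
}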
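The following sketch takes "read from somewhere else" literally. It runs the same two-loop scan over any FILE* by replacing getchar() with std::fgetc, and it collects the text outside tags into a std::string instead of printing it. strip_tags_file is a hypothetical name, and the sketch inherits the same limitations with quotes and comments.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: the same outside-tag / inside-tag loops, reading from
// an already opened FILE* via std::fgetc instead of getchar().
std::string strip_tags_file(std::FILE* fp) {
    std::string out;
    int c = std::fgetc(fp);
    while (c != EOF) {
        while (c != '<' && c != EOF) {  // outside a tag: keep the character
            out += static_cast<char>(c);
            c = std::fgetc(fp);
        }
        while (c != '>' && c != EOF) {  // inside a tag: discard it
            c = std::fgetc(fp);
        }
        if (c == '>') {
            c = std::fgetc(fp);  // consume the closing '>'
        }
    }
    return out;
}
```

Typical usage opens the file with `std::fopen("page.htm", "r")`, checks the result for a null pointer, passes it to the helper, and finally calls `std::fclose`. The name "page.htm" is a hypothetical example.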
