PHP: How do I remove diacritics from text?

Question

Asked by Pär Wieslander

I am building a Swedish website, and the Swedish letters are å, ä, and ö.

I need to make a string entered by a user URL-safe with PHP.

Basically, I need to convert all characters to underscores, all EXCEPT these:

 A-Z, a-z, 1-9

and all Swedish letters should be converted like this:

'å' to 'a', 'ä' to 'a', and 'ö' to 'o' (just remove the dots above).

The rest should become underscores as I said.

I'm not good at regular expressions, so I would appreciate the help, guys!

Thanks

NOTE: NOT URLENCODE... I need to store it in a database, etc., so urlencode won't work for me.

Answer 1

Accepted answer by Jeremy L

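// decompose accented characters (NFD) using PHP's *intl* extension;
// the default form (NFC) would leave the accents in place
$data = normalizer_normalize($data, Normalizer::FORM_D);
// remove the combining accent marks that the decomposition exposed
$data = preg_replace('/\p{Mn}+/u', '', $data);

// replace everything NOT in the sets you specified with an underscore
$data = preg_replace("#[^A-Za-z1-9]#", "_", $data);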
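
For anyone who wants to try the normalize-then-replace idea end to end, here is a minimal sketch. It assumes the intl extension is loaded and that the source file is saved as UTF-8; the function name to_url_safe is made up for illustration and is not part of the answer.

```php
<?php
// Sketch only: to_url_safe is a hypothetical helper name.
function to_url_safe(string $text): string
{
    // NFD splits "ä" into "a" followed by a combining diaeresis (U+0308)
    $decomposed = normalizer_normalize($text, Normalizer::FORM_D);
    if ($decomposed === false) {
        // normalizer_normalize returns false when it cannot process the input
        throw new InvalidArgumentException('Input could not be normalized');
    }
    // \p{Mn} matches nonspacing (combining) marks; the u flag enables UTF-8 mode
    $stripped = preg_replace('/\p{Mn}+/u', '', $decomposed);
    // everything outside the set from the question becomes an underscore
    return preg_replace('/[^A-Za-z1-9]/', '_', $stripped);
}

echo to_url_safe('räksmörgås och köttbullar'), "\n"; // raksmorgas_och_kottbullar
```

Note that the character class keeps the question's 1-9 range, so a literal 0 would also turn into an underscore.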

Answer 2

Answer by user1518659

This should be useful; it handles almost all cases.

function Unaccent($string)
{
    // '$1' keeps the base letter(s) captured from each entity, e.g. "&auml;" becomes "a"
    return preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i', '$1', htmlentities($string, ENT_COMPAT, 'UTF-8'));
}
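
One thing to watch with the entity trick: htmlentities also encodes characters that are not accented letters (an ampersand becomes "&amp;"), and those entities stay in the result. The sketch below is a hypothetical standalone variant (the name unaccent_entities is my own) that decodes whatever is left over afterwards; it assumes UTF-8 input.

```php
<?php
// Sketch only: unaccent_entities is a hypothetical helper name.
function unaccent_entities(string $text): string
{
    // "Å & Ö" is encoded as "&Aring; &amp; &Ouml;"
    $encoded = htmlentities($text, ENT_COMPAT, 'UTF-8');
    // keep only the base letter(s) of accent-style entities: "A &amp; O"
    $plain = preg_replace(
        '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml|caron);~i',
        '$1',
        $encoded
    );
    // turn the remaining entities (here "&amp;") back into characters
    return html_entity_decode($plain, ENT_COMPAT, 'UTF-8');
}

echo unaccent_entities('Å & Ö'), "\n"; // A & O
```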

Answer 3

Answer by Pär Wieslander

Use iconv to convert strings from a given encoding to ASCII, then replace non-alphanumeric characters using preg_replace:

$input = 'räksmörgås och köttbullar'; // UTF-8 encoded
$input = iconv('UTF-8', 'ASCII//TRANSLIT', $input);
$input = preg_replace('/[^a-zA-Z0-9]/', '_', $input);
echo $input;

Result:

raksmorgas_och_kottbullar

Answer 4

Answer by BalusC

and all swedish should be converted like this:
'å' to 'a' and 'ä' to 'a' and 'ö' to 'o' (just remove the dots above).

Use normalizer_normalize() with Normalizer::FORM_D to decompose the accented characters, then strip the combining marks to get rid of the diacritics.

The rest should become underscores as I said.

Use preg_replace() with a pattern of \W (in other words: any character that is not a letter, digit, or underscore) to replace them with underscores.

Final result should look like:

$data = preg_replace('/\W/', '_', preg_replace('/\p{Mn}/u', '', normalizer_normalize($data, Normalizer::FORM_D)));

Answer 5

Answer by Dominic Rodger

If you're just interested in making things URL safe, then you want urlencode.

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
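
To make the manual's description concrete, here is a small sketch of what the two functions return for a Swedish string. The sample text is my own choice, and the source file is assumed to be saved as UTF-8, so each of å, ä, ö is encoded as two percent-escaped bytes.

```php
<?php
// Sample input chosen for illustration; the file is assumed to be UTF-8.
$name = 'räksmörgås och köttbullar';

// urlencode: spaces become "+"
echo urlencode($name), "\n";    // r%C3%A4ksm%C3%B6rg%C3%A5s+och+k%C3%B6ttbullar

// rawurlencode: spaces become "%20"
echo rawurlencode($name), "\n"; // r%C3%A4ksm%C3%B6rg%C3%A5s%20och%20k%C3%B6ttbullar
```

The result is safe to put in a URL but not very readable, which may be why the asker ruled it out.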

If you really want to strip all non A-Z, a-z, 1-9 (what's wrong with 0, by the way?), then you want:

$mynewstring = preg_replace('/[^A-Za-z1-9]/', '', $str);

Answer 6

Answer by dryobs

If the intl PHP extension is enabled, you can use Transliterator like this:

protected function removeDiacritics($string)
{
    $transliterator = \Transliterator::create('NFD; [:Nonspacing Mark:] Remove; NFC;');
    return $transliterator->transliterate($string);
}

To also convert other special characters (not only letters with diacritics):

protected function removeDiacritics($string)
{
    $transliterator = \Transliterator::createFromRules(
        ':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
        \Transliterator::FORWARD
    );
    return $transliterator->transliterate($string);
}
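
The two methods above are written as class members. As a rough standalone sketch of the same idea, the procedural transliterator_transliterate function can be combined with the underscore replacement from the question. The helper name to_ascii_slug is hypothetical, the intl extension is assumed to be available, and I have used only the Any-Latin and Latin-ASCII steps from the rule string above.

```php
<?php
// Sketch only: to_ascii_slug is a hypothetical helper name; requires ext-intl.
function to_ascii_slug(string $text): string
{
    // Any-Latin romanizes non-Latin scripts; Latin-ASCII then folds
    // accented Latin letters down to plain ASCII
    $ascii = transliterator_transliterate('Any-Latin; Latin-ASCII', $text);
    if ($ascii === false) {
        throw new RuntimeException('Transliteration failed');
    }
    // collapse every run of characters outside the allowed set into one underscore
    return preg_replace('/[^A-Za-z0-9]+/', '_', $ascii);
}

echo to_ascii_slug('räksmörgås och köttbullar'), "\n"; // raksmorgas_och_kottbullar
```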

Answer 7

Answer by user187291

It's as simple as:

$str = str_replace(array('å', 'ä', 'ö'), array('a', 'a', 'o'), $str);
$str = preg_replace('/[^a-z0-9]+/', '_', strtolower($str));

assuming you use the same encoding for your data and your code.

Answer 8

Answer by Mihail Dimitrov

One simple solution is to use the str_replace function with arrays of search and replacement letters.
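
A minimal sketch of what that could look like, assuming the source file and the input are both UTF-8. The sample sentence and the variable names are mine, and the character class keeps 0-9 rather than the question's 1-9.

```php
<?php
// Search and replacement arrays are matched up by position.
$search  = array('å', 'ä', 'ö', 'Å', 'Ä', 'Ö');
$replace = array('a', 'a', 'o', 'A', 'A', 'O');

// str_replace matches whole byte sequences, so multibyte letters are handled safely
$text = str_replace($search, $replace, 'Åre är en ort i Jämtland');
echo $text, "\n"; // Are ar en ort i Jamtland

// then turn everything that is not a letter or digit into an underscore
$slug = preg_replace('/[^A-Za-z0-9]/', '_', $text);
echo $slug, "\n"; // Are_ar_en_ort_i_Jamtland
```

Listing the uppercase letters explicitly avoids relying on a case-conversion function that may not understand UTF-8.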

Answer 9

Answer by danii

You don't need fancy regexps to filter the Swedish chars, just use the strtr function to "translate" them, like:

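$your_URL = "www.måäö.com";
// use the array form: the two-string form of strtr works byte by byte
// and would corrupt multibyte UTF-8 characters
$good_URL = strtr($your_URL, array('å' => 'a', 'ä' => 'a', 'ö' => 'o')); // add more pairs as needed
echo $good_URL;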

->output: www.maao.com :)
