Any programmers here? Pack your bags: you're no longer needed!

OpenAI

Anyway, I was sitting around this evening, bored, and thought: what if I asked the artificial intelligence something harder than "what do you get if you cross a hedgehog with a grass snake?" So I came up with something.

The other day I was writing some code and, short on sleep, put a bug in it. Not a syntax typo, but an actual logic error. The script's output surprised me, I realized I had messed up, and I fixed it. For a coder, fixing your own bugs is routine. But what if I asked a machine about it?

So I took that old buggy code, tweaked it a little to make it more generic, and put the question to the machine just like this:

Why this code doesn’t work properly?

if (preg_match_all('/href=[\'"](.*?)[\'"].*?src=[\'"](.*?)[\'"]/', $post['post_content'], $matches)) {
    for ($i = 0; $i < count($matches); $i++) {
        echo 'Link: ' . $matches[1][$i] . PHP_EOL;
        echo 'Image URL: ' . $matches[1][$i] . PHP_EOL;
    }
}

To which I got the following answer:

The for loop is incorrect. It should be for ($i = 0; $i < count($matches[0]); $i++) instead of for ($i = 0; $i < count($matches); $i++). The $matches[0] array contains all the matches that were found, and $matches[1] and $matches[2] contain the first and second captured groups, respectively.

Here’s the corrected code:

if (preg_match_all('/href=[\'"](.*?)[\'"].*?src=[\'"](.*?)[\'"]/', $post['post_content'], $matches)) {
    for ($i = 0; $i < count($matches[0]); $i++) {
        echo 'Link: ' . $matches[1][$i] . PHP_EOL;
        echo 'Image URL: ' . $matches[2][$i] . PHP_EOL;
    }
}

The bug really is fixed in its version of the code. What's more, it even explained the gist of the mistake in plain terms. So I decided to continue the conversation:

It either could be $matches[1] or $matches[2] instead of $matches[0] – all variants will work in case if preg_match_all() returns true. But… I didn’t expect that from a machine… I’m very impressed.

This is what came back:

Thank you for your kind words! I’m glad I could help.

You are correct that the code will work if either $matches[1] or $matches[2] is used in the for loop instead of $matches[0]. The $matches[0] array contains the full matches that were found, and $matches[1] and $matches[2] contain the first and second captured groups, respectively.

Using $matches[1] or $matches[2] in the for loop would work because both of these arrays will have the same number of elements as the $matches[0] array. However, using $matches[1] or $matches[2] in the loop would not give you the full matches, only the captured groups.

Using $matches[0] in the loop is generally the recommended approach because it gives you access to the full matches. This can be useful if you need to access the entire matched string, rather than just the captured groups.

I hope this clears things up! Let me know if you have any other questions.
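
To see why the loop bound matters, here is a minimal sketch of how preg_match_all() lays out $matches in its default PREG_PATTERN_ORDER mode. The sample HTML string and the variable names are made up for illustration; they are not from my original script.

```php
<?php
// Hypothetical sample input: two links, each wrapping one image.
$html = '<a href="/a"><img src="/a.png"></a> <a href="/b"><img src="/b.png"></a>';

$pattern = '/href=[\'"](.*?)[\'"].*?src=[\'"](.*?)[\'"]/';
$found = preg_match_all($pattern, $html, $matches);

// $matches has one entry per group (full match + 2 captures),
// no matter how many matches were found.
$groupCount = count($matches);
// Each of those entries holds one element per match.
$matchCount = count($matches[0]);

echo "groups: $groupCount, matches: $matchCount" . PHP_EOL;

for ($i = 0; $i < $matchCount; $i++) {
    echo 'Link: ' . $matches[1][$i] . PHP_EOL;
    echo 'Image URL: ' . $matches[2][$i] . PHP_EOL;
}
```

With the original bound, count($matches), the loop would run three times here even though there are only two matches, and the third pass would read indexes that do not exist. Note also that the corrected code quietly switches the second echo from $matches[1] to $matches[2], which the explanation never mentions.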

Long story short, I was, to put it mildly, a bit stunned. If AI can already do this today (a reminder: OpenAI opened it to the public only this past autumn), then what should we expect from it tomorrow? And the day after tomorrow…

At this point I got carried away, and there was no stopping me…

Me:

Could you improve this?

function my_remove_emojis($string) {
    $emoji_groups = ['/[\x{1F100}-\x{1F1FF}]/u', '/[\x{1F300}-\x{1F5FF}]/u', '/[\x{1F600}-\x{1F64F}]/u', '/[\x{1F680}-\x{1F6FF}]/u', '/[\x{1F900}-\x{1F9FF}]/u', '/[\x{2600}-\x{26FF}]/u', '/[\x{2700}-\x{27BF}]/u', '/\xEF[\xB8-\xBB][\x80-\xBF]/'];
    foreach ($emoji_groups as $group) {
        $string = preg_replace($group, '', $string);
    }
    return $string;
}

OpenAI:

Here is a version of the function that uses Unicode Regular Expressions (UREs) to remove emojis from a string:

function my_remove_emojis($string) {
  $emoji_pattern = '[\x{1F100}-\x{1F9FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}]';
  return preg_replace("/$emoji_pattern/u", '', $string);
}

This version of the function combines all the emoji ranges into a single pattern, which is more efficient and easier to read. The /u flag at the end of the pattern enables Unicode support in the regular expression.

Note that this function will only remove visible emojis, i.e. those that are represented as individual Unicode characters. It will not remove emojis that are represented as sequences of multiple characters, such as 👍 (U+1F44D THUMBS UP SIGN) which is represented as \xF0\x9F\x91\x8D in UTF-8 encoding. To remove all types of emojis, you would need to use a more comprehensive regular expression pattern or a library specifically designed for this purpose.
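
The note about 👍 is easy to check. The sketch below assumes PHP 7+ (for the "\u{...}" escape) and uses a made-up sample string and function name. It suggests that U+1F44D is a single code point that merely takes four bytes in UTF-8, and that it already falls inside the \x{1F100}-\x{1F9FF} range of the combined pattern.

```php
<?php
// Combined pattern from the reply above, reproduced as-is
// under a hypothetical function name.
function remove_emojis_combined($string) {
    $emoji_pattern = '[\x{1F100}-\x{1F9FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}]';
    return preg_replace("/$emoji_pattern/u", '', $string);
}

$thumbsUp = "\u{1F44D}";                           // THUMBS UP SIGN
$byteLength = strlen($thumbsUp);                   // length in bytes (UTF-8)
$codePoints = preg_match_all('/./su', $thumbsUp);  // length in code points

echo "bytes: $byteLength, code points: $codePoints" . PHP_EOL;

// The emoji is stripped; the two surrounding spaces remain.
$cleaned = remove_emojis_combined("ok {$thumbsUp} done");
echo $cleaned . PHP_EOL;
```

So, at least for this character, the shortened function does strip it. Emojis that really are sequences (a base character plus a variation selector, for instance) are a different matter.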

Me:

Yes, but my function was deleting emojis that are represented as sequences of multiple characters too. Why have you removed that part?

OpenAI:

I apologize for the oversight. You are correct that the original version of the function does remove emojis that are represented as sequences of multiple characters, by using the /\xEF[\xB8-\xBB][\x80-\xBF]/ regular expression pattern.

This pattern is designed to match the UTF-8 encoding of emoji characters, which are represented as a sequence of 3 bytes starting with 0xEF followed by two bytes in the range 0xB8 to 0xBB and 0x80 to 0xBF, respectively.

In general, the UTF-8 encoding of emoji characters can be identified by the following rules:

– If the first byte is in the range 0xF0 to 0xF4, it represents a 4-byte character.
– If the first byte is 0xEF, it represents a 3-byte character.
– If the first byte is in the range 0xE0 to 0xE3, it represents a 3-byte character.
– If the first byte is in the range 0xC0 to 0xDF, it represents a 2-byte character.

You can use these rules to construct a regular expression pattern that matches all types of emojis in UTF-8 encoding. Here is an example of such a pattern:

$emoji_pattern = '[\x{1F100}-\x{1F9FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}]|\xEF[\xB8-\xBB][\x80-\xBF]|[\xF0-\xF4][\x80-\xBF]{3}';

With this pattern, you can remove all types of emojis from a string by using the preg_replace() function as follows:

function my_remove_emojis($string) {
  $emoji_pattern = '[\x{1F100}-\x{1F9FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}]|\xEF[\xB8-\xBB][\x80-\xBF]|[\xF0-\xF4][\x80-\xBF]{3}';
  return preg_replace("/$emoji_pattern/u", '', $string);
}

I hope this helps. Let me know if you have any questions.
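
One caveat seems worth testing before adopting that final pattern. As far as I can tell, once the /u modifier is on, PCRE reads \xEF as the code point U+00EF rather than as a raw byte, so the byte-oriented alternatives stop matching UTF-8 sequences. The sketch below is a hypothetical alternative, not my original function: it swaps the byte pattern for its code-point equivalent, \x{FE00}-\x{FEFF}, and both function names and the sample string are invented for illustration.

```php
<?php
// Final pattern from the reply above, reproduced as-is for comparison.
function remove_emojis_reply($string) {
    $emoji_pattern = '[\x{1F100}-\x{1F9FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}]|\xEF[\xB8-\xBB][\x80-\xBF]|[\xF0-\xF4][\x80-\xBF]{3}';
    return preg_replace("/$emoji_pattern/u", '', $string);
}

// Hypothetical variant: same ranges, with the bytes EF B8..BB 80..BF
// expressed as the code points they encode (U+FE00..U+FEFF).
function remove_emojis_codepoints($string) {
    $emoji_pattern = '[\x{1F100}-\x{1F9FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}\x{FE00}-\x{FEFF}]';
    return preg_replace("/$emoji_pattern/u", '', $string);
}

// Red heart as usually typed: U+2764 followed by variation selector U+FE0F.
$sample = "I \u{2764}\u{FE0F} PHP";

$afterReply = remove_emojis_reply($sample);
$afterCodepoints = remove_emojis_codepoints($sample);

// Hex dumps make the invisible variation selector (ef b8 8f) visible.
echo bin2hex($afterReply) . PHP_EOL;
echo bin2hex($afterCodepoints) . PHP_EOL;
```

In this sketch the reply's version removes the heart but leaves the invisible U+FE0F behind, while the code-point variant removes both. The range U+FE00 to U+FEFF simply mirrors my original byte pattern; it also covers non-emoji characters such as Arabic presentation forms, so a narrower range like U+FE00 to U+FE0F may be a safer choice.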

How do you like that, Elon Musk?
