Support

Site Restoration

#41650 Collation conversion error with CLI restore in Akeeba 10.0.1

Posted in ‘Site restoration’
This is a public ticket

Everybody will be able to see its contents. Do not include usernames, passwords or any other sensitive information.

Environment Information

PHP version
8.2.x
CMS Type
Joomla!
CMS Version
5.2.4
Backup Tool Version
10.0.1
Kickstart version
8.0.6

Latest post by nicholas on Thursday, 06 March 2025 10:09 CST

jduerscheid

The cli-based installation of a specific Joomla 5.x site using Akeeba 10.0.1 fails. As far as I can see, it seems to be related to the rather awkward table schema used by convertforms, with one single column enforcing utfmb3 instead of utfmb4:

CREATE TABLE `ul1q8_convertforms_tasks` (
`id` mediumint NOT NULL,
`form_id` mediumint NOT NULL,
`title` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
`state` tinyint NOT NULL DEFAULT '0',
`action` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
`app` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL,
`trigger` varchar(100) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci NOT NULL,
`connection_id` mediumint DEFAULT NULL,
`options` text COLLATE utf8mb4_unicode_ci,
`conditions` text COLLATE utf8mb4_unicode_ci,
`silentfail` tinyint NOT NULL DEFAULT '0',
`modified` datetime DEFAULT NULL,
`created` datetime NOT NULL,
`created_by` mediumint NOT NULL,
`ordering` smallint NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

 

When restoring this site, I get the following message:

Database server error reply:
ErrNo #1253
COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'utf8'
SQL=CREATE TABLE `bmwe_convertforms_tasks` (
`id` mediumint NOT NULL AUTO_INCREMENT,
`form_id` mediumint NOT NULL,
`title` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT
NULL,
`state` tinyint NOT NULL DEFAULT '0',
`action` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT
NULL,
`app` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT
NULL,
`trigger` varchar(100) CHARACTER SET utf8 COLLATE utf8mb4_unicode_ci NOT
NULL,
`connection_id` mediumint DEFAULT NULL,
`options` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`conditions` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`silentfail` tinyint NOT NULL DEFAULT '0',
`modified` datetime DEFAULT NULL,
`created` datetime NOT NULL,
`created_by` mediumint NOT NULL,
`ordering` smallint NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

 

The settings used for the CLI yaml are:

utf8db: true
utf8tables: true
utf8mb4: true
charset_conversion: true

It seems like the mixed collations cause an invalid collation set with the used settings in the CLI restore process.

 

nicholas
Akeeba Staff
Manager

You should probably use:

utf8tables: false
charset_conversion: false

The charset_conversion tries to translate character sets to what is the default representation of that character set in your database server. The original charset of the column was utf8mb3 which translates to utf8 because of charset_conversion.

Using utf8tables means that you want the table, and all its text columns, to be converted to a utf8mb4_* collation. This changes the column's collation to utf8mb4_unicode_ci.

However, this is an illegal combination. Hence the problem.

You can probably use charset_conversion by itself, but not together with charset_conversion in this case. Ideally, the extension should not be using a mix of character sets like that. They should either have the entire table use UTF8(MB3) or UTF8MB4. The latter is a superset of the former. There is also no point in setting both charset and collation. If they need to specify the collation they should specify the collation only; it implies the charset in a deterministic manner.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

jduerscheid

Hi,

I played around with the parameters:
* with charset_conversion and utf8tables set to false, restore fails because the server where I would like to restore the site doesn't support the source servers's collation: Unknown collation: 'utf8mb4_0900_ai_ci'
* with charset_conversion set to true but utf8tables set to false, restore fails with the exact same error message as in the introduction post

So, neither of the suggested configuration settings does work.

Before switching to the CLI installer, we've been using unite for the exact same site and with the same settings, where the extraction worked as expected.

Best Regards!

nicholas
Akeeba Staff
Manager

The CLI installer is not a different application; it's the same application. Instead of having a web interface it has a CLI interface. The actual restoration code (the Model in the Model-View-Contoller) is the same in both cases. The configuration you provide is consumed the same way.

I think that the problem is ultimately some mistake made in the original site. Having the character set and collation set on the table and each column is weird. Having the character set of one specific column be UTF8 3-byte when everywhere else is 4-byte is also very weird. Can you please check with the extension developer that this is really what they are doing? If not, the problem must've come from some issue during the maintenance of the site. Changing the one column's charset and collation from utf8mb3 to utf8mb4 and taking a new backup would fix that problem.

We cannot include a SQL lexer and parser to address the situation where a column has both a charset and a collation. The SQL lexers written in PHP are very iffy; SQL is very similar to COBOL and has a very loose syntax. If the lexer doesn't fail completely, the parser would have a hard time determining the constituent information of the column definition since the lexer tree could be any number of weird shapes (SQL's loose syntax means that a number of possible combination of options can create wildly different tree shapes).

I might be able to come up with a way to address situations like these without a lexer and parser, but I really can't promise that.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

nicholas
Akeeba Staff
Manager

I am having a really hard time reproducing your issue in the way you have written it down.

When the column definition is:

`trigger` varchar(100) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci NOT NULL,

then the MySQL I get is

`trigger` varchar(100) CHARACTER SET utf8mb3 COLLATE utf8mb4_general_ci NOT NULL,

The only way where I can get the exact error message you provided is if the original table has the definition:

`trigger` varchar(100) CHARACTER SET utf8 COLLATE utf8mb3_general_ci NOT NULL,

in which case the resulting MySQL is indeed

`trigger` varchar(100) CHARACTER SET utf8 COLLATE utf8mb4_general_ci NOT NULL,

Are you absolutely sure that the CREATE and error message SQL you sent me are copied verbatim?

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

jduerscheid

> Are you absolutely sure that the CREATE and error message SQL you sent me are copied verbatim?

The CREATE statement is copy&pasted from a mysqldump output for the site in question. The error message is copy & pasted from the CLI output.

nicholas
Akeeba Staff
Manager

OK, that explains it. The column character set is defined as utf8, and mysqldump converts it to utf8mb3 to disambiguate (since the collation is utf8mb3_general_ci).

The database dump we are working on, however, is read verbatim from MySQL which returns the exact character set used for the column definition. That's just utf8.

Yes, with these conditions I can both reproduce and solve the issue you have.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

jduerscheid

Perfect, very happy to hear that! Let me know when you want me to test a patch!

nicholas
Akeeba Staff
Manager

Please try installing https://www.akeeba.com/download/akeebabackuppro-dev/10-0-3-dev202503032002-revcf6191d.html and taking a new backup. Restoring that backup –using the options you were previously using– will now work.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

jduerscheid

Perfect, issue is indeed resolved using the new version!

nicholas
Akeeba Staff
Manager

Thank you for the feedback! I'll release a new version next week.

Nicholas K. Dionysopoulos

Lead Developer and Director

🇬🇷Greek: native 🇬🇧English: excellent 🇫🇷French: basic • 🕐 My time zone is Europe / Athens
Please keep in mind my timezone and cultural differences when reading my replies. Thank you!

Support Information

Working hours: We are open Monday to Friday, 9am to 7pm Cyprus timezone (EET / EEST). Support is provided by the same developers writing the software, all of which live in Europe. You can still file tickets outside of our working hours, but we cannot respond to them until we're back at the office.

Support policy: We would like to kindly inform you that when using our support you have already agreed to the Support Policy which is part of our Terms of Service. Thank you for your understanding and for helping us help you!