Got error 4009 'Cluster Failure' from NDBCLUSTER

Ok, this is not possible: there should be numerous printouts with [RonDB] in the MySQL error log
even when there are no failures. There should be log messages each time a data node connects,
disconnects and so forth.

So the only possible explanation is that you are contacting some other MySQL Server, or
that the log messages go somewhere else.

Some machines have a MySQL Server service that is started automatically; I have one on
my Mac, for example. Maybe something like that is happening to you?
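One quick way to check whether some other MySQL Server is answering on the usual ports is to probe them directly. A minimal sketch in Python (the ports are the MySQL defaults; adjust host and ports for your setup):

```python
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0

# 3306 is the default MySQL port, 33060 the default X Plugin port.
for port in (3306, 33060):
    state = "open" if is_listening("127.0.0.1", port) else "closed"
    print(f"port {port}: {state}")
```

If a port is open but your RonDB mysqld was started on a different port or socket, your client is most likely talking to the wrong server.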

So essentially this version cannot have a MySQL Server connected without messages
hitting the log file.

Another option could be that the MySQL Server doesn’t have write permissions
to the error log file.
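If write permissions are the suspect, that is easy to verify. A small sketch (the log path is just an example; substitute whatever --log-error points at, and run it as the OS user that mysqld runs as, since the check uses the calling user's permissions):

```python
import os

def can_write_log(path: str) -> bool:
    """True if the calling user could append to the error log at `path`.

    If the file exists, check write access on the file itself;
    otherwise check whether the containing directory is writable,
    since mysqld would have to create the file.
    """
    if os.path.exists(path):
        return os.access(path, os.W_OK)
    parent = os.path.dirname(path) or "."
    return os.access(parent, os.W_OK)

# Example path only -- substitute your server's --log-error value.
print(can_write_log("/var/log/mysqld.log"))
```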

The only entries in the MySQL error log are these:

2021-10-07T08:43:14.370640Z 0 [System] [MY-010116] [Server] /home/centos/rondb-21.04.2-linux-glibc2.17-x86_64/bin/mysqld (mysqld 21.04.2-cluster) starting as process 13816
2021-10-07T08:43:14.379432Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2021-10-07T08:43:14.654143Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2021-10-07 08:43:15:[RonDB] (N67) Node 1 Connected
2021-10-07 08:43:15:[RonDB] (N67) Node 2 Connected
2021-10-07 08:43:15:[RonDB] (N67) Node 1 is now alive, Our version: RonDB-21.04.2 is compatible with node version: RonDB-21.04.1, node is started
2021-10-07 08:43:15:[RonDB] (N67) Node 2 is now alive, Our version: RonDB-21.04.2 is compatible with node version: RonDB-21.04.1, node is started
2021-10-07T08:43:15.383509Z 0 [System] [MY-011323] [Server] X Plugin ready for connections. Bind-address: '::' port: 33060, socket: /tmp/mysqlx.sock
2021-10-07T08:43:15.399880Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
2021-10-07T08:43:15.405841Z 0 [System] [MY-010232] [Server] XA crash recovery finished.
2021-10-07T08:43:15.478060Z 0 [Warning] [MY-010068] [Server] CA certificate ca.pem is self signed.
2021-10-07T08:43:15.478290Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2021-10-07T08:43:15.495644Z 0 [System] [MY-010931] [Server] /home/centos/rondb-21.04.2-linux-glibc2.17-x86_64/bin/mysqld: ready for connections. Version: '21.04.2-cluster'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution.
2021-10-07 08:43:15 [NdbApi] INFO     -- Flushing incomplete GCI:s < 251226/19
2021-10-07 08:43:15 [NdbApi] INFO     -- Flushing incomplete GCI:s < 251226/19

While running some load and hitting the error, we do not see anything else in the log.

If we run 2000 SELECT queries, about 100 run fine but the rest fail with:

Got error 4009 'No data node(s) available, check Cluster state' from NDBCLUSTER

Ok, since these new error messages are written into the error log and
the node is alive and kicking, the only remaining option is that the
failing queries are sent to some other MySQL Server. Error 4009
leads to a write into the MySQL error log with some information about
the state when the error happened.

So since the error message about 4009 is missing, it must have happened
in another MySQL server. That is the only explanation that I can come
up with since you cannot get 4009 in a MySQL Server in this version without
getting a printout in the error log.
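One way to confirm which server actually saw the failures is to scan each server's error log for the 4009 printout and the [RonDB] node events. A rough sketch, using sample lines lifted from the log excerpt above (the exact message format may vary between versions):

```python
def scan_error_log(text: str):
    """Collect [RonDB] node events and any lines mentioning error 4009."""
    rondb_events = [l for l in text.splitlines() if "[RonDB]" in l]
    failures = [l for l in text.splitlines() if "4009" in l]
    return rondb_events, failures

# Sample lines taken from the log excerpt above:
sample = """\
2021-10-07 08:43:15:[RonDB] (N67) Node 1 Connected
2021-10-07 08:43:15:[RonDB] (N67) Node 2 Connected
2021-10-07T08:43:15.399880Z 0 [System] [MY-010229] [Server] Starting XA crash recovery...
"""
events, failures = scan_error_log(sample)
print(len(events), "RonDB events;", len(failures), "lines mentioning 4009")
```

If the server that supposedly returned 4009 shows zero matches, the failing queries were answered by a different mysqld. The "(N67)" tag in the events also tells you which API node id logged them.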

I also note that only node 67 appears in this log, which indicates that
node 68 is somewhere else. Previously in your setup you started the
mysqld with the 2 node ids 67 and 68.

I recall that early in the discussion you mentioned that you had 2 VMs
for MySQL Servers. If you have started both with the same command,
then you have competition for node ids. The 2 MySQL Servers need to
use their own node ids, so if this is the case you should ensure that
one of them uses 67 and the other one uses 68.
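Giving each mysqld its own node id could look something like this; the option name is ndb-cluster-connection-pool-nodeids as discussed above, but the connect string and pool size here are illustrative, and each node id also needs a matching [mysqld]/[api] slot in the cluster's config.ini:

```ini
# my.cnf on MySQL Server VM 1
[mysqld]
ndbcluster
ndb-connectstring=mgmt_host:1186        # example management server address
ndb-cluster-connection-pool=1
ndb-cluster-connection-pool-nodeids=67

# my.cnf on MySQL Server VM 2
[mysqld]
ndbcluster
ndb-connectstring=mgmt_host:1186
ndb-cluster-connection-pool=1
ndb-cluster-connection-pool-nodeids=68
```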

Hope you managed to find the solution to this problem.
The new log messages and improved error handling will be a useful addition to
RonDB that will be present in the next releases coming out this month.

The issue was with the way I was passing the node ids in ndb-cluster-connection-pool-nodeids: one of the ids was assigned to another IP. My guess is that this caused the 4009, since after I fixed this mistake everything started to work fine.

The newer logging did help a little :slight_smile:

thanks @mikaelronstrom

Great to hear that you finally found the problem. The new logging will
be of assistance for others as well, so definitely a step forward.