Got error 4009 'Cluster Failure' from NDBCLUSTER

mysql> insert into t1 values (1);
Query OK, 1 row affected (0.00 sec)

mysql>
mysql> select * from t1;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> DROP TABLE t1;
Query OK, 0 rows affected (0.24 sec)

mysql>
mysql>

Working as expected when I am running the simple commands :thinking:

OK, one more question: what does the config.ini look like in the API/MYSQLD node sections?

[mysqld]
NodeId=67
Hostname=xxx.xxx.xxx.xxx

[mysqld]
NodeId=68
Hostname=xxx.xxx.xxx.xxx

[api]
NodeId=231

Ok, so we can conclude that the cluster is up and running, and serves queries.
So then we need to understand what happens when you run queries against it.

What I sometimes do then is issue SHOW PROCESSLIST; once per second to see how many concurrent queries are running and what those queries are.

It could possibly be some kind of overload situation.

What type of queries do you issue against the cluster, and how many run in parallel? This is what SHOW PROCESSLIST; usually shows.
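
For example, something along these lines also gives a quick count of active connections (a sketch; the information_schema PROCESSLIST view is standard MySQL, so it should work against the RonDB MySQL servers as well):

select count(*) as running_queries
  from information_schema.processlist
 where command <> 'Sleep';

SHOW FULL PROCESSLIST;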

I tried the same thing already (SHOW PROCESSLIST)… nothing else was running while I was seeing the error…

I was issuing hardly 10-20 parallel primary key lookup queries, nothing too fancy and not too much load.
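
For reference, they were simple lookups roughly of this shape (table and column names here are just illustrative, not the real schema):

select val from my_table where id = ?;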

What I also would do is:

  1. Run top on the nodes to see if anything looks weird
  2. Check the data node logs to see how many threads are set up in the data nodes.
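
If it is easier than digging through the log files, the thread setup can also be read out through SQL (a sketch, assuming the standard ndbinfo.threads table):

select node_id, thr_no, thread_name, thread_description
  from ndbinfo.threads
 order by node_id, thr_no;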

The only thing that was happening was that I had 10 nodes of a REST service creating a connection pool (20) to the MySQL cluster… and I was hitting the REST service, which in turn uses the connection pool to query MySQL and return the result.

But this was working with another MySQL setup that we had… so I'm not sure if this could cause any issues here.

Sounds like a very normal use case that should work perfectly fine.
But it is always good to look at the CPUs to see if any abnormal behaviour shows up.
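
Besides top, the data nodes report their own per-thread CPU usage through ndbinfo; something like this is what I would look at (a sketch, assuming the standard ndbinfo.cpustat table, which averages over roughly the last second):

select node_id, thr_no, OS_user, OS_system, OS_idle, thread_exec, thread_sleeping
  from ndbinfo.cpustat
 order by node_id, thr_no;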

2021-10-02 12:49:00 [ndbd] INFO     -- Use automatic thread configuration
2021-10-02 12:49:00 [ndbd] INFO     -- Auto thread config uses:
 4 LDM threads,
 4 Query threads,
 4 tc threads,
 8 Recover threads,
 1 main threads,
 0 rep threads,
 2 recv threads,
 1 send threads
2021-10-02 12:49:00 [ndbd] INFO     -- Number of RR Groups = 1
Automatic Thread Config: LockExecuteThreadToCPU:  => parsed: main={cpubind=12},ldm={cpubind=0},ldm={cpubind=1},ldm={cpubind=2},ldm={cpubind=3},recv={cpubind=13},recv={cpubind=14},tc={cpubind=4},tc={cpubind=6},tc={cpubind=7},tc={cpubind=15},send={cpubind=5},query={cpubind=8},query={cpubind=9},query={cpubind=10},query={cpubind=11}

I presume that the data node and MGM server have no other processes running on the VM?

Nope, no other processes are running on the MGM and data nodes… even the MySQL nodes have no other processes running.

Running top confirms the same… nothing abnormal is running that hogs CPU or memory.

Let me check if there are any interesting ndbinfo tables that we can query to see what is going on.

First let's do the basic stuff.
Connect to the MGM client and run the show command while running the load, and see if all nodes show up as connected or if there are glitches there.

The output of show looks good… all nodes show up as connected :+1:

Next, using a MySQL client, connect to the ndbinfo database through:
use ndbinfo
Check the contents of the processes table:
select * from processes;

mysql> select * from ndbinfo.processes;
+---------+-----------+---------------+------------+------------------+--------------+----------------------------------------+
| node_id | node_type | node_version  | process_id | angel_process_id | process_name | service_URI                            |
+---------+-----------+---------------+------------+------------------+--------------+----------------------------------------+
|       1 | NDB       | RonDB-21.04.1 |      11440 |            11439 | ndbmtd       | ndb://xxx.xxx.xxx.xxx                    |
|       2 | NDB       | RonDB-21.04.1 |      11288 |            11287 | ndbmtd       | ndb://xxx.xxx.xxx.xxx                     |
|      65 | MGM       | RonDB-21.04.1 |       7545 |             NULL | ndb_mgmd     | ndb://xxx.xxx.xxx.xxx:1186               |
|      67 | API       | RonDB-21.04.1 |      23347 |             NULL | mysqld       | mysql://xxx.xxx.xxx.xxx:3306/?server-id=1 |
|      68 | API       | RonDB-21.04.1 |      19175 |             NULL | mysqld       | mysql://xxx.xxx.xxx.xxx:3306/?server-id=1 |
+---------+-----------+---------------+------------+------------------+--------------+----------------------------------------+
5 rows in set (0.06 sec)

mysql>

The following query might be interesting to see how latencies look:
select * from tc_time_track_stats where node_id = 2 and block_instance = 1;
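
Since the load is mostly primary key lookups, one way to read it is to aggregate the read histogram across all the TC instances (a sketch; upper_bound is the bucket's upper latency limit in microseconds, column names as documented for tc_time_track_stats):

select upper_bound, sum(read_key_ops) as reads
  from ndbinfo.tc_time_track_stats
 where node_id = 2
 group by upper_bound
 order by upper_bound;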