Got error 4009 'Cluster Failure' from NDBCLUSTER

ruturaj · October 5, 2021, 6:55pm

mysql> insert into t1 values (1);
Query OK, 1 row affected (0.00 sec)

mysql>
mysql> select * from t1;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> DROP TABLE t1;
Query OK, 0 rows affected (0.24 sec)

mysql>
mysql>

It works as expected when I run these simple commands.

mikaelronstrom · October 5, 2021, 6:57pm

OK, one more question: what does the config.ini look like in the API/MYSQLD
node sections?

ruturaj · October 5, 2021, 6:58pm

[mysqld]
NodeId=67
Hostname=xxx.xxx.xxx.xxx

[mysqld]
NodeId=68
Hostname=xxx.xxx.xxx.xxx

[api]
NodeId=231

mikaelronstrom · October 5, 2021, 6:59pm

OK, so we can conclude that the cluster is up and running and serves queries.
So now we need to understand what happens when you run your queries against it.

mikaelronstrom · October 5, 2021, 7:00pm

What I sometimes do then is issue SHOW PROCESSLIST; once per second to see how many concurrent queries are running and what those queries are.

mikaelronstrom · October 5, 2021, 7:00pm

It could possibly be some kind of overload situation.

mikaelronstrom · October 5, 2021, 7:01pm

What type of queries do you issue against the cluster, and how many run in parallel? This is what SHOW PROCESSLIST; usually shows.
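A minimal Python sketch of the once-per-second SHOW PROCESSLIST sampling described above. It assumes a hypothetical `fetch_rows` callable that runs SHOW PROCESSLIST through whatever MySQL driver is in use and returns one dict per row, keyed by the standard column names (`Command`, `Info`, and so on). The function names here are illustrative and not part of RonDB or MySQL.

```python
import time
from collections import Counter


def summarize_processlist(rows):
    """Summarize SHOW PROCESSLIST rows given as dicts keyed by column name."""
    # Skip idle connections and the session that is running SHOW PROCESSLIST itself.
    active = [
        r for r in rows
        if r["Command"] != "Sleep"
        and (r["Info"] or "").strip().rstrip(";").upper() != "SHOW PROCESSLIST"
    ]
    by_command = Counter(r["Command"] for r in active)
    statements = [r["Info"] for r in active if r["Info"]]
    return {
        "active": len(active),
        "by_command": dict(by_command),
        "statements": statements,
    }


def poll(fetch_rows, samples=10, interval=1.0):
    """Call fetch_rows() once per interval and print a one-line summary."""
    # fetch_rows is an assumption: a wrapper around your driver's cursor.
    for _ in range(samples):
        summary = summarize_processlist(fetch_rows())
        print(time.strftime("%H:%M:%S"), summary["active"], summary["by_command"])
        time.sleep(interval)
```

Running `poll` while the REST service is under load would show whether the number of active statements ever approaches the pool sizes, or whether the errors appear while the server is mostly idle.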

ruturaj · October 5, 2021, 7:03pm

I already tried that (SHOW PROCESSLIST)… nothing else was running while I was seeing the error…

I was issuing barely 10-20 parallel primary key lookup queries, nothing fancy and not much load.

mikaelronstrom · October 5, 2021, 7:05pm

What I would also do is:

Run top on the nodes to see if anything looks weird.
Check the data node logs to see how many threads are set up in the data nodes.

ruturaj · October 5, 2021, 7:06pm

The only thing that was happening was that I had 10 nodes of a REST service, each creating a connection pool (20) to the MySQL cluster… and I was hitting the REST service, which in turn uses the connection pool to query MySQL and return the result.

But this was working with another MySQL setup that we had… so I'm not sure this could cause any issues here.

mikaelronstrom · October 5, 2021, 7:08pm

That sounds like a very normal use case that should work perfectly fine.
But it's always good to look at the CPUs to see whether there is any abnormal behaviour.

ruturaj · October 5, 2021, 7:10pm

2021-10-02 12:49:00 [ndbd] INFO     -- Use automatic thread configuration
2021-10-02 12:49:00 [ndbd] INFO     -- Auto thread config uses:
 4 LDM threads,
 4 Query threads,
 4 tc threads,
 8 Recover threads,
 1 main threads,
 0 rep threads,
 2 recv threads,
 1 send threads
2021-10-02 12:49:00 [ndbd] INFO     -- Number of RR Groups = 1
Automatic Thread Config: LockExecuteThreadToCPU:  => parsed: main={cpubind=12},ldm={cpubind=0},ldm={cpubind=1},ldm={cpubind=2},ldm={cpubind=3},recv={cpubind=13},recv={cpubind=14},tc={cpubind=4},tc={cpubind=6},tc={cpubind=7},tc={cpubind=15},send={cpubind=5},query={cpubind=8},query={cpubind=9},query={cpubind=10},query={cpubind=11}
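As a quick sanity check on the log above, here is a small Python sketch that parses the `parsed:` part of the Automatic Thread Config line, counts threads per type, and reports any CPU that is bound to more than one thread. It assumes every entry has the single-CPU form `name={cpubind=N}` seen in this log; other binding forms are not handled. The function names and the `SAMPLE` constant are illustrative only.

```python
import re
from collections import Counter

# The binding list copied from the data node log above.
SAMPLE = (
    "Automatic Thread Config: LockExecuteThreadToCPU:  => parsed: "
    "main={cpubind=12},ldm={cpubind=0},ldm={cpubind=1},ldm={cpubind=2},"
    "ldm={cpubind=3},recv={cpubind=13},recv={cpubind=14},tc={cpubind=4},"
    "tc={cpubind=6},tc={cpubind=7},tc={cpubind=15},send={cpubind=5},"
    "query={cpubind=8},query={cpubind=9},query={cpubind=10},query={cpubind=11}"
)


def parse_cpu_bindings(line):
    """Return (thread_type, cpu) pairs from the 'parsed:' part of the line."""
    spec = line.split("parsed:", 1)[-1]
    return [
        (name, int(cpu))
        for name, cpu in re.findall(r"(\w+)=\{cpubind=(\d+)\}", spec)
    ]


def check_bindings(pairs):
    """Count threads per type and list CPUs bound to more than one thread."""
    counts = Counter(name for name, _ in pairs)
    cpu_use = Counter(cpu for _, cpu in pairs)
    shared = sorted(cpu for cpu, n in cpu_use.items() if n > 1)
    return dict(counts), shared


if __name__ == "__main__":
    counts, shared = check_bindings(parse_cpu_bindings(SAMPLE))
    print("threads per type:", counts)
    print("CPUs shared by several threads:", shared)
```

For this log the per-type counts match the "Auto thread config uses" lines for the LDM, query, tc, main, recv and send threads, and the 16 bound threads sit on 16 distinct CPUs. The 8 recover threads do not appear in the binding list.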

mikaelronstrom · October 5, 2021, 7:11pm

I presume that the data node and MGM server have no other processes running on the VM?

ruturaj · October 5, 2021, 7:13pm

Nope, no other processes are running on the MGM and data nodes… even the MySQL nodes have no other processes running.

Running top confirms the same… nothing abnormal is hogging CPU or memory.

mikaelronstrom · October 5, 2021, 7:15pm

Let me check whether there are any ndbinfo tables that we can query to see anything interesting.

mikaelronstrom · October 5, 2021, 7:17pm

First let’s do the basic stuff.
Connect to the MGM client and run the show command while running the load, and see if all nodes show up as connected or if there are glitches there.

ruturaj · October 5, 2021, 7:19pm

the output of show looks good… all nodes show up as connected

mikaelronstrom · October 5, 2021, 7:19pm

Next, using a MySQL client, connect to the ndbinfo database with:
use ndbinfo
Check the contents of the processes table:
select * from processes;

ruturaj · October 5, 2021, 7:22pm

mysql> select * from ndbinfo.processes;
+---------+-----------+---------------+------------+------------------+--------------+-------------------------------------------+
| node_id | node_type | node_version  | process_id | angel_process_id | process_name | service_URI                               |
+---------+-----------+---------------+------------+------------------+--------------+-------------------------------------------+
|       1 | NDB       | RonDB-21.04.1 |      11440 |            11439 | ndbmtd       | ndb://xxx.xxx.xxx.xxx                     |
|       2 | NDB       | RonDB-21.04.1 |      11288 |            11287 | ndbmtd       | ndb://xxx.xxx.xxx.xxx                     |
|      65 | MGM       | RonDB-21.04.1 |       7545 |             NULL | ndb_mgmd     | ndb://xxx.xxx.xxx.xxx:1186                |
|      67 | API       | RonDB-21.04.1 |      23347 |             NULL | mysqld       | mysql://xxx.xxx.xxx.xxx:3306/?server-id=1 |
|      68 | API       | RonDB-21.04.1 |      19175 |             NULL | mysqld       | mysql://xxx.xxx.xxx.xxx:3306/?server-id=1 |
+---------+-----------+---------------+------------+------------------+--------------+-------------------------------------------+
5 rows in set (0.06 sec)

mysql>

mikaelronstrom · October 5, 2021, 7:25pm

The following query might be interesting to see how latencies look:
select * from tc_time_track_stats where node_id = 2 and block_instance = 1;