So, I'll skip this slide.

Most of you are probably familiar with the way MySQL and Postgres do replication. They have a single node, everybody writes to that node, the transactions are committed, and the changes are written to the binlog in MySQL. In Postgres there are many ways to do it, the write-ahead log being one of them, but the concept is roughly the same: the transaction commits, so you can think of it as a queue with a single producer and multiple consumers. That's the abstraction. No big deal, quite straightforward.

Oh, sorry. TiDB is a database developed by PingCAP. It is multi-writer, but not in the sense of Postgres and MySQL: it has disaggregated compute and storage, so you can have N compute nodes for SQL, for example, and N nodes for storage. It doesn't have the concept of a primary node. The data is automatically sharded, and the shards are spread across the storage nodes. Each of those shards you can think of conceptually as a page; it's called a region, or a tablet in Spanner, and the design is inspired by Spanner. Each of these pages is a Raft group. The scale at which TiDB is usually used, say 250 terabytes, is quite common, and with the default region size of 256 MB that's roughly a million pages, so a million Raft groups. Each page is replicated as three copies that form a Raft group.

So for TiDB it's not as simple as Postgres or MySQL; your mental model has to change, and if you understand that part, the rest of what I have on the slides will make more sense. That's the challenge we are solving.

The other thing I want to highlight that is very different from both of these: TiDB does online schema change, which means that, unlike MySQL and Postgres, replication doesn't block. In MySQL and Postgres, once there's a DDL, you have to wait until the DDL is finished, and only then do the rest of the changes move. TiDB doesn't work like that. In TiDB, DDL is online; it happens in the background, so whether the changes coming in on the replication stream are changes to the same table or to other tables, the replication stream is still being propagated. That's how TiDB works. And for TiDB, all of this happens external to the storage nodes.
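To make the scale concrete, here is a minimal Go sketch of the back-of-envelope arithmetic above. The names (Region, regionCount) are illustrative only, not TiDB's actual types; the point is just that a region is a Raft group of (typically) three replicas, and that 250 TB at a 256 MB region size is on the order of a million such groups.

```go
package main

import "fmt"

// Region models one "page" of data: a contiguous key range replicated
// as a Raft group of (typically) three peers spread across storage nodes.
type Region struct {
	ID       uint64
	Replicas int // size of the Raft group, usually 3
}

// regionCount is the back-of-envelope arithmetic from the talk:
// total data divided by the region size.
func regionCount(totalBytes, regionSizeBytes uint64) uint64 {
	return totalBytes / regionSizeBytes
}

func main() {
	const (
		TB = uint64(1) << 40
		MB = uint64(1) << 20
	)
	n := regionCount(250*TB, 256*MB)
	fmt.Printf("~%d regions, i.e. ~%d Raft groups to observe\n", n, n)
	// Prints: ~1024000 regions, i.e. ~1024000 Raft groups to observe
}
```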
The entire CDC is a distributed system in itself, an HA distributed system in itself. It's not just connecting one socket to another socket running somewhere and streaming out of TiDB; it's not as simple as that. There is an entire HA system that has to store some of the state, so that back pressure can be handled, because the nodes producing the data also have their own Raft logs, and you don't want to block them just because you can't read the data fast enough. So they have to store some state, and if they crash, they have to resume rather than start again from some old checkpoint. It's a far more complicated and complex problem. If you get that mental model correct, the rest of the slides will hopefully make more sense.

In this diagram you can see lots of data coming in; it's all being written to the Raft nodes, the TiKV nodes. Those events are then collected and sorted. The two important parts of the stream are, first, the sorting, because you have multiple writers writing, so you need to order those events so that when you send them downstream they are received in completely ordered fashion. That's very important, and it's a huge cost in this system. The other thing is that you have to handle schema changes; I'll get to that in the rest of the slides.

So how does it plug into the system? It uses an observer pattern. There are two parts to the Raft protocol, the way TiKV uses it: events come into the Raft log, and once they're committed, they're applied locally on the storage nodes. So you put an observer in there that watches those events and pushes them to the CDC system. That's the mental picture you must have; that's how it does it.

I want to highlight this up front rather than later, because it's very important for explaining one of the bigger challenges that CDC solves. Suppose you have an insert, then, say, an ALTER COLUMN, and then another insert. You have to set a barrier for when the DDL takes place. One of the things in distributed systems, and even in standalone systems, is that you need some kind of monotonically increasing counter. If you have that counter and a strict less-than relation, which in distributed systems would be happens-before, you can solve most of these problems. TiDB does that through what's called a TSO, a timestamp oracle: it has a component called PD which generates this global, monotonically increasing timestamp.
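Here is a minimal sketch of the observer idea, in Go and with entirely hypothetical names (Event, RaftApplyObserver, the channel-based push); TiKV itself is written in Rust and its real coprocessor hooks look different. The sketch only shows the shape: hook the point where committed Raft entries are applied, and push the resulting change events into the CDC pipeline instead of polling for them.

```go
package cdc

// Event is one committed key-value change observed on a storage node,
// stamped with the transaction's commit timestamp from the TSO.
type Event struct {
	Key, Value []byte
	CommitTS   uint64
	RegionID   uint64
}

// RaftApplyObserver is a hypothetical hook: the storage node calls
// OnApply after a committed Raft entry has been applied locally, and
// the observer pushes the change into the CDC pipeline.
type RaftApplyObserver struct {
	out chan<- Event
}

func NewRaftApplyObserver(out chan<- Event) *RaftApplyObserver {
	return &RaftApplyObserver{out: out}
}

// OnApply forwards an applied, committed change downstream. Only
// committed entries reach this point, so uncommitted data never
// enters the replication stream.
func (o *RaftApplyObserver) OnApply(e Event) {
	o.out <- e
}
```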
That timestamp is unique and is stamped onto every transaction. The second thing you need in any database, transactional system or distributed system is to know when the transaction started and when it was committed, that is, when the changes were externalized. You only want to propagate what was externalized; you don't want to send data across that hasn't been committed yet, because downstream could be a system that is not TiDB. TiDB could probably handle it, but let's say we're sending data to, I don't know, somebody writes a converter for MySQL or one of those things, and that may only work on ordered, committed transactions.

Once you're armed with these three things, the start timestamp, the commit timestamp, and some kind of counter, this timestamp or any counter that is unique in your cluster, conceptually it becomes very easy: you read the events, you apply happens-before, you sort, and Bob's your uncle. It's quite straightforward. Conceptually it's not so difficult; the engineering of it is. The scale, the HA, those are the difficult problems, along with handling things like back pressure. As the Raft log is being written there is disk space and other pressure, so you have almost a garbage collector that has to truncate those Raft logs, otherwise they just keep growing. But you can't truncate them unless the changes have been pushed across to your CDC and on to wherever they have to go. That's the challenge here.

Once you have the timestamp, you know which changes you cannot push yet: the changes impacted by the DDL's timestamp. We have a log service as part of this; it squirrels those changes away on the side and waits for the resolved timestamp to catch up to that DDL barrier timestamp, and until then it doesn't push the entries after the barrier. But anything else that isn't touched by the DDL keeps being replicated, which is not the case in MySQL and Postgres. That's why I wanted to highlight it; this is a very important part of what it achieves and what it does.

We also learnt our lessons. The current architecture didn't just happen, like some kind of immaculate conception; it was a lot of pain to get here. You have to start somewhere, nobody was an expert at this, and it was a new challenge for everybody.
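As a hedged illustration of the ordering-plus-barrier idea just described (the names event, resolvedTS, barrierTS are hypothetical, not TiCDC's real API): buffered changes are sorted by commit timestamp and only flushed up to the smaller of the resolved timestamp and any pending DDL barrier for their table, so rows behind a schema change are held back while everything else keeps flowing.

```go
package cdc

import "sort"

// event is one committed row change, keyed by the table it touches
// and the commit timestamp stamped by the TSO.
type event struct {
	TableID  int64
	CommitTS uint64
}

// flushable returns the changes that are safe to push downstream, in
// commit-timestamp order. A change is emitted only when its commit TS
// is <= the resolved TS (everything at or below that point is known to
// be complete) and, if a DDL barrier is pending on its table, <= that
// barrier TS. Everything else is held back; other tables keep flowing.
func flushable(buf []event, resolvedTS uint64, barrierTS map[int64]uint64) (out, held []event) {
	sort.Slice(buf, func(i, j int) bool { return buf[i].CommitTS < buf[j].CommitTS })
	for _, e := range buf {
		limit := resolvedTS
		if b, ok := barrierTS[e.TableID]; ok && b < limit {
			limit = b // hold back rows that land after a pending DDL
		}
		if e.CommitTS <= limit {
			out = append(out, e)
		} else {
			held = append(held, e)
		}
	}
	return out, held
}
```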
The first solution was: okay, we have a node, it reads the data and sorts it in memory, and it had a polling mechanism. Most of our frontend software is in Go, so this is in Go, and it used Go channels and whatever else Go has. Sorted in memory, with etcd in the background for HA, where the metadata was stored, and these worker nodes would poll the data and do whatever it takes. But then came growth, and with the kind of data the customers were storing we would get OOM errors and all kinds of other problems. It didn't scale.

The second step was: okay, you can have a hybrid that spills over to disk. Then suddenly people had six million tables and we would run out of file handles.

Third, the final solution: before CockroachDB changed the license, we took a fork of Pebble. It's an LSM written in Go, so it works quite well, and you can index it. So the current model uses Pebble to store the data.

Now, if you have a lot of pages: as I mentioned, let's assume we have a million regions across three nodes (you would probably need more nodes, but say three). If you're attaching to them, that's roughly 330k pages per node. Imagine you have to put an observer on all of these: 330,000 observers, each with an independent stream trying to stream data. It won't scale. The same goes for the channels, and you don't want to poll on top of that either; it just doesn't work. Conceptually it would work, but it doesn't scale, and that's the hard engineering problem. Memory pressure: huge problem. CPU overhead too; I'll get to the CPU overhead in the pipeline later. These were real bugs that we had.

So you sit down and think. You need decentralization. Even though all this microservices stuff is not fashionable today, for this you need something like it, so that you can scale independent components, especially the part that does the conversion from the internal format to SQL or whatever raw format is required by whatever your downstream is. That is a very compute-heavy operation.
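A minimal sketch of the "spill sorted events into an embedded LSM" idea, using the real github.com/cockroachdb/pebble Go package (recent versions, where NewIter returns an error) but with a made-up key layout of commit timestamp followed by region ID; TiCDC's actual encoding is different. The point is that an LSM gives you sorted, indexable, bounded-memory storage for the event stream, so table count stops being a file-handle problem.

```go
package cdc

import (
	"encoding/binary"

	"github.com/cockroachdb/pebble"
)

// EventStore spills observed changes into an embedded LSM (Pebble)
// instead of holding them in memory, so the number of tables or
// regions no longer dictates memory use or file-handle count.
type EventStore struct {
	db *pebble.DB
}

func OpenEventStore(dir string) (*EventStore, error) {
	db, err := pebble.Open(dir, &pebble.Options{})
	if err != nil {
		return nil, err
	}
	return &EventStore{db: db}, nil
}

// key orders events by commit timestamp first, then region, so a
// plain forward scan of the LSM yields the globally sorted stream.
// (Hypothetical layout; the real encoding differs.)
func key(commitTS, regionID uint64) []byte {
	k := make([]byte, 16)
	binary.BigEndian.PutUint64(k[:8], commitTS)
	binary.BigEndian.PutUint64(k[8:], regionID)
	return k
}

func (s *EventStore) Append(commitTS, regionID uint64, value []byte) error {
	return s.db.Set(key(commitTS, regionID), value, pebble.NoSync)
}

// ScanUpTo visits all stored events with commitTS <= resolvedTS, in order.
func (s *EventStore) ScanUpTo(resolvedTS uint64, fn func(k, v []byte)) error {
	iter, err := s.db.NewIter(&pebble.IterOptions{
		UpperBound: key(resolvedTS+1, 0), // exclusive upper bound
	})
	if err != nil {
		return err
	}
	defer iter.Close()
	for iter.First(); iter.Valid(); iter.Next() {
		fn(iter.Key(), iter.Value())
	}
	return iter.Error()
}
```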
You want to be able to spread that out as much as you can, into a service of which you can have multiple instances running, reading in parallel from some store. And you want it to be event-driven, without polling. Polling, even in a general programming sense, works where you can hammer the system and always keep it busy, so there's no busy-waiting wasted; but that's not always the case. Polling puts a ceiling on you: you can't go faster than it if the events aren't coming in fast enough. So a push-based operation serves this kind of engineering much better.

So we divided this whole thing into four abstract pieces. There is an upstream adapter that talks to TiKV; a log service, which is stateful; a downstream adapter, which does the conversion; and the coordinator, which coordinates the cluster. Those are the four main services.

Going a little deeper: as I mentioned, you need the timestamp. You need the watermark, which tells you how far progress has gone on the Raft log. You need global aggregation: you need to know, across your entire cluster, what the minimum is, so that you don't delete anything that is still needed. This is quite similar to how something like Aurora works with the InnoDB undo log: you can't just look at one node, you have to look at where the read view is across your entire cluster. So you calculate that, you consume events, you sort them, and transaction reconstruction is basically mapping together all the events of a transaction, all the changes belonging to it.

Then you need the mounter. The mounter is misnamed, but that's what it's called in the source code on GitHub; it's really a transformer, and this is the compute-heavy part. It takes raw data, because you want to sort on the opaque data while it's more compact and easier to manage; once it's converted into its SQL types it becomes bigger and more difficult to sort. So the mounter, the transformer, is the part you can run many copies of and do a lot of in parallel.

This is roughly what it looks like: you have a log service where all the change feeds are coming in, and it writes into Pebble. You don't do any polling, and the number of tables is no longer a problem.
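A small sketch of the "global minimum watermark" aggregation mentioned above, with hypothetical names (globalResolvedTS, the per-region map): every region or node reports how far it has been consumed, and the cluster-wide resolved timestamp, the point up to which it is safe to emit downstream and to let Raft logs be truncated, is the minimum across all of them.

```go
package cdc

// globalResolvedTS computes the cluster-wide watermark: the minimum of
// every region's reported resolved timestamp. Nothing at or below this
// point can still be in flight, so it is the safe bound for emitting
// events downstream and for letting storage nodes truncate Raft logs.
func globalResolvedTS(regionResolvedTS map[uint64]uint64) (uint64, bool) {
	if len(regionResolvedTS) == 0 {
		return 0, false // no reports yet: nothing is safe to emit
	}
	first := true
	var min uint64
	for _, ts := range regionResolvedTS {
		if first || ts < min {
			min, first = ts, false
		}
	}
	return min, true
}
```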
We don't store events per table; we just store them keyed by the changes to the pages and the transactions.

Some of the other optimizations we had to make: if you look at number two, rather than the naive approach of one connection per Raft group, you have a single connection per node and multiplex over it. A connection per region will never scale, but you have to start somewhere, and that's how it started.

You also need to order some of the events. For that, internally, and this is an implementation detail, it uses, I think, a Go B-tree, some B-tree implementation in Go. You can also put bounds on the memory you use, because your storage is now on disk, and you have the ability to apply back pressure and slow the ingestion of your service. You don't want any of the services to die; you're much better off telling a service to slow down, and then the user just needs to add more resources to fix it. That's the basic idea of back pressure.

I'll skip that part. This is the part about the transformer. The rough idea I gave: there's the storage node, then the puller, which is also misnamed because it's actually push; then you sort the data, then transform the data, and the sink is where the data is received, some more magic happens, and it's sent downstream.

So what does it do? It decodes the key-value pairs, which is all the storage layer knows about; storage doesn't know about SQL. It also needs to know the schema. Because schema change is online, TiKV can have multiple schema versions live at once, and you shouldn't have to go back to the server to check the schema; that's an additional cost. You want a local version, a cache, of the schema that you can apply updates to: when you see the DDL, you apply the update and then use the latest schema, because some of your data will still be on the old schema and some will be on the new one. So you have to maintain a local schema cache (the first sketch below shows the idea).

By decoupling the whole encoding logic, you can scale it independently, and that's where the microservices idea comes about.

So how do you start all this? Because there's etcd, when you start the system up the instances register, all these different services register with etcd, and they elect a coordinator that does more housekeeping-like things; it doesn't do much more than that. The second sketch below shows roughly what that registration and election step looks like.
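Because rows written under the old and the new schema can be interleaved in the stream, the decoder needs a local, versioned view of the schema rather than a round trip per row. A hedged Go sketch of that idea; all names (schemaCache, TableInfo, keying by the DDL's commit timestamp) are hypothetical, and TiCDC's real schema storage is more involved.

```go
package cdc

import "sort"

// TableInfo is a placeholder for whatever the decoder needs to turn a
// raw key-value pair back into typed SQL columns.
type TableInfo struct {
	Version uint64 // commit TS of the DDL that produced this schema
	Columns []string
}

// schemaCache keeps every schema version seen so far for one table,
// ordered by the DDL's commit timestamp, so a row can always be
// decoded with the schema that was current when it was written.
type schemaCache struct {
	versions []TableInfo // sorted by Version, ascending
}

// ApplyDDL records a new schema version when a DDL event is seen in
// the replication stream.
func (c *schemaCache) ApplyDDL(info TableInfo) {
	c.versions = append(c.versions, info)
	sort.Slice(c.versions, func(i, j int) bool {
		return c.versions[i].Version < c.versions[j].Version
	})
}

// SchemaFor returns the newest schema whose version is <= the row's
// commit timestamp, i.e. the schema the row was written under.
func (c *schemaCache) SchemaFor(commitTS uint64) (TableInfo, bool) {
	i := sort.Search(len(c.versions), func(i int) bool {
		return c.versions[i].Version > commitTS
	})
	if i == 0 {
		return TableInfo{}, false // row predates every known schema
	}
	return c.versions[i-1], true
}
```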
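And a minimal sketch of the registration-and-election step, using the real go.etcd.io/etcd v3 client; the key prefix /cdc/coordinator, the node ID, and the endpoint are made up. Each instance opens a lease-backed session and campaigns; whoever wins acts as coordinator until its process dies and the lease expires, at which point another instance takes over.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	// Register against the same etcd the rest of the cluster uses.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A lease-backed session: if this process dies, the lease expires
	// and another instance can win the election.
	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	// Campaign to become the coordinator; blocks until elected.
	election := concurrency.NewElection(sess, "/cdc/coordinator")
	if err := election.Campaign(context.Background(), "node-1"); err != nil {
		log.Fatal(err)
	}
	log.Println("elected coordinator; starting housekeeping and scheduling")
}
```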
And then you schedule the change feeds that are going to talk to TiKV, and you get the whole system running. Because it's an HA system, it also has to do all these other tasks, like dynamic registration, and automatic recovery if some service fails. That's the other aspect of this. It's a complete system; it's not just connecting sockets. That's the point I wanted to make.

When there are failures, and at the scale at which TiKV runs there are always nodes going up and down, there's always some kind of problem, you don't want the service to be affected. So it handles things like split brain and all the other issues that come with any distributed system.

In version one, as I mentioned, there was no intermediate state, so crash recovery had to go all the way back and replay everything over RPC from TiKV, which took forever. Now, because it has local state, a new leader is chosen, it knows it needs to get the latest data from such-and-such log service that's running, it rebuilds from where it left off, and it carries on from there.

These are more or less covered already. Upgrades and downgrades it can handle; partitioned tables it handles quite easily. Let's skip this, otherwise I'll run over time. Yeah, this is also good enough.

I've mentioned all this, but I just want to go over it again. The schema is also stored in the state service, which is the log service, and the events are there too; the table stream you can ignore. Any local event in that cluster is also stored here, so there's no metadata in etcd; this is the state of the system.

The downstream adapter is for the events coming from the log service; it's more or less a reader for the log service, and it then talks to the downstream. The dispatcher mentioned there just handles internal CDC node changes.
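Tying the local-state and crash-recovery points together, here is a minimal, hypothetical sketch (again using the Pebble package; the checkpoint key and function names are made up): the last fully emitted resolved timestamp is persisted next to the events, so a restarted service resumes from that point instead of replaying everything from TiKV.

```go
package cdc

import (
	"encoding/binary"

	"github.com/cockroachdb/pebble"
)

// checkpointKey is a hypothetical well-known key under which the last
// fully emitted resolved timestamp is persisted alongside the events.
var checkpointKey = []byte("meta/checkpoint-ts")

// SaveCheckpoint records how far the downstream has been fed, so a
// restart does not have to replay everything from TiKV again.
func SaveCheckpoint(db *pebble.DB, resolvedTS uint64) error {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, resolvedTS)
	return db.Set(checkpointKey, buf, pebble.Sync)
}

// ResumeFrom returns the timestamp to restart the change feed from:
// the persisted checkpoint if one exists, otherwise zero (full rebuild).
func ResumeFrom(db *pebble.DB) (uint64, error) {
	val, closer, err := db.Get(checkpointKey)
	if err == pebble.ErrNotFound {
		return 0, nil // first start: no local state yet
	}
	if err != nil {
		return 0, err
	}
	defer closer.Close()
	return binary.BigEndian.Uint64(val), nil
}
```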
So what can it do? You can run massive change feeds and it scales very easily; it's designed to scale by adding more service instances. Large tables can be split for more efficient storage in the service. Transaction integrity is preserved, because the timestamps are exactly the ones that come from the cluster, so there is no ambiguity anywhere in the pipeline about how to order the events or the transactions.

It also supports Go plugins and other hooks, it has an extensible architecture, and we have connectors for Kafka and whatever else is the fashion of the day.

How much time is there for Q&A? Excellent. Okay. Anyway, there's the GitHub URL, so anybody who wants to look at the code, contribute, or learn, go for it.

So: it scales linearly, it has very high throughput, it's a cleaner architecture relative to what it was before, and it uses the elasticity of the cloud to scale very easily. So that's it. Thank you. Any questions? Anyone? Anyone? Yeah?

[Inaudible audience question.]

Yes, it's got checks on all of that. This is used by people who run very large clusters. Oh, how do you check the quality of the data? It has all those other details, checks and whatever else is required. Yes. Okay. Thank you.