WEBVTT 00:00.000 --> 00:08.000 Well, thank you so much. 00:08.000 --> 00:14.000 So, over the last couple of years, I've been leaning more and more into a lot of, like, 00:14.000 --> 00:19.000 I don't even say more advanced, but like, beyond tables in Postgres. 00:19.000 --> 00:23.000 So, like, views and functions and triggers and stuff. 00:23.000 --> 00:28.000 And I hit a wall when I was trying to figure out the best way to migrate my schema. 00:28.000 --> 00:32.000 And so, I started to go digging into how I wanted to do that. 00:32.000 --> 00:36.000 And I ended up rolling my own. 00:36.000 --> 00:40.000 This talk is not about the tool that migrates my schema. 00:40.000 --> 00:44.000 But it's about how I was able to solve a really critical question. 00:44.000 --> 00:51.000 As part of that, which is, how do I end the tool that truly understands the Postgres schema? 00:51.000 --> 00:57.000 Not just like running the, you know, SQL files, not running migrations against it, not storing it. 00:57.000 --> 00:59.000 But like, actually understanding it. 00:59.000 --> 01:03.000 What objects exist? How do they relate? And what depends on what? 01:03.000 --> 01:05.000 So, I know I had two options. 01:05.000 --> 01:09.000 I could write an SQL parser. 01:09.000 --> 01:14.000 I could parse those, create table, create function, create view, all of that. 01:14.000 --> 01:18.000 Handle all the edge cases and keep up to date with every Postgres release. 01:18.000 --> 01:23.000 Or, option B, I could just ask Postgres. 01:23.000 --> 01:26.000 It already parses SQL. That's kind of its whole job. 01:26.000 --> 01:32.000 And so, what if I let it do all the work and ask it what it knows? 01:32.000 --> 01:36.000 And also, I really, really didn't want to write a SQL parser. 01:36.000 --> 01:40.000 Because, like, what I have wanted to parse this, I mean, there's, look at that. 01:40.000 --> 01:43.000 We've got, like, a returns table with, like, four different arguments. 01:43.000 --> 01:46.000 Some of, there's sudden array with a default up there. 01:46.000 --> 01:49.000 There's a bunch of JSONB stuff down below. 01:49.000 --> 01:51.000 And even if you thought you could, like, skip over the whole thing. 01:51.000 --> 01:54.000 Notice that there's, like, an end for the begin atomic. 01:54.000 --> 01:56.000 And then there's also an end for the case statement. 01:56.000 --> 01:59.000 I knew I would just be messing the whole thing up the whole time. 01:59.000 --> 02:02.000 So, I didn't want to go down that path. 02:02.000 --> 02:06.000 And if you look at, like, what Postgres knows about a function like that. 02:06.000 --> 02:09.000 You know, it's got the name, it's got the argument types. 02:09.000 --> 02:13.000 It's got all these names. It's got the whole tree, whether it was stable. 02:13.000 --> 02:16.000 And that's kind of the information that I want to have. 02:16.000 --> 02:19.000 So, how do I go about actually setting that up? 02:19.000 --> 02:22.000 So, I was pretty simple. 02:22.000 --> 02:25.000 And I kind of leaned on this, like, shadow database pattern, 02:25.000 --> 02:29.000 which I've, so elegantly put up here in three steps, 02:29.000 --> 02:32.000 which is, you start with actual raw SQL files, 02:32.000 --> 02:36.000 so whatever you got, one file, many files, just whatever. 02:36.000 --> 02:39.000 And then you need some form of, like, a temporary postgres instance. 02:39.000 --> 02:42.000 So, a container would make sense, but, you know, 02:42.000 --> 02:46.000 if you just have some local running instance to, like, you can use that. 02:46.000 --> 02:51.000 And then you load it all in there, and you start in the system catalogs, 02:51.000 --> 02:56.000 which is super queryable, and you can get back everything you have. 02:56.000 --> 03:00.000 Of course, this does mean that you have to have a running postgres instance, 03:00.000 --> 03:05.000 but I'm kind of assuming you already do if you're doing some work with Postgres. 03:05.000 --> 03:07.000 Or at least have an easy way to access it. 03:07.000 --> 03:10.000 So, there's a whole bunch of system catalogs. 03:10.000 --> 03:13.000 You may have explored some of these before. 03:13.000 --> 03:15.000 I think there's, like, more than 50. 03:15.000 --> 03:18.000 You know, the big ones, PG classes like the big one. 03:18.000 --> 03:20.000 Anything that's, like, a table or, like, table, 03:20.000 --> 03:23.000 S, like, has columns or anything like that. 03:23.000 --> 03:25.000 The attributes or the, you know, columns on there, 03:25.000 --> 03:30.000 I'm not going to be this entire slide, but there's a lot of good stuff in there. 03:30.000 --> 03:34.000 And so, you can start, you know, querying them, 03:34.000 --> 03:36.000 and so you have a little table here, and you say, okay, 03:36.000 --> 03:39.000 what's going to be, you know, what are my columns, what are the defaults, 03:39.000 --> 03:42.000 and, you know, it takes a lot of reading the Postgres docs, 03:42.000 --> 03:46.000 but you can really start to understand some pretty rich details about 03:46.000 --> 03:47.000 about what you're looking at. 03:47.000 --> 03:50.000 And what's really cool is, you know, you're not even looking at the 03:50.000 --> 03:52.000 Create statement that would have created this table, 03:52.000 --> 03:55.000 but, like, I'm sure you can visualize in your head, like, 03:55.000 --> 03:56.000 what it would have looked like. 03:56.000 --> 03:59.000 I think everybody would have had slightly different formatting of what that 03:59.000 --> 04:01.000 Create statement would look like, but the, you know, 04:01.000 --> 04:03.000 the overall vibes are still there. 04:03.000 --> 04:06.000 And so, then if you've probably already done stuff like this before, 04:06.000 --> 04:07.000 so I won't belabor it. 04:07.000 --> 04:11.000 But the interesting bit is when we start getting into dependencies. 04:11.000 --> 04:13.000 Because order matters. 04:13.000 --> 04:15.000 You've probably seen something like this before. 04:15.000 --> 04:17.000 You go to, you know, drop a table, which, you know, 04:17.000 --> 04:19.000 we all do in production all the time. 04:19.000 --> 04:23.000 And Postgres says no, because it's a bunch of stuff depends on it. 04:23.000 --> 04:26.000 Or you try to, like, drop a type, because you want to, you know, 04:26.000 --> 04:27.000 do something with it. 04:27.000 --> 04:31.000 And it, again, says no, there's a whole bunch of dependencies on it. 04:31.000 --> 04:35.000 And so, if I'm trying to generate some TDL for, like, 04:35.000 --> 04:39.000 migrations or scheme a dump or whatever, that order really, really matters. 04:39.000 --> 04:43.000 And you can't create a view that references a function that doesn't exist yet. 04:43.000 --> 04:47.000 And I can't drop a table that a function still needs without using a cascade, 04:47.000 --> 04:49.000 but it's a little scary. 04:49.000 --> 04:52.000 And so, you know, we have this little dependency graph here, 04:52.000 --> 04:54.000 but how do we get it? 04:54.000 --> 04:57.000 Well, it turns out Postgres has the graph, 04:57.000 --> 04:59.000 and it's in something called PG-depend. 04:59.000 --> 05:03.000 And so, you know, Postgres wasn't just complaining 05:03.000 --> 05:05.000 retroactively when you try to drop things. 05:05.000 --> 05:08.000 It is storing the entire dependency graph in PG-depend. 05:08.000 --> 05:15.000 Every dependency is a row in this kind of view here that I've kind of adjusted a little bit, 05:15.000 --> 05:19.000 because those should be object IDs, and I put the names in there for readability here. 05:19.000 --> 05:25.000 But you can see that the pending order account function depends on the pending order's view, 05:25.000 --> 05:28.000 and the pending order's view depends on the order's status, 05:28.000 --> 05:29.000 and that's it. 05:29.000 --> 05:33.000 That's our whole chain, and we can look right there. 05:33.000 --> 05:36.000 And so, from there, to figure out some ordering, 05:36.000 --> 05:40.000 the graph gives you all the edges, and then you can just do a little topological sort. 05:40.000 --> 05:42.000 You go one direction if you need to do pre-ates. 05:42.000 --> 05:45.000 You go the other direction if you need to do some drops. 05:45.000 --> 05:50.000 However, there are a few nuances that you should be aware of. 05:50.000 --> 05:54.000 There was a column a couple slides ago that I kind of glossed over, 05:54.000 --> 05:56.000 and that was this dependency type. 05:56.000 --> 05:59.000 And there's more than just the four I have labeled here, 05:59.000 --> 06:02.000 but I'll go through them kind of quickly. 06:02.000 --> 06:04.000 So N is for normal. 06:04.000 --> 06:06.000 These are like the real dependencies between the objects. 06:06.000 --> 06:08.000 You created something, it depends on something else. 06:08.000 --> 06:11.000 You know, the kind of examples I gave before. 06:11.000 --> 06:14.000 The views depend on a function, it depends on table, whatever. 06:14.000 --> 06:17.000 Now the next one to start getting a little more interesting. 06:17.000 --> 06:18.000 A is automatic. 06:18.000 --> 06:22.000 Those are going to be like other objects that you could drop yourself, 06:22.000 --> 06:24.000 but they're created together. 06:24.000 --> 06:27.000 So like if you have a serial column, it creates a sequence, 06:27.000 --> 06:31.000 or there's a primary key has an index. 06:31.000 --> 06:34.000 So they're created together and they're kind of linked. 06:34.000 --> 06:36.000 And then there's I for internal. 06:36.000 --> 06:39.000 This is going to be stuff like your toast or some of the array types, 06:39.000 --> 06:42.000 which we'll get to in a moment, and then ease our extensions. 06:42.000 --> 06:47.000 And so like if you're querying something and the extension created a function, 06:47.000 --> 06:51.000 then you'll see that here. 06:51.000 --> 06:54.000 And so you start kind of going through these, 06:54.000 --> 06:57.000 and there are some catches, though. 06:57.000 --> 07:02.000 And so PG depend can't see stuff that's in string literals. 07:02.000 --> 07:07.000 By default, Postgres will enforce that the stuff exists at create time. 07:07.000 --> 07:11.000 But you generally can't create a function referencing a table that doesn't exist, 07:11.000 --> 07:13.000 but the dependency agent actually tracked. 07:13.000 --> 07:18.000 You could drop the table later and like PG depend won't care. 07:18.000 --> 07:22.000 And so for some of this stuff, there's really easy fix. 07:22.000 --> 07:24.000 If you're somewhat recent version of Postgres, 07:24.000 --> 07:26.000 which is just used begin atomic. 07:26.000 --> 07:30.000 And then it actually parses the whole query and stores it, 07:30.000 --> 07:32.000 and then it knows all of those things. 07:32.000 --> 07:35.000 It doesn't just sort as a string. 07:35.000 --> 07:40.000 So now we know that the function body depends on a whole bunch of things. 07:40.000 --> 07:44.000 That only works if you're using language SQL. 07:44.000 --> 07:49.000 So if you need PLPGSQL or any of the other things for any other reasons, 07:49.000 --> 07:51.000 then it's back to storing as a string. 07:51.000 --> 07:53.000 And so there's no fix for that. 07:53.000 --> 07:56.000 So you should use bigototomic, whether you can, 07:56.000 --> 07:59.000 but for everything else, you know, know the limits. 07:59.000 --> 08:02.000 And then I started to talk about this a little bit when we're talking about the dependency types. 08:02.000 --> 08:05.000 But there's implicit objects. 08:05.000 --> 08:08.000 So in this example here, you create a table. 08:08.000 --> 08:10.000 We have this serial primary key. 08:10.000 --> 08:13.000 And so behind the scenes, it's creating a constraint. 08:13.000 --> 08:16.000 It's creating an index and it's creating that sequence. 08:16.000 --> 08:18.000 And those are those dependency type A. 08:18.000 --> 08:22.000 At the end of the day, PG depend is mostly just that they're to yell at you 08:22.000 --> 08:24.000 if you try to drop something. 08:24.000 --> 08:30.000 And so these are A's, which means it'll just drop the other things if you do it won't stop you. 08:30.000 --> 08:33.000 But it's relevant information. 08:33.000 --> 08:35.000 And then there's array types. 08:35.000 --> 08:37.000 And these ones have a little bit of a different quirk. 08:37.000 --> 08:42.000 So like every type, you get as a array sibling. 08:42.000 --> 08:48.000 So if you create this order status, what you're actually going to see in the type catalog is you've 08:48.000 --> 08:49.000 loaded status. 08:49.000 --> 08:51.000 And then it's generally prefix with an underscore. 08:51.000 --> 08:54.000 But I think there's some sort of like collision detection things. 08:54.000 --> 08:55.000 So maybe it's different. 08:55.000 --> 08:59.000 Definitely don't like do like if it starts with an underscore thing. 08:59.000 --> 09:00.000 That's that's bad. 09:00.000 --> 09:02.000 But it creates these two. 09:02.000 --> 09:05.000 And that's that internal dependency type. 09:05.000 --> 09:14.000 And so what's interesting is now if I create a table that has something that uses that array type. 09:14.000 --> 09:17.000 Now I have to make two different jumps on PG depend. 09:17.000 --> 09:22.000 Because technically the table depends on that internal array type. 09:22.000 --> 09:29.000 And then internal array type looks at the actual type. 09:29.000 --> 09:33.000 So you've got to be able to follow those jumps there. 09:33.000 --> 09:39.000 And it's a somewhat similar story depending on what you need to do with the extension types as well. 09:39.000 --> 09:44.000 And so you also want to be a little bit honest with some of the trade-offs with this approach. 09:44.000 --> 09:46.000 So you do need money postgres. 09:46.000 --> 09:48.000 We talked about that one already. 09:48.000 --> 09:53.000 You do need to build support for like every object type explicitly. 09:53.000 --> 10:02.000 There's a lot of catalog tables and there's a lot of different things that you might not know about for all the different types of objects. 10:02.000 --> 10:07.000 So you kind of have to actually read the docs and codify all those things. 10:07.000 --> 10:08.000 So like this isn't magic. 10:08.000 --> 10:11.000 You're still offloading all the parsing to postgres. 10:11.000 --> 10:15.000 But you still need to build the queries that understand all those things. 10:15.000 --> 10:20.000 The postgres matters. Postgres versions do matter catalogs do evolve. 10:20.000 --> 10:30.000 They don't evolve as quickly as the I would say like the actual like language like syntax does. 10:30.000 --> 10:33.000 But things do change like that as well body. 10:33.000 --> 10:36.000 Does require a later version of postgres. 10:36.000 --> 10:40.000 And so sometimes we'll need to do some switching there if you try to support multiple versions. 10:40.000 --> 10:45.000 And then most importantly and this kind of relates to all these points you just review whatever out. 10:45.000 --> 10:52.000 But you have don't try to do anything super magical because there might be some feature that you didn't know existed and so you're not looking at. 10:52.000 --> 10:55.000 So don't just blindly trust it because postgres is doing the parsing. 10:55.000 --> 10:59.000 Like you still have to do some critical thinking of your own. 10:59.000 --> 11:05.000 So yeah overall we've got the shadow database let postgres to the parsing. 11:05.000 --> 11:07.000 We've got to clear the catalogs. 11:07.000 --> 11:15.000 Look through all the catalogs read all the docs implemented all PG depend is your friend for basically anything that requires ordering ever. 11:15.000 --> 11:22.000 So that's going to be you know the graph you can filter on the dependency types to know which ones you care about and how you want to treat them. 11:22.000 --> 11:28.000 Definitely know the blind spots and watch out for those implicit objects and then sort the whole thing. 11:28.000 --> 11:34.000 And so to kind of talk about the thing that IT's in the beginning I did build a tool that used all of this. 11:34.000 --> 11:41.000 This was my migration tool that needed to have really intimate knowledge of all of the different objects in the tree. 11:41.000 --> 11:49.000 So again this isn't to talk about that but it is schema as code for postgres so you add to the SQL files. 11:49.000 --> 11:55.000 We generate the diff from there and then let you review it and do everything you'd want to do there. 11:55.000 --> 12:02.000 And so you can find it on GitHub or the little website there you know give me a star or an issue or maybe even a PR. 12:03.000 --> 12:17.000 But of course all of these things we just talked about aren't just for migrations you could use it for drift detection some sort of schema lenders CI validation you know visualizations are always super fun. 12:17.000 --> 12:27.000 And that is so nervous about fitting under 20 minutes that I was able to get under 15 so that's the whole idea. 12:27.000 --> 12:39.000 Postgres knows your schema it parses SQL that's its whole point tracks dependencies and if you understand the types and columns then you don't have to reimplement it just ask postgres.