Oracle and Postgres Redux

10 Dec

So, my little rant about Oracle got a lot more attention than I’d expected.

While I was mainly focused on the price of Oracle, now I’d like to explore my favorite alternative a little more: PostgreSQL (PG). I’ve been busy the last few days using the excellent Ora2Pg to convert my old 8i database. What I’ve done is a quick and dirty off-hours conversion for a project that I’ve been working on for several months just to see how PG would work.

The project is a dashboard, a business reporting tool that is all the rage these days. It includes every business indicator they could think of from hours a deal spent waiting for approval, to projected totals of deals not yet approved, to charts and graphs of deals long ago approved. Gone are the days when you could give the users a copy of Crystal Reports and let them at it. They expect much more.

In the beginning I was saddled with an enormous spec that delineated only some of what the application should do and a number of existing reports. Unfortunately, not one of them ran in under 10 minutes and they were each a knotted mess of SQL depending on an untold number of views and procs. It took months to streamline the SQL and create a new data warehousing project. When it was all said and done, 51 queries dominated the main screen of the dashboard.

And it still took 30 seconds to draw for simple searches.

That was when I began caching like mad. There were hundreds of thousands of possible search term combinations, meaning I couldn’t simply populate the cache ahead of time, but at least it only needed to run so often. Terracotta, which I’ll write about soon, was a huge help. It’s ten pounds of awesome in a five pound sack.

Caching is probably the only reason my users haven’t gone running into the hands of Oracle, thrusting dollars at them. It alleviates the load and allows only queries that haven’t yet been run to actually touch the database. I can get away with a whole lot less horsepower than I would need otherwise.

Porting

This isn’t meant to be a benchmark because that would be fantastically unfair to Postgres. I’ve spent an enormous amount of effort in Oracle optimization. It’s big job filled with expensive consultants who post weird pictures of themselves.

The schema contains materialized views built on partitioned tables. This greatly helps cut down the amount of data Oracle must sort through to fetch rows. The indexing has been deliberate and extensively tested, some times only to gain a .10 second difference. It’s been de-normalized as much as possible but the structure of the data made that difficult. A couple queries still touch as many as five tables. The queries themselves have also been written and rewritten with Oracle in mind, using inline views, subselects or whatever Oracle happened to fetch faster. I also made some use of optimizer hints like /*+ RULE */ and /*+ JUST_USE_MY_INDEX_ALREADY(table) */. I think of them as an implicit admission that the optimizer is broken if I have to tell it what to use.

But for Postgres I generally let Ora2Pg do it’s thing. I didn’t even bother changing the types on many of the columns, some of which defaulted to the slower arbitrary precision type numeric. Postgres supports more sensible integral types like int2 and int4. I also didn’t bother partitioning the tables similar to my Oracle setup, and I didn’t even bother compiling it myself to my architecture for the best performance. I pulled 8.3 using apt-get from the Ubuntu repository.

In short, I did about as little as possible. The only changes I made to the queries were the simple syntax differences between the two. Such as Postgres requires the "AS" keyword in the SELECT expression. Or a few type casts here and there. I found that a few of my indexes depended on Oracle’s trunc function, so I created my own function that simply returned the result of a ::date cast and indexed that. There’s a lot of room to improve these results.

Results

So without further ado, here’s how an unoptimized PG faired:

That’s pretty impressive.

17 Responses to “Oracle and Postgres Redux”

  1. matt harrison 10. Dec, 2008 at 4:52 pm #

    Am curious as to how you ‘ported’ your materialized views. (A quick glance at ora2pg didn’t find anything)

  2. Mike Johnson 10. Dec, 2008 at 5:06 pm #

    You’re right I didn’t find anything in Ora2Pg either. Exporting with the VIEW option didn’t dump them.

    But all of my views were done using ON PREBUILT TABLE so I could control the partitioning, so Ora2Pg copied them over with the rest of the data.

  3. Alex 10. Dec, 2008 at 6:57 pm #

    I’m working on a project where we have to choose between Oracle and Postgresql and this was helpful to read. Thanks for posting!

  4. Emanuel 10. Dec, 2008 at 8:19 pm #

    The statistisc you show are inmediatly finish the migration?
    Remember that the clean dumped data in postgres maybe help in numbers…
    In that case you must dump cleany the data on oracle and then run again new numbers…
    In adding, Oracle 8i is oldie…
    but in the other hand is exelent the post!

  5. Mike Johnson 10. Dec, 2008 at 9:17 pm #

    Well, I just recently loaded the Oracle data for development purposes, though not as recently as the PG. I only use Java PreparedStatements and I ran both several times before taking 4 timed runs for the averages shown. That should have been plenty of time for both databases to cache the statements.

    I did run a VACUUM ANALYSE FULL after loading PG. Oracle’s table stats are up-to-date.

    Plenty of people are giving me a hard time about comparing 8i to a new Postgres though. :-)

    Seems like there’s enough interest that I’ll go ahead and try 10g and post the difference when I have some more time.

  6. RDL 10. Dec, 2008 at 9:21 pm #

    I’m curious…

    What’s the HW config (cores, memory, disks, etc.) for each DB server?

  7. Scott Swank 10. Dec, 2008 at 9:42 pm #

    If you do try this on 10g or later, make sure that you get rid of those hints. While they may have made sense on 7.3 or 8.0, you really don’t want to screw up the 10g optimizer with them.

  8. Greg Smith 10. Dec, 2008 at 11:35 pm #

    If you didn’t do any configuration optimization for PostgreSQL, you might get a nice boost just from tweaking a few parameters. The default configuration is only appropriate for a system with about 128MB of RAM, it’s “optimized” for working out of the box on a wide range of systems rather than performance.

    There’s a fairly focused list at http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server that goes over where to start. For what you’re doing, touching shared_buffers, effective_cache_size, and work_mem should be sufficient to improve things quite a bit with only a few minutes of effort. Add checkpoint_segments if you care about speeding up write-intensive things, too. Getting those basics right gets you more bang per time spend tuning than trivia like recompiling for your architecture.

  9. Mike Johnson 11. Dec, 2008 at 12:52 am #

    You’re right of course. My intention was to see what little I could get away with to beat Oracle’s times. It wasn’t a lot. Of course, I tweaked our production server pretty extensively.

  10. Mike Johnson 11. Dec, 2008 at 1:02 am #

    I’m running both on my development workstation one at a time. An Ubuntu Hardy install 2.40Ghz dual core, 2 gig RAM with a single SATA drive.

    Of course, Oracle would benefit some what more that Postgres by keeping the logs on separate drive. Postgres keeps just about everything in the table and seems to be well served by a RAID configuration.

    But I think because there’s no write activity the log location wouldn’t make much of a difference for this experiment.

  11. Anonymous 11. Dec, 2008 at 2:20 am #

    Materialized views in postgres are fairly well documented. [http://jonathangardner.net/tech/w/PostgreSQL/Materialized_Views]

  12. Greg Smith 11. Dec, 2008 at 7:15 am #

    Sure, I get that, just pointing that even a few minutes of basic tuning might get you a nice improvement. The default PostgreSQL configuration is really awful, because of kernel restrictions actually. It’s setup to work even on something like a stock Linux install that only allows 32MB of shared ram to be allocated, whereas Oracle just flat out tells you need a larger SHMMAX to support any reasonable SGA size as part of the install instructions.

    Also, above I saw you mention doing a VACUUM FULL ANALYZE. The FULL part of that isn’t needed–if you do regular VACUUM often enough or have autovacuum on, you shouldn’t even need to use FULL. And FULL can cause some problems with your indexes becoming slower than they should be; a full vacuum is not something you should ever need to do, except if you’re deleting a large quantity of information and need the space back. Just a random aside, it shouldn’t have any impact on what you did.

  13. JD 11. Dec, 2008 at 9:17 am #

    While your graph is pretty, it’s invalid. You’re dealing with completely independent data points, so you should be using a bar graph, not a line graph, which should only be used for graphing a series of continuous data points.

    Also, I’m curious what you’re going to say about terracotta, as you appear to be using it as a cache, when it’s not designed for that. Terracotta is a distributed shared memory system. If you want distributed caching, you should be using something like memcache.

  14. Emanuel 11. Dec, 2008 at 3:44 pm #

    Adding to my first comment.
    I think it was interesting if you can show the querys where oracle have better or similar runtime, just for curious …
    could it be?

  15. Richard 11. Dec, 2008 at 7:45 pm #

    Well, the DBA guy with the weird picture has since changed his profile pic.
    I hope you’re comfortable with the soul-searching you inadvertently hoisted on him. (i would be)

  16. Mike Johnson 11. Dec, 2008 at 9:06 pm #

    Yeah, I saw that. Now I feel bad. :-)

    Having just got out of the Marine Corps, I’m used to giving people a good ribbing for just about anything. Sometimes I forget that people will actually get upset.

    I will follow my 12 steps more carefully.

    http://www.robichaux.net/blog/2008/01/12step-program-for-recovering-marines.php

    Although, he should take some comfort in the fact that I mentioned him because there’s no other source as knowledgeable as he is on performance. Every time I have some weird problem with Oracle, I google it. He’s almost always the first result, and his articles are always helpful and well constructed.

  17. www.simononsoftware.com 08. Jan, 2009 at 3:53 pm #

Leave a Reply