Establish cadence for database reloads #1286

Closed
gravitystorm opened this issue Feb 2, 2015 · 25 comments

Comments

@gravitystorm
Owner

There are a large number of different organisations that use this style on their own systems, and they would prefer not to reload their databases too often when we want to make changes to the schema. We've had a moratorium on changing the .style file for over 2 years now, but the time is coming for this to change.

While it's tempting to think that if we change it once (i.e. add hstore) we'll never have to do so again, it's unlikely. We should therefore think of it as something we do occasionally, but weigh up the benefits to cartography versus the headaches for downstream users.

It would be useful to find out how often admins reload their databases anyway, for non-openstreetmap-carto reasons. It will also be good to have suggestions from tileserver admins as to how frequent would become inconvenient.

@pnorman
Collaborator

pnorman commented Feb 2, 2015

> It would be useful to find out how often admins reload their databases anyway, for non-openstreetmap-carto reasons. It will also be good to have suggestions from tileserver admins as to how frequent would become inconvenient

My recommendations have been to reload every 6 months or annually for performance reasons when running a high-demand worldwide osm2pgsql database server, and to re-index more frequently than that. I can't recall the exact figures for performance degradation at the 6 and 12 month marks, but I think it was over 25%.

I know of no one who actually does this. Everyone I've talked to reloads only when commissioning a new server.

@SomeoneElseOSM
Contributor

I'm guessing here, but I suspect that in terms of total numbers the largest number of instances of running stylesheets based on openstreetmap-carto or a derivative are actually smaller extracts rather than full planets. If these people are applying diffs then they'll probably be reloading every month or so because applying worldwide diffs (if they don't trim them first) to a regional extract is obviously going to make it significantly bigger.

Perhaps a straw poll in #osm on IRC or similar might turn up some more "reasons for update" and reload frequency? Another interesting question would be "how often do you merge style changes made to openstreetmap-carto to your map style"?

In my case (tiles created for personal use only on a small local server), I geographically trim diffs before applying and do use lua tag transformations, so it's about every 3 months or sooner if I've made a significant lua script change.

@pnorman
Collaborator

pnorman commented Feb 2, 2015

> If these people are applying diffs then they'll probably be reloading every month or so because applying worldwide diffs (if they don't trim them first) to a regional extract is obviously going to make it significantly bigger.

Geofabrik supplies daily diffs for their extracts, but I doubt they're used as much as they should be.

@SomeoneElseOSM
Contributor

@pnorman FWIW I suspect that the reason why people don't use Geofabrik's diffs as much as they might is that (a) they're a relatively new feature and people don't know that they exist, (b) they're daily rather than "near instantaneous" and (c) there may still be a need to trim the data if your "area of interest" doesn't exactly match an extract. For me the reason was (b).

@pnorman
Collaborator

pnorman commented Feb 3, 2015

cc @woodpeck as he might be one of the few others who uses openstreetmap-carto on a planet scale

@gravitystorm
Owner Author

I use it (and obviously have many other styles too). My experience matches what @pnorman said i.e. not actually reloading the database, but instead commissioning new servers. So it's around a 2-3 year cadence for database reloads.

I'm trying to find more details, but it looks like OSMF haven't reloaded in over a year.

@matthijsmelissen
Collaborator

> and obviously have many other styles too

Do these other styles use the same database? If so (and if that's the same for other server admins), that would complicate things a lot.

@gravitystorm
Owner Author

> Do these other styles use the same database?

Yes. Everything worked great until openstreetmap-carto started relying on ele being a string and now it's a bit more complicated! I've worked with a sort of "superset" .style file but I don't think it's a good approach.

In any case, let's not worry about that too much here otherwise we'll get tangled in knots.

@gravitystorm
Owner Author

> it looks like OSMF haven't reloaded in over a year.

"Best guess is 2013-07-16 for orm and 2013-08-08 for yevaud", so no OSMF reloads in the last 18 months.

@tomhughes

I would quite like to get the OSMF servers upgraded to Ubuntu 14.04 (they're two of the four machines still on 12.04) and ideally we would go to Postgres 9.3 at the same time, which would imply a database reload.

I had kind of been holding off a bit with the thought that mapnik 3 might finally appear and we might want to tie it to an upgrade of mapnik as well but maybe I should just give up waiting for that particular mythical creature to make an appearance ;-)

@pnorman
Collaborator

pnorman commented Feb 27, 2015

Closing. Based on this and conversations with others, the answer appears to be "when new servers are commissioned", or "never".

pnorman closed this as completed Feb 27, 2015

@nebulon42
Contributor

This is totally unsatisfactory. To be clear, I just mean the refusal of letting us establish a process for that.

@pnorman
Collaborator

pnorman commented Feb 28, 2015

> This is totally unsatisfactory. To be clear, I just mean the refusal of letting us establish a process for that.

This issue was a question. That question was answered.

@nebulon42
Contributor

But this has serious implications. Why bother with improving this style when it cannot evolve DB-wise?

@tomhughes

@pnorman I have to say that your conclusion seems extraordinarily pessimistic, and if it was based on my response then it certainly wasn't how I intended my response to be interpreted.

@pnorman
Collaborator

pnorman commented Feb 28, 2015

My comment doesn't establish a cadence for how often we're going to require database reloads. It summarizes how often databases are currently reloaded, which is what this issue was about.

pnorman reopened this Feb 28, 2015

@nebulon42
Contributor

When reading closer I think you are right. Sorry for misinterpreting.

@imagico
Collaborator

imagico commented Feb 28, 2015

For the actual question of when and how a style file change is going to happen, see also #1243.

@gravitystorm
Owner Author

Sorry, I've realised my initial posting was unclear. I would like, using this issue, for us to establish our cadence. The final paragraph was intended to suggest that we should take into account the frequency of reloads that happen anyway, which as @pnorman states above is approximately "when new servers are commissioned", or "never". But that was only supposed to be background information.

@gravitystorm
Owner Author

I would like to propose the following:

  • openstreetmap-carto tries to minimise the need for database reloads
  • gaps between database reloads will be at least 6 months and ideally 12-18 months
  • database reloads are indicated by major version number change
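
The last point gives downstream admins a simple check to run before upgrading. A minimal sketch, assuming release tags of the form `vMAJOR.MINOR.PATCH`; the helper names and the example tags are hypothetical and not part of openstreetmap-carto:

```python
def major_version(tag: str) -> int:
    """Return the major component of a tag such as 'v4.0.0' or '4.0.0'."""
    return int(tag.lstrip("v").split(".")[0])


def needs_reload(installed: str, target: str) -> bool:
    """True when upgrading from `installed` to `target` crosses a major version."""
    return major_version(target) > major_version(installed)


# Hypothetical tags, for illustration only.
print(needs_reload("v3.2.0", "v4.0.0"))  # major version changes
print(needs_reload("v4.0.0", "v4.1.0"))  # minor version changes only
```

Only upgrades are considered here; a downgrade across a major version would presumably need a reload as well.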

I'd also propose that we make our first database-reload-requiring change in April 2015, so the second one would be no sooner than October 2015 and preferably much later.
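
The date arithmetic behind that proposal can be sketched as follows. This assumes reloads are tracked to the month only (dates are normalised to the first of the month), and the function and constant names are hypothetical:

```python
from datetime import date

MIN_GAP_MONTHS = 6             # "at least 6 months"
IDEAL_GAP_MONTHS = (12, 18)    # "ideally 12-18 months"


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole months, normalised to the first of the month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def reload_window(last_reload: date) -> dict:
    """Earliest allowed and preferred dates for the next reload-requiring change."""
    return {
        "earliest": add_months(last_reload, MIN_GAP_MONTHS),
        "ideal_from": add_months(last_reload, IDEAL_GAP_MONTHS[0]),
        "ideal_to": add_months(last_reload, IDEAL_GAP_MONTHS[1]),
    }


window = reload_window(date(2015, 4, 1))
print(window["earliest"])  # 2015-10-01
```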

Part of my initial query was "how frequent would become inconvenient". Do we have any views on that?

@tomhughes

Well my thought was something like once a year might be reasonable, ignoring any special cases where we might do an extra one like moving to mapnik 3 or to vector tiles or something.

I guess every six months isn't impossible - the only real issue is admin time to actually do it I think.

On the question of how we might actually do it, how long does an import take on a reasonable machine these days?

@pnorman
Collaborator

pnorman commented Mar 2, 2015

> On the question of how we might actually do it, how long does an import take on a reasonable machine these days?

Under a day for SSD-based storage.

Of course, it takes longer if you want to do it while the machine is still rendering from the old database. This is actually reasonable if you stop updates to the old database, and it doesn't take much extra disk room.

  1. Stop updates
  2. Drop slim tables and osm_id indexes from the old database
  3. Create a new database and import. Create partial indexes
  4. Stop rendering server, rename databases, restart rendering server
  5. Drop old databases
  6. Adjust updates to be at the new date
  7. Resume updates
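
The steps above can be sketched as a dry-run plan that only prints commands and runs nothing. Everything specific here is an assumption for illustration: the database names (`gis`, `gis_new`, `gis_old`), the input file name, and the default `planet_osm` table prefix. Steps that depend on the local setup (how updates and the rendering server are run, which indexes exist) are left without a command.

```python
def reload_plan(live="gis", staging="gis_new", retired="gis_old",
                style="openstreetmap-carto.style", extract="planet.osm.pbf"):
    """Return the seven steps as (description, command) pairs; nothing is executed.

    A command of None marks a step that depends on the local setup.
    """
    # Slim-mode tables, assuming the default "planet_osm" prefix.
    slim = "planet_osm_nodes, planet_osm_ways, planet_osm_rels"
    return [
        ("Stop updates", None),
        ("Drop slim tables from the old database (osm_id indexes not shown)",
         f'psql -d {live} -c "DROP TABLE {slim};"'),
        ("Create a new database and import (partial indexes not shown)",
         f"createdb {staging} && "
         f"psql -d {staging} -c 'CREATE EXTENSION postgis;' && "
         f"osm2pgsql --create --slim -d {staging} --style {style} {extract}"),
        ("Stop rendering server, rename databases, restart rendering server",
         f"psql -d postgres -c 'ALTER DATABASE {live} RENAME TO {retired};' && "
         f"psql -d postgres -c 'ALTER DATABASE {staging} RENAME TO {live};'"),
        ("Drop old database", f"dropdb {retired}"),
        ("Adjust updates to be at the new date", None),
        ("Resume updates", None),
    ]


for number, (description, command) in enumerate(reload_plan(), start=1):
    print(f"{number}. {description}")
    if command:
        print(f"   $ {command}")
```

The renames are issued from the `postgres` maintenance database because PostgreSQL cannot rename a database that still has open connections, which is why the rendering server is stopped for that step.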

PostgreSQL has been releasing a new major version every year, and if downtime of the rendering server can be tolerated, it's good to combine a version update with a reload.

@kocio-pl
Collaborator

kocio-pl commented Mar 2, 2015

For me as an advanced mapper (still not a developer), a year is quite a long time in the life of the project, but it is still reasonable and it fits Andy's proposal perfectly, even with @tomhughes's special cases ("gaps between database reloads will be at least 6 months and ideally 12-18 months").

@kocio-pl
Collaborator

We already reloaded the database recently (with v4.0.0), and I think we can surely wait at least 6 months if there is a need to reload it again. But there may also be no such need for a longer time, so I don't believe we can set a "cadence" at all, only the minimum time between reloads.

@kocio-pl
Collaborator

I guess it's now clear and I can close this issue?
