Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PIO needs a way to change the ncid of a file... #1457

Closed
edwardhartnett opened this issue Aug 9, 2019 · 18 comments
Closed

PIO needs a way to change the ncid of a file... #1457

edwardhartnett opened this issue Aug 9, 2019 · 18 comments

Comments

@edwardhartnett
Copy link
Contributor

@DennisHeimbigner I have a little itty-bitty problem...

PIO/netCDF integration is working great, but it turns out that PIO needs to reassign ncids after the NC struct has been created and added to the list.

Recall that with PIO, different computational units (each N processors) can create/open netCDF files, all through a single set of dedicated I/O processors. That means that the I/O processors must be in charge of ncids, so that there is no conflicts.

The way it works now is from dfile.c NC_open() function:

    /* Create the NC* instance and insert its dispatcher */
    if((stat = new_NC(dispatcher,path,omode,&model,&ncp))) goto done;

    /* Add to list of known open files. This assignes an ext_ncid. */
    add_to_NCList(ncp);

    /* Assume open will fill in remaining ncp fields */
    stat = dispatcher->open(ncp->path, omode, basepe, chunksizehintp,
			    parameters, dispatcher, ncp->ext_ncid);

So the NC is created and added to the NC list before the PIO open is called.

In the PIO open, I need to determine the correct ncid, and reassign the file to that new ncid.

To support this I am adding two functions to netcdf-c. One of them changes the ncid info in the libsrc4 metadata:

/**
 * @internal Change the ncid of the file. Use with extreme
 * caution. This is used by the (external) PIO dispatch layer.
 *
 * @param ncid The ncid of the file (aka ext_ncid).
 * @param new_ncid The new ncid to use for this file.
 *
 * @return ::NC_NOERR No error.
 * @return ::NC_EBADID No NC struct with this ext_ncid.
 * @author Ed Hartnett
 */
int
nc4_file_change_ncid(int ncid, int new_ncid)
{
    NC *nc;
    int ret;

    /* Find NC pointer for this file. */
    if ((ret = NC_check_id(ncid, &nc)))
        return ret;

    /* Move the NC strut to new position in list. */
    if ((ret = move_in_NCList(nc, new_ncid)))
        return ret;

    /* Change the ncid of the file. */
    nc->ext_ncid = new_ncid;

    return NC_NOERR;
}

I am having some trouble putting together the other function I need, to move an item in the NC list.

Also I wonder what effect this will have on the selection of the next available ncid. I think it will work fine actually.

What I would like to do as a first step is to document and write tests for the code in nclistmgr.c relating to the ncid stuff. That way I will have a better understanding of what is going on.

@edhartnett
Copy link
Contributor

@WardF I have this working but am still testing. I will put up a PR tomorrow. I hope this can make it to the 4.7.1 release. It is required for PIO integration.

@WardF
Copy link
Member

WardF commented Aug 22, 2019

@DennisHeimbigner I'm going through all of GitHub PR's and such now, any input/thoughts on this?

@DennisHeimbigner
Copy link
Collaborator

I think there a lot of consequences to this change. I still do not understand
exactly why the ncid needs to change. Ed, can you spell out the scenario
in detail. If you were passed the NC* instead of ncid, would the problem
go away?

@DennisHeimbigner
Copy link
Collaborator

I am trying to figure out the consequences for this.
Is there an issue with changing the ncid's of all files
with higher ncid's than the one you are changing
-- attribute deletion has this problem, for example.
Or are you assuming a hole would be left in the file list?

@DennisHeimbigner
Copy link
Collaborator

Would it be possible for pio to use its own ncid equivalents and
use a hash table to map those to the real ncids?

@edhartnett
Copy link
Contributor

edhartnett commented Aug 22, 2019

I believe this is something that can be in place for PIO, with minimum code impact and no affect on non-PIO users.

Why is it needed?

Imagine I have 10K processors. I assign 1K to I/O, 1K to an ocean model, and 8K to an atmospheric model.

Recall that with PIO, the I/O processors do all I/O for both the ocean model and the atmospheric model.

The ocean model creates/opens a file. The computation processors assign an ncid (file ID = 1), and so do the I/O processors (file ID = 1); the IDs are the same since this is the first file.

Then the atmospheric model opens a file. It assigns an ncid (file ID = 1). It has no knowledge of the ocean model and what it has done. The I/O processors assign ID = 2 (since ID 1 is already in use). The I/O processors know that the ocean model has opened a file.

Hilarious confusion ensues.

Now here's how it works after ncids can be reassigned.

The ocean model opens a file. The I/O processors open it (file ID = 1) and they assign an ncid to the ocean model code (still 1).

Now the atmospheric model opens a file. The I/O processors open it (file ID = 2) and they assign an ncid to the atmospheric model (ID = 2).

So this is why PIO needs to assign ncids. The ncids have to be generated in the I/O processors, and assigned in the computational processors.

Make sense?

As I say, this will have no effect on normal netcdf operation. Just two extra functions which move the NC.

@edhartnett
Copy link
Contributor

I might add that these two functions are all that remain of #555

The rest of #555 has been accomplished using the user-defined format feature. But PIO needs the ability to reassign ncid. This makes a lot more sense to put in the netcdf library than the PIO library, because it would require that PIO be aware of lots of netCDF internal structs.

@DennisHeimbigner
Copy link
Collaborator

In your example, it appears that you are assuming that each processor
(or is it sets of processors?) have independent copies of the global
state of the netcdf-c library (else they would alll see the see the same set
of file ids in use). Is this correct?

@edwardhartnett
Copy link
Contributor Author

Each processor has its own global state. For processors in the same computational unit, that state will be identical. So if all 1000 processors are in the same computational unit, and call nc_create, they will all start with the same starting ncid, and increment it by one, so they can all assign ncids and they are all the same.

But consider the case of multi-level parallelism. Now 300 processors are running an ocean model, 600 are running an atmospheric model, and 100 are doing I/O.

The ocean model creates a file and gets the first ncid. It tells the I/O processors to create the file, and the I/O processors also use the first ID. So far, so good.

Now the atmospheric model creates a file. It uses the first ncid, but the I/O processors are already using that. The first ncid is in use, so the second ncid is used. (Don't forget that the I/O processors are the only ones really opening a file when nc_create() is called with PIO.)

So now the I/O cores have to reassign the ncid to the atmospheric model. Once they do, everything works fine.

There are no serious consequences to adding this function. If it is not called, it does not do anything. As with any of the functions in nc4internal.c, if called incorrectly, damage may result. The same would be true if the user deleted a var from a var list by calling nc4_var_list_del() inappropriately. I am not proposing that this become part of the public API. It is an internal function that PIO uses. So it is hard to see the objection here.

Passing the NC struct would of course also work. The point of this function is to hide knowledge of the netcdf internals from PIO. If you decline to merge this code, then I will simply have to copy a bunch of netcdf internals into the PIO library, causing even greater exposure, which is not what any of us want. With this change, the PIO library does not need to know about the NC struct, and netcdf-c does not need to know about any PIO internals.

As I mentioned during various meetings about PIO, some changes to the netcdf-c library are required to support them. This is that change. The point is to bring these great HPC features to the users. Adding this one function is a remarkably small change to accomplish so much.

These features will be very exciting to HPC users. Being able to scale to thousands of cores is a transformatioal improvement to netCDF. Being able to re-use existing netCDF code is a huge win for the science community, and netCDF.

@edwardhartnett
Copy link
Contributor Author

To make things a little more theoretical, the issue here is really that ncid is assigned before the dispatch layer is called. So the dispatch layer does not control the assignment of ncid. Another way to approach this would be to have the dispatch layer code assign ncids, but then we have the problem of coordinating between different dispatch layers. And the code to assign ncid would be repeated everywhere, but the need to control ncid is probably restricted to the odd but important case of multi-level parallelism.

So in the case of multi-level parallelism, I determined that being able to move the ncid had the minimal impact on netcdf-c code.

@DennisHeimbigner
Copy link
Collaborator

Short answer, yes. There are multiple copies of the netcdf-c library
global state. ANd changing the ncid is a hack to achieve the effect
of having a single copy of the file list part of the global state.

One question that comes to mind is that within a processor, it appears
that the ncid list is being simultaneously accessed by multiple "threads"
(my term) within the, say, IO processor. so how do you prevent race
conditions?

@edwardhartnett
Copy link
Contributor Author

@DennisHeimbigner not sure what you are answering "yes" to.

There are multiple copies of the netcdf global state. They are not shared. Multi-processing means a bunch of processors executing the same code with independent memory. So if 300 processors are involved in a computational unit, each of them have their own state for netCDF. But since they are all executing the same code, that state changes in lockstep.

Do not confuse this with threads. There is no connection. If netcdf-c is made threadsafe, it will be threadsafe on each of the 300 processors. Changing the ncid would obviously be one of the changes in global state that have to be protected to achieve thread safety, just as any operation on the nc4internal lists would be if they change the state of the list. So in nc4internal.c there would be some code around all changes to the lists that is protected with a semaphore. And the ncid changing code too, would be protected with that semaphore.

It is not "an" I/O processor, it is 100 I/O processors, all running the same code. So there is no possibility of race conditions. They do not share the same internal state, but they execute exactly the same calls and MPI makes each processor to wait for the others to catch up. When a file is created it is created on all 100, and none of them continue processing until all have created the file and agree to start using it (PIO enforces that).

I know this is confusing. Perhaps I should drop by Unidata and meet in person to explain it.

@DennisHeimbigner
Copy link
Collaborator

From a concurrency point of view, I think the correct solution
would be if we had a single "thread"/processor that manages
the file list and all other processors talked to that single
processor to get ncids. In practice, this should be the case
for all of the global state of the netcdf-c library.

However, I suspect that this proper approach is not doable
with mpio, so you are probably right.

@edwardhartnett
Copy link
Contributor Author

In non-async mode, all processors do the same exact things to their global state, so it is always in lockstep.

In async situations (i.e. multi-level parallelism) then the I/O processors are in charge of the file list, and they own the nicds. They are the only ones that really open files on the file system. But the computational units need to know the ncid's that have been assigned by the I/O units. Since they are not in lockstep, they don't know.

And that is the point of the function I have introduced. It allows the I/O processors to tell the computational processors what the "real" ncid is. Since they are operating independently of the computational processors, there is no shared memory between them. If we want the computational processors to know the "real" ncid, we must have a function to call to inform them of it.

Also, it should be clear that this happens during the create call. The user only ever sees one value for the ncid, and it is the same everywhere. The ncid is never attempted to be changed other than during the create call.

I believe this PR is the lowest impact way to solve multi-level parallelism with netcdf. It will have no impact on other users. And if you do make the library thread-safe, this code must be part of that, so it would be best if it were in the netcdf-c library, not the PIO library.

@DennisHeimbigner
Copy link
Collaborator

I think I understand your point The fact that some subset of processors
is operating in tight lockstep means that they act in effect as if they
were one cpu. So we have a situation in which each cpu(set of
lockstep processors) has its own state but synchronization of the
state of the various cpus must be synchronized at least with respect
to the ncid assignment. Frankly the netcdf-c library is not designed
for operation with multiple copies of the global state that must
be synchronized.
As I indicated, the proper solution would be if we assigned
a cpu that was the only one that actually assigned the ncids
and all other cpus talked to it to get the ncid. That solution,
however, would require significant refactoring of the library --
even more that normal thread support would require.
Bottom line, your solution of renumbering ncids is indeed
as simple a solution as seems possible given the current
library organization. I think, however, we need to put these
functions into a separate netcdf-pio.h file and surround it
with warnings. I am afraid of what others might do with this
capability.

@DennisHeimbigner
Copy link
Collaborator

I agree. What I did not realize is that PIO is in effect introducing
a distributed memory model and not a shared memory model.
As I said, a more general solution to distributed memory is just not
worth thinking about right now. In any case, I have given my approval
to your proposed pull request. Sorry for being so obtuse about this.

@edwardhartnett
Copy link
Contributor Author

No problem. I'm happy that others are safe-guarding my baby too. ;-)

@edhartnett
Copy link
Contributor

This had been done. I will close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants