Allow CRS WKT to represent the CRS without requiring reader to compare with grid mapping parameters #222

Closed
snowman2 opened this issue Dec 26, 2019 · 88 comments · Fixed by #282
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@snowman2
Contributor

snowman2 commented Dec 26, 2019

Title: Allow CRS WKT to represent the CRS without requiring reader to compare with grid mapping parameters
Moderator: ???
Moderator Status Review [last updated: YY/MM/DD]: ???
Requirement Summary:

I propose the requirement be changed like so:

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible. As such, information from either one (or both) may be read in by the user without needing to check both. However, in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…] element), the value of this attribute cannot be interpreted accurately. Naturally if the two values are equal then no ambiguity arises.

Benefits:

  1. The CRS could originate from several different formats such as WKT, PROJ, or SRS Authority Code. If there are errors in the conversion process to the CF or WKT representation, only the provider would have the original CRS representation. As such, if there are conflicts, the provider would be the best source to go to in order to resolve the conflicts.
  2. Making this change will simplify the lives of software developers so they can just read in the WKT or grid mapping CF parameters for the CRS without a need to compare the two.

Status Quo:
http://cfconventions.org/cf-conventions/cf-conventions.html#use-of-the-crs-well-known-text-format mentions

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.
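To make the difference between the status quo and the proposed wording concrete, here is a minimal Python sketch of how a reader might resolve one duplicated property, such as semi_major_axis, under each rule. It is not part of the conventions or of any library: the function names, the `InconsistentCRSError` class, and the relative tolerance used to compare floating-point values are all assumptions for illustration.

```python
import math
from typing import Optional


class InconsistentCRSError(ValueError):
    """Hypothetical error: a grid mapping attribute and crs_wkt disagree."""


def resolve_status_quo(grid_mapping_value: Optional[float],
                       wkt_value: Optional[float]) -> Optional[float]:
    # Status quo: the single-property grid mapping attribute takes precedence.
    return grid_mapping_value if grid_mapping_value is not None else wkt_value


def resolve_proposed(grid_mapping_value: Optional[float],
                     wkt_value: Optional[float],
                     rel_tol: float = 1e-9) -> Optional[float]:
    # Proposal: if only one source defines the property, use that one.
    if grid_mapping_value is None:
        return wkt_value
    if wkt_value is None:
        return grid_mapping_value
    # Both defined: they must agree (the tolerance is an assumption for float text).
    if not math.isclose(grid_mapping_value, wkt_value, rel_tol=rel_tol):
        raise InconsistentCRSError(
            f"grid mapping value {grid_mapping_value} differs from crs_wkt value "
            f"{wkt_value}; the CRS cannot be interpreted, contact the data provider"
        )
    return grid_mapping_value
```

For example, with semi_major_axis = 6378137.0 and a SPHEROID value of 6378206.4, the status-quo rule silently returns 6378137.0, whereas the proposed rule raises an error instead of guessing.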
@snowman2 snowman2 added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Dec 26, 2019
@JonathanGregory
Contributor

The status quo (giving the CF attributes precedence over WKT) was discussed at great length when the possibility of including WKT strings was added. I have not reviewed that discussion but it would be relevant to do so to avoid repeating it! It's in https://cf-trac.llnl.gov/trac/ticket/69 and https://cf-pcmdi.llnl.gov/trac/ticket/80. I opposed the introduction of WKT strings because I didn't like redundancy, which would probably lead to inconsistency, but I agreed with the resolution that we have, in which the CF attributes take precedence.

Without reviewing the previous discussion, these points occur to me:

  • The WKT model of metadata is different from the CF one. If I remember correctly, WKT overlaps with standard names and units of coordinate variables, not just with grid mapping attributes. Giving precedence to WKT would therefore mean that software would need to analyse the WKT in many circumstances. To enable this, we would need to draw up a thorough correspondence between the CF and WKT metadata models, with rules to resolve inconsistencies. That would be a big job.

  • We would have to revise these rules whenever the definition of WKT was amended - maybe this doesn't happen often, but it's not under our control, and we can't foresee what problems it might cause for CF.

  • To give WKT precedence, we would have to require all CF-compliant applications to be able to parse WKT. That's a big expectation, which I think is unrealistic in practice.

In view of these points, I don't think this proposal is the best way to proceed. Instead, if there are elements of the CRS that can't currently be represented in CF but are needed, we should consider adding them, as we have done before (your points 1 and 3). If the equivalence between CF and WKT is unclear or incomplete (related to my first point above) it should be improved (your points 2 and 4).

@snowman2
Contributor Author

I am a GDAL/PROJ user, so from my biased perspective life would be much easier from the WKT form :). Additionally, since WKT is already a standard from the OGC geospatial community, most geospatial software should be able to support it.

The WKT model of metadata is different from the CF one. ... That would be a big job.

Correct. That is why I propose the CRS WKT take precedence. The CF grid mapping parameters only provide support for a limited subset of projection parameters. (Ref: https://cf-trac.llnl.gov/trac/ticket/69):

3.2. Because the conceptual model for coordinate reference systems is both large and complex it is considered impractical to devise CF attributes for all of the potential CRS properties which might need to be encoded as metadata attributes in netCDF files. Consequently there is a requirement for such CRS properties to be specified in a compact notational format, preferably a format that is already in widespread use, either as a de facto or de jure standard.

So, in this proposal, if the CRS WKT exists and can be read in, the CF projection parameters should be ignored entirely and no checks made between the two. However, the CF projection parameters are there both for backwards compatibility and for programs that do not support the WKT form of the projection.
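The reader behaviour described here could look roughly like the following Python sketch. It is illustrative only: `choose_crs_source` is a hypothetical name, and `wkt_parser` stands in for whatever WKT reader an application has (for example `pyproj.CRS.from_wkt`). It is passed in as a parameter so the sketch runs without PROJ installed.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple


def choose_crs_source(
    grid_mapping_attrs: Mapping[str, Any],
    wkt_parser: Optional[Callable[[str], Any]] = None,
) -> Tuple[str, Any]:
    """Pick the CRS description a reader would use under this proposal."""
    wkt = grid_mapping_attrs.get("crs_wkt")
    if wkt and wkt_parser is not None:
        try:
            # WKT present and readable: use it, with no cross-check.
            return "crs_wkt", wkt_parser(wkt)
        except Exception:
            # Unreadable WKT (e.g. an unsupported version): fall back below.
            pass
    # No WKT, or no WKT support: use the CF grid mapping parameters.
    params: Dict[str, Any] = {
        k: v for k, v in grid_mapping_attrs.items() if k != "crs_wkt"
    }
    return "grid_mapping", params
```

An application without any WKT support simply calls `choose_crs_source(attrs)` and always gets the grid mapping parameters, which is why those parameters must remain complete.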

To give WKT precedence, we would have to require all CF-compliant applications to be able to parse WKT. That's a big expectation, which I think is unrealistic in practice.

I should clarify that in this proposal the CRS WKT can remain optional. However, when it does exist and your program can read it in, I propose that it should take precedence. As a side note, with the GDAL Barn changes (https://gdalbarn.com/), reading in a WKT is much more practical with PROJ as a dependency. It also provides support for WKT2. Additionally, GDAL can easily support the WKT form of the projection, which enables all the dependent software to read in the projection.

@snowman2
Contributor Author

Here is the WKT2 form of the British National Grid from: https://cf-pcmdi.llnl.gov/trac/ticket/80

>>> from pyproj import CRS
>>> cc = CRS("OSGB 1936 / British National Grid")
>>> cc
<Projected CRS: EPSG:27700>
Name: OSGB 1936 / British National Grid
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E
- bounds: (-9.2, 49.75, 2.88, 61.14)
Coordinate Operation:
- name: British National Grid
- method: Transverse Mercator
Datum: OSGB 1936
- Ellipsoid: Airy 1830
- Prime Meridian: Greenwich

>>> print(cc.to_wkt(pretty=True))
PROJCRS["OSGB 1936 / British National Grid",
    BASEGEOGCRS["OSGB 1936",
        DATUM["OSGB 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4277]],
    CONVERSION["British National Grid",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",49,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-2,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996012717,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",400000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",-100000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["unknown"],
        AREA["UK - Britain and UKCS 49°46'N to 61°01'N, 7°33'W to 3°33'E"],
        BBOX[49.75,-9.2,61.14,2.88]],
    ID["EPSG",27700]]

The coordinate system and area of use currently don't have an equivalent in the CF conventions. The coordinate system is important to note as the axis order is taken into account in PROJ 6+ and GDAL 3+.
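To show where the overlapping and the non-overlapping pieces live in the WKT above, here is a crude Python sketch that pulls out the ellipsoid (which overlaps with the CF attributes semi_major_axis and inverse_flattening) and the axis order (which, as noted, has no CF grid mapping equivalent). The regular expressions are assumptions that only handle simple, well-formed strings like the trimmed excerpt below. A real application should use a proper WKT parser such as PROJ.

```python
import re

# Trimmed excerpt of the pretty-printed British National Grid WKT2 above.
BNG_WKT = """PROJCRS["OSGB 1936 / British National Grid",
    BASEGEOGCRS["OSGB 1936",
        DATUM["OSGB 1936",
            ELLIPSOID["Airy 1830",6377563.396,299.3249646,
                LENGTHUNIT["metre",1]]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]]]"""

# WKT2 ELLIPSOID[...] or WKT1 SPHEROID[...]: name, semi-major axis, inverse flattening.
ELLIPSOID_RE = re.compile(
    r'(?:ELLIPSOID|SPHEROID)\["([^"]*)",\s*([-+0-9.eE]+),\s*([-+0-9.eE]+)'
)
# WKT2 AXIS["name",direction, ORDER[n] ...
AXIS_RE = re.compile(r'AXIS\["([^"]*)",\s*(\w+),\s*ORDER\[(\d+)\]')


def ellipsoid_params(wkt):
    """Return the ellipsoid using CF attribute names, or None if absent."""
    match = ELLIPSOID_RE.search(wkt)
    if match is None:
        return None
    name, a, rf = match.groups()
    return {"name": name, "semi_major_axis": float(a), "inverse_flattening": float(rf)}


def axis_directions(wkt):
    """Return axis directions sorted by their ORDER[n] value."""
    axes = AXIS_RE.findall(wkt)
    return [direction for _, direction, _ in sorted(axes, key=lambda ax: int(ax[2]))]
```

On the excerpt, `ellipsoid_params` yields Airy 1830 with semi_major_axis 6377563.396 and inverse_flattening 299.3249646, and `axis_directions` yields `["east", "north"]`.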

@dblodgett-usgs
Contributor

Dear @snowman2 --
I agree with @JonathanGregory, that if things are missing from CF that are in WKT, they should be added.

Maybe the core of your proposal is actually best made to the GDAL / PROJ project to modify default behavior when working with CF data? When different, a warning could be issued and the WKT used with preference?

Regards - Dave

@rmendels

@snowman2 @dblodgett-usgs @JonathanGregory
Second what Dave says. The argument seems to be to break everything in the CF world so that GDAL will work better with netcdf files. Why not improve GDAL? (Based on a talk I heard, the new, I believe yet-to-be-released, version does indeed have better support.)

GDAL is a great library and a lot of work has gone into it, but its netcdf support has always been sketchy. When I was first directed to it years ago, it could only handle 2-D files, and would flip the data, even when the metadata clearly said the axes went in the other direction (it just ignored the metadata attributes). That problem lasted for a long time (for all I know it still does this). GDAL has had problems with greater than 3-D files, forecast files, DSG files, files that are part of the NCEI examples for sending in data, some issues with time, and some of the newer features in netcdf4 files.

Things that can improve CF are most welcome, things that would potentially break most present CF based software should have to make an awfully strong case for the benefits.

@snowman2
Contributor Author

snowman2 commented Dec 30, 2019

Thanks all for the comments! My desire here is to unite the geospatial (OGC) and CF-conventions communities to simplify things when transitioning between them.

Much of the inspiration for this thought came when attempting to match PROJ parameters to the CF conventions as documented here. There are several parameters that do not match up, and in several cases a grid mapping does not exist. This is problematic for users who wish to convert back and forth between the two. However, since PROJ supports reading in the WKT string, the full CRS can be properly represented in that manner and no information is lost. Additionally, the PROJ FAQ strongly discourages the use of PROJ strings to represent the CRS and instead recommends using the WKT string.

Maybe the core of your proposal is actually best made to the GDAL / PROJ project to modify default behavior when working with CF data?

Changing this in GDAL would indeed be problematic and confusing for its users, as the behavior would then differ from the CF spec. pyproj already does this, and an issue exists because users noticed that its behavior differs from the spec here.

Alternative proposal?

Thoughts on stating in the spec that, if the CRS cannot be properly represented using the CF grid mapping parameters, the CRS WKT form is recommended as a fallback (noting, of course, that this may not be compatible with some software)? It would also be good to ask users to open an issue in this repo with any CRS WKT that cannot be represented using the grid mapping parameters, so the CF spec can be updated accordingly.
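A data producer following this alternative might assemble the grid mapping variable's attributes roughly as in the Python sketch below. The function name, the `cf_params` argument (None when no CF grid mapping fits the CRS), and the warning text are hypothetical; only crs_wkt and the grid mapping attribute names come from the conventions.

```python
import warnings
from typing import Any, Dict, Optional


def grid_mapping_attributes(crs_wkt: str,
                            cf_params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Attributes for a grid mapping variable under the fallback proposal.

    cf_params is None (or empty) when the CRS has no CF grid mapping equivalent.
    """
    attrs: Dict[str, Any] = {"crs_wkt": crs_wkt}
    if cf_params:
        # Representable in CF: write the grid mapping parameters as fully as possible.
        attrs.update(cf_params)
    else:
        # Not representable: fall back to crs_wkt alone and flag the gap.
        warnings.warn(
            "CRS written as crs_wkt only; some applications cannot read it. "
            "Consider reporting the missing grid mapping to the CF conventions."
        )
    return attrs
```

Writing crs_wkt in both branches is a design choice in this sketch; a producer could equally leave it out when the grid mapping parameters already describe the CRS fully.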

@rmendels

I would point out that the latest incarnation of Proj4 just introduced a bunch of changes in how things are done, including in the CRS. See some of the discussion related to the R packages sp and sf. It is breaking a lot of software.

@erget
Member

erget commented Dec 31, 2019

I agree with @dblodgett-usgs and @JonathanGregory - we already have a clear hierarchy that establishes which values have precedence over which ones in the case of conflict. Data producers already have the possibility of omitting CF attributes in favour of using WKT, although this is discouraged. I would see this as an acceptable solution if one wanted to produce data now and the relevant parameters weren't supported by CF. Optimally, one would pursue the adoption of the needed parameters in CF in parallel.

@rsignell-usgs
Member

This is a bit of a sidebar, but one thing that would make it easier for EPSG and WKT folks to create the CF representation would be if we could get the friendly folks over at spatialreference.org to supply the CF representation. If I google "EPSG 4326", I end up at https://spatialreference.org/ref/epsg/wgs-84/
which provides the WKT representation in several flavors as well as other representations. Why not one more for CF? This would also be a good way to figure out what is missing in CF...

@snowman2
Contributor Author

snowman2 commented Dec 31, 2019

@rsignell-usgs, that would definitely be nice. However, it will also require a lot of work, so I imagine some kind of funding would be needed.
It would also be a nice feature to have in PROJ, but I assume it will require funding as well: OSGeo/PROJ#1193.

@snowman2 snowman2 reopened this Dec 31, 2019
@marqh
Member

marqh commented Dec 31, 2019

Hello @snowman2

this is an interesting topic and I am grateful that you have raised it

I think there are some fine details that are being picked out here that are interesting, as well as the big picture.

Whilst the big picture comes with a lot of considerations, there are small scale benefits we can try to get to.

One example stands out for me from your comments:

The coordinate system is important to note as the axis order is taken into account in PROJ 6+ and GDAL 3+.

I have also been looking at the axis order with respect to CRS-WKT. I agree that this is important.

I think that there is an in situ feature that can be extended to provide some extra clarity on this topic.

With this in mind, I have opened a new issue, #223, to discuss this as an isolated topic, to see if there is a quick and easy extension that would address this concern.

I very much support the broader scope discussion on this topic, hence my approach to separate out #223 so that the discussion on that targeted topic does not get in the way of these valuable considerations.

I hope this is a useful step
mark

@marqh
Member

marqh commented Dec 31, 2019

This is a bit of a sidebar, but one thing that would make it easier for EPSG and WKT folks to create the CF representation would be if we could get the friendly folks over at spatialreference.org to supply the CF representation. If I google "EPSG 4326", I end up at https://spatialreference.org/ref/epsg/wgs-84/

Hi @rsignell-usgs

I'm afraid that Google may be somewhat unhelpful with its advice

the resources at https://spatialreference.org are not very well maintained, and the process of maintenance has been far from clear for some time:
https://spatialreference.org/about/

The EPSG maintain the official registry for EPSG codes, providing URI and URN notation for encodings, e.g.
https://www.epsg-registry.org/export.htm?wkt=urn:ogc:def:crs:EPSG::4326

Comparing this resource to
https://spatialreference.org/ref/epsg/wgs-84/ogcwkt/
it is clear (to me) that the spatialreference.org resource has not adopted the updated CRS-WKT syntax, which is implemented in GDAL, ESRI and other modern packages.

At present the only well maintained resource for EPSG codes in WKT encoding that I am confident of using is
https://www.epsg-registry.org/

all the best
mark

@marqh
Member

marqh commented Dec 31, 2019

On the detail point of the proposal, I would support amending the current text:

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.

To remove the latter precedence statement.

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…​] element) then the former, being the more specific attribute, takes precedence. Naturally if the two values are equal then no ambiguity arises.

It is my view that there is too much of an onus placed on the data consumer here, to parse both content representations, map terms to one another and interpret outputs. This is complicated and difficult to implement. There are many opportunities for mistakes and problems.

If there is WKT in a file, I want my application to trust it, not to have to parse it to look for mistakes. If I can just parse it then I can delegate this to a supporting application, which is great for maintainability.

I think that placing the onus on the data producer to produce content that they assert is consistent is sufficient.

I think the value of data consumers being able to simply parse the WKT directly is very large.

I think the cost of managing the assertion of consistency on data producers is much smaller. In a sense the status quo is standardising for mistakes in encoding, which I don't think the standard should do, especially given the cost here.

all the best
mark

@graybeal

I read both #69 and #80, and was startled by the sudden acceptance of these tickets after such long discussion of possible issues. (Credit here to Jonathan for flexibility!) Many of those issues are raised in this context, but this ticket proposes WKT be dominant in a much narrower sense (see detailed item (b) below).

I agree strongly with @marqh's recent points, including the large value of data consumers being able to simply parse the WKT directly. It's key to recognize this is an augmentation, not a restriction. My detailed reasons follow, but first, I think the phrasing at the beginning of the proposal is creating unneeded alarm.

Despite the misleading title, the proposal doesn't make WKT dominant, it just makes it directly usable (but still secondary, because the WKT is not required). I offer this as an equivalent rewrite of the proposal's first paragraph:

I propose that if a CRS WKT is present and can be used by the software program, the WKT should be allowed to stand alone as an official CRS of the file by CF standards (thus, implicitly ignoring non-WKT CRS parameters). However, non-WKT CRS parameters still must be present to serve as an official representation of the CRS, in the event the software program cannot read in the CRS WKT or chooses not to use it.

In the text you wouldn't say anything like this of course. The text already describes how WKT is an optional augmentation, and that the non-WKT CF must be as complete as possible. I'd only tweak one line, just before the paragraph marqh highlighted, by replacing "as well as by crs_wkt" with "even if a crs_wkt is present", so now it reads:

Therefore the CRS should be described as thoroughly as possible with the single-property attributes, even if a crs_wkt is present.

With the proposed precedence deletion of marqh (item (c) below), I believe this fully captures the intent of the proposal.

Detailed responses to a few points:

(a) Yes ideally CF could be equally capable. On the other hand, WKT will continue to improve and many tools are and will be built around it. Does CF want to take on the job of "keeping up with WKT" and expect tool developers to "keep up with the CF version of WKT"? Even if we want that to happen, who in CF wants to volunteer to make it happen for CF? And in the tool community?
(b) This seems to be a graceful co-habitation strategy. I don't see how this "breaks everything in the CF world"—it is adding a capability to CF, not breaking anything that exists. If the data creator doesn't add WKT, it doesn't apply. If the tool reading the file doesn't support WKT, it doesn't apply. Everything presented in CF will continue to work with all the tools written for CF. If I'm trying to create CF-compliant data, even if my WKT adds critical value I will want to make my CF as complete and accurate as I can for non-WKT applications, and the CF parameters are still required by the convention.
(c) Supporting the proposed precedence deletion of marqh: If the added WKT does not align with the CF, the data creator has introduced a bug. This can be corrected by social pressure (as is usually the case for any mistakes in the data) and does not require custom text in CF defining the "true meaning" (independent of the originator's intent). The tool creator is motivated to make their tool maximally useful given available time, including whether to favor the WKT expression and whether to cross-check the two expressions. Co-existence works here also and does not damage CF (because the CF parameters still have to be there).

I'm assuming positive assessments of the prevalence of WKT, its features, and its community support for upgrades. If you agree these are favorable indicators, then there are two ways to consider the options. (1) How good will this be for existing CF users going forward? Although maybe not many of them need WKT yet, it will be favorable on balance, with little or no downside that I can see. And more broadly, (2) How much will this encourage/allow the geospatial community to easily adopt and use CF? I think it will be quite encouraging.

@snowman2 snowman2 changed the title Make CRS WKT dominant over grid mapping attributes Allow CRS WKT to represent the CRS without requiring comparison with grid mapping parameters Dec 31, 2019
@snowman2
Contributor Author

@graybeal, thanks for clarifying! I used your clarified version as I think it does a much better job of capturing the intent of the proposal.

@JonathanGregory
Contributor

Although I'm watching this repository, and I contributed to this thread, GitHub has sent me only one of the contributions to this issue, namely the most recent (before this one, 10 h ago by @snowman2). Shouldn't I receive all of them by email? I depend on email to be informed that some discussion is taking place.

@JimBiardCics
Contributor

A few comments on the discussion to this point. I think the discussion is moving in the overall right direction. It seems to me that there was confusion at first between implementations and uses on the one hand and design and conventions on the other. I think we need to seriously consider how big a job it would be to "re-invent the wheel" by trying to add to CF, even piecemeal, all the parameters needed to represent all coordinate reference systems (CRSs). The vast majority of us are not geodesists. We need to acknowledge that this is a significant discipline that we know little about, and allow the experts in that field to be the experts. Let's use the standards they have developed rather than build an inferior substitute.

CF added the ability to specify a few projected coordinate systems. We clearly must continue to honor those for backward compatibility purposes, but let's not add any new ones. I think we should encourage the use of WKT CRS declarations going forward and focus on what might need to be added to CF to resolve ambiguities that might be present. If I understood correctly, @JonathanGregory thought there were possible issues. I didn't see any specifics given, but I'd rather try to clear those up than follow a "make our own" approach any longer.

I've worked with a few data providers that attempted to add grid_mapping variables to their netCDF files. The majority of them botched it. They would have been much better off if they could have copied and pasted a WKT string rather than try to figure out how to read CRS definitions and map elements to CF grid_mapping attributes.

@dblodgett-usgs
Contributor

dblodgett-usgs commented Jan 3, 2020

Great strategy @JimBiardCics. Having contributed an implementation to map CF conventions to WKT in R -- I know how error prone and hard it can be. Moving toward support of WKT as a fully fledged option within CF is unambiguously a good thing in my mind.

@marqh's suggested text changes make a ton of sense to me.

Should we also add something that emphasizes the points about "graceful co-habitation"?

@snowman2
Contributor Author

snowman2 commented Jan 8, 2020

Should we also add something that emphasizes the points about "graceful co-habitation"?

Are you thinking something along the lines of:

"If both a CRS WKT and grid mapping parameters exist, it is assumed that they are equivalent. As such, either one may be used to represent the CRS of the file."

@graybeal

graybeal commented Jan 8, 2020

Or to deal with the edge cases and be consistent with our expectations:

"If both a CRS WKT and grid mapping parameters exist, it is assumed that they do not conflict. As such, information from either one (or both) may be used to represent the CRS of the file, recognizing that the grid mapping parameters should always be completed as fully as possible."

@snowman2
Contributor Author

snowman2 commented Jan 9, 2020

One minor addition:
"If the CRS cannot be represented using the grid mapping parameters, using only the CRS WKT is allowed. However, some applications will not be able to read in the CRS WKT form."

@JimBiardCics
Contributor

@snowman2 Are there any applications that actively read in and use the CF grid mapping parameters?

@snowman2
Contributor Author

snowman2 commented Jan 9, 2020

@snowman2 Are there any applications that actively read in and use the CF grid mapping parameters?

The only application I am aware of that does so is GDAL. However, it also checks for the WKT string and, at present, compares the two. I am not sure about other applications, but I assume some exist, given the current CF conventions. 🤷‍♂️

@JimBiardCics
Contributor

GDAL is one more than I was aware of. I'm not aware of any others.

@taylor13

I agree that informing the data provider of conflicts when found is good practice. But what behavior should software have? Should it just stop, or can we tell it which of the two conflicting pieces of information it should rely on (until the conflict has been resolved)?

My general ignorance about WKT prevents me from understanding what is meant by "then the value specified by the single-property attribute shall take precedence." Is the single-property attribute sometimes the WKT property and sometimes the CF attribute, or is it invariably the CF attribute?

@JonathanGregory
Contributor

I agree that the data-producer should be the best authority on what was intended. However, knowing that doesn't give the data-user an immediate solution to an inconsistency. I think the current wording (the precedence of CF metadata over WKT) makes sense, since this is a CF dataset. As Karl says, that default also gives an incentive to the data-producer to ensure consistency. However, I think it's also fine to recommend contacting the data-producer.

@snowman2
Contributor Author

The final wording from the breakout meeting:

There will be occasions when a given CRS property value is duplicated in both a single-property grid mapping attribute and the crs_wkt attribute. In such cases the onus is on data producers to ensure that the property values are consistent. If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible. As such, information from either one (or both) may be read in by the user without needing to check both. However, in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…] element), the value of this attribute cannot be interpreted accurately. Naturally if the two values are equal then no ambiguity arises.

@taylor13

Which parts of the sentence:

"If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible."

should trigger an error (warning?) in a compliance checker? Is the file compliant if the two are not the same? Is a file compliant if the grid mapping parameters are incomplete (when it is possible for them to be complete)?

@snowman2
Contributor Author

should trigger an error (warning?) in a compliance checker?

I would say an error.

Is the file compliant if the two are not the same?

I would say no based on this part: "in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately"

Is a file compliant if the grid mapping parameters are incomplete (when it is possible for them to be complete)?

I would say no based on this part: "the attributes must be the same and grid mapping parameters should always be completed as fully as possible"
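Going by these answers (a mismatch is an error, and incomplete grid mapping parameters make the file non-compliant), a compliance check might look roughly like the Python sketch below. It assumes the crs_wkt values have already been translated into CF attribute names. As noted later in this thread, the conformance document does not yet define that mapping. The function name, the tolerance, and the message format are hypothetical.

```python
import math
from typing import Dict, List


def check_crs_consistency(cf_attrs: Dict[str, float],
                          wkt_values: Dict[str, float],
                          rel_tol: float = 1e-9) -> List[str]:
    """Return compliance errors for properties duplicated in crs_wkt.

    wkt_values must already be keyed by CF attribute name (an assumed,
    not-yet-specified translation step).
    """
    errors: List[str] = []
    for name, wkt_value in sorted(wkt_values.items()):
        if name not in cf_attrs:
            # Grid mapping not "completed as fully as possible".
            errors.append(f"ERROR: {name} is in crs_wkt but missing from the grid mapping")
        elif not math.isclose(cf_attrs[name], wkt_value, rel_tol=rel_tol):
            # Values differ: the CRS cannot be interpreted accurately.
            errors.append(
                f"ERROR: {name} differs (grid mapping {cf_attrs[name]!r}, "
                f"crs_wkt {wkt_value!r})"
            )
    return errors
```

An empty list means the two descriptions agree on every property that both can express. Any entry makes the file non-compliant under the wording discussed above.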

@cameronsmith1

How about an entirely different solution: When there are multiple grid descriptions in a file, the creator must add a metadata flag that indicates which of the grid descriptions is 'primary'.

The onus to make grid descriptions as equivalent as possible can still be on the creator, but the user will know which one to trust if there is a discrepancy. And the CF checkers will only need to check that there is one, and only one, 'primary' flag.
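Under this alternative, the checker side would indeed be small. A minimal sketch, assuming each grid description (for example the grid mapping attributes and crs_wkt) carries a hypothetical boolean "primary" flag; CF defines no such attribute, and the flag was not adopted in the final text.

```python
def check_single_primary(descriptions):
    """Return None if exactly one grid description is flagged primary, else an error message.

    `descriptions` maps a description name (e.g. "grid_mapping", "crs_wkt") to the
    value of its hypothetical primary flag.
    """
    primaries = [name for name, is_primary in descriptions.items() if is_primary]
    if len(primaries) == 1:
        return None
    return f"expected exactly one primary grid description, found {len(primaries)}: {primaries}"


# Both descriptions claim to be primary, so the check reports an error.
print(check_single_primary({"grid_mapping": True, "crs_wkt": True}))
```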

@snowman2
Contributor Author

That would require all applications to be able to support the WKT format. Currently that is not possible due to software limitations (see comments in this thread).

@JonathanGregory
Contributor

JonathanGregory commented Jun 12, 2020

Dear all

Before the meeting yesterday I was arguing, like Karl @taylor13, to retain the presumption that if the grid_mapping and crs_wkt are inconsistent, the grid_mapping is correct. I was persuaded by the discussion that this isn't generally helpful, because the data-user might well think it was unsafe to proceed on that basis. In particular, the data-user might be aware that the WKT information had come first, and therefore suspect the grid_mapping of being an incorrect translation. Karl and I had argued also that the presumption of grid_mapping being correct gives an extra incentive to the data-producer to make sure the two are consistent. However, this also probably doesn't work; if the data-producer had thought about the consequence of misinterpretation, they would have tried to avoid inconsistency anyway.

Therefore I support the change to remove this assumption, and state that the metadata is invalid if grid_mapping and crs_wkt are inconsistent. In response to Karl, I agree with Alan @snowman2 that this is an error, and the file is not compliant, because the convention states the two kinds of metadata must be consistent.

Unfortunately, the CF checker won't be able to detect this error unless we write down the mapping between grid_mapping attributes and crs_wkt in the conformance document (or some document it can refer to). To make the check, you have to be able to interpret both kinds and compare them. One of Alan's concerns was that data-users felt they had to do that. We agreed that they don't. They can read one or the other, assuming they agree. However, it would be good if this could be checked routinely. As I argued before, I strongly feel that it would improve the convention if we could write down the equivalence. Note that we don't have to consider all aspects of WKT, but only those aspects which people want to write in CF-netCDF files. Doing this would cause us to identify what has to be added to grid_mapping to give it the required capabilities, and whether there are inconsistencies between the CF data model and WKT. I wouldn't be surprised if there are, and we need to know, because it's not safe to treat crs_wkt as a black box if it might conflict with other CF metadata (not just in the grid_mapping, but for instance in the units and standard name of coordinate variables).
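Writing down the equivalence could start as a simple table from CF grid mapping attribute names to WKT elements, which a checker then walks. A minimal sketch under stated assumptions: only three Appendix F attributes are covered (`semi_major_axis`, `inverse_flattening`, `longitude_of_prime_meridian`), the WKT is WKT1 with `SPHEROID[...]` lengths in metres and `PRIMEM[...]` in degrees, and regular expressions stand in for a real WKT parser. The table and function names are hypothetical. Every mismatch is returned as an error, matching the view that inconsistency makes the file non-compliant.

```python
import math
import re

_NUM = r"([-+0-9.eE]+)"
_SPHEROID = r'SPHEROID\[\s*"[^"]*"\s*,\s*' + _NUM + r"\s*,\s*" + _NUM

# Hypothetical equivalence table: CF attribute -> (pattern over WKT1, capture group).
CF_TO_WKT1 = {
    "semi_major_axis": (re.compile(_SPHEROID), 1),
    "inverse_flattening": (re.compile(_SPHEROID), 2),
    "longitude_of_prime_meridian": (re.compile(r'PRIMEM\[\s*"[^"]*"\s*,\s*' + _NUM), 1),
}


def crs_consistency_errors(grid_mapping_attrs, crs_wkt, rel_tol=1e-9, abs_tol=1e-12):
    """Return one error string per property given by both representations with different values."""
    errors = []
    for attr, (pattern, group) in CF_TO_WKT1.items():
        if attr not in grid_mapping_attrs:
            continue  # not duplicated: only the WKT (if anything) specifies it
        match = pattern.search(crs_wkt)
        if match is None:
            continue  # not duplicated: only the grid mapping specifies it
        wkt_value = float(match.group(group))
        attr_value = float(grid_mapping_attrs[attr])
        if not math.isclose(attr_value, wkt_value, rel_tol=rel_tol, abs_tol=abs_tol):
            errors.append(f"{attr}={attr_value} disagrees with crs_wkt value {wkt_value}")
    return errors
```

A real table would need many more entries (projection parameters, datum shifts, axis units), which is exactly the exercise of identifying where grid_mapping and WKT differ in capability.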

I would be concerned about adopting Philip @cameronsmith1's suggestion, because I fear that might lead to data-producers not being so careful with one or the other of the representations, thinking that they could set the flag to indicate it's not to be trusted.

Jonathan

@JonathanGregory
Contributor

This issue doesn't have a moderator - I think that's why it's not progressed. I will moderate it. @snowman2's current proposal (#282) is to replace

However, in those situations where two values of a given property are different, then the value specified by the single-property attribute shall take precedence. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…] element) then the former, being the more specific attribute, takes precedence.

with

If both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible. As such, information from either one (or both) may be read in by the user without needing to check both. However, in those situations where the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis and also by the crs_wkt attribute (via the WKT SPHEROID[…] element), the value of this attribute cannot be interpreted accurately.

I think this is OK, except for the last sentence, which I think should be

For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis disagrees with the crs_wkt attribute (via the WKT SPHEROID[…] element), the value of this attribute cannot be interpreted accurately.

That leads naturally to the unaltered final sentence, "Naturally if the two values are equal then no ambiguity arises."

Philip @cameronsmith1 and Karl @taylor13, are you content with this? Alan @snowman2, is my amendment OK with you?

Jonathan

@taylor13

taylor13 commented Jul 9, 2020

Yes, I think the intent of this is fine.

I don't recall the text that precedes the revised text, but should the first sentence read:

"If, for a given property, both crs_wkt and grid mapping attributes exist, the attributes must be the same and grid mapping parameters should always be completed as fully as possible"

Also, is the second clause dependent on the first clause, or in general is it true that "grid mapping parameters should always be completed as fully as possible." If it is generally true, I think the second clause should be its own sentence (and maybe it should appear elsewhere?).

@snowman2
Contributor Author

snowman2 commented Jul 9, 2020

Minor tweak:

For example, if the semi-major axis length of the ellipsoid [deleted: is] defined by the grid mapping attribute semi_major_axis disagrees with the crs_wkt attribute (via the WKT SPHEROID[…] element), the value of this attribute cannot be interpreted accurately.

@taylor13

taylor13 commented Jul 9, 2020

@snowman2 My poor brain can't seem to detect what exactly was tweaked. Could you please point to the specific change made?

@taylor13

taylor13 commented Jul 9, 2020

Oh, I just saw the crossed out "is". Guess I mistook it for dust on my monitor. Sorry.

@cameronsmith1

I am OK with this (as amended).

@JonathanGregory
Contributor

Thanks for correcting my typo, @snowman2. I agree with Karl @taylor13's second point. It is a general statement. However, I think we're introducing unnecessary repetition. I appreciate that the modified text is in the pull request, but our guidelines are that we should discuss it as far as possible in the issue, so there's only one place to look to see the discussion. So I'm repeating the whole of the paragraph and the previous one for context. I propose minor deletions for conciseness and to reduce repetition. How's this:

The crs_wkt attribute is intended to act as a supplement to other single-property CF grid mapping attributes (as described in Appendix F); it is not intended to replace those attributes. If data producers omit the single-property grid mapping attributes in favour of the compound crs_wkt attribute, software which cannot interpret crs_wkt will be unable to use the grid_mapping information. Therefore the CRS should be described as thoroughly as possible with the single-property attributes as well as by crs_wkt.

In cases where CRS property values can be represented by both a single-property grid mapping attribute and the crs_wkt attribute, the grid mapping should be provided, and if both are provided, the onus is on data producers to ensure that their property values are consistent. Therefore information from either one (or both) may be read in by the user without needing to check both. However, if the two values of a given property are different, the CRS information cannot be interpreted accurately and users should inform the provider so the issue can be addressed. For example, if the semi-major axis length of the ellipsoid is defined by the grid mapping attribute semi_major_axis disagrees with the crs_wkt attribute (via the WKT SPHEROID[…] element), the value of this attribute cannot be interpreted accurately. Naturally if the two values are equal then no ambiguity arises.

Jonathan
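The proposed text lets software that cannot interpret crs_wkt rely on the single-property attributes alone. A minimal sketch of such a reader, recovering the ellipsoid axes (in metres) from the Appendix F attributes `earth_radius`, `semi_major_axis`, `semi_minor_axis` and `inverse_flattening` without looking at crs_wkt at all. The function name is hypothetical, and treating `inverse_flattening = 0` as a sphere follows the WKT1 convention and is an assumption here. When only the inverse flattening $1/f$ is given, the semi-minor axis follows from $b = a\,(1 - f)$.

```python
def ellipsoid_from_grid_mapping(attrs):
    """Return (semi_major, semi_minor) in metres from CF grid mapping attributes, or None.

    Reads only the single-property attributes, never crs_wkt, which the proposed
    text permits provided the two representations are consistent.
    """
    if "earth_radius" in attrs:
        radius = float(attrs["earth_radius"])
        return radius, radius  # spherical figure of the Earth
    if "semi_major_axis" not in attrs:
        return None
    a = float(attrs["semi_major_axis"])
    if "semi_minor_axis" in attrs:
        return a, float(attrs["semi_minor_axis"])
    if "inverse_flattening" in attrs:
        inv_f = float(attrs["inverse_flattening"])
        if inv_f == 0:
            return a, a  # assumption: 0 denotes a sphere, as in WKT1
        return a, a * (1.0 - 1.0 / inv_f)
    return None  # not enough information to define the ellipsoid
```

For WGS 84 (`semi_major_axis = 6378137`, `inverse_flattening = 298.257223563`) this gives a semi-minor axis of about 6356752.314 m.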

@snowman2
Contributor Author

Sounds like a reasonable change to me.

Minor tweaks:

compound crs_wkt attribute

I am thinking the "compound" word could be removed.

the ellipsoid is defined by

I am thinking the "is" should be removed.

@JonathanGregory
Contributor

Thanks, @snowman2. I agree with both of those changes. Please could you update your pull request so it's the same text as above (with those two changes)?

If Karl @taylor13 and Philip @cameronsmith1 think that's OK still, we can count them as supporters, which means the proposal meets the conditions for acceptance. It will be accepted three weeks from now (3rd August) if there are no further concerns raised before then.

@snowman2
Contributor Author

snowman2 commented Jul 13, 2020

Sounds good, just updated the PR (459e514). (Note: just edited commit hash).

@taylor13

Yes, count me as a supporter. Thanks.

@JonathanGregory
Contributor

JonathanGregory commented Jul 13, 2020 via email

@cameronsmith1

These changes look good to me.

@JonathanGregory
Contributor

There have been no further comments in the last three weeks and the required level of support has been reached, so the proposal is accepted according to the rules. I have merged the pull request #282. Thanks, Alan @snowman2.

@snowman2
Contributor Author

snowman2 commented Aug 3, 2020

Thanks for helping get this proposal accepted, @JonathanGregory 👍
