Additional requirements on formulas #388

Open
merkys opened this issue Oct 27, 2021 · 25 comments
Labels
status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. topic/property-standardization The specification of the precise data representation of properties and entries type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus.

Comments

@merkys
Member

merkys commented Oct 27, 2021

In Materials-Consortia/optimade-python-tools#986 I have proposed two additional requirements for formulas (chemical_formula_reduced, chemical_formula_hill and chemical_formula_anonymous) dictated by my personal "common sense":

  • Formulas MUST NOT be empty strings. If a formula is unknown, a null value MUST be used. An empty formula looks as if the structure in question does not have any atoms, and I do not think this is something that should be allowed.
  • Formulas MUST NOT contain element proportions equal to 0. Trivially, if no such element exists in a structure, it MUST be excluded. If, however, a minute proportion of a certain element is observed, it MUST NOT be rounded to 0 (the specification requires rounding to integers for chemical_formula_reduced and chemical_formula_anonymous). I think a better approach than rounding would be to multiply all proportions by some number to eliminate fractional proportions.
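
A minimal sketch of the scaling idea, assuming proportions are given as a mapping from element symbol to a positive number. The function name `integer_proportions` and the `max_denominator` cutoff are illustrative assumptions, not part of the specification or of optimade-python-tools.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def integer_proportions(ratios, max_denominator=1000):
    """Scale fractional element proportions to the smallest integers.

    Nothing is rounded to zero: a proportion too small to be resolved
    with the given max_denominator raises an error instead.
    """
    exact = {
        element: Fraction(value).limit_denominator(max_denominator)
        for element, value in ratios.items()
    }
    for element, frac in exact.items():
        if frac <= 0:
            raise ValueError(
                f"proportion of {element} cannot be resolved as > 0; "
                "increase max_denominator"
            )
    # The least common multiple of the denominators clears all fractions.
    scale = reduce(
        lambda a, b: a * b // gcd(a, b),
        (frac.denominator for frac in exact.values()),
        1,
    )
    counts = {element: int(frac * scale) for element, frac in exact.items()}
    # Divide out any common factor so the result is fully reduced.
    divisor = reduce(gcd, counts.values(), 0)
    return {element: n // divisor for element, n in counts.items()}


print(integer_proportions({"Ca": 0.99, "Na": 0.01, "O": 0.995}))
# {'Ca': 198, 'Na': 2, 'O': 199}
```

The trade-off is visible in the output: no element is lost, but the integers grow as the smallest proportion shrinks.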

This seems related to #361.

@merkys merkys added topic/property-standardization The specification of the precise data representation of properties and entries type/proposal Proposal for addition/removal of features. May need broad discussion to reach consensus. status/has-concrete-suggestion This issue has one or more concrete suggestions spelled out that can be brought up for consensus. labels Oct 27, 2021
@JPBergsma
Contributor

I agree that formulas MUST NOT be empty strings.
I am however wondering what counts as a minute amount.
I would find it rather ugly to have very large element proportions.
If an element is only present as a dopant, I would allow this element to not be in the formula.

@merkys
Member Author

merkys commented Nov 3, 2021

I am however wondering what counts as a minute amount.

In my proposal I meant > 0. So if an element exists in the structure, it has to be mentioned in the formula no matter the actual amount.

I would find it rather ugly to have very large element proportions.

I agree that large proportions are ugly, but they retain the information.

If an element is only present as a dopant, I would allow this element to not be in the formula.

If we all can agree on a formal definition for a dopant, then I guess such elements could in principle be excluded. However, element exclusion negatively affects queries which might be interested in dopants.

@JPBergsma
Contributor

On second thought, dopants are probably not such a problem, as the composition is made on purpose and in that sense it is intended as a material in itself. It is also likely that there is data on the pure material as well.

For experimental systems it could be more problematic, as a material can have impurities and defects. In that case a material may have a composition like Ca198Na2O199.
Do we want users to be able to find this material if they look for CaO? I think it should be found.
So I would place CaO in the chemical_formula_reduced field and Ca198Na2O199 in the chemical_formula_descriptive field to show that the material was impure.

There is also the elements_ratios field, which keeps the exact ratios between the elements and can therefore also be used to find doped materials. So we would not lose information if we rounded the amount of an element in the chemical_formula_reduced field.

@merkys
Member Author

merkys commented Dec 8, 2021

For experimental systems it could be more problematic, as a material can have impurities and defects. In that case a material may have a composition like Ca198Na2O199. Do we want users to be able to find this material if they look for CaO? I think it should be found.

Fair enough. This can be achieved now by querying elements HAS ONLY [ "Ca", "O" ] no matter what conventions for formulas are used.

So I would place CaO in the chemical_formula_reduced field and Ca198Na2O199 in the chemical_formula_descriptive field to show that the material was impure.

Could you propose a programmatic way to arrive at CaO from Ca198Na2O199?
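
One possible heuristic, shown only as a sketch: drop elements below an impurity threshold, then snap the remaining ratios to fractions with small denominators. The function names, the 1% threshold, and the denominator limit are all assumptions chosen for illustration. The result depends on them, which is exactly the difficulty with defining a "minute amount".

```python
import re
from fractions import Fraction
from functools import reduce
from math import gcd


def parse_formula(formula):
    """Parse a flat formula such as 'Ca198Na2O199' into {'Ca': 198, ...}."""
    pairs = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    return {element: int(count) if count else 1 for element, count in pairs}


def idealized_formula(formula, impurity_threshold=0.01, max_denominator=10):
    """Drop minor elements, then snap the rest to small-integer proportions."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    # Assumption: anything below the threshold is treated as an impurity.
    major = {el: n for el, n in counts.items() if n / total >= impurity_threshold}
    major_total = sum(major.values())
    snapped = {
        el: Fraction(n, major_total).limit_denominator(max_denominator)
        for el, n in major.items()
    }
    if any(frac == 0 for frac in snapped.values()):
        raise ValueError("a retained element would be rounded to proportion 0")
    # Clear denominators with their least common multiple, then reduce.
    scale = reduce(
        lambda a, b: a * b // gcd(a, b),
        (frac.denominator for frac in snapped.values()),
        1,
    )
    integers = {el: int(frac * scale) for el, frac in snapped.items()}
    divisor = reduce(gcd, integers.values(), 0)
    reduced = {el: n // divisor for el, n in integers.items()}
    # Alphabetical order, with a proportion of 1 left implicit.
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(reduced.items()))


print(idealized_formula("Ca198Na2O199"))  # CaO
```

With a lower threshold (for example 0.001) the Na would be retained and the snapping step would fail, so any such rule would need agreed-upon parameters.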

@rartino
Contributor

rartino commented Dec 8, 2021

Formulas MUST NOT be empty strings. If a formula is unknown, a null value MUST be used. An empty formula looks as if the structure in question does not have any atoms, and I do not think this is something that should be allowed.

Nothing right now forbids a structure without any atoms (i.e., unit cell but no coordinates), and I think the reasonable chemical formula for that is the empty string. Does it not make sense to allow that? I suppose I can try to come up with some scenarios where it can be useful. You could possibly modify your proposed requirement to be more of a clarification that empty formulas MUST only appear for structures without any atoms. (However, how was it with those non-specifically placed hydrogens? Since I don't use this, I don't quite remember everything they make possible. Is it perhaps possible to come up with a pathological example of only unspecifically placed hydrogens which should have an empty chemical_formula_descriptive?)

I think I agree with the limitation that, e.g. Na0 (0 = zero, not oxygen) should not be used to indicate disordered systems with very small concentrations, but I suppose the question is what one should do for those cases.

@ml-evs
Member

ml-evs commented Dec 8, 2021

Just to join up this conversation with #361, which approaches a similar problem from the elements and elements_ratios side.

One suggestion for impurities with vanishing/unknown concentration is to add them to the elements and elements_ratios lists but exclude them from chemical_formula_reduced. I am more comfortable with elements_ratios being zero for one species (as all the normal filtering semantics on floats would work) compared to adding it to formulae where it would break string matching.

In this case, it might even be sensible to add a new structure_feature tag to such an entry so that it can be filtered easily (impurity?). We would just need to define the rules for when this tag should be added (e.g. a minimum elements_ratios). The only problem I can see is that we would be treating adatom and substitutional impurities differently from vacancies and stoichiometry-preserving defects which might be misleading for users. The alternative would be for queries that want to return only pristine structures to add something like elements_ratios HAS ALL > 0.01 (which I do not think is well-supported).

Could you propose a programmatic way to arrive at CaO from Ca198Na2O199?

A query that could return Ca198Na2O199 and related structures around CaO could be:

elements:elements_ratios HAS ALL "Ca":>0.49, "O":>0.49.

Although this is an optional filter feature, a database serving defective structures should probably implement it... With the additional suggestion above of adding the Na defect with elements_ratios = [0.5, 0, 0.5], the query above would become

elements:elements_ratios HAS ALL "Ca":=0.5, "O":=0.5

(where more awkward ratios would pose problems).
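
As a sketch of how such a filter could be sent to a server, the snippet below URL-encodes the filter string into a `/v1/structures` request. The base URL `https://example.org/optimade` and the helper name `structures_url` are hypothetical; only the standard library is used and no request is actually made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://example.org/optimade"  # hypothetical provider


def structures_url(filter_string, base_url=BASE_URL):
    """Build a structures query URL with a properly escaped filter."""
    return f"{base_url}/v1/structures?{urlencode({'filter': filter_string})}"


near_cao = 'elements:elements_ratios HAS ALL "Ca":>0.49, "O":>0.49'
url = structures_url(near_cao)
print(url)

# Decoding the query string recovers the filter unchanged.
print(parse_qs(urlsplit(url).query)["filter"][0] == near_cao)  # True
```

Since the correlated-list syntax is an optional filter feature, a given server may reject this query.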

@JPBergsma
Contributor

JPBergsma commented Dec 8, 2021

Fair enough. This can be achieved now by querying elements HAS ONLY [ "Ca", "O" ] no matter what conventions for formulas are used.

This would also return calcium peroxide CaO2. (and "O2" and "Ca" but you can prevent that with 'HAS ALL ["Ca","O"] AND nelements=2')

@rartino I guess you could consider a vacuum a material in some respects. If a database contains data on the polarizability of different materials, they may include vacuum, as it also has a measurable polarizability.

Perhaps a small PR can already be created about the things about which we agree:

  • If the reduced chemical formula is unknown, it should be null.
  • Element proportions MUST NOT be 0.
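
A minimal sketch of a check for these two points, assuming a flat formula made of element symbols with optional integer proportions. The function name and the `allow_empty` switch are illustrative assumptions; this is not the regex used in optimade-python-tools.

```python
import re

# Hypothetical term pattern: an element symbol followed by an optional proportion.
_TERM = re.compile(r"([A-Z][a-z]?)(\d*)")


def formula_is_acceptable(formula, allow_empty=False):
    """Return True if the formula is null or has no zero proportions."""
    if formula is None:
        return True  # null marks an unknown formula
    if formula == "":
        return allow_empty  # whether "" is valid is still under discussion
    position = 0
    while position < len(formula):
        match = _TERM.match(formula, position)
        if match is None:
            return False  # not a sequence of element terms
        proportion = match.group(2)
        if proportion and int(proportion) == 0:
            return False  # e.g. "Na0" (zero, not oxygen)
        position = match.end()
    return True


print(formula_is_acceptable("Ca198Na2O199"))  # True
print(formula_is_acceptable("Na0"))           # False
print(formula_is_acceptable(None))            # True
print(formula_is_acceptable(""))              # False
```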

We can then turn the discussion about how to handle impurities into a separate topic.
Do databases have information about the purity of their structures? If not, it is not really useful to have such a discussion here.

@merkys
Member Author

merkys commented Dec 20, 2021

Fair enough. This can be achieved now by querying elements HAS ONLY [ "Ca", "O" ] no matter what conventions for formulas are used.

This would also return calcium peroxide CaO2. (and "O2" and "Ca" but you can prevent that with 'HAS ALL ["Ca","O"] AND nelements=2')

Right, elements HAS ONLY [ "Ca", "O" ] will return CaO2, but so would elements HAS ALL ["Ca","O"] AND nelements=2. To filter out CaO2 one would need to query for elements:elements_ratios HAS ALL "Ca":>0.49, "O":>0.49, like @ml-evs pointed out.
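
To compare the three queries side by side, here is a small local emulation of their semantics on a handful of compositions. No OPTIMADE server is involved; the helper names are hypothetical and the sample entries assume that every element present, including Na, is listed in elements and elements_ratios.

```python
# Element -> ratio for a few sample entries, keyed by a descriptive label.
structures = {
    "CaO": {"Ca": 0.5, "O": 0.5},
    "CaO2": {"Ca": 1 / 3, "O": 2 / 3},
    "Ca198Na2O199": {"Ca": 198 / 399, "Na": 2 / 399, "O": 199 / 399},
    "O2": {"O": 1.0},
    "Ca": {"Ca": 1.0},
}


def has_only(ratios, allowed):
    """elements HAS ONLY: every element present is in the allowed list."""
    return set(ratios) <= set(allowed)


def has_all(ratios, required):
    """elements HAS ALL: every required element is present."""
    return set(required) <= set(ratios)


def ratios_exceed(ratios, thresholds):
    """elements:elements_ratios HAS ALL "X":>t for each threshold given."""
    return all(ratios.get(el, 0.0) > t for el, t in thresholds.items())


def matching(predicate):
    return sorted(name for name, ratios in structures.items() if predicate(ratios))


print(matching(lambda r: has_only(r, ["Ca", "O"])))
# ['Ca', 'CaO', 'CaO2', 'O2']
print(matching(lambda r: has_all(r, ["Ca", "O"]) and len(r) == 2))
# ['CaO', 'CaO2']
print(matching(lambda r: ratios_exceed(r, {"Ca": 0.49, "O": 0.49})))
# ['Ca198Na2O199', 'CaO']
```

Under these assumptions only the ratio-based query returns both CaO and the Na-containing entry while excluding CaO2.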

@rartino I guess you could consider a vacuum a material in some respects. If a database contains data on the polarizability of different materials, they may include vacuum, as it also has a measurable polarizability.

I am not sure the current specification is ready to describe such structures, but why not introduce them in the future?

Perhaps a small PR can already be created about the things about which we agree:

  • If the reduced chemical formula is unknown, it should be null.
  • Element proportions MUST NOT be 0.

Agree. Shall we add that formulas MUST be empty strings only for vacuum? Structures with unspecifically placed hydrogen atoms would still have hydrogen atoms in formulas, I guess. Not sure, though, what to do about structures built from subatomic particles, for example, sole electrons, if such structures exist.

We can then turn the discussion about how to handle impurities into a separate topic. Do databases have information about the purity of their structures? If not, it is not really useful to have such a discussion here.

At least experimental structural databases have information about impurities of sites.

@JPBergsma
Contributor

JPBergsma commented Dec 22, 2021

Agree. Shall we add that formulas MUST be empty strings only for vacuum? Structures with unspecifically placed hydrogen atoms would still have hydrogen atoms in formulas, I guess. Not sure, though, what to do about structures built from subatomic particles, for example, sole electrons, if such structures exist.

For me it is ok to specify that formulas MUST be empty strings only for vacuum.
In my opinion, unspecifically placed hydrogen atoms MUST appear in the chemical formula.

Good point about the solvated electrons / electrides. At low temperature, they can be quite stable. The most logical choice would be the lowercase letter "e". I just realized this may be confusing when distinguishing C + e from Ce in the chemical formula fields. Perhaps an "E" would be better, as it fits the rules for the other elements.

I guess we could use the centre of the electron density as the position. Although, I am not sure how to define the positions for a metallic electride. It would definitely be a good idea to add this to the standard, although this would affect more than just the elements field, so I think it would be better to create a separate issue about this.

@merkys
Member Author

merkys commented Feb 2, 2022

Electrons and neutrons could be marked as e and n respectively, provided we mandate that such symbols appear at the beginning of the formula. This way no capital letter will appear before e and n if these symbols appear in the formula.

However, I am not sure this is not overkill. Will there be structures made up entirely of electrons or neutrons?

@JPBergsma
Contributor

I do not see how a structure could be made of just electrons and neutrons. Neutrons that are not bound in atomic nuclei will decay quickly, and they cannot form bound states. Electrons repel each other. So I do not see how you can form a chemical structure with just electrons and neutrons.

The only scenario I can think of with both free electrons and neutrons is when you would study the effect of (neutron/beta) radiation on a material. In that case, when an unbound neutron is "fired" at a material, an atom could get ionized, and the trajectory would also contain a separate electron. Other than that, I do not see how an unbound neutron could appear in a trajectory or structure.

@rartino
Contributor

rartino commented Feb 2, 2022

One of the most important papers for DFT, the Ceperley-Alder Monte Carlo simulations that more or less all LDA correlation functionals are based on [http://dx.doi.org/10.1103/PhysRevLett.45.566; 11k citations], deals with "structures" of only electrons. When you get down to low densities, you get something called a Wigner crystal, and the high density limit is the famous uniform electron gas.

Whether these can be called "materials" can be discussed, but I suppose it could be relevant to be able to represent them...

@merkys
Member Author

merkys commented Feb 2, 2022

Thanks for the link, @rartino. So having the electron (I suggest e) as a possible chemical symbol makes sense. I just want to make sure e would not break anything in formulas:

  • In chemical_formula_reduced, elements are ordered alphabetically. e < A, thus there should be no problems.
  • In chemical_formula_hill, using Hill formula notation, e could be written on the leftmost position in the formula. This way it will appear before any capital letter.
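
Whether e sorts before A depends on the collation used. A quick check, assuming plain Unicode code-point ordering (Python's default string comparison), shows the lowercase symbol sorting after every capitalized one, so an explicit sort key would be needed to put it first. The key function `electron_first` is a hypothetical helper.

```python
symbols = ["O", "e", "Ca", "Al"]

# Default comparison is by code point: "e" (101) comes after "A".."Z" (65..90).
print(sorted(symbols))
# ['Al', 'Ca', 'O', 'e']


def electron_first(symbol):
    """Sort key placing the electron symbol ahead of all element symbols."""
    return (symbol != "e", symbol)


print(sorted(symbols, key=electron_first))
# ['e', 'Al', 'Ca', 'O']
```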

I highly doubt IUPAC will ever standardize a chemical symbol starting with a lowercase letter. Only a small fraction of the 26^2 possible two-letter symbols is already taken.

@rartino
Contributor

rartino commented Feb 2, 2022

I'm not sure why it is a good idea to indicate extra subatomic particles in the chemical formulas at all? Are there any examples of anyone doing that? My vote is to skip the e+n extension until someone shows up with a relevant use case for that.

And then, rather than to relate an empty chemical formula to specifically vacuum, just say that an empty chemical formula MUST only occur for a structure with no atoms. Does that work?

@merkys
Member Author

merkys commented Feb 7, 2022

I agree with @rartino, there is no need to over-complicate right now.

@JPBergsma
Contributor

I'm not sure why it is a good idea to indicate extra subatomic particles in the chemical formulas at all? Are there any examples of anyone doing that? My vote is to skip the e+n extension until someone shows up with a relevant use case for that.

These are some examples of chemical formulas with electrons:
[Ca24Al28O68]4+4e- https://pubs.acs.org/doi/10.1021/ol701885p
[Na(NH3)6]+e− (https://en.wikipedia.org/wiki/Electride)
[La8Sr2(SiO4)6]4+:4e– https://www.nature.com/articles/s41535-017-0053-4

Leaving the electrons out will give you a different material with different properties. It would also make it more difficult to find electrides in the databases.
It would probably be good to also have a field for the charge distribution on the atoms, as different charge distributions will give different materials.

@rartino
Contributor

rartino commented Feb 10, 2022

@JPBergsma Fair enough, but your examples only make sense because they charge balance the ^{N+} in those formulas - which is a notation we also do not support, not even in chemical_formula_descriptive. Should we then support that as well? And, does the separate e:s add anything to those formulas that isn't already communicated with the ^{N+} notation?

@merkys
Member Author

merkys commented Feb 11, 2022

These are some examples of chemical formulas with electrons:
[Ca24Al28O68]4+4e- https://pubs.acs.org/doi/10.1021/ol701885p
[Na(NH3)6]+e− (https://en.wikipedia.org/wiki/Electride)
[La8Sr2(SiO4)6]4+:4e– https://www.nature.com/articles/s41535-017-0053-4

Leaving the electrons out will give you a different material with different properties. It would also make it more difficult to find electrides in the databases. It would probably be good to also have a field for the charge distribution on the atoms, as different charge distributions will give different materials.

None of the formulas considered in the initial post on this issue supports charges. While ionic composition formulas would be nice to have, I think this is out of scope for this particular issue.

@merkys
Member Author

merkys commented Oct 28, 2022

I would like to revive the thread. There have been some nice future-proof suggestions, but how about introducing just the constraints expressed in my original post, for the time being? Non-empty formula and non-zero element proportion constraints have already been implemented in optimade-python-tools (see Materials-Consortia/optimade-python-tools#986) and are suggested to be included into OpenAPI schemas (see Materials-Consortia/schemas#8).

I understand that some structures will become inexpressible (vacuum structures; structures with very tiny proportions of some element), but at present the specification does not say how such formulas should be interpreted.

@rartino
Contributor

rartino commented Oct 28, 2022

Echoing what I said previously, I want to be allowed to have a zero length cartesian_site_positions and then set the chemical formulas to the empty string. It may sound silly, but can be the outcome of certain automatic processes that generate structures, and I see no reason to disallow them from being represented.

I am in favor of stating that the proportion constant must be strictly a positive number.

@ml-evs
Member

ml-evs commented Oct 28, 2022

I'm perhaps slightly more reluctant to allow empty strings than others, as I think it goes against the spirit of what we laid out in the description of the chemical_formula_x fields (Is a periodic box of vacuum a chemical? Is jellium?). However, I would not be against explicitly loosening the spec in this regard. I think my standpoint is similar to Andrius' (@merkys), at least as I interpret it: the constraints on non-empty strings and non-zero proportions are implied by the current spec, and thus could be included in our v1.1 OpenAPI schemas, but we could consider loosening this for 1.2 (i.e., simply changing the regex we would be introducing for the OPTIMADE v1.1 schema in Materials-Consortia/schemas#8 and adding example values for these edge cases). Of course, if we really think the specification as it stands allows such empty formulae then we can just manually alter the optimade-python-tools-produced schema for 1.1 and drop the regex altogether.

@rartino
Contributor

rartino commented Oct 31, 2022

@ml-evs If you say these constraints are implied in the current specification, which passage of the text do you base this on? Personally I would say our requirements/conventions are simply unclear on this (which is why this issue is good: it should be clarified either way). However, I would also generally argue that, when it comes to schemas, unclear = allowed.

If we are moving this to a semantic discussion ("what is a chemical?") it should probably start with whether an OPTIMADE structure generalizes to systems of zero atoms. I'd argue that I want them to, both based on semantics and utility. Then it follows:

  • If structures of zero atoms are not allowed, I think we need quite a bit of clarification also for other fields (e.g., nelements > 0, nsites > 0, length of cartesian_site_positions > 0, etc.)
  • If structures of zero atoms are allowed, then what should I set the chemical formulas to? We specifically disallow null, so the empty string seems to be the only viable choice for these systems?
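
To make the second case concrete, here is a sketch of what a zero-atom structure could look like as a plain dictionary of OPTIMADE structure properties, with a small consistency check. The cell size, the helper `is_consistent`, and the rule tying an empty formula to nsites = 0 are assumptions for illustration, not wording from the specification.

```python
empty_cell = {
    "lattice_vectors": [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
    "cartesian_site_positions": [],
    "species_at_sites": [],
    "elements": [],
    "elements_ratios": [],
    "nelements": 0,
    "nsites": 0,
    "chemical_formula_reduced": "",
}


def is_consistent(structure):
    """Check that counts, list lengths, and the formula agree with each other."""
    nsites = structure["nsites"]
    nelements = structure["nelements"]
    return (
        nsites
        == len(structure["cartesian_site_positions"])
        == len(structure["species_at_sites"])
        and nelements
        == len(structure["elements"])
        == len(structure["elements_ratios"])
        # Assumed rule: the formula is empty exactly when there are no sites.
        and (structure["chemical_formula_reduced"] == "") == (nsites == 0)
    )


print(is_consistent(empty_cell))  # True
```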

@merkys
Member Author

merkys commented Nov 25, 2022

I agree that the "nonempty formula" part of this discussion boils down to whether OPTIMADE allows structures with 0 atoms or not. To me it seems that for structures with 0 atoms, structural properties like lattice_vectors and space_group_* make no sense.

@rartino
Contributor

rartino commented Dec 6, 2022

for structures with 0 atoms, structural properties like lattice_vectors and space_group_* make no sense.

I don't see why one cannot define a perfectly fine (non-primitive) unit cell out of three lattice vectors and have it contain zero atoms; to me these are two almost completely separate things. space_group_ is a bit more tricky, but it also isn't required to be provided for these. (If I had to set it, I suppose I would have to indicate symmetry under all symmetry operations).

For a standard representation of "structures" in materials science, is it not better to err on the side of allowing too much rather than too little? I already gave one use case above, and I think I can come up with more if pressed. What problem is it that you are trying to solve by forbidding people to transmit information about empty unit cells via OPTIMADE?

@ml-evs
Member

ml-evs commented Dec 6, 2022

It sounds like empty formulae are desirable after all (pending an excoriating rebuttal from @merkys), so I have just relaxed the constraint on non-empty formulae in optimade-python-tools (pending a review). (Re-reading my original comment I don't see anywhere in the spec that implies they cannot be empty, beyond the semantics of the term 'chemical' which I agree is not a relevant discussion to have here!)

The constraint on "0-proportion" elements is sensible though, so I have left that in.

Do we need a PR to tighten the wording in the spec on this, or can this be closed?


4 participants