Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fail on duplicate keys in a mapping #416

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

tejassharma96
Copy link
Contributor

This fixes #415 but it is a slightly "naive" implementation. It's possible that the "correct" place to throw an error from would be the C code rather than the swift code, but I do not have the knowledge to make those changes 😓

Please let me know what you think!


##### Bug Fixes

* Yams will now correctly error when it tries to decode a mapping with duplicate keys.
Copy link
Contributor Author

@tejassharma96 tejassharma96 May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a bug fix but it may be breaking if people have malformed yaml they are trying to decode with this. imo this is the correct spot for this but happy to move it to "Breaking" instead if you feel that would be more apt

@jpsim
Copy link
Owner

jpsim commented Jul 10, 2024

Thanks for the PR!

This is a welcome change, and doing this at the Swift layer is fine. Based on some GitHub discussions, it does look like the C API can detect duplicate keys. There's also this issue with relevant discussion: yaml/pyyaml#41

However, I think it's important that if we start to error in these cases, that 1) it be recoverable (e.g. we don't fail the whole decoding) and 2) the error is reported at the correct location (mark), right now 1:1 is being hardcoded as the location.

If you rebase or merge main you should have CI in a healthier state to run again.

@tejassharma96
Copy link
Contributor Author

hmmm ok, I'll need a bit of support on both of those due to my lack of familiarity here:

  1. Recoverable - what would you suggest here? I personally would actually expect the entire decoding to fail if we have invalid YAML. I'm curious what you're thinking regarding how we can both make the error clear and visible to users but also not fail the whole decoding.
  2. Correct location - yeah, I was wondering what to do here, couple questions:
    • Say there are 3 instances of a specific key, at which one of those keys should we error? Can we even decide that? I suppose we could use key.mark and compare line/column numbers to choose either first/last in the data?
    • What if there are multiple instances of duplicated keys? If keys a, b and c are duplicated, currently my code would basically just error on the first one, but the error would say but found multiple instances of: ["a", "b", "c"]. I think there's a balance to be struck here on being specific about the point of error but also giving the consumer actionable feedback on all the issues. I suppose this could potentially be an opportunity to create a new case on YamlError, something like .duplicatedKeys(keys: [Node]) that could display all the relevant information from the nodes (ie by fetching .mark for all duplicated nodes and creating some custom error message that gives immediate feedback about all duplicated keys)?

# By JP Simard (3) and others
# Via GitHub
* upstream/main:
  Require Swift 5.7 or later (jpsim#424)
  Add empty changelog section
  Release 5.1.3
  add support for riscv64 (jpsim#419)
  Add support for Android (jpsim#421)
  GitHub Actions workflow fixes (jpsim#422)
  add os(visionOS) support (jpsim#418)
  Bazel: support rules_swift 2.x (jpsim#420)
  Bump rexml from 3.2.5 to 3.2.8

# Conflicts:
#	CHANGELOG.md
@tejassharma96
Copy link
Contributor Author

tejassharma96 commented Jul 10, 2024

Updated the error to notify of all duplicated keys and their locations by creating a new custom error type that can give you actionable feedback for duplicated keys, no matter how many times a key is duplicated/how many keys are duplicates.

I haven't updated it to be recoverable bc personally I don't believe it should recover - imo it should just throw immediately (but happy to make updates to make it recoverable if you give me guidance on how you think that should be done)

let node = Node.mapping(.init(pairs, tag(firstEvent.mappingTag), event.mappingStyle, firstEvent.startMark))
if let anchor = firstEvent.mappingAnchor {
anchors[anchor] = node
}
return node
}

private func checkDuplicates(mappingKeys: [Node]) throws {
var duplicates: [String: [Node]] = [:]
for key in mappingKeys {
Copy link
Contributor Author

@tejassharma96 tejassharma96 Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines 368-377 (define var, for loop, filter) could just be

let duplicateKeys = Dictionary(grouping: keys, by: { $0.string! }).filter { $1.count > 1 }

but I wanted to avoid the force unwrap on $0.string. if we're confident that these keys will always be scalars (which I think they should be but I'm not sure about that) then I can just switch to that simpler form

@tejassharma96
Copy link
Contributor Author

hey @jpsim, apologies for the tag but I'd love another review here when you have a minute 😅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Yams should fail if there are duplicate keys in a mapping
2 participants