Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cargo clippy fails on core in main #1843

Closed
rtyler opened this issue Nov 12, 2023 · 6 comments · Fixed by #1852
Closed

cargo clippy fails on core in main #1843

rtyler opened this issue Nov 12, 2023 · 6 comments · Fixed by #1852
Labels
binding/rust Issues for the Rust crate bug Something isn't working crate/core

Comments

@rtyler
Copy link
Member

rtyler commented Nov 12, 2023

Running cargo clippy in crates/deltalake-core in the latest HEAD of main (currently 2b913b37)

    Checking deltalake-core v0.17.0 (/usr/home/tyler/source/github/delta-io/delta-rs/crates/deltalake-core)
error: higher-ranked lifetime error
   --> crates/deltalake-core/src/operations/optimize.rs:661:46
    |
661 |                           let rewrite_result = tokio::task::spawn(Self::rewrite_files(
    |  ______________________________________________^
662 | |                             task_parameters.clone(),
663 | |                             partition,
664 | |                             files,
665 | |                             object_store,
666 | |                             batch_stream,
667 | |                         ));
    | |__________________________^
    |
    = note: could not prove `impl futures::Future<Output = std::result::Result<(std::vec::Vec<kernel::actions::Action>, operations::optimize::PartialMetrics), errors::DeltaTableError>>: std::marker::Send`

error: higher-ranked lifetime error
   --> crates/deltalake-core/src/operations/optimize.rs:661:65
    |
661 |                         let rewrite_result = tokio::task::spawn(Self::rewrite_files(
    |                                                                 ^^^^^^^^^^^^^^^^^^^

error: future cannot be sent between threads safely
   --> crates/deltalake-core/src/operations/optimize.rs:262:9
    |
262 | /         Box::pin(async move {
263 | |             let writer_properties = this.writer_properties.unwrap_or_else(|| {
264 | |                 WriterProperties::builder()
265 | |                     .set_compression(Compression::ZSTD(ZstdLevel::try_new(4).unwrap()))
...   |
287 | |             Ok((table, metrics))
288 | |         })
    | |__________^ future created by async block is not `Send`
    |
note: opaque type is declared here
   --> crates/deltalake-core/src/operations/optimize.rs:594:10
    |
594 |     ) -> Result<Metrics, DeltaTableError> {
    |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: this item depends on auto traits of the hidden type, but may also be registering the hidden type. This is not supported right now. You can try moving the opaque type and the item that actually registers a hidden type into a new submodule
   --> crates/deltalake-core/src/operations/optimize.rs:259:8
    |
259 |     fn into_future(self) -> Self::IntoFuture {
    |        ^^^^^^^^^^^
note: future is not `Send` as it awaits another future which is not `Send`
   --> crates/deltalake-core/src/operations/optimize.rs:276:27
    |
276 |               let metrics = plan
    |  ___________________________^
277 | |                 .execute(
278 | |                     this.log_store.clone(),
279 | |                     &this.snapshot,
...   |
282 | |                     this.min_commit_interval,
283 | |                 )
    | |_________________^ await occurs here on type `impl futures::Future<Output = std::result::Result<operations::optimize::Metrics, errors::DeltaTableError>>`, which is not `Send`
    = note: required for the cast from `std::pin::Pin<std::boxed::Box<[async block@crates/deltalake-core/src/operations/optimize.rs:262:18: 288:10]>>` to `Pin<Box<dyn Future<Output = Result<(DeltaTable, Metrics), DeltaTableError>> + Send>>`
    = note: the full name for the target type has been written to '/usr/home/tyler/source/github/delta-io/delta-rs/target/debug/deps/deltalake_core-b47cab85e0f6abbd.long-type-4803164412313686460.txt'

error: could not compile `deltalake-core` (lib) due to 3 previous errors
@rtyler rtyler added bug Something isn't working binding/rust Issues for the Rust crate crate/core labels Nov 12, 2023
@rtyler rtyler self-assigned this Nov 12, 2023
@rtyler
Copy link
Member Author

rtyler commented Nov 12, 2023

Running git bisect tells me that it's SHA@947067865a43af624a327db6c5cd557c2c471b46 which was introduced with #1742 is the problem commit, but I'm still not sure why. I'm also not sure if I believe it 😄

@rtyler rtyler removed their assignment Nov 12, 2023
@roeap
Copy link
Collaborator

roeap commented Nov 12, 2023

strange, at least for me, I can reproduce the clippy error, however when I run it with datafusion feature enabled, all is good. so it seems this is exclusive to the non-datafusion branch, which might explain, why it does not show up in CI?

@rtyler
Copy link
Member Author

rtyler commented Nov 12, 2023

This does also look to fail a cargo build with no explicitly defined features 🤔 which is probably how it passed through CI

@rtyler
Copy link
Member Author

rtyler commented Nov 12, 2023

I've tried a couple variations here and I think @roeap is on the right track, the issue seems to be datafusion vs. non-datafusion enabled builds of core

@dispanser
Copy link
Contributor

I'll take care of this today, sorry for breaking main 🙈

@dispanser
Copy link
Contributor

I took a look at this today, but this is out of my skill level. What I found:

  • with an updated rust toolchain (1.76 nightly), the error message is more concise and skips some spurious follow-up errors;
error: higher-ranked lifetime error
   --> crates/deltalake-core/src/operations/optimize.rs:671:46
    |
671 |                           let rewrite_result = tokio::task::spawn(Self::rewrite_files(
    |  ______________________________________________^
672 | |                             task_parameters.clone(),
673 | |                             partition,
674 | |                             files,
675 | |                             object_store,
676 | |                             batch_stream,
677 | |                         ));
    | |__________________________^
    |
    = note: could not prove `impl futures::Future<Output = Result<(Vec<actions::Action>, PartialMetrics), errors::DeltaTableError>>: std::marker::Send`

error: higher-ranked lifetime error
   --> crates/deltalake-core/src/operations/optimize.rs:671:65
    |
671 |                         let rewrite_result = tokio::task::spawn(Self::rewrite_files(
    |                                                                 ^^^^^^^^^^^^^^^^^^^

  • root cause seems to be the (non-datafusion variant of) MergePlan::read_zorder. If I replace the body with
Ok(futures::stream::empty().boxed())

Things start compiling again. To the contrary, skipping only the interleaving / reordering of batches part doesn't:

        let object_store_ref = context.object_store.clone();
        // Read all batches into a vec
        let batches: Vec<RecordBatch> = futures::stream::iter(files.clone())
            .then(|file| {
                let object_store_ref = object_store_ref.clone();
                async move {
                    let file_reader = ParquetObjectReader::new(object_store_ref.clone(), file);
                    ParquetRecordBatchStreamBuilder::new(file_reader)
                        .await?
                        .build()
                }
            })
            .try_flatten()
            .try_collect::<Vec<_>>()
            .await?;
        Ok(futures::stream::iter(batches)
            .map(|x| Ok(x))
            .boxed())
  • root cause seems to be Bogus higher-ranked lifetime error in an async block rust-lang/rust#102211 and in particular this comment. However, I can't make any sense of this. I played around with some of the suggestions but no luck.
  • I double-checked the diff from my PR that introduced the problem, but the relevant function read_zorder doesn't have any changes, and in general the change just mechanically introduces log_store where needed and log_store.object_store() for the other places.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate bug Something isn't working crate/core
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants