docs: 📝 Add expected answers to `DataFrame` method examples #12564

Eason0729 · 2024-09-21T05:03:48Z

Which issue does this PR close?

Closes #12527.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

alamb · 2024-09-21T11:12:46Z

datafusion/core/src/dataframe/mod.rs

@@ -225,7 +225,12 @@ impl DataFrame {
    /// # async fn main() -> Result<()> {
    /// let ctx = SessionContext::new();
    /// let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?;
-    /// let df = df.select_columns(&["a", "b"])?;
+    /// df.select_columns(&["a", "b"])?.show().await?;


What would you think about using assert_batches_eq instead so that the output is automatically checked as part of the doc tests rather than relying on us manually keeping it up to date?

It is of similar readability I think. Here is an example:

datafusion/datafusion/core/tests/sql/select.rs

Lines 27 to 49 in d9cb6e6

    let results = ctx
        .sql("SELECT * FROM test WHERE c1 = $1")
        .await?
        .with_param_values(vec![ScalarValue::from(3i32)])?
        .collect()
        .await?;
    let expected = vec![
        "+----+----+-------+",
        "| c1 | c2 | c3    |",
        "+----+----+-------+",
        "| 3  | 1  | false |",
        "| 3  | 10 | true  |",
        "| 3  | 2  | true  |",
        "| 3  | 3  | false |",
        "| 3  | 4  | true  |",
        "| 3  | 5  | false |",
        "| 3  | 6  | true  |",
        "| 3  | 7  | false |",
        "| 3  | 8  | true  |",
        "| 3  | 9  | false |",
        "+----+----+-------+",
    ];
    assert_batches_sorted_eq!(expected, &results);
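
For readers unfamiliar with the sorted variant of the macro, the idea of comparing pretty-printed output while ignoring row order can be sketched with the standard library alone. This is a simplified stand-in, not DataFusion's implementation: the function names `normalize_table` and `tables_match_ignoring_row_order` are hypothetical, and the sketch assumes the table layout shown above (top border, header, border, data rows, bottom border).

```rust
/// Hypothetical helper: returns the table lines with the data rows sorted.
///
/// Assumed layout: line 0 is the top border, line 1 is the header, and the
/// last line is the bottom border. Everything in between is sorted.
fn normalize_table(lines: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = lines.iter().map(|s| s.to_string()).collect();
    let n = out.len();
    if n > 3 {
        // The border under the header starts with '+', which sorts before
        // '|', so it stays directly below the header after sorting.
        out[2..n - 1].sort_unstable();
    }
    out
}

/// Hypothetical helper: true when both tables hold the same rows,
/// regardless of the order in which the rows appear.
fn tables_match_ignoring_row_order(expected: &[&str], actual: &[&str]) -> bool {
    normalize_table(expected) == normalize_table(actual)
}
```

Calling `tables_match_ignoring_row_order(&expected, &actual)` with two tables that differ only in row order returns `true`, which is why a doctest written this way does not break when the engine returns rows in a different order.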

Eason0729

My original thought was to simplify the long, wide columns in the doctests, which are less readable, like this: https://github.com/Eason0729/datafusion/blob/4dd44c5a2b0d9810d7e9163689afab227c58d542/datafusion/core/src/dataframe/mod.rs#L734

Therefore, in the next commit I simplified example_long.csv so that it contains just enough data to showcase most DataFrame methods.

The expected output of the describe method is more complex and might require a dedicated CSV file, so I decided to leave that example as it is.

alamb

Thank you so much @Eason0729 for this contribution 🙏

I have started the CI to check this code.

I think the PR would be even better if the output were verified, so that it is guaranteed to stay in sync. I left a suggestion on how to do this. Let me know what you think.

comphead · 2024-09-21T17:23:26Z

datafusion/core/src/dataframe/mod.rs

@@ -184,7 +184,7 @@ impl DataFrame {
    }

    /// Creates logical expression from a SQL query text.
-    /// The expression is created and processed againt the current schema.
+    /// The expression is created and processed against the current schema.


comphead · 2024-09-21T17:24:58Z

datafusion/core/src/dataframe/mod.rs

    /// # #[tokio::main]
    /// # async fn main() -> Result<()> {
    /// let ctx = SessionContext::new();
    /// let df = ctx.read_json("tests/data/unnest.json", NdJsonReadOptions::default()).await?;
+    /// // expend into multiple columns if it's json array, flatten field name if it's nested structure


Suggested change

-    /// // expend into multiple columns if it's json array, flatten field name if it's nested structure
+    /// // expand into multiple columns if it's json array, flatten field name if it's nested structure

comphead · 2024-09-21T17:42:18Z

Thanks @Eason0729 for your contribution.
I feel we need a way to create a DataFrame from rows, in addition to creating one from data files.
An example could be:

let schema = 
DataFrame::from(schema, data)

Underneath, the method can call ctx.read_batch(record_batch). The batch can be created with RecordBatch::try_from_iter.

docs: 📝 Add expected answers to DataFrame method examples

323fd18

github-actions bot added the core Core DataFusion crate label Sep 21, 2024

alamb reviewed Sep 21, 2024

test: 📝 use assert_batches_sorted_eq and simplify example_long.csv

4dd44c5

comphead reviewed Sep 21, 2024

comphead mentioned this pull request Sep 21, 2024

Concise API to create DataFrame from collection #12574

Open
