Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(bigquery): use absl::optional for fields in job related structs #14050

Closed

Conversation

cocoa-xu
Copy link

@cocoa-xu cocoa-xu commented Apr 24, 2024

Current implementation of BigQuery's REST API by default sends all fields, which causes problems because the default values like "" are different from (literally nothing).

Take the InsertJobRequest message as an example, if we only set the configuration.query.query and configuration.query.writeDisposition field, it's expected that following JSON will be sent:

{
  "configuration": {
    "query": {
      "query": "SELECT * FROM dataset_id.table_id LIMIT 5",
      "writeDisposition": "WRITE_TRUNCATE"
    }
  }
}

However, current implementation will serialise all fields with their initial values by default:

Full JSON string
{
  "configuration": {
    "dryRun": false,
    "jobTimeoutMs": "0",
    "jobType": "",
    "query": {
      "allowLargeResults": true,
      "clustering": {
        "fields": []
      },
      "connectionProperties": [],
      "createDisposition": "",
      "createSession": false,
      "defaultDataset": {
        "datasetId": "",
        "projectId": ""
      },
      "destinationEncryptionConfiguration": {
        "kmsKeyName": ""
      },
      "destinationTable": {
        "datasetId": "",
        "projectId": "",
        "tableId": ""
      },
      "flattenResults": false,
      "maximumBytesBilled": "0",
      "parameterMode": "",
      "preserveNulls": false,
      "priority": "",
      "query": "SELECT * FROM dataset_id.table_id LIMIT 5",
      "queryParameters": [],
      "rangePartitioning": {
        "field": "",
        "range": {
          "end": "",
          "interval": "",
          "start": ""
        }
      },
      "schemaUpdateOptions": [],
      "scriptOptions": {
        "keyResultStatement": "",
        "statementByteBudget": "0",
        "statementTimeoutMs": "0"
      },
      "timePartitioning": {
        "expirationTime": "0",
        "field": "",
        "type": ""
      },
      "useLegacySql": false,
      "useQueryCache": false,
      "writeDisposition": "WRITE_TRUNCATE"
    }
  },
  "etag": "",
  "id": "",
  "jobReference": {
    "jobId": "",
    "location": "",
    "projectId": ""
  },
  "kind": "",
  "selfLink": "",
  "statistics": {
    "completionRatio": 0,
    "creationTime": "0",
    "dataMaskingStatistics": {
      "dataMaskingApplied": false
    },
    "endTime": "0",
    "finalExecutionDurationMs": "0",
    "numChildJobs": "0",
    "parentJobId": "",
    "query": {
      "billingTier": 0,
      "cacheHit": false,
      "dclTargetDataset": {
        "datasetId": "",
        "projectId": ""
      },
      "dclTargetTable": {
        "datasetId": "",
        "projectId": "",
        "tableId": ""
      },
      "dclTargetView": {
        "datasetId": "",
        "projectId": "",
        "tableId": ""
      },
      "ddlAffectedRowAccessPolicyCount": "0",
      "ddlOperationPerformed": "",
      "ddlTargetDataset": {
        "datasetId": "",
        "projectId": ""
      },
      "ddlTargetRoutine": {
        "datasetId": "",
        "projectId": "",
        "routineId": ""
      },
      "ddlTargetRowAccessPolicy": {
        "datasetId": "",
        "policyId": "",
        "projectId": "",
        "tableId": ""
      },
      "ddlTargetTable": {
        "datasetId": "",
        "projectId": "",
        "tableId": ""
      },
      "dmlStats": {
        "deletedRowCount": "0",
        "insertedRowCount": "0",
        "updatedRowCount": "0"
      },
      "estimatedBytesProcessed": "0",
      "materializedViewStatistics": {
        "materializedView": []
      },
      "metadataCacheStatistics": {
        "tableMetadataCacheUsage": []
      },
      "numDmlAffectedRows": "0",
      "performanceInsights": {
        "avgPreviousExecutionMs": "0",
        "stagePerformanceChangeInsights": {
          "inputDataChange": {
            "recordsReadDiffPercentage": 0
          },
          "stageId": "0"
        },
        "stagePerformanceStandaloneInsights": {
          "insufficientShuffleQuota": false,
          "slotContention": false,
          "stageId": "0"
        }
      },
      "queryPlan": [],
      "referencedRoutines": [],
      "referencedTables": [],
      "schema": {
        "fields": []
      },
      "searchStatistics": {
        "indexUnusedReasons": [],
        "indexUsageMode": ""
      },
      "statementType": "",
      "timeline": [],
      "totalBytesBilled": "0",
      "totalBytesProcessed": "0",
      "totalBytesProcessedAccuracy": "",
      "totalPartitionsProcessed": "0",
      "totalSlotMs": "0",
      "transferredBytes": "0",
      "undeclaredQueryParameters": []
    },
    "quotaDeferments": [],
    "reservation_id": "",
    "rowLevelSecurityStatistics": {
      "rowLevelSecurityApplied": false
    },
    "scriptStatistics": {
      "evaluationKind": "",
      "stackFrames": []
    },
    "sessionInfo": {
      "sessionId": ""
    },
    "startTime": "0",
    "totalBytesProcessed": "0",
    "totalSlotMs": "0",
    "transactionInfo": {
      "transactionId": ""
    }
  },
  "status": {
    "errorResult": {
      "location": "",
      "message": "",
      "reason": ""
    },
    "errors": [],
    "state": ""
  },
  "user_email": ""
}

Similar fix has been attempted in #12988, but was later closed. I'll be happy to work on this further (including update these tests) if we all agree to use absl::optional as it's can achieve identical behaviour as in Golang implementation of BigQuery:

type JobConfigurationQuery struct {
	// AllowLargeResults: Optional. If true and query uses legacy SQL
	// dialect, allows the query to produce arbitrarily large result tables
	// at a slight cost in performance. Requires destinationTable to be set.
	// For GoogleSQL queries, this flag is ignored and large results are
	// always allowed. However, you must still set destinationTable when
	// result size exceeds the allowed maximum response size.
	AllowLargeResults bool `json:"allowLargeResults,omitempty"`

	// ...
}

This change is Reviewable

Copy link

conventional-commit-lint-gcf bot commented Apr 24, 2024

🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use automerge label. Good luck human!

-- conventional-commit-lint bot
https://conventionalcommits.org/

@dbolduc
Copy link
Member

dbolduc commented Apr 24, 2024

/gcbrun

@cocoa-xu
Copy link
Author

/gcbrun

Hi @dbolduc, I'll be happy to fix any tests if you're okay with the absl::optional approach :)

@dbolduc
Copy link
Member

dbolduc commented Apr 24, 2024

Hi @dbolduc, I'll be happy to fix any tests if you're okay with the absl::optional approach :)

Cool, and thanks for the PR. I am a dolt when it comes to BigQuery, so I will get some subject matter experts to review this.

@cocoa-xu
Copy link
Author

Hi @dbolduc, I'll be happy to fix any tests if you're okay with the absl::optional approach :)

Cool, and thanks for the PR. I am a dolt when it comes to BigQuery, so I will get some subject matter experts to review this.

No problem. I'll be waiting for further replies from you or other experts 😃

@dbolduc
Copy link
Member

dbolduc commented Apr 25, 2024

@cocoa-xu - tl;dr: Thanks, but we are not prepared to take this PR, even if it is the right thing to do.

I talked with @jsrinnn. Google has internal projects that depend on this code. We aren't ready to update them.

Note that this code is all in an internal directory, and *_internal namespace. It is explicitly not part of our public API. We cannot support its use.

We recognize that there is no public C++ API for interacting with BigQuery jobs. That is something we plan to do. (See #14061). I can't commit to any timelines but we are hoping to get it done by the 2024-07 release1.

Thanks again for the contribution. Your code changes all looks good to me. We are just not ready to take the PR. Sorry.

Footnotes

  1. Word of advice - never trust a software engineer's estimates

@dbolduc dbolduc closed this Apr 25, 2024
@cocoa-xu
Copy link
Author

cocoa-xu commented May 1, 2024

@cocoa-xu - tl;dr: Thanks, but we are not prepared to take this PR, even if it is the right thing to do.

I talked with @jsrinnn. Google has internal projects that depend on this code. We aren't ready to update them.

Note that this code is all in an internal directory, and *_internal namespace. It is explicitly not part of our public API. We cannot support its use.

We recognize that there is no public C++ API for interacting with BigQuery jobs. That is something we plan to do. (See #14061). I can't commit to any timelines but we are hoping to get it done by the 2024-07 release1.

Thanks again for the contribution. Your code changes all looks good to me. We are just not ready to take the PR. Sorry.

Footnotes

  1. Word of advice - never trust a software engineer's estimates

Hi @dbolduc, thank you very much for the update! It's understandable and looking forward to the public C++ API for interacting with BigQuery job sometime in the future :)

@dbolduc
Copy link
Member

dbolduc commented Aug 1, 2024

looking forward to the public C++ API for interacting with BigQuery job sometime in the future :)

It has arrived! The library was included in the v2.27.0 release:

https://github.com/googleapis/google-cloud-cpp/tree/v2.27.0/google/cloud/bigquerycontrol

Note that the library is initially experimental, and that it may take some extra time to propagate through the ecosystem (e.g. to be available via vcpkg).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants