Add possibility to preview tables #59
Comments
For the preview of tables, it would also be interesting to see whether we could benefit from storing the tables in a different format (e.g. parquet) in the repository, for example if that made it possible to stream just the first 10 lines from the repo on request instead of downloading the whole table.
Good news: when tables are stored as PARQUET files on the backend, we can preview them without downloading the whole file. The following example demonstrates this with a dependency table from our internal server, as we don't yet have a real table published on the server (copied from audeering/audformat#376 (comment)):

```python
import aiohttp
import fsspec
import pyarrow.parquet as parquet

import audbackend


host = "https://artifactory.audeering.com/artifactory"
auth = audbackend.backend.Artifactory.get_authentication(host)
repository = "data-public-local"

# Prepare fsspec https file-system to communicate with Artifactory
fs = fsspec.filesystem("https", auth=aiohttp.BasicAuth(auth[0], auth[1]))

# Preview dependency table of casual-conversations-v2 dataset
dataset = "casual-conversations-v2"
version = "1.0.0"
url = f"{host}/{repository}/{dataset}/db/{version}/db-{version}.parquet"
file = parquet.ParquetFile(url, filesystem=fs)

# Request only the first batch of 10 rows instead of the whole file
first_ten_rows = next(file.iter_batches(batch_size=10))
print(first_ten_rows.to_pandas())
```

which prints the first ten rows of the dependency table.
This means it should now be much easier to integrate a fast table preview feature, at least for tables we store in PARQUET. /cc @ChristianGeng
It might be of interest to allow an interactive preview of tables on the datacard.
E.g. one solution could be to pre-load the first 10 lines for every table and add them to the static web page.
Another solution might be to provide an interface for selecting a table to preview, so that the first 10 lines of the table are read (and maybe downloaded) only when requested.
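To make the two options more concrete, here is a minimal sketch using only the Python standard library. It assumes the tables are available as local CSV files, which is an assumption for illustration and not how the datacard code is actually organised. The function names `read_preview`, `render_html`, `prerender_all`, and `preview_on_request` are hypothetical.

```python
import csv
import html
from functools import lru_cache
from itertools import islice


def read_preview(path, n_rows=10):
    """Return the header and the first n_rows data rows of a CSV table.

    Reading stops after n_rows, so the rest of the file is never parsed.
    """
    with open(path, newline="") as fp:
        reader = csv.reader(fp)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows


def render_html(header, rows):
    """Render a preview as a plain HTML table with escaped cell content."""
    head = "".join(f"<th>{html.escape(cell)}</th>" for cell in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(cell)}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


def prerender_all(table_paths, n_rows=10):
    """Option 1: pre-load a preview for every table when building the page."""
    return {path: render_html(*read_preview(path, n_rows)) for path in table_paths}


@lru_cache(maxsize=None)
def preview_on_request(path, n_rows=10):
    """Option 2: read a table only when it is requested, then cache the result."""
    return render_html(*read_preview(path, n_rows))
```

The first option keeps the web page fully static at the cost of reading every table once at build time. The second touches a table only when a user selects it, and the cache avoids reading it again on repeated requests.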