Add possibility to preview tables #59

hagenw · 2024-02-23T14:17:08Z

It might be of interest to allow an interactive preview of tables on the datacard.

One solution could be to pre-load the first 10 lines of every table and add them to the static web page.
Another solution might be to provide an interface for selecting a table to preview, so that the first 10 lines of that table are read (and maybe downloaded) only when requested.
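
The pre-loading variant could look roughly like the sketch below. It is only an illustration using the standard library: the function names `preview_rows` and `to_html_table` and the example table content are hypothetical and are not part of any existing package.

```python
import csv
import html
import io
from itertools import islice


def preview_rows(csv_text, n=10):
    """Return the header and the first ``n`` data rows of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(islice(reader, n))
    return header, rows


def to_html_table(header, rows):
    """Render header and rows as a minimal, escaped HTML table."""
    head = "".join(f"<th>{html.escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


# Hypothetical table content, used only for illustration
example = "file,speaker\n" + "".join(f"audio/{i}.wav,s{i}\n" for i in range(25))
header, rows = preview_rows(example)
table_html = to_html_table(header, rows)
```

The resulting `table_html` string could be embedded in the static page at build time. The on-demand variant would run the same row extraction only after a table has been selected.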

hagenw · 2024-02-23T14:40:25Z

For the table preview, it would also be interesting to check whether we would benefit from storing the tables in a different format (e.g. parquet) in the repository, for example if that makes it possible to stream just the first 10 lines from the repo on request instead of downloading the whole table.

hagenw · 2024-06-21T13:10:48Z

Good news: when storing tables as PARQUET files on the backend, we can preview them without downloading the whole file.

The following example demonstrates this with a dependency table from our internal server, as we don't yet have a real table published on the server (copied from audeering/audformat#376 (comment)):

import aiohttp
import fsspec
import pyarrow.parquet as parquet

import audbackend


host = "https://artifactory.audeering.com/artifactory"
auth = audbackend.backend.Artifactory.get_authentication(host)
repository = "data-public-local"

# Prepare fsspec https file-system to communicate with Artifactory
fs = fsspec.filesystem("https", auth=aiohttp.BasicAuth(auth[0], auth[1]))

# Preview dependency table of casual-conversations-v2 dataset
dataset = "casual-conversations-v2"
version = "1.0.0"
url = f"{host}/{repository}/{dataset}/db/{version}/db-{version}.parquet"
file = parquet.ParquetFile(url, filesystem=fs)
first_ten_rows = next(file.iter_batches(batch_size=10))
print(first_ten_rows.to_pandas())

which returns

                                      file                               archive  bit_depth  channels  ... removed  sampling_rate type  version
0                      db.disabilities.csv                          disabilities          0         0  ...       0              0    0    1.0.0
1                             db.files.csv                                 files          0         0  ...       0              0    0    1.0.0
2               db.physical-adornments.csv                   physical-adornments          0         0  ...       0              0    0    1.0.0
3               db.physical-attributes.csv                   physical-attributes          0         0  ...       0              0    0    1.0.0
4                         db.recording.csv                             recording          0         0  ...       0              0    0    1.0.0
5                         db.skin-tone.csv                             skin-tone          0         0  ...       0              0    0    1.0.0
6                           db.speaker.csv                               speaker          0         0  ...       0              0    0    1.0.0
7  audio/0000_portuguese_nonscripted_1.wav  f76b3d4a-a172-63ee-22f2-fb2255d692ee         16         1  ...       0          48000    1    1.0.0
8  audio/0000_portuguese_nonscripted_2.wav  81db070f-69a1-ab92-a365-ca95ac36c893         16         1  ...       0          48000    1    1.0.0
9  audio/0000_portuguese_nonscripted_3.wav  d4572eb1-d458-7717-2145-a7861208b8da         16         1  ...       0          48000    1    1.0.0

[10 rows x 11 columns]

This means it should now be much easier to integrate a fast table preview feature, at least for tables we store as PARQUET.
For the CSV tables it might be slightly more complicated, as those are stored inside a ZIP file and we would need to read the first 10 rows of the CSV file from within the ZIP file. I think this should also be possible, but I don't know how yet.
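
One possible direction for the CSV case, as a minimal sketch using only the standard library: `zipfile.ZipFile` accepts any seekable file-like object and decompresses a member lazily, so only the first rows need to be read. The helper `first_rows_from_zip` and the archive content are hypothetical, and the example works on an in-memory archive. Whether this avoids downloading the whole archive from the backend depends on the remote file object supporting seeking via range requests, which is an assumption not tested here.

```python
import csv
import io
import zipfile
from itertools import islice


def first_rows_from_zip(fileobj, member, n=10):
    """Return header and first ``n`` rows of CSV file ``member`` in a ZIP archive."""
    with zipfile.ZipFile(fileobj) as archive:
        with archive.open(member) as raw:
            # Decode the decompressed byte stream lazily, line by line
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text)
            header = next(reader)
            return header, list(islice(reader, n))


# Build a small in-memory archive; member name and content are hypothetical
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    lines = ["file,speaker"] + [f"audio/{i}.wav,s{i}" for i in range(100)]
    archive.writestr("db.files.csv", "\n".join(lines) + "\n")
buffer.seek(0)

header, rows = first_rows_from_zip(buffer, "db.files.csv")
```

In principle, `fileobj` could also be a seekable remote file handle instead of the `io.BytesIO` buffer, but that would need to be checked against the actual backend.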

/cc @ChristianGeng

hagenw mentioned this issue Feb 23, 2024

Add possibility to select other example audio files #61

Open

hagenw added the "enhancement" (New feature or request) label Jun 21, 2024

hagenw mentioned this issue Jun 24, 2024

Add possibility to iterate over table data (streaming) audeering/audformat#440

Open

hagenw mentioned this issue Jul 23, 2024

Add table preview to data cards #97

Merged

hagenw closed this as completed in #97 Jul 25, 2024

hagenw mentioned this issue Aug 15, 2024

Add audb.stream() and audb.DatabaseIterator audeering/audb#448

Merged
