
SPARK CID sampling alpha #43

Closed
bajtos opened this issue Sep 6, 2023 · 8 comments

@bajtos
Contributor

bajtos commented Sep 6, 2023

eta: 2023-10-31
description: Remove the static list of job templates and replace it with a dynamic (CID, SP) selection that samples data stored in FIL+ deals. Depending on how complex the "proper" CID sampling we envision turns out to be, this milestone may implement a simplified version or only part of the full solution.

Current idea:

  1. Use the DataCap API to obtain the list of client IDs notarised for FIL+ LDN deals
  2. Pick a random FIL+ LDN deal from StorageMarketActor on-chain state
  3. Ask IPNI for a random payload CID stored in that deal

See also:

Dependencies:

@bajtos
Contributor Author

bajtos commented Sep 20, 2023

Related work we may leverage later: data-preservation-programs/spade#6

@bajtos
Contributor Author

bajtos commented Sep 26, 2023

How to find the list of client IDs participating in the FIL+ LDN program, i.e. clients whose data should be publicly retrievable:

  1. Find the list of notaries for the LDN program here:
    https://datacapstats.io/notaries?showInactive=false&filter=ldn&limit=25
  2. For each notary, find the list of clients they notarised:
    curl -H 'X-API-KEY: [...]' \
      'https://api.datacapstats.io/public/api/getVerifiedClients/f01858410?limit=10000'

Presumably, this list can also be obtained by inspecting on-chain data; we don't necessarily have to use the api.datacapstats.io service.
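While we do rely on the API, step 2 can be scripted for several notaries at once. Below is a minimal shell sketch, assuming the API key is exported in a `DATACAP_API_KEY` environment variable (a name chosen here for illustration) and that each notary's raw JSON response is simply saved to `clients-<notary>.json`. `verified_clients_url` and `fetch_verified_clients` are hypothetical helpers. The response is not parsed because its schema is not described in this thread.

```shell
# Build the getVerifiedClients URL for one notary (pure, no network access).
verified_clients_url() {
  local notary="$1"
  printf 'https://api.datacapstats.io/public/api/getVerifiedClients/%s?limit=10000\n' "$notary"
}

# Download the verified-client list of every notary passed as an argument.
# DATACAP_API_KEY is a hypothetical variable name holding the API key.
fetch_verified_clients() {
  if [ -z "${DATACAP_API_KEY:-}" ]; then
    echo "fetch_verified_clients: DATACAP_API_KEY is not set" >&2
    return 1
  fi
  local notary
  for notary in "$@"; do
    # -f: fail on HTTP errors, -sS: quiet but still report errors.
    curl -fsS -H "X-API-KEY: ${DATACAP_API_KEY}" \
      "$(verified_clients_url "$notary")" \
      -o "clients-${notary}.json" || return 1
  done
}
```

For example, after `export DATACAP_API_KEY='<your key>'`, running `fetch_verified_clients f01858410` would write `clients-f01858410.json`.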

When inspecting StorageMarketActor state for the list of deals, we can restrict sampling to deals made by LDN clients.
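A minimal sketch of that sampling step follows. It assumes the deals have already been exported from StorageMarketActor state into a tab-separated file with the columns deal ID, client, provider and piece CID (a hypothetical layout). It also assumes the LDN client IDs are stored one per line, in the same f0-prefixed ID-address form used by the deal state. `sample_ldn_deal` is a hypothetical helper that picks one matching deal uniformly at random using single-item reservoir sampling, so the deal list is streamed rather than loaded into memory.

```shell
# sample_ldn_deal CLIENTS_FILE DEALS_TSV
# Prints one deal, chosen uniformly at random, whose client is an LDN client.
# Prints nothing when no deal matches.
sample_ldn_deal() {
  local clients_file="$1" deals_file="$2"
  awk -v clients="$clients_file" '
    BEGIN {
      FS = OFS = "\t"
      srand()  # seeded from the current time (seconds)
      # Load the set of LDN client IDs, one per line.
      while ((getline id < clients) > 0) ldn[id] = 1
      close(clients)
    }
    # Column 2 holds the client ID; keep only deals made by LDN clients.
    ($2 in ldn) {
      n++
      # Reservoir sampling with k = 1: the n-th match replaces the pick
      # with probability 1/n, which makes every match equally likely.
      if (rand() * n < 1) pick = $0
    }
    END { if (n > 0) print pick }
  ' "$deals_file"
}
```

Because `srand()` without an argument seeds from the current time in seconds, two calls within the same second return the same deal; a real sampler would need a better seed.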

@bajtos
Contributor Author

bajtos commented Sep 26, 2023

  1. Find the list of notaries for the LDN program here:
    https://datacapstats.io/notaries?showInactive=false&filter=ldn&limit=25

We can do step 1 programmatically, too:

curl -H 'X-API-KEY: [...]' \
  'https://api.datacapstats.io/public/api/getVerifiers?limit=1000&filter=ldn'

API docs: https://api.datacapstats.io/docs
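The API-key handling can be wrapped in a small helper so that the `getVerifiers` and `getVerifiedClients` requests share it. This is a shell sketch only: `datacap_get`, `list_ldn_verifiers` and the `DATACAP_API_KEY` variable are hypothetical names, while the base URL, header and query string are copied from the curl commands in this thread.

```shell
# Base URL of the DataCap Stats public API (as used in the commands above).
DATACAP_API_BASE='https://api.datacapstats.io/public/api'

# datacap_get PATH_AND_QUERY
# Sends a GET request with the X-API-KEY header and prints the response body.
# DATACAP_API_KEY is a hypothetical variable name holding the API key.
datacap_get() {
  if [ -z "${DATACAP_API_KEY:-}" ]; then
    echo "datacap_get: DATACAP_API_KEY is not set" >&2
    return 1
  fi
  curl -fsS -H "X-API-KEY: ${DATACAP_API_KEY}" "${DATACAP_API_BASE}/$1"
}

# Lists LDN notaries; same request as the curl command above.
list_ldn_verifiers() {
  datacap_get 'getVerifiers?limit=1000&filter=ldn'
}
```

Usage: `list_ldn_verifiers > verifiers.json` once `DATACAP_API_KEY` is exported.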

@bajtos
Contributor Author

bajtos commented Sep 27, 2023

Until we have an IPNI endpoint for sampling payload CIDs, we may want to lean into the approach based on analysing piece data, as explored by RetrievalBot: data-preservation-programs/RetrievalBot#36

This was referenced Sep 27, 2023
@bajtos
Contributor Author

bajtos commented Oct 12, 2023

How to get an API key:

curl -X 'GET' \
  'https://api.datacapstats.io/public/api/getApiKey' \
  -H 'accept: */*'

@bajtos
Contributor Author

bajtos commented Nov 15, 2023

Next steps:

  • Build the IPNI Context ID from the FIL deal proposal, so that we can filter IPNI records and pick only the advertisement from the SP handling the deal

  • Rework SPARK tasking to push IPNI queries to spark-checkers

    • Change the tasks from (CID, address, protocol) to (miner, contextId, CID)
    • Change spark-checkers to query IPNI to get the address and protocol, and include miner and contextId in the measurements
    • Change spark-api to ingest the new measurement fields
    • Change spark-evaluate: add a fraud-detection step that validates that all members of the committee used the same address and protocol

See filecoin-station/spark#40
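The fraud-detection check from the spark-evaluate bullet could start as simply as the shell sketch below. It assumes a hypothetical input: one committee's measurements in a tab-separated file with the columns participant, address and protocol. `committee_agrees` is a hypothetical helper that succeeds only when the file is non-empty and every row reports the same (address, protocol) pair. How committees are formed and what spark-evaluate does with a failing committee are out of scope here.

```shell
# committee_agrees MEASUREMENTS_TSV
# Returns 0 when every committee member reported the same (address, protocol)
# pair and at least one measurement exists; returns 1 otherwise.
committee_agrees() {
  awk -F '\t' '
    # Columns: 1 = participant, 2 = address, 3 = protocol (hypothetical layout).
    {
      key = $2 FS $3
      if (NR == 1) first = key
      else if (key != first) mismatch = 1
    }
    END {
      if (NR == 0 || mismatch) exit 1
      exit 0
    }
  ' "$1"
}
```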

@bajtos
Contributor Author

bajtos commented Nov 23, 2023

What's remaining:

We already have that data in InfluxDB as of filecoin-station/spark-evaluate#61, but I am reworking that part in filecoin-station/spark-evaluate#67, so I am holding off on dashboards until the second PR lands.

@bajtos
Contributor Author

bajtos commented Nov 28, 2023

Visualisation in the SPARK dashboard:

[Screenshot of the SPARK dashboard, taken 2023-11-28 at 10:33:55]
