Store all torrent fields in the database #284

Closed
3 tasks done
josecelano opened this issue Sep 13, 2023 · 15 comments

Labels: Easy (Good for Newcomers), EPIC (Contains several subissues)

@josecelano
Member

josecelano commented Sep 13, 2023

Relates to: #285

Currently, we have these fields when we parse/decode the torrent file:

#[derive(PartialEq, Debug, Clone, Serialize, Deserialize)]
pub struct Torrent {
    pub info: TorrentInfo, //
    #[serde(default)]
    pub announce: Option<String>,
    #[serde(default)]
    pub nodes: Option<Vec<TorrentNode>>,
    #[serde(default)]
    pub encoding: Option<String>,
    #[serde(default)]
    pub httpseeds: Option<Vec<String>>,
    #[serde(default)]
    #[serde(rename = "announce-list")]
    pub announce_list: Option<Vec<Vec<String>>>,
    #[serde(default)]
    #[serde(rename = "creation date")]
    pub creation_date: Option<i64>,
    #[serde(rename = "comment")]
    pub comment: Option<String>,
    #[serde(default)]
    #[serde(rename = "created by")]
    pub created_by: Option<String>,
}

Some of these fields are not persisted in the database. We could persist every field in the Torrent struct. The fields currently missing from the database are:

  • nodes
  • encoding
  • httpseeds
  • creation_date
  • comment
  • created_by

CREATE TABLE "torrust_torrents" (
	"torrent_id"	INTEGER NOT NULL,
	"uploader_id"	INTEGER NOT NULL,
	"category_id"	INTEGER,
	"info_hash"	TEXT NOT NULL UNIQUE,
	"size"	INTEGER NOT NULL,
	"name"	TEXT NOT NULL,
	"pieces"	TEXT NOT NULL,
	"piece_length"	INTEGER NOT NULL,
	"private"	BOOLEAN DEFAULT NULL,
	"root_hash"	INT NOT NULL DEFAULT 0,
	"date_uploaded"	TEXT NOT NULL,
	"source"	TEXT DEFAULT NULL,
	FOREIGN KEY("uploader_id") REFERENCES "torrust_users"("user_id") ON DELETE CASCADE,
	FOREIGN KEY("category_id") REFERENCES "torrust_categories"("category_id") ON DELETE SET NULL,
	PRIMARY KEY("torrent_id" AUTOINCREMENT)
);
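
A minimal sketch of how the missing fields could be added to this schema, written as a Rust helper that returns migration statements. Every column name, SQL type, and both child tables (torrust_torrent_http_seeds and torrust_torrent_nodes) are assumptions made for illustration, not the project's actual migration. The scalar fields fit in nullable columns, while the list-valued fields (httpseeds, nodes) get one row per item in a child table.

```rust
/// Scalar fields and the SQL type assumed for each one (hypothetical names).
const SCALAR_COLUMNS: [(&str, &str); 4] = [
    ("encoding", "TEXT"),
    ("creation_date", "BIGINT"),
    ("comment", "TEXT"),
    ("created_by", "TEXT"),
];

/// Builds a CREATE TABLE statement for a child table that references
/// "torrust_torrents". Table and column names are illustrative only.
fn child_table(table: &str, id_column: &str, value_columns: &str) -> String {
    format!(
        r#"CREATE TABLE "{table}" ("{id_column}" INTEGER NOT NULL, "torrent_id" INTEGER NOT NULL, {value_columns}, FOREIGN KEY("torrent_id") REFERENCES "torrust_torrents"("torrent_id") ON DELETE CASCADE, PRIMARY KEY("{id_column}" AUTOINCREMENT));"#
    )
}

/// Returns the statements in the order they would be applied:
/// four ALTER TABLE statements, then the two child tables.
pub fn migration_statements() -> Vec<String> {
    let mut statements: Vec<String> = SCALAR_COLUMNS
        .iter()
        .map(|(name, sql_type)| {
            format!(r#"ALTER TABLE "torrust_torrents" ADD COLUMN "{name}" {sql_type} DEFAULT NULL;"#)
        })
        .collect();

    // List-valued fields cannot live in a single column, so each item gets a row.
    statements.push(child_table(
        "torrust_torrent_http_seeds",
        "http_seed_id",
        r#""seed_url" TEXT NOT NULL"#,
    ));
    statements.push(child_table(
        "torrust_torrent_nodes",
        "node_id",
        r#""node_ip" TEXT NOT NULL, "node_port" INTEGER NOT NULL"#,
    ));
    statements
}
```

Keeping the new columns nullable mirrors the Option<...> fields in the struct: a torrent that lacks a field simply stores NULL.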

Subtasks

@josecelano josecelano added the Easy Good for Newcomers label Sep 13, 2023
@josecelano
Member Author

Should the creation date be the original, upload or download torrent date? @da2ce7

@da2ce7
Contributor

da2ce7 commented Sep 13, 2023

> Should the creation date be the original, upload or download torrent date? @da2ce7

I think it should be the original, since it is closest to the real date. However, does the GUI show this date?

@josecelano
Member Author

> > Should the creation date be the original, upload or download torrent date? @da2ce7
>
> I think it should be the original, since it is closest to the real date. However, does the GUI show this date?

No, it doesn't.

It only shows the upload date and the canonical info-hash.

@da2ce7
Contributor

da2ce7 commented Sep 13, 2023

We have the problem that there is a many-to-one relationship with the original creation date, and our torrent in the DB... 😮‍💨

@josecelano
Member Author

> We have the problem that there is a many-to-one relationship with the original creation date, and our torrent in the DB... 😮‍💨

Maybe we should keep the original creation date if the info-hash does not change, and update the creation date to the upload date if the canonical info-hash is different. As I see it, in that case we are creating a new torrent, because the torrent identity is defined by its info-hash.

@da2ce7
Contributor

da2ce7 commented Sep 13, 2023

So the torrent creation date is canonical. Then that is easy: it is a one-to-one relationship. 😄
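
The rule discussed above could be sketched like this. The function name, the parameter names, and the use of hex strings for info-hashes are all hypothetical; the only part taken from the discussion is the decision itself: keep the original creation date when the canonical info-hash equals the original one, otherwise use the upload date.

```rust
/// Decides which creation date to store for an uploaded torrent.
///
/// Hypothetical helper: info-hashes are assumed to be hex strings and
/// dates are assumed to be UNIX timestamps in seconds.
pub fn creation_date_to_store(
    original_info_hash: &str,
    canonical_info_hash: &str,
    original_creation_date: Option<i64>,
    upload_date: i64,
) -> Option<i64> {
    // Hex digests may differ only in letter case, so compare case-insensitively.
    if original_info_hash.eq_ignore_ascii_case(canonical_info_hash) {
        // Same identity: keep whatever the uploaded file declared (possibly nothing).
        original_creation_date
    } else {
        // The canonical torrent is a new torrent, created at upload time.
        Some(upload_date)
    }
}
```

With this rule each stored torrent has exactly one creation date, which is what makes the relationship one-to-one.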

@josecelano
Member Author

josecelano commented Sep 15, 2023

Hi @da2ce7, maybe some fields were intentionally left out because they are described in BEPs we do not support, or because they are not officially defined in any BEP. Anyway, I would include them.

Original unofficial specification

creation_date:

BEP: There's no specific BEP for "creation_date".

Description: This field is an optional key that contains the creation time of the torrent, in standard UNIX epoch format (seconds since 1-Jan-1970 00:00:00 UTC).

comment:

BEP: Again, there's no specific BEP for "comment". It's part of the original unofficial specification.

Description: An optional field that contains free-form comments for the torrent. It's essentially a text field where the creator of the torrent can put any desired information.

created_by:

BEP: Like "comment" and "creation_date", there's no specific BEP for "created_by". It's part of the original unofficial specification.

Description: Another optional field that identifies the software used to create the .torrent file.

encoding:

BEP: There's no specific BEP that defines the "encoding" field. It's part of the original unofficial specification.

Description: It specifies the character encoding used for various strings within the torrent meta-info, such as the "comment" or "created by" fields. The most common encoding is UTF-8.

These fields come from the original unofficial specification of the BitTorrent protocol. All of them are widely recognized and used by most modern BitTorrent clients.

BEP 5: DHT Protocol

nodes:

Description: This BEP defines the Distributed Hash Table (DHT) protocol that BitTorrent clients use to find peers without using a central tracker. The "nodes" field in a .torrent file provides an initial list of nodes for bootstrapping into the DHT.

BEP 32: IPv6 extension for DHT

nodes6:

BEP 17: HTTP Seeding (Hoffman-style)

httpseeds:

In the main area of the metadata file and not part of the "info" section, will be a new key, "httpseeds". This key will refer to a list of URLs, and will contain a list of web addresses where torrent data can be retrieved. This key may be safely ignored if the client is not capable of using it.

BEP 19: HTTP/FTP Seeding (GetRight-style)

url-list:

Using HTTP or FTP servers as seeds for BitTorrent downloads.
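
For the "nodes" field, BEP 5 describes each entry as a host and port pair, which is why it can be decoded as a list of (String, i64) tuples. A minimal sketch, assuming we want to narrow the port to u16 before storing it; the function names and error messages are hypothetical.

```rust
use std::convert::TryFrom;

/// Converts one decoded node into a (host, port) pair with a checked port.
fn typed_node(host: &str, port: i64) -> Result<(String, u16), String> {
    if host.trim().is_empty() {
        return Err("node host is empty".to_string());
    }
    match u16::try_from(port) {
        // Port 0 is rejected here as well; that is an assumption of this sketch.
        Ok(port) if port != 0 => Ok((host.to_string(), port)),
        _ => Err(format!("node {host} has an invalid port: {port}")),
    }
}

/// Converts every node, stopping at the first invalid entry.
pub fn typed_nodes(nodes: &[(String, i64)]) -> Result<Vec<(String, u16)>, String> {
    nodes
        .iter()
        .map(|(host, port)| typed_node(host, *port))
        .collect()
}
```

Collecting into a Result means a single bad node rejects the whole list; skipping bad entries instead would be an equally reasonable choice.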

@da2ce7
Contributor

da2ce7 commented Sep 15, 2023

Hello @josecelano

I like the idea of including these unofficial fields. However, we should be careful: uploaded torrents may contain invalid or even malicious data in these fields.
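
One way to act on that concern is to clean the free-form fields before persisting them. This is only a sketch under stated assumptions: the 1024-character cap, the choice to drop control characters, and the rejection of negative timestamps are illustrative policies, not rules taken from any BEP or from the project.

```rust
/// Maximum number of characters kept for a free-form field (assumed limit).
const MAX_TEXT_LEN: usize = 1024;

/// Cleans a free-form field such as "comment" or "created by".
/// Returns None when nothing printable is left.
pub fn sanitize_text(raw: &str) -> Option<String> {
    let cleaned: String = raw
        .chars()
        .filter(|c| !c.is_control()) // drops NUL, newlines, tabs, escape codes
        .take(MAX_TEXT_LEN)
        .collect();
    let trimmed = cleaned.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Accepts a "creation date" only if it is a non-negative UNIX timestamp.
pub fn sanitize_creation_date(raw: i64) -> Option<i64> {
    if raw >= 0 {
        Some(raw)
    } else {
        None
    }
}
```

The cleaned values still need to be escaped wherever they are rendered; this step only limits what reaches the database.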

@josecelano
Member Author

> Hello @josecelano
>
> I like the idea of including these unofficial fields. However, we should be careful: uploaded torrents may contain invalid or even malicious data in these fields.

Hi @da2ce7, maybe we should draft a new BEP collecting and describing all the unofficial fields in order to make them official. I think that would be a really great contribution.

@da2ce7
Contributor

da2ce7 commented Sep 15, 2023

@josecelano I would love to do that once I find time. I would use our https://github.com/torrust/teps repo for the draft; then, once we are happy with it, we can submit it to https://www.bittorrent.org/

@josecelano
Member Author

> @josecelano I would love to do that once I find time. I would use our https://github.com/torrust/teps repo for the draft; then, once we are happy with it, we can submit it to https://www.bittorrent.org/

OK, then I think I can start implementing this issue so we have more info for the TEP/BEP.

Besides, I have the database with the 2400 torrents from academictorrents, so we can check what values those fields have.

@josecelano
Member Author

Just for the record, and to be clear about the purpose of this issue, this package:

https://github.com/ttlajus/lava_torrent/blob/master/src/torrent/v1/mod.rs#L58-L88

uses a different strategy. It models the struct after the specifications and unknown fields are added separately.

/// Everything found in a *.torrent* file.
///
/// Modeled after the specifications
/// in [BEP 3](http://bittorrent.org/beps/bep_0003.html) and
/// [BEP 12](http://bittorrent.org/beps/bep_0012.html). Unknown/extension
/// fields will be placed in `extra_fields` (if the unknown
/// fields are found in the `info` dictionary then they are placed in
/// `extra_info_fields`). If you need any of those extra fields you would
/// have to parse it yourself.
#[derive(Clone, Debug, Eq, PartialEq)]
pub struct Torrent {
    /// URL of the torrent's tracker.
    pub announce: Option<String>,
    /// Announce list as defined in [BEP 12](http://bittorrent.org/beps/bep_0012.html).
    pub announce_list: Option<AnnounceList>,
    /// Total torrent size in bytes (i.e. sum of all files' sizes).
    pub length: Integer,
    /// If the torrent contains only 1 file then `files` is `None`.
    pub files: Option<Vec<File>>,
    /// If the torrent contains only 1 file then `name` is the file name.
    /// Otherwise it's the suggested root directory's name.
    pub name: String,
    /// Block size in bytes.
    pub piece_length: Integer,
    /// SHA1 hashes of each block.
    pub pieces: Vec<Piece>,
    /// Top-level fields not defined in [BEP 3](http://bittorrent.org/beps/bep_0003.html).
    pub extra_fields: Option<Dictionary>,
    /// Fields in `info` not defined in [BEP 3](http://bittorrent.org/beps/bep_0003.html).
    pub extra_info_fields: Option<Dictionary>,
}

In our case, we are using explicit optional fields:

pub nodes: Option<Vec<TorrentNode>>,
pub httpseeds: Option<Vec<String>>,

What we want to do is persist them in the database, since the struct and the persisted data don't match.
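
To make the difference between the two strategies concrete, here is a small sketch that separates the top-level keys our struct models explicitly from everything else, which would end up in something like lava_torrent's extra_fields. The Value enum, the KNOWN_KEYS list, and split_fields are hypothetical names used only for this illustration; the key list mirrors the serde names of the Torrent struct.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a decoded bencode value (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Text(String),
    List(Vec<Value>),
}

/// Top-level keys that the Torrent struct models with explicit fields.
const KNOWN_KEYS: [&str; 9] = [
    "info",
    "announce",
    "nodes",
    "encoding",
    "httpseeds",
    "announce-list",
    "creation date",
    "comment",
    "created by",
];

/// Splits a decoded top-level dictionary into (known, extra) fields.
pub fn split_fields(
    top_level: BTreeMap<String, Value>,
) -> (BTreeMap<String, Value>, BTreeMap<String, Value>) {
    top_level
        .into_iter()
        .partition(|(key, _)| KNOWN_KEYS.contains(&key.as_str()))
}
```

With explicit fields only the first map is available after decoding, so a key such as "url-list" is dropped unless a catch-all like the second map is kept as well.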

@josecelano
Member Author

I've created a console command that helps create torrent files with the fields you want to add for testing purposes. See #511.

@josecelano
Member Author

All fields in the Torrent struct are now persisted. However, some fields are not even included in the Torrent struct. I've opened new issues to add them to the Torrent struct and persist them.

We should also persist all extra fields as described in my previous comment.

pub struct Torrent {
    pub info: TorrentInfoDictionary, //
    #[serde(default)]
    pub announce: Option<String>,
    #[serde(default)]
    pub nodes: Option<Vec<(String, i64)>>,
    #[serde(default)]
    pub encoding: Option<String>,
    #[serde(default)]
    pub httpseeds: Option<Vec<String>>,
    #[serde(default)]
    #[serde(rename = "announce-list")]
    pub announce_list: Option<Vec<Vec<String>>>,
    #[serde(default)]
    #[serde(rename = "creation date")]
    pub creation_date: Option<i64>,
    #[serde(default)]
    pub comment: Option<String>,
    #[serde(default)]
    #[serde(rename = "created by")]
    pub created_by: Option<String>,
}

@josecelano
Member Author

For the record, fields related to files (md5sum, length, path) are stored in the torrust_torrent_files table regardless of whether the torrent is a single-file torrent or a multi-file torrent.

CREATE TABLE "torrust_torrent_files" (
	"file_id"	INTEGER NOT NULL,
	"torrent_id"	INTEGER NOT NULL,
	"md5sum"	TEXT DEFAULT NULL,
	"length"	BIGINT NOT NULL,
	"path"	TEXT DEFAULT NULL,
	FOREIGN KEY("torrent_id") REFERENCES "torrust_torrents"("torrent_id") ON DELETE CASCADE,
	PRIMARY KEY("file_id" AUTOINCREMENT)
);

Single file torrent:

{
   "created by": "qBittorrent v4.5.4",
   "creation date": 1691149572,
   "info": {
      "length": 11,
      "name": "sample.txt",
      "piece length": 16384,
      "pieces": "<hex>D4 91 58 7F 1C 42 DF F0 CB 0F F5 C2 B8 CE FE 22 B3 AD 31 0A</hex>"
   }
}

Multiple file torrent:

{
   "created by": "qBittorrent v4.5.4",
   "creation date": 1691151958,
   "info": {
      "files": [
         {
            "length": 11,
            "path": [
               "sample.txt"
            ]
         }
      ],
      "name": "sample",
      "piece length": 16384,
      "pieces": "<hex>D4 91 58 7F 1C 42 DF F0 CB 0F F5 C2 B8 CE FE 22 B3 AD 31 0A</hex>"
   }
}

https://wiki.theory.org/BitTorrentSpecification#Info_in_Single_File_Mode
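
A sketch of how both layouts could be normalized into rows for torrust_torrent_files. The Rust types and the file_rows function are hypothetical. Two details are assumptions rather than facts from this thread: that a single-file torrent stores NULL in "path", and that multi-file path segments are joined with "/".

```rust
/// One entry of the "files" list in a multi-file torrent.
pub struct TorrentFile {
    pub length: i64,
    pub path: Vec<String>,
    pub md5sum: Option<String>,
}

/// The parts of the "info" dictionary that describe files.
pub struct Info {
    pub name: String,
    pub length: Option<i64>,             // present in single-file mode
    pub md5sum: Option<String>,          // present in single-file mode
    pub files: Option<Vec<TorrentFile>>, // present in multi-file mode
}

/// One row of the torrust_torrent_files table (ids omitted).
#[derive(Debug, PartialEq)]
pub struct FileRow {
    pub md5sum: Option<String>,
    pub length: i64,
    pub path: Option<String>,
}

/// Produces one row per file, whichever mode the torrent uses.
pub fn file_rows(info: &Info) -> Vec<FileRow> {
    match (&info.files, info.length) {
        // Multi-file mode: one row per entry, path segments joined with "/".
        (Some(files), _) => files
            .iter()
            .map(|file| FileRow {
                md5sum: file.md5sum.clone(),
                length: file.length,
                path: Some(file.path.join("/")),
            })
            .collect(),
        // Single-file mode: the file name is already in "name", so path stays NULL.
        (None, Some(length)) => vec![FileRow {
            md5sum: info.md5sum.clone(),
            length,
            path: None,
        }],
        // Neither key present: nothing to store.
        (None, None) => Vec::new(),
    }
}
```

Applied to the two samples above, each torrent yields a single row with length 11; only the multi-file one carries a path ("sample.txt").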
