Add Support for Databento Symbology in InteractiveBrokersInstrumentProvider #1790

Open
rsmb7z opened this issue Jul 17, 2024 · 10 comments
Labels: enhancement (New feature or request)

@rsmb7z
Collaborator

rsmb7z commented Jul 17, 2024

Feature Request

Refactor InteractiveBrokersInstrumentProvider to accept Databento symbology as an option, while keeping the original Interactive Brokers symbology intact. This will enhance flexibility in symbol management.

Requirements

Optional Databento Symbology flag:

  • Add a parameter flag for Databento symbology in the config (see the sketch after this list).
  • Ensure backward compatibility by defaulting to the original symbology.

Symbology Conversion:

  • Implement logic to convert Databento symbology to Interactive Brokers format.

Configuration and Validation:

  • Add configuration options to enable/disable Databento symbology.
  • Validate accepted symbology types.

Testing and Documentation:

  • Update unit tests for both symbology types.
  • Provide documentation with examples.

Backward Compatibility:

  • Ensure existing functionality remains unaffected.
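
For illustration, a minimal sketch of the shape such an option could take (the class and parameter names below are hypothetical, not the adapter's existing API; the real change would live in the InteractiveBrokersInstrumentProvider config):

from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleIBSymbologyConfig:
    # Hypothetical flag illustrating the requested option: when False (the
    # default) the provider keeps the original Interactive Brokers symbology,
    # preserving backward compatibility; when True it accepts Databento-style
    # instrument IDs (e.g. 'ESM4.GLBX') and translates them to IB contract
    # details internally.
    use_databento_symbology: bool = False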
@rterbush
Contributor

I've had this discussion quite a bit with @rsmb7z ...

Is it possible for us to standardize on the exchange symbology and map to the various data and execution providers' symbology under the covers?

For example, I'd much prefer to standardize on CME symbology for working with futures contracts, mapping to whatever symbol is needed for IB to execute trades, or whatever symbol Databento needs to pull data. That seems like it would stay true to the goal of the same code running in backtest or live trading.

Or am I misunderstanding this issue?

@rsmb7z
Collaborator Author

rsmb7z commented Jul 18, 2024

Hi @rterbush

The current plan is to ensure that the translation happens seamlessly under the hood, with the IB adapter respecting the Databento symbology. This means the symbology used for the historical dataset provided by Databento will be utilized during backtesting and other processes. Once the use case is implemented, there will be room for further refinement and consolidation as needed.

@cjdsellers
Member

cjdsellers commented Jul 19, 2024

Some additional background: I had originally implemented the Databento client to use the individual CME venues instead of the umbrella GLBX venue which Databento are using.

IIRC this resulted in a sharp increase in complexity: any subscription would first require instrument definitions to be available or requested so we could get at the exchange field, and this translation between GLBX and the individual venues was needed in a few places. So I ended up walking that back, which has now pushed the complexity out to the Interactive Brokers adapter.

I agree with @rterbush that we should avoid layering on even more complexity with additional configuration settings users have to be concerned with. The way the initial Databento adapter implementation was heading was probably along the right lines, where proper MIC codes are used for the venues -- which would then only need a simple XCME -> CME type mapping for Interactive Brokers.

There's some additional context with IB I have to catch up on, but do we at least agree that for traditional assets we should use the official ISO 10383 MIC codes as the venue identifier? (e.g., XNAS rather than NASDAQ).

[edit] @rsmb7z and I did have several conversations about this months ago. I think this is when we settled on that initial Databento implementation. But I'm not sure we've covered this since I walked that back? My intuition is that the Interactive Brokers adapter probably shouldn't be responsible for the translation from GLBX -> CME?
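
For what it's worth, a minimal sketch of the kind of venue mapping this would imply, assuming a flat lookup inside the IB adapter (the MIC codes are ISO 10383; the IB-side exchange strings shown are common codes but would need to be confirmed against the IB docs):

# Illustrative only: map ISO 10383 MIC venue codes (used as Nautilus venues)
# to the exchange codes Interactive Brokers expects on a contract.
MIC_TO_IB_EXCHANGE = {
    "XCME": "CME",     # CME Globex
    "XCBT": "CBOT",    # Chicago Board of Trade
    "XNYM": "NYMEX",   # New York Mercantile Exchange
    "XCEC": "COMEX",   # COMEX
    "XNAS": "NASDAQ",  # Nasdaq
}


def to_ib_exchange(mic: str) -> str:
    # Fall back to the original code if no mapping is known.
    return MIC_TO_IB_EXCHANGE.get(mic, mic)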

@rsmb7z
Collaborator Author

rsmb7z commented Jul 20, 2024

There's some additional context with IB I have to catch up on, but do we at least agree that for traditional assets we should use the official ISO 10383 MIC codes as the venue identifier? (e.g., XNAS rather than NASDAQ).

Yes, I agree, especially when using Databento+IB together. However, if someone is using only IB, they can continue to use IB symbology, e.g. AAPL.NASDAQ.

[edit] @rsmb7z and I did have several conversations about this months ago. I think this is when we settled on that initial Databento implementation. But I'm not sure we've covered this since I walked that back? My intuition is that the Interactive Brokers adapter probably shouldn't be responsible for the translation from GLBX -> CME?

Yes, the background is well covered. I think the adapter should handle symbols where there is no ambiguity and a single unique instrument can be resolved. Let's include this in the initial draft and get community feedback. Since this will be optional, it shouldn't impact any existing functionality, and users can still do their own translation of InstrumentId within their strategy.

@faysou
Collaborator

faysou commented Jul 20, 2024

@cjdsellers from my short experience as a user of Databento and Nautilus, I think the definitions have to be downloaded anyway for the system to work properly, especially when using options. So I would assume that someone using Databento would also have access to the definition file.

I've worked on some helper functions to make it easy to always download data and definitions from Databento and interact with Nautilus by saving them to a catalog as well. Here's the code; maybe it could be included in Nautilus at some point, as it makes it quite easy to handle Databento data.

from datetime import datetime, timedelta
from pathlib import Path

import databento as db
from nautilus_trader.adapters.databento.loaders import DatabentoDataLoader
from nautilus_trader.persistence.catalog import ParquetDataCatalog


DATA_PATH = Path("~/databento_data").expanduser()

databento_api_key = "db-xxxx"
client = db.Historical(key=databento_api_key)


def get_next_day(date_str):
    date_format = "%Y-%m-%d"
    date = datetime.strptime(date_str, date_format)
    next_day = date + timedelta(days=1)

    return next_day.strftime(date_format)
    

def get_databento_data(symbols, start, end, schema='ohlcv-1m', subfolder='', file_prefix='', dataset='GLBX.MDP3',
                       path=DATA_PATH, save_to_catalog=True):
    used_path = path / subfolder

    if not used_path.exists():
        used_path.mkdir(parents=True, exist_ok=True)

    # downloading and saving the definition
    definition_date = start.split('T')[0]
    end_date = end.split('T')[0]
    used_end_date = end_date if definition_date != end_date else get_next_day(definition_date)

    used_file_prefix = file_prefix + ('_' if file_prefix != '' else '')
    definition_file_name = used_file_prefix + "definition.dbn.zst"
    definition_file = used_path / definition_file_name

    if not definition_file.exists():
        definition = client.timeseries.get_range(
            dataset=dataset,
            schema='definition',
            symbols=symbols,
            start=definition_date,
            end=used_end_date,
            path=definition_file
        )
    else:
        definition = load_databento_data(definition_file)

    # downloading and saving data
    data_file_name = f"{used_file_prefix}{schema}_{start}_{end}.dbn.zst"
    data_file = used_path / data_file_name

    if not data_file.exists():
        data = client.timeseries.get_range(
            dataset=dataset,
            schema=schema,
            symbols=symbols,
            start=start,
            end=end,
            path=data_file
        )
    else:
        data = load_databento_data(data_file)

    result = dict(symbols=symbols, dataset=dataset, schema=schema,
                  start=start, end=end, path=used_path, file_prefix=file_prefix,
                  definition_file=definition_file, data_file=data_file,
                  definition=definition, data=data)

    if save_to_catalog:
        catalog_data = save_data_to_catalog(definition_file, data_file, subfolder, path)
        result = {**result, **catalog_data}

    return result


def save_data_to_catalog(definition_file, data_file, subfolder='', path=DATA_PATH):
    catalog = load_catalog(subfolder, path)

    loader = DatabentoDataLoader()
    nautilus_definition = loader.from_dbn_file(definition_file, as_legacy_cython=True)
    nautilus_data = loader.from_dbn_file(data_file, as_legacy_cython=False)

    catalog.write_data(nautilus_definition + nautilus_data)

    return dict(catalog=catalog, nautilus_definition=nautilus_definition, nautilus_data=nautilus_data)


def load_catalog(subfolder='', path=DATA_PATH):
    used_path = path / subfolder

    if not used_path.exists():
        used_path.mkdir()

    return ParquetDataCatalog(used_path)


def query_catalog(catalog, data_type='bars', **kwargs):
    if data_type == 'bars':
        return catalog.bars(**kwargs)
    elif data_type == 'ticks':
        return catalog.quote_ticks(**kwargs)
    elif data_type == 'instruments':
        return catalog.instruments(**kwargs)
    elif data_type == 'custom':
        return catalog.custom_data(**kwargs)


def save_databento_data(data, file):
    return data.to_file(file)


def load_databento_data(file):
    return db.DBNStore.from_file(file)

And an example:

start = '2024-05-09T10:00'
end = '2024-05-09T10:05'

test_folder = '20240720_ES_Test'

# Note: the file_prefix allows similar data requests to be made without file conflicts; if a file already exists, no data request is made
option_symbols = ['ESM4 P5230', 'ESM4 P5250']
symbols_data1 = get_databento_data(option_symbols, start, end, schema='mbp-1', subfolder=test_folder, file_prefix='options')

future_symbols = ['ESM4']
symbols_data2 = get_databento_data(future_symbols, start, end, schema='mbp-1', subfolder=test_folder, file_prefix='futures')

future_symbols = ['ESM4']
symbols_data3 = get_databento_data(future_symbols, start, end, schema='ohlcv-1m', subfolder=test_folder, file_prefix='futures')

catalog = load_catalog(test_folder)
query_catalog(catalog, 'ticks', instrument_ids=['ESM4.GLBX'])
catalog.instruments(instrument_ids=['ESM4 P5250.GLBX'])

@cjdsellers
Member

Yes, I agree, especially when using Databento+IB together. However, if someone is using only IB, they can continue to use IB symbology, e.g. AAPL.NASDAQ.

@rsmb7z that's a good point I missed: some users will still want to use Interactive Brokers naming conventions - and since this is already working now (with great effort), it should continue to work and remain an option.

@faysou thanks for the suggested solution including code. I think ideally we'd want a solution which didn't always require a data catalog -- so that a live trading node didn't always need access to a populated catalog, and BacktestEngine users aren't forced into needing a catalog for GLBX venue translations to work.

Is someone able to point me in the direction of the IB docs for the CME venues? It would be appreciated 🙏.

@faysou
Collaborator

faysou commented Jul 21, 2024

It seems that supporting universal symbols across data providers should work, for example using the convention that @cjdsellers mentioned for venues, with each market adapter then responsible for translating to its own specifics.

For Databento, having higher venue granularity than GLBX would also allow finer-grained portfolio functions related to exposures.

@rsmb7z
Collaborator Author

rsmb7z commented Jul 21, 2024

Is someone able to point me in the direction of the IB docs for the CME venues?

@cjdsellers, here you can find the list of exchanges covered by IB worldwide.
https://www.interactivebrokers.com/en/trading/products-exchanges.php

@faysou
Collaborator

faysou commented Jul 22, 2024

As an example, the option symbols in Databento and IB are currently quite different: 'ESM4 P5230.GLBX' in Databento versus 'ESU24P5550.CME' in IB. So there needs to be a choice of a universal Nautilus convention.
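
To make the gap concrete, here is a rough sketch of the kind of translation the adapter would need for options, based only on the two example formats above (the parsing rules are assumptions, not the adapter's actual logic; expanding the single-digit year to the 2020s is also an assumption):

import re


def databento_option_to_ib_style(symbol: str) -> str:
    # Parse a Databento GLBX option symbol such as 'ESM4 P5230' into its
    # parts: root, futures month code, year digit, right (C/P), and strike.
    m = re.fullmatch(r"([A-Z]+)([FGHJKMNQUVXZ])(\d) ([CP])(\d+)", symbol)
    if m is None:
        raise ValueError(f"Unrecognized Databento option symbol: {symbol}")
    root, month, year_digit, right, strike = m.groups()
    year = f"2{year_digit}"  # assumption: 2020s decade, e.g. '4' -> '24'
    # Re-assemble in the IB-style format shown above (two-digit year, no space).
    return f"{root}{month}{year}{right}{strike}"


databento_option_to_ib_style("ESM4 P5230")  # -> 'ESM24P5230'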

@anegrean

anegrean commented Aug 2, 2024

Hi! I'm very much interested in solving this issue as well. Please let me know if I can help in any way.
