This issue was moved to a discussion.

Supporting Cassandra/Yugabyte as airflow backend DB #32666

Closed
2 tasks done
tetsushiawano opened this issue Jul 18, 2023 · 4 comments
Labels
kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet

Comments

@tetsushiawano

tetsushiawano commented Jul 18, 2023

Description

I would like Airflow to be compatible with an active-active database as its backend DB.

As far as I understand, the currently supported databases are not active-active by nature (or enabling an active-active feature is expensive).

Is there any plan to support a database like Cassandra or YugabyteCQL in the future?

Use case/motivation

Run a multi-region active-active Airflow deployment hosted across our multiple on-prem data centers.


Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@tetsushiawano tetsushiawano added kind:feature Feature Requests needs-triage label for new issues that we didn't triage yet labels Jul 18, 2023
@boring-cyborg

boring-cyborg bot commented Jul 18, 2023

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@raphaelauv
Contributor

Airflow expects a relational database because it uses SQLAlchemy.

For Yugabyte, you might be able to make it work, but a few Postgres "features" are missing -> yugabyte/yugabyte-db#5683

@nathadfield
Collaborator

@potiuk Would you like to comment on this from the perspective of project direction and priorities?

@potiuk
Member

potiuk commented Jul 18, 2023

Not possible without completely rewriting how the scheduler works. We rely heavily on the locking mechanisms of relational databases, and those do not work well with an active/active setup.
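To illustrate the general idea of relying on database locks (this is not Airflow's scheduler code), here is a minimal sketch in which two connections stand in for two schedulers competing to claim the same queued task. It uses SQLite's `BEGIN IMMEDIATE` write lock from the Python standard library as a stand-in for the row-level locks a server database would provide; the `task` table, its columns, and the `claim_next_task` helper are hypothetical names made up for the example.

```python
import os
import sqlite3
import tempfile


def claim_next_task(conn, worker):
    """Try to claim one queued task; return its id, or None if nothing was claimed."""
    try:
        # Take the write lock up front so the read and the update are atomic.
        conn.execute("BEGIN IMMEDIATE")
    except sqlite3.OperationalError:
        # Another "scheduler" currently holds the lock: back off.
        return None
    row = conn.execute(
        "SELECT id FROM task WHERE state = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("ROLLBACK")
        return None
    conn.execute(
        "UPDATE task SET state = 'running', owner = ? WHERE id = ?",
        (worker, row[0]),
    )
    conn.execute("COMMIT")
    return row[0]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "meta.db")
    # timeout=0: fail immediately instead of waiting for a lock.
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    a = sqlite3.connect(path, timeout=0, isolation_level=None)
    b = sqlite3.connect(path, timeout=0, isolation_level=None)
    a.execute("CREATE TABLE task (id INTEGER PRIMARY KEY, state TEXT, owner TEXT)")
    a.execute("INSERT INTO task (state) VALUES ('queued')")

    a.execute("BEGIN IMMEDIATE")        # scheduler A holds the lock
    blocked = claim_next_task(b, "B")   # B cannot get the lock
    a.execute("ROLLBACK")               # A releases it without claiming

    first = claim_next_task(a, "A")     # A claims the only queued task
    second = claim_next_task(b, "B")    # nothing left for B
    owner = a.execute("SELECT owner FROM task WHERE id = 1").fetchone()[0]
    a.close()
    b.close()
    return blocked, first, second, owner


if __name__ == "__main__":
    print(demo())
```

The guarantee that exactly one claimant wins comes from a single database arbitrating the lock. With several writable replicas in different regions, that single arbiter is exactly what is missing.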

And it is not needed IMHO.

Generally speaking, Airflow is not targeting a 99.9% available system (about 8 hours of downtime a year). Its availability target is far lower, and this is by design. Airflow should not be used as the backbone of a system that requires such availability. It is a batch scheduling system. Keeping Airflow's downtime to minutes is a NON-goal for us.
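For reference, the downtime budget behind each availability target is simple arithmetic. The sketch below assumes a 365-day year; the function name is made up for the example.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a non-leap year


def downtime_hours_per_year(availability_percent):
    """Hours per year a system may be down while still meeting the target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(target)
        print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

At 99.9% the budget comes to roughly 8.76 hours a year, and each extra nine divides it by ten.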

Every additional 9 adds an order of magnitude of complexity and development cost, and usually requires much more complex maintenance. If someone is willing to invest 10x the time that went into developing the current scheduler, maybe they should. But we never saw it as a goal. That's why we have rejected things like ZooKeeper, non-relational databases, and support for active-active setups in the past.

Supporting such a setup without a real need is at most following a 'trend' without really considering whether it is worth the cost. You have to remember that every single architectural decision bears a cost. And in this case, the cost is huge.

If someone would like to spend the time to write an Airflow Improvement Proposal, convince others, and bear the cost of implementing it, then sure, anyone is free to start. But given the cost of implementation and deployment, and what it really means to run Airflow in such a way, explaining why it is necessary will be a crucial part of it. Convincing community members that it is worth it is the first step.

For now, I personally am not convinced. But luckily I am not the one who has to be convinced. Bring it to the devlist, state your reasoning, fend off people who will not like it, reach consensus, get the AIP to pass the vote, and implement it. This is how one should approach it.

@apache apache locked and limited conversation to collaborators Jul 18, 2023
@potiuk potiuk converted this issue into discussion #32675 Jul 18, 2023

4 participants