Implement binary exec / RPC based plugins for Machine #3

Closed
wants to merge 7 commits into from

Conversation

nathanleclaire
Owner

Docker Machine Driver Plugins

Motivation

Many folks have expressed interest in writing or maintaining a driver for Machine but @ehazlett and I are incapable of reviewing and maintaining all of them ourselves. Additionally, we are a bottleneck for when people want to add features and release new versions of existing drivers. Therefore, we are moving to a plugin-based architecture.

Usage

From the end user's perspective, the Machine CLI will behave much as it always has, with the additional caveat that binaries for the desired plugins must be installed alongside the core docker-machine binary.

For the plugin developer and/or distributor, there are some considerations to keep in mind. All plugins must be Go binaries located in the end user's PATH with the name of docker-machine-drivername. Once present, they will be usable.
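
As a rough illustration of the naming convention, the lookup could work along the following lines. This is a minimal sketch, not the PR's code: `pluginBinaryName` and `findPlugin` are hypothetical helpers, and only the `docker-machine-drivername` naming rule and the PATH requirement come from the description above.

```go
package main

import (
	"fmt"
	"os/exec"
)

// pluginPrefix follows the docker-machine-drivername convention described above.
const pluginPrefix = "docker-machine-"

// pluginBinaryName (hypothetical helper) maps a driver name to the
// binary name expected on the user's PATH.
func pluginBinaryName(driverName string) string {
	return pluginPrefix + driverName
}

// findPlugin (hypothetical helper) resolves the plugin binary through PATH
// and returns its full path, or an error if it is not installed.
func findPlugin(driverName string) (string, error) {
	name := pluginBinaryName(driverName)
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("driver %q: plugin binary %q not found in PATH: %w", driverName, name, err)
	}
	return path, nil
}

func main() {
	path, err := findPlugin("virtualbox")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("found plugin at", path)
}
```

A lookup like this also gives the end user a clear error when a plugin binary is missing, rather than a failure later on.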

To implement a plugin, the driver developer must make a new repository with the plugin code, which:

  • includes a properly named module (e.g. virtualbox) which contains a struct that fulfills the Driver interface from github.com/docker/machine/libmachine/drivers
  • includes a main module (in the virtualbox example presented here, I've chosen to put this in a directory called bin) which calls plugin.RegisterDriver() with an instance of the struct.

Example from VirtualBox:

package main

import (
        "github.com/docker/machine/drivers/virtualbox"
        "github.com/docker/machine/libmachine/drivers/plugin"
)

func main() {
        plugin.RegisterDriver(new(virtualbox.Driver))
}

Migrating an Existing Driver to a Plugin

This reflects the state of things at the time of writing and is likely to change, but I think that by documenting things we can hopefully automate the most mundane details of this process through a script.
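
One mundane detail that lends itself to scripting is rewriting import paths for the packages that moved into libmachine (listed in the bullets that follow). The sketch below is one possible approach using only the Go standard library; `movedPackages`, `rewriteImports`, and `sampleSource` are hypothetical names, and no such script is part of this PR. Paths are matched exactly, so a sub-package such as drivers/virtualbox is left alone.

```go
package main

import (
	"bytes"
	"fmt"
	"go/format"
	"go/parser"
	"go/token"
	"strconv"
)

// movedPackages mirrors the path changes listed in this description.
var movedPackages = map[string]string{
	"github.com/docker/machine/log":     "github.com/docker/machine/libmachine/log",
	"github.com/docker/machine/utils":   "github.com/docker/machine/libmachine/mcnutils",
	"github.com/docker/machine/drivers": "github.com/docker/machine/libmachine/drivers",
}

// rewriteImports parses Go source, replaces import paths that match a key in
// movedPackages exactly, and returns the formatted result together with the
// number of imports that were rewritten.
func rewriteImports(src string) (string, int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "driver.go", src, parser.ParseComments)
	if err != nil {
		return "", 0, err
	}
	changed := 0
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return "", 0, err
		}
		if newPath, ok := movedPackages[path]; ok {
			imp.Path.Value = strconv.Quote(newPath)
			changed++
		}
	}
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", 0, err
	}
	return buf.String(), changed, nil
}

// sampleSource is an illustrative input; the parser does not type-check,
// so the imports do not need to resolve.
const sampleSource = `package example

import (
	"github.com/docker/machine/drivers"
	"github.com/docker/machine/drivers/virtualbox"
	"github.com/docker/machine/utils"
)
`

func main() {
	out, changed, err := rewriteImports(sampleSource)
	if err != nil {
		fmt.Println("rewrite failed:", err)
		return
	}
	fmt.Printf("rewrote %d import(s)\n%s", changed, out)
}
```

Note that the utils package is renamed to mcnutils, so call sites that reference it would still need updating by hand or by a further rewrite step.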

The details most important to migrators, and most likely to trip them up, are:

  • A lot of paths have changed, usually because the modules have been moved inside of libmachine. Some examples:
    • github.com/docker/machine/log => github.com/docker/machine/libmachine/log
    • github.com/docker/machine/utils => github.com/docker/machine/libmachine/mcnutils
    • github.com/docker/machine/drivers => github.com/docker/machine/libmachine/drivers
  • The StorePath expected by many drivers has been replaced by GlobalArtifactPath and LocalArtifactPath methods (formerly ResolveStorePath). In the light of day, though, I'm not really sold on this decision of mine, and may well change it back to the way it was before.
  • The usage of cli.Flag in the drivers has changed to a Machine-specific model which is more RPC-friendly. Essentially:
    • GetCreateFlags is now a required method on the Driver itself, not a module-level function, which returns []mcnflag.Flag (a custom type encompassing all types of flags). All instances of cli.Flag and/or concrete types that implement it should be replaced by mcnflag.Flag.
  • It is no longer necessary to have the init block calling Register in the driver module
  • It is conventional to have a NewDriver "factory" method in the driver module, but not actually required.
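
To make the flag change more concrete, here is a minimal sketch of the shape a migrated driver might take. The `Flag`, `StringFlag`, and `IntFlag` types below are local stand-ins written for this example, not the real mcnflag package, and the flag names are invented. The point is that flags become plain data returned from a method on the driver, which is easier to send over RPC than types tied to the CLI library.

```go
package main

import "fmt"

// Flag is a local stand-in for the Machine-specific flag type (assumption:
// the real mcnflag.Flag may look different).
type Flag interface {
	String() string
}

// StringFlag and IntFlag are plain data structs with no CLI behavior attached.
type StringFlag struct {
	Name  string
	Usage string
	Value string
}

func (f StringFlag) String() string { return f.Name }

type IntFlag struct {
	Name  string
	Usage string
	Value int
}

func (f IntFlag) String() string { return f.Name }

// Driver is a hypothetical example driver.
type Driver struct {
	Memory   int
	DiskPath string
}

// NewDriver is the conventional (but optional) factory function.
func NewDriver() *Driver {
	return &Driver{}
}

// GetCreateFlags is a method on the driver itself rather than a
// package-level function. The flag names here are made up.
func (d *Driver) GetCreateFlags() []Flag {
	return []Flag{
		IntFlag{Name: "example-memory", Usage: "Memory for the host in MB", Value: 1024},
		StringFlag{Name: "example-disk-path", Usage: "Path to the disk image", Value: ""},
	}
}

func main() {
	for _, f := range NewDriver().GetCreateFlags() {
		fmt.Println("create flag:", f)
	}
}
```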

Under The Hood

The way that plugins work is heavily inspired by Hashicorp's Terraform.

When a Machine command is invoked, the following happens:

  1. Host struct(s) are loaded up from disk, and instead of loading the literal driver struct like in the past, a RpcClientDriver is loaded into the Driver field. This struct fulfills the Driver interface but is agnostic about the provider: it simply proxies methods to the RPC Driver server listening in the plugin binary.
  2. An RPC plugin server (registered using the wrapper method detailed above) is spun up listening on an available TCP port on localhost, and the created RPC client connects to it.
  3. Using the RPC client the configuration for the actual driver (running in the plugin binary) is set remotely.
  4. When calls to the driver are made in the Machine code, they are sent over the wire to get a response from the plugin server.
  5. When the host is saved to disk again, the RawDriver is first set as []byte so that the configuration can be set over the network again on subsequent invocations.
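
Steps 1, 2, and 4 above can be sketched with Go's standard net/rpc package. This is an illustration under stated assumptions, not the code in this PR: the two-method `Driver` interface, `fakeDriver`, `DriverServer`, `ClientDriver`, `servePlugin`, and `connect` are all hypothetical stand-ins, the server runs in the same process instead of a separate plugin binary, and the remote configuration step (3) is omitted.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

// Driver is a tiny stand-in for the real Driver interface.
type Driver interface {
	DriverName() string
	GetState() (string, error)
}

// fakeDriver plays the role of the concrete driver inside the plugin binary.
type fakeDriver struct{}

func (fakeDriver) DriverName() string        { return "fake" }
func (fakeDriver) GetState() (string, error) { return "Running", nil }

// DriverServer exposes a Driver over net/rpc. The int argument is unused;
// net/rpc requires every method to take an argument and a reply pointer.
type DriverServer struct{ d Driver }

func (s *DriverServer) DriverName(_ int, reply *string) error {
	*reply = s.d.DriverName()
	return nil
}

func (s *DriverServer) GetState(_ int, reply *string) error {
	state, err := s.d.GetState()
	*reply = state
	return err
}

// ClientDriver satisfies Driver but only proxies calls over the wire.
type ClientDriver struct{ c *rpc.Client }

func (p *ClientDriver) DriverName() string {
	var name string
	if err := p.c.Call("DriverServer.DriverName", 0, &name); err != nil {
		return ""
	}
	return name
}

func (p *ClientDriver) GetState() (string, error) {
	var state string
	err := p.c.Call("DriverServer.GetState", 0, &state)
	return state, err
}

// servePlugin listens on an available localhost TCP port and serves d.
// It returns the address and a function that stops the listener.
func servePlugin(d Driver) (string, func(), error) {
	srv := rpc.NewServer()
	if err := srv.Register(&DriverServer{d: d}); err != nil {
		return "", nil, err
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go srv.Accept(ln)
	return ln.Addr().String(), func() { ln.Close() }, nil
}

// connect dials the plugin server and wraps the connection in a ClientDriver.
func connect(addr string) (*ClientDriver, error) {
	c, err := rpc.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	return &ClientDriver{c: c}, nil
}

func main() {
	addr, stop, err := servePlugin(fakeDriver{})
	if err != nil {
		fmt.Println("serve:", err)
		return
	}
	defer stop()

	client, err := connect(addr)
	if err != nil {
		fmt.Println("connect:", err)
		return
	}
	defer client.c.Close()

	var d Driver = client // calling code sees only the Driver interface
	state, err := d.GetState()
	fmt.Println(d.DriverName(), state, err)
}
```

Because the calling code only depends on the interface, it does not need to know whether a driver is in-process or behind an RPC connection.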

Tooling

We don't have any tooling / default Machine plugin repository yet, but there are several things I'd like to see:

  1. A base / example project at somewhere like https://github.com/docker/docker-machine-example which is easily forkable and has an example driver
  2. A Makefile and some basic automation around compiling your driver and releasing the binaries to Github.
  3. Detailed documentation explaining how to develop a driver and what the expectations are for each driver. For instance, we need to be explicit that ports should be open and available for 22, 2376, 3376 in the case of Swarm master, etc.
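
As an example of the kind of expectation such documentation could spell out, the port requirement might be expressed as a small check. This is a sketch only: `requiredPorts` and `portOpen` are hypothetical helpers, and it assumes the sentence above means that 22 and 2376 are always needed while 3376 is needed only for a Swarm master.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// requiredPorts returns the TCP ports a driver is expected to leave open.
// Assumption: 22 (SSH) and 2376 (Docker) always, 3376 only for a Swarm master.
func requiredPorts(swarmMaster bool) []int {
	ports := []int{22, 2376}
	if swarmMaster {
		ports = append(ports, 3376)
	}
	return ports
}

// portOpen reports whether a TCP connection to host:port succeeds
// within the given timeout.
func portOpen(host string, port int, timeout time.Duration) bool {
	addr := net.JoinHostPort(host, strconv.Itoa(port))
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	// 127.0.0.1 is a placeholder; a real check would use the created host's IP.
	for _, port := range requiredPorts(true) {
		fmt.Printf("port %d open: %v\n", port, portOpen("127.0.0.1", port, time.Second))
	}
}
```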

Caveat Emptor

Some of the underlying API / mechanics are likely to change, such as libmachine details. Ultimately, however, it is our goal to make migrating existing work that has been done on drivers to the plugin-based model as smooth as possible. To that end, we really want to get people playing with this ASAP, so feel free to reach out, comment, and attempt some implementations.

  • N

@thaJeztah

Good to see progress 👍 ; lots of todo's as well I see 😄

@nathanleclaire
Owner Author

Added some details.

@nathanleclaire
Owner Author

Example plugin here: https://github.com/nathanleclaire/docker-machine-xhyve

@zchee @timfallmk Check it out! I can help you compile it if you need. I have gotten a create to run about halfway through with the driver, but it looks like the driver has an issue getting the DHCP lease on my computer.

@nathanleclaire
Owner Author

You can see what I changed in that example driver in this commit: nathanleclaire/docker-machine-xhyve@dfb0d8b

Some of it is also just revisions to the driver, such as fixing an issue with VBox version checking, updating it to use BaseDriver, etc.

@nathanleclaire
Owner Author

I've discovered an issue with leaking goroutines / processes after the main process exits, so that will need to be debugged before this is ready for prime time as well.

- Clear out some cruft tightly coupling libmachine to filestore

- Comment out drivers other than virtualbox for now

- Change way too many things

- Mostly, break out the code to be more modular.

- Destroy all traces of "provider" in its current form.  It will be
brought back as something more sensible, instead of something which
overlaps in function with both Host and Store.

- Fix mis-managed config passthru

- Remove a few instances of state stored in env vars

- This should be explicitly communicated in Go-land, not through the
shell.

- Rename "store" module to "persist"

- This is done mostly to avoid confusion about the fact that a concrete
instance of a "Store" interface is oftentimes referred to as "store" in
the code.

- Rip out repetitive antipattern for getting store

- This replaces the previous repetitive idiom for getting the cert info, and
consequently the store, with a much less repetitive idiom.

- Also, some redundant methods in commands.go for accessing hosts have
either been simplified or removed entirely.

- First steps towards fixing up tests

- Test progress continues

- Replace unit tests with integration tests

- MAKE ALL UNIT TESTS PASS YAY

- Add helper test files

- Don't write to disk in libmachine/host

- Heh.. coverage check strikes again

- Fix remove code

- Move cert code around

- Continued progress: simplify Driver

- Fixups and make creation work with new model

- Move drivers module inside of libmachine

- Move ssh module inside of libmachine

- Move state module to libmachine

- Move utils module to libmachine

- Move version module to libmachine

- Move log module to libmachine

- Modify some constructor methods around

- Change Travis build dep structure

- Boring gofmt fix

- Add version module

- Move NewHost to store

- Update some boring cert path infos to make API easier to use

- Fix up some issues around the new model

- Clean up some cert path stuff

- Don't use shady functions to get store path :D

- Continue artifact work

- Fix silly machines dir bug

- Continue fixing silly path issues

- Change up output of vbm a bit

- Continue work to make example go

- Change output a little more

- Last changes needed to make create finish properly

- Fix config.go to use libmachine

- Cut down code duplication and make both methods work with libmachine

- Add pluggable logging implementation

- Return error when machine already in desired state

- Update example to show log method

- Fix file:// bug

- Fix Swarm defaults

- Remove unused TLS settings from Engine and Swarm options

- Remove spurious error

- Correct bug detecting if migration was performed

- Fix compilation errors from tests

- Fix most of remaining test issues

- Fix final silly bug in tests

- Remove extraneous debug code

- Add -race to test command

- Appease the gofmt

- Appease the generate coverage

- Making executive decision to remove Travis coverage check

In the early days I thought this would be a good idea because it would
encourage people to write tests in case they added a new module.  Well,
in fact it has just turned into a giant nuisance and made refactoring
work like this even more difficult.

- Move Get to Load
- Move HostListItem code to CLI

Signed-off-by: Nathan LeClaire <nathan.leclaire@gmail.com>
@zchee

zchee commented Sep 14, 2015

@nathanleclaire Yes!
This PR works up until the error: open /var/db/dhcpd_leases :)

I'll start with debugging and refactoring...

I've discovered an issue with leaking goroutines / processes after the main process exits, so that will need to be debugged before this is ready for prime time as well

Is this problem caused by a libmachine issue?
If so, I will not touch it for the time being.

I will wait for @nathanleclaire's reply.
Thanks.

@nlamirault

hi
@nathanleclaire is there any doc to help us understand how to compile it?
Thanks !

Also, a few various cleanups are bundled:

1. Only call GetDriver() once to get the object in provision/utils.go
2. SSH command wrapper will return the error and let the consumer decide
   what to do with it instead of bailing automatically on non-255

Signed-off-by: Nathan LeClaire <nathan.leclaire@gmail.com>
- First RPC steps

- Work on some flaws in RPC model

- Remove unused TLS settings from Engine and Swarm options

- Add code to correctly encode data over the network

- Add client driver for RPC

- Rename server driver file

- Start to make marshal make sense

- Fix silly RPC method args and add client

- Fix some issues with RPC calls, and marshaling

- Simplify plugin main.go

- Move towards 100% plugin in CLI

- Ensure that plugin servers are cleaned up properly

- Make flag parsing for driver flags work properly
Signed-off-by: Nathan LeClaire <nathan.leclaire@gmail.com>
@nathanleclaire
Owner Author

Should note that this also includes docker#1685: I needed it to make a dind plugin I've been prototyping work correctly.

@zchee

zchee commented Sep 17, 2015

@nlamirault
For an example of libmachine-rpc usage, see docker-machine-hypercore.
This repo does not work yet, but the installation instructions are written in README.md.

Check it out.

@nlamirault

@zchee thanks

@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 2 times, most recently from 65255f1 to 51f40dd Compare September 17, 2015 22:21
@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 3 times, most recently from 79da42f to 4095f02 Compare September 18, 2015 21:47
@janeczku

@zchee any particular reason why you are building the driver binary with GOGC=off?

@zchee

zchee commented Sep 20, 2015

@janeczku Hi,

GOGC=off is not something I specified. @nathanleclaire specified it in the Makefile.
See https://github.com/nathanleclaire/docker-machine-xhyve/blob/master/bin/Makefile.
I do not know why, but there is probably some reason for it.

@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 2 times, most recently from 6a50dd1 to c84605e Compare September 21, 2015 20:49
@nathanleclaire
Owner Author

GOGC=off was set because I had heard that it helps speed up compiles, but I admit I have no provable basis for the claim, so it's not particularly necessary.

@nathanleclaire nathanleclaire force-pushed the git_r_done_libmachine branch 2 times, most recently from fe0ae54 to b5927f1 Compare September 23, 2015 19:31
@nathanleclaire
Owner Author

Please move discussion to: docker#1902

5 participants