
Kerberos support #133

Merged
merged 5 commits from the kerberos branch on Aug 1, 2018

Conversation

colinmarc
Owner

This PR contains basic kerberos support, based on the hard work by @Shastick and @staticmukesh in #99.

I'd love feedback from people who actually use kerberos, especially on the API. The command line client uses the MIT kerberos defaults (and env variables) for krb5.conf and the credential cache; I have no idea if that's idiomatic. It also doesn't support a keytab file. Please speak up if you have an opinion on how this should work!
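
For illustration, here's a rough sketch of what the call-side API looks like. The paths, namenode address, and service principal below are placeholders, and the gokrb5 import paths assume the v5-era packages; adjust both to your environment.

package main

import (
	"log"

	"github.com/colinmarc/hdfs"
	krb "gopkg.in/jcmturner/gokrb5.v5/client"
	"gopkg.in/jcmturner/gokrb5.v5/config"
	"gopkg.in/jcmturner/gokrb5.v5/credentials"
)

func main() {
	// Load krb5.conf and an existing credential cache (e.g. one produced by kinit).
	cfg, err := config.Load("/etc/krb5.conf")
	if err != nil {
		log.Fatal(err)
	}
	ccache, err := credentials.LoadCCache("/tmp/krb5cc_1000")
	if err != nil {
		log.Fatal(err)
	}
	kc, err := krb.NewClientFromCCache(ccache)
	if err != nil {
		log.Fatal(err)
	}

	// The kerberos client is constructed by the caller and handed to hdfs.NewClient.
	client, err := hdfs.NewClient(hdfs.ClientOptions{
		Addresses:                    []string{"namenode.example.com:8020"},
		KerberosClient:               kc.WithConfig(cfg),
		KerberosServicePrincipleName: "nn/namenode.example.com",
	})
	if err != nil {
		log.Fatal(err)
	}

	info, err := client.Stat("/")
	if err != nil {
		log.Fatal(err)
	}
	log.Println(info.Name())
}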

@Shastick
Contributor

Thanks for pulling this through the last mile!

About your keytab remark: the gokrb5 library does actually support keytabs, and we use the hdfs client from our fork with both keytabs and credential caches.

I'm not sure exactly how much the PR branch and our fork have diverged, but for using kerberos with a keytab it's just a matter of calling a different factory method with the path to the keytab.
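
For reference, a minimal sketch of the keytab variant with the gokrb5 v5-era API; the factory name and signatures vary between gokrb5 versions, and the principal, realm, and paths below are placeholders.

package main

import (
	"log"

	krb "gopkg.in/jcmturner/gokrb5.v5/client"
	"gopkg.in/jcmturner/gokrb5.v5/config"
	"gopkg.in/jcmturner/gokrb5.v5/keytab"
)

// keytabClient builds a kerberos client from a keytab instead of a credential cache.
func keytabClient() *krb.Client {
	cfg, err := config.Load("/etc/krb5.conf")
	if err != nil {
		log.Fatal(err)
	}
	kt, err := keytab.Load("/etc/security/keytabs/hdfs.keytab")
	if err != nil {
		log.Fatal(err)
	}
	// Same overall shape as the credential-cache path, just a different factory method.
	client := krb.NewClientWithKeytab("hdfs", "EXAMPLE.COM", kt)
	return client.WithConfig(cfg)
}

func main() {
	// The result is what would be passed as ClientOptions.KerberosClient.
	_ = keytabClient()
}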

@colinmarc
Owner Author

@Shastick can you elaborate on how you use both? You're right that keytab support was in your original branch; I elided it purely to avoid interface complexity, and because hadoop fs doesn't seem to support it. But if there's a good argument for leaving it in I'd like to hear it.

@colinmarc colinmarc force-pushed the kerberos branch 25 times, most recently from c2d15ee to 8a239d3 on July 20, 2018 at 23:15
@colinmarc
Owner Author

> setups where kinit generates a credential-cache in non-standard/weird/random paths (this happened more than once) and where we cannot change this

Does hadoop fs just not work in that case? Does gohdfs respecting KRB5CCNAME help?

> On another subject: I agree that it's a good target to mimic hadoop fs's behavior, but I'd like to add:

> Given that the go client is much lighter, we've seen people install it in more places, often where no hadoop configuration is present at all (i.e., no HADOOP_HOME is to be found on the system): in such cases we are not in the old "mold" anyway, and it's nice to be able to quickly set up a system where hdfs ls is available without further steps.

I totally get what you're saying, and I think that's a great aspiration. But if the choice is between that and converging on what's effectively a config file's worth of ENV variables (or an actual toml/whatever config file!), I think it's more convenient to just copy over hdfs-site.xml and core-site.xml and set HADOOP_CONF_DIR. I'd like to get to the point (we're maybe 60% there) where the binary can read all the necessary fields from those xml files and be configured correctly without any extra work.
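
A rough sketch of that flow from the library side, using the conf helpers that also appear later in this thread (assuming HADOOP_CONF_DIR points at a directory containing core-site.xml and hdfs-site.xml; the discarded error mirrors the usage shown further down):

package main

import (
	"log"

	"github.com/colinmarc/hdfs"
)

func main() {
	// With an empty path, LoadHadoopConf falls back to HADOOP_CONF_DIR / HADOOP_HOME.
	conf := hdfs.LoadHadoopConf("")

	// The options pick up the namenode addresses (and, where present, the
	// kerberos-related fields) from the xml files.
	options, _ := hdfs.ClientOptionsFromConf(conf)
	if options.Addresses == nil {
		log.Fatal("no namenode addresses found in the hadoop configuration")
	}

	client, err := hdfs.NewClient(options)
	if err != nil {
		log.Fatal(err)
	}

	entries, err := client.ReadDir("/")
	if err != nil {
		log.Fatal(err)
	}
	log.Println(len(entries), "entries in /")
}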

@colinmarc
Owner Author

In other news - tests are green with none skipped! 🎆 🌭 🥇 🐎

I'm going to wait a few days to get some positive feedback from folks actually using kerberos in production (and settle the keytab interface question) before I merge this.

@@ -4,6 +4,7 @@ go_import_path: github.com/colinmarc/hdfs
go: 1.x
env:
- PLATFORM=cdh5
- PLATFORM=cdh5 KERBEROS=true
- PLATFORM=hdp2
Contributor

Why are we not testing KERBEROS=true for hdp2?

Owner Author

Because it's redundant, and because I didn't factor out the krb setup stuff (although I probably could). I don't think it's really necessary to have hdp2 in there right now, but I do want to add hdp3 shortly (I think I need xenial travis support for that?) and cdh6 eventually.

client.go Outdated
}

if conf["dfs.namenode.kerberos.principal"] != "" {
options.KerberosServiceName = strings.Split(conf["dfs.namenode.kerberos.principal"], "/")[0]
Contributor

dfs.namenode.kerberos.principal can be hdfs@EXAMPLE.COM. In that case, options.KerberosServiceName will be hdfs@EXAMPLE.COM, realm included.

Owner Author

Great point. Do you happen to know what's actually required by the KDC there? Maybe the best option is just to reuse the hadoop _HOST thing, even though it's gross.
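
To make that concrete, a tiny standalone illustration of what the split in the diff above does with each principal form (the values are hypothetical):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// A host-qualified principal yields just the service name, as intended.
	fmt.Println(strings.Split("nn/namenode.example.com@EXAMPLE.COM", "/")[0]) // nn

	// A principal with no host component passes through untouched, realm and all,
	// which is the case raised in the review comment above.
	fmt.Println(strings.Split("hdfs@EXAMPLE.COM", "/")[0]) // hdfs@EXAMPLE.COM
}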

Shastick and others added 5 commits August 1, 2018 14:41

This adds mutual Kerberos authentication to connections with the namenode. A kerberos client must be constructed manually and passed to hdfs.NewClient to enable support.

Before, we were making an extra getBlockLocations request to get the block location to write to. This worked fine except under kerberos, where only the append call had the correct access token for the block. The fix also saves us a round trip.
@mxk1235

mxk1235 commented Aug 2, 2018

Awesome! When do you expect to cut a release with it?

@xxh2000

xxh2000 commented Aug 3, 2018

When I test kerberos against our kerberized HDFS cluster, it fails.

The error message is: no available namenodes: SASL handshake: wrong Token ID. Expected 0504, was 6030

The error location was attached as a screenshot (not reproduced here).

My test code:

// Imports added for completeness; the gokrb5 package paths below assume the
// v5-era API and may need adjusting, and the dot-import of the hdfs package
// is only there so the unqualified calls below resolve.
package main

import (
	"fmt"
	"log"

	. "github.com/colinmarc/hdfs"
	krb "gopkg.in/jcmturner/gokrb5.v5/client"
	"gopkg.in/jcmturner/gokrb5.v5/config"
	"gopkg.in/jcmturner/gokrb5.v5/credentials"
)

const (
	krb5ConfPath  = "D:\\company\\datasocket2\\keberos\\krb5.conf"
	cacheFilePath = "D:\\company\\datasocket2\\keberos\\krb5cc_0"
)

func main() {
	conf := LoadHadoopConf("")
	options, _ := ClientOptionsFromConf(conf)
	options.Addresses = []string{"gs-server-1016:8020"}
	if options.Addresses == nil {
		log.Fatal("No hadoop configuration found at HADOOP_CONF_DIR")
	}

	if options.KerberosClient == nil {
		options.KerberosClient = getKerberosClient()
	}
	options.KerberosServicePrincipleName = "hdfs/gs-server-1016"

	client, err := NewClient(options)
	if err != nil {
		log.Fatal(err)
	}

	st, err := client.Stat("/tmp")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("stat ", st.Name())
}

func getKerberosClient() *krb.Client {
	cfg, err := config.Load(krb5ConfPath)
	if err != nil {
		log.Fatal(err)
	}
	ccache, err := credentials.LoadCCache(cacheFilePath)
	if err != nil {
		log.Fatal(err)
	}
	client, err := krb.NewClientFromCCache(ccache)
	if err != nil {
		log.Fatal("Couldn't initialize krb client:", err)
	}
	return client.WithConfig(cfg)
}

@Shastick
Contributor

Shastick commented Aug 3, 2018

Looking at https://tools.ietf.org/html/rfc4121#section-4.4 it seems that 0x6030 is an invalid token identifier, so it looks like we are not dealing with a token here (or at least not a cleartext one?).
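
Roughly, the check that fails is of this shape (a simplified standalone sketch, not the library's actual code):

package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// RFC 4121 wrap tokens begin with the token ID 0x0504. The handshake above
	// reported 0x6030 instead, so whatever came back was not a plain wrap token.
	const expected = uint16(0x0504)
	reply := []byte{0x60, 0x30} // first two bytes of the unexpected response
	got := binary.BigEndian.Uint16(reply)
	fmt.Printf("expected %04x, got %04x\n", expected, got)
}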

Is your server configured to rely on SASL/Kerberos for authentication only? If it expects anything more from Kerberos (e.g., encryption of tokens/payloads or something else), the client does not support that.

I have no other ideas so far.

@colinmarc colinmarc deleted the kerberos branch August 5, 2018 21:45
@colinmarc
Owner Author

colinmarc commented Aug 5, 2018

@mxk1235 The new release is published now.

@AnirudhVyas

I took the binary and set HADOOP_CONF_DIR and HADOOP_HOME on one of our boxes, the same one we use to run hadoop fs -ls /, and tried go-hdfs. My flow is:

echo "password" | kinit
hdfs ls /

I also set HADOOP_NAMENODE. Not sure what I am doing wrong, but I get this error:

Couldn't connect to namenode: no available namenodes: SASL handshake: [Root cause: Encrypting_Error] KRBMessage_Handling_Error: TGS Exchange Error: failed to generate a new TGS_REQ < Encrypting_Error: error getting etype to encrypt authenticator: unknown or unsupported EType: 1

In fact, I don't even need to give hdfs ls the root dir; I just type hdfs ls and it fails. Any suggestions?
