Introduce global checkpoint listeners #32696

Merged
merged 22 commits into from
Aug 15, 2018
Changes from 8 commits
@@ -0,0 +1,141 @@
/*
* Licensed to Elasticsearch under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

package org.elasticsearch.index.shard;

import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.message.ParameterizedMessage;

import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.Executor;

import static org.elasticsearch.index.seqno.SequenceNumbers.NO_OPS_PERFORMED;
import static org.elasticsearch.index.seqno.SequenceNumbers.UNASSIGNED_SEQ_NO;

/**
* Represents a collection of global checkpoint listeners. This collection can be added to, and all listeners present at the time of an
* update will be notified together. All listeners will be notified when the shard is closed.
*/
public class GlobalCheckpointListeners implements Closeable {

/**
* A global checkpoint listener consisting of a callback that is notified when the global checkpoint is updated or the shard is closed.
*/
@FunctionalInterface
public interface GlobalCheckpointListener {
/**
* Callback when the global checkpoint is updated or the shard is closed. If the shard is closed, the value of the global checkpoint
* will be set to {@link org.elasticsearch.index.seqno.SequenceNumbers#UNASSIGNED_SEQ_NO} and the exception will be non-null. If the
* global checkpoint is updated, the exception will be null.
Contributor

Nit: I wonder if we should have an onFailure method here for all kinds of failures and send the IndexShardClosedException down that route. The downside is of course that people wouldn't be able to pass a method reference, but the method won't need to start with if (e != null) etc.

Member Author

I prefer to use a functional interface for enabling the use of lambda expressions.

*
* @param globalCheckpoint the updated global checkpoint
* @param e if non-null, the shard is closed
*/
void accept(long globalCheckpoint, IndexShardClosedException e);
}
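
As a rough illustration of the callback shape discussed above, here is a minimal standalone sketch of a two-argument functional interface consumed through a lambda. It does not use the Elasticsearch classes: `ListenerShapeSketch`, `ShardClosedException`, and `recordingListener` are hypothetical names, and the `-2` sentinel is an assumed stand-in for `UNASSIGNED_SEQ_NO`.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone stand-in for the listener shape in this diff; not the Elasticsearch classes.
class ListenerShapeSketch {

    // Assumed sentinel standing in for SequenceNumbers.UNASSIGNED_SEQ_NO.
    static final long UNASSIGNED_SEQ_NO = -2L;

    // Hypothetical replacement for IndexShardClosedException.
    static class ShardClosedException extends RuntimeException {
        ShardClosedException(final String message) {
            super(message);
        }
    }

    @FunctionalInterface
    interface Listener {
        // e is non-null only when the shard is closed
        void accept(long globalCheckpoint, ShardClosedException e);
    }

    // Builds a listener as a lambda; the null check on e is the branch debated in the review.
    static Listener recordingListener(final List<String> events) {
        return (g, e) -> {
            if (e != null) {
                events.add("closed: " + e.getMessage());
            } else {
                events.add("checkpoint: " + g);
            }
        };
    }

    public static void main(final String[] args) {
        final List<String> events = new ArrayList<>();
        final Listener listener = recordingListener(events);
        listener.accept(42L, null);
        listener.accept(UNASSIGNED_SEQ_NO, new ShardClosedException("shard closed"));
        System.out.println(events);
    }
}
```

The single abstract method is what allows the lambda; the cost is that every implementation has to branch on the exception argument.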

// guarded by this
private boolean closed;
private volatile List<GlobalCheckpointListener> listeners;

private final ShardId shardId;
private final Executor executor;
private final Logger logger;

/**
* Construct a global checkpoint listeners collection.
*
* @param shardId the shard ID on which global checkpoint updates can be listened to
* @param executor the executor for listener notifications
* @param logger a shard-level logger
*/
GlobalCheckpointListeners(final ShardId shardId, final Executor executor, final Logger logger) {
this.shardId = Objects.requireNonNull(shardId);
this.executor = Objects.requireNonNull(executor);
this.logger = Objects.requireNonNull(logger);
}

/**
* Add a global checkpoint listener.
*
* @param listener the listener
*/
synchronized void add(final GlobalCheckpointListener listener) {
Contributor

I wonder if we should have a parameter that indicates the global checkpoint last sampled by the component trying to register the listener. We can then immediately call the listener if the last global checkpoint this component was notified about (needs to be captured) is higher. I think this would help avoid race conditions.

Member Author

In my POC I was doing this at the API layer (transport layer), where the request does indeed carry the last known global checkpoint, but I agree it makes sense to move that here.

if (closed) {
throw new IllegalStateException("can not listen for global checkpoint changes on a closed shard [" + shardId + "]");
Contributor

Can we throw an AlreadyClosedException like everywhere else?

Contributor

Scratch that. I think we should be consistent and pass IndexShardClosedException to the listener in that case (as if it had been registered and then we were closed).

Member Author

I pushed 50a9a6c.

}
if (listeners == null) {
listeners = new ArrayList<>();
}
listeners.add(listener);
}
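
The two review suggestions on `add` can be sketched together as follows: the caller passes the checkpoint it last observed, and it is notified immediately if the collection already knows a higher value or the shard is already closed. This is only a sketch under stated assumptions and not the code that was pushed in the follow-up commits. The class name, the `callerGlobalCheckpoint` parameter, the `pending()` helper, the use of `IllegalStateException`, and the inline (non-executor) notification are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch; names and the inline (non-executor) notification are assumptions.
class AddWithKnownCheckpointSketch {

    // Assumed sentinel standing in for SequenceNumbers.UNASSIGNED_SEQ_NO.
    static final long UNASSIGNED_SEQ_NO = -2L;

    @FunctionalInterface
    interface Listener {
        void accept(long globalCheckpoint, Exception e);
    }

    // all fields guarded by this
    private boolean closed;
    private long lastKnownGlobalCheckpoint = UNASSIGNED_SEQ_NO;
    private List<Listener> listeners;

    // callerGlobalCheckpoint is the value the caller last observed (hypothetical parameter).
    synchronized void add(final long callerGlobalCheckpoint, final Listener listener) {
        if (closed) {
            // behave as if the listener had been registered and the shard then closed
            listener.accept(UNASSIGNED_SEQ_NO, new IllegalStateException("shard is closed"));
        } else if (lastKnownGlobalCheckpoint > callerGlobalCheckpoint) {
            // the caller is already behind, so there is nothing to wait for
            listener.accept(lastKnownGlobalCheckpoint, null);
        } else {
            if (listeners == null) {
                listeners = new ArrayList<>();
            }
            listeners.add(listener);
        }
    }

    synchronized void globalCheckpointUpdated(final long globalCheckpoint) {
        lastKnownGlobalCheckpoint = globalCheckpoint;
        drain(globalCheckpoint, null);
    }

    synchronized void close() {
        closed = true;
        drain(UNASSIGNED_SEQ_NO, new IllegalStateException("shard is closed"));
    }

    synchronized int pending() {
        return listeners == null ? 0 : listeners.size();
    }

    // Single-use semantics: the list is cleared before the listeners are invoked.
    private void drain(final long globalCheckpoint, final Exception e) {
        final List<Listener> current = listeners;
        listeners = null;
        if (current != null) {
            for (final Listener listener : current) {
                listener.accept(globalCheckpoint, e);
            }
        }
    }
}
```

Comparing under the same lock that guards updates is what closes the race: a checkpoint advance cannot slip in between the caller sampling its value and the listener being registered.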

@Override
public void close() throws IOException {
synchronized (this) {
closed = true;
}
notifyListeners(UNASSIGNED_SEQ_NO, new IndexShardClosedException(shardId));
}

/**
* Invoke to notify all registered listeners of an updated global checkpoint.
*
* @param globalCheckpoint the updated global checkpoint
*/
void globalCheckpointUpdated(final long globalCheckpoint) {
assert globalCheckpoint >= NO_OPS_PERFORMED;
notifyListeners(globalCheckpoint, null);
}

private void notifyListeners(final long globalCheckpoint, final IndexShardClosedException e) {
Contributor

For simplicity, let's move this under the mutex everywhere (and add assert Thread.holdsLock(this) to the beginning of the method). It looks like whenever we call this method, we already go under a mutex for a quick operation before this. In case no one is using the listener functionality, this will therefore amount to the same overhead. In case this infrastructure is used, listeners should not fly in by the millions per second, so listeners will mostly be null and there should be no overhead. Let's optimize this in the future if we see any issue. I was scratching my head for a bit to check if concurrency was correct here (and I believe it is), but it's not worth the complexity imho.

Member Author

Sure. I pushed 88dee76.

assert (globalCheckpoint == UNASSIGNED_SEQ_NO && e != null) || (globalCheckpoint >= NO_OPS_PERFORMED && e == null);
if (listeners != null) {
final List<GlobalCheckpointListener> currentListeners;
synchronized (this) {
currentListeners = listeners;
listeners = null;
Contributor

This looks as if the listeners are only notified once and then need to reregister if they want more events? I can see why this kind of behavior makes sense for the refresh listeners, with refresh being an expensive operation that ensures that all events registered before the refresh will now see the changes they're waiting for. With global checkpoints, it's less clear to me, as they can potentially be updated many, many times per second, so wouldn't you want to stay registered to receive events? If not, will this lead to a storm of reregister events? I think I need to better understand the integration point here, i.e., how this API will be used.

Member Author

@ywelsch See #32651. The intended usage is in CCR, where a remote cluster that is fully caught up will (remotely) attach a single-use listener to the local cluster for the next global checkpoint change. When the global checkpoint is updated, the listener is invoked and returns a response to the remote cluster, letting it know that there are now additional changes to be fetched.

Contributor

Ok, given this behavior, I wonder if we can make the listener interface simpler (same as Boaz's concern). As the listeners are only notified once and then need to reregister if they want more events, I wonder if it's simpler to just signal an UNASSIGNED_SEQ_NO (or the lastKnownGlobalCheckpoint) on closing and then have the caller fail on a repeated call to the add method (by throwing the exception directly in that method, not relaying it to the listener). This means that the listener can remain a functional interface (but just a LongConsumer). WDYT?

Member Author

@ywelsch Personally I do not buy the argument that

        @Override
        protected void asyncShardOperation(
                final Request request, final ShardId shardId, final ActionListener<Response> listener) throws IOException {
            final IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
            final IndexShard indexShard = indexService.getShard(shardId.id());
            indexShard.addGlobalCheckpointListener(
                    request.getGlobalCheckpoint(),
                    (g, e) -> {
                        if (g != UNASSIGNED_SEQ_NO) {
                            listener.onResponse(new Response(g));
                        } else {
                            listener.onFailure(e);
                        }
            });
        }

is less clean than

        @Override
        protected void asyncShardOperation(
                final Request request, final ShardId shardId, final ActionListener<Response> listener) throws IOException {
            final IndexService indexService = indicesService.indexServiceSafe(shardId.getIndex());
            final IndexShard indexShard = indexService.getShard(shardId.id());
            indexShard.addGlobalCheckpointListener(
                    request.getGlobalCheckpoint(),
                    g -> {
                        if (g != UNASSIGNED_SEQ_NO) {
                            listener.onResponse(new Response(g));
                        } else {
                            listener.onFailure(new IndexShardClosedException(shardId));
                        }
            });
        }

We still have to have a check in one form or the other whether or not closing has been signaled to us, so there's always going to be an if check. At least, that is what I read the comment from @bleskes as arguing:

The down side is of course that people wouldn't be able to pass a method references, but the method won't need to start with if (e != null) etc.

In fact, I would argue the approach I have taken is cleaner, as it's the shard telling us that we are closed rather than it being signaled indirectly through the value of the global checkpoint passed in the callback. Sure, the actual interface is simpler, but I prefer the explicit approach. So I think we should either stick with what I have, or have an onClosed callback and lose the ability for the interface to be a functional interface.

Regarding throwing on attempting to register a listener on a closed shard, that was my preferred approach too but @bleskes thought otherwise.

Contributor

There is also still the option of adding an onClosed method to the interface with a default NOOP implementation. It will remain a functional interface, and if most tests don't care about the shard closed case, they can treat it like a functional interface. I'll leave the decision to you.

Member Author

It would not be a functional interface for the purposes of the production use-case for this that I have in mind (we have to handle the shard closed event). Thanks, I will leave as-is.
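
For reference, the alternative raised in this thread (an `onClosed` method with a default no-op body) could look roughly like the sketch below. It is not part of the PR. All names are hypothetical, and the example only aims to show both halves of the exchange: the interface still accepts a lambda, but a caller that must handle closing has to fall back to an anonymous class.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the default-method alternative; not part of the PR.
class DefaultOnClosedSketch {

    @FunctionalInterface
    interface Listener {
        void onGlobalCheckpointUpdated(long globalCheckpoint);

        // Default no-op: implementations that ignore closing can stay lambdas.
        default void onClosed(final Exception e) {
        }
    }

    // A lambda works, but it silently ignores the shard-closed event.
    static Listener checkpointOnly(final List<String> events) {
        return g -> events.add("checkpoint: " + g);
    }

    // Handling the closed case requires overriding onClosed, so no lambda is possible here.
    static Listener closeAware(final List<String> events) {
        return new Listener() {
            @Override
            public void onGlobalCheckpointUpdated(final long globalCheckpoint) {
                events.add("checkpoint: " + globalCheckpoint);
            }

            @Override
            public void onClosed(final Exception e) {
                events.add("closed: " + e.getMessage());
            }
        };
    }

    public static void main(final String[] args) {
        final List<String> events = new ArrayList<>();
        final Listener listener = closeAware(events);
        listener.onGlobalCheckpointUpdated(7L);
        listener.onClosed(new IllegalStateException("shard closed"));
        System.out.println(events);
    }
}
```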

}
if (currentListeners != null) {
executor.execute(() -> {
for (final GlobalCheckpointListener listener : currentListeners) {
try {
listener.accept(globalCheckpoint, e);
} catch (final Exception caught) {
if (globalCheckpoint != UNASSIGNED_SEQ_NO) {
logger.warn(
new ParameterizedMessage(
"error notifying global checkpoint listener of updated global checkpoint [{}]",
globalCheckpoint),
caught);
} else {
logger.warn("error notifying global checkpoint listener of closed shard", caught);
}
}
}
});
}
}
}

}
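
The review suggestion to keep `notifyListeners` entirely under the mutex, combined with the single-use semantics explained in the thread, can be sketched as below. This is a simplified standalone version written for illustration, not the code pushed in 88dee76. The class name is hypothetical, the closed-shard path is omitted, and a failing listener is silently skipped where the diff logs a warning.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

// Standalone sketch of single-use listeners notified under one mutex; names are hypothetical.
class OneShotNotifySketch {

    @FunctionalInterface
    interface Listener {
        void accept(long globalCheckpoint, Exception e);
    }

    private final Executor executor;

    // guarded by this
    private List<Listener> listeners;

    OneShotNotifySketch(final Executor executor) {
        this.executor = executor;
    }

    synchronized void add(final Listener listener) {
        if (listeners == null) {
            listeners = new ArrayList<>();
        }
        listeners.add(listener);
    }

    synchronized void globalCheckpointUpdated(final long globalCheckpoint) {
        notifyListeners(globalCheckpoint, null);
    }

    private void notifyListeners(final long globalCheckpoint, final Exception e) {
        // every caller must already hold the monitor, so no extra locking is needed here
        assert Thread.holdsLock(this);
        if (listeners == null) {
            return;
        }
        // detach the current batch so that each listener fires at most once
        final List<Listener> current = listeners;
        listeners = null;
        executor.execute(() -> {
            for (final Listener listener : current) {
                try {
                    listener.accept(globalCheckpoint, e);
                } catch (final Exception caught) {
                    // sketch only: the diff logs a warning here; one bad listener must not block the rest
                }
            }
        });
    }

    public static void main(final String[] args) {
        // Runnable::run is a same-thread executor, used here only to keep the demo deterministic
        final OneShotNotifySketch sketch = new OneShotNotifySketch(Runnable::run);
        sketch.add((g, e) -> System.out.println("notified with " + g));
        sketch.globalCheckpointUpdated(1L);
        sketch.globalCheckpointUpdated(2L); // no listener is registered any more
    }
}
```

Because the list is detached under the lock and the callbacks are handed to the executor, a listener that wants further events has to register again, which matches the CCR usage described in the thread.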
21 changes: 15 additions & 6 deletions server/src/main/java/org/elasticsearch/index/shard/IndexShard.java
@@ -161,6 +161,8 @@
import java.util.stream.StreamSupport;

import static org.elasticsearch.index.mapper.SourceToParse.source;
+ import static org.elasticsearch.index.seqno.SequenceNumbers.NO_OPS_PERFORMED;
+ import static org.elasticsearch.index.seqno.SequenceNumbers.UNASSIGNED_SEQ_NO;

public class IndexShard extends AbstractIndexShardComponent implements IndicesClusterStateService.Shard {

@@ -189,6 +191,7 @@ public class IndexShard extends AbstractIndexShardComponent implements IndicesCl

private final SearchOperationListener searchOperationListener;

+ private final GlobalCheckpointListeners globalCheckpointListeners;
private final ReplicationTracker replicationTracker;

protected volatile ShardRouting shardRouting;
@@ -298,8 +301,10 @@ public IndexShard(
this.checkIndexOnStartup = indexSettings.getValue(IndexSettings.INDEX_CHECK_ON_STARTUP);
this.translogConfig = new TranslogConfig(shardId, shardPath().resolveTranslog(), indexSettings, bigArrays);
final String aId = shardRouting.allocationId().getId();
+ this.globalCheckpointListeners = new GlobalCheckpointListeners(shardId, threadPool.executor(ThreadPool.Names.LISTENER), logger);
this.replicationTracker =
- new ReplicationTracker(shardId, aId, indexSettings, SequenceNumbers.UNASSIGNED_SEQ_NO, globalCheckpoint -> {});
+ new ReplicationTracker(shardId, aId, indexSettings, UNASSIGNED_SEQ_NO, globalCheckpointListeners::globalCheckpointUpdated);

// the query cache is a node-level thing, however we want the most popular filters
// to be computed on a per-shard basis
if (IndexModule.INDEX_QUERY_CACHE_EVERYTHING_SETTING.get(settings)) {
@@ -664,7 +669,7 @@ private IndexShardState changeState(IndexShardState newState, String reason) {
public Engine.IndexResult applyIndexOperationOnPrimary(long version, VersionType versionType, SourceToParse sourceToParse,
long autoGeneratedTimestamp, boolean isRetry) throws IOException {
assert versionType.validateVersionForWrites(version);
- return applyIndexOperation(SequenceNumbers.UNASSIGNED_SEQ_NO, operationPrimaryTerm, version, versionType, autoGeneratedTimestamp,
+ return applyIndexOperation(UNASSIGNED_SEQ_NO, operationPrimaryTerm, version, versionType, autoGeneratedTimestamp,
isRetry, Engine.Operation.Origin.PRIMARY, sourceToParse);
}

@@ -765,7 +770,7 @@ public Engine.DeleteResult getFailedDeleteResult(Exception e, long version) {
public Engine.DeleteResult applyDeleteOperationOnPrimary(long version, String type, String id, VersionType versionType)
throws IOException {
assert versionType.validateVersionForWrites(version);
- return applyDeleteOperation(SequenceNumbers.UNASSIGNED_SEQ_NO, operationPrimaryTerm, version, type, id, versionType,
+ return applyDeleteOperation(UNASSIGNED_SEQ_NO, operationPrimaryTerm, version, type, id, versionType,
Engine.Operation.Origin.PRIMARY);
}

@@ -1192,7 +1197,7 @@ public void close(String reason, boolean flushEngine) throws IOException {
} finally {
// playing safe here and close the engine even if the above succeeds - close can be called multiple times
// Also closing refreshListeners to prevent us from accumulating any more listeners
- IOUtils.close(engine, refreshListeners);
+ IOUtils.close(engine, globalCheckpointListeners, refreshListeners);
indexShardOperationPermits.close();
}
}
@@ -1729,6 +1734,10 @@ public void updateGlobalCheckpointForShard(final String allocationId, final long
replicationTracker.updateGlobalCheckpointForShard(allocationId, globalCheckpoint);
}

+ public void addGlobalCheckpointListener(final GlobalCheckpointListeners.GlobalCheckpointListener listener) {
+ this.globalCheckpointListeners.add(listener);
+ }

/**
* Waits for all operations up to the provided sequence number to complete.
*
@@ -2273,8 +2282,8 @@ public void acquireReplicaOperationPermit(final long opPrimaryTerm, final long g
updateGlobalCheckpointOnReplica(globalCheckpoint, "primary term transition");
final long currentGlobalCheckpoint = getGlobalCheckpoint();
final long localCheckpoint;
- if (currentGlobalCheckpoint == SequenceNumbers.UNASSIGNED_SEQ_NO) {
- localCheckpoint = SequenceNumbers.NO_OPS_PERFORMED;
+ if (currentGlobalCheckpoint == UNASSIGNED_SEQ_NO) {
+ localCheckpoint = NO_OPS_PERFORMED;
} else {
localCheckpoint = currentGlobalCheckpoint;
}