Document search is not working anymore #500
Comments
Do you know if this is occurring in both staging and production? If not, which environment is it occurring in? |
I pulled this screenshot from MCP. I'm not sure if it's happening in staging as well, since the entire documents page isn't accessible there. |
So it seems that the search functionality is broken in both environments.

Prod:
The error message reads: {"detail":"{\"message\":\"Credential should be scoped to a valid region, not 'us-east-1'. \"}"}
I believe this is just telling us that we're not signing the request with the correct region, similarly to this ticket. We are hardcoding the region as "us-east-1", while the StackId in the output below shows the production stack in us-west-2:
aws cloudformation describe-stack-resources --stack-name nasa-apt-api-lambda-prod --query 'StackResources[?ResourceType==`AWS::Elasticsearch::Domain`]'
[
{
"StackName": "nasa-apt-api-lambda-prod",
"StackId": "arn:aws:cloudformation:us-west-2:237694371684:stack/nasa-apt-api-lambda-prod/c2aeefb0-7fb7-11ec-9b7d-02f9de4065db",
"LogicalResourceId": "nasaaptapilambdaprodelasticsearchdomainF69BA438",
"PhysicalResourceId": "apt-api-lambda-prod-elastic",
"ResourceType": "AWS::Elasticsearch::Domain",
"Timestamp": "2022-01-27T22:39:35.796000+00:00",
"ResourceStatus": "CREATE_COMPLETE",
"DriftInformation": {
"StackResourceDriftStatus": "NOT_CHECKED"
}
}
]
I strongly believe that changing the hardcoded region from: region = "us-east-1" to: region = os.environ["AWS_REGION"] would resolve the error. However, I have not deployed the fix to production because, as of this moment, we do not have the ability to re-deploy a stack (if needed) due to the MCP block on

Staging:
Staging is failing with a very puzzling error: <html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n</body>\r\n</html>\r\n with no traces in the lambda logs and only logging a The |
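To illustrate the region mismatch discussed above, here is a minimal sketch (not the APT code). The helper names `region_from_arn`, `credential_scope` and `signing_region` are hypothetical, and the date stamp is arbitrary. It relies only on the ARN layout (`arn:partition:service:region:account:resource`), the SigV4 credential scope layout (`date/region/service/aws4_request`), and the `AWS_REGION` variable that Lambda sets in the execution environment. `es` is assumed as the signing service name for the search domain.

```python
import os

# StackId copied from the describe-stack-resources output above.
STACK_ID = (
    "arn:aws:cloudformation:us-west-2:237694371684:"
    "stack/nasa-apt-api-lambda-prod/c2aeefb0-7fb7-11ec-9b7d-02f9de4065db"
)


def region_from_arn(arn: str) -> str:
    """Return the region segment (4th field) of an ARN."""
    parts = arn.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[3]


def credential_scope(date_stamp: str, region: str, service: str = "es") -> str:
    """Build the SigV4 credential scope string for a request."""
    return f"{date_stamp}/{region}/{service}/aws4_request"


def signing_region(env=None) -> str:
    """Read the signing region from the environment instead of hardcoding it."""
    env = os.environ if env is None else env
    region = env.get("AWS_REGION")
    if not region:
        raise RuntimeError("AWS_REGION is not set")
    return region


# Scope produced by the hardcoded value vs. the value taken from the environment.
hardcoded = credential_scope("20220127", "us-east-1")
from_env = credential_scope(
    "20220127", signing_region({"AWS_REGION": region_from_arn(STACK_ID)})
)
print(hardcoded)
print(from_env)
```

If the domain lives in the same region as the stack, only the second scope would match what the service expects, which is consistent with the "Credential should be scoped to a valid region" message.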
Potential resolution: Migrate to Opensearch
Elasticsearch was open source until recently. After the company that makes Elasticsearch announced a more restrictive license for future versions of Elasticsearch, AWS forked Elasticsearch and now hosts the fork as a service called Opensearch, which they will keep open source going forward. ref
The APT stack currently uses the Elasticsearch 7.7 engine (the last open sourced version before new license), so the only updates that will be available going forward are if we migrate to an Opensearch service. Since Opensearch is just a fork of Elasticsearch, AWS says that Opensearch will be backwards compatible with Elasticsearch without requiring updates to client code.
Migrating to Opensearch would not explain why the Elasticsearch instance entered, and is stuck in, an unreachable state, but it would likely allow us to sidestep the unreachable Elasticsearch instance and ensure that we can stay up to date with Opensearch developments. (Note: staying up to date with Opensearch is a small advantage. Our usage of the search service is so basic that I doubt any of the updates would be crucial to APT; however, it's still a "nice to have".)
Consideration:
It's possible to upgrade the instance from Elasticsearch ( I don't have a problem with losing the indexed documents in staging. As for the data that we will be importing from the UAH prod stack to the MCP prod stack, none of the crucial ATBDs are published, which means that they aren't yet indexed in the UAH prod Elasticsearch instance. We would be free to start with a fresh search instance (in this case OpenSearch, as opposed to Elasticsearch) in the new MCP prod stack.
Options to migrate the data (if we do want to migrate it) include:
|
Quick update on the search functionality in the staging instance: The AWS console still shows the Elasticsearch instance in an "in-progress" state; however, the instance is now responsive. All of the documents that were indexed before the crash are lost, but I was able to index a new one and search it:
curl -X 'POST' \
'https://0mrzuyq2e3.execute-api.us-east-1.amazonaws.com/v2/search' \
-H 'accept: application/json' \
-H "Authorization: Bearer $AUTH_TOKEN" \
-H 'Content-Type: application/json' \
-d '{"query":{"bool":{"must":[{"multi_match":{"query":"oceans"}}],"filter":[]}},"highlight":{"fields":{"*":{}}}}'
result:
{
"took": 598,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.54655457,
"hits": [
{
"_index": "atbd",
"_type": "atbd",
"_id": "19_v1",
"_score": 0.54655457,
"_source": {
...
}
....
]
} |
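For anyone scripting this check, the request body and the response handling can be sketched as follows. This is only an illustration under assumptions: `build_search_body` and `summarize_hits` are hypothetical helper names, and `sample_response` is an abbreviated copy of the result above with the elided `_source` left empty. No network call is made.

```python
import json


def build_search_body(term: str) -> dict:
    """Build the same query body that the curl example sends."""
    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": term}}],
                "filter": [],
            }
        },
        "highlight": {"fields": {"*": {}}},
    }


def summarize_hits(response: dict) -> list:
    """Return (document id, score) pairs from a search response."""
    hits = response.get("hits", {}).get("hits", [])
    return [(hit["_id"], hit["_score"]) for hit in hits]


# Abbreviated copy of the response shown above; "_source" is elided there.
sample_response = {
    "took": 598,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 0.54655457,
        "hits": [
            {
                "_index": "atbd",
                "_type": "atbd",
                "_id": "19_v1",
                "_score": 0.54655457,
                "_source": {},
            }
        ],
    },
}

# Compact separators reproduce the payload passed to curl with -d.
print(json.dumps(build_search_body("oceans"), separators=(",", ":")))
print(summarize_hits(sample_response))
```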
Update: we would like to use option 2.1 outlined in #509 |
Update: after deploying an update to MCP which uses the region from the lambda's execution context to sign the Elasticsearch request, the document search still appears to be broken, with an error message indicating that the signing credentials are scoped to |
I've followed the steps to create a manual snapshot of the current Elasticsearch index. However, the instance appears locked in "processing," and the Opensearch Dashboard is inaccessible as a result. I've updated the security configurations on the Opensearch domain in DevSeed AWS, but the changes are not being applied because the instance appears locked. I was able to get through the IAM roles and permissions required to create a snapshot of the indices, but could not create the snapshot itself, again because the instance is locked. There is an option to use the AWS dashboard to force through an update of this domain, and I believe the current indices may be lost during that process. I am recommending both of the following:
I'll come back to this ticket after some discussion. In the meantime, I've prepared the infrastructure code to deploy a new Opensearch stack. Here's a TLDR of what changes and what remains the same when migrating from Elasticsearch to OpenSearch.
Upgrade to OpenSearch
The upgrade is automated and can be done through the AWS management console. However, any changes made through the console would not be automatically reflected in the CDK infrastructure code, so the infrastructure code has been updated in this branch. Notes are added on the following affected areas, summarizing what will need to be changed and what can be left alone.
New API version
New event format
What's staying the same?
|
Consideration Current solution: |
Unblocked: There are no critical ATBDs/documents in APT in Prod accounts. |
Opensearch upgrade: Ready to deploy pending review. |
Remaining steps:
|
See this branch and PR thread for changes made. Current state: Next steps: |
Upcoming changes to Backend API for Opensearch compatibility
PR forthcoming, in the meantime see
|
@naomatheus @wrynearson @aboydnw @deborahUAH After testing OpenSearch on staging, it does work, but there are still a few issues.
(1) A search term must be entered. We would like a user to be able to search for any ATBDs published in a specific year. For example, leave the search term blank and just search for all ATBDs published in 2022.
(2) When a search term is included, the year must still be "All." The image below shows no document matching "contributor" when searching with the year set to 2022, even though this particular ATBD was published in 2022. Below are the search results with the search term "contributor" and the year set to "All." |
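One possible way to support both cases on the query side, sketched under assumptions: `build_filtered_search_body` is a hypothetical helper, and `publication_date` is a hypothetical field name that is assumed to be mapped as a date in the index. A blank term falls back to `match_all`, and the year becomes a `range` clause in the `filter` list of the same `bool` query shown earlier in this thread.

```python
from typing import Optional


def build_filtered_search_body(
    term: Optional[str] = None,
    year: Optional[int] = None,
    date_field: str = "publication_date",  # hypothetical field name
) -> dict:
    """Build a bool query where both the search term and the year are optional."""
    term = (term or "").strip()

    # With no term, match every document so the year filter alone decides.
    if term:
        must = [{"multi_match": {"query": term}}]
    else:
        must = [{"match_all": {}}]

    # Restrict to one calendar year only when a year was selected.
    filters = []
    if year is not None:
        filters.append(
            {
                "range": {
                    date_field: {
                        "gte": f"{year}-01-01",
                        "lt": f"{year + 1}-01-01",
                    }
                }
            }
        )

    return {
        "query": {"bool": {"must": must, "filter": filters}},
        "highlight": {"fields": {"*": {}}},
    }


# Case (1): blank term, year 2022. Case (2): term plus year 2022.
year_only = build_filtered_search_body(term="", year=2022)
term_and_year = build_filtered_search_body(term="contributor", year=2022)
```

Whether this matches the real cause of issue (2) depends on how the backend currently builds the year filter, which is not shown in this thread.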
Thanks @bwbaker1. I think we want to prioritize having a functional production environment quickly. To do so, would you and @deborahUAH agree to push our current implementation of this ticket from staging to production? |
@wrynearson My opinion is that it is functional enough to go ahead and push to PROD. Then we can get the DEMO ATBD public since that is a big deal. The other parts of this ticket can be fixed afterward. But this is @deborahUAH decision. |
Brad is correct. We need to download the real DEMO ATBD that @bwbaker1 has created in prod to see that it looks ok before we publish it in prod (because at that point it will be visible for good). Publishing in MCP Prod is not a testing space. This is why we needed a staging space on MCP and I thought one was created months ago?? (there was a ticket for it somewhere - please add # if you find it). The order in MCP Prod must be 1) download the DEMO ATBD PDF and check that it looks good (enough), then publish it, then check the ability to see it when searching. |
@wrynearson @deborahUAH I just checked and I no longer see the test ATBD on PROD. So it seems you were successful in deleting that document. |
see my clarification on deletions on ticket #548 |
Hey @wrynearson . Are we able to move this issue to done for now? |
Hey @naomatheus , sorry I just saw this. @deborahUAH is the owner of this repo, so I'm not able to close tickets. She treats |
Gotcha. Noted @wrynearson |
awaiting more than 1 document so this can be tested on Production. |
@deborahUAH @bwbaker1 now that we have Document PDFs working in production, this ticket is unblocked. I've marked it for your review. |
Note: we can't review until we have published documents. |
@wrynearson Search is no longer working on staging. It does not provide an error or anything. Screen.Recording.2023-04-10.at.11.32.38.AM.mov |
Thanks for flagging this @bwbaker1 @naomatheus and maybe @thenav56 - could you look into this? I'm getting |
⬆️ Here is a quick screen recording of searching. Screen.Recording.2023-05-01.at.11.48.37.AM.mov |
@bwbaker1 the issue is that public documents that were made public before we implemented #699 are not "indexed" (i.e., not searchable). We made two documents on staging that are searchable now: Search.Index.mov
This won't be an issue on production because we don't have any published documents there yet, so once the first document is published, it will be indexed. However, we will go back and re-index the existing published documents on staging to not cause any further confusion. |
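As a rough illustration of what re-indexing already published documents could involve, the sketch below builds a request body in the newline-delimited format accepted by the Elasticsearch/OpenSearch `_bulk` endpoint. It is not the project's actual re-indexing code: `build_bulk_payload` is a hypothetical helper, the `title` field is a made-up example, and the index name `atbd` and the id `19_v1` are taken from the search result earlier in this thread.

```python
import json


def build_bulk_payload(docs, index: str = "atbd") -> str:
    """Serialize (doc_id, source) pairs into a _bulk request body.

    Each document becomes two lines: an action line, then the document itself.
    The body must end with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical published document; real ATBD fields will differ.
payload = build_bulk_payload([("19_v1", {"title": "Example ATBD"})])
print(payload, end="")
```

The resulting string would be sent with `Content-Type: application/x-ndjson`, signed for the correct region like any other request to the domain.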
@wrynearson This makes sense. I remember this happening awhile back when search was updated/fixed. |
@wrynearson IMO there is no need to wait since the production instance doesn't have any published ATBDs anyway |
@wrynearson Yes, as @sunu said we are good to push this to production. |
Here is the error in the console log when trying to conduct a search: