
Document search is not working anymore #500

Closed
Tracked by #524 ...
aboydnw opened this issue May 9, 2022 · 46 comments
Assignees
Labels
development A task for the DS development team on APT

Comments

@aboydnw

aboydnw commented May 9, 2022

Here is the error in the console log when trying to conduct a search:
[screenshot of console error]

@leothomas
Collaborator

Do you know if this is occurring in both staging and production? If not, which environment is it occurring in?

@aboydnw
Author

aboydnw commented May 24, 2022

I pulled this screenshot from MCP. I'm not sure if it's happening in staging as well, since the entire documents page isn't accessible there.

@leothomas
Collaborator

It seems that the search functionality is broken in both the staging and prod stacks, but for different reasons.

Prod:

The error message reads:

{"detail":"{\"message\":\"Credential should be scoped to a valid region, not 'us-east-1'. \"}"}

I believe this is just telling us that we're not signing the request with the correct region, similar to this ticket.

We are hardcoding the region as us-east-1 when signing the request, whereas in MCP the Elasticsearch Domain is deployed to us-west-2 (at Shawn's request - they get a discount in that region):

 aws cloudformation describe-stack-resources --stack-name nasa-apt-api-lambda-prod --query 'StackResources[?ResourceType==`AWS::Elasticsearch::Domain`]'
[
    {
        "StackName": "nasa-apt-api-lambda-prod",
        "StackId": "arn:aws:cloudformation:us-west-2:237694371684:stack/nasa-apt-api-lambda-prod/c2aeefb0-7fb7-11ec-9b7d-02f9de4065db",
        "LogicalResourceId": "nasaaptapilambdaprodelasticsearchdomainF69BA438",
        "PhysicalResourceId": "apt-api-lambda-prod-elastic",
        "ResourceType": "AWS::Elasticsearch::Domain",
        "Timestamp": "2022-01-27T22:39:35.796000+00:00",
        "ResourceStatus": "CREATE_COMPLETE",
        "DriftInformation": {
            "StackResourceDriftStatus": "NOT_CHECKED"
        }
    }
]

I strongly suspect that changing the hardcoded region from:

region = "us-east-1"

to

region = os.environ["AWS_REGION"]

would resolve the error. However, I have not deployed the fix to production because, as of this moment, we do not have the ability to re-deploy the stack (if needed) due to the MCP block on ApiGatewayV2::HTTPApi resource types. I also haven't deployed the fix to staging, because 1) staging is already deployed to us-east-1 and 2) staging is failing for a different reason (more on that below).
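For context, here is a minimal sketch of what the signing change could look like, assuming the API builds its signer with requests_aws4auth; the function name and structure are illustrative, not the exact APT code:

import os

import boto3
from requests_aws4auth import AWS4Auth


def aws_auth() -> AWS4Auth:
    """Build a SigV4 signer scoped to the region the Lambda is running in."""
    # AWS_REGION is set automatically in the Lambda execution environment, so the
    # signature will match whichever region the Elasticsearch domain is deployed to.
    region = os.environ["AWS_REGION"]
    credentials = boto3.Session().get_credentials()
    return AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        region,
        "es",  # service name used when signing Elasticsearch/OpenSearch requests
        session_token=credentials.token,
    )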

Staging:

Staging is failing with a very puzzling error:

<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n</body>\r\n</html>\r\n

with no traces in the lambda logs and only logging a 502 error in the Api logs.

The 502 error seems to indicate an integration error. I will report back once I have more information

@leothomas
Collaborator

Update: the 502 error isn't being generated by the Lambda function or the API Gateway instance, it's generated by the Elasticsearch node itself.

A few strange things to note about the Elasticsearch instance:

Last night:

  • I couldn't load the instance info in the AWS console
  • The Elasticsearch instance had no status/allocated memory in the Opensearch Domain overview tab:
    [screenshot]
  • Querying the Elasticsearch instance through the command line showed that the instance was "up and running as normal"
  • I added logging to the Elasticsearch instance through the command line

This morning:

  • The Elasticsearch instance now loads in the AWS console
  • The Elasticsearch instance seems to be stuck in some sort of update processing state:

[screenshot]

(I'm not sure how this was triggered, perhaps by adding the logs last night?)

@leothomas
Collaborator

leothomas commented Jun 1, 2022

Potential resolution: Migrate to Opensearch

Elasticsearch was open source until recently. After the company that makes Elasticsearch announced a more restrictive license for future versions, AWS forked Elasticsearch and hosts it as a service called Opensearch, which they will keep open source going forward. ref

The APT stack currently uses the Elasticsearch 7.7 engine (the last open-source version before the new license), so the only way to receive updates going forward is to migrate to an Opensearch service. Since Opensearch is a fork of Elasticsearch, AWS says that Opensearch will be backwards compatible with Elasticsearch without requiring updates to client code.

Migrating to Opensearch would not explain why the Elasticsearch instance entered and remains stuck in an unreachable state, but it would likely allow us to sidestep the unreachable instance and ensure that we can continue to stay up to date with Opensearch developments.

(Note: staying up to date with Opensearch is a small advantage; our usage of the search service is so basic that I doubt any of the updates would be crucial to APT. However, it's still a "nice to have".)

Consideration:

It's possible to upgrade the instance from Elasticsearch (v7.7) to Opensearch (v1.2) from the AWS console, but I strongly suspect that behind the scenes, the update will just delete the Elasticsearch domain and create an Opensearch one with the same name, losing all stored/indexed data.

I don't have a problem with losing the indexed documents in staging. As for the data that we will be importing from the UAH prod stack to the MCP prod stack, none of the crucial ATBDs are published, which means they aren't yet indexed in the UAH prod Elasticsearch instance. We would be free to start with a fresh search instance, in this case OpenSearch rather than Elasticsearch, in the new MCP prod stack.

Options to migrate the data (if we do want to migrate it) include:

  • The AWS-recommended procedure: take a snapshot of the instance, upload the snapshot to an S3 bucket, grant the new Opensearch instance permission to access the S3 bucket, and load the new Opensearch instance from the snapshot (see the sketch after this list)
  • Build re-indexing logic into the APT API that re-indexes all of the published ATBDs. This used to exist in the APT API, and I think it's a good idea as a fail-safe in case the background task in charge of indexing the ATBD fails.
    • Note: ATBD indexing happens whenever a document is published or whenever the minor version is bumped, however, each minor version bump gets indexed separately (eg: v1.1, v1.2, v1.3, v2.0, v2.1). This won't be possible to maintain if we implement a re-index functionality, since the indexing process pulls directly from the database, and the database does not store every minor version. However I would argue that we can do away with that requirement (if it ever even was a requirement to begin with) since the purpose of the search functionality is to find documents by keywords. I don't think it makes much sense to index terms from a document that is no longer editable/viewable in the APT web-app (it could still be downloaded as a PDF, however)
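A rough sketch of the AWS-recommended snapshot path, assuming an S3 bucket and an IAM role the domain can assume; the bucket, role, repository, and endpoint names below are placeholders:

import boto3
import requests
from requests_aws4auth import AWS4Auth

region = "us-east-1"  # region of the source domain (placeholder)
old_host = "https://old-elasticsearch-domain-endpoint"  # placeholder endpoints
new_host = "https://new-opensearch-domain-endpoint"

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, "es",
                   session_token=credentials.token)

# 1. Register an S3 snapshot repository on the old domain.
repo_body = {
    "type": "s3",
    "settings": {
        "bucket": "apt-es-snapshots",  # hypothetical bucket
        "region": region,
        "role_arn": "arn:aws:iam::123456789012:role/es-snapshot-role",  # hypothetical role
    },
}
requests.put(f"{old_host}/_snapshot/apt-backup", auth=awsauth, json=repo_body)

# 2. Take a snapshot of all indices.
requests.put(f"{old_host}/_snapshot/apt-backup/snapshot-1", auth=awsauth)

# 3. Register the same repository on the new OpenSearch domain, then restore.
requests.put(f"{new_host}/_snapshot/apt-backup", auth=awsauth, json=repo_body)
requests.post(f"{new_host}/_snapshot/apt-backup/snapshot-1/_restore", auth=awsauth)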

@leothomas
Collaborator

Quick update on the search functionality in the staging instance: The AWS console still shows the Elasticsearch instance in an "in-progress" state, however the instance is now responsive. All of the documents that were indexed before the crash are lost, but I was able to index a new one and search it:
eg: querying keyword "ocean"

curl -X 'POST' \
  'https://0mrzuyq2e3.execute-api.us-east-1.amazonaws.com/v2/search' \
  -H 'accept: application/json' \
  -H "Authorization: Bearer $AUTH_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"query":{"bool":{"must":[{"multi_match":{"query":"oceans"}}],"filter":[]}},"highlight":{"fields":{"*":{}}}}'

result:

{
  "took": 598,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.54655457,
    "hits": [
      {
        "_index": "atbd",
        "_type": "atbd",
        "_id": "19_v1",
        "_score": 0.54655457,
        "_source": {
           ...
        }
      ....
      ]
    }

@aboydnw
Author

aboydnw commented Jun 13, 2022

Update: we would like to use option 2.1 outlined in #509

@leothomas
Collaborator

Update: after deploying an update to MCP which uses the region from the Lambda's execution context to sign the Elasticsearch request, the document search still appears to be broken, with an error message indicating that the signing credentials are scoped to us-east-1 (even though the ES instance is deployed to us-west-2).

@naomatheus
Collaborator

Proposed Solution 1

@naomatheus naomatheus self-assigned this Jul 21, 2022
@aboydnw aboydnw added the development A task for the DS development team on APT label Jul 21, 2022
@naomatheus
Collaborator

naomatheus commented Aug 4, 2022

I've followed the steps to create a manual snapshot of the current Elasticsearch index. However, the instance appears locked in "processing," and the Opensearch Dashboard is inaccessible. I've updated the security configurations on the Opensearch domain in DevSeed AWS, but the changes are not applying since the instance appears locked.

I was able to work through the IAM roles and permissions required to create a snapshot of the indices, but was not able to create the snapshot because the instance is locked.

There is an option to use the AWS dashboard to force through an update of this domain, and I believe the current indices may be lost during that process.

I am recommending the following:

  1. Process the update from ElasticSearch to OpenSearch through the AWS console (risking loss of the existing indexes).
  2. Deploy a new OpenSearch domain.
  3. Configure the new OpenSearch domain to automate index snapshots and back them up to S3.

I'll come back to this ticket after some discussion and once I've prepared the infrastructure code to deploy a new OpenSearch stack.

Here's a TL;DR of what changes and what remains the same when migrating from ElasticSearch to OpenSearch.

Upgrade to OpenSearch

The upgrade is automated and can be triggered through the AWS Management Console. However, any changes made through the console would not be reflected in the CDK infrastructure code, so in this branch the infrastructure code has been updated. Notes are added on the affected areas below, summarizing what will need to be changed and what can be left alone.

New API version

New event format

What's staying the same?

  • The following features and functionality, among others not listed, will remain the same:
    • Service principal (es.amazonaws.com)
    • Vendor code
    • Domain ARNs
    • Domain endpoints
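For illustration, a minimal sketch of how the CDK domain definition could move from the aws_elasticsearch module to aws_opensearchservice; the stack class, construct IDs, and sizing here are placeholders, not the actual APT infrastructure code:

from aws_cdk import Stack
from aws_cdk import aws_opensearchservice as opensearch
from constructs import Construct


class SearchStack(Stack):  # hypothetical stack for illustration
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Previously something like:
        #   from aws_cdk import aws_elasticsearch as es
        #   es.Domain(self, "elasticsearch-domain", version=es.ElasticsearchVersion.V7_7, ...)
        self.domain = opensearch.Domain(
            self,
            "opensearch-domain",  # placeholder construct ID
            version=opensearch.EngineVersion.OPENSEARCH_1_2,
            capacity=opensearch.CapacityConfig(
                data_nodes=1,
                data_node_instance_type="t3.small.search",
            ),
        )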

@naomatheus
Collaborator

Consideration
We don't have a way to handle permissions in OpenSearch/Elasticsearch: in APT, we do not have a way to limit index search capability in line with user document access permissions. Doing so would require adding permissions at the OpenSearch/Elasticsearch layer.

Current solution:
Documents only get indexed when they are published, i.e. they're available to everyone.

@naomatheus
Collaborator

Unblocked: there are no critical ATBDs/documents in APT in the prod accounts, so the migration steps will not need to include creating a snapshot of the current indices.
Indices in development will also be lost when upgrading from ElasticSearch to OpenSearch.

@naomatheus
Collaborator

Opensearch upgrade:
develop...open-search-upgrade-1

Ready to deploy pending review.

@naomatheus
Collaborator

naomatheus commented Aug 12, 2022

Remaining steps:

  1. Change the deployed search endpoint from https://0mrzuyq2e3.execute-api.us-east-1.amazonaws.com/v2/search to https://2t2dh1w620.execute-api.us-east-1.amazonaws.com/search.
  2. The new OpenSearch indices will be blank, so test ATBDs should be created in the stage environment.
  3. Search for the contents of the newly indexed ATBDs.

@naomatheus
Collaborator

See this branch and PR thread for changes made.
Error state: the ATBD documents page displayed no documents.

Current state:
The ATBD documents page now displays documents.

Next steps:
Change the deployed search endpoint from https://0mrzuyq2e3.execute-api.us-east-1.amazonaws.com/v2/search to https://2t2dh1w620.execute-api.us-east-1.amazonaws.com/search.

@naomatheus
Collaborator

Upcoming changes to Backend API for Opensearch compatibility

  • Replace requests_aws4auth with opensearch-py/opensearchpy
    • requests_aws4auth does not currently support AWS Opensearch service and is maintained outside of the Opensearch organization
    • opensearch-py is supported and maintained by Opensearch.org, is opensource licensed, and is compatible with Opensearch v1.0.1, our current version of Opensearch
    • Affected modules are elasticsearch.py at app/api/v2/elasticsearch.py and at app/search/elasticsearch.py

PR forthcoming; in the meantime, see app/api/v2/elasticsearch.aws_auth. The aws_auth function returns an object used to authorize POST requests to the Elasticsearch domain. In the update, the client must be used directly with queries (reason: opensearch-py is a low-level client for now).
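Here's a minimal sketch of the client-based approach with opensearch-py; the host, region, and index names are illustrative, not the actual APT configuration:

import boto3
from opensearchpy import AWSV4SignerAuth, OpenSearch, RequestsHttpConnection

host = "search-apt-staging-example.us-east-1.es.amazonaws.com"  # placeholder domain endpoint
region = "us-east-1"

credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, region)

client = OpenSearch(
    hosts=[{"host": host, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# Instead of signing a raw POST with requests_aws4auth, queries go through the client.
results = client.search(
    index="atbd",
    body={"query": {"multi_match": {"query": "oceans"}}},
)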

  • Setup.py is deprecated (?)
    • This should be replaced with a shell script that uses pip for package management
    • setup.py install is deprecated

@bwbaker1
Collaborator

@naomatheus @wrynearson @aboydnw @deborahUAH After testing OpenSearch on staging, it does work, but there are still a few issues.

(1) A search term must currently be entered. We would like a user to be able to search for any ATBDs published in a specific year; for example, leave the search term blank and just search for all ATBDs published in 2022.

[screenshot: apt_1]

(2) When a search term is included, the year must still be set to "All." The image below shows no document matching "contributor" when searching with the year set to 2022, even though this particular ATBD was published in 2022.

[screenshot: apt_2]

Below are the search results with the search term "contributor" and the year set to "All."

[screenshot: apt_3]
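For reference, a sketch of the two query shapes the UI would need to send to support these cases, assuming the index stores a publication-date field (the published_at field name is an assumption, not necessarily the current ATBD mapping):

# Year-only search: no search term, so fall back to match_all and filter by year.
year_only_query = {
    "query": {
        "bool": {
            "must": [{"match_all": {}}],
            "filter": [
                {"range": {"published_at": {"gte": "2022-01-01", "lt": "2023-01-01"}}}
            ],
        }
    }
}

# Term plus year: combine the existing multi_match with the same year filter.
term_and_year_query = {
    "query": {
        "bool": {
            "must": [{"multi_match": {"query": "contributor"}}],
            "filter": [
                {"range": {"published_at": {"gte": "2022-01-01", "lt": "2023-01-01"}}}
            ],
        }
    }
}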

@wrynearson
Member

Thanks @bwbaker1. I think we want to prioritize having a functional production environment quickly. To do so, would you and @deborahUAH agree to push our current implementation of this ticket from staging to production?

@bwbaker1
Collaborator

@wrynearson My opinion is that it is functional enough to go ahead and push to PROD. Then we can get the DEMO ATBD public, since that is a big deal. The other parts of this ticket can be fixed afterward. But this is @deborahUAH's decision.

@deborahUAH
Collaborator

deborahUAH commented Nov 2, 2022

Brad is correct. We need to download the real DEMO ATBD that @bwbaker1 has created in prod and check that it looks OK before we publish it there (because at that point it will be visible for good). MCP Prod is not a testing space. This is why we needed a staging space on MCP, and I thought one was created months ago?? (there was a ticket for it somewhere - please add # if you find it).

The order in MCP Prod must be: 1) download the DEMO ATBD PDF and check that it looks good (enough), 2) publish it, 3) check the ability to see it when searching.
@wrynearson @naomatheus

@bwbaker1
Collaborator

bwbaker1 commented Nov 2, 2022

@wrynearson @deborahUAH I just checked and I no longer see the test ATBD on PROD. So it seems you were successful in deleting that document.

@deborahUAH
Collaborator

see my clarification on deletions on ticket #548

@naomatheus
Collaborator

Hey @wrynearson . Are we able to move this issue to done for now?
I'd suggest splitting off another issue if there's more to do here for this specific issue.

@wrynearson
Member

Hey @naomatheus , sorry I just saw this.

@deborahUAH is the owner of this repo, so I'm not able to close tickets. She treats "done" as deployed to prod and tested. Once their team writes a demo ATBD, downloads it while it's in draft, then publishes it and makes sure it's indexed, they will close this.

@naomatheus
Collaborator

Gotcha. Noted @wrynearson

@deborahUAH
Collaborator

Awaiting more than one document so this can be tested on Production.

@wrynearson
Member

@deborahUAH @bwbaker1 now that we have Document PDFs working in production, this ticket is unblocked. I've marked it for your review.

@wrynearson wrynearson assigned bwbaker1 and deborahUAH and unassigned naomatheus Apr 4, 2023
@bwbaker1
Collaborator

bwbaker1 commented Apr 4, 2023

Note: we can't review until we have published documents.

@bwbaker1
Collaborator

@wrynearson Search is no longer working on staging. It does not return an error or anything.

Screen.Recording.2023-04-10.at.11.32.38.AM.mov

@wrynearson
Member

wrynearson commented Apr 11, 2023

Thanks for flagging this @bwbaker1

@naomatheus and maybe @thenav56 - could you look into this? I'm getting 500 errors in the console, so this might be an AWS issue.

[screenshot]

@wrynearson
Member

@sunu @thenav56, do we need to make any further updates on production to make sure that this ticket can be closed (e.g. #706)?

@wrynearson
Member

@thenav56 @sunu @batpad currently, no results are appearing on staging for document search. There is no error, but results that we think should appear aren't showing up.

cc @bwbaker1

@bwbaker1
Collaborator

bwbaker1 commented May 1, 2023

⬆️ Here is a quick screen recording of searching.

Screen.Recording.2023-05-01.at.11.48.37.AM.mov

@wrynearson
Member

wrynearson commented May 2, 2023

@bwbaker1 the issue is that documents made public before we implemented #699 are not "indexed" (i.e., not searchable).

We made two documents on staging that are searchable now:

Search.Index.mov

This won't be an issue on production because we don't have any published documents there yet, so once the first document is published, it will be indexed. However, we will go back and re-index the existing published documents on staging to avoid any further confusion.
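A rough sketch of what re-indexing the already-published staging documents could look like, reusing an opensearch-py client like the one sketched earlier; fetch_published_atbds is a hypothetical helper, and the ID scheme just mirrors the "19_v1" style seen in the search results above:

def reindex_published_atbds(client, fetch_published_atbds):
    """Re-index documents that were published before indexing-on-publish was in place."""
    for atbd in fetch_published_atbds():  # hypothetical helper returning published ATBDs as dicts
        client.index(
            index="atbd",
            id=f"{atbd['id']}_v{atbd['major_version']}",  # assumed ID scheme, e.g. "19_v1"
            body=atbd,
        )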

cc @thenav56 @sunu

@bwbaker1
Collaborator

bwbaker1 commented May 2, 2023

@wrynearson This makes sense. I remember this happening a while back when search was updated/fixed.
@thenav56 @sunu @batpad Thanks for getting this solved so quickly!

@wrynearson
Member

@thenav56 should we wait to push this to production until #718 is ready? cc @sunu @batpad

@sunu
Collaborator

sunu commented May 4, 2023

@wrynearson IMO there is no need to wait since the production instance doesn't have any published ATBDs anyway

@thenav56
Collaborator

thenav56 commented May 4, 2023

@wrynearson Yes, as @sunu said we are good to push this to production.
#718 can wait as it will add a feature to fix ES issues if there are any in the future.
