Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move all fields test to new framework #239

Merged
merged 12 commits into from
Dec 18, 2023
Merged

Move all fields test to new framework #239

merged 12 commits into from
Dec 18, 2023

Conversation

bhuvana-talend
Copy link
Contributor

Description of change

https://jira.talendforge.org/browse/TDL-24438
Made a new base class to use the new framework.
Made a new all_fields_test class to use the new framework and separate the tests for record count validation and record values validation.

Manual QA steps

Risks

Rollback steps

  • revert this branch

Copy link
Contributor

@HarrisonMarcRose HarrisonMarcRose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't thoroughly reviewed the actual tests. But like the base case, We should evaluate what is already in the inherited case and try to only write code for the differences if possible. Once you get the base case comments cleaned up, lets try to do this together.

Comment on lines 43 to 47
def get_properties(self):
start_date = dt.today() - timedelta(days=1)
start_date_with_fmt = dt.strftime(start_date, self.START_DATE_FORMAT)

return {'start_date' : start_date_with_fmt}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new framework we make the start date a property so it can be changed in tests outside of this method.

Suggested change
def get_properties(self):
start_date = dt.today() - timedelta(days=1)
start_date_with_fmt = dt.strftime(start_date, self.START_DATE_FORMAT)
return {'start_date' : start_date_with_fmt}
# set the default start date which can be overridden in the tests.
start_date = BaseCase.timedelta_formatted(dt.utcnow(), delta=timedelta(days=-1))
def get_properties(self):
return {'start_date': self.start_date,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - Changed this method as per the suggestion

"campaigns": {
BaseCase.PRIMARY_KEYS: {"id"},
BaseCase.REPLICATION_METHOD: BaseCase.FULL_TABLE,
HubspotBaseCase.OBEYS_START_DATE: False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this property OBEYS_START_DATE although not currently a framework concept would apply to more taps and should be moved into the tap tester framework

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to the tap-tester base case

BaseCase.PRIMARY_KEYS: {"companyId"},
BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL,
BaseCase.REPLICATION_KEYS: {"property_hs_lastmodifieddate"},
HubspotBaseCase.EXPECTED_PAGE_SIZE: 250,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expected page size is now a concept in the framework as API_LIMIT. We should use this instead.
https://github.com/stitchdata/tap-tester/blob/ed0886e9a9bd3f1340ad4929a03f2d67ab4ebf2a/tap_tester/base_suite_tests/base_case.py#L88

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed EXPECTED_PAGE_SIZE to API_LIMIT to use the variable already defined in tap-tester base case

BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL,
HubspotBaseCase.EXPECTED_PAGE_SIZE: 100,
HubspotBaseCase.OBEYS_START_DATE: True,
HubspotBaseCase.PARENT_STREAM: 'companies'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how common parent stream is, If this is unique to hubspot then doing it like this is appropriate. If we think there are more cases where this would exist we should also move this concept to the tap-tester framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see PARENT_STREAM defined. Can you show me where this is? Did it move to BaseCase? If so, we should specify that instead of HubspotBaseCase.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed it.. It is moved to tap-tester base case. I changed to use BaseCase.PARENT_STREAM

Comment on lines 158 to 165
def expected_primary_keys(self):
"""
return a dictionary with key of table name
and value as a set of primary key fields
"""
return {table: properties.get(self.PRIMARY_KEYS, set())
for table, properties
in self.expected_metadata().items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not be overriding base case methods unless there is a good reason to make them different. In this case there is no difference except expanded functionality in the base case. We should remove this method.

This exists in https://github.com/stitchdata/tap-tester/blob/ed0886e9a9bd3f1340ad4929a03f2d67ab4ebf2a/tap_tester/base_suite_tests/base_case.py#L162-L172

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. that's right- a lot of methose methods are already there in tap-tester base case. Didn't realise it. Removed all these methods from here.

Comment on lines 203 to 217
def expected_primary_keys(self):

"""
return a dictionary with key of table name
and value as a set of primary key fields
"""
return {table: properties.get(self.PRIMARY_KEYS, set())
for table, properties
in self.expected_metadata().items()}

def expected_automatic_fields(self):
auto_fields = {}
for k, v in self.expected_metadata().items():
auto_fields[k] = v.get(self.PRIMARY_KEYS, set()) | v.get(self.REPLICATION_KEYS, set())
return auto_fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also true for these methods

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 219 to 221
##########################
# Common Test Actions #
##########################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for all of these as well. These are all in tap tester base case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 366 to 379
def datetime_from_timestamp(self, value, str_format="%Y-%m-%dT00:00:00Z"):
"""
Takes in a unix timestamp in milliseconds.
Returns a string formatted python datetime
"""
try:
datetime_value = dt.fromtimestamp(value)
datetime_str = dt.strftime(datetime_value, str_format)
except ValueError as err:
raise NotImplementedError(
f"Invalid argument 'value': {value} "
"This method was designed to accept unix timestamps in milliseconds."
)
return datetime_str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the only tap that uses unix timestamps, but this function looks like it could be helpful to other taps. I would consider if it should be moved to base case in the helper function? https://github.com/stitchdata/tap-tester/blob/ed0886e9a9bd3f1340ad4929a03f2d67ab4ebf2a/tap_tester/base_suite_tests/base_case.py#L474-L522

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to tap-tester base case

Comment on lines 381 to 383
def is_child(self, stream):
"""return true if this stream is a child stream"""
return self.expected_metadata()[stream].get(self.PARENT_STREAM) is not None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should follow PARENT_STREAM. If that stays here this should, if that goes to base case this should follow it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to tap-tester base case

can_save = True
return ret_records

FIELDS_ADDED_BY_TAP = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably be in the class and not at the module level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved all these to the new hubspot base case - base_hubspot.py

Copy link
Contributor

@HarrisonMarcRose HarrisonMarcRose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments that are not explained in detail and would be challenging to implement. Let's get together to discuss them so it is clear what solutions should be implemented and how to go about it.

Comment on lines 185 to 186
def expected_check_streams(self):
return set(self.expected_metadata().keys())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same as this method in tap-tester with a different name. It would be best to remove this and rename the calls to it to the method in tap-tester. At the minimum if we want to keep this name for some reason, just can make it a wrapper and call the underlying tap-tester method in it as follows:

Suggested change
def expected_check_streams(self):
return set(self.expected_metadata().keys())
@classmethod
def expected_check_streams(cls):
return cls.expected_stream_names()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used only start_date test. So, I removed this. I think we can change start date test when we move that to the new framework.

BaseCase.REPLICATION_METHOD: BaseCase.INCREMENTAL,
HubspotBaseCase.EXPECTED_PAGE_SIZE: 100,
HubspotBaseCase.OBEYS_START_DATE: True,
HubspotBaseCase.PARENT_STREAM: 'companies'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see PARENT_STREAM defined. Can you show me where this is? Did it move to BaseCase? If so, we should specify that instead of HubspotBaseCase.

Comment on lines 61 to 74
def convert_datatype(self, expected_records):
for stream, records in expected_records.items():
for record in records:

# convert timestamps to string formatted datetime
timestamp_keys = {'timestamp'}
for key in timestamp_keys:
timestamp = record.get(key)
if timestamp:
unformatted = datetime.datetime.fromtimestamp(timestamp/1000)
formatted = datetime.datetime.strftime(unformatted, self.BASIC_DATE_FORMAT)
record[key] = formatted

return expected_records
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling unix timestamps and converting them to a formatted string looks like something that is useful in multiple taps. I would like to work to see if we can make the part of this that actually does the conversion be a utility we can use in tap-tester itself so we can use this logic once and if there are any improvements just update it in one place. Maybe we can pair on this.

LOGGER.info("The test client found %s %s records.", len(records), stream)

self.convert_datatype(self.expected_records)
super().setUp()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the fact that you just have the differences in the setup that are unique to this tap and then call the existing setup so it is easy to see why we are overriding this method.

Comment on lines 170 to 172
@unittest.skip("Random selection doesn't always sync records")
def test_all_streams_sync_records(self):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we skip this test if we have a client to create data. Wouldn't we verify in the sync that at least one record is present and if it wasn't we can create it. For instance if there isn't a contact we can use

def create_contacts(self):

Comment on lines 154 to 168
@unittest.skip("Skip till all cards of missing fields are fixed. TDL-16145 ")
def test_all_fields_for_streams_are_replicated(self):
for stream in self.test_streams:
with self.subTest(stream=stream):

# gather expectations
expected_all_keys = self.selected_fields.get(stream, set()) - set(self.MISSING_FIELDS.get(stream, {}))

# gather results
fields_replicated = self.actual_fields.get(stream, set())

# verify that all fields are sent to the target
# test the combination of all records
self.assertSetEqual(fields_replicated, expected_all_keys,
logging=f"verify all fields are replicated for stream {stream}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same test that is above. Why skip this if we are running it above. Python also uses the latest definition of a method, so it is possible you are overwriting your main test and not running it.


return expected_records

def test_all_fields_for_streams_are_replicated(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test in tap-tester is written concisely. It already has a MISSING_FIELDS concept. We should update tap tester to have extra concepts for ADDED_FIELDS. I'm not sure if there can be anything but missing and added fields but if there are categories of things that add up to a total of MISSING_FIELDS we can track them separately and then build missing fields for the test.

Copy link
Contributor

@HarrisonMarcRose HarrisonMarcRose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if this is passing (with the tap-tester changes) for all the streams. It looks very streamlined. Love it.

Comment on lines 19 to 24
'deals': {
# BUG_TDL-14993 | https://jira.talendforge.org/browse/TDL-14993
# Has an value of object with key 'value' and value 'Null'
'property_hs_date_entered_1258834',
'property_hs_time_in_example_stage1660743867503491_315775040'
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is only used for the all_fields test. This is duplicative of the bad keys below and isn't necessary as there are just specific examples. We should move the bug info down to the method for removing the bad keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah.. This just got carried over from the old test.
So, we added these extra fields and removed the bad prefixes later.
Not needed here, will remove it.

Comment on lines 78 to 91
for key in self.expected_all_keys:
for bad_prefix in bad_key_prefixes:
if key.startswith(bad_prefix):
bad_keys.add(key)
for key in self.fields_replicated:
for bad_prefix in bad_key_prefixes:
if key.startswith(bad_prefix):
bad_keys.add(key)

for key in bad_keys:
if key in self.expected_all_keys:
self.expected_all_keys.remove(key)
if key in self.fields_replicated:
self.fields_replicated.remove(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments:

  • I'm not sure why we need to build bad keys and then go through them. It appears we could just remove them at the same time.
  • If the key is in both the expected_all_keys and the fields_replicated this would pass. So I am uncertain as to why we would need to remove this from both our expectations and our actual results. Can we log if a key is removed and where it is removed from when we do this. I would expect it to only be in the actual results (fields_replicated) and not necessary in the expectations. If this isn't a good assumption I would like to figure out what I'm not understanding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the first comment, I felt the same and removed it directly instead of adding to bad_keys, but it complained that the set has changed while iterating.
For the second comment - let me print and see what is being removed. It could be the result of adding that extra fields in the above comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the method to include only the bad keys that are not in both the lists.

# Tests To Skip
##########################################################################

@unittest.skip("Skip till all cards of missing fields are fixed. TDL-16145 ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be nice place to use the tap-tester @skipUntilDone to ensure this gets picked up as soon as TDL-16145 is completed.

@bhuvana-talend bhuvana-talend merged commit e57af17 into master Dec 18, 2023
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants