Build timeout on package_build.py for Fedora #191

Open
pleia2 opened this issue May 9, 2024 · 8 comments · May be fixed by #200
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@pleia2
Contributor

pleia2 commented May 9, 2024

While generating the sources for Fedora, I got a build timeout during the Fedora 40 step. The full traceback is below (the 404 errors are unrelated and a known issue).

Essentially, it looks like traversing dozens of directories for three different versions triggers some sort of rate limiting, so the script hangs until it finally fails without generating Fedora_40_List.json.

We should evaluate the logic we're using in the script, do some research into whether there's a better way of collecting this data (a different source or API?), and perhaps reach out to a contact at Fedora to get their thoughts.

software-discovery-tool/bin $ ./package_build.py fedora
Extracting fedora data ... 
404 Directory 1 not found
404 Directory 5 not found
Saved!
filename: Fedora_38_List.json
404 Directory 1 not found
404 Directory 5 not found
404 Directory 7 not found
Saved!
filename: Fedora_39_List.json
404 Directory 1 not found
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 383, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 1017, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/usr/lib/python3/dist-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.10/ssl.py", line 1100, in _create
    self.do_handshake()
  File "/usr/lib/python3.10/ssl.py", line 1371, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 756, in urlopen
    retries = retries.increment(
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 719, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 700, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 337, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='dl.fedoraproject.org', port=443): Read timed out. (read timeout=None)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/elizabeth/git/software-discovery-tool/bin/./package_build.py", line 372, in <module>
    fedora()
  File "/home/elizabeth/git/software-discovery-tool/bin/./package_build.py", line 139, in fedora
    req = requests.get(link)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 544, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 657, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='dl.fedoraproject.org', port=443): Read timed out. (read timeout=None)
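The last frame shows the call `req = requests.get(link)` in `fedora()` failing with `read timeout=None`. `requests` applies no timeout unless one is passed, so a stalled connection can hang until the operating system itself reports `[Errno 110] Connection timed out`, as seen at the top of the traceback. A minimal sketch, assuming that call site can be changed, passes an explicit `(connect, read)` timeout and catches the failure instead of crashing. The helper name `fetch_listing` and the timeout values are hypothetical, not taken from the script.

```python
import requests

# Hypothetical (connect, read) timeouts in seconds; tune as needed.
DEFAULT_TIMEOUT = (10, 30)


def fetch_listing(link, session=None, timeout=DEFAULT_TIMEOUT):
    """Fetch one directory listing; return the Response, or None on failure.

    Hypothetical helper: the current script calls requests.get(link) with no
    timeout. Non-200 responses are returned as-is so the existing 404
    handling can still inspect resp.status_code.
    """
    http = session if session is not None else requests
    try:
        # With an explicit timeout a stalled server raises instead of hanging.
        return http.get(link, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both ConnectTimeout and ReadTimeout.
        print(f"Timed out fetching {link}")
    except requests.exceptions.RequestException as err:
        # Connection resets, DNS failures, SSL errors, and so on.
        print(f"Failed to fetch {link}: {err}")
    return None
```

A `requests.Session` can be passed as `session`; reusing one keeps connections alive across requests instead of opening a new TLS connection for every directory.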
@Paul-Annay

Hey @pleia2, hope you're doing well. I tried to reproduce this issue. I was temporarily working on a Windows PC and got curious, so I ran package_build.py on its own in a separate directory, isolated from the actual code base, without setting up the entire repo and all its dependencies. The build took a long time (I didn't time it, but at least 6-7 minutes), but it did eventually generate the Fedora_40_List.json file. Here's what I got:

Extracting fedora data ... 
404 Directory 1 not found
404 Directory 5 not found
Saved!
filename: Fedora_38_List.json
404 Directory 1 not found
404 Directory 5 not found
404 Directory 7 not found
Saved!
filename: Fedora_39_List.json
404 Directory 1 not found
404 Directory 5 not found
404 Directory 7 not found
Saved!
filename: Fedora_40_List.json
Thanks for using SDT!

Below is a snapshot of the directory structure I was using (within a Python virtual environment):

[Image: snapshot of the directory structure]

I'm looking into ways to make the build faster. In the meantime, could you tell me whether you did anything specific that triggered this error? I'll run this again on my main PC, where the repository is set up properly, to see if it reproduces there. Thanks!

@rachejazz
Member

I'm guessing the script lacks timeout parameters on its requests, as well as error handling for timeouts. We'd also want to make it async, which would make processing each file faster!
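Because a larger burst of parallel requests could make the suspected rate limiting worse, an async version probably needs a cap on concurrency. A minimal sketch using only the standard library's `asyncio`: it runs the blocking `requests` call in worker threads via `asyncio.to_thread` (Python 3.9+) and uses an `asyncio.Semaphore` so at most `limit` requests are in flight. `fetch_all`, `_get`, and the limit of 4 are hypothetical choices; an async HTTP client such as aiohttp would be an alternative.

```python
import asyncio

import requests

# Hypothetical cap: keep it small so parallelism does not look like an attack.
MAX_CONCURRENT = 4
TIMEOUT = (10, 30)  # hypothetical (connect, read) timeouts in seconds


def _get(link):
    """Blocking fetch with a timeout; returns the body text or None."""
    try:
        resp = requests.get(link, timeout=TIMEOUT)
    except requests.exceptions.RequestException:
        return None
    return resp.text if resp.status_code == 200 else None


async def fetch_all(links, fetch=_get, limit=MAX_CONCURRENT):
    """Fetch many listings concurrently, at most `limit` at a time."""
    links = list(links)  # iterated twice below
    sem = asyncio.Semaphore(limit)

    async def one(link):
        async with sem:
            # Run the blocking call in a worker thread so others can proceed.
            return await asyncio.to_thread(fetch, link)

    # gather() returns results in the same order as its arguments.
    results = await asyncio.gather(*(one(link) for link in links))
    return dict(zip(links, results))
```

From synchronous code this would be called as `asyncio.run(fetch_all(links))`.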

@rachejazz rachejazz added enhancement New feature or request help wanted Extra attention is needed labels May 10, 2024
@glitcher007

hey @pleia2
Is this issue still open?

@Paul-Annay

@glitcher007 Yes, it's open. I got busy with other things and couldn't keep track of this, though I initially looked into making some changes. You're more than welcome to work on it or share some insights. Maybe we could come up with something together.

@pleia2
Contributor Author

pleia2 commented May 20, 2024

Hi @glitcher007, thanks for asking, it is! The big thing about this one is that it doesn't always happen, but when it does, it blocks the installation from continuing, so it's important that we find a better way to gather this data. It's slow because the script makes many requests as it traverses these directories, and I suspect that looks like some sort of attack to the Fedora servers, so they stop allowing access. We don't want that either 😄
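One way to make the crawl look less like an attack is to space the requests out. A minimal sketch, assuming a fixed minimum gap between requests is acceptable to Fedora (any specific interval is a placeholder, not a documented limit). `Throttle` and `crawl` are hypothetical helpers; the clock and sleep functions are injectable so the timing can be tested without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def crawl(links, fetch, throttle):
    """Fetch links one at a time, pausing between requests."""
    results = {}
    for link in links:
        throttle.wait()
        results[link] = fetch(link)
    return results
```

Unlike a fixed `sleep` after every request, this only waits for whatever part of the interval the previous request did not already use up.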

@glitcher007

Hi @pleia2,
While going through the code, I noticed the same problem. I think caching the data could be an option here.
I'll look into what changes could be made to solve that.
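A minimal sketch of the caching idea, assuming directory listings change rarely enough that a day-old copy is acceptable: each fetched page is stored as a JSON file named by the SHA-256 of its URL and reused until it is older than `max_age`. `ListingCache` and its parameters are hypothetical; `fetch` stands for whatever function the script uses to download a page.

```python
import hashlib
import json
import time
from pathlib import Path


class ListingCache:
    """On-disk cache of fetched pages, keyed by URL (hypothetical helper).

    Entries older than max_age seconds are treated as missing and refetched.
    """

    def __init__(self, directory, max_age=24 * 3600):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.max_age = max_age

    def _path(self, url):
        # Hash the URL so it is safe to use as a filename.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.dir / f"{digest}.json"

    def get(self, url, fetch):
        """Return the cached body for url, calling fetch(url) on a miss."""
        path = self._path(url)
        if path.exists():
            entry = json.loads(path.read_text(encoding="utf-8"))
            if time.time() - entry["saved_at"] < self.max_age:
                return entry["body"]
        body = fetch(url)
        if body is not None:  # do not cache failures
            entry = {"saved_at": time.time(), "body": body}
            path.write_text(json.dumps(entry), encoding="utf-8")
        return body
```

Failed fetches (`None`) are not cached, so a timeout on one run does not poison later runs.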

@hbarsaiyan
Contributor

I was wondering whether we could iterate over a mirrorlist and switch to another mirror if the connection times out, using something like https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-40&arch=s390x&country=global. I'm still figuring out how the mirrorlist can be filtered for other distros. As a last resort, we could manually maintain a list of sources.
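A minimal sketch of the fallback idea, assuming the mirrorlist response is plain text with one base URL per line and comment lines starting with `#`. `parse_mirrorlist` and `fetch_with_fallback` are hypothetical helpers, and `fetch` is any function that raises on failure. Because `requests`' exceptions derive from `OSError` (via `IOError`), catching `OSError` also covers `requests.exceptions.Timeout`. The `max_attempts` cap is meant to keep one failed run from spreading the same load across many mirrors.

```python
from urllib.parse import urljoin


def parse_mirrorlist(text):
    """Extract mirror base URLs from a mirrorlist response.

    Assumes one URL per line, with comment lines starting with '#'.
    """
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            # Ensure a trailing slash so urljoin appends rather than replaces.
            mirrors.append(line if line.endswith("/") else line + "/")
    return mirrors


def fetch_with_fallback(path, mirrors, fetch, max_attempts=3):
    """Try path on each mirror in order, stopping at the first success."""
    errors = []
    for base in mirrors[:max_attempts]:
        url = urljoin(base, path)
        try:
            return fetch(url)
        except OSError as err:  # includes TimeoutError and requests' exceptions
            errors.append((url, err))
    raise RuntimeError(f"all {len(errors)} mirror attempts failed: {errors}")
```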

@hbarsaiyan hbarsaiyan linked a pull request May 21, 2024 that will close this issue
@pleia2
Contributor Author

pleia2 commented May 21, 2024


My big question about this approach is whether it's what the Fedora community would prefer. If they find our method of traversing directories to be a problem, just jumping to another mirror could be seen as abusive behavior.

@sharkcz Do you have any thoughts?


5 participants