
Project Name: Scrapy

Evaluating Person or Team: Sylvia | sylviaji


Project Data

  1. Project description:
    Scrapy is a fast, high-level web crawling and scraping framework for Python. It is used by people and companies that want to scrape data from the web (a minimal spider sketch follows this list).

  2. Project website/homepage: https://scrapy.org/ (Scrapy | A Fast and Powerful Scraping and Web Crawling Framework)

  3. Project repository: https://github.com/scrapy/scrapy (Scrapy, a fast high-level web crawling & scraping framework for Python)
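
To give a concrete sense of what the framework does, here is a minimal spider sketch. This is my own illustration, not code from the repository; the site, spider name, and selectors are assumptions made for the example.

```python
# Minimal illustrative spider: crawl one page and yield one item per quote.
# Run with: scrapy runspider quotes_spider.py -o quotes.json
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # CSS selectors pull the quote text and author out of each quote block.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
```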

License

  1. What is the project's license?
    The BSD 3-Clause License.

Code Base

  1. What is the primary programming language in the project?
    Python.

  2. What is the development environment?
    Linux, Windows, macOS, BSD. It is recommended to install Scrapy inside a virtual environment on all platforms.

  3. Are there instructions for how to download, build, and install? How easy is it to find them? Do they seem easy (relatively speaking) to follow?
    The quick installation instructions are easy to find in README.rst, followed by a link to the detailed installation guide. The instructions are easy to follow; a rough sketch of the recommended virtual-environment install appears after this list.

  4. Does the project depend on external additional software modules such as database, graphics, web development, or other libraries? If so, are there clear instructions on how to install those?
    The project depends on a few key Python packages, including lxml, parsel, w3lib, twisted, cryptography, and pyOpenSSL. Some of these packages depend on non-Python libraries that may require additional installation steps depending on the platform. Instructions for installing them can be found in the documentation.

  5. Is the code easy to understand? Browse some source code files and make a judgment based on your random sample.
    The code seems easy to understand.

  6. Is this a big project? If you can, find out about how many lines of code are in it, perhaps on OpenHub.
    The project has 28,284 lines of code.

  7. Does the repository have tests? If so, are the code contributors expected to write tests for newly added code?
    Yes, the repository has tests, and code contributors are expected to write tests for new features and bug fixes; a sketch of such a test also follows this list.
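
As noted in items 2 and 3, the documentation recommends installing Scrapy inside a virtual environment. Here is a rough sketch of that path, written with the Python standard library instead of shell commands; the environment name and the POSIX pip path are my own assumptions.

```python
# Roughly equivalent to `python -m venv scrapy-env` followed by
# `scrapy-env/bin/pip install scrapy` (on Windows the pip path is Scripts\pip.exe).
import subprocess
import venv

venv.create("scrapy-env", with_pip=True)  # create the isolated environment
subprocess.run(
    ["scrapy-env/bin/pip", "install", "scrapy"],  # also pulls in lxml, parsel, w3lib, twisted, ...
    check=True,
)
```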
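
For item 7, a hedged sketch of the kind of unit test contributors are expected to add alongside a change: build a fake HtmlResponse and assert on what a spider's parse() yields. The spider and markup are my own illustration, not code from Scrapy's test suite.

```python
# Illustrative pytest-style test: no network access, just a fabricated response.
import scrapy
from scrapy.http import HtmlResponse


class TitleSpider(scrapy.Spider):
    name = "title"

    def parse(self, response):
        # Yield a single item containing the page title.
        yield {"title": response.css("title::text").get()}


def test_parse_extracts_title():
    response = HtmlResponse(
        url="https://example.com",
        body=b"<html><head><title>Hello</title></head></html>",
        encoding="utf-8",
    )
    assert list(TitleSpider().parse(response)) == [{"title": "Hello"}]
```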

Code and Design Documentation

  1. Is there clear documentation in the code itself?
    It varies: some files are very well documented, while others are barely documented.

  2. Is there documentation about the design?
    No.

Activity Level

  1. How many commits have been made in the past week?
    8.

  2. When was the most recent commit?
    2 days ago.

  3. How many issues are currently open?
    428.

  4. How long do issues stay open?
    Issues stay open for an average of 1.6 days based on the five most recently closed issues.

  5. Read the conversations from some open and some closed issues. Is there active discussion on the issues?
    Yes, there is usually active discussion on the issues.

  6. Are issues tagged as easy, hard, for beginners, etc.?
    Some issues are tagged as good first issues.

  7. How many issues were closed in the past six months?
    53.

  8. Is there information about how many people are maintaining the project?
    According to the website, the project is maintained by Scrapinghub and many other contributors.

  9. How many contributors has the project had in the past six months?
    17.

  10. How many open pull requests are there?
    289.

  11. Do pull requests remain unanswered for a long time?
    Pull requests stay open for an average of 3.5 days based on the five most recently closed pull requests.

  12. Read the conversations from some open and some closed pull requests. Is there active discussion on the pull requests?
    There is usually active discussion on the pull requests, but sometimes the only reply is the Codecov report.

  13. How many pull requests were opened within the past six months?
    239.

  14. When was the last pull request merged?
    2 days ago.

Welcomeness and Community

  1. Is there a CONTRIBUTING document? If so, how easy to read and understand is it? Look through it and see if it is clear and thorough.
    Yes, there is a CONTRIBUTING.md, which links to the contribution guidelines. The guidelines are clear and easy to understand.

  2. Is there a CODE OF CONDUCT document? Does it have consequences for acts that violate it?
    Yes, there is a CODE_OF_CONDUCT.md, and it spells out consequences for acts that violate it.

  3. Do the maintainers respond helpfully to questions in issues? Are responses generally constructive? Read the issue conversations.
    Yes, the maintainers respond helpfully to questions in issues. However, questions that the maintainers believe belong on Stack Overflow are not answered.

  4. Are people friendly in the issues, discussion forum, and chat?
    Yes, people seem to be friendly.

  5. Do maintainers thank people for their contributions?
    Sometimes.

Development Environment Installation

Install the development environment for the project on your system. Describe the process that you needed to follow:

  1. How involved was the process?
    Not very involved: I followed the installation guide and installed Scrapy using conda. The quick smoke check I ran afterwards is sketched after this list.

  2. How long did it take you?
    Five minutes.

  3. Did you need to install additional packages or libraries?
    No.

  4. Were you able to build the code following the instructions?
    Yes.

  5. Did you need to look for additional help in installing the environment?
    No.

  6. Any other comments?
    N/A.
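
The smoke check mentioned in item 1 was just a version import; a trivial sketch, assuming the scrapy package is importable from the active conda environment:

```python
# Confirm the package imports and report which version was installed.
import scrapy

print(scrapy.__version__)
```

Running `scrapy version` on the command line reports the same information.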

Summary

  1. Do you think this is a project to which it would be possible to contribute in the course of a few weeks before the end of this semester?
    Yes. The project is easy to install, and the majority of the code seems well documented and easy to follow. The project documentation is also very detailed, which will make the code base easier to understand.

  2. Would you be interested in contributing to this particular project?
    I am quite interested in web crawling, which is why I picked Scrapy as the project to evaluate. The project is written in Python, a language I am pretty comfortable with, which is a plus. One barrier is that contributing will require network-related knowledge, a topic I do not yet have much background in. But overall, I would be interested in contributing to the project.