
PaperBot

PaperBot is a configurable, modular, open-source, web-based crawler that automatically finds and efficiently annotates peer-reviewed publications based on periodic full-text searches across publisher portals. Without user interaction, PaperBot retrieves and stores article information (full reference, corresponding email contact, and full-text keyword hits) according to pre-set search logic from disparate sources including Wiley, ScienceDirect, Springer/Nature/Frontiers, HighWire, PubMed/PubMed Central, and Google Scholar. Although different portals require different search configurations, the common interface of PaperBot unifies the process from the user's perspective. Once saved, all information becomes web accessible, allowing efficient triage of articles based on their actual relevance to the project goals and seamless annotation of suitable metadata dimensions.

Before activating the scraper-based portion of the tool, users should read and understand the terms of use of the portals being scraped; we are not responsible for any misuse:

https://www.google.com/policies/terms/
http://olabout.wiley.com/WileyCDA/Section/id-826542.html

For further information, please see the peer-reviewed publication: PaperBot: open-source web-based search and metadata organization of scientific literature.

1. DataBase

1.1. Install & launch MongoDB

Follow the instructions: https://docs.mongodb.com/manual/administration/install-community/
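Once MongoDB is installed, make sure the daemon is running before continuing. A minimal check, assuming a default local installation (the exact start command depends on your platform and install method):

# Start MongoDB (pick the command matching your install)
sudo systemctl start mongod             # Linux with systemd
brew services start mongodb-community   # macOS with Homebrew

# Verify the server answers on the default port
mongo --eval "db.version()"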

1.2. Get an API key for ScienceDirect, SpringerLink, and PubMed

The ScienceDirect, SpringerLink, and PubMed portals require the user to register and obtain a key to use their APIs. You can register and find your key at https://dev.elsevier.com/user/registration, https://dev.springer.com/signup, and https://www.ncbi.nlm.nih.gov/account/
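As a quick sanity check of a newly obtained NCBI key, you can issue a raw E-utilities request from the terminal; the search term and key below are placeholders (the Elsevier and Springer keys are simply entered in PaperBot's configuration, as noted below):

# Search PubMed for a term using your API key (placeholder values)
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=neuron+morphology&api_key=YOUR_NCBI_API_KEY"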

1.3. Upload the portals configuration to the Portal Database

This is needed if you want to use the automated search (Elsevier/ScienceDirect, Springer, Nature, Wiley, PubMed/PubMed Central, and GoogleScholar). The manual PubMed search does not use the Portal Database.

  • The API keys/tokens can be configured through the web interface once PaperBot is installed

Using the terminal, type the following:
mongo
use paperbot-portal
db.portal.remove({})
db.portal.insertMany([
  {
    "name": "PubMed",
    "active": true,
    "db": "pubmed",
    "searchUrlApi": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils",
    "startSearchDate": new Date("2019-01-01T01:00:00+0100")
  },
  {
    "name": "PubMedCentral",
    "active": true,
    "db": "pmc",
    "searchUrlApi": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils",
    "startSearchDate": new Date("2019-01-01T01:00:00+0100")
  },
  {
    "name": "ScienceDirect",
    "searchUrlApi": "https://api.elsevier.com/content/search/sciencedirect?",
    "startSearchDate": new Date("2019-01-01T01:00:00+0100"),
    "active": true
  },
  {
    "name": "Nature",
    "searchUrlApi": "http://api.nature.com/content/opensearch/request?",
    "startSearchDate": new Date("2019-01-01T01:00:00+0100"),
    "active": true
  },
  {
    "name": "Wiley",
    "url": "https://onlinelibrary.wiley.com/action/doSearch",
    "active": false,
    "startSearchDate": new Date("2019-01-01T01:00:00+0100"),
    "base": "http://onlinelibrary.wiley.com"
  },
  {
    "name": "SpringerLink",
    "apiUrl": "http://api.springer.com/metadata/json?",
    "active": true,
    "searchUrlApi": "http://api.springer.com/metadata/json?",
    "startSearchDate": new Date("2019-01-01T01:00:00+0100")
  },
  {
    "name": "GoogleScholar",
    "url": "https://scholar.google.com/scholar?l=es&",
    "base": "https://scholar.google.com",
    "active": false,
    "startSearchDate": new Date("1993-01-01T01:00:00+0100")
  }
]);

If everything works well, you should see the following response (the ids will differ):

{
"acknowledged" : true,
"insertedIds" : [
ObjectId("57c709dcf139a309cc559a81"),
ObjectId("57c709dcf139a309cc559a82"),
ObjectId("57c709dcf139a309cc559a83"),
ObjectId("57c709dcf139a309cc559a84"),
ObjectId("57c709dcf139a309cc559a85"),
ObjectId("57ceca1e14896407206e3d82"),
ObjectId("59272282f139a31a3a033501")
]
}

Close the mongo console:
exit
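Optionally, you can double-check from the terminal that the seven portal documents were stored; a quick sketch using the legacy mongo shell:

# Count and list the configured portals
mongo paperbot-portal --eval "db.portal.count()"
mongo paperbot-portal --eval "db.portal.find({}, {name: 1, active: 1}).forEach(printjson)"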

2. Boot MicroServices

The microservices run an embedded Tomcat using Spring Boot (.jar files). All of them are independent and can be launched in any order.

Prerequisites: Java 8 and Maven to compile and build the code. Download Maven from https://maven.apache.org/download.cgi and follow the installation instructions at https://maven.apache.org/install.html
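You can confirm that both prerequisites are available on your PATH before building or launching:

# Verify the toolchain
java -version   # should report 1.8.x
mvn -version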

2.1. Download the jars

Download the code from the repository using git or the download button.
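If you prefer to build the jars from source instead of downloading them, a standard Maven build should work; the directory name below is illustrative and depends on where you cloned or unpacked the code:

# Build the Spring Boot jars from the repository root (path is illustrative)
cd PaperBot
mvn clean package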

2.2. Launch

On Linux or macOS you can launch the services by typing: ./launch.sh

This will launch the required services with nohup and java -jar. Any error will be traced in the corresponding log.
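For reference, the launch pattern is roughly the following; the jar and log file names here are illustrative, not the actual contents of launch.sh:

# Start a service in the background and capture its output (illustrative names)
nohup java -jar literature-service.jar > literature.log 2>&1 &
nohup java -jar metadata-service.jar > metadata.log 2>&1 &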

NOTE: Although the services can be used on your local machine, they are designed to run on a server. If you run them locally and restart your computer, this step needs to be executed again; the same applies on a server. Servers are not rebooted that often, but I highly encourage you to create Unix/Linux services following the Spring instructions summarized at https://springjavatricks.blogspot.com/2017/11/installing-spring-boot-services-in.html

3. Frontend

Prerequisites: Apache web server installed and running: https://httpd.apache.org

3.1. Copy the frontend to apache folder & launch

Apache default directory is:
- MacOS: /Library/WebServer/Documents/
- Linux: /var/www/html
- Windows v2.2 and up (replace 2.2 with the version you have installed): C:\Program Files\Apache Software Foundation\Apache2.2\htdocs
- Windows v2: C:\Program Files\Apache Group\Apache2\htdocs

In the following commands, replace /Library/WebServer/Documents/ with your Apache folder:

sudo mkdir /Library/WebServer/Documents/PaperBot
sudo cp -r PaperBotWeb/ /Library/WebServer/Documents/PaperBot

In your browser type: http://localhost/PaperBot

3.2. If running on a server and not on your localhost, remember to update the IP

Update PaperBot/communications/articlesCommunicationService.js

var url_literature = 'http://<serverIP>:8443/literature';
var url_metadata = 'http://<serverIP>:8443/metadata';
var url_pubmed = 'http://<serverIP>:8443/pubmed';
...

3.3. Update metadata html to your desired metadata properties

Edit PaperBot/article/metadata.html. Any kind of object is supported since the metadataService receives type Object in Java, so you can add Strings, Booleans, and Lists. If you want to use Lists, you have to update the frontend controller accordingly.

Let's update the name of a given tag. For example:

<tr>
   <td><strong>Category 1:</strong></td>
   <td><span e-style="width:600px;" editable-text="metadata.category1">{{metadata.category1}}</span></td>
</tr>

Replace Category 1 with your desired name; also update category1 if you want the database field name to match (not required). You can add as many <tr> groups as you want.

The metadataFinished field is a convenient feature that lets you keep track of whether you have finished reviewing a paper. If it is set to false, a red flag will remind you that there is pending work when you navigate to the Positive group of articles.
