
Welcome to docudigger 👋

License: MIT

Document scraper for getting invoices automagically as pdf (useful for taxes or DMS)

Prerequisites

  • npm >=9.1.2
  • node >=18.12.1

Configuration

All settings can be changed via CLI arguments or environment variables (even when using Docker).

| Setting | Description | Default value |
| --- | --- | --- |
| AMAZON_USERNAME | Your Amazon username | null |
| AMAZON_PASSWORD | Your Amazon password | null |
| AMAZON_TLD | Amazon top-level domain | de |
| AMAZON_YEAR_FILTER | Only extracts invoices from this year (e.g. 2023) | 2023 |
| AMAZON_PAGE_FILTER | Only extracts invoices from this page (e.g. 2) | null |
| ONLY_NEW | Tracks already scraped documents and starts a new run at the last scraped one | true |
| FILE_DESTINATION_FOLDER | Destination path for all scraped documents | ./documents/ |
| FILE_FALLBACK_EXTENSION | Fallback extension when no extension can be determined | .pdf |
| DEBUG | Debug flag (sets the log level to DEBUG) | false |
| SUBFOLDER_FOR_PAGES | Creates subfolders for every scraped page/plugin | false |
| LOG_PATH | Sets the log path | ./logs/ |
| LOG_LEVEL | Log level (see https://github.com/winstonjs/winston#logging-levels) | info |
| RECURRING | Flag for executing the script periodically. Needs RECURRING_PATTERN to be set. Defaults to true when using the Docker container | false |
| RECURRING_PATTERN | Cron pattern for periodic execution. Needs RECURRING set to true | */30 * * * * |
| TZ | Timezone used for Docker environments | Europe/Berlin |
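For reference, a minimal `.env` file built from the variables above might look like this (credentials and paths are placeholders, not real values):

```shell
# Sample .env for docudigger (placeholder values - adjust to your account)
AMAZON_USERNAME=you@example.com
AMAZON_PASSWORD=your-password
AMAZON_TLD=de
AMAZON_YEAR_FILTER=2023
ONLY_NEW=true
FILE_DESTINATION_FOLDER=./documents/
LOG_LEVEL=info
```

Any variable left out falls back to the default from the table above.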

Install

⚠️ Attention: There is no need to install this locally. Just use npx

Usage

🔨 Make sure you have a .env file (with the variables from above) in the working directory, or use the appropriate CLI arguments.

🚑 If you want to use a .env file, make sure you use env-cmd (https://www.npmjs.com/package/env-cmd)
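A sketch of running docudigger through env-cmd so the variables from a `.env` file in the current directory are picked up (assumes both packages are installed locally so npx can resolve their binaries):

```shell
# Install once into the current project
npm install env-cmd docudigger
# env-cmd loads ./.env, then launches docudigger with those variables set
npx env-cmd docudigger scrape all
```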

$ npx docudigger COMMAND
running command...

$ npx docudigger (--version)
@disane-dev/docudigger/2.0.2 linux-x64 node-v18.16.1

$ npx docudigger --help [COMMAND]
USAGE
  $ docudigger COMMAND

docudigger scrape all

Scrapes all websites periodically (default for docker environment)

USAGE
  $ npx docudigger scrape all [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l <value>] [-c <value> -r]

FLAGS
  -c, --recurringCron=<value>  [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>        [default: ./logs/] Log path
  -r, --recurring
  --logLevel=<option>          [default: info] Specify level for logging.
                               <options: trace|debug|info|warn|error>

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Scrapes all websites periodically

EXAMPLES
  $ docudigger scrape all

docudigger scrape amazon

Used to get invoices from amazon

USAGE
  $ npx docudigger scrape amazon -u <value> -p <value> [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l
    <value>] [-c <value> -r] [--fileDestinationFolder <value>] [--fileFallbackExentension <value>] [-t <value>]
    [--yearFilter <value>] [--pageFilter <value>] [--onlyNew]

FLAGS
  -c, --recurringCron=<value>        [default: * * * * *] Cron pattern to execute periodically
  -d, --debug
  -l, --logPath=<value>              [default: ./logs/] Log path
  -p, --password=<value>             (required) Password
  -r, --recurring
  -t, --tld=<value>                  [default: de] Amazon top level domain
  -u, --username=<value>             (required) Username
  --fileDestinationFolder=<value>    [default: ./data/] Destination folder for scraped documents
  --fileFallbackExentension=<value>  [default: .pdf] Fallback extension when no extension can be determined
  --logLevel=<option>                [default: info] Specify level for logging.
                                     <options: trace|debug|info|warn|error>
  --onlyNew                          Gets only new invoices
  --pageFilter=<value>               Filters a page
  --yearFilter=<value>               Filters a year

GLOBAL FLAGS
  --json  Format output as json.

DESCRIPTION
  Used to get invoices from amazon

  Scrapes amazon invoices

EXAMPLES
  $ docudigger scrape amazon
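For example, a one-off run pulling the 2023 invoices from amazon.de could look like this (the credentials are placeholders; flags as documented above):

```shell
npx docudigger scrape amazon \
  -u 'you@example.com' \
  -p 'your-password' \
  -t de \
  --yearFilter 2023 \
  --onlyNew
```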

Docker

docker run \
  -e AMAZON_USERNAME='[YOUR MAIL]' \
  -e AMAZON_PASSWORD='[YOUR PW]' \
  -e AMAZON_TLD='de' \
  -e AMAZON_YEAR_FILTER='2020' \
  -e AMAZON_PAGE_FILTER='1' \
  -e LOG_LEVEL='info' \
  -v "C:/temp/docudigger/:/home/node/docudigger" \
  ghcr.io/disane87/docudigger
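The same container can also be run via Docker Compose; a minimal sketch using the image and variables from the command above (the host volume path is a placeholder):

```yaml
services:
  docudigger:
    image: ghcr.io/disane87/docudigger
    environment:
      AMAZON_USERNAME: "[YOUR MAIL]"
      AMAZON_PASSWORD: "[YOUR PW]"
      AMAZON_TLD: "de"
      AMAZON_YEAR_FILTER: "2020"
      LOG_LEVEL: "info"
      TZ: "Europe/Berlin"
    volumes:
      - ./docudigger:/home/node/docudigger
```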

Dev-Time 🪲

NPM

npm install
# Adjust the created .env to your needs
npm run start

Author

👤 Marco Franke

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check issues page. You can also take a look at the contributing guide.

Show your support

Give a ⭐️ if this project helped you!


This README was generated with ❤️ by readme-md-generator
