Packages and Binaries:

waybackpy

waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine’s APIs.

The Internet Archive’s Wayback Machine has three useful public APIs:

  • SavePageNow API (also known as the Save API)
  • CDX Server API
  • Availability API

All three APIs can be accessed through waybackpy, either by importing it in a Python module or by using the command-line interface.
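
To make the three APIs more concrete, the sketch below builds the request URL that each one expects, using only the Python standard library. It does not use waybackpy itself and sends no requests. The endpoint URLs are the publicly documented Wayback Machine ones rather than anything stated on this page, and the helper names (save_request_url, cdx_request_url, availability_request_url) are hypothetical, chosen for illustration.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

# Publicly documented Wayback Machine endpoints (an assumption of this
# sketch; they are not listed on this page).
SAVE_ENDPOINT = "https://web.archive.org/save/"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_request_url(url: str) -> str:
    """SavePageNow: the target URL is appended to the endpoint path."""
    return SAVE_ENDPOINT + url


def cdx_request_url(url: str, params: Optional[Mapping[str, str]] = None) -> str:
    """CDX Server: the target URL and any extra options go in the query string."""
    query = {"url": url}
    query.update(params or {})
    return CDX_ENDPOINT + "?" + urlencode(query)


def availability_request_url(url: str, timestamp: Optional[str] = None) -> str:
    """Availability: an optional timestamp asks for the closest snapshot."""
    query = {"url": url}
    if timestamp is not None:
        query["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(query)


if __name__ == "__main__":
    print(save_request_url("https://example.com/"))
    print(cdx_request_url("example.com", {"matchType": "host", "limit": "5"}))
    print(availability_request_url("example.com", "20100101"))
```

waybackpy wraps these endpoints behind its Python classes and CLI flags, so building URLs by hand like this is mainly useful for understanding what the tool requests on your behalf.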

Installed size: 97 KB
How to install: sudo apt install waybackpy

Dependencies:
  • python3
  • python3-click
  • python3-requests
  • python3-urllib3
waybackpy
root@kali:~# waybackpy --help
Usage: waybackpy [OPTIONS]

                       _                _
                      | |              | |
  __      ____ _ _   _| |__   __ _  ___| | ___ __  _   _
  \ \ /\ / / _` | | | | '_ \ / _` |/ __| |/ / '_ \| | | |
   \ V  V / (_| | |_| | |_) | (_| | (__|   <| |_) | |_| |
    \_/\_/ \__,_|\__, |_.__/ \__,_|\___|_|\_\ .__/ \__, |
                  __/ |                     | |     __/ |
                 |___/                      |_|    |___/

  Python package & CLI tool that interfaces the Wayback Machine APIs

  Repository: https://github.com/akamhy/waybackpy

  Documentation: https://github.com/akamhy/waybackpy/wiki/CLI-docs

  waybackpy - CLI usage(Demo video): https://asciinema.org/a/469890

  Released under the MIT License. Use the flag --license for license.

Options:
  -u, --url TEXT                  URL on which Wayback machine operations are
                                  to be performed.
  -ua, --user-agent, --user_agent TEXT
                                  User agent, default value is 'waybackpy
                                  3.0.6 -
                                  https://github.com/akamhy/waybackpy'.
  -v, --version                   waybackpy version.
  -l, --show-license, --show_license, --license
                                  Show license of Waybackpy.
  -n, -au, --newest, --archive_url, --archive-url
                                  Retrieve the newest archive of URL.
  -o, --oldest                    Retrieve the oldest archive of URL.
  -N, --near                      Archive close to a specified time.
  -Y, --year INTEGER RANGE        Year in integer.  [1994<=x<=9999]
  -M, --month INTEGER RANGE       Month in integer.  [1<=x<=12]
  -D, --day INTEGER RANGE         Day in integer.  [1<=x<=31]
  -H, --hour INTEGER RANGE        Hour in integer.  [0<=x<=24]
  -MIN, --minute INTEGER RANGE    Minute in integer.  [0<=x<=60]
  -s, --save                      Save the specified URL's webpage and print
                                  the archive URL.
  -h, --headers                   Headers data of the SavePageNow API.
  -ku, --known-urls, --known_urls
                                  List known URLs. Uses CDX API.
  -sub, --subdomain               Use with '--known_urls' to include known
                                  URLs for subdomains.
  -f, --file                      Use with '--known_urls' to save the URLs in
                                  file at current directory.
  --cdx                           Flag for using CDX API.
  -st, --start-timestamp, --start_timestamp, --from TEXT
                                  Start timestamp for CDX API in
                                  yyyyMMddhhmmss format.
  -et, --end-timestamp, --end_timestamp, --to TEXT
                                  End timestamp for CDX API in yyyyMMddhhmmss
                                  format.
  -C, --closest TEXT              Archive that are closest the timestamp
                                  passed as arguments to this parameter.
  -f, --cdx-filter, --cdx_filter, --filter TEXT
                                  Filter on a specific field or all the CDX
                                  fields.
  -mt, --match-type, --match_type TEXT
                                  The default behavior is to return matches
                                  for an exact URL. However, the CDX server
                                  can also return results matching a certain
                                  prefix, a certain host, or all sub-hosts by
                                  using the match_type
  -st, --sort TEXT                Choose one from default, closest or reverse.
                                  It returns sorted CDX entries in the
                                  response.
  -up, --use-pagination, --use_pagination
                                  Use the pagination API of the CDX server
                                  instead of the default one.
  -gz, --gzip TEXT                To disable gzip compression pass false as
                                  argument to this parameter. The default
                                  behavior is gzip compression enabled.
  -c, --collapse TEXT             Filtering or 'collapse' results based on a
                                  field, or a substring of a field.
  -l, --limit TEXT                Number of maximum record that CDX API is
                                  asked to return per API call, default value
                                  is 25000 records.
  -cp, --cdx-print, --cdx_print TEXT
                                  Print only certain fields of the CDX API
                                  response, if this parameter is not used then
                                  the plain text response of the CDX API will
                                  be printed.
  --help                          Show this message and exit.
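
The --from and --to options above take timestamps in yyyyMMddhhmmss format. As a minimal sketch, the Python below formats such a timestamp and assembles an argument list for a CDX query using only flags that appear in the help output. The helper names wayback_timestamp and cdx_command are hypothetical, and the sketch builds the command without running it.

```python
import shlex
from datetime import datetime
from typing import List


def wayback_timestamp(year: int, month: int = 1, day: int = 1,
                      hour: int = 0, minute: int = 0, second: int = 0) -> str:
    """Format a date and time as a 14-digit yyyyMMddhhmmss string.

    datetime accepts hours 0-23 and minutes 0-59, which is narrower than
    the ranges shown for --hour and --minute in the help output.
    """
    return datetime(year, month, day, hour, minute, second).strftime("%Y%m%d%H%M%S")


def cdx_command(url: str, start: str, end: str) -> List[str]:
    """Build an argument list for a CDX query between two timestamps."""
    return ["waybackpy", "--url", url, "--cdx", "--from", start, "--to", end]


if __name__ == "__main__":
    argv = cdx_command(
        "example.com",
        wayback_timestamp(2020),
        wayback_timestamp(2020, 12, 31, 23, 59, 59),
    )
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(argv))
```

The returned list can be passed to subprocess.run(argv, check=True) on a system where waybackpy is installed. Keeping the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.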

Updated on: 2024-May-23