waybackpy is a Python package and a CLI tool that interfaces with the Wayback Machine’s APIs.

The Internet Archive’s Wayback Machine has three useful public APIs:

  • SavePageNow API (also known as Save API)
  • CDX Server API
  • Availability API
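
For orientation, here is the kind of HTTP request each of the three APIs above expects. This is a minimal illustration that uses only the standard library. It is not waybackpy's implementation. The endpoint URLs are the Internet Archive's public ones as we understand them, and the helper names (`save_url`, `cdx_url`, `availability_url`, `closest_snapshot`) are hypothetical. Nothing is sent over the network: the functions only build URLs and parse a JSON reply.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Public Internet Archive endpoints (assumed; waybackpy may build its
# requests differently internally).
SAVE_ENDPOINT = "https://web.archive.org/save/"
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_url(target: str) -> str:
    """SavePageNow: the page to archive is appended to the /save/ path."""
    return SAVE_ENDPOINT + target


def cdx_url(target: str) -> str:
    """CDX Server: lists captures of `target`, one plain-text line each."""
    return CDX_ENDPOINT + "?" + urlencode({"url": target})


def availability_url(target: str, timestamp: Optional[str] = None) -> str:
    """Availability: asks for the capture closest to `timestamp` (yyyyMMddhhmmss)."""
    params = {"url": target}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the closest archive URL from an Availability API JSON reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    print(save_url("https://example.com/"))
    print(cdx_url("example.com"))
    print(availability_url("example.com", "20200101"))
```

Availability replies have the shape `{"archived_snapshots": {"closest": {...}}}`. An empty `archived_snapshots` object means no capture was found, and `closest_snapshot` reports that case as `None`.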

These three APIs can be accessed through waybackpy, either by importing it in a Python file or module, or from the command-line interface.
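
One way to drive the command-line interface from a Python script is to shell out to it. The sketch below assumes `waybackpy` is installed and on `PATH`. The flags it uses (`--url`, `--user_agent`, `--newest`, `--oldest`, `--save`) come from the `waybackpy --help` output on this page. The helper names `build_argv` and `run_waybackpy` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional

# Map a hypothetical "mode" name to the waybackpy flag that selects it.
MODE_FLAGS = {"newest": "--newest", "oldest": "--oldest", "save": "--save"}


def build_argv(url: str, mode: str, user_agent: Optional[str] = None) -> List[str]:
    """Build the waybackpy command line for one operation on `url`."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODE_FLAGS)}")
    argv = ["waybackpy", "--url", url, MODE_FLAGS[mode]]
    if user_agent is not None:
        argv += ["--user_agent", user_agent]
    return argv


def run_waybackpy(url: str, mode: str, user_agent: Optional[str] = None,
                  timeout: float = 120.0) -> str:
    """Run waybackpy and return its trimmed stdout (requires network access)."""
    if shutil.which("waybackpy") is None:
        raise FileNotFoundError("waybackpy is not on PATH (sudo apt install waybackpy)")
    result = subprocess.run(
        build_argv(url, mode, user_agent),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # non-zero exit raises subprocess.CalledProcessError
    )
    return result.stdout.strip()
```

Building the argument list separately from running it keeps the flag mapping easy to check without network access. For example, `run_waybackpy("https://example.com", "oldest")` would run `waybackpy --url https://example.com --oldest`.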

Installed size: 97 KB
How to install: sudo apt install waybackpy

Dependencies:
  • python3
  • python3-click
  • python3-requests
  • python3-urllib3

root@kali:~# waybackpy --help
Usage: waybackpy [OPTIONS]

  Python package & CLI tool that interfaces the Wayback Machine APIs

Options:
  -u, --url TEXT                  URL on which Wayback machine operations are
                                  to be performed.
  -ua, --user-agent, --user_agent TEXT
                                  User agent, default value is 'waybackpy
                                  3.0.6 -
  -v, --version                   waybackpy version.
  -l, --show-license, --show_license, --license
                                  Show license of Waybackpy.
  -n, -au, --newest, --archive_url, --archive-url
                                  Retrieve the newest archive of URL.
  -o, --oldest                    Retrieve the oldest archive of URL.
  -N, --near                      Archive close to a specified time.
  -Y, --year INTEGER RANGE        Year in integer.  [1994<=x<=9999]
  -M, --month INTEGER RANGE       Month in integer.  [1<=x<=12]
  -D, --day INTEGER RANGE         Day in integer.  [1<=x<=31]
  -H, --hour INTEGER RANGE        Hour in integer.  [0<=x<=24]
  -MIN, --minute INTEGER RANGE    Minute in integer.  [0<=x<=60]
  -s, --save                      Save the specified URL's webpage and print
                                  the archive URL.
  -h, --headers                   Headers data of the SavePageNow API.
  -ku, --known-urls, --known_urls
                                  List known URLs. Uses CDX API.
  -sub, --subdomain               Use with '--known_urls' to include known
                                  URLs for subdomains.
  -f, --file                      Use with '--known_urls' to save the URLs in
                                  file at current directory.
  --cdx                           Flag for using CDX API.
  -st, --start-timestamp, --start_timestamp, --from TEXT
                                  Start timestamp for CDX API in
                                  yyyyMMddhhmmss format.
  -et, --end-timestamp, --end_timestamp, --to TEXT
                                  End timestamp for CDX API in yyyyMMddhhmmss
                                  format.
  -C, --closest TEXT              Archive that are closest the timestamp
                                  passed as arguments to this parameter.
  -f, --cdx-filter, --cdx_filter, --filter TEXT
                                  Filter on a specific field or all the CDX
                                  fields.
  -mt, --match-type, --match_type TEXT
                                  The default behavior is to return matches
                                  for an exact URL. However, the CDX server
                                  can also return results matching a certain
                                  prefix, a certain host, or all sub-hosts by
                                  using the match_type
  -st, --sort TEXT                Choose one from default, closest or reverse.
                                  It returns sorted CDX entries in the
  -up, --use-pagination, --use_pagination
                                  Use the pagination API of the CDX server
                                  instead of the default one.
  -gz, --gzip TEXT                To disable gzip compression pass false as
                                  argument to this parameter. The default
                                  behavior is gzip compression enabled.
  -c, --collapse TEXT             Filtering or 'collapse' results based on a
                                  field, or a substring of a field.
  -l, --limit TEXT                Number of maximum record that CDX API is
                                  asked to return per API call, default value
                                  is 25000 records.
  -cp, --cdx-print, --cdx_print TEXT
                                  Print only certain fields of the CDX API
                                  response, if this parameter is not used then
                                  the plain text response of the CDX API will
                                  be printed.
  --help                          Show this message and exit.
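
Several of the CDX options above take timestamps in yyyyMMddhhmmss format. Without `--cdx-print`, the raw plain-text CDX response is printed. The sketch below covers both sides using only the standard library. It converts between `datetime` and 14-digit timestamps, builds a CDX query, and splits a response line into named fields. It assumes the CDX server's own query parameter names (`from`, `to`, `matchType`, `collapse`, `limit`) and its default seven-field line layout. That waybackpy's options map one-to-one onto these parameters is also an assumption, and the helper names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"  # yyyyMMddhhmmss
# Default column order of a plain-text CDX response line (assumed).
CDX_FIELDS = ("urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length")


def to_wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as a 14-digit Wayback timestamp."""
    return moment.strftime(TIMESTAMP_FORMAT)


def from_wayback_timestamp(stamp: str) -> datetime:
    """Parse a full 14-digit Wayback timestamp back into a datetime."""
    return datetime.strptime(stamp, TIMESTAMP_FORMAT)


def cdx_query_url(
    url: str,
    start: Optional[str] = None,
    end: Optional[str] = None,
    match_type: Optional[str] = None,
    collapse: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Build a CDX query URL that includes only the parameters that were given."""
    params = {"url": url}
    if start is not None:
        params["from"] = start  # compare --start-timestamp / --from
    if end is not None:
        params["to"] = end  # compare --end-timestamp / --to
    if match_type is not None:
        params["matchType"] = match_type  # exact, prefix, host or domain
    if collapse is not None:
        params["collapse"] = collapse  # e.g. "digest" or "timestamp:8"
    if limit is not None:
        params["limit"] = str(limit)  # compare --limit
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_line(line: str) -> Dict[str, str]:
    """Split one plain-text CDX response line into named fields."""
    values = line.split()
    if len(values) != len(CDX_FIELDS):
        raise ValueError(f"expected {len(CDX_FIELDS)} fields, got {len(values)}: {line!r}")
    return dict(zip(CDX_FIELDS, values))
```

Combining the helpers, `from_wayback_timestamp(parse_cdx_line(line)["timestamp"])` turns a capture's timestamp into a `datetime`. Note that `from_wayback_timestamp` expects all 14 digits.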

Updated on: 2024-May-23