GNU/Linux Desktop Survival Guide
by Graham Williams
Wget UserAgent Browser Identification
20210211

Some sites check whether a download request identifies itself as coming from a browser. If it does not, they return a 403 Forbidden response. This is intended to protect the site from automated programs consuming its bandwidth, so by overriding the check we place that burden back on the website's owner. Sites may also employ other mechanisms to identify robots and block them accordingly. They may even decide to block your IP address temporarily or even permanently! So do your due diligence before deciding to override the website owner's choices.
Programs, including the command-line tool wget, may not report a UserAgent to the website from which they are downloading files, or they may accurately report what they are (for example, that they are wget).
The reported UserAgent can be changed with wget's -U (--user-agent) option to avoid the 403 error:
$ wget -U "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" https://example.com/paper.pdf
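In the spirit of doing due diligence first, one option is to attempt an ordinary download and fall back to a browser UserAgent only when the server refuses. The following is a minimal sketch assuming a POSIX shell; fetch_with_ua_fallback and BROWSER_UA are hypothetical names chosen for illustration. It relies on wget's exit status 8 ("server issued an error response"). That status covers any HTTP error, such as 404 or 500, and not only 403, so the sketch may retry in cases where a different UserAgent will not help.

```shell
# Hypothetical helper: download normally first, then retry with a
# browser-like UserAgent only if the server returns an error response.
BROWSER_UA='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)'

fetch_with_ua_fallback() {
  url=$1
  status=0
  # First attempt: let wget report its own default UserAgent.
  wget "$url" || status=$?
  # wget exits with status 8 when the server issues an error response.
  # This includes 403 Forbidden, but also 404, 500, and so on.
  if [ "$status" -eq 8 ]; then
    echo "Server returned an error; retrying with a browser UserAgent" >&2
    status=0
    wget -U "$BROWSER_UA" "$url" || status=$?
  fi
  # Pass wget's final exit status back to the caller.
  return "$status"
}
```

After placing the function in a shell startup file such as ~/.bashrc, it can be used as:

$ fetch_with_ua_fallback https://example.com/paper.pdf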