fake_useragent module usage and a solution to the network timeout error

2023-01-04   ES  

When writing a crawler in Python, we need to disguise the request headers to get past anti-crawling measures. The third-party Python module fake_useragent solves this problem well: it returns a randomly chosen User-Agent string that we can use directly.

fake_useragent installation

pip install fake_useragent

fake_useragent usage

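from fake_useragent import UserAgent

# Pick a random User-Agent string
ua = UserAgent().random
# `request` is the crawler's outgoing request object (for example, inside a downloader middleware)
request.headers['User-Agent'] = ua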
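The snippet above assumes a `request` object already exists. As a minimal standalone sketch, the same idea can be shown with the standard library's `urllib.request`. The helper names `build_request` and `random_user_agent` are illustrative and not part of fake_useragent.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a urllib Request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def random_user_agent():
    """Ask fake_useragent for a random User-Agent string.

    The import is done lazily so that build_request() stays usable
    even when fake_useragent is not installed.
    """
    from fake_useragent import UserAgent
    return UserAgent().random
```

A crawler would then call `build_request(url, random_user_agent())` and pass the result to `urllib.request.urlopen`.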

Error encountered when using fake_useragent

socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\programdata\anaconda3\lib\site-packages\fake_useragent\utils.py", line 166, in load
    verify_ssl=verify_ssl,
  File "d:\programdata\anaconda3\lib\site-packages\fake_useragent\utils.py", line 122, in get_browser_versions
    verify_ssl=verify_ssl,
  File "d:\programdata\anaconda3\lib\site-packages\fake_useragent\utils.py", line 84, in get
    raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached
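Until the root cause is fixed, a crawler can at least avoid crashing on this error. The sketch below wraps the lookup and falls back to a fixed User-Agent string. `pick_user_agent`, `fake_useragent_factory`, and the `FALLBACK_UA` value are illustrative assumptions, not part of the library.

```python
# Hypothetical fallback value; any realistic browser User-Agent string works here.
FALLBACK_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
)


def pick_user_agent(factory, fallback=FALLBACK_UA):
    """Call factory() for a User-Agent string; return fallback if it raises.

    Catching Exception is deliberately broad: it covers FakeUserAgentError
    as well as an ImportError when the library is missing.
    """
    try:
        return factory()
    except Exception:
        return fallback


def fake_useragent_factory():
    """Return a random User-Agent from fake_useragent (imported lazily)."""
    from fake_useragent import UserAgent
    return UserAgent().random
```

Calling `pick_user_agent(fake_useragent_factory)` returns a random string when the library loads and the fallback string otherwise.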

The error message suggests the failure is caused by a network timeout. Searching online shows that this library fetches online resources when it loads. The relevant configuration in its source file fake_useragent\settings.py is shown below:

# -*- coding: utf-8 -*-
from __future__ import absolute_import, unicode_literals

import os
import tempfile

__version__ = '0.1.11'

DB = os.path.join(
    tempfile.gettempdir(),
    'fake_useragent_{version}.json'.format(
        version=__version__,
    ),
)

CACHE_SERVER = 'https://fake-useragent.herokuapp.com/browsers/{version}'.format(
    version=__version__,
)

BROWSERS_STATS_PAGE = 'https://www.w3schools.com/browsers/default.asp'

BROWSER_BASE_PAGE = 'http://useragentstring.com/pages/useragentstring.php?name={browser}'  # noqa

BROWSERS_COUNT_LIMIT = 50

REPLACEMENTS = {
    ' ': '',
    '_': '',
}

SHORTCUTS = {
    'internet explorer': 'internetexplorer',
    'ie': 'internetexplorer',
    'msie': 'internetexplorer',
    'edge': 'internetexplorer',
    'google': 'chrome',
    'googlechrome': 'chrome',
    'ff': 'firefox',
}

OVERRIDES = {
    'Edge/IE': 'Internet Explorer',
    'IE/Edge': 'Internet Explorer',
}

HTTP_TIMEOUT = 5

HTTP_RETRIES = 2

HTTP_DELAY = 0.1

Testing showed that the cause is the page configured as BROWSERS_STATS_PAGE = 'https://www.w3schools.com/browsers/default.asp', which cannot be opened.
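A check like this can be reproduced with a few lines of standard-library code. The sketch below is one possible way to test a URL from settings.py, using the same 5-second timeout as HTTP_TIMEOUT. The function name `is_reachable` and its `opener` parameter are illustrative assumptions.

```python
import urllib.request


def is_reachable(url, timeout=5, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx/3xx status within the timeout.

    URLError, HTTPError and socket timeouts are all OSError subclasses,
    so a single except clause covers them.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        return False
```

For example, `is_reachable('https://www.w3schools.com/browsers/default.asp')` reports whether the statistics page can be opened from the current network.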

Solution: download the file to the local machine and place it in the corresponding folder.

In a browser, open https://www.w3schools.com/browsers/default.asp, then press Ctrl+S and save the file as fake_useragent_0.1.11.json. The name must not be changed: it has to match the name configured in the source file, otherwise the library cannot find the file. The library parses this file as JSON, so if the saved page is not valid JSON, save the response of the CACHE_SERVER URL from settings.py under that name instead. To find out where to put the saved file, check the configuration source code:

DB = os.path.join(
    tempfile.gettempdir(),
    'fake_useragent_{version}.json'.format(
        version=__version__,
    ),
)

The full path of DB is built by joining the directory returned by tempfile.gettempdir() with the file name, so that directory is where fake_useragent_0.1.11.json must be stored. Put the saved JSON file in that directory and the library loads normally, with no more timeout problems!
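To find the exact directory on a given machine, the DB expression can be evaluated directly. The sketch below mirrors that setting and also checks that the saved file parses as JSON. `cache_path` and `looks_like_json` are illustrative helper names, and the default version string is taken from the settings shown above.

```python
import json
import os
import tempfile


def cache_path(version="0.1.11"):
    """Mirror the DB setting: <temp dir>/fake_useragent_<version>.json."""
    return os.path.join(
        tempfile.gettempdir(),
        "fake_useragent_{version}.json".format(version=version),
    )


def looks_like_json(path):
    """Return True if the file exists and its content parses as JSON."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError):
        return False
    return True


# Show where the saved file has to be placed on this machine.
print(cache_path())
```

After copying the file, `looks_like_json(cache_path())` should return True. A False result means the file is missing or is not valid JSON.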

Note: if CACHE_SERVER is not https://fake-useragent.herokuapp.com/browsers/0.1.11, please update the library:

pip install --upgrade fake_useragent
