Bugku CTF Web (Question 10-15)

2022-12-28   ES  

1. Introduction
"Section 14.6: Implementing web page access by simulating a browser in Python" introduced how to access a web page using the request module of the urllib package. That section specifically advised against setting the Accept-Encoding field in the HTTP request header, because if it is set, the server may compress the HTTP response message according to that field and the server's own capabilities. This section briefly introduces how to handle such compressed response messages.
When crawling a web page, if the request header carries 'Accept-Encoding': 'gzip', the server may return a gzip-compressed response body. In that case the crawler must import the gzip module, determine from the response headers whether the server compressed the message, and decompress it if so.
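To see why decompression is necessary, the round trip can be simulated locally with the standard-library gzip module (the HTML snippet below is an arbitrary placeholder, not taken from a real response):

```python
import gzip

# Simulate what the server does when Accept-Encoding allows gzip:
# it compresses the response body before sending it
body = b'<!DOCTYPE html><html lang="zh-CN"><head></head></html>'
compressed = gzip.compress(body)

# The raw bytes read from the socket are not decodable text;
# the crawler must decompress them first to recover the page
restored = gzip.decompress(compressed)
assert restored == body
```

Calling restored.decode() would then yield the HTML text, exactly as with an uncompressed response.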

2. Steps for handling a compressed HTTP response message
To support compressed HTTP response messages, a crawler application needs to:
1. Set the supported compression formats in the Accept-Encoding field of the request message's HTTP header;
2. After reading the response message, determine the compression format from the Content-Encoding field of the response header;
3. Call the corresponding decompression method to decompress the response body.
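The same three steps apply to any format a crawler lists in Accept-Encoding. For instance, if deflate were also advertised, step 3 would use the zlib module instead of gzip. A minimal sketch (the decompress_deflate helper is illustrative, not part of the case below):

```python
import zlib

def decompress_deflate(data: bytes) -> bytes:
    """Decompress a response body whose Content-Encoding is 'deflate'.

    Some servers send zlib-wrapped deflate data, others raw deflate
    streams; try the wrapped form first, then fall back to raw.
    """
    try:
        return zlib.decompress(data)                   # zlib-wrapped deflate
    except zlib.error:
        return zlib.decompress(data, -zlib.MAX_WBITS)  # raw deflate stream
```

This case, however, only requests gzip, so the rest of the section sticks to the gzip module.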

3. Case
1. Import related modules:
import urllib.request
from io import BytesIO
import gzip

2. Constructors support compression request report head
This section is in “Section 14.5 HTTP request header access to the http information constructed by the browser

On the basis of the MKHead function of, add a parameter to confirm whether the compression packet needs to be processed. Request the account head of the Accept-Encoding parameter, the code is as follows:

def mkhead(NeedEncoding=False):
    if NeedEncoding:
        header = {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
                  'Accept-Encoding':'gzip',
                  'Accept-Language':'zh-CN,zh;q=0.9',
                  'Connection':'keep-alive',
                  'Cookie':'uuid_tt_dd=10_35489889920-1563497330616-876822;......',
                  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
    else:
        header = {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
                  'Accept-Language':'zh-CN,zh;q=0.9',
                  'Connection':'keep-alive',
                  'Cookie':'uuid_tt_dd=10_35489889920-1563497330616-876822;......',
                  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
    return header

3. After reading the response message, determine the compression format from the Content-Encoding field of the response header

req = urllib.request.Request(url=site, headers=header)
sitersp = urllib.request.urlopen(req)
encoding = sitersp.info().get('Content-Encoding')  # get the compression format of the response body

4. Decompress the response body according to the compression format and return the text. If it is gzip-compressed, call gzip to decompress it:

if encoding == 'gzip':  # the response body is gzip-compressed
    print("encoding == 'gzip'")
    buf = BytesIO(sitersp.read())
    fzip = gzip.GzipFile(fileobj=buf)
    return fzip.read().decode()
elif not encoding:  # the response body is not compressed
    print("encoding == None")
    return sitersp.read().decode()
else:
    print(f"Content-Encoding = {encoding}, can't unzip")
    return None
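As an aside, on Python 3.2 and later the BytesIO + GzipFile pair in the gzip branch can be replaced with a single call to gzip.decompress; a minimal sketch with simulated response bytes:

```python
import gzip

# compressed bytes, as would be read from sitersp.read() (simulated here)
compressed = gzip.compress('<html lang="zh-CN"></html>'.encode('utf-8'))

# one call replaces the BytesIO + GzipFile pair
text = gzip.decompress(compressed).decode()
assert text == '<html lang="zh-CN"></html>'
```

The GzipFile form is still useful when streaming a large body incrementally instead of reading it all at once.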

4. Full Case Code

# Read a compressed HTTP response message
import urllib.request
from io import BytesIO
import gzip

def mkhead(NeedEncoding=False):
    if NeedEncoding:
        header = {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
                  'Accept-Encoding':'gzip',
                  'Accept-Language':'zh-CN,zh;q=0.9',
                  'Connection':'keep-alive',
                  'Cookie':'uuid_tt_dd=10_35489889920-1563497330616-876822;......',
                  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
    else:
        header = {'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
                  'Accept-Language':'zh-CN,zh;q=0.9',
                  'Connection':'keep-alive',
                  'Cookie':'uuid_tt_dd=10_35489889920-1563497330616-876822;......',
                  'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.100 Safari/537.36'}
    return header


def readweb(site):
    header = mkhead(True)
    try:
        req = urllib.request.Request(url=site, headers=header)
        sitersp = urllib.request.urlopen(req)
    except Exception as e:
        print(e)
        return None
    encoding = sitersp.info().get('Content-Encoding')
    if encoding == 'gzip':  # the response body is gzip-compressed
        print("encoding == 'gzip'")
        buf = BytesIO(sitersp.read())
        fzip = gzip.GzipFile(fileobj=buf)
        return fzip.read().decode()
    elif not encoding:  # the response body is not compressed
        print("encoding == None")
        return sitersp.read().decode()
    else:
        print(f"Content-Encoding = {encoding}, can't unzip")
        return None


readweb(r'https://blog.csdn.net/LaoYuanPython/article/details/100585881')[0:100]

Execution results:

>>> readweb(r'https://blog.csdn.net/LaoYuanPython/article/details/100585881')[0:100]
encoding == 'gzip'
'<!DOCTYPE html>\n<html lang="zh-CN">\n<head>\n    <meta charset="UTF-8">\n    <link rel="canonical" href'
>>>

Note: The Cookie field in the code can be omitted, which crawls the page anonymously. For non-anonymous access, set it according to the cookies in your own browser.

This section introduced how to read a web page with the request module of the urllib package and decompress the response, so as to support compressed transmission of web pages.

Old Ape Python, learn Python with the old ape!
Blog Address: https://blog.csdn.net/laoyuanpython

Old Ape Python Blog Catalog: https://blog.csdn.net/laoyuanpython/article/details/98245036
Please support, like, comment, and follow! Thanks!
