
Fetching multiple URLs with aiohttp in Python 3.5

东郭宏深
2023-03-14
Question

Since Python 3.5 introduced async with, the syntax recommended in the aiohttp documentation has changed. To fetch a single URL, it now suggests:

import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, 'http://python.org'))
        print(html)

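(A side note, since this snippet is often copied verbatim: aiohttp.Timeout was removed in later aiohttp releases; current 3.x versions take an aiohttp.ClientTimeout object via the timeout= parameter on the session or the request instead. The same per-request deadline can also be sketched with the standard library's asyncio.wait_for; the snippet below is a stub with no network access and shortened delays, purely to show the timeout behavior:)

```python
import asyncio

async def slow_fetch(url, delay):
    # Stub standing in for an HTTP request that takes `delay` seconds (no network).
    await asyncio.sleep(delay)
    return f'<html from {url}>'

async def main():
    # A generous budget that the fast call fits inside...
    ok = await asyncio.wait_for(slow_fetch('http://python.org', 0.01), timeout=1.0)
    # ...and a tight budget that the slow call exceeds.
    try:
        await asyncio.wait_for(slow_fetch('http://python.org', 5), timeout=0.05)
        slow = 'finished'
    except asyncio.TimeoutError:
        slow = 'timed out'
    return ok, slow

if __name__ == '__main__':
    print(asyncio.run(main()))
```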
How can I modify this to fetch a collection of URLs instead of just one?

In the old asyncio examples, you would set up a list of tasks, e.g.

    tasks = [
        fetch(session, 'http://cnn.com'),
        fetch(session, 'http://google.com'),
        fetch(session, 'http://twitter.com')
    ]

I tried to combine a list like this with the approach above, but failed.


Answer

For parallel execution, you need asyncio.Task.

I have converted your example so it fetches data from multiple sources concurrently:

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        return await response.text()

async def fetch_all(session, urls):
    # Schedule one Task per URL so the requests run concurrently,
    # then wait for all of them to finish.
    tasks = []
    for url in urls:
        task = asyncio.create_task(fetch(session, url))
        tasks.append(task)
    results = await asyncio.gather(*tasks)
    return results

async def main():
    urls = ['http://cnn.com',
            'http://google.com',
            'http://twitter.com']
    # A single session is shared by all requests (connection pooling).
    async with aiohttp.ClientSession() as session:
        htmls = await fetch_all(session, urls)
        print(htmls)

if __name__ == '__main__':
    asyncio.run(main())

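One caveat worth knowing: by default, asyncio.gather propagates the first raised exception immediately to the awaiting code, so a single failing URL aborts the whole batch (the other tasks keep running, but their results are lost). Passing return_exceptions=True returns the exception objects in the results list instead, in order. A minimal sketch with a stub coroutine (flaky_fetch is a hypothetical stand-in for fetch; no network involved):

```python
import asyncio

async def flaky_fetch(url):
    # Hypothetical stand-in for fetch(): raises for one URL to show the behavior.
    if 'bad' in url:
        raise RuntimeError(f'failed to fetch {url}')
    return f'<html from {url}>'

async def fetch_all(urls):
    tasks = [asyncio.create_task(flaky_fetch(u)) for u in urls]
    # With return_exceptions=True, exceptions are returned in the results
    # list (positionally, like successful results) instead of being raised.
    return await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == '__main__':
    results = asyncio.run(fetch_all(['http://ok.example', 'http://bad.example']))
    for result in results:
        print(type(result).__name__)
```

You can then filter the results list for instances of Exception and retry or log those URLs separately.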
