How to start a crawler in the AWS Glue Data Catalog using Boto3

2023-06-12 15:39:03

Approach/algorithm to solve this problem

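Step 1: Import boto3 and the botocore exceptions so that errors can be handled.

Step 2: crawler_name is the parameter of this function.

Step 3: Create an AWS session using the boto3 library. Make sure region_name is set in the default profile. If it is not, pass region_name explicitly when creating the session.

Step 4: Create an AWS client for glue.

Step 5: Call the start_crawler function and pass crawler_name as the Name parameter.

Step 6: The call returns the response metadata and starts the crawler regardless of its schedule. If the crawler is already running, it throws CrawlerRunningException.

Step 7: Handle a generic exception in case something goes wrong while starting the crawler.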
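
Step 6 notes that starting a crawler that is already running raises CrawlerRunningException. One way to reduce the chance of that, sketched here, is to check the crawler state with get_crawler before calling start_crawler. The helper name start_if_ready is hypothetical, and the region "us-east-1" in the usage comment is only a placeholder. The function takes an already-created Glue client as an argument. A crawler could still be started by someone else between the check and the start, so this is a convenience, not a guarantee.

```python
def start_if_ready(glue_client, crawler_name):
    """Start the crawler only when get_crawler reports it as READY.

    Returns True if a start was requested, False otherwise.
    (Hypothetical helper, not part of boto3.)
    """
    # get_crawler returns the crawler definition; its State field is
    # one of READY, RUNNING or STOPPING.
    state = glue_client.get_crawler(Name=crawler_name)["Crawler"]["State"]
    if state != "READY":
        return False
    glue_client.start_crawler(Name=crawler_name)
    return True

# Usage with a real client (needs boto3 and AWS credentials).
# The region below is a placeholder; it is passed explicitly as in Step 3.
#   import boto3
#   glue_client = boto3.session.Session(region_name="us-east-1").client("glue")
#   start_if_ready(glue_client, "Data Dimension")
```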

Example code

The following code starts a crawler in the AWS Glue Data Catalog:

import boto3
from botocore.exceptions import ClientError

def start_a_crawler(crawler_name):
    # Uses the region from the default profile; otherwise pass region_name to Session()
    session = boto3.session.Session()
    glue_client = session.client('glue')
    try:
        # Starts the crawler regardless of its schedule and returns the response metadata
        response = glue_client.start_crawler(Name=crawler_name)
        return response
    except ClientError as e:
        # Service-side errors, including CrawlerRunningException, arrive as ClientError
        raise Exception("boto3 client error in start_a_crawler: " + str(e))
    except Exception as e:
        raise Exception("Unexpected error in start_a_crawler: " + str(e))

# 1st time start the crawler
print(start_a_crawler("Data Dimension"))
# 2nd time run, before crawler completes the operation
print(start_a_crawler("Data Dimension"))

Output

#1st time start the crawler
{'ResponseMetadata': {'RequestId': '73e50130-*****************8e', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Sun, 28 Mar 2021 07:26:55 GMT', 'content-type': 'application/x-amz-json-1.1', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': '73e50130-***************8e'}, 'RetryAttempts': 0}}

#2nd time run, before crawler completes the operation
Exception: boto3 client error in start_a_crawler: An error occurred (CrawlerRunningException) when calling the StartCrawler operation: Crawler with name Data Dimension has already started
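
The second call above fails because the crawler is still running. If an already-running crawler is acceptable for the caller, that one case can be caught separately instead of being re-raised. The sketch below is one possible way to do this, assuming the Glue client is created elsewhere and passed in; the function name start_crawler_tolerant is hypothetical. It relies on the modeled exception classes that boto3 clients expose under client.exceptions.

```python
def start_crawler_tolerant(glue_client, crawler_name):
    """Start a crawler; return None instead of raising if it is already running.

    (Hypothetical helper, not part of boto3.)
    """
    try:
        return glue_client.start_crawler(Name=crawler_name)
    except glue_client.exceptions.CrawlerRunningException:
        # The crawler is already running, so there is nothing to start.
        return None
```

Any other error, such as a crawler name that does not exist, still propagates to the caller unchanged.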
