Hi, I've got a space that hasn't been updated for a long time, and I need to migrate some of its contents to another space. Since the content hasn't been touched for years, I strongly suspect that many of its links are already dead.
Can anybody suggest a way to check the links?
So far my approach is to get body.view from all pages in a given space and extract all links from it. There are far too many links to check manually, so I want to iterate over them with a Python script.
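For anyone wondering what the extraction step might look like, here is a minimal sketch using only the standard library. It assumes the body.view content is already available as an HTML string; the names `LinkCollector`, `extract_links` and the base URL are hypothetical and only there for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in an HTML fragment."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip in-page anchors and mail links; they cannot be "dead" pages.
            if name == "href" and value and not value.startswith(("#", "mailto:")):
                # Resolve relative links against the site's base URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the unique links found in html, in first-seen order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return list(dict.fromkeys(parser.links))
```

Calling `extract_links(page_html, "https://wiki.example.com")` for each page would give a deduplicated list of absolute URLs to feed into the checker.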
import requests
from requests.auth import HTTPDigestAuth

# link, user and password are defined elsewhere
r = requests.get(link, auth=HTTPDigestAuth(user, password))
code = r.status_code
print(code)
My first attempt was to simply use the requests library to get the status codes, but no matter what link I passed, I got a "200" status, even for a non-existent page.
Later I realized that every request returned 200 because it was being redirected to the authentication page. I updated the code with proper authentication, and now it works as intended. Maybe someone will find it useful for a similar task too.
import getpass
import time
import urllib3

agent = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
         "AppleWebKit/537.36 (KHTML, like Gecko) "
         "Chrome/108.0.0.0 Safari/537.36")
passwd = getpass.getpass("type your password: ")

# Reuse one connection pool and one set of headers for every link.
http = urllib3.PoolManager()
headers = urllib3.make_headers(basic_auth=f'user_id:{passwd}',  # replace user_id with your login
                               user_agent=agent)

def link_runner_auth(link):
    """Return the HTTP status for link, or None if the request failed."""
    response = None
    try:
        response = http.request('GET', link, headers=headers).status
        time.sleep(0.5)  # small pause between requests
        print(f'{link} ---- {response}')
    except KeyboardInterrupt:
        print("Keyboard interrupt")
        raise  # let Ctrl+C stop the whole run
    except Exception:
        print(f'{link} ---- ERROR')
    return response
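Since the original problem was a silent redirect to the login page, it may also help to make that case visible instead of letting it show up as 200. Below is a rough sketch of one way to do it: send the request with redirects disabled and classify the raw status together with the Location header. The function name `classify_response` and the `"login"` marker are my assumptions, so adjust the marker to whatever your login URL actually contains.

```python
def classify_response(status, location=None, login_marker="login"):
    """Turn a raw status code and Location header into a short verdict.

    login_marker is an assumed substring of the login page URL.
    """
    if 300 <= status < 400:
        # A redirect that points at the login page means we are not authenticated.
        if location and login_marker in location:
            return "AUTH_REQUIRED"
        return "REDIRECT"
    if status in (401, 403):
        return "AUTH_REQUIRED"
    if status == 404:
        return "DEAD"
    if 200 <= status < 300:
        return "OK"
    return "ERROR"
```

To get the raw values, the request has to be made without following redirects (`redirect=False` in urllib3's `request`, or `allow_redirects=False` in requests), and the Location value can then be read from the response headers.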