Code Python Script from A to Z

Mor Co
2 min read · Jun 3, 2021

Today we will write a fairly basic command-line script that scrapes expired domains/URLs related to a given subject from the Internet Archive.

Photo by Luca Bravo on Unsplash

What Will You Learn From This Tutorial?

  • Basic use of Selenium and the WebDriver object.
  • Reading parameters passed through the command line.
  • Writing concurrent code with Python’s asynchronous features.
  • Sending GET requests with the “requests” library.
  • Writing to an external file.

Towards a Working Script

Our main goal will be achieved by searching the archive for our desired keyword, scraping URLs from the result page, and checking each URL’s HTTP response.

Page.py

First, we are going to create a Page class in the Page.py file:

We import the Selenium library, and from time we import the sleep function so we can wait for the page to fully load before we start scraping.

main.py

After that, we create the main.py file:

In our main file we are going to use the sys, requests, and asyncio libraries, along with the Page class we just created above.

We use “sys.argv” to retrieve parameters passed from the terminal/command line and raise an error in case no parameters were passed.
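A sketch of that argument handling; the helper name parse_keywords is an assumption, and the original may well do this inline:

```python
import sys


def parse_keywords(argv):
    """Treat everything after the script name as keywords and raise an
    error when the user passed no parameters at all."""
    keywords = argv[1:]
    if not keywords:
        raise ValueError("Usage: python main.py <keyword> [<keyword> ...]")
    return keywords
```

Called as `python main.py python blog`, `parse_keywords(sys.argv)` would return `["python", "blog"]`.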

Our scan function iterates through every keyword in our keywords list, retrieves the result page from the archive, and finds links (elements with the “a” tag) under elements whose class name is “result-details”.
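Sketched out, that loop could look like this. The search URL template is an assumption (adjust it to the archive page you are actually scraping), and `page` is any object exposing the get_links method described for the Page class:

```python
def scan(page, keywords, search_url="https://archive.org/search?query={}"):
    """Collect result links for every keyword.

    `page` must expose get_links(url); the search_url template is an
    assumption, not taken from the original gist.
    """
    links = []
    for keyword in keywords:
        print("Starting to scan: ", keyword)
        links.extend(page.get_links(search_url.format(keyword)))
    return links
```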

At this point we’ve got a list full of URLs and we need to check whether each one is dead or not. For that, we will use the utility function is_dead and our async function links_loop.
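A sketch of both helpers. requests is blocking, so one way to keep the checks concurrent is to push each call onto a worker thread with asyncio.to_thread (Python 3.9+); the article’s exact async approach is not shown, so treat this as one possible shape:

```python
import asyncio

import requests


def is_dead(url, timeout=5):
    """A link counts as dead when the GET request errors out or the
    server answers with a non-2xx status."""
    try:
        return not requests.get(url, timeout=timeout).ok
    except requests.RequestException:
        return True


async def links_loop(links, check=is_dead):
    """Run `check` over every link concurrently and return the dead ones."""
    verdicts = await asyncio.gather(
        *(asyncio.to_thread(check, link) for link in links)
    )
    return [link for link, dead in zip(links, verdicts) if dead]
```

Making `check` a parameter keeps the concurrency logic testable without touching the network.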

Finally, we call all our functions and write the results to an external file called “dead_links.txt”.
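The final step can be as small as this; write_dead_links is an assumed name for whatever the original does inline:

```python
def write_dead_links(dead_links, path="dead_links.txt"):
    """Persist the dead URLs to the output file, one per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(dead_links))
```

In main.py this would run after the result of links_loop comes back from asyncio.run.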

Now we can run our new script in the terminal and get the desired results.

```
(venv) …\scripts>python main.py python blog
Keyword(s):
python blog
Starting to scan:  python blog
```
