Building a traffic fine scraper with Python
The other day I was remembering that during my visit to Mexico City back in June '17 (for my participation in the first edition of PythonDay México), a friend of mine hosted me at his house. During my stay we talked about building something useful, so we decided to scrape a website where car owners in Mexico City can check their traffic fines.
The website is the following: http://www.multasdetransito.com.mx/infracciones/df-distrito-federal
As you may have noticed if you opened the site, it has an input field for a car plate, which you can use to search for the traffic fines attached to that plate (if it has any).
My friend told me that only a small group of Mexico City residents knew about that site, so it might be useful to create something in which they can consult this info in an easier way.
After this, I was like: “Dude, today I held a workshop on how to build a TwitterBot in 30 minutes for the PythonDay, what if we build a TwitterBot that inputs the plate in the site and then scrapes the info from it automatically?”. He said yes. With this, we built MultaBot, a TwitterBot that used to read a car plate and scrape its traffic fines.
We started this as a private project, but the other day I was invited by the Division of Law, Politics and Government of my university to give a tech talk about how automation can be useful in future jobs, not only for people in the tech industry but also for politicians and lawyers. When I remembered MultaBot, I thought it would be a nice example, so I improved on its base and ended up with this example script that I now want to share with you.
Selenium will allow us to automate the control of a web browser, so the idea will be to use it to automatically open the website, input the car plate and after the traffic fines are loaded, extract and parse them for the
After this, I created a function to launch the browser and scrape the info for the car plate. If verbose is set to True, the function will print the details of each traffic fine:
Inside the function I set the URL to be scraped, then I called a function named busca_placa_df (which means search_plate_df; df stands for Federal District in Spanish), and then I closed and quit the browser opened for this search.
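A minimal sketch of such a wrapper could look like the one below. It is written under assumptions: the signature of busca_placa_df is guessed as (browser, url, placa, verbose), so the search function is passed in as the `search` argument, and `driver_factory` is a hypothetical hook for choosing the browser. Firefox is only a default picked for the example.

```python
URL = "http://www.multasdetransito.com.mx/infracciones/df-distrito-federal"


def launch_browser(placa, search, verbose=False, driver_factory=None):
    """Open a browser, run the plate search, and always shut the browser down.

    `search` stands in for busca_placa_df (assumed signature:
    search(browser, url, placa, verbose)). `driver_factory` is a
    hypothetical hook for plugging in another browser or a test double.
    """
    if driver_factory is None:
        # Imported lazily so this module loads even without Selenium installed.
        from selenium import webdriver
        driver_factory = webdriver.Firefox  # assumption: any WebDriver would do
    browser = driver_factory()
    try:
        return search(browser, URL, placa, verbose)
    finally:
        # quit() ends the WebDriver session and closes the windows it opened,
        # even if the search raised an exception.
        browser.quit()
```

Wrapping the search in try/finally is a design choice of this sketch: it keeps stray browser windows from piling up when the site changes and the scraping code fails halfway.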
Then, I built the main function that inputs data and then scrapes the traffic fines:
As you can see, in lines 24-26 I clear the data in the input field with id=plate from the HTML source, then input the car plate that I want to look for, and after this I look for the button by its XPath and click it.
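The input step can be sketched roughly as follows. The function name `submit_plate` is made up for this example, and the button's XPath is taken as a parameter because its actual value is not given in the text. The locator strategies are written as plain strings, which have the same values as Selenium's By.ID and By.XPATH constants.

```python
# Locator strategies as plain strings; they match selenium's By.ID and By.XPATH.
BY_ID = "id"
BY_XPATH = "xpath"


def submit_plate(browser, placa, button_xpath):
    """Type the plate into the search field and click the search button.

    `browser` is a Selenium WebDriver with the page already loaded;
    `button_xpath` is whatever XPath locates the search button on the page.
    """
    field = browser.find_element(BY_ID, "plate")
    field.clear()            # remove anything already typed in the field
    field.send_keys(placa)   # type the plate, as a user would
    browser.find_element(BY_XPATH, button_xpath).click()
```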
Next I want to know the results of the search. For this I look for the element with id=secciones in the HTML of the response (lines 31-32) and first verify whether the plate has any debts by checking if the resulting text is only "INFRACCIONES"; if so, I return a message saying it has no debts (lines 33-34). Otherwise, in lines 35-36 I save the result with the text it contains (which specifies the number of traffic fines for the plate).
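As a rough sketch of that check (the function name `read_summary`, the tuple it returns, and the wording of the no-debts message are all assumptions made for this example):

```python
BY_ID = "id"  # same value as selenium's By.ID


def read_summary(browser, placa):
    """Return (has_fines, message) based on the element with id=secciones."""
    text = browser.find_element(BY_ID, "secciones").text.strip()
    if text == "INFRACCIONES":
        # Only the bare heading is present, so the plate has no fines.
        return False, f"The plate {placa} has no debts"
    # Otherwise the text itself states how many fines the plate has.
    return True, text
```

On a real run the results may take a moment to appear after the click, so waiting for the element first (for example with Selenium's WebDriverWait) could be necessary.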
If the verbose flag is turned on, then the scraper will get all the info of each traffic fine (lines 38-40). For this, it will look for the element with id=tablaDatos in the HTML, which contains all the info of each traffic fine, and if it is not empty (line 41) it will iterate over each traffic fine and extract all of its info (lines 42-50). I then build a dict with the response to serve it in a nice way (lines 52-67) and return the result.
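The extraction step could be sketched like this. The layout of the element with id=tablaDatos is not described in the text, so the sketch assumes each fine is a table row (tr) made of cells (td). The function name `collect_fines` and the dict keys are also made up for the example.

```python
BY_ID = "id"              # same value as selenium's By.ID
BY_TAG_NAME = "tag name"  # same value as selenium's By.TAG_NAME


def collect_fines(browser, placa, summary):
    """Build a dict with the plate, the summary text, and one entry per fine.

    Assumes each fine is a <tr> inside the element with id=tablaDatos and
    each detail of the fine is a <td> cell in that row.
    """
    result = {"placa": placa, "resumen": summary, "infracciones": []}
    table = browser.find_element(BY_ID, "tablaDatos")
    if not table.text.strip():
        return result  # empty table: nothing to extract
    for row in table.find_elements(BY_TAG_NAME, "tr"):
        cells = [c.text.strip() for c in row.find_elements(BY_TAG_NAME, "td")]
        if cells:  # rows without <td> cells (e.g. headers) are skipped
            result["infracciones"].append(cells)
    return result
```

Returning a plain dict keeps the result easy to print or to serialize later, whatever format the final response ends up using.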
To test the script I just called the function launch_browser with a demo car plate (that coincidentally has 2 traffic fines) and printed out the results.
If you run this in a script, you’d get an output like this one (note that it prints the 2 traffic fines I mentioned 😝):
What we first thought would be a just-for-fun project to work on together that night, like a mini hackathon, ended up being something that could be helpful for citizens in Mexico City.
As you have seen, Python has tools that help automate things in very cool ways. I mean, if you run the script as it is provided, you should see a browser window open automatically, then the text gets typed into the input field by itself and the results show up on their own. Isn’t this really awesome?
Please let me know what you build with Selenium and Python in the comments! 💻🐍🤙🏼