I built a web scraper!
Project Overview
Earlier today, I finished the project for Phase 3 of my Academy XI course. We were asked to build a web scraper, and we weren’t given many constraints. The major parameters were:
- use a collection of objects (not hashes) to store your data.
- the user has to be able to make a choice, and then get detailed information about their choice.
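To make the "objects, not hashes" requirement concrete, here is a minimal sketch of what such a collection could look like. The `Job` class, its attributes, and `find_by_number` are hypothetical names chosen for illustration, not the actual JSAAJSE code.

```ruby
# Hypothetical Job class: each scraped listing becomes an object, not a hash.
class Job
  attr_accessor :title, :company, :location

  @@all = [] # every Job instance created so far

  def initialize(title:, company:, location:)
    @title = title
    @company = company
    @location = location
    @@all << self
  end

  def self.all
    @@all
  end

  # Look up the job a user picked from a 1-based menu.
  # Returns nil when the number is outside the list.
  def self.find_by_number(number)
    return nil unless number.between?(1, all.size)

    all[number - 1]
  end

  # The "detailed information" shown after the user makes a choice.
  def details
    "#{title} at #{company} (#{location})"
  end
end

# Sample data (made up for the example).
Job.new(title: "Junior Software Engineer", company: "Example Co", location: "Sydney")
Job.new(title: "Graduate Developer", company: "Sample Pty Ltd", location: "Adelaide")

puts Job.find_by_number(2).details
```

Storing instances in a class-level array means the menu and the detail view can both read from the same collection.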
What I built
I built an app called JSAAJSE — I’ll be the first to admit that the name needs work, haha. What the name stands for is:
Job Seeker App for Aussie Junior Software Engineers
The app scrapes the popular Aussie website seek.com.au for Junior Software Engineer roles. I considered scraping other job search sites too (like LinkedIn), but seek.com.au was the only one that allowed scraping!
A user can choose how many pages they would like to scrape (anywhere from 1 to 50), and the scraper will return the results. A user can view all the results together or break them down by city.
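Checking the page count a user types in could look something like the sketch below. `PAGE_RANGE` and `parse_page_count` are hypothetical names, and this is only one way to enforce the 1 to 50 limit.

```ruby
PAGE_RANGE = (1..50)

# Returns the page count as an Integer, or nil when the input
# is not a whole number between 1 and 50.
def parse_page_count(input)
  text = input.to_s.strip
  return nil unless text.match?(/\A\d+\z/) # digits only

  pages = text.to_i
  PAGE_RANGE.cover?(pages) ? pages : nil
end

# Sample inputs, as they might arrive from the terminal.
["3", " 50\n", "0", "51", "lots"].each do |raw|
  pages = parse_page_count(raw)
  puts pages ? "Scraping #{pages} page(s)" : "Please enter a number from 1 to 50"
end
```

Returning `nil` for bad input lets the calling code decide whether to ask again or bail out.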
I gotta say, I’m pretty happy with the result :)
Challenges
Some of the notable challenges I faced when building this project were:
Using CSS Selectors on Seek
I encourage you to take a look at the HTML of one of Seek’s search pages…it’s a beast! Class names are nonsensical, different sections typically share the same class names, and there aren’t many different types of tags used. What that means is that writing CSS selectors that grabbed only the text I actually wanted was a challenge.
I had to get really specific, and use a lot of ‘ninja-ing’ to get the job done. Here is an example selector:
job.location = post.css("span._3mgsa7- strong._7ZnNccT a._17sHMz8").text.strip.gsub('Information & Communication Technology', '')
Goodness gracious.
Getting results numbered after choosing the city filter
When I first used the ‘filter by city’ feature, I noticed my numbering was off. For example, if I chose ‘filter by Adelaide’, the first job might be numbered 7, the second might be numbered 32, and so on.
This happened because each job instance was finding its number from a massive array which contained results from all cities.
To overcome this, I built a city-specific array. Worked like a charm. Boom.
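Here is a rough sketch of the numbering problem and the fix. It isn't the real app code: `Listing`, the sample data, and both method names are made up for the example, and the "buggy" version is only my best guess at the shape of the original mistake.

```ruby
# Hypothetical stand-in for a scraped job.
Listing = Struct.new(:title, :city)

ALL_LISTINGS = [
  Listing.new("Junior Developer", "Sydney"),
  Listing.new("Junior Software Engineer", "Adelaide"),
  Listing.new("Graduate Engineer", "Melbourne"),
  Listing.new("Junior Rails Developer", "Adelaide")
].freeze

# Buggy: each number comes from the job's position in the full list.
def numbered_from_full_list(listings, city)
  listings.select { |listing| listing.city == city }
          .map { |listing| "#{listings.index(listing) + 1}. #{listing.title}" }
end

# Fixed: build the city-specific array first, then number within it.
def numbered_by_city(listings, city)
  city_listings = listings.select { |listing| listing.city == city }
  city_listings.each_with_index.map do |listing, index|
    "#{index + 1}. #{listing.title}"
  end
end

puts numbered_from_full_list(ALL_LISTINGS, "Adelaide")
# 2. Junior Software Engineer
# 4. Junior Rails Developer

puts numbered_by_city(ALL_LISTINGS, "Adelaide")
# 1. Junior Software Engineer
# 2. Junior Rails Developer
```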
Limited to 162 results
This is a challenge I haven’t been able to overcome yet. Even though everything works, the app currently won’t show more than 162 results in the terminal. Why? Well, I have no idea. I’m not getting any error messages about it either, so I’ll have to do some deeper research, or ask a higher power to help me solve it.
Thoughts on Ruby
When I told a friend of mine (who has done coding bootcamps) that I’d be learning Ruby, she said “Oh gosh, good luck. Ruby is the woooooorst. The wooooooooooooooooooooorst!” and then she burst into flames.
This experience made me fear Ruby initially, but the truth is, I’ve actually really enjoyed this language so far. It feels intuitive, I like what I can build with it, and the community seems pretty cool too (MINASWAN!!!!!).
Being this comfortable with Ruby has actually given me a lot more confidence to revisit some of the languages that I initially struggled with.
Going Forward
I’m proud of the scraper I built. Before this course, I couldn’t even fathom how someone would actually go about building a web scraper. To have gained skills that I previously thought were unfathomable is pretty damn cool :)
There is still a lot to learn (and a lot to re-learn), but I am definitely more confident in my abilities now.
Next stop: Rails! Let’s hope it doesn’t make me eat my words, haha.