I have completed my dealership inventory search tool and my senior project.
For the last 14 weeks, I have juggled spending time with my wife and daughter, finishing the rest of my classes to prepare for graduation, and working two jobs. Finally, I have reached the end. I was able to build a fully functioning dashboard with samples of real data to reflect the use of the dashboard. The full extent of this project is overwhelming and will require a lot more resources to continue; therefore, I refer to my final product as a prototype.
My Goal: Create a dashboard that allows a user to search for a specific vehicle and see its pricing and location across the United States.
This project required me to dig into web scraping, UI design, and statistical analysis to give insightful results.
I spent several hours studying the HTML and CSS structure of multiple dealership websites. I also had to study multiple web scraping tools to determine which one gave me the quickest and most effective results. Finally, I had to actually code the web scraping tool. I created multiple versions of this code.
My first version was very sloppy. I built every web scraper differently, which became a major headache once I realized that this would take an eternity unless I came up with some sort of automated system. I honestly didn't believe, at first, that you could automate web scraping. Then, I figured out that a lot of dealerships use auto-generated websites that come from a handful of similar companies and share the same HTML structure.
I started putting that information to use by writing a script that could successfully scrape one web page, then copying it into another file and changing a few details to fit the next website.
Later, I tossed out my fear of using classes and created a beautiful class that I could call to scrape multiple websites. This cut my scraping time down by a lot.
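To illustrate the idea of one reusable class for sites that share a template, here is a minimal sketch in Python. It is not my actual scraper: the names `SiteTemplate` and `InventoryScraper`, the CSS class names, and the sample page are all made up for the example, and it uses only the standard library's `html.parser` so it runs without extra packages.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class SiteTemplate:
    """CSS class names used by one website vendor (all hypothetical)."""
    listing_class: str  # wrapper element around one vehicle
    title_class: str    # element holding the year, make, and model
    price_class: str    # element holding the price


class InventoryScraper(HTMLParser):
    """Collects one dict per vehicle from a page built on a known template."""

    def __init__(self, dealer, template):
        super().__init__()
        self.dealer = dealer
        self.template = template
        self.vehicles = []
        self._field = None  # field whose text we are waiting for, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.template.listing_class in classes:
            self.vehicles.append({"dealer": self.dealer})
        elif self.template.title_class in classes:
            self._field = "title"
        elif self.template.price_class in classes:
            self._field = "price"

    def handle_data(self, data):
        # Store the first non-blank text found inside a title or price element.
        if self._field and self.vehicles and data.strip():
            self.vehicles[-1][self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        self._field = None  # an empty element leaves its field unset

    def scrape(self, html):
        self.feed(html)
        return self.vehicles


# Made-up page standing in for a real dealership site.
SAMPLE_PAGE = """
<div class="vehicle-card">
  <h2 class="vehicle-title">2018 Honda Civic</h2>
  <span class="vehicle-price">$17,500</span>
</div>
<div class="vehicle-card">
  <h2 class="vehicle-title">2020 Ford F-150</h2>
  <span class="vehicle-price">$38,900</span>
</div>
"""

template = SiteTemplate("vehicle-card", "vehicle-title", "vehicle-price")
vehicles = InventoryScraper("Example Motors", template).scrape(SAMPLE_PAGE)
```

With this shape, supporting another dealership on the same vendor template only means creating a new `InventoryScraper` with a different dealer name, and a different vendor only needs a new `SiteTemplate`.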
Unfortunately, after I created that beautiful class, the websites started changing on me and then I had to start learning different methods of web scraping to keep up with the newer formats. For now, it was good enough for me that I was pulling in 5 dealership inventories.
After completing my class and implementing it in a main function, all that was left was to append all of the datasets together and export them as one big CSV file. From there, I am just pulling it from a folder into Power BI.
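The append-and-export step could look something like the sketch below. The function name `combine_inventories`, the column names, and the sample rows are assumptions for illustration, not my real data or code.

```python
import csv
import tempfile
from pathlib import Path


def combine_inventories(inventories, out_path, fields=("dealer", "title", "price")):
    """Append every dealer's rows into one CSV file and return the row count."""
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        # extrasaction="ignore" drops columns that are not listed in `fields`.
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for rows in inventories:
            for row in rows:
                writer.writerow(row)
                total += 1
    return total


# Made-up rows standing in for two scraped inventories.
dealer_a = [{"dealer": "Dealer A", "title": "2018 Honda Civic", "price": 17500}]
dealer_b = [{"dealer": "Dealer B", "title": "2020 Ford F-150", "price": 38900}]

out_file = Path(tempfile.mkdtemp()) / "all_inventory.csv"
count = combine_inventories([dealer_a, dealer_b], out_file)

# Read the file back to confirm what a tool like Power BI would see.
with open(out_file, newline="", encoding="utf-8") as handle:
    combined = list(csv.DictReader(handle))
```

Writing every dealer with the same fixed column order keeps the single file easy to load from a folder.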
While designing the layout, I had to decide if I wanted to use multiple pages to show my insights or if I wanted to try to stick it all on one page. After playing with it for a bit, I found a way to put everything on one page without making it look too claustrophobic and without having to leave out important visuals. I designed the layout to use the cards method where each unit is a card with a shadow. I placed the filters at the top of the page and the main vehicle filter at the left. I positioned the most important information in the center of the screen. It has a very clean look but lacks a bit of color. I will be looking into how I can add color without bringing its aesthetics back to the early 2000s.
My faculty adviser advised me to show the price of vehicles in percentiles. Originally, I was just showing the minimum, average, and maximum prices, but by showing the prices by percentiles I am able to give a better representation of not only the prices but also an idea of how many vehicles are at each price.
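As a rough illustration of what the percentile view adds, the sketch below computes a few percentiles with Python's standard `statistics.quantiles`. The dashboard itself does this inside Power BI; the function name `price_percentiles`, the chosen percentile points, and the prices are made up for the example.

```python
from statistics import quantiles


def price_percentiles(prices, points=(10, 25, 50, 75, 90)):
    """Return the requested percentiles of a list of prices."""
    # n=100 yields 99 cut points; cut point p-1 is the p-th percentile.
    cuts = quantiles(prices, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in points}


# Made-up prices, not scraped data.
sample_prices = [10000, 20000, 30000, 40000, 50000]
summary = price_percentiles(sample_prices)
```

Minimum, average, and maximum would only report 10,000, 30,000, and 50,000 here. The percentiles also show where the bulk of the listings sit: half of them fall between the 25th and 75th percentile values.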
The biggest challenge with this project has been collecting the data. It is a very large task to web scrape every inventory across the United States. That is about 30,000 websites. I am going to need to look for alternatives to get my hands on the data. Even if I manage to web scrape that many websites, it will take about a minute to scrape each one, which is far too long to finish in a day.
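The arithmetic behind that estimate, using the rough figures above (about 30,000 sites at about a minute each), is sketched below. The last line, how many scrapers would have to run side by side to refresh everything daily, is my own back-of-the-envelope extension and assumes they could run in parallel without slowing each other down.

```python
import math

SITES = 30_000               # rough count of dealership websites
MINUTES_PER_SITE = 1         # rough scrape time per site
MINUTES_PER_DAY = 24 * 60

total_minutes = SITES * MINUTES_PER_SITE
sequential_hours = total_minutes / 60              # 500.0 hours
sequential_days = total_minutes / MINUTES_PER_DAY  # a little under 21 days

# Assumption: perfectly parallel scrapers with no rate limits or failures.
workers_for_daily_refresh = math.ceil(total_minutes / MINUTES_PER_DAY)  # 21
```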
Something I am looking forward to adding is the ability to look at vehicles across distance. I would like to add a slicer that allows you to choose a range in miles and have the data filter to the vehicles in that range.
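I have not built this yet, but one possible way to filter by distance is the haversine formula for great-circle distance. The sketch below assumes each vehicle row carries hypothetical `lat` and `lon` fields for its dealership, which my current dataset does not include; the function names and coordinates are made up as well.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def within_range(vehicles, origin, max_miles):
    """Keep only the vehicles whose dealership is within max_miles of origin."""
    return [
        v for v in vehicles
        if miles_between(origin[0], origin[1], v["lat"], v["lon"]) <= max_miles
    ]


# Made-up coordinates: one dealership about 35 miles away, one about 345.
user_location = (40.0, -111.0)
listings = [
    {"title": "2018 Honda Civic", "lat": 40.5, "lon": -111.0},
    {"title": "2020 Ford F-150", "lat": 45.0, "lon": -111.0},
]
nearby = within_range(listings, user_location, max_miles=100)
```

This gives straight-line distance, not driving distance, which is probably close enough for a range slicer.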
Disclaimer: The images are unable to render. As soon as I am able to render images in my blog, I will display photos to help visualize my project.
For attribution, please cite this work as
Sant (2021, April 8). Data Science with Keaton: Senior Project: Final Prototype. Retrieved from https://keatonjsant.github.io/posts/2021-04-08-spfinalprototype/
BibTeX citation
@misc{sant2021senior, author = {Sant, Keaton}, title = {Data Science with Keaton: Senior Project: Final Prototype}, url = {https://keatonjsant.github.io/posts/2021-04-08-spfinalprototype/}, year = {2021} }