Project: ETL

Perform ETL on data scraped from the Quotes to Scrape website: extract the data by web scraping, transform it (cleaning, joining, filtering, aggregating, etc.), and load it into a production database. The data is staged in MongoDB (non-relational) and loaded into Postgres (relational), where the final tables are created.

The Jupyter notebook contains the code for the following tasks:

  • Web scraping of 'http://quotes.toscrape.com/'
  • Imported Splinter, pymongo, pandas, and requests
  • Utilized BeautifulSoup and SQLAlchemy
  • Created data-scraping functions for: quote text, tags, author name, and author details (born, description); see the scraping sketch after this list
  • Sent the scraped data to MongoDB
  • Moved the data from MongoDB to Postgres (see the second sketch after this list)
  • Created 3 tables: author info, tags, quotes
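
A minimal sketch of the scraping functions and the MongoDB load, assuming the page structure of quotes.toscrape.com; the database and collection names (`quotes_db`, `quotes`, `authors`) and the helper names are illustrative, and the notebook may drive the browser with Splinter rather than plain requests:

```python
import requests
from bs4 import BeautifulSoup
import pymongo

BASE_URL = "http://quotes.toscrape.com"

def scrape_quotes(url=BASE_URL):
    """Scrape quote text, tags, and author name from one listing page."""
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    quotes = []
    for q in soup.select("div.quote"):
        quotes.append({
            "text": q.select_one("span.text").get_text(strip=True),
            "author": q.select_one("small.author").get_text(strip=True),
            "tags": [t.get_text(strip=True) for t in q.select("a.tag")],
            # relative link to the author page, e.g. /author/Albert-Einstein
            "author_url": BASE_URL + q.select_one("span a")["href"],
        })
    return quotes

def scrape_author(author_url):
    """Scrape author details (born, description) from an author page."""
    soup = BeautifulSoup(requests.get(author_url).text, "html.parser")
    return {
        "name": soup.select_one("h3.author-title").get_text(strip=True),
        "born": soup.select_one("span.author-born-date").get_text(strip=True),
        "description": soup.select_one("div.author-description").get_text(strip=True),
    }

# Send the scraped documents to MongoDB (names below are assumptions).
client = pymongo.MongoClient("mongodb://localhost:27017")
db = client["quotes_db"]
quotes = scrape_quotes()
db.quotes.insert_many(quotes)
# Author details go into their own collection, one document per unique author.
for url in {q["author_url"] for q in quotes}:
    db.authors.insert_one(scrape_author(url))
```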
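And a sketch of the MongoDB-to-Postgres move with pandas and SQLAlchemy, splitting the documents into the three tables; the connection string and the exact column layout are assumptions:

```python
import pandas as pd
import pymongo
from sqlalchemy import create_engine

client = pymongo.MongoClient("mongodb://localhost:27017")
db = client["quotes_db"]

quotes_df = pd.DataFrame(list(db.quotes.find({}, {"_id": 0})))
authors_df = pd.DataFrame(list(db.authors.find({}, {"_id": 0})))
# One row per (quote, tag) pair, so tags get their own table.
tags_df = (quotes_df[["author", "tags"]]
           .explode("tags")
           .rename(columns={"tags": "tag"}))

engine = create_engine("postgresql://user:password@localhost:5432/quotes_db")
quotes_df.drop(columns=["tags", "author_url"]).to_sql(
    "quotes", engine, if_exists="replace", index=False)
tags_df.to_sql("tags", engine, if_exists="replace", index=False)
authors_df.to_sql("authors", engine, if_exists="replace", index=False)
```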

In app.py, created a Flask API. Navigate to these endpoints:

  • **"/authors"**
  • **"/quotes"**

Imported Flask, SQLAlchemy, and pandas.

Connected an engine to the SQL database on the AWS server and created multiple routes for endpoint testing (a sketch of app.py follows the route list below):

  • Welcome Page
  • Quotes
  • Authors
  • Tags
  • Top 10 Tags
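
A minimal app.py sketch, assuming the three Postgres tables from the notebook step; the AWS connection string is a placeholder, and the paths for the tag routes are assumptions (only /authors and /quotes are confirmed above):

```python
from flask import Flask, jsonify
import pandas as pd
from sqlalchemy import create_engine

# Engine connected to the Postgres database on the AWS server (placeholder URI).
engine = create_engine("postgresql://user:password@my-aws-host:5432/quotes_db")

app = Flask(__name__)

@app.route("/")
def welcome():
    """Welcome page listing the available routes."""
    return jsonify(routes=["/quotes", "/authors", "/tags", "/top10tags"])

@app.route("/quotes")
def quotes():
    return jsonify(pd.read_sql("SELECT * FROM quotes", engine).to_dict(orient="records"))

@app.route("/authors")
def authors():
    return jsonify(pd.read_sql("SELECT * FROM authors", engine).to_dict(orient="records"))

@app.route("/tags")
def tags():
    return jsonify(pd.read_sql("SELECT * FROM tags", engine).to_dict(orient="records"))

@app.route("/top10tags")
def top10tags():
    # Most frequent tags, computed in SQL.
    sql = "SELECT tag, COUNT(*) AS n FROM tags GROUP BY tag ORDER BY n DESC LIMIT 10"
    return jsonify(pd.read_sql(sql, engine).to_dict(orient="records"))

if __name__ == "__main__":
    app.run(debug=True)
```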