24 datasets for retail and ecommerce


Product datasets

  • Fashion-MNIST: Ideal for product categorization. Fashion-MNIST contains 60,000 training images and 10,000 test images of fashion products in 10 classes.
  • Innerwear Data from Victoria’s Secret and Others: Data from 600,000+ underwear items retrieved from popular retail sites. Includes product description, price, category, rating and more.
  • Electronic Products and Pricing Data: Contains a list of over 7,000 electronic products.
  • Men’s Shoe Prices: List containing 10,000 men’s shoes and prices.
  • Women’s Shoe Prices: List containing 10,000 women’s shoes and prices.
  • eCommerce Item Data: Suitable for recommendation systems. This dataset contains part numbers and related product descriptions from an outdoor clothing brand's product catalog.
  • Fashion Products on Amazon.com: This is a pre-crawled dataset created by retrieving data from Amazon. It consists of approximately 22,000 fashion items on Amazon.
  • E-commerce Tagging for Clothing: Contains images from ecommerce sites with bounding boxes drawn around shirts, jackets, sunglasses, etc. It contains 907 items, of which 504 items were tagged manually.

Retail Transaction Datasets

  • Online Retail Dataset (UCI Machine Learning Repository): Contains all transactions for a UK online retail company between 01/12/2010 and 09/12/2011.
  • Brazilian E-Commerce Public Dataset: Contains about 100,000 anonymized orders placed through Olist from 2016 to 2018 at multiple marketplaces in Brazil. It covers many dimensions of each order, from status, price, payment and freight performance to written customer reviews.
  • Online Auctions Dataset: Retail dataset containing eBay auction details for Cartier watches, Xbox game consoles, Palm Pilot M515 PDAs and Swarovski beads.
  • Retailrocket Recommender System Dataset: This data was collected from a real ecommerce website over 4.5 months. It contains information about visitor behavior, including events such as clicks, shopping cart additions, and transactions.
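
Behavior logs like the Retailrocket events (clicks, cart additions, transactions) are often summarized as a conversion funnel. The following is a minimal sketch of that idea, not the dataset's official tooling: the rows are made up, and the tuple layout and the event names "view", "addtocart" and "transaction" are assumptions to check against the actual files.

```python
from collections import Counter

# Hypothetical sample rows shaped as (visitor_id, event, item_id).
# The event names below are assumptions, not confirmed by this article.
events = [
    (1, "view", 101),
    (1, "addtocart", 101),
    (1, "transaction", 101),
    (2, "view", 101),
    (2, "view", 102),
    (3, "view", 103),
    (3, "addtocart", 103),
]


def funnel_counts(rows):
    """Count how many events of each type occur in the log."""
    return Counter(event for _, event, _ in rows)


def conversion_rate(rows, start="view", end="transaction"):
    """Share of visitors with a `start` event who also have an `end` event."""
    started = {visitor for visitor, event, _ in rows if event == start}
    finished = {visitor for visitor, event, _ in rows if event == end}
    if not started:
        return 0.0
    return len(started & finished) / len(started)


counts = funnel_counts(events)
rate = conversion_rate(events)
print(counts)
print(f"view-to-purchase conversion: {rate:.2%}")
```

Counting distinct visitors rather than raw events keeps one very active shopper from inflating the rate.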

Ecommerce search relevance


  • ECommerce Search Relevance: Contains image URLs, each product's rank on the page, a description of each product, the search query that led to each result, and more, from five major English-language e-commerce sites.
  • Best Buy Search Queries NER Dataset: Contains manually tagged search queries from bestbuy.com. Phrases in the queries are tagged with important entities such as brand, model name and category name.

Customer reviews


  • Women’s E-Commerce Clothing Reviews: Another great resource for e-commerce data, this Kaggle dataset contains 23,000 real customer reviews and ratings. Since this is real business data, all information has been anonymized: references to the company in the review title and body were replaced with “retailer”.
  • Amazon Commerce Reviews Set: This retail dataset is used for authorship identification in online Writeprint, a new area of research in pattern recognition. To test the robustness of classification algorithms, the dataset's authors identified the 50 most active users who frequently posted reviews.
  • Multidomain Sentiment Analysis Dataset: An older dataset containing product review data by product type and rating. Reviews contain star ratings (from 1 to 5 stars), which can be converted to binary labels if necessary.
  • Amazon and Best Buy Electronics: List of over 7,000 online reviews of 50 electronic products. In addition to the review itself, the data set includes date, source, rating, title, reviewer metadata, and more.
  • Grammar and Online Product Reviews: Contains a list of over 70,000 reviews that can be used for several machine learning use cases. For example, you can evaluate how the quality of the description affects the positive and negative reviews of a product on the Internet.
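
The conversion of 1-to-5 star ratings into binary labels mentioned above can be sketched as follows. The threshold is an assumption for illustration (4 and 5 stars positive, 1 and 2 negative, 3 dropped as neutral), and the sample reviews are made up. None of the datasets listed here prescribes this exact rule.

```python
def stars_to_binary(stars, neutral=3):
    """Map a 1-5 star rating to 1 (positive), 0 (negative) or None (neutral).

    The neutral cut-off is an illustrative assumption, not a dataset rule.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating out of range: {stars}")
    if stars == neutral:
        return None
    return 1 if stars > neutral else 0


# Hypothetical (review text, star rating) pairs.
reviews = [
    ("Great fit", 5),
    ("Fell apart after a week", 1),
    ("It is fine", 3),
    ("Pretty good", 4),
    ("Too small", 2),
]

# Label every review, then drop the neutral ones.
labeled = [(text, stars_to_binary(stars)) for text, stars in reviews]
labeled = [(text, label) for text, label in labeled if label is not None]
print(labeled)
```

Dropping the middle rating is a common choice because 3-star reviews tend to mix praise and complaints. Keep them if your task needs a three-way label instead.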



Retail and trade statistics

  • Annual Retail Trade Survey (ARTS): Contains national estimates of total annual sales, operating expenses, and inventories held outside the United States for US retail businesses.
  • Economic Census: Provides a detailed portrait of business activity in industries and communities, from the national to the local level, every five years.
  • E-stats: This US government dataset reports the value of goods and services sold online over open networks such as the Internet.
  • EU External Trade Datasets: Contains information on the size of EU imports, exports and trade surpluses, classified by product as well as by country of origin or destination.
  • ECommerce Sales by Merchandise Category 1999-2015: Contains real census data showing total e-commerce sales by product line item and the cumulative annual growth rate from 1999 to 2015.
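
A growth rate like the one reported in the last dataset can be recomputed from the first and last yearly totals. The sketch below assumes that the figure meant is the compound annual growth rate (CAGR), which the listing does not confirm. The sales numbers are placeholders, not census values.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Placeholder totals for the first and last year (not real census data).
start_sales = 100.0
end_sales = 200.0

growth = cagr(start_sales, end_sales, years=2015 - 1999)
print(f"CAGR 1999-2015: {growth:.2%}")
```

CAGR depends only on the two endpoint values, so check it against the year-by-year figures if a category had an unusual first or last year.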

