Frog Parser – Parsing Web Pages Effortlessly

Today I want to share some details about an innovative tool for working with data: “Frog Parser” (project address: https://frog-parser.com). It is a web scraper designed to make extracting data from websites as convenient, understandable and efficient as possible. I have tried to move away from complex programming paradigms, making the tool accessible to users with basic computer science knowledge while keeping the flexibility and power needed for complex scraping tasks.

“Frog Parser” is built around projects, uses hierarchical command structures, and follows the natural behavior of a user navigating web pages. In this article I will describe the product’s key features, its command language, the different views of a workflow document, and the benefits of using “Frog Parser” in your projects. Let’s dive into the details! Here are the main characteristics of the product:

  1. Parsing is treated as a set of commands organized into hierarchical groups by meaning; such a group is called a workflow.

  2. From the end user’s point of view, the parsing process is represented as a document called a “Frog Parser Workflow Document”.

  3. A project approach is applied: documents are grouped into projects.

  4. The parsing process description language, “Frog Parser Command Language”, can be compared with assembly language – it is a set of commands with parameters.

  5. A “Frog Parser Workflow Document” can have several representations: tabular, schematic, and text.

  6. A “Frog Parser Workflow Document” should be a human-readable text file, so that it is easy to store in a version control system and to edit by hand.

  7. The “Frog Parser Command Language” should be simple and understandable for a user with basic computer science knowledge: flexible enough to perform real tasks, but not overloaded with the complex programming paradigms common among professional developers.

  8. Parsing data from a website produces a “Dataset”, which can be exported from its internal representation to common formats such as CSV or MS Excel.

  9. Interaction with a web page follows generally accepted web development practices. For example, elements on a page can be located by ID, HTML tag name, HTML attribute name, CSS class, or CSS selector.

  10. Web page navigation mimics the behavior of a real user: clicking buttons, entering data into fields, and so on.

  11. To work with “Frog Parser”, the end user does not need to install any additional software: an Internet connection and an up-to-date web browser (Google Chrome, MS Edge) are enough.

  12. Parsing tasks run in batch mode, and the user interface lets you monitor their execution status.

  13. “Frog Parser” is available by subscription. Without an active subscription, the product works in demo mode and adds watermarks to the output.
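Item 9 lists several ways to locate an element, and each of them can be expressed as a CSS selector. The following is a minimal Python sketch of that translation for a `findBy` object. `FindByTagName`, `FindByClassName` and `FindByCssSelector` appear in the example workflow later in this article; `FindById` and `FindByAttributeName` are hypothetical type names used only for illustration, and this is not Frog Parser’s actual implementation.

```python
import re

# Simplified CSS identifier check; full CSS escaping is out of scope for this sketch.
SIMPLE_IDENT = re.compile(r"-?[A-Za-z_][A-Za-z0-9_-]*")


def to_css_selector(find_by: dict) -> str:
    """Translate a findBy object into an equivalent CSS selector."""
    kind, value = find_by["@type"], find_by["value"]
    if kind == "FindByCssSelector":
        return value  # already a selector: pass it through unchanged
    if not SIMPLE_IDENT.fullmatch(value):
        raise ValueError(f"identifier needs CSS escaping: {value!r}")
    if kind == "FindById":  # hypothetical type name
        return f"#{value}"
    if kind == "FindByTagName":
        return value
    if kind == "FindByClassName":
        return f".{value}"
    if kind == "FindByAttributeName":  # hypothetical type name
        return f"[{value}]"  # matches any element that has this attribute
    raise ValueError(f"unknown findBy type: {kind!r}")


print(to_css_selector({"@type": "FindByTagName", "value": "product-card"}))  # product-card
print(to_css_selector({"@type": "FindByClassName", "value": "product-card__title"}))  # .product-card__title
```

Reducing every strategy to one selector form means the rest of a scraper needs only a single lookup routine, such as the browser’s `querySelectorAll`.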

[Screenshot: project list]

[Screenshot: workflow list]

[Screenshot: editing a workflow as a table, example 1]

[Screenshot: editing a workflow as a table, example 2]

[Screenshot: editing a workflow as a table, example 3]

[Screenshot: editing a workflow as JSON]

[Screenshot: workflow execution details]

[Screenshot: output dataset]

[Screenshot: data exported to MS Excel]
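Item 8 in the list above says a Dataset can be exported to CSV. Frog Parser does this from its user interface. Purely to illustrate what such an export involves, here is a minimal Python sketch that assumes a dataset is held as a list of column definitions (mirroring the `metadata.columns` section of `CreateDatasetCommand` in the example workflow below) plus a list of rows keyed by column name. That in-memory layout, the `export_dataset_to_csv` function and the sample rows are all hypothetical.

```python
import csv
import io

# Column definitions copied from the CreateDatasetCommand in the example workflow.
COLUMNS = [
    {"name": "page-number", "type": "Integer"},
    {"name": "product-title", "type": "String"},
    {"name": "product-price", "type": "String"},
]


def export_dataset_to_csv(columns: list[dict], rows: list[dict]) -> str:
    """Write rows (dicts keyed by column name) as CSV text, header first."""
    names = [column["name"] for column in columns]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # raises ValueError if a row has an unknown column
    return buffer.getvalue()


# Hypothetical rows, shaped like the ones CreateDatasetRowCommand would add.
rows = [
    {"page-number": 1, "product-title": "Sample shirt", "product-price": "$10.00"},
    {"page-number": 2, "product-title": 'Sample, "quoted" hat', "product-price": "$5.00"},
]
print(export_dataset_to_csv(COLUMNS, rows))
```

The `csv` module takes care of quoting, so a product title containing commas or quotes is written correctly. Writing an MS Excel file would require a third-party library, so it is left out of this sketch.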

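An example “Frog Parser Workflow Document”. This workflow parses a demo store built on the Shopify e-commerce platform: https://district-theme-demo.myshopify.com/collections/clothing. It opens the clothing collection, reads the title and price of every product card into a dataset, and keeps clicking the “Next page” link until there are no more pages.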

{
  "version": "1.0.0.0",
  "acceptLanguage": "",
  "commands": [
    {
      "@type": "SetConstantValueToVariableCommand",
      "enabled": true,
      "name": "set-implicitly-wait-variable-value",
      "description": "",
      "retryCount": 1,
      "variableType": "Long",
      "variable": {
        "name": "implicitly-wait"
      },
      "value": {
        "@type": "VariableValueLong",
        "value": 3000
      }
    },
    {
      "@type": "SetImplicitlyWaitCommand",
      "enabled": true,
      "name": "set-implicitly-wait",
      "description": "",
      "retryCount": 1,
      "variable": {
        "name": "implicitly-wait"
      }
    },
    {
      "@type": "CreateDatasetCommand",
      "enabled": true,
      "name": "create-dataset",
      "description": "",
      "retryCount": 1,
      "variable": {
        "name": "dataset"
      },
      "metadata": {
        "columns": [
          {
            "name": "page-number",
            "type": "Integer"
          },
          {
            "name": "product-title",
            "type": "String"
          },
          {
            "name": "product-price",
            "type": "String"
          }
        ]
      }
    },
    {
      "@type": "SetConstantValueToVariableCommand",
      "enabled": true,
      "name": "Set constant value to start-page variable",
      "description": "",
      "retryCount": 1,
      "variableType": "Url",
      "variable": {
        "name": "start-page"
      },
      "value": {
        "@type": "VariableValueUrl",
        "value": "https://district-theme-demo.myshopify.com/collections/clothing"
      }
    },
    {
      "@type": "OpenWebPageCommand",
      "enabled": true,
      "name": "Open start-web-page",
      "description": "",
      "retryCount": 1,
      "variable": {
        "name": "start-page"
      }
    },
    {
      "@type": "SetConstantValueToVariableCommand",
      "enabled": true,
      "name": "set-handle-current-page-to-true",
      "description": "",
      "retryCount": 1,
      "variableType": "Boolean",
      "variable": {
        "name": "handle-current-page"
      },
      "value": {
        "@type": "VariableValueBoolean",
        "value": true
      }
    },
    {
      "@type": "SetConstantValueToVariableCommand",
      "enabled": true,
      "name": "set-page-number",
      "description": "",
      "retryCount": 1,
      "variableType": "Integer",
      "variable": {
        "name": "page-number"
      },
      "value": {
        "@type": "VariableValueInteger",
        "value": 0
      }
    },
    {
      "@type": "WhileLoopCommand",
      "enabled": true,
      "name": "do-while-handle-current-page",
      "description": "",
      "retryCount": 1,
      "variable": {
        "name": "handle-current-page"
      },
      "commands": [
        {
          "@type": "IncrementVariableCommand",
          "enabled": true,
          "name": "increment-page-number",
          "description": "",
          "retryCount": 1,
          "variable": {
            "name": "page-number"
          }
        },
        {
          "@type": "FindElementsCommand",
          "enabled": true,
          "name": "find-product-card-elements",
          "description": "",
          "retryCount": 1,
          "findBy": {
            "@type": "FindByTagName",
            "value": "product-card"
          },
          "variable": {
            "name": "product-card-elements"
          }
        },
        {
          "@type": "ForAllLoopCommand",
          "enabled": true,
          "name": "for-all-product-cards",
          "description": "",
          "retryCount": 1,
          "variable": {
            "name": "product-card-elements"
          },
          "itemVariable": {
            "name": "product-card-element"
          },
          "commands": [
            {
              "@type": "FindChildElementCommand",
              "enabled": true,
              "name": "find-product-card-title-element",
              "description": "",
              "retryCount": 1,
              "findBy": {
                "@type": "FindByClassName",
                "value": "product-card__title"
              },
              "variable": {
                "name": "product-card-title-element"
              },
              "parentElementVariable": {
                "name": "product-card-element"
              }
            },
            {
              "@type": "ReadPropertyCommand",
              "enabled": true,
              "name": "read-product-card-title-element",
              "description": "",
              "retryCount": 1,
              "variable": {
                "name": "product-card-title-element"
              },
              "property": {
                "@type": "ElementPropertyText"
              },
              "newVariable": {
                "name": "product-card-title-text"
              }
            },
            {
              "@type": "FindChildElementCommand",
              "enabled": true,
              "name": "find-product-card-price-element",
              "description": "",
              "retryCount": 1,
              "findBy": {
                "@type": "FindByCssSelector",
                "value": ".price"
              },
              "variable": {
                "name": "product-card-price-element"
              },
              "parentElementVariable": {
                "name": "product-card-element"
              }
            },
            {
              "@type": "ReadPropertyCommand",
              "enabled": true,
              "name": "read-product-card-price-element",
              "description": "",
              "retryCount": 1,
              "variable": {
                "name": "product-card-price-element"
              },
              "property": {
                "@type": "ElementPropertyText"
              },
              "newVariable": {
                "name": "product-card-price-text"
              }
            },
            {
              "@type": "CreateDatasetRowCommand",
              "enabled": true,
              "name": "create-dataset-row",
              "description": "",
              "retryCount": 1,
              "variable": {
                "name": "dataset"
              },
              "columns": [
                {
                  "name": "page-number",
                  "columnVariable": {
                    "name": "page-number"
                  }
                },
                {
                  "name": "product-title",
                  "columnVariable": {
                    "name": "product-card-title-text"
                  }
                },
                {
                  "name": "product-price",
                  "columnVariable": {
                    "name": "product-card-price-text"
                  }
                }
              ]
            },
            {
              "@type": "SaveVariableCommand",
              "enabled": true,
              "name": "save-dataset",
              "description": "",
              "retryCount": 1,
              "variable": {
                "name": "dataset"
              }
            }
          ]
        },
        {
          "@type": "CheckElementExistsCommand",
          "enabled": true,
          "name": "check-next-page-link-exists",
          "description": "",
          "retryCount": 1,
          "findBy": {
            "@type": "FindByCssSelector",
            "value": "a[aria-label=\"Next page\"]"
          },
          "variable": {
            "name": "handle-current-page"
          }
        },
        {
          "@type": "IfThenCommand",
          "enabled": true,
          "name": "if-next-page-link-exists",
          "description": "",
          "retryCount": 1,
          "ifVariable": {
            "name": "handle-current-page"
          },
          "thenCommands": [
            {
              "@type": "FindElementCommand",
              "enabled": true,
              "name": "find-next-page-link-element",
              "description": "",
              "retryCount": 1,
              "findBy": {
                "@type": "FindByCssSelector",
                "value": "a[aria-label=\"Next page\"]"
              },
              "variable": {
                "name": "next-page-link-element"
              }
            },
            {
              "@type": "ClickCommand",
              "enabled": true,
              "name": "click-by-next-page-link",
              "description": "",
              "retryCount": 1,
              "variable": {
                "name": "next-page-link-element"
              },
              "clickIfInvisible": true
            }
          ]
        }
      ]
    }
  ]
}
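The commands in this document nest: `WhileLoopCommand` and `ForAllLoopCommand` hold child commands in `commands`, and `IfThenCommand` holds them in `thenCommands`. To see that hierarchy at a glance, here is a minimal Python sketch that prints an indented outline of the enabled commands. It relies only on the field names visible in the example above; any other container fields Frog Parser may support are not handled. To keep it self-contained, it parses an abbreviated excerpt of the document rather than the full file.

```python
import json

# Fields that hold nested command lists in the example workflow.
CHILD_FIELDS = ("commands", "thenCommands")


def outline(commands: list[dict], depth: int = 0) -> list[str]:
    """Return one indented line per enabled command, recursing into children."""
    lines = []
    for command in commands:
        if not command.get("enabled", True):
            continue  # disabled commands (and their children) are skipped
        lines.append(f"{'  ' * depth}{command['@type']}: {command['name']}")
        for field in CHILD_FIELDS:
            lines.extend(outline(command.get(field, []), depth + 1))
    return lines


# Abbreviated excerpt of the workflow above (most fields omitted).
EXAMPLE = """
{"commands": [
  {"@type": "WhileLoopCommand", "enabled": true, "name": "do-while-handle-current-page",
   "commands": [
     {"@type": "IncrementVariableCommand", "enabled": true, "name": "increment-page-number"},
     {"@type": "IfThenCommand", "enabled": true, "name": "if-next-page-link-exists",
      "thenCommands": [
        {"@type": "ClickCommand", "enabled": true, "name": "click-by-next-page-link"}
      ]}
   ]}
]}
"""

print("\n".join(outline(json.loads(EXAMPLE)["commands"])))
# WhileLoopCommand: do-while-handle-current-page
#   IncrementVariableCommand: increment-page-number
#   IfThenCommand: if-next-page-link-exists
#     ClickCommand: click-by-next-page-link
```

To outline the complete document, read it from a file with `json.load` and pass its `commands` list to `outline` in the same way.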

To sum up, “Frog Parser” is a unique tool designed to simplify parsing data from websites. It combines ease of use for people with basic computer science knowledge with the power needed for complex scraping tasks.

With “Frog Parser”, you can extract data faster and more simply thanks to its intuitive command language, multiple views of the workflow document, and export to commonly used formats. In addition, the tool requires no additional software and is available by subscription, which keeps it flexible to use.

All in all, “Frog Parser” offers a versatile solution for working with data: it locates page elements using established web development practices and navigates pages the way a real user would. I hope it helps you improve the efficiency and quality of your data work.

Project address: https://frog-parser.com

Facebook page of the project: https://www.facebook.com/frogparser

Telegram: https://t.me/frog_parser_com

Sincerely,

Novel
