browser-automation Guide

Name: browser-automation
Author: web-infra-dev

Vision-driven browser automation using Midscene. Operates from screenshots, so no DOM or accessibility labels are needed. Runs in headless Puppeteer and does NOT take over the user's mouse or keyboard. Also supports CDP mode and Bridge mode to connect to an existing Chrome.

Use this skill when the user wants to:
- Browse, navigate, or open web pages
- Scrape, extract, or collect data from websites
- Fill out forms, click buttons, or interact with web elements
- Verify, validate, test, or QA frontend UI behavior
- Take screenshots of web pages
- Automate multi-step web workflows
- Test what was just built and see if it works in the browser
- Connect to Chrome via CDP, DevTools Protocol, or remote debugging
- Connect to the user's Chrome browser ("control my browser", "operate my Chrome")

Powered by Midscene.js (https://midscenejs.com)

233 stars, by web-infra-dev

When to use browser-automation

How to use browser-automation

browser-automation is a Claude skill in the SKILL.md format. Add it to your Claude environment from the source listed below. Once installed, it is available as a user-invocable skill and activates when your task matches its description.

Skill source

https://raw.githubusercontent.com/web-infra-dev/midscene-skills/main/skills/browser/SKILL.md
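
As a rough illustration of "adding" the skill, the sketch below downloads the SKILL.md file from the URL above into a skills folder. The helper names (fetch_text, install_skill) are hypothetical, and the destination ~/.claude/skills/browser-automation/SKILL.md is an assumption about where your Claude environment looks for personal skills, so check your own setup before relying on it.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen

# Source URL taken from the "Skill source" entry above.
SKILL_URL = (
    "https://raw.githubusercontent.com/web-infra-dev/"
    "midscene-skills/main/skills/browser/SKILL.md"
)


def fetch_text(url: str) -> str:
    """Download a URL and return its body decoded as UTF-8 text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def install_skill(
    skills_root: Path,
    name: str = "browser-automation",
    fetch: Callable[[str], str] = fetch_text,
) -> Path:
    """Write SKILL.md to <skills_root>/<name>/SKILL.md and return that path.

    The folder layout is an assumption, not something this listing specifies.
    """
    target = skills_root / name / "SKILL.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(fetch(SKILL_URL), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Assumed location for personal skills; adjust to your environment.
    print(install_skill(Path.home() / ".claude" / "skills"))
```

The fetch parameter is injectable so the function can be exercised without network access, for example by passing a stub that returns fixed text.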

Details

Platform: Claude
Category: Frontend & Web
Invocation: user-invocable
Model: any
Maintainer: web-infra-dev
License: MIT
