
Testing AI Tools with the Same Prompt

  • Writer: Jessica Yang
  • Sep 25
  • 13 min read

Updated: Sep 29

Part 1: Raw outcomes comparison without edits


I’ve always loved learning from YouTube, but there’s a problem: most of the videos I follow are 20 minutes, 40 minutes, sometimes even over an hour long. They’re full of valuable insights, but finding the time to watch them all — and then remembering the key points — has always been a struggle.

That’s why I started this experiment. I wanted to see if AI design tools could help me build something I’ve always needed myself: a smarter way to learn from YouTube. My idea was to create an assistant that automatically summarizes videos, translates them when needed, and lets me take quick notes so I never lose the most important insights.

Before jumping into development, I decided to run a fun experiment: 👉 take one identical product prompt (designing the dashboard for this app) and feed it into multiple AI tools to see how they perform.


To be clear, everything in Part 1 is based on first impressions — what each tool gave me “out of the box.” I focused on surface-level usability and how the prototypes felt at a glance. In the next round, I’ll dive deeper into the details: content logic, information hierarchy, and how well the generated flows actually support real user needs.


  • Method

    • I wrote one unified product/design prompt describing the dashboard of my YouTube Learning Assistant.


Prompt

# Project Requirement: YouTube Learning Assistant Web App


## Purpose

Build a **responsive web app** (desktop + mobile friendly) that helps users learn efficiently from YouTube videos. The app extracts transcripts, translates them, generates AI summaries, and allows users to highlight and annotate content. This will be an MVP version focusing on **text-based workflows**.


---


## Core Features

1. **Dashboard (Home)**

   - Input YouTube link field

   - Button to fetch transcript

   - Button to generate AI summary (using backend API placeholder)

   - Display recent generated summaries as cards (thumbnail + title + date)

   - Responsive grid/list layout (mobile: 1 column, desktop: 3 columns)


2. **Summary Detail Page**

   - Show original transcript (scrollable)

   - Show translated Chinese transcript (if original is not Chinese)

   - Show AI-generated summary (structured text, longer but clear)

   - Highlight function: select text → highlight in yellow

   - Add note/comment: context menu when highlighting

   - Save highlights + notes (localStorage for MVP)


3. **Notes Page**

   - List of all saved highlights and comments

   - Organized by video title and date

   - Each note can be expanded/edited/deleted

   - Simple tag or folder organization for future scalability


4. **Profile/Settings Page**

   - Minimal (since single-user MVP)

   - Options to:

     - Change app theme color (light/dark, accent color)

     - Language toggle (English / Chinese)


---


## Navigation

- Use a **bottom navigation bar** on mobile, and a **top navbar** on desktop.

- Tabs:

  - Dashboard

  - Summaries

  - Notes

  - Profile


---


## Visual Style

- **Theme**: Light, minimal, friendly

- **Primary color**: Green #2C9C58

- **Secondary color**: Soft gray #F5F7FA

- **Accent**: Yellow #FFD700 for highlights

- **Typography**: Sans-serif (Nunito / Inter), medium weight for headings, normal for body

- **Spacing**: Consistent 16px grid, cards with 20px padding

- **Cards**: Rounded corners (16px), subtle shadow

- **Icons**: FontAwesome (outline style)

- **Interactions**: Smooth transitions (hover, active states)


---


## Technical Requirements

- Framework: React (with TypeScript)

- Styling: Tailwind CSS

- State: React hooks + Context API (for highlights/notes)

- Storage: LocalStorage for MVP

- API Calls:

  - Placeholder endpoints:

    - `/api/getTranscript`

    - `/api/translate`

    - `/api/generateSummary`

- Responsive Design: Must look good on desktop (full-width) and mobile (narrow width)

- Accessibility: Basic ARIA roles for buttons and highlights


---


## Deliverables

1. Fully functional **web prototype** with dummy data integration

2. Clear modular code structure:

   - `Dashboard.tsx`

   - `SummaryDetail.tsx`

   - `Notes.tsx`

   - `Profile.tsx`

   - `components/` for UI elements (Card, Navbar, etc.)

3. Responsive UI preview that simulates desktop + mobile
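
Before getting into the outputs, here's what that state/storage requirement implies in practice. This is a minimal sketch of my own (not any tool's output), assuming the React + TypeScript + localStorage stack above; `NotesProvider`, `useNotes`, and the `yla-highlights` key are hypothetical names:

```tsx
// Hypothetical sketch: highlights/notes state via Context + localStorage.
import { createContext, useContext, useEffect, useState, ReactNode } from "react";

interface Highlight {
  id: string;
  videoId: string;
  text: string;      // the highlighted transcript excerpt
  note?: string;     // optional user comment
  createdAt: string; // ISO date, used for ordering on the Notes page
}

interface NotesContextValue {
  highlights: Highlight[];
  addHighlight: (h: Highlight) => void;
  removeHighlight: (id: string) => void;
}

const NotesContext = createContext<NotesContextValue | null>(null);
const STORAGE_KEY = "yla-highlights"; // illustrative key name

export function NotesProvider({ children }: { children: ReactNode }) {
  // Lazy-load once from localStorage so highlights survive a refresh.
  const [highlights, setHighlights] = useState<Highlight[]>(() => {
    const raw = localStorage.getItem(STORAGE_KEY);
    return raw ? (JSON.parse(raw) as Highlight[]) : [];
  });

  // Mirror every change back to localStorage (the MVP's only persistence).
  useEffect(() => {
    localStorage.setItem(STORAGE_KEY, JSON.stringify(highlights));
  }, [highlights]);

  const addHighlight = (h: Highlight) => setHighlights((prev) => [...prev, h]);
  const removeHighlight = (id: string) =>
    setHighlights((prev) => prev.filter((h) => h.id !== id));

  return (
    <NotesContext.Provider value={{ highlights, addHighlight, removeHighlight }}>
      {children}
    </NotesContext.Provider>
  );
}

export function useNotes() {
  const ctx = useContext(NotesContext);
  if (!ctx) throw new Error("useNotes must be used inside NotesProvider");
  return ctx;
}
```

Any generated prototype that keeps highlights across a page refresh is presumably doing something roughly equivalent to this.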

  • Tools tested:

    • Cursor

    • Lovable

    • Stitch

    • Figma Make

    • UX Pilot

    • Replit

    • V0

    • Polymet (Backed by YC)

    • Rocket

In this first post, I’ll simply share the raw outputs from each tool. Some of them gave me static UI screens, while others produced clickable prototypes that feel closer to a working product. I won’t add too much commentary here — this is more of a showcase of “what comes out of the box” when you feed each tool the same instructions.


In the next post, I’ll go deeper and critique each outcome: what worked well, where it fell short, and how the differences highlight each tool’s design philosophy and strengths.


Evaluation criteria

  • UI Quality: visual style, modernity, polish.

  • UX Flow: how natural the screens feel, navigation, whether it covers the use cases.

  • Content Logic: how well it understands my prompt (e.g., does it generate relevant components like transcript, summary, and notes).

  • Interpretation & Understanding: did it correctly grasp the purpose of the product, or just generate generic screens?

  • Tool Usability: how easy it was to generate, iterate, and tweak.

  • “Aha” Moments: any surprising or delightful details I didn’t expect.

  • Limitations: where it clearly missed the mark.

Raw Results


  1. Lovable


The typography is clean, well-spaced, and highly readable. Navigation works correctly across pages, and both desktop and mobile layouts are handled gracefully without obvious issues. Icons are automatically styled to match the overall design, and there’s strong consistency across all pages. Small micro-interactions, like cards lifting slightly on hover, add polish. Altogether, it feels simple, organized, and visually coherent. A detail I really liked is that Lovable also auto-generated a small stats section on the dashboard, showing numbers like “Videos Processed,” “Time Saved,” “Notes Created,” and “Highlights.” It’s a subtle touch, but it makes the product feel more alive, as if it’s already tracking my learning activity.


One limitation is that while you can preview and auto-switch between desktop, tablet, and mobile views, you can’t choose a specific phone screen size, which reduces flexibility when testing.






The text structure is clear and easy to follow. Page layouts show good hierarchy, so users can quickly scan and understand what’s happening. Lovable even added useful features automatically, such as filters by type and date on the Summaries page, and an “Auto-translate videos” toggle in Settings for automatically converting non-Chinese videos to Chinese. These additions show that the tool not only generated UI but also inferred some practical product requirements from the brief.




Using Lovable was very straightforward. Its prototype stood out for its fast generation speed and for surfacing very concrete time metrics (e.g., "Worked for 5m 23s", "Thought for 11 seconds"). The platform also added icons and navigation automatically, saving setup time, although some icons are inaccurate (e.g., it uses the widely recognized "Settings" gear icon for "Highlights"). The functionality is limited to navigation and some basic features like edit and delete; it doesn't support switching languages or dark mode. Publishing is also easy. It generated a product name, LearnTube, and even added a logo. While the logo itself isn't particularly well designed, it does make the prototype feel more like a real product.


AI summary feature:


The AI demonstrates an ability to generate coherent, topic-specific educational content that mimics a lecture note format. The summary is surprisingly structured and clear. It breaks content into well-defined sections. This makes it very easy for a beginner to digest and even gives the impression of being pulled from a well-curated textbook. However, while it’s strong on breadth, it lacks depth. The bullet-point style improves readability and avoids overwhelming users with dense paragraphs. From a UI/UX perspective, the layout is clean and consistent. The use of bold text for the three main types, and section headers like Neural Networks and Deep Learning, helps scanning and keeps cognitive load low. The “Key Takeaway” at the bottom is a nice touch, closing the summary with a clear, motivating message.

  2. Replit




Replit generated a fairly standard interface that feels functional but less polished than Lovable's; it reads more like a wireframe than a polished high-fidelity design. One noticeable issue is that the Dashboard and Summaries pages are identical, which looks like a bug. The navigation bar also suffers from low color contrast, which makes it harder to read. On the positive side, the Notes & Highlights page works well, with actual highlight and note functionality already implemented, which was a pleasant surprise. Switching between light and dark mode is also available, which adds flexibility for different users. I actually liked the dark-mode color: not black, but a dark navy blue.



The implementation shows surprising depth. Unlike Lovable, which focused more on UI polish, Replit actually generated a working feature for highlights and notes. The light/dark mode toggle is also fully functional, which makes the product feel more adaptable. That said, the duplication of Dashboard and Summaries reduces content clarity and makes the flow less logical.
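
For a sense of what Replit got working here, a select-to-highlight interaction can be wired with the browser's Selection API. This is my own reconstruction under that assumption, not Replit's generated code; it reuses the hypothetical `useNotes` context from the earlier sketch:

```tsx
// Hypothetical sketch of a select-to-highlight interaction.
import { useNotes } from "./NotesContext";

function TranscriptView({ videoId }: { videoId: string }) {
  const { addHighlight } = useNotes();

  // On mouse-up, capture whatever text the user selected and save it.
  const handleMouseUp = () => {
    const selection = window.getSelection();
    const text = selection?.toString().trim();
    if (!text) return; // nothing was selected
    addHighlight({
      id: crypto.randomUUID(),
      videoId,
      text,
      createdAt: new Date().toISOString(),
    });
    selection?.removeAllRanges(); // clear the native blue selection
  };

  return (
    <div onMouseUp={handleMouseUp} className="max-h-96 overflow-y-auto">
      {/* Transcript paragraphs render here; saved excerpts can be re-wrapped
          in <mark className="bg-yellow-300"> on the next render, matching the
          prompt's yellow highlight requirement. */}
    </div>
  );
}

export default TranscriptView;
```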




Replit is practical and clear in guiding the next steps. The left-side chat panel explicitly instructs the user to connect an API, which lowers the barrier for developers to move forward. Unlike Lovable, Replit allows you to preview the app on specific devices (e.g., iPhone SE, iPhone 16 Pro, Pixel 9, Samsung Galaxy S25), which is a big plus for testing responsiveness across different screen sizes. This makes it more useful for realistic multi-device design checks, though the overall generated UI is less inspiring.


The product name it generated is as straightforward as it gets, YouTube Learning Assistant, and there's no logo. OK, fine.


The free plan doesn't support publishing; you have to upgrade for that.


AI summary feature:


Replit’s AI-generated summary is structured, practical, and easy to follow. It does a good job of breaking down the learning process step by step and connecting each type of ML to real-world applications, which makes it feel more actionable than just a concept list. However, the formatting and phrasing carry a very “AI-generated” feeling — overly rigid, bullet-heavy, and slightly mechanical in tone. It reads more like an outline from a textbook generator than natural study notes a human would create. While informative, it lacks the nuance, flow, and emphasis that would make it feel more engaging or personally written.



  3. V0





V0 automatically named the tool LearnTube and even added a simple logo, which gives the product more identity and matches Lovable's naming approach. The dark mode option is available, but it defaults to a pure black background. While functional, pure black can feel visually harsh and may reduce readability compared to the softer dark gray tones typically used in modern design systems. It supports desktop, tablet, and mobile previews, but I couldn't switch to a specific device size. And overall, the UI design is simply not good enough.
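
If you wanted the softer dark tones just mentioned instead of pure black, one way to express that is a Tailwind theme extension. A minimal sketch, assuming the prompt's Tailwind setup; the hex values are illustrative, not pulled from V0 or Replit:

```ts
// tailwind.config.ts — hypothetical sketch of a softer dark palette.
import type { Config } from "tailwindcss";

const config: Config = {
  darkMode: "class", // toggle by adding class="dark" on <html>
  content: ["./src/**/*.{ts,tsx}"],
  theme: {
    extend: {
      colors: {
        surface: {
          light: "#F5F7FA",  // the prompt's soft gray
          dark: "#111827",   // near-black gray, gentler than pure #000000
          darker: "#0B1220", // dark navy, in the spirit of Replit's choice
        },
      },
    },
  },
};

export default config;
```

Pairing `darkMode: "class"` with a toggle would also cover the light/dark theme option the prompt asks for in Settings.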


V0 leans heavily on copy. The homepage includes a clear title, subtitle, and descriptive text around the video area. This makes the product vision easy to understand, but personally, I’d prefer a more minimal approach since too much text can feel overwhelming. A highlight here is that the Summaries page generates an empty state, which is very useful for setting expectations when no content is available. However, I’m curious how this page would handle larger volumes of videos — particularly what filtering or sorting mechanisms would be applied. Because the Summaries page was empty, I couldn’t test highlight or notes functionality, which seems to be missing in this generated version.


AI summary feature:



V0’s AI-generated summary is clear, structured, and beginner-friendly. It not only covers the fundamentals of machine learning but also adds practical layers like “Getting Started Requirements” and “Action Items,” which make it more actionable compared to Lovable or Replit. This feels closer to real learning guidance rather than just a concept breakdown. The examples used, like “cat/not cat images” or “grouping puzzle pieces,” make abstract ideas approachable, which is a nice touch for users. The UI should be improved though.


  4. Stitch



Stitch turned out to be the weakest performer among all the tools I tested. Instead of directly generating all the pages, it first asked me whether I wanted one page or the full set, which makes sense as a way to give users more control. It named the app StudyTube, but then created four different logos across the four pages — a clear lack of consistency. Even worse, the navigation kept changing from page to page: on the homepage it was Home, Summaries, About; on the summaries page it became Home, My Library, Explore; and on the notes page it shifted again to Home, Videos, Notes, Settings. This constant inconsistency made the prototype feel fragmented and unreliable.

On the functionality side, it didn’t come close to what I asked for. Many expected features were missing, and because it only produced static pages rather than a clickable prototype, I couldn’t even test navigation or interactions. Compared to the other tools, Stitch’s result felt incomplete and messy, and unless there’s a major upgrade, I don’t see myself returning to it.


5. Figma Make



I came into Figma Make with pretty high expectations, especially on the visual design side. The result, however, felt underwhelming — the layouts were clean but ultimately quite plain and uninspired. The Summary page didn't load at all, so I couldn't evaluate how AI summaries would be presented, which is a major gap for this type of app. On the Notes page, notes were displayed without links back to their original videos, which breaks an important part of the workflow — users need to be able to quickly revisit the source material. Another odd choice was placing the Learning Overview inside the Profile page. It feels misplaced there, since it's not really about personal settings but more about progress and usage, which would make more sense on a dashboard or dedicated overview page.

Clicking a video link in the Figma Make prototype opens the corresponding video directly on YouTube. In the long run, I'd prefer the video to play inside the app itself, but for an MVP this solution is perfectly fine — and honestly, it was a pleasant surprise to see the functionality already working.

Overall, while Figma Make got the structure in place, it missed opportunities to leverage its strength in design polish and ended up producing something quite bland, with key UX issues that would block real use.



  6. UX Pilot



UX Pilot is positioned quite differently from the other tools I tried. It works as a design plugin that syncs directly with Figma, and instead of generating an entire product from a single prompt, it requires you to start with a high-level product description and then generate each page individually. Every page costs credits (6 per page), and the free plan gives you 90 credits to experiment with.

The generation process feels more granular and controlled — editing details on a page is relatively easy, and you can also switch between wireframes and high-fidelity designs with one click. However, it’s important to note that UX Pilot is not a prototyping tool. It focuses purely on UI output, producing visuals rather than functional flows. That makes it less useful if you want to test navigation or end-to-end interactions, but potentially more appealing to designers who care about speed and fidelity in visuals.

In some ways, UX Pilot may be the type of tool that designers should pay the closest attention to. It doesn’t replace developers or full prototyping platforms, but it comes much closer to augmenting (or even automating) a core part of a designer’s workflow: quickly producing and iterating on clean UI screens.



7. Cursor



I've been using Cursor for a while to build new AI products, and it usually works well. In this comparison, however, its performance fell short.

What worked

  • It did scaffold a basic nav (Home, Summaries, Notes) with a clean, developer-y look and some sensible components (cards, metrics, highlights list).

  • Highlights UI looks clear and easy to find.

Where it broke down

  • Navigation wiring is wrong: active tabs don’t match page content; Profile is missing; “View/Generate Summary” doesn’t lead to a real summary view.

  • Core flow missing: can’t actually see an AI summary; transcript/translate buttons look present but aren’t hooked to routes or mock APIs (see the wiring sketch after this list).

  • Run/debug friction: repeated “opened in browser” false positives, a dev server that wasn’t truly running, and several manual retries, adding up to the longest setup time of any tool I tried.

  • Reliability: lots of back-and-forth nudging and self-debug attempts from the agent before anything rendered, with regressions along the way.
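
To make "hooked to routes or mock APIs" concrete, here's the kind of wiring that was missing, sketched under assumptions: `react-router-dom` for routing and a `{ summaryId }` response shape, neither of which the prompt mandates.

```tsx
// Hypothetical sketch: the button actually calls the prompt's placeholder
// endpoint and routes to a real summary view on success.
import { useState } from "react";
import { useNavigate } from "react-router-dom";

function GenerateSummaryButton({ videoUrl }: { videoUrl: string }) {
  const navigate = useNavigate();
  const [loading, setLoading] = useState(false);

  const handleClick = async () => {
    setLoading(true);
    try {
      const res = await fetch("/api/generateSummary", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ url: videoUrl }),
      });
      const { summaryId } = await res.json(); // assumed response shape
      navigate(`/summaries/${summaryId}`);    // land on a real summary view
    } finally {
      setLoading(false);
    }
  };

  return (
    <button onClick={handleClick} disabled={loading} aria-busy={loading}>
      {loading ? "Generating…" : "Generate AI Summary"}
    </button>
  );
}

export default GenerateSummaryButton;
```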


For one-click prototyping, Cursor under-delivered on this prompt (and Claude Sonnet in Cursor didn’t help). As an implementation partner once a working scaffold exists, Cursor is still excellent. Here, though, it produced the most friction and the least end-to-end functionality of the tools I tested.



8. Polymet


Polymet is a YC-backed AI design tool that positions itself as an “AI product designer.” Right away, its input flow feels different from the others—you can choose between designing a product, a component, or even designing from an image. That already sets it apart conceptually.




Polymet also added a metrics section on the homepage, similar to Lovable, with stats like videos processed and time saved.





But the tool also has serious issues. For example, the header’s background color clashed with the rest of the layout, and unlike most of the other tools, it didn’t implement a way to edit highlights, which is a core feature for my use case. The summaries page is missing. It feels like it has ideas but hasn’t connected them into a coherent product yet.


So my impression is mixed: Polymet isn’t successful as a product at this stage, but it has a uniqueness and raw experimentation that make me want to keep an eye on it.




9. Rocket


I recently came across a new tool called Rocket, and what immediately stood out was its UI quality. Among all the tools I’ve tried so far, Rocket delivers one of the cleanest and most visually appealing designs. The interface is extremely minimalistic, yet the way information is displayed feels clear, structured, and easy to read. Compared to other tools that often clutter the screen or rely on generic templates, Rocket’s design choices make the overall experience feel smoother and more professional.



Because the UI quality is so good, I went into Rocket with high expectations. But the actual results were disappointing. For example, on the homepage the icons are missing, the video input field is displayed without a proper border, and the notes page suffers from major spacing and color issues. One unique detail is that the notes page automatically offered both a table view and a card layout. While the idea is interesting, in practice it doesn’t really make sense — it risks hiding or obscuring important content. I’ll dive deeper into these issues in Part 2 of my analysis.




What I learned from this first round is that AI design tools are exciting, but they’re not magic. Some of them gave me beautiful layouts in minutes, but lacked depth in functionality. Others gave me working prototypes, but required a lot of back-and-forth. The real value wasn’t in any single tool, but in how I combined them — and in how clear I was about my own product vision.


For the next step, I want to explore: How hard is it to actually adjust these drafts into the product I want?

I’ll look at:

  • UI adjustments: How easy is it to change the visual style, layout, or consistency issues?

  • Feature gaps: For example, missing highlight-to-video links, or wrong navigation flows — can I fix these quickly or is it a deep rebuild?

  • Practicality: Does the tool support iteration smoothly, or do I end up starting from scratch?

This will be less about critiquing what AI gave me, and more about evaluating how realistic it is to co-design with these tools to build products that look good and deliver a seamless user experience.


 
 
 
