
Three Weeks In: What Building an MVP Actually Looks Like


I'm about three weeks into building this thing.

Two weeks of actual development. And I'm rounding the corner toward having a minimum viable product ready for beta testing.

This is the third post in this series, and I want to talk about something that doesn't get discussed much: what the day-to-day problem-solving actually looks like when you're building something new.

Not the dramatic pivots or catastrophic failures. Just... the gradual realization that your initial plan won't work, and the process of figuring out what will.

The Voice Recording Pivot

Yesterday, I was working on the voice notes feature—where users can record quick observations about the person they just met at a networking event.

I started with what seemed like the obvious approach: use the Web Speech API that's built into browsers. Free, native, should work everywhere.

Tested it on my Windows laptop. Worked great. Live transcription appearing on screen, smooth recording, everything I wanted.

Then I tested it on my Android phone.

It didn't work the same way. Words were duplicating. The "live transcription" feature that worked on Windows didn't exist on Android. Not broken—just fundamentally different.

I spent maybe 90 minutes trying to fix it. Made five or six attempts, each time hoping the next tweak would solve it.

It didn't.

Not because I couldn't eventually figure it out. But because I realized I was fighting against how the platform fundamentally works. Android's implementation is just different. I could write platform-specific code to handle both, but then I'd be maintaining two different transcription systems forever.
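For the curious, the duplicated words are a classic symptom of appending each `onresult` delta to a running string. On Android, results can arrive as cumulative snapshots rather than increments, so appending repeats words. A hedged sketch of the workaround I was attempting (function names here are illustrative, not my actual code):

```javascript
// Rebuild the full transcript from the results list on every event,
// rather than appending each new chunk. Appending deltas is what
// produces duplicated words on Android, where the recognizer can
// re-deliver earlier results as part of a cumulative snapshot.
function transcriptFromResults(results) {
  let text = "";
  for (let i = 0; i < results.length; i++) {
    // Each result holds one or more alternatives; take the best one.
    text += results[i][0].transcript;
  }
  return text.trim();
}

// Browser wiring (sketch; SpeechRecognition is still vendor-prefixed
// in Chromium-based browsers).
function startDictation(onTranscript) {
  const Impl = window.SpeechRecognition || window.webkitSpeechRecognition;
  const rec = new Impl();
  rec.continuous = true;      // keep listening across pauses
  rec.interimResults = true;  // live partials (desktop Chrome behavior)
  rec.onresult = (event) => onTranscript(transcriptFromResults(event.results));
  rec.start();
  return rec;
}
```

Even with that approach, the deeper problem remains: interim results simply behave differently per platform, which is what pushed me toward a service that behaves the same everywhere.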

So I looked into paid transcription services. Found Deepgram—supports 36 languages, works identically across all devices, handles the complexity I was trying to build myself.

I switched. Rebuilt it in 90 minutes. It works now.
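If you're wondering what the swap roughly looks like: Deepgram exposes a REST endpoint for prerecorded audio. A minimal sketch, with the parameters, audio format, and helper names being illustrative assumptions rather than my production code:

```javascript
// Hedged sketch: send a recorded audio blob to Deepgram's
// prerecorded-transcription endpoint and pull out the transcript.
const DEEPGRAM_URL = "https://api.deepgram.com/v1/listen";

// Build the request URL with query parameters (e.g. language, punctuation).
function deepgramRequestUrl(params) {
  const query = new URLSearchParams(params).toString();
  return query ? `${DEEPGRAM_URL}?${query}` : DEEPGRAM_URL;
}

async function transcribe(audioBlob, apiKey) {
  const res = await fetch(deepgramRequestUrl({ language: "en", punctuate: "true" }), {
    method: "POST",
    headers: {
      Authorization: `Token ${apiKey}`, // Deepgram uses token auth
      "Content-Type": "audio/webm",     // match whatever you recorded
    },
    body: audioBlob,
  });
  const data = await res.json();
  // The transcript lives under the first channel's first alternative.
  return data.results.channels[0].alternatives[0].transcript;
}
```

The point isn't this exact code. It's that one request shape works identically on Windows, Android, and iOS, which is the whole reason for the switch.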

You Don't Know What You Don't Know

Here's the thing: I didn't make a "wrong" choice initially. I made a choice based on what I knew at the time.

Free native API that works in browsers? Sounds perfect for an MVP. Why pay for something you can get for free?

Because "free" comes with constraints you don't discover until you're already building with it.

This isn't about the dramatic lesson of "free isn't actually free." It's simpler than that: you just don't know what you don't know until you start building.

Every project has these moments. You make a plan based on research and assumptions. You start implementing. Reality provides feedback. You adjust.

Not a crisis. Not a failure. Just... the process.

The Actual Takeaway

I started this post wanting to share the voice recording pivot story. But the real insight isn't about choosing Deepgram over Web Speech API.

It's that building something real is inherently messy and uncertain.

You make plans. You encounter reality. You adjust. You keep moving.

You don't know what you don't know. And that's okay. You figure it out as you go.

Not black and white. Just gray. Just the process of building something that works well enough to put in front of real users.

And then iterating from there.

The Real Progress: What's Actually Done

Let me step back and give you the full picture of where things actually stand.

Timeline:

  • Week 1: Research, architecture planning, setting up the development environment
  • Weeks 2-3: Building core features

What's working now:

  • ✅ Take photo of business card
  • ✅ AI extracts contact information (Anthropic Claude)
  • ✅ User can edit/correct the extracted data
  • ✅ Voice recording with transcription (Deepgram)
  • ✅ Works on Windows, Android, iOS

What I'm working on this week:

  • 📝 Email draft generation (using the voice note + contact data)
  • 📝 The mailto: flow (how the draft appears in their email client)
  • 📝 Making sure it works on both mobile and desktop
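The mailto: piece is conceptually simple: encode the drafted subject and body into a URL and hand it to the user's email client. A small sketch (function name is illustrative; the real work is in the edge cases around encoding and client behavior):

```javascript
// Build a mailto: URL that opens the user's email client with a
// pre-filled draft. Subject and body must be percent-encoded so
// spaces, newlines, and punctuation survive the handoff.
function buildMailtoUrl(to, subject, body) {
  return `mailto:${to}` +
    `?subject=${encodeURIComponent(subject)}` +
    `&body=${encodeURIComponent(body)}`;
}
```

On the web, navigating to that URL (or setting it as an anchor's `href`) is what triggers the email client, which is part of why I need to verify the flow separately on mobile and desktop.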

The Bigger Philosophy: Tools Should Just Work

All of this comes back to something I've been thinking about a lot at Hite Labs:

Your tools should work for you, not the other way around.

What does that mean in practice?

It means you shouldn't have to think about the AI transcription happening in the background. It should just capture what you said accurately.

It means you shouldn't have to manually format an email. The tool should draft something professional that you can quickly edit and send.

It means the interface should be simple enough that you can use it while standing in a crowded networking event with one hand.

Powerful technology enabling easy experiences.

That's the goal. Not "look at all this cool AI stuff." Just: "I met someone, recorded a quick note, got a draft email, sent it. Done."

What's Next: Beta Testing

This week, I'm finishing the email generation piece. Making sure the mailto: flow works smoothly on both mobile and desktop.

And then beta testing.

That means getting this in front of people outside my immediate circle. People who are actually networking. People who can try the different features and tell me:

  • What's working?
  • What's not?
  • Where are the bugs?
  • What's confusing?
  • What would be nice to have?

Real input from real people using the app.

Looking for Beta Testers

Here's where I could use help:

I'm looking for 2-3 additional beta testers who:

  • Will be networking in the next 2-3 weeks (conferences, events, meetings)
  • Have time to test the different flows (card scanning, voice notes, email drafts)
  • Are willing to give candid feedback

Not looking for hundreds of people. Just a few who have the time and interest to actually use this and report back.

If that's you—or if you know someone who fits—reach out. I'd genuinely appreciate it.