Posts tagged with "artificial intelligence"

A Dictation App with a CLI Is Exactly What I Needed

Monologue for iOS.

As I mentioned in a recent issue of MacStories Weekly for Club members, I believe that reliable dictation and text-to-speech are largely solved problems in the AI industry right now for most languages. There are certainly subtle differences between the latest models and not-so-subtle discrepancies when you consider local (and free) transcription models versus cloud-hosted (and often expensive) solutions, but by and large, LLMs have “fixed” the problem of fast and high-performance speech-to-text transcription. Whether you’re using Superwhisper, Wispr Flow, Aqua Voice, or a local wrapper for Parakeet or Microsoft’s VibeVoice, chances are that your transcribed text will be more than good enough these days. Just like with regular chatbots, benchmarks matter less and less: it’s the overall user experience that defines products that are otherwise very similar to each other.


Spark Mail Adds a Mac CLI and Agent Skills

About two weeks ago, Spark, the email app by Readdle, was updated with a CLI and a set of agentic skills for Claude Code, Codex, and other agents, allowing them read-only access to messages, calendar events, contacts, and meeting notes. These features were extended again a few days ago with new abilities that added email triage actions and more skills. The approach is clever in its local architecture, which keeps your message data on your Mac while making it available to agents.

CLIs are one of this year’s top app trends, with a wide variety of productivity apps adding them. The reason is simple: agents that work in the Terminal like Claude Code and Codex can use local CLIs, which keeps token usage down because the agent only sees a command’s text output instead of carrying tool schemas with it the way MCP servers do.
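
As a rough illustration of why the CLI approach is token-light (a sketch with a made-up tool schema and made-up output, not Spark's or any real server's definitions), compare the context cost of carrying an MCP-style tool schema against the plain text a CLI command returns:

```python
import json

# A hypothetical MCP-style tool definition an agent would carry in its
# context window (illustrative only; not any real server's schema).
tool_schema = {
    "name": "search_messages",
    "description": "Search the user's mailbox for matching messages.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query string"},
            "limit": {"type": "integer", "description": "Maximum results to return"},
        },
        "required": ["query"],
    },
}

# With a CLI, the agent only pays for the command's text output.
cli_output = "2 unread from billing@example.com (newest: 'Invoice #8841')"

def rough_tokens(text: str) -> int:
    """Crude token estimate: roughly four characters per token."""
    return max(1, len(text) // 4)

schema_cost = rough_tokens(json.dumps(tool_schema))
output_cost = rough_tokens(cli_output)
print(f"schema ~{schema_cost} tokens, CLI output ~{output_cost} tokens")
```

The four-characters-per-token heuristic is only a ballpark, and real tokenizers vary, but the asymmetry is the point: the schema is context overhead on every request, while the CLI's output is the only thing the agent ever sees.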

Spark works with several agents.

Spark isn’t the first to create an email CLI. The Google-created, but “not an official product,” googleworkspace CLI interfaces with Gmail and a bunch of other Google services, offering over 100 skills. The difference is that a CLI like googleworkspace contacts Google’s Gmail servers and acts on your messages in the cloud, whereas Spark’s CLI acts as a remote control for the Spark app itself, managing the messages locally on your Mac and then syncing them back to Gmail via the desktop app.

I’ve worked with both the googleworkspace CLI and Spark’s, and Spark’s is by far the easier one to use because you don’t need to set up a Google Cloud project or deal with OAuth. The only drawback is that the Spark app needs to be open for its CLI to work because everything happens on your Mac. However, as a practical matter, that’s not a limitation that has impacted me since my email app is open when I’d want to use Spark’s CLI or skills anyway.

Read-only actions are available for all users. Triage actions require a Pro subscription.

There are two levels to what Spark offers. The read-only CLI and skills are available to all users, whether or not they subscribe to Spark Pro. Those actions include the ability to search and summarize messages, fetch context, read threads, and view your calendar, contacts, and meeting notes. A Pro subscription adds message drafting, replying, snoozing, pinning, labeling, moving, and archiving, along with team commenting. It’s an excellent set of actions that uses search syntax similar to Gmail’s, which means it should be familiar to many long-time Gmail users straight out of the box.
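
For a sense of what Gmail-style syntax involves, here is a small sketch that splits a query into Gmail's familiar operators. The operators (`from:`, `is:`, `label:`, and so on) are Gmail's; the parser and the example query are illustrative, not Spark's actual code:

```python
import re

# Hypothetical parser for Gmail-style search operators, to illustrate
# the kind of query syntax involved (not Spark's implementation).
def parse_query(query: str) -> dict:
    fields: dict = {}
    terms = []
    for token in query.split():
        m = re.match(r"(from|to|subject|is|label):(.+)", token)
        if m:
            # Operator token, e.g. "from:anna@example.com"
            fields.setdefault(m.group(1), []).append(m.group(2))
        else:
            # Bare search term
            terms.append(token)
    fields["text"] = terms
    return fields

print(parse_query("from:anna@example.com is:unread invoice"))
# {'from': ['anna@example.com'], 'is': ['unread'], 'text': ['invoice']}
```

Anyone who has typed `from:someone is:unread` into Gmail's search box already knows this grammar, which is presumably why Readdle adopted it.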

And there’s more. Readdle has also released a set of recipes and personas, which are open-source skills. The recipes include instructions for morning and end-of-day email reviews, reviewing new senders, catching up on messages after vacation, and more. Personas are more holistic approaches to your inbox that apply to an entire email session and have modes. For example, the Founder persona has Rapid Triage, Aggressive Delegation, and Cross-Team Oversight modes. Other personas include Executive Assistant, Freelancer, and Team Lead. Full details of every recipe and persona are available on Readdle’s GitHub page.

Searching email via the command line.

I’ve spent time using the read-only actions of Spark’s CLI with Claude Code, and it’s an excellent option for automating your email. Setup is simple and fast, and it works well. I’m not sure personas are for me, but there are a bunch of interesting ideas among the recipes, which I intend to explore more and use to create my own skills.

Spark Mail is available as a free download on the Mac App Store. The CLI’s triage actions are exclusive to users who subscribe to Spark Pro, which costs $20/month or $200/year.


Remodex Is the Best Codex Remote Client for iOS (Until OpenAI Releases an Official Codex Mobile App)

Remodex for iOS.

Various OpenAI employees and members of the Codex team have been hinting at a native Codex app for iOS lately. While I very much hope that’s in the cards – especially if the project involves connecting to a remote Mac running the full Codex app – I wanted to highlight an indie utility I’ve been using a lot lately to access my Codex setup on my Mac Studio server from my iPhone.

The app is called Remodex, and it was created by Italian indie developer Emanuele Di Pietro. Remodex, as the name suggests, acts as a remote for the Codex CLI installed on a macOS computer, and it lets you operate your existing projects and chats with a UI that is reminiscent of the official Codex app for Mac. Even better, Remodex is not based on some hack-y workaround: it’s entirely powered by OpenAI’s official (and open-source) Codex App Server.


OpenAI Targets Coding and Knowledge Work with Its New GPT-5.5 Model

OpenAI announced GPT-5.5 and GPT-5.5 Pro today, which it says are faster and able to work more autonomously than the company’s previous models. It’s a message that is sure to interest business users whether their goal is accelerating software development or increasing productivity more generally. Some of the areas that OpenAI says GPT-5.5 and GPT-5.5 Pro excel at include:

  • writing and debugging code;
  • analyzing data;
  • conducting web research;
  • creating business documents such as spreadsheets and presentations;
  • using apps; and
  • juggling multiple tools.

In its press release, OpenAI claims that:

The gains are especially strong in agentic coding, computer use, knowledge work, and early scientific research—areas where progress depends on reasoning across context and taking action over time. GPT‑5.5 delivers this step up in intelligence without compromising on speed: larger, more capable models are often slower to serve, but GPT‑5.5 matches GPT‑5.4 per-token latency in real-world serving, while performing at a much higher level of intelligence. It also uses significantly fewer tokens to complete the same Codex tasks, making it more efficient as well as more capable.

I haven’t tried either model yet, but early reactions seem to support OpenAI’s claims that GPT-5.5 understands user intent better, requiring less precise instructions. The company says it is better at using the tools at its disposal and checking its own work, too. OpenAI says the Pro model takes that up a notch, working faster on more complex tasks, such as programming, research, and document-intensive workflows. Whether the early hype translates into real-world gains that are noticeable in everyday work remains to be seen, but we shouldn’t have long to wait, since GPT-5.5 is rolling out to users now.

GPT-5.5 is available in ChatGPT and Codex to Plus, Pro, Business, and Enterprise subscribers, and GPT-5.5 Pro is limited to Pro, Business, and Enterprise subscribers in ChatGPT. Neither model is available through OpenAI’s API, but the company says they will be soon.


Coding Agents Are Reshaping the App Store

While I think it’s fair to take reports from Appfigures and its cohorts with a large grain of salt, its latest report that the App Store is booming rings true to me. As Sarah Perez reports for TechCrunch, first quarter 2026 app releases were up 60% year-over-year. That’s in line with a surge that occurred at the end of last year and just so happened to coincide with the release of Claude Opus 4.5, the model that ignited a coding boom.

Another interesting tidbit from Appfigures is that the Utilities category moved up the top five chart, and Productivity apps, which were missing from the Q1 2024 and Q1 2025 top fives, made it into this past quarter’s top five.

As Perez reports:

The working hypothesis here is that AI-powered tools, like Claude Code or Replit, could be behind the surge of new launches. It also seems possible that we’re hitting some sort of tipping point in terms of AI usability, where it’s easy enough for people to leverage these tools to build their own desired mobile apps more quickly — or even build their first apps ever.

That hypothesis lines up well with the deluge of app pitches we’ve received at MacStories since the end of last year. At first, 2025 just seemed like an unusually busy fall; we always see lots of new apps when Apple refreshes its OSes, after all. However, this year, the pace never let up. In fact, it accelerated into 2026.

From the view on the ground, this is absolutely the result of AI coding tools. Seasoned developers are releasing new apps more often and updating existing ones faster, and there are more new developers releasing their first apps than ever. Lower barriers to entry and tighter development cycles juiced by coding agents are clearly major factors.

What’s most interesting to me, though, is that the mix of quality apps hasn’t suffered meaningfully. We’ve always been sent a healthy portion of poor-quality apps. But from where I sit today, the tidal wave we’ve seen so far isn’t slop. Maybe that will change, and perhaps we’re insulated from it to some degree, but I would have thought that, at the pace App Store submissions have increased, there would be a big difference in the pitches we receive. So far, not so much. Weird, right?

Permalink

OpenAI Unveils Codex “Superapp” Update with Computer Use, Automations, Built-In Browser, and More

Source: OpenAI.

Today, OpenAI introduced a long list of productivity and coding updates to Codex. I haven’t had a chance to try the new features myself yet, but the demo OpenAI gave me was as impressive as the company’s message was clear: Codex isn’t just for coders anymore.

It was just over a week ago that OpenAI raised $122 billion in financing and announced it was shifting its focus to building a superapp that brings the capabilities of its models into a unified experience. It turns out that app is Codex, OpenAI’s app that, until today, was focused primarily on developing software.

However, according to OpenAI, 50% of Codex’s users were already giving it non-coding tasks to complete. Combined with the OS flexibility of a desktop environment, that made Codex the natural place to bring together a wide range of new productivity and coding features.

On the productivity side of things, the update allows Codex to operate your desktop apps, interacting with interface elements and inputting text, for example. We’ve seen computer use from other AI companies before, but one thing that sets Codex apart is its ability to work in your apps in the background so they don’t steal the focus from whatever app you’re already using.

Codex’s built-in browser. Source: OpenAI

OpenAI has drawn aspects of its Atlas browser into Codex, too. This allows Codex to prototype websites and apps that users can comment on in-line, creating a tight feedback loop for refining designs. Currently, this feature is limited to running sites and apps via a local server setup, but OpenAI says it will be extended to incorporate actions like interacting with the greater Internet, taking screenshots, and stepping through user flows in the future.

Plugins are taking a big leap forward as well, with over 100 being added to the mix. Like the Claude plugins that Anthropic offers, Codex plugins are composed of a bundle of skills, app integrations, and MCP servers. According to OpenAI, the list includes many popular third-party tools and services like the Microsoft suite, Atlassian Rovo, CodeRabbit, Render, and Superpowers. One of my favorite moments in the Codex demo I saw was a prompt that simply asked, “Can you check Slack, Gmail, Google Calendar, and Notion and tell me what needs my attention?” It’s the sort of query that I think a lot of people can relate to as they start a busy day, and it’s all driven by stacking multiple plugins.

Plugins in action. Source: OpenAI.

OpenAI is also testing an enhancement of Codex’s memory feature as a preview that learns from you as you work. Codex will pick up on your preferences, corrections you make, and context from the tasks you give it. This is the sort of feature that is hard to demo, so I don’t have a good sense for it yet, but I expect that over time, its practical utility will become more clear.

One place OpenAI says Codex’s enhanced memory system will help is with new proactive suggestions. As the app learns your preferences and work patterns, it will offer suggestions on what to do next or where to pick up where you left off. Again, how well this will work in practice remains to be seen, but this is exactly the sort of thing that has made OpenClaw so popular. Having an agent that understands your preferences and accesses your messages, files, and other data in a proactive way can be incredibly useful if done well.

Automations. Source: OpenAI.

Automations have been expanded, too, allowing Codex to use past threads and schedule tasks over days or weeks. These heartbeat automations stay in the same Codex thread and can be modified by the model itself, allowing it to schedule its own follow-ups – again, very much like OpenClaw.

Also new to Codex is support for gpt-image-1.5 for creating image assets as part of workflows like creating presentations, website mockups, and product concepts.

Developers get new sidebar tools and more. Source: OpenAI

Although the focus of today’s update is on productivity, developers haven’t been forgotten. New development features include:

  • Fast frontend iteration using a combination of the in-app browser, computer use, and image generation tools;
  • Multiple terminal tabs;
  • A file sidebar for previewing PDFs, spreadsheets, slides, and other formats;
  • GitHub PR review support, allowing for review of comments inside Codex;
  • A summary pane that tracks plans, sources, and artifacts in a single view; and
  • Remote devbox SSH, an alpha feature for connecting to remote development environments.

That’s a lot, but with more than three million users per week, Codex has proven its popularity well beyond its core coding audience. I’m still skeptical about how much functionality a single app can support, especially when OpenAI addresses the mobile market. I also wonder whether Codex’s productivity and developer tools can coexist without alienating some segment of the app’s users. However, proactive automation of busy work and sifting through mountains of messages and other data are precisely what I’ve wanted from Codex from the start. I’ve seen what it can do when I’m working on a script or app and can’t wait to apply that to my everyday work, too.

Today’s Codex update is available in the desktop app to users with a signed-in ChatGPT account. Computer use is a Mac-only feature at launch (undoubtedly thanks to macOS’s deep accessibility support that was the basis of the same sort of computer use magic we saw in Sky, which was acquired by OpenAI last year), and a rollout of the new features will happen in the EU later. Personalization features like proactive suggestions and the memory enhancements will be coming to Enterprise, Edu, and EU users soon, too.


Google Releases Gemini for Mac

Google released a native Mac app for its Gemini chatbot today.

The app, which can be launched from your Applications folder, Dock, the menu bar, or a global hotkey, will be familiar to anyone who has used Gemini in a browser. The chatbot supports Gemini 3 in Fast and Thinking modes, as well as Pro mode, which uses Gemini 3.1 Pro. Gemini can also interact with files, the contents of a window, Google Drive, Photos, and NotebookLM. It’s multimodal, too, with support for the generation of text, images, video, and music. Dig a little deeper into Gemini’s menus and you’ll find support for Canvas, Deep Research, Guided Learning, and Personalized Intelligence.

A Gemini mini window is available from the menu bar and a global hotkey.

Even though I just downloaded the app a short time ago, my Gemini chat history was immediately available in the app. The history appears in the app’s sidebar along with a search field, My Stuff, which includes things like images and video generated in the past, and access to your account. The app is written in Swift, which was a pleasant surprise.

All my past prompts were immediately available in the new Gemini Mac app.

I’ve only just begun testing Gemini for Mac, but I can already tell that it’s a cut above my hand-crafted single-purpose Safari web app solution. All the same tools found on the web are here, but in a native wrapper, which I appreciate. If you use a Mac and Gemini, the new app is well worth giving a try.

Gemini for Mac is available as a free download from Google.


Claude Mythos Preview Will Only Secure Part of the Internet

Yesterday, Anthropic announced Claude Mythos Preview, a new general-purpose model that it says is exceptionally good at finding security vulnerabilities in code. In fact, the model is so good that Anthropic has decided not to release Mythos Preview to the general public. Instead, it’s being released to a select group of companies that control OSes and other critical software.

Anthropic found thousands of vulnerabilities across every major OS and web browser with Mythos Preview, but used these three examples to illustrate their severity:

  • Mythos Preview found a 27-year-old vulnerability in OpenBSD—which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it;
  • It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem;
  • The model autonomously found and chained together several vulnerabilities in the Linux kernel—the software that runs most of the world’s servers—to allow an attacker to escalate from ordinary user access to complete control of the machine.

A lengthy Frontier Red Team report brings the receipts for security researchers with an in-depth look at what Mythos Preview uncovered and the step change that the new model represents over Opus 4.6:

For example, Opus 4.6 turned the vulnerabilities it had found in Mozilla’s Firefox 147 JavaScript engine—all patched in Firefox 148—into JavaScript shell exploits only two times out of several hundred attempts. We re-ran this experiment as a benchmark for Mythos Preview, which developed working exploits 181 times, and achieved register control on 29 more.

As part of a test, Mythos Preview also managed to escape its sandboxed environment, message the researcher conducting the test, and then, outside the parameters of the test, post about the exploit online.

The idea behind Project Glasswing, whose participants include Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, is to give them a head start at securing their systems before similar models emerge and are exploited for cyberattacks. If Mythos Preview’s capabilities are as Anthropic makes them out to be, this seems like the right approach. However, I do worry that with time, it could lead to a two-tier Internet where big tech companies operate in relative security thanks to tools like Mythos Preview, while those without access are left to swim with the sharks.


Roadtripping with ChatGPT Voice Mode

On Saturday, my wife Jennifer and I drove to Blowing Rock, a quaint little town in the Blue Ridge Mountains. We’d been there once before, but didn’t know the town well, so as we headed west I poked at the ChatGPT icon on my dashboard to give the app’s new CarPlay integration a try. I asked:

What activities would you recommend for a day trip to Blowing Rock, North Carolina?

What I got back was a short but good list of highlights including a hike, a visit to the Blowing Rock cliffside overlook, a few restaurants, a coffee shop, and some local shops. It was similar to a list of activities I’d looked up before we left using Claude. So far, so good.

I switched back to Apple Maps and was thinking I probably wouldn’t use ChatGPT in my car very often, but that it could come in handy for similar requests, when things got a little creepy. I explained to Jennifer that ChatGPT’s CarPlay feature was new, and I had been meaning to check it out all week. Then, just as I’d said I thought it had done a pretty good job, a voice interrupted. It was ChatGPT’s voice mode saying it was glad I liked it.

You see, just like a phone call doesn’t drop when you switch apps in CarPlay, neither does ChatGPT. I suppose I should have anticipated that the mic would remain live, but I didn’t. Nor did I notice the End button in the corner of the screen; I was driving, not studying the app’s UI.

I take it as a positive sign that I didn’t expect ChatGPT to follow me back to Apple Maps. I treat chatbots like I do any app. Give it some input, and you get an output. Close the app, and you’re done. It’s not my little robot buddy. It’s a tool like any other app.

Of course, that’s not how the voice modes of these chatbots are designed to work. Chats are meant to be an engaging back and forth. But having ChatGPT jump in on our one-on-one conversation while driving down the highway was too much. Suddenly, it felt like something else was in the car eavesdropping on us.

The experience was a good lesson in the balancing of utility and social norms around AI tools. Useful as they can be in some situations, their developers need to be more mindful of user expectations and provide better cues about how they work to avoid uncomfortable surprises. The recommendations we got from ChatGPT were good, but I also don’t expect it will get a second chance on our family road trips anytime soon.