AI voice cloning and vishing attacks: What every business must know

Written by:

Sarah Kinbar, Strategic Content Writer

AI voice cloning and vishing attacks: Why this matters in 2025

A high-profile case that cost nearly €1M

In early 2025, fraudsters cloned the voice of Italian Defense Minister Guido Crosetto and used it to call high-profile business leaders. The attackers claimed kidnapped journalists needed urgent ransom and that the government could not pay directly. At least one victim transferred nearly one million euros before police froze the funds.  

This case proved that a convincing voice is no longer reliable proof of identity.

Why voice can no longer be treated as identity

Voice used to be considered a strong signal for trust. But when an attacker can create a realistic clone with just 30 seconds of audio, relying on voice as authentication becomes dangerous.

During a recent webinar, ThreatLocker Chief Product Officer Rob Allen demonstrated an AI-generated version of his own voice. The goal wasn’t to trick anyone but to show how accurate and natural the technology has become. The cloned version captured his pronunciation, cadence, and tone so well that it highlighted how easily a real attacker could weaponize such a tool in a social engineering campaign.

What are AI voice cloning and vishing?

Definition of voice cloning

AI voice cloning uses machine learning to generate synthetic speech that sounds like a specific person. Attackers start by collecting short audio samples from public talks, interviews, webinars, or podcasts. These samples are processed into spectrograms, which map frequency and amplitude over time. Deep learning models such as encoder-decoder architectures or diffusion-based systems learn the patterns that make a voice unique, including pitch, tone, cadence, and even filler words.
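
To make that first step concrete, here is a minimal sketch of the audio-to-spectrogram stage, assuming the open-source librosa library and a local file named sample_speech.wav (both are illustrative choices, not tools referenced in this article):

```python
# Minimal sketch: turn ~30 seconds of speech into a mel spectrogram,
# the frequency-and-amplitude-over-time representation described above.
# Assumes: pip install librosa, and a local file "sample_speech.wav".
import librosa
import numpy as np

# Load about 30 seconds of speech, resampled to 16 kHz mono.
waveform, sample_rate = librosa.load("sample_speech.wav", sr=16000, duration=30.0)

# Compute a mel spectrogram (80 mel bands is a common choice for speech models).
mel = librosa.feature.melspectrogram(y=waveform, sr=sample_rate, n_mels=80)

# Convert power to decibels, a scale closer to human hearing.
mel_db = librosa.power_to_db(mel, ref=np.max)

print(f"Spectrogram shape (mel bands x time frames): {mel_db.shape}")
```

A cloning model never sees raw audio; it learns from representations like this one, which is why a short public clip is enough raw material.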

Once trained, the system can produce speech in that voice using any text input. Some platforms can even generate speech live, allowing attackers to hold two-way conversations that adapt in real time. The output does not need to be perfect. Telephone audio is compressed, and minor artifacts are hidden by static, background noise, or poor connections. To a listener who expects to hear a familiar voice, the result is convincing enough.

How attackers exploit cloned voices in vishing campaigns

Vishing, short for voice phishing, relies on manipulating human trust during a phone call. Attackers spoof caller ID so that the number matches a known contact or the person they are impersonating. They prepare a plausible pretext, such as a lost phone, urgent travel, or an emergency funds transfer. The cloned voice provides the authenticity that bypasses natural skepticism.

The attackers’ goals vary but typically include resetting passwords, enrolling new MFA devices, changing banking details, or persuading staff to install remote support tools. Because the voice sounds real, the victim’s instinct to verify is suppressed. In business contexts, the trust attached to authority figures like CEOs or government officials makes this method particularly powerful.

The rise of AI voice-based attacks

A brief history of vishing (from Captain Crunch to VoIP scams)

Voice-based manipulation has a long lineage. In the 1970s, phone phreakers like John Draper, nicknamed Captain Crunch, discovered that a toy whistle could generate the 2600 Hz tone that controlled telephone switches. They used this trick, combined with social engineering, to gain free long-distance calls. Through the 1980s and 1990s, criminals exploited calling cards and private branch exchange systems, often persuading operators to route calls under false pretenses.

The 2000s saw vishing emerge through VoIP, where caller ID spoofing became trivial. Attackers would impersonate banks or telecom providers, urging customers to enter credentials into automated systems. These scams succeeded because they combined technical loopholes with human trust in familiar voices or institutions.

How AI removed the realism and cost barrier

Artificial intelligence removed two major constraints: realism and cost. No longer does an attacker need strong acting skills or long recordings. A 30-second clip can be turned into a reusable voice clone. Tools that once required advanced research labs are now consumer apps or public APIs. The result is that vishing, once rare and manual, has become scalable and repeatable.

Growth of cheap and accessible voice cloning tools

Voice cloning tools exist across a spectrum:

  • Consumer apps that produce passable voices from a short sample.
  • API-accessible platforms that allow bulk scripted generation.
  • Open-source models that can be fine-tuned on personal hardware or cloud servers.
  • Underground marketplaces that sell pre-made clones of executives or celebrities.

Pricing is low. For a few dollars, an attacker can access tools capable of deceiving all but the most careful listeners. Underground services may charge more for custom clones, but costs rarely exceed a few hundred dollars. Compared to the potential payoff of a successful fraud, the investment is negligible.

Real-world incidents and sectors at risk

The Italian case is only one example. In finance, attackers have used cloned voices to authorize urgent transfers, often paired with spoofed texts or emails for confirmation. In healthcare, calls to reset portal credentials have given attackers rapid access to patient data. In IT environments, help desks have been tricked into resetting passwords or enrolling new MFA devices, creating footholds that attackers use to pivot deeper into networks.

The psychology of trust in voices

Why humans trust familiar voices  

Neuroscience shows that voice recognition activates areas of the brain tied to memory and social bonding. We build strong associations between a voice and identity, often stronger than visual recognition. When a voice sounds familiar, we experience a sense of authenticity and comfort.

Attackers exploit this cognitive shortcut. Authority figures command attention, and urgency reduces critical thinking.  

Authority, urgency, and decision-making under pressure

Behavioral economics research shows that when immediacy and authority cues are combined, error rates in decision-making rise sharply. A cloned voice mimicking a CEO during a crisis call delivers both cues at once, the exact recipe for turning an employee into a victim.

How corporate culture can amplify risk

Corporate culture adds another layer. Many organizations expect employees to move quickly when executives request action. A culture of compliance can discourage questioning or delay. Attackers know this and design phone call scripts that make hesitation feel like insubordination. By combining the psychology of trust with organizational norms, vishing becomes highly effective.

The impact on businesses

Financial fraud and wire transfer scams

The most direct impact is financial loss. A single call can move millions of dollars. The FBI’s Internet Crime Complaint Center (IC3) reports billions in losses annually from business email compromise, and AI voice cloning is already being used to upgrade these schemes. Insurers now demand stronger voice verification processes before covering fraud losses.

Data exfiltration through help desk social engineering

Help desks hold the keys to user accounts. An account reset triggered by a cloned voice can lead to compromised email, file shares, and any applications linked to single sign-on. From there, attackers can request additional approvals, spread malware, or exfiltrate sensitive data. Compared to phishing campaigns that target thousands of inboxes, one successful vishing call can yield faster, more valuable access.

Reputation and compliance risks

Beyond direct losses, voice-based fraud damages trust. Customers, partners, and regulators view organizations as careless if they cannot control their identity verification processes. Breach notifications trigger reputational damage, while regulatory frameworks such as GDPR, HIPAA, and PCI DSS impose fines for mishandled data. Legal disputes also emerge over “who approved what” when recordings cannot be trusted. Without strong verification and logging, these cases are difficult to resolve.

How easy is voice cloning today?

  • Data needed: Often just 30 seconds of speech from a public source
  • Consumer, open-source, and underground tools: Widely available
  • Cost to attackers: Negligible compared to potential payouts

Attackers do not need specialized expertise. Many use legitimate AI platforms with prepaid cards and disposable accounts. Others download open-source models that can run on systems with consumer GPUs. Underground vendors advertise ready-to-use clones, often marketed as “celebrity” or “executive” voices, payable in cryptocurrency.

Common tools in a threat actor’s arsenal include text-to-speech APIs, open-source cloning frameworks, caller ID spoofing services, VoIP softphones, PBX hosting services, and data scrapers that harvest audio from public sources. Combined with prewritten libraries of pretexts and rebuttals, these tools form a complete kit for pulling off convincing phone fraud.

Risk modeling and red team scenarios

Security teams use scenarios to model how attacks would unfold. A typical red team exercise might simulate a help desk account reset via a cloned CEO voice. The attacker begins with a short audio clip from a public event, trains a clone of the CEO’s voice, and writes a script for the phone call. They spoof the CEO’s number and present a fake story about losing account access while traveling. The help desk is asked to reset MFA or grant a temporary password.

If the reset is approved, the attacker gains access to email, requests further approvals, and spreads laterally. The only effective defenses are those that require out-of-band verification. Gut instinct is not reliable once the voice itself has been weaponized.

Why traditional defenses do not stop vishing attacks

Antivirus and endpoint detection tools focus on malware and code execution. Vishing is social engineering through legitimate channels. By the time malware appears, the attacker already has account access. If the process itself treats phone calls as trusted, traditional endpoint security arrives too late.

Business protection checklist

Use MFA with voice-independent verification

Do not enroll or reset MFA through a phone call. Verification should occur through secure portals, SSO push notifications, or ticketing systems. Hardware keys and other phishing-resistant methods reduce exposure. Any re-enrollment should require in-person or video verification with known staff.
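
As a minimal illustration of the rule above, the sketch below encodes the policy that an MFA reset is never honored on the strength of a phone call alone. The channel names are hypothetical placeholders, not any specific product’s API.

```python
# Hypothetical sketch: gate MFA resets on the request channel, so a phone
# call by itself can never trigger enrollment or re-enrollment.
APPROVED_CHANNELS = {"secure_portal", "sso_push", "ticketing_system"}  # placeholder names

def may_reset_mfa(request_channel: str, identity_verified: bool) -> bool:
    """Honor a reset only from an approved channel, with separate identity
    proof (e.g., in-person or video verification by known staff)."""
    if request_channel == "phone_call":
        return False  # a voice alone is never sufficient
    return request_channel in APPROVED_CHANNELS and identity_verified
```

A caller who fails this gate can be redirected into the callback process described next.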

Human verification steps for high-risk requests

Always call back using phone numbers from the corporate directory. Require unique case numbers or callback PINs generated by the ticketing system. Enforce two-person approval for financial transfers or privilege changes. Record calls and attach them to tickets for audit trails.
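
One way to implement the callback-PIN step is sketched below, assuming a ticketing system that can store one record per case; the function names and the 15-minute lifetime are illustrative assumptions, not a prescribed design.

```python
# Hypothetical sketch: the ticketing system, not the caller, generates a
# short-lived, single-use PIN. The help desk reads it back only after
# dialing the number on file in the corporate directory.
import secrets
import time

PIN_TTL_SECONDS = 15 * 60  # illustrative 15-minute lifetime

def issue_callback_pin(ticket_id: str) -> dict:
    """Create a single-use PIN bound to one ticket."""
    return {
        "ticket_id": ticket_id,
        "pin": f"{secrets.randbelow(1_000_000):06d}",  # unpredictable 6 digits
        "expires_at": time.time() + PIN_TTL_SECONDS,
        "used": False,
    }

def verify_callback_pin(record: dict, ticket_id: str, claimed_pin: str) -> bool:
    """Accept the PIN once, for the right ticket, before it expires."""
    if record["used"] or record["ticket_id"] != ticket_id:
        return False
    if time.time() > record["expires_at"]:
        return False
    if not secrets.compare_digest(record["pin"], claimed_pin):
        return False
    record["used"] = True
    return True
```

Because the PIN travels only over the verified callback, a cloned voice on the inbound call never learns it.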

Security awareness training for help desk and finance teams

Train staff to expect AI-cloned voices and to slow down during urgent requests. Provide scripts that mirror attacker tactics such as travel delays or executive pressure. Run tabletop exercises and live simulations. Publish a clear rule: no reset or financial movement is ever approved on a single call or through a single person.

How ThreatLocker reduces risk

The ThreatLocker security stack provides layered defenses that break the attack chain.

  • Application Allowlisting ensures that only approved software runs. If an attacker persuades a user to install a remote support tool, it is blocked by default.
  • Ringfencing limits what trusted apps can do. Even if a session is opened, it cannot be used to access files or launch other apps without authorization.
  • Network Control acts as a host-based firewall. It segments endpoints and servers, preventing lateral movement once a single account is compromised.
  • Storage Control restricts access to data across local folders, shares, USB devices, and cloud storage. Even with stolen credentials, attackers cannot reach sensitive files without policy approval.
  • Elevation Control grants privileges to applications rather than users. Attackers cannot trick employees into running tasks with admin rights if elevation is not tied to the user account.
  • Configuration Manager enforces security baselines, such as disabling unused services and enforcing lockouts. These measures reduce the surface area available to attackers.
  • Detect and Cloud Detect provide real-time monitoring of endpoints and Microsoft 365 environments. Suspicious logins, risky behavior, and MFA events trigger alerts before damage escalates.
  • Cyber Hero MDR connects organizations to a live response team that validates and neutralizes threats quickly. Rapid human intervention reduces the window of exposure.

By combining these controls, organizations can ensure that even if a cloned voice persuades someone to act, that action is prevented from leading to compromise.

What comes next for AI voice attacks

The next wave will be more sophisticated. Attackers are experimenting with live AI rebuttals, where the system answers verification questions in real time. Deepfake video combined with voice will allow realistic live video calls.

Defensive technologies are emerging. Researchers are developing watermarking systems that embed identifiers in synthetic speech, though adoption is not widespread. Legal and regulatory systems are also evolving. Courts and compliance frameworks will need to address whether voice recordings remain admissible evidence while cloning remains widespread.

Request your 30-day trial of the entire ThreatLocker platform today.

Try ThreatLocker