Vishing attacks become second-most observed initial infection vector
A person’s voice used to be considered a strong signal of trust. Now, however, an attacker can create a realistic voice clone from just 30 seconds of audio, which means voice alone can no longer serve as authentication.
While exploits remain the most common initial infection vector, Mandiant reports that highly interactive vishing attacks recently became the second-most observed vector, accounting for 11% of all intrusions.
The rise of voice phishing attacks, and their efficacy, is the result of advances in AI voice cloning, and their potential reach is deeply worrying.
In early 2025, fraudsters cloned the voice of Italian Defense Minister Guido Crosetto and used it to call high-profile business leaders. The attackers claimed kidnapped journalists needed urgent ransom and that the government could not pay directly. At least one victim transferred nearly one million euros before police froze the funds.
In late 2025, ThreatLocker CEO Danny Jenkins warned of this growing threat with a video of him speaking Spanish—a language he doesn’t speak.
In today’s threat landscape, a familiar or convincing voice is no longer reliable proof of identity.
What is vishing?
Vishing definition
Vishing (voice phishing) is a form of phishing in which an attacker social engineers a victim through phone calls or other forms of voice communication.
AI voice cloning
AI voice cloning uses machine learning to generate synthetic speech that sounds like a specific person. Attackers start by collecting short audio samples from public talks, interviews, webinars, or podcasts. These samples are processed into spectrograms, which map frequency and amplitude over time. Machine learning models such as encoder-decoder architectures or diffusion-based systems learn the patterns that make a voice unique, including pitch, tone, cadence, and filler words.
Once trained, the system can produce speech in that voice using any text input. Some platforms can even generate speech live, allowing attackers to hold two-way conversations that adapt in real time. The output does not need to be perfect. Telephone audio is compressed, and minor artifacts are hidden by static, background noise, or poor connections. To a listener who expects to hear a familiar voice, the result is convincing enough.
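To make the spectrogram step above concrete, here is a minimal, stdlib-only sketch of a magnitude spectrogram: the audio is split into overlapping frames and a naive DFT maps each frame to frequency-bin magnitudes, producing the frequency-vs-time representation cloning models train on. The frame size, hop length, and synthetic 440 Hz "voice" are illustrative choices, not parameters from any real cloning system.

```python
import cmath
import math

def magnitude_spectrogram(samples, frame_size=256, hop=128):
    """Naive short-time DFT: split audio into overlapping frames and
    compute the magnitude of each frequency bin (frequency vs. time)."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Bin k corresponds to frequency k * rate / frame_size
        mags = []
        for k in range(frame_size // 2):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames  # one row of frequency magnitudes per time step

# Synthetic stand-in for speech: a 440 Hz tone sampled at 8 kHz
rate = 8000
samples = [math.sin(2 * math.pi * 440 * n / rate) for n in range(1024)]
spec = magnitude_spectrogram(samples)

# The bin nearest 440 Hz should dominate the first frame
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # 14, i.e. 14 * 8000 / 256 = 437.5 Hz, nearest bin to 440
```

Real systems use mel-scaled spectrograms and FFTs rather than this brute-force DFT, but the underlying representation, frequency content over time, is the same.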
How attackers exploit AI voice cloning and trust in vishing campaigns
Vishing relies on manipulating human trust during a phone call. Attackers spoof caller ID so the number matches a known contact or the identity they are impersonating. They prepare a plausible pretext, such as a lost phone, urgent travel, or an emergency funds transfer. The cloned voice provides an authenticity that bypasses natural skepticism.
Unlike email phishing, vishing unfolds in real time, leaving little opportunity to pause and think between exchanges. People are taught to look for obvious warning signs, such as bad grammar, strange formatting, or suspicious links, but AI has made those signs hard to find, and a phone call has no suspicious email header to inspect.
The attackers’ goals vary but typically include resetting passwords, enrolling new MFA devices, changing banking details, or persuading staff to install remote support tools. Because the voice sounds real, the victim’s instinct to verify is suppressed. In business contexts, the trust attached to authority figures like CEOs or government officials makes this method particularly powerful.
What makes vishing hard to defend against in a corporate setting is that most companies spend more time training employees to defend against email phishing than malicious phone calls.
Since vishing is a live interaction, it's harder for employees to know when something feels wrong. Attackers also learn during the phone call itself: a support agent's tone, wording, pauses, hesitations, or even breathing can give away uncertainty. This allows the attacker to adjust their approach. Support staff are especially targeted because their job is to fix and help people.
Level 1 staff are the first people the attacker interacts with. Their access is limited, but they can point the attacker toward colleagues with more access to sensitive systems for the next call. They also receive the least training and the lowest pay of any support tier, which makes them more vulnerable to manipulation or bribery.
Level 2 and 3 staff have greater access and control and receive more training and higher pay than Level 1. They remain vulnerable to social engineering, however, because they handle fewer callers and more complex problems and tend to trust a caller who has already been vetted by Level 1. If Level 1 staff are already helping the attacker, this stage becomes even easier.
Neuroscience shows that voice recognition activates areas of the brain tied to memory and social bonding. Attackers exploit this cognitive shortcut. Authority figures command attention, and urgency reduces critical thinking.
Behavioral economics research shows that when immediacy and authority cues are combined, error rates in decision-making rise sharply. A cloned voice mimicking a CEO during a crisis call creates exactly those conditions.
Corporate culture adds another layer. Many organizations expect employees to move quickly when executives request action. A culture of compliance can discourage questioning or delay. Attackers know this and design phone call scripts that make hesitation feel like insubordination.
By combining the psychology of trust with organizational norms, vishing becomes highly effective.
The rise of vishing attacks
Voice-based manipulation has a long lineage.
In the 1970s, phone phreakers like John Draper discovered that a toy whistle could generate the 2600 Hz tone that controlled telephone switches. They used this trick, combined with social engineering, to gain free long-distance calls. Through the 1980s and 1990s, criminals exploited calling cards and private branch exchange systems, often persuading operators to route calls under false pretenses.
The 2000s saw vishing emerge through VoIP, where caller ID spoofing became trivial. Attackers would impersonate banks or telecom providers, urging customers to enter credentials into automated systems. These scams succeeded because they combined technical loopholes with human trust in familiar voices or institutions.
How AI removed the realism and cost barrier
Artificial intelligence removed two major constraints: realism and cost. No longer does an attacker need strong acting skills or long recordings. A 30-second clip can be turned into a reusable voice clone. Tools that once required advanced research labs are now consumer apps or public APIs. The result is that vishing, once rare and manual, has become scalable and repeatable thanks to AI voice cloning.
Growth of cheap, accessible voice cloning tools
Voice cloning tools exist across a spectrum:
- Consumer apps that produce passable voices from a short sample.
- API-accessible platforms that allow bulk scripted generation.
- Open-source models that can be fine-tuned on personal hardware or cloud servers.
- Underground marketplaces that sell pre-made clones of executives or celebrities.
Pricing is low. For a few dollars, an attacker can access tools capable of deceiving all but the most careful listeners. Underground services may charge more for custom clones, but costs rarely exceed a few hundred dollars. Compared to the potential payoff of a successful fraud, the investment is negligible.
Real-world incidents and sectors at risk
The Italian case is only one example.
In finance, attackers have used cloned voices to authorize urgent transfers, often paired with spoofed texts or emails for confirmation. In healthcare, calls to reset portal credentials have given attackers rapid access to patient data.
In IT environments, help desks have been tricked into resetting passwords or enrolling new MFA devices, creating footholds that attackers use to pivot deeper into networks.
How easy is AI voice cloning?
- Data needed: Often just 30 seconds of speech from a public source
- Consumer, open-source, and underground tools: Widely available
- Cost to attackers: Negligible compared to potential payouts
Attackers do not need specialized expertise. Many use legitimate AI platforms with prepaid cards and disposable accounts. Others download open-source models that can run on systems with consumer GPUs. Underground vendors advertise ready-to-use clones, often marketed as “celebrity” or “executive” voices, payable in cryptocurrency.
Common tools in a threat actor’s arsenal include text-to-speech APIs, open-source cloning frameworks, caller ID spoofing services, VoIP softphones, PBX hosting services, and data scrapers that harvest audio from public sources.
Combined with prewritten libraries of pretexts and rebuttals, these tools form a complete kit for pulling off convincing phone fraud.
Financial fraud and wire transfer scams
The most direct impact is financial loss. A single call can move millions of dollars.
The FBI’s Internet Crime Complaint Center (IC3) reports billions in losses annually from business email compromise, and AI voice cloning is already being used to upgrade these schemes. Insurers now demand stronger voice verification processes before covering fraud losses.
Data exfiltration through help desk social engineering
Help desks hold the keys to user accounts. An account reset triggered by a cloned voice can lead to compromised email, file shares, and any applications linked to single sign-on.
From there, attackers can request additional approvals, spread malware, or exfiltrate sensitive data. Compared to phishing campaigns that target thousands of inboxes, one successful vishing call can yield faster, more valuable access.
Reputation and compliance risks
Beyond direct losses, voice-based fraud damages trust. Customers, partners, and regulators view organizations as careless if they cannot control their identity verification processes.
Breach notifications trigger reputational damage, while regulatory frameworks such as GDPR, HIPAA, and PCI DSS impose fines for mishandled data. Legal disputes also emerge over “who approved what” when recordings cannot be trusted. Without strong verification and logging, these cases are difficult to resolve.
Common vishing techniques
- Pretexting is when the attacker creates a fabricated scenario using data gathered from previous phone calls with the target or through OSINT (Open Source Intelligence). The more data the attacker has, the better the pretext scenario will be.
- Voice changers are an important tool for attackers making many phone calls, as they help mask the fact that a single attacker is gathering information about how the company operates. For example, a male attacker might pose as a pregnant woman to elicit empathy from a support employee and draw out more information.
- Soundboards help attackers create a realistic-sounding scenario during a vishing call, such as adding the sound of a plane's engine so the caller sounds like a high-level executive at an airport who needs something right before takeoff.
- Rotating through people: many assume attackers make only one phone call to gain access to a company. In most cases they don't. The attacker keeps calling until they find someone more susceptible. This persistence makes the technique particularly effective and is a major reason vishing succeeds.
- Bribing: In an ideal world, people would have a moral code of conduct, be paid enough, and be happy with their jobs, but in real life that is not always the case. For an attacker, it can be easier to buy access than to steal it. A real-world example is the 2025 Coinbase data breach, in which an overseas support agent was bribed into granting access to Coinbase customer data. (Protecting Our Customers - Standing Up to Extortionists)
- AI assistance: AI has helped everyone, such as bringing actors back from the dead for your favorite character, but it can also help attackers pose as different people when calling a company. AI could automate the entire attack script, but in most cases attackers use it to adjust their tone, refine wording, or improve scripts to make them sound more natural, persuasive, and believable.
Why traditional defenses do not stop vishing or AI voice cloning attacks
Antivirus and endpoint detection tools focus on malware and code execution. Vishing is social engineering through legitimate channels.
By the time malware appears, the attacker already has account access. If the process itself treats phone calls as trusted, traditional endpoint security arrives too late.
Use MFA with voice-independent verification
Do not enroll or reset MFA through a phone call. Verification should occur through secure portals, SSO push notifications, or ticketing systems. Hardware keys and other phishing-resistant methods reduce exposure.
Any re-enrollment should require in-person or video verification with known staff.
Human verification steps for high-risk requests
Always call back using phone numbers from the corporate directory. Require unique case numbers or callback PINs generated by the ticketing system.
Enforce two-person approval for financial transfers or privilege changes. Record calls and attach them to tickets for audit trails.
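The callback-PIN idea above can be sketched in a few lines. This is a hypothetical illustration, not a prescribed implementation: the ticket IDs and the in-memory server key are invented for the example, and in practice the key would live in a secrets manager and the PIN would be issued by the ticketing system itself. It derives a 6-digit PIN bound to a specific ticket via HMAC, so a caller can only pass verification on the callback if they hold the PIN issued for that exact case.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; in production this would come from a vault,
# not be generated at import time.
SERVER_KEY = secrets.token_bytes(32)

def issue_callback_pin(ticket_id: str) -> str:
    """Derive a 6-digit callback PIN bound to one ticket. The caller must
    read this PIN back when the help desk returns the call on a
    directory-listed number."""
    digest = hmac.new(SERVER_KEY, ticket_id.encode(), hashlib.sha256).digest()
    return f"{int.from_bytes(digest[:4], 'big') % 1_000_000:06d}"

def verify_callback_pin(ticket_id: str, pin: str) -> bool:
    """Constant-time comparison so timing doesn't leak matching digits."""
    return hmac.compare_digest(issue_callback_pin(ticket_id), pin)

# Hypothetical ticket ID for illustration
pin = issue_callback_pin("INC-20417")
print(verify_callback_pin("INC-20417", pin))  # True only for the matching ticket
```

Because the PIN is derived from the ticket ID, a cloned voice that calls in cold cannot supply it, and the help desk never needs to trust the inbound caller ID.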
Security awareness training for help desk and finance teams
Train staff to expect AI-cloned voices and to slow down during urgent requests. Provide scripts that mirror attacker tactics such as travel delays or executive pressure. Run tabletop exercises and live simulations.
Publish a clear rule: no reset or financial movement is ever approved on a single call or through a single person.
How ThreatLocker reduces risk of vishing attacks
The ThreatLocker security stack provides layered defenses that break the attack chain.
Application Allowlisting ensures that only approved software runs. If an attacker persuades a user to install a remote support tool, it is blocked by default.
Ringfencing™ limits what trusted apps can do. Even if a session is opened, it cannot be used to access files or launch other apps without authorization.
Zero Trust endpoint firewall is host-based: it segments endpoints and servers, preventing lateral movement once a single account is compromised.
Data storage access control restricts access to data across local folders, shares, USB devices, and cloud storage. Even with stolen credentials, attackers cannot reach sensitive files without policy approval.
Privileged access management grants privileges to applications rather than users. Attackers cannot trick employees into running tasks with admin rights if elevation is not tied to the user account.
Centralized configuration management enforces security baselines, such as disabling unused services and enforcing lockouts. These measures reduce the surface area available to attackers.
EDR provides real-time threat detection and monitoring across endpoints and Microsoft 365 environments. Suspicious logins, risky behavior, and MFA events trigger alerts before damage escalates.
Cyber Hero MDR connects organizations to a live response team that validates and neutralizes threats quickly. Rapid human intervention reduces the window of exposure.
By combining these controls, organizations can ensure that even if a cloned voice persuades someone to act, that action is prevented from leading to compromise.
What comes next for AI voice attacks
The next wave will be more sophisticated. Attackers are experimenting with live AI rebuttals, where the system answers verification questions in real time. Deepfake video combined with voice will allow realistic live video calls.
Defensive technologies are emerging as well.
Researchers are developing watermarking systems that embed identifiers in synthetic speech, though adoption is not widespread. Legal and regulatory systems are also evolving. Courts and compliance frameworks will need to address whether voice recordings remain admissible evidence while cloning remains widespread.



