AutoPentest-DRL
We trained AutoPentest-DRL on a simulated corporate network (30 hosts, 4 subnets) for 50,000 episodes.

| Metric | Rule-based (Metasploit Pro) | AutoPentest-DRL (PPO) |
|--------|-----------------------------|-----------------------|
| Time to domain admin | 28 min (median) | 9 min |
| Exploit success rate (novel CVEs) | 12% | 67% |
| Detection avoidance | Static schedule | Adaptive (learned) |
| Actions to root (avg) | 142 | 53 |

The DRL agent learned non-obvious sequences, e.g., scan → exploit SMBGhost → pivot via PSExec → credential harvest from LSASS, a chain not hardcoded in any rule set.
Security Orchestration, Automation, and Response (SOAR) tools such as Splunk Phantom or Palo Alto XSOAR will embed lightweight AutoPentest-DRL models to automatically verify whether a reported CVE is actually exploitable in the specific environment where it was reported, cutting false positives by over 80%.
The agent selects an action based on current state (s_t) using an epsilon-greedy policy (decaying from 1.0 to 0.1). Selected actions are translated into concrete commands via an Action Mapper that interfaces with Metasploit’s RPC API and native Linux tools.
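
The epsilon-greedy selection described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the text gives only the endpoints of the decay (1.0 to 0.1), so the linear schedule, the `DECAY_STEPS` value, and the function names `epsilon_at` and `select_action` are all assumptions, and the action values are placeholder numbers rather than real agent output.

```python
import random

EPS_START = 1.0       # initial exploration rate (from the text)
EPS_END = 0.1         # final exploration rate (from the text)
DECAY_STEPS = 10_000  # assumption: the text does not state the schedule length


def epsilon_at(step: int) -> float:
    """Return epsilon for a given step, assuming a linear decay schedule."""
    fraction = min(max(step, 0) / DECAY_STEPS, 1.0)
    return EPS_START + fraction * (EPS_END - EPS_START)


def select_action(action_values, step, rng=random):
    """Epsilon-greedy choice over a list of estimated action values.

    With probability epsilon a random action index is returned (exploration);
    otherwise the index of the highest-valued action is returned (exploitation).
    """
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda a: action_values[a])


if __name__ == "__main__":
    values = [0.1, 0.9, 0.3]  # placeholder estimates for three abstract actions
    for step in (0, 5_000, 10_000):
        print(step, round(epsilon_at(step), 3), select_action(values, step))
```

Early in training epsilon is close to 1.0, so nearly every choice is random; once the schedule ends, about one choice in ten is still exploratory. The returned index is what an Action Mapper would then translate into a concrete command.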
AutoPentest-DRL consists of four core components: