Waraxe IT Security Portal
November 24, 2024
a problem with removing duplicates from wordlists
Posted: Fri Jun 13, 2008 4:40 am
earthquaker
Advanced user
Joined: Jun 02, 2008
Posts: 111
Location: q8

Hey, I am having a problem removing duplicates from my wordlists. Some of my lists are over 2 GB, and the couple of programs I tried suddenly crash on them.

Does anyone know how I can remove duplicates from such large files? I have 46 GB of wordlists in total and I'm willing to share them, but it's hard to share such a huge amount of wordlists with a lot of duplicates.

Thanks

Posted: Tue Sep 23, 2008 8:37 pm
stereoa
Beginner
Joined: Sep 23, 2008
Posts: 4

usort?

Posted: Tue Sep 23, 2008 8:39 pm
waraxe
Site admin
Joined: May 11, 2004
Posts: 2407
Location: Estonia, Tartu

I am using PHP/MySQL for wordlist storage and compilation, but that's just because PHP is my favourite tool :)

Posted: Thu Nov 06, 2008 12:48 am
Sm0ke
Moderator
Joined: Nov 25, 2006
Posts: 141
Location: Finland

Use the dictionary sorting in PasswordsPro; it's fast for big wordlists.

Posted: Mon Feb 09, 2009 4:33 pm
Baston
Regular user
Joined: Dec 16, 2008
Posts: 17

I had the same problem, so I wrote a little Perl script that splits my wordlist into many files based on the first character of each line.
Identical lines always end up in the same file, so you can dedup every file separately and be sure that when you put them back together you won't have any dupes.

Code:

#!/usr/bin/perl
# Split wordlists into one file per leading character (A-Z, 0-9, other).
# Input wordlists are given as command-line arguments.
use strict;
use warnings;

my $dir   = 'alpha';
my $pre   = 'singles-';
my $post  = '.txt';
my @chars = ('A' .. 'Z', 0 .. 9);
my %files;

## Make the dir
if (!-d $dir) {
    mkdir $dir or die "Cannot create $dir: $!\n";
}

## Open one output file per leading character (append mode)
for my $char (@chars) {
    my $file = $pre . $char . $post;
    open $files{$char}, '>>', "$dir/$file" or die "Can't open $dir/$file: $!\n";
}
## Lines starting with any other character go to a catch-all file
my $otherfile = "$dir/$pre" . '!' . $post;
open my $other, '>>', $otherfile or die "Cannot open output file $otherfile: $!\n";

## Processing files
foreach my $param (@ARGV) {
    print "processing $param\n";
    open my $in, '<', $param or die "Cannot open input file $param: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        my $start = uc(substr($line, 0, 1));    # first char of the line, uppercased

        if ($start =~ /[A-Z0-9]/) {
            print { $files{$start} } "$line\n";
        }
        else {
            print {$other} "$line\n";
        }
    }
    close $in;
    print "$param processed\n";
}

## Closing files
close $_ foreach values %files;
close $other;
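
The second step mentioned above (deduplicating each split file on its own) is not shown in the script. A minimal sketch of it might look like this, assuming the unique lines of a single split file fit in memory. The sub name dedup_file and the ".dedup" output suffix are made up for illustration and are not part of the script above.

```perl
#!/usr/bin/perl
# Sketch: remove duplicate lines from one split file at a time.
# Assumption: the unique lines of a single split file fit in RAM.
use strict;
use warnings;

# dedup_file() is a hypothetical helper, not part of the original script.
# It copies $in to $out, keeping only the first occurrence of each line,
# and returns the number of lines kept. Comparison is case-sensitive.
sub dedup_file {
    my ($in, $out) = @_;
    my %seen;
    my $kept = 0;
    open my $ifh, '<', $in  or die "Cannot open $in: $!\n";
    open my $ofh, '>', $out or die "Cannot open $out: $!\n";
    while (my $line = <$ifh>) {
        chomp $line;                 # so a last line without newline still matches
        next if $seen{$line}++;      # already written once, skip it
        print {$ofh} "$line\n";
        $kept++;
    }
    close $ifh;
    close $ofh or die "Cannot close $out: $!\n";
    return $kept;
}

# Every file named on the command line gets a sibling file with a
# ".dedup" suffix (made-up naming). With no arguments, nothing happens.
for my $file (@ARGV) {
    my $kept = dedup_file($file, "$file.dedup");
    print "$file: $kept unique lines\n";
}
```

Run it with the split files as arguments, for example alpha/singles-A.txt. Because a given line can only ever land in one split file, the deduplicated outputs can simply be concatenated afterwards without introducing new dupes.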

Posted: Tue Aug 25, 2009 6:00 am
Mooka91
Advanced user
Joined: Aug 15, 2009
Posts: 73

http://hashkiller.com/files/downloads/wordlist-tools/

Take your pick; there are plenty of dupe removers there.
All times are GMT
All logos and trademarks in this site are property of their respective owner. The comments and posts are property of their posters, all the rest (c) 2004-2024 Janek Vind "waraxe"