GoFuckYourself.com - Adult Webmaster Forum (https://gfy.com/index.php)
-   Fucking Around & Business Discussion (https://gfy.com/forumdisplay.php?f=26)
-   -   how do I do this... (https://gfy.com/showthread.php?t=1133948)

Jel 02-19-2014 09:37 AM

how do I do this...
 
text file 1 has 5000 unique lines, text file 2 has 500 unique lines

I want to remove each *entire line* in file 1 that contains data from text file 2, leaving me 4500 lines

!I don't know php!

!I barely know excel (well, open office) but if it's doable in that I can follow simpleton instructions :) !

Ideally an online place similar to textmechanic.com, where I load one, load the other, and hey presto.

thanks in advance, and I might even thank you again afterwards for double the fun :thumbsup

klinton 02-19-2014 09:47 AM

how about simple file splitting?

like into 10 parts? it gives you 500-line files which you can extract later
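
something like this on linux would do it (just a sketch, the file name is a placeholder):

split -l 500 file1.txt part_

that spits out part_aa, part_ab, ... with 500 lines apiece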

DamianJ 02-19-2014 09:58 AM

fdupes -r /some/directory/path > /some/directory/path/fdupes.log

or if you want a GUI:

http://download.cnet.com/Remove-Dele...-75741543.html

Google is awesome.

robwod 02-19-2014 10:02 AM

If I understand correctly, you want to simply look up all the data in list #2 to find a match on list #1, then remove that record?

A slightly inefficient method would be to use a single Excel workbook: place list #1 in one sheet (tab) and list #2 in another. Give the data in sheet 2 a range name, then use VLOOKUP in an adjacent empty cell on the first line of list #1 to perform a lookup. Set the condition so that if a match is found, it inserts "Y" into that cell.

At that point, drag the VLOOKUP formula down the column to apply it to all records in the first list. Then you can simply sort list #1 by the VLOOKUP column and highlight/delete all the rows with a "Y" on them.

Someone else can likely provide a more efficient method, but this should work if you have exact match records. Use F1 inside Excel for help on using the vlookup function or a simple Google query will show you all sorts of usage examples.
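
For example, assuming list #2 sits in a named range called List2 (the name and cell reference here are just placeholders, adjust to where your data actually sits), the formula in the first adjacent cell might look something like =IF(ISNA(VLOOKUP(A1,List2,1,FALSE)),"","Y") -- if the lookup fails it leaves the cell blank, otherwise it marks the row with a "Y". Bear in mind VLOOKUP only flags exact whole-cell matches.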

ETA: see? I knew someone else would post something more efficient :D

Jel 02-19-2014 10:04 AM

Quote:

Originally Posted by robwod (Post 19988295)
If I understand correctly, you want to simply look up all the data in list #2 to find a match on list #1, then remove that record?

correct mate :thumbsup

Quote:

Originally Posted by robwod (Post 19988295)
A slightly inefficient method would be to use a single Excel workbook: place list #1 in one sheet (tab) and list #2 in another. Give the data in sheet 2 a range name, then use VLOOKUP in an adjacent empty cell on the first line of list #1 to perform a lookup. Set the condition so that if a match is found, it inserts "Y" into that cell.

At that point, drag the VLOOKUP formula down the column to apply it to all records in the first list. Then you can simply sort list #1 by the VLOOKUP column and highlight/delete all the rows with a "Y" on them.

Someone else can likely provide a more efficient method, but this should work if you have exact match records. Use F1 inside Excel for help on using the vlookup function or a simple Google query will show you all sorts of usage examples.

I'm praying there's an easier way to do this :D

DamianJ 02-19-2014 12:34 PM

Quote:

Originally Posted by Jel (Post 19988299)
I'm praying there's an easier way to do this :D

I've already posted two.

Dur.

Jel 02-19-2014 01:07 PM

gaymian - your GUI suggestion doesn't do what I need it to (nor is it a full version).

Barry-xlovecam 02-19-2014 03:28 PM

Quote:

~:$cd /home/user/directory-where-the-files-are/
/home/user/directory-where-the-files-are/:$cat file1 file2 > bigfile.csv
/home/user/directory-where-the-files-are/:$sort bigfile.csv | uniq > sortedfile.csv
Use your Linux webserver over SSH or a Linux computer terminal for this.

Google has some excel solutions (for the point an click crowd)...
https://www.google.com/search?q=remo...la8&oe=utf-8

Quote:

wc -l bigfile.csv
47784 bigfile.csv
wc -l sortedfile.csv
29466 sortedfile.csv

My 'bigfile.csv' has 47784 lines. Sorting out the 18318 duplicate lines took less than 2 seconds -- touch that, Excel -- eat my dust! :1orglaugh:1orglaugh
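
And if you want the file2 lines gone completely rather than deduped down to one copy, a variation on the same idea (just a sketch, assuming exact whole-line matches):

cat file1 file2 file2 | sort | uniq -u > filtered.csv

Listing file2 twice guarantees every matching line shows up more than once, and uniq -u keeps only the lines that appear exactly once -- i.e. the file1 lines with no match in file2.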

myleene 02-19-2014 03:35 PM

Notepad++ can do this in 5 seconds with TextFX.

http://stackoverflow.com/questions/3...ows-in-notepad

(*) Lines will have to be sorted (either ascending or descending) if you use the first method, though.


Otherwise... Use a regex.
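
For instance, on any unix-ish box something like this should do it (just a sketch -- the file names are placeholders, and add -F if the phrases are literal strings rather than regexes):

grep -v -f phrases.txt data.txt > filtered.txt

-f reads the patterns from phrases.txt (one per line) and -v inverts the match, so only the lines of data.txt that don't contain any of the phrases survive.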

stoka 02-19-2014 03:36 PM

useful stuff :thumbsup

OneHungLo 02-19-2014 03:57 PM

http://textmechanic.com/Delimited-Column-Extractor.html

edit...I saw that you tried textmechanic.com...Are you sure you can't do it there?

Seth Manson 02-19-2014 04:40 PM

I would use UltraCompare. It's part of the UltraEdit suite. Pretty sure you can get a free trial.

Jel 02-19-2014 05:22 PM

regex, wc>@blah etc - waaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaay over my head...

onehunglo - don't think it does what I want - the lines on files 1 & 2 aren't the same, so aren't duplicates as such.

eg file 1 consists of:

line1: info data blahblah OH-LOOK-THIS-PHRASE-1 etc etc more data on this line
line2: info data blahblah OH-LOOK-THIS-PHRASE-2 etc etc more data on this line
line3: info data blahblah OH-LOOK-THIS-PHRASE-3 etc etc more data on this line
--
line5000: info data blahblah OH-LOOK-THIS-PHRASE-5000 etc etc more data on this line

file 2 consists of:
line1: OH-LOOK-THIS-PHRASE-1
line2: OH-LOOK-THIS-PHRASE-2
line3: OH-LOOK-THIS-PHRASE-3
--
line450: OH-LOOK-THIS-PHRASE-450

what I need:
input file 1, input file 2, and have it seek file 1 for OH-LOOK-THIS-PHRASE-X, and remove that entire line.

Jel 02-20-2014 05:30 AM

bump for this shift in case there's an easy way while I wait for my coder bloke to get back from holiday (and try ultracompare)

DamianJ 02-20-2014 05:41 AM

Quote:

Originally Posted by Jel (Post 19988562)
gaymian

Oh that is genius! You took the first bit of my name and changed it to GAY!

HiLARious!

Jel 02-20-2014 07:03 AM

almost as funny as you following dvt around and using:

bummer boy
spacker
likes bumming men
looking for men to bum
lollington
ad hominem
now I know why people got the hump with me and markham
etc

over and over and over and over. Why don't you fuck off out of this biz thread wankstain, it's annoying keep having to 'view post' when you post in my biz threads.

You're a fucking cock, you don't like me, I don't like you, so fuck off. Or don't, and go look up the word provocation, and how it's a viable defence in assault cases.

Cue 'wahwah' remarks, 'oh are you THREATENING me???' variations, 'hmm, bannable offence' bollocks, or any other of your predictable and tedious as fuck posts. don't forget the grammer mistaikes as well, mr im-too-stupid-to-realise-i-act-like-the-precocious-11-year-old-kid-that-thinks-he-is-funny-but-everyone-thinks-is-an-out-and-out-cunt :thumbsup

pinkz 02-20-2014 07:56 AM

http://textmechanic.com/Big-File-Too...ate-Lines.html

Jel 02-20-2014 08:04 AM

they aren't duplicate lines - see post #13 :)

still to fire up ultracompare to see if that does the trick

