There are several ways to go about this.
Unless you are just making a raw index, you'll need pretty hefty bandwidth and hardware. You would want to cache pages so you can compare them for updates, and you'd also want some kind of cross-word index (mapping each word to the pages that contain it), which requires a fast SQL database. You also need a crawler, unless you have some other way of obtaining an index.
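The caching and cross-word index ideas can be made concrete with a minimal in-memory sketch in Python. It assumes pages arrive as plain text keyed by URL. The class name `PageIndex`, the SHA-256 change check and the naive tokenizer are illustrative choices, not part of any particular product. The dictionaries stand in for the fast SQL database you would actually need at scale.

```python
import hashlib
import re
from collections import defaultdict


class PageIndex:
    """Caches page text by URL and keeps a word -> set-of-URLs (cross-word) index."""

    def __init__(self):
        self.cache = {}                  # url -> (content hash, text)
        self.index = defaultdict(set)    # word -> urls whose text contains it

    @staticmethod
    def _words(text):
        # Naive tokenizer: lowercase runs of ASCII letters and digits.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def update(self, url, text):
        """Store a freshly crawled page; return True if it was new or changed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        old = self.cache.get(url)
        if old is not None and old[0] == digest:
            return False                 # unchanged since last crawl, nothing to reindex
        if old is not None:
            # Remove this URL's postings for the previous version of the page.
            for word in self._words(old[1]):
                self.index[word].discard(url)
                if not self.index[word]:
                    del self.index[word]
        self.cache[url] = (digest, text)
        for word in self._words(text):
            self.index[word].add(url)
        return True

    def search(self, query):
        """Return the URLs containing every word in the query (AND semantics)."""
        words = self._words(query)
        if not words:
            return set()
        postings = [self.index.get(w, set()) for w in words]
        return set.intersection(*postings)
```

Re-crawling an unchanged page costs only a hash comparison. Only pages whose content changed get re-tokenized and re-indexed.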
Lucene, part of the Apache project now.
http://lucene.apache.org/java/docs/index.html
It's all Java, but you need to supply your own crawler agents. Hardware requirements depend mostly on how you scale it. Typically you'd build your crawlers on separate boxes from the index; however, if you use tunnels, you can run everything from one box and just farm out the jobs to specific boxes by rerouting traffic using iptables.
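Lucene only indexes what you hand it, so the crawler agent is yours to write. Below is a minimal breadth-first crawler sketch, written in Python rather than Java purely for brevity. `crawl`, `LinkExtractor` and the caller-supplied `fetch(url)` function are hypothetical names. The output is just (url, html) pairs that you would then feed to whatever indexer you use. No Lucene API is called here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at `seed`.

    `fetch(url)` must return the page's HTML as a string, or None on failure.
    Yields (url, html) pairs that could be handed to an indexer.
    """
    queue = deque([seed])
    seen = {seed}
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue                     # skip pages that failed to download
        fetched += 1
        yield url, html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and strip #fragments so duplicates collapse.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
```

In a real deployment `fetch` would wrap an HTTP client with timeouts and politeness delays, and it would run on the crawler boxes rather than the index box. Passing `fetch` in as a parameter also lets you test the crawl logic against a dictionary of fake pages.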
http://webglimpse.net/
Webglimpse is good, but it's not free and you really need to know what you are doing; it is probably better suited as a document management solution or perhaps for data mining. "[Glimpse is] the search engine (written in C) and webglimpse is the spider and indexer (primarily in Perl)". So it would be handy to know Perl, as crawler agents rarely do what you want out of the box.
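Customizing a crawler agent mostly means deciding which URLs it should follow. The following is a small sketch of such a policy, written in Python rather than Perl. The function names, the `MyCrawler` user-agent string and the list of skipped extensions are hypothetical. `urllib.robotparser.RobotFileParser` is the Python standard-library robots.txt parser.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def make_policy(allowed_hosts, robots_txt, user_agent="MyCrawler",
                skip_extensions=(".jpg", ".png", ".gif", ".zip", ".pdf")):
    """Build a should_crawl(url) predicate from a few site-specific rules."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())   # robots.txt already downloaded as text

    def should_crawl(url):
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            return False
        if parts.hostname not in allowed_hosts:
            return False                    # stay on the sites we care about
        if parts.path.lower().endswith(skip_extensions):
            return False                    # binary files we don't want to index
        return robots.can_fetch(user_agent, url)

    return should_crawl
```

A crawler would call `should_crawl(url)` before queueing each discovered link. Because the robots.txt text is passed in rather than fetched, the policy can be built and tested without network access.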
Zebra, a tool used by many search engine researchers, is free to use, its source is available, and it can handle huge databases.
https://www.indexdata.com/zebra
There are other options if you have Java or C# development capability on a moderate scale. I can suggest more options if none of the ones above suit you.
It's a really big subject; without knowing exactly what you want to achieve, it's hard to say which tool set you should be looking at.
The biggest barrier to entry here is the hardware you will need. Working out an optimal configuration is difficult, but if you want to cut down on resources and still be able to scale, I would suggest having a main server handle traffic in and out of your search platform, then redirect particular types of traffic to slave servers that are not visible to the internet, e.g. one to crawl, one to index, one to run your database, and a NAS for your disk storage, which you will need a lot of: terabytes just to start with.
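The front-server layout boils down to a routing table from traffic type to internal host. The Python sketch below assumes the first path segment of a request identifies the traffic type. The `BACKENDS` addresses and path names are hypothetical placeholders. In practice this routing usually lives in a reverse proxy or in firewall rules (such as the iptables rerouting mentioned for Lucene) rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical internal (non-public) hosts; replace with your own addresses.
BACKENDS = {
    "search": "http://10.0.0.10:8080",   # query handling / index lookups
    "crawl":  "http://10.0.0.11:8080",   # crawler control
    "index":  "http://10.0.0.12:8080",   # indexing jobs
}


def route(request_url):
    """Map a request arriving at the main server to the internal URL that should handle it.

    The first path segment selects the backend, e.g. /search?q=x goes to the search box.
    Returns None for traffic the main server should reject.
    """
    parts = urlsplit(request_url)
    first_segment = parts.path.lstrip("/").split("/", 1)[0]
    backend = BACKENDS.get(first_segment)
    if backend is None:
        return None
    target = backend + parts.path
    if parts.query:
        target += "?" + parts.query
    return target
```

Only the main server needs a public address. The hosts in `BACKENDS` can sit on a private network, and adding a box means adding an entry to the table.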