
Thread: The Most Suitable Language To Build a Search Engine

  1. #1
    Join Date
    Nov 2009
    Location
    Egypt
    Beans
    51
    Distro
    Ubuntu 12.04 Precise Pangolin

    The Most Suitable Language To Build a Search Engine

    I'm beginning a startup soon. It's a search engine that searches social networks for certain content and computes some statistics on it. I need some advice from you guys on picking the most suitable environment for my startup.

    1- Is the language the bottleneck in such a case, or can I pick any language I like without it mattering?

    2- What criteria should I consider when selecting a programming language? Should I prefer performance or productivity? In other words, I've heard that Python cuts coding time a lot but lacks performance, whereas Java performs well but may take hours to do something that could be done in minutes in Python.

    Thanks very much guys

  2. #2
    Join Date
    Apr 2011
    Location
    La La Land
    Beans
    221

    Re: The Most Suitable Language To Build a Search Engine

    I find that Python is great for most things. What you would need is two functions. One that uses urllib to open a page and uses re to search for keywords or certain text in a certain div, and one that looks for all the links on a page and stores them in a list. Then you would loop through the list and perform these functions on them. What you would do is something like this...

    Code:
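    import re
    import urllib.request
    from urllib.parse import urljoin
    
    linklist = []   #every link found so far
    MAX_PAGES = 20  #assumed cap so the crawl stops; linklist grows while we loop
    
    def GetPage(webpage):
      #helper: return the page text, or None if it cannot be fetched
      try:
        with urllib.request.urlopen(webpage, timeout=10) as f:
          return f.read().decode("utf-8", errors="replace")
      except (OSError, ValueError):
        return None
    
    def GetLinks(webpage):
      content = GetPage(webpage)
      if content is None:
        return
      #find links and store them in a list (naive regex, not a real HTML parser)
      for href in re.findall(r'href="([^"]+)"', content):
        link = urljoin(webpage, href)
        if link not in linklist:
          linklist.append(link)
    
    def GetContent(webpage):
      content = GetPage(webpage)
      if content is None:
        return
      #use re to find what you want in the variable content
    
    page = "http://example.com/"  #placeholder start page
    GetLinks(page)
    GetContent(page)
    for count, item in enumerate(linklist):
      if count >= MAX_PAGES:
        break
      GetLinks(item)
      GetContent(item)
    
    #You can define global vars in the functions and act on those.
    #You could write them to files or whatever you want.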
    Last edited by CoffeeRain; January 3rd, 2012 at 04:16 PM.

  3. #3
    Join Date
    Dec 2009
    Location
    The Milky Way
    Beans
    243

    Re: The Most Suitable Language To Build a Search Engine

    I say use Cython. It's very nearly a superset of Python (meaning that almost all valid Python code is valid Cython code, but the reverse isn't true), and the Cython compiler translates Cython code to C, which is then compiled into an extension module. So you get the flexibility of Python and, for code with static type declarations, speed close to C.

    Quote Originally Posted by CoffeeRain View Post
    I find that Python is great for most things. What you would need is two functions. One that uses urllib to open a page and uses re to search for keywords or certain text in a certain div, and one that looks for all the links on a page and stores them in a list. Then you would loop through the list and perform these functions on them. What you would do is something like this...

    That advice is way too low-level with respect to what the OP was asking, IMHO.
    There are 10 different kinds of people:
    Those who understand binary numbers
    Those who don't
    Free/open-source game development kit: http://openblox.sourceforge.net

  4. #4
    Join Date
    Apr 2011
    Location
    La La Land
    Beans
    221

    Re: The Most Suitable Language To Build a Search Engine

    Quote Originally Posted by DangerOnTheRanger View Post
    That advice is way too low-level with respect to what the OP was asking, IMHO.
    What do you mean? I thought that Python would be great for finding information and getting statistics from it. Could you explain some more?

  5. #5
    Join Date
    Aug 2006
    Location
    60°27'48"N 24°48'18"E
    Beans
    3,458

    Re: The Most Suitable Language To Build a Search Engine

    Sorry, but if you've got your own money on the line in this startup, I must say this just to protect your finances...

    You have just demonstrated that you don't yet have the competence to even begin building what you're seeking to build. You do not understand much about your problem, or about how it relates to any potential implementation. And we're not going to be able to answer you in enough depth for you to gain the competence that better judgement requires.

    It's not a matter of programming language choice; it's a matter of the structure of your problem and its relationship to your chosen architecture. After that, you may start choosing platforms and languages. This stuff is very secondary. The construction of a search engine begins on a higher level of abstraction, and the languages are just tools.

    That said, people with exposure to certain kinds of programming will be more likely to recognize the problems inherent in what you're trying to accomplish, and will be able to choose the appropriate tools.
    LambdaGrok. | #ubuntu-programming on FreeNode

  6. #6
    Join Date
    Dec 2009
    Location
    The Milky Way
    Beans
    243

    Re: The Most Suitable Language To Build a Search Engine

    Quote Originally Posted by CoffeeRain View Post
    What do you mean? I thought that Python would be great for finding information and getting statistics from it. Could you explain some more?
    That example code you gave is what I was talking about. It served no real purpose, not to mention the two-function design you gave isn't flexible enough for what this guy is trying to do.

  7. #7
    Join Date
    Feb 2009
    Beans
    789
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: The Most Suitable Language To Build a Search Engine

    +1 to what CptPicard said.

    Additionally, I expect a great search engine to be written in multiple languages. You'll need to develop the web-based site and the crawlers, do the computations related to the search index and analysis, et cetera, and you're not going to do all of that in one language (for performance and suitability reasons). A language that is fast may not be suitable for writing a website, and vice versa.
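
    To make the "search index" component mentioned above a little more concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration under stated assumptions: the function names and the sample documents are hypothetical, and a real engine would add ranking, stemming and persistent storage.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data standing in for crawled posts.
documents = {
    1: "Python is productive",
    2: "Java is fast",
    3: "Python can be fast with Cython",
}
index = build_index(documents)
```

    The crawler would feed build_index, and the website would only ever call search, which is one reason the two parts can live in different languages.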
    "The eagle never lost so much time as when he submitted to learn from the crow."

  8. #8
    Join Date
    Apr 2011
    Location
    La La Land
    Beans
    221

    Re: The Most Suitable Language To Build a Search Engine

    If the OP is just using this on their own, wouldn't they be able to just get the information, store it, and then request it with a different program?

  9. #9
    Join Date
    Jan 2008
    Location
    Lausanne, Switzerland
    Beans
    341
    Distro
    Ubuntu 13.04 Raring Ringtail

    Re: The Most Suitable Language To Build a Search Engine

    You may need different environments for different purposes:

    1. For fetching the data, you need a programming language that can connect to the social networks. Most probably, the bottleneck will be the speed of your network and the volume of data you can retrieve, so the raw performance of the programming language is secondary. I also recommend Python, which is reasonably fast and allows rapid development. It is also important to choose a good database engine in which to store all the information. If you plan to store more than text (pictures, etc.), you will need to decide whether to store that data in the database itself or as separate files referenced from the database. MySQL is a good choice and can handle quite large volumes, but other database engines may be a better fit for your purpose.

    2. For displaying the data (what you will offer to your customers), you need a different environment. One of the most widely used is LAMP (Linux, Apache, MySQL and PHP). You can rent such a hosted environment for a few dollars per month, and when the revenues are good enough, you can always increase your capacity.
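
    As a rough sketch of the storage side described in the two points above: the fetcher writes rows, and a separate program (the website) reads them back. To keep the example self-contained it uses the sqlite3 module from Python's standard library rather than MySQL; the table layout, column names and sample rows are assumptions for illustration only. The media_path column shows the "separate files referenced in the database" option.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the posts table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               url        TEXT PRIMARY KEY,
               body       TEXT NOT NULL,
               media_path TEXT
           )"""
    )
    return conn


def save_post(conn, url, body, media_path=None):
    """Store one fetched post; a re-fetched URL replaces its old row."""
    conn.execute(
        "INSERT OR REPLACE INTO posts (url, body, media_path) VALUES (?, ?, ?)",
        (url, body, media_path),
    )
    conn.commit()


def count_mentions(conn, keyword):
    """Count posts whose body contains the keyword.

    Note: % and _ inside the keyword act as LIKE wildcards.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM posts WHERE body LIKE ?",
        (f"%{keyword}%",),
    ).fetchone()
    return row[0]


# Hypothetical sample rows standing in for fetched posts.
conn = open_store()
save_post(conn, "http://example.com/1", "ubuntu is great")
save_post(conn, "http://example.com/2", "I installed ubuntu", "media/2.png")
save_post(conn, "http://example.com/3", "nothing relevant here")
```

    With a file path instead of ":memory:", the display side can open the same database and run its own queries.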

    Finally, my recommendation is: go and do it! Whatever you use for the first release of your product/service, you will redo it anyway once you have more experience and more revenue, which will allow you to use a more sophisticated and better-performing environment.

    Good luck!

  10. #10
    Join Date
    Aug 2011
    Location
    47°9′S 126°43W
    Beans
    2,172
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: The Most Suitable Language To Build a Search Engine

    Quote Originally Posted by squenson View Post
    Most probably, the bottleneck will be the speed of your network and the volume of data you can retrieve!
    The bottleneck will be the willingness of the site to share its data... These sites are made for the money you can get from the data they hold. They won't let Joe Programmer pillage their cash cow without taking counter-measures.

    PS: +1 with CdtPicard.
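
    One way a crawler can at least find out what a site is willing to share is to read its robots.txt before fetching anything. The sketch below uses urllib.robotparser from Python's standard library. The robots.txt content, the agent name "example-crawler" and the URLs are made up for illustration; a real social network may well disallow everything or only offer an official API.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt; a real crawler would download the site's own file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_parser(robots_text):
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed(parser, url, agent="example-crawler"):
    """Return True if the rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


parser = make_parser(ROBOTS_TXT)
```

    Here parser.crawl_delay("example-crawler") reports the pause the site asks for between requests, which the crawl loop could pass to time.sleep.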
