  1. #11
    Linux User
    Join Date
    May 2013
    Posts
    258

    Hello dear Lakshmipathi, hello dear dochop,


    first of all, many thanks for the quick reply and all the kind words.


    I think you are both right: doing it stepwise is the best way.
    Learning with a cookbook has always taught me more than other drill patterns, so I think your way is great.


    I did it as you told me and started Python.

    Before I did this, I created a db called cpan. This db exists, and then the following happened:


    Code:
    martin@linux-70ce:~/perl> python
    Python 2.7.6 (default, Nov 21 2013, 15:55:38) [GCC] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 
    >>> 
    >>> import urllib
    >>> 
    >>> import urlparse
    >>> 
    >>> import re
    >>> 
    >>> import MySQLdb
    >>> 
    >>> 
    >>> db = MySQLdb.connect(host="localhost", # your host, usually localhost
    ...                      user="root", # your username
    ...                       passwd="mypasswd", # your password
    ...                       db="cpan") # name of the data base
    Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
      File "/usr/lib/python2.7/site-packages/MySQLdb/__init__.py", line 81, in Connect
        return Connection(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/MySQLdb/connections.py", line 187, in __init__
        super(Connection, self).__init__(*args, **kwargs2)
    _mysql_exceptions.OperationalError: (1049, "Unknown database 'cpan'")
    >>> 
    >>> db = MySQLdb.connect(host="localhost", # your host, usually localhost
    ...                      user="root", # your username
    ...                       passwd="rimbaud", # your password
    ...                       db="cpan") # name of the data base
    >>> 
    >>> 
    >>> url = "http://search.cpan.org/author/?W"
    >>> html = urllib.urlopen(url).read()
    >>> for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    ...     alk = urlparse.urljoin(url, lk)
    ... 
    >>>     data = { 'url':alk, 'name':name, 'cname':capname }
      File "<stdin>", line 1
        data = { 'url':alk, 'name':name, 'cname':capname }
        ^
    IndentationError: unexpected indent
    >>> 
    >>>     phtml = urllib.urlopen(alk).read()
      File "<stdin>", line 1
        phtml = urllib.urlopen(alk).read()
        ^
    IndentationError: unexpected indent
    >>>     memail = re.search('<a href="mailto:(.*?)">', phtml)
      File "<stdin>", line 1
        memail = re.search('<a href="mailto:(.*?)">', phtml)
        ^
    IndentationError: unexpected indent
    >>>     if memail:
      File "<stdin>", line 1
        if memail:
        ^
    IndentationError: unexpected indent
    >>>         data['email'] = memail.group(1)
      File "<stdin>", line 1
        data['email'] = memail.group(1)
        ^
    IndentationError: unexpected indent
    >>> 
    >>> 
    >>> # Use all the SQL you like
    ... cur.execute("SELECT * FROM YOUR_TABLE_NAME")
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    NameError: name 'cur' is not defined
    >>> 
    >>> # print all the first cell of all the rows
    ... for row in cur.fetchall() :
    ...     print row[0]
    ...     url = "http://search.cpan.org/author/?W"
    ... html = urllib.urlopen(url).read()
      File "<stdin>", line 5
        html = urllib.urlopen(url).read()
           ^
    SyntaxError: invalid syntax
    >>> for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
    ... alk = urlparse.urljoin(url, lk)
      File "<stdin>", line 2
        alk = urlparse.urljoin(url, lk)
          ^
    IndentationError: expected an indented block
    >>> 
    >>> data = { 'url':alk, 'name':name, 'cname':capname }
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'alk' is not defined
    >>> phtml = urllib.urlopen(alk).read()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'alk' is not defined
    >>> memail = re.search('<a href="mailto:(.*?)">', phtml)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'phtml' is not defined
    >>> if memail:
    ... data['email'] = memail.group(1)
      File "<stdin>", line 2
        data['email'] = memail.group(1)
           ^
    IndentationError: expected an indented block
    >>> 
    >>> # Use all the SQL you like
    ... cur.execute("SELECT * FROM YOUR_TABLE_NAME")
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    NameError: name 'cur' is not defined
    >>> 
    >>> # print all the first cell of all the rows
    ... for row in cur.fetchall() :
    ... print row[0]
      File "<stdin>", line 3
        print row[0]
            ^
    IndentationError: expected an indented block
    >>> 
    >>>

    Conclusion: I guess that the connection to the db works just fine.


    Well, I suppose that the code that follows it, which fetches and parses the data from CPAN, does not work correctly. What do you say?

    I guess that some things were missing. I will dig deeper and try to figure it out.

    I will come back here later this weekend and report all the findings. If anybody has an idea, I would be happy to hear it.
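
    A note on the IndentationError lines in the session above: in the interactive interpreter a blank line ends the for block, so the indented lines pasted after it are read as new top-level statements. One way around this is to keep the loop body inside a function with no blank lines in it. The following is only a minimal sketch, assuming Python 3 (where urljoin lives in urllib.parse instead of urlparse); parse_authors and the hand-written sample fragment are hypothetical, and no request is sent to search.cpan.org.

```python
import re
from urllib.parse import urljoin  # Python 3 home of urlparse.urljoin

# Same pattern as in the session above.
AUTHOR_RE = re.compile(
    r'<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>'
)

def parse_authors(html, base_url):
    """Return one dict per author link found in html."""
    authors = []
    for link, capname, name in AUTHOR_RE.findall(html):
        authors.append({
            'url': urljoin(base_url, link),
            'name': name,
            'cname': capname,
        })
    return authors

# Hypothetical sample fragment, written by hand to match the pattern.
SAMPLE_HTML = (
    '<a href="/~wac/"><b>WAC</b></a><br/><small>Wang Aocheng</small>'
    '<a href="/~wade/"><b>WADE</b></a><br/><small>James Wade</small>'
)

if __name__ == "__main__":
    for author in parse_authors(SAMPLE_HTML, "http://search.cpan.org/author/?W"):
        print(author)
```

    Saved to a file and run with python, this avoids the blank-line problem of the interactive prompt altogether.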
    Akoya P 6512 15" OpenSuse 13.1: AMD Athlon X2 P320
    Samsunng q 210, 12,1" OpenSuse 13.1: Intel® Core™ 2 Duo Proz. P8400 2,26 GHz 1066 MHz FSB 3 MB

  2. #12
    Linux User
    Join Date
    May 2013
    Posts
    258
    Hello, at the moment some things go wrong here.
    I have to find out what is happening and why the script fails to load the data into the db.

    See the following:


    Code:
    import urllib
    import urlparse
    import re
    
    url = "http://search.cpan.org/author/?W"
    html = urllib.urlopen(url).read()
    for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
        alk = urlparse.urljoin(url, lk)
    
        data = { 'url':alk, 'name':name, 'cname':capname }
    
        phtml = urllib.urlopen(alk).read()
        memail = re.search('<a href="mailto:(.*?)">', phtml)
        if memail:
            data['email'] = memail.group(1)
    
        print data

    This is what I got before stopping it via Ctrl+C (each line is obviously a Python dictionary):

    Code:
    $ python printer.py
    {'url': 'http://search.cpan.org/~wac/', 'cname': 'WAC', 'name': 'Wang Aocheng', 'email': 'wangaocheng%40hotmail.com'}
    {'url': 'http://search.cpan.org/~wade/', 'cname': 'WADE', 'name': 'James Wade', 'email': 'CENSORED'}
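
    The %40 in the email value is a URL-encoded @, since the address is taken straight out of a mailto: href. A minimal sketch of decoding it, assuming Python 3 (in Python 2.7 the same function is urllib.unquote); decode_email is a hypothetical helper and the address is made up.

```python
from urllib.parse import unquote

def decode_email(raw):
    """Turn a percent-encoded mailto value into a plain address."""
    return unquote(raw)

if __name__ == "__main__":
    print(decode_email("someone%40example.com"))
```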



    Note: the cpan database on my openSUSE 13.1 MySQL server has the following structure:
    Code:
    --
    -- Table structure for table `cname`
    --
    
    CREATE TABLE IF NOT EXISTS `cname` (
      `cname` int(11) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    -- --------------------------------------------------------
    
    --
    -- Table structure for table `name`
    --
    
    CREATE TABLE IF NOT EXISTS `name` (
      `url` int(11) NOT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    -- --------------------------------------------------------
    
    --
    -- Table structure for table `url`
    --
    
    CREATE TABLE IF NOT EXISTS `url` (
      `url` int(11) DEFAULT NULL,
      `name` int(11) DEFAULT NULL,
      `cname` int(11) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    
    /*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
    /*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
    /*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTIO

    And the script looks like this:


    Code:
    import urllib
    import urlparse
    import re
    import MySQLdb
    
    
    db = MySQLdb.connect(host="localhost", # your host, usually localhost
                         user="root", # your username
                          passwd="rimbaud", # your password
                          db="cpan") # name of the data base
    
    # you must create a Cursor object. It will let
    #  you execute all the queries you need
    cur = db.cursor() 
    
    
    url = "http://search.cpan.org/author/?W"
    html = urllib.urlopen(url).read()
    for lk, capname, name in re.findall('<a href="(/~.*?/)"><b>(.*?)</b></a><br/><small>(.*?)</small>', html):
        alk = urlparse.urljoin(url, lk)
    
        data = { 'url':alk, 'name':name, 'cname':capname }
    
        phtml = urllib.urlopen(alk).read()
        memail = re.search('<a href="mailto:(.*?)">', phtml)
        if memail:
            data['email'] = memail.group(1)
    
    
    # Use all the SQL you like
    cur.execute("SELECT * FROM YOUR_TABLE_NAME")
    
    # print all the first cell of all the rows
    for row in cur.fetchall() :
        print row[0]

    But unfortunately I get the following errors,
    which make me think that I have some database errors.

    But wait: the tables and the database exist (see above).
    Code:
        
        
    martin@linux-70ce:~/perl> python cpan2.py
    Traceback (most recent call last):
      File "cpan2.py", line 34, in <module>
        cur.execute("SELECT * FROM YOUR_TABLE_NAME")
      File "/usr/lib/python2.7/site-packages/MySQLdb/cursors.py", line 174, in execute
        self.errorhandler(self, exc, value)
      File "/usr/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
        raise errorclass, errorvalue
    _mysql_exceptions.ProgrammingError: (1146, "Table 'cpan.YOUR_TABLE_NAME' doesn't exist")
    martin@linux-70ce:~/perl>
    I have to investigate what goes wrong here.
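
    Error 1146 above says that no table named YOUR_TABLE_NAME exists in cpan: that name is the placeholder from the example and was never replaced by a real table. The script also never inserts the collected data, and the tables listed above only have int(11) columns, which would not hold URLs or names as text. Below is a minimal sketch of the missing step under loud assumptions: it is written for Python 3 and uses the standard sqlite3 module with an in-memory database as a stand-in, so it runs without a MySQL server; the table name authors, its text columns, and both helper functions are hypothetical. With MySQLdb the same pattern should apply, except that the placeholders are written %s instead of ?.

```python
import sqlite3

def save_authors(conn, authors):
    """Create the (hypothetical) authors table and insert one row per dict."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS authors ("
        " url TEXT NOT NULL,"
        " name TEXT,"
        " cname TEXT,"
        " email TEXT)"
    )
    for data in authors:
        # Parameterized query: the driver does the quoting.
        cur.execute(
            "INSERT INTO authors (url, name, cname, email) VALUES (?, ?, ?, ?)",
            (data['url'], data['name'], data['cname'], data.get('email')),
        )
    conn.commit()

def first_cells(conn):
    """Return the first cell of every row, as the last loop of the script does."""
    cur = conn.cursor()
    cur.execute("SELECT url FROM authors ORDER BY url")
    return [row[0] for row in cur.fetchall()]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for MySQLdb.connect(...)
    save_authors(conn, [
        {'url': 'http://search.cpan.org/~wac/', 'name': 'Wang Aocheng', 'cname': 'WAC'},
    ])
    for cell in first_cells(conn):
        print(cell)
```

    In the real script the INSERT would sit inside the for loop, right after the data dictionary is filled.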
    Last edited by sayhello; 07-05-2014 at 09:19 PM.
