Say Perl

Worldwide Perl Blogging

In total:
311 feeds,
6372 posts.

Am I connected or not?

7 August, 16:56, by manu, machine translated from French

After a lengthy absence (it is the end of the school holidays, but it was my thesis that kept me away from this blog), I will slowly resume writing articles here.

Until now, this blog has mainly been devoted to Perl, but the approaching school year makes me want to expand into the topics covered in the courses I teach. It will remain largely about computing (and, more generally, computerized documentation), but with a stronger focus on information science.

To mark the return of this blog, I'll stick to past habits, namely a short note on Perl. Today I'll talk about a module that is useful when you do web-related processing: LWP::Online. This module gives a very simple answer to the question "Am I connected to the web?".

Usage is as simple as the module's implementation: you import a function, online(), which returns a boolean value.

A simple script would look like this:

#!/usr/bin/env perl

use strict;
use warnings;

use LWP::Online qw(online);    # import the online() function into our script

if (online()) {
    print "We're connected\n";
}
else {
    print "We are not connected\n";
}

The module also provides a function offline() which returns true when we are not connected to the Web.

Automatic discovery of RSS feeds

5 May, 20:27, by manu, machine translated from French

I was confronted with an interesting problem this morning. The problem was the following:

We had a list of sites for which we wanted the RSS feeds.

In terms of solutions, there were not many choices:

  1. do the job manually;
  2. develop a more automated solution.

My heart obviously leaned towards the automated solution. I developed a little tool to retrieve the RSS feeds of a site. This tool was very simple because it is based on the find_feeds() function of the XML::Feed module. Here is the one-liner version:

 perl -MXML::Feed -MData::Dump -e 'dd(XML::Feed->find_feeds(shift))' http://lesoir.be

I then changed my script to handle a list of links and return a list of RSS feeds. I also added some basic statistics to report the number of sites processed and which ones do not offer RSS feeds (or, more precisely, the sites for which XML::Feed failed to find any feed). Here's the final script:

#!/usr/bin/env perl

use strict;
use warnings;

use YAML;
use Getopt::Long;
use XML::Feed;

my $config = {};

GetOptions($config, "input=s", "output=s");

die _usage() unless _Validation($config);

# The input file is a YAML list of site URLs
my $sites = YAML::LoadFile($config->{input});
my @feeds;
my @with_feeds;
my @no_feeds;

foreach my $site (@{$sites}) {
    my @site_feeds = XML::Feed->find_feeds($site);
    if (scalar(@site_feeds) > 0) {
        push @feeds, @site_feeds;
        push @with_feeds, $site;
    }
    else {
        push @no_feeds, $site;
    }
}

YAML::DumpFile($config->{output}, \

Migrating Winisis databases - First step

27 April, 11:38, by manu, machine translated from French

For several years now, I have regularly been contacted by email about the tools I use to migrate Winisis databases. Unfortunately, I have never taken the time to write proper documentation. I will lay the foundations of that documentation on this blog, and I hope to take the time in the future to develop something more complete.

Before turning to the tools themselves, here is the context in which they were developed. As part of a training course organized jointly by the Free University of Brussels (ULB) and the University Commission for Development (CUD), I taught a course called "Integrated Library Management". The trainees had to carry out a project, and one of them wanted to migrate from Winisis to Koha (the software I used to illustrate my course). So I looked into the issue. The objective was to migrate an ISIS database to MARC21.

ISIS databases are based on a schema specific to each user (Winisis is not an ILS but a DBMS; one specializing in bibliographic data, certainly, but a DBMS nonetheless, so it resembles Microsoft Access more than Koha). The first step is therefore to determine the structure of the database in order to establish a mapping table to the desired MARC format (MARC21, UNIMARC, or any other variant). Today I cover only this phase and the tool developed for it. The other steps will follow in the coming days.

The structure of an ISIS database, called the "Field Definition Table" in ISIS jargon, is stored in a text file with the extension ".fdt". By analyzing this file, one can easily determine the structure of the database. Biblio::Isis is the module of choice for manipulating ISIS databases, and it provides a function to read the field definition table, so the following code is based on this module:

#!/usr/bin/env perl

use strict;
use warnings;

use Getopt::Long;
use File::Find::Rule;
use Data::Dumper;
use Encode qw(from_to);
use Spreadsheet::WriteExcel;

my $config = {};

# Dispatch table: output format => function that saves the structure
my %save_functions = (
    Excel => \&_save_excel,
    dump  => \&_dump_struct,
);

my $fdt_struct = {};

GetOptions($config, 'database=s', 'file=s', 'save=s');

if (not exists $config->{

Duplicate data during a migration

26 April, 15:18, by manu, machine translated from French

When migrating from one information system to another, one is often tempted to take the opportunity to improve data quality. Although this is a project in itself, it may well be the "right time", provided that we have the resources to do so. Human resources, of course, but we must also have enough time to develop a methodology that can guarantee better quality.

This year, some students embarked on this adventure for their graduation work, which involved migrating library catalogs. One particular problem quickly arose: the presence of duplicates in the authority lists. Catalogs are excellent places to measure how creative the human mind can be when it comes to not following rules ;-)

Thus, in a library catalog we have, for example, a list of publishers, and depending on the creativity of the cataloguers, but also on the number of people involved in managing these publishers, we will find more or fewer variants:

  • Ed O'Reilly
  • O'Reilly;
  • O'Reilly
  • E. O'Reilly
  • etc.

The ideal would therefore be to find all the duplicates and replace them with the correct form. But how do we find all these duplicates? Without computers, the task is tedious:

  1. browse the authority list;
  2. establish a list of duplicates;
  3. choose the "correct" form;
  4. make all the necessary changes.

Faced with such situations, my first instinct is often to determine what I can computerize, or even automate. In this case, it would be ideal to get a list of "possible duplicates", i.e., expressions that are close enough to arouse our suspicion. For this, computing offers several techniques:

  • an algorithm to compute the Levenshtein distance, that is to say, the number of elementary operations needed to turn a word M into a word P. Based on this algorithm, we can compare each entry in the authority list with all the others and keep the entries whose Levenshtein distance is small (the threshold obviously acts like the mesh size of a net). Of course, this algorithm is available on CPAN: Text::Levenshtein for a version in Perl, and Text::LevenshteinXS for a version in C;
  • other techniques derived from the previous one exist, for example the algorithm behind agrep, a grep that performs approximate string matching. There is a Perl module that reproduces this behavior: String::Approx (this is also an XS module, thus based on C).
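
To make the first technique concrete, here is a minimal sketch in pure Perl, using only core modules. It is not the prototype announced in this post: the function names, the sample publisher list and the threshold of 3 are assumptions chosen for illustration, and on real data one would rather use the CPAN modules named above.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Classic dynamic-programming Levenshtein distance, computed row by row.
sub levenshtein {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min(
                $prev[$j] + 1,            # deletion
                $cur[$j - 1] + 1,         # insertion
                $prev[$j - 1] + $cost,    # substitution
            );
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Return every pair of entries whose distance is at most $threshold.
# Each result is [first entry, second entry, distance].
sub possible_duplicates {
    my ($threshold, @entries) = @_;
    my @pairs;
    for my $i (0 .. $#entries - 1) {
        for my $j ($i + 1 .. $#entries) {
            my $d = levenshtein(lc $entries[$i], lc $entries[$j]);
            push @pairs, [ $entries[$i], $entries[$j], $d ] if $d <= $threshold;
        }
    }
    return @pairs;
}

# Hypothetical authority list, loosely based on the variants listed earlier
my @publishers = ("Ed O'Reilly", "O'Reilly", "E. O'Reilly", "Addison-Wesley");
for my $pair (possible_duplicates(3, @publishers)) {
    printf "%-12s ~ %-12s (distance %d)\n", @$pair;
}
```

Comparing every entry with every other one is quadratic in the size of the list, which is acceptable for a prototype but is one more reason to prefer the XS modules for large authority lists.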

Thus, the techniques exist, and "all that remains" is... to search CPAN, for example with the keywords "group" or "similarity", which led me to String::Similarity::Group, based on String::Similarity, which relies on a significantly different algorithm but the same kind of technique as explained above. In short, once again CPAN saves me time and allows me to develop a prototype quickly (without this module, I would have had to implement the group creation myself, which is certainly not complicated, but requires quite a bit of thought).

Here is the prototype in question:

#!/usr/bin/env perl

use strict;
use warnings

Special Edition Powerhouse Perl - Linux Magazine

15 April, 19:47, by manu, machine translated from French

It's been a while since I had last gone into a bookstore, but this visit was a success: it allowed me to find a few books to round off these Easter holidays, along with this special issue of Linux Magazine devoted to Perl: Perl Special Edition Powerhouse.

There are nice articles in it, and it gave me some ideas to fuel the Perl part of this blog.

Stay tuned ...

Introduction to Plack talk at the French Perl Workshop

13 April, 07:14, by miyagawa, machine translated from French

Plack is a port of WSGI (Python) and Rack (Ruby). Its objective is to provide a common environment for web framework developers. It provides connectors for many web servers, as well as an environment for easily writing middleware. Plack is still young but has already been adopted by all the Perl frameworks (Catalyst, Dancer, Mojo, ...), and many middleware components are also available on CPAN.

via journeesperl.fr

frankcuny will talk about Plack at the French Perl Workshop 2010.
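
To give an idea of what this common environment looks like, here is a minimal sketch written without Plack itself, so that it runs with core Perl only. It relies on the PSGI convention that an application is a code reference taking an environment hash and returning a status, a list of headers and a body. The add_served_by function and the X-Served-By header are hypothetical names invented for the illustration, and the sketch only handles the simple array-reference form of a response.

```perl
use strict;
use warnings;

# A PSGI application is a code reference: it receives the request
# environment as a hash reference and returns [status, headers, body].
my $app = sub {
    my ($env) = @_;
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ "Hello from $env->{PATH_INFO}\n" ],
    ];
};

# A middleware wraps an application and returns a new application.
# Hypothetical example: add an X-Served-By header to every response.
sub add_served_by {
    my ($inner, $name) = @_;
    return sub {
        my ($env) = @_;
        my $res = $inner->($env);
        push @{ $res->[1] }, 'X-Served-By' => $name;
        return $res;
    };
}

my $wrapped = add_served_by($app, 'demo');

# Call the application by hand with a minimal, hand-built environment
# (a real server would fill in many more keys).
my $res = $wrapped->({ REQUEST_METHOD => 'GET', PATH_INFO => '/hello' });
print "$res->[0]\n";
print join(' ', @{ $res->[1] }), "\n";
print @{ $res->[2] };
```

Because the application is just a code reference, the same $wrapped value could presumably be returned as the last expression of an app.psgi file and handed to any PSGI-compatible server.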

Saving the images of an HTML file

17 March, 22:36, by manu, machine translated from French

Some time ago now, when I began to read every day on my e-book reader, I quickly wanted to be able to read articles from the web on it. After a quick look around, I found nothing at the time (and this goes back quite a while now), so I set about developing a personal solution.

After some reflection, I developed the following tools:

Out of (bad) laziness, I never took the time to put these modules on CPAN, but that does not stop me from talking about them anyway.

So, in today's case, we'll look at HTML::Image::Save. Using it is as simple as can be:

Perl is a planet

15 March, 22:32, by manu, machine translated from French

A bit late, I have just read the post by Jean Véronis entitled "Ontologies: Perl is a planet in the Solar System" (an English version of the article is available).

And so, setting aside the intrinsic value of the post, long live Perl ;-)

The importance of the ecosystem of a language

8 March, 20:13, by manu, machine translated from French

While reading the latest "PragPub: The First Iteration" (No. 9, available on the PragProg website), in the article "JavaScript: It's Not Just for Browsers Any More" by Jason Huggins, I came across an interesting reflection:

When we choose a technology to write an application, we do not just choose the language, we also choose the list of available libraries. If a language has many useful libraries with a vibrant community around them, it's going to be easier to write your application in less time.

Well, in my case, it made me think of Perl and CPAN, but I'm sure others will read other things into it :)

Pearltrees, RDF & Perl

7 March, 14:54, by manu, machine translated from French

After some recent discussions, I am plunging into the arcana of RDF. Indeed, since Pearltrees offers the possibility of exporting bookmarks as RDF, my natural curiosity prompted me to investigate what I could actually do with this file. In addition, @SebDeclercq was kind enough to send me his own backup, thus sparing me the tedious task of enriching my own Pearltrees.

So I have a file full of links, and now I want to exploit this information. How? As indicated in the post on Nicolas Cynober's blog, we can use an online SPARQL tool to manipulate the information. There, you can submit your query written in SPARQL and get a response in different formats:

  • XML, with the option of supplying an XSLT script to transform the document (e.g. into XHTML);
  • JSON;
  • or just text.

So it is a very interesting tool, but it has to be used online, and personally I like having my tools directly available, regardless of my web access. So the question arises of what I can have directly on my machine. A trip through CPAN and, presto, here are some promising modules:

  • RDF::Redland;
  • RDF::Trine;
  • and, what I need to query the file via SPARQL: RDF::Query.

"It keeps up!

26 February, 22:16, by manu, machine translated from French

Recently, while browsing GitHub, I saw an interesting project: growlme. For those not familiar with Growl, it is a non-intrusive notification system. Imagine that you start burning a DVD and use that time to work on an article or blog post. Traditionally, the burning software displays a small window when the burn finishes, thus interrupting what you were doing. With notification systems such as Growl, a floating window appears in the top right of the screen to indicate that the burn is done. Growl is only available on Mac, but there are similar systems on Linux (libnotify among others).

growlme lets you launch a command from the command line and be informed of the outcome of its execution via Growl. As I often start lengthy processes from the command line and get on with other tasks in the meantime, I wanted to have the equivalent on my Linux machine.

The result is not very long and uses Desktop::Notify, that is to say, the Perl interface to libnotify; IPC::Run to run the command; Getopt::Long to process the command-line parameters (note the use of Getopt::Long::Configure('pass_through') to keep the rest of @ARGV); and Sys::Hostname for the title of the notification. For the rest, I aped the original program.

#!/usr/bin/env perl

use strict;
use warnings;

use Desktop::Notify;
use IPC::Run qw(run);
use Getopt::Long;
use Sys::Hostname;

# Default notification texts, overridable from the command line
my $config = {
    message => 'Succeed!',
    fail    => 'FAILED',
    title   => hostname(),
};

# Leave everything that is not one of our options in @ARGV
Getopt::Long::Configure('pass_through');
GetOptions($config, 'message=s', 'fail=s', 'title=s');

if (scalar(@ARGV) == 0) {
    die "$0: Must provide a command to execute\n";
}
else {
    my $notify = Deskto
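
The effect of pass_through mentioned above can be seen in isolation with the following minimal sketch. It uses GetOptionsFromArray from Getopt::Long so that it does not depend on the real @ARGV. The command line it parses (a make invocation) is a hypothetical example, not something taken from the original program.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# With pass_through, options that were not declared are left in place
# instead of being reported as errors.
Getopt::Long::Configure('pass_through');

# Hypothetical command line: our own --message option, followed by the
# command to run and that command's own options.
my @args   = ('--message', 'Build done', 'make', '-j4', 'test');
my %config = (message => 'Succeed!');

GetOptionsFromArray(\@args, \%config, 'message=s', 'fail=s', 'title=s');

print "message: $config{message}\n";
print "command: @args\n";
```

What remains in the array after parsing is the command to execute, which is exactly what a wrapper of this kind needs to hand over to IPC::Run.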

Pearltrees

24 February, 21:23, by manu, machine translated from French

Following a comment on his blog, @SebDeclercq explained the reasons for his choice of Pearltrees. In summary (and correct me if I have misunderstood):

  1. he needs a tool to back up his bookmarks;
  2. this tool should be online so that it can be used from several machines;
  3. the proposed classification system should be effective; the visual, tree-shaped organization met his expectations.

For my part, I share the need for a system to manage my bookmarks. I used Delicious for a while, but at some point I abandoned the habit of depositing resources I deemed valuable there, in favor of a small tool I developed myself. In his reply, @SebDeclercq reports that he uses Pearltrees mainly through the Firefox extension. I had not taken the time to test it: first because my main browser is now Chromium, and also for lack of time. So I had tested the other tool offered by Pearltrees, namely the bookmarklet, but I was not really happy with it. It deposited the pearls (that is, the URLs you want to keep) in a basket, and you still had to "arrange" these pearls, and therefore had to use the Flash interface, which I find particularly unpleasant. In short, in these circumstances, it was difficult to adopt Pearltrees in my toolbox.

But @SebDeclercq's article prompted me to revisit the tool, so this time I took the time to launch Mozilla Firefox and install the extension. And I could then understand why it is a useful tool for managing bookmarks:

  • the obvious integration into the browser;
  • a button that lets you choose where to put your pearl, which eliminates the need to go through the site and thus avoids the Flash interface;
  • a button to launch your Pearltrees.

Nevertheless, despite all this, I am not adding Pearltrees to my toolbox, for the following reasons:

  • the lack of an API to this day: from my point of view, this is really the biggest black cloud over this tool. Indeed, in the Web 2.0 that is ours, and with the Web 3.0 or web of data on the horizon, the presence of an API is essential! If I have trouble with the Flash interface, an API would have allowed me to develop a command-line interface to my liking. And the RDF export will not take me far, since I have to log in to perform it;
  • the features of the tool do not go much beyond those of Delicious. Granted, the tree structure is very cool, the display is interesting, and being able to "capture" a pearltree is really nice, but otherwise I'm afraid of reproducing the same pattern as with Delicious;
  • besides this, I recently discovered, via a tweet from @MarioAsselin, a tool more in tune with my needs: Diigo. I have been using it for a week now, and although I still have to improve my personal workflow

Information Retrieval - Example of creating an index

21 February, 22:49, by manu, machine translated from French

As part of a course on digital knowledge management, I talk a little about information retrieval. Just the basics, but it seems important to know how a search engine works, especially for future information and documentation specialists. Among these basics, we look at how a search engine processes the text it receives for indexing. Although the theory is easy to understand, that does not prevent me from adding a visual layer, which is why I wrote a little script to illustrate this indexing stage.

The script was developed over a lunch break, just before giving the course, so I did not work on the ergonomics and layout of the tool. To build the tool quickly, I used the following tools:

Otherwise, the code is quite simple:

#!/usr/bin/env perl

use strict;
use warnings;

package MyView::Templates;
use Template::Declare::Tags;
use base 'Template::Declare';

# Form with a large text area plus submit and reset buttons
private template form => sub {
    my $self  = shift;
    my $title = shift;

    div {
        attr { style => 'margin: auto; size: 15%', };
        form {
            attr {
                action => '/submit',
                method => 'POST',
            };
            textarea {
                attr {
                    cols  => '100',
                    rows  => '25',
                    name  => 'title',
                    style => 'float: left;'
                };
                $title;
            };
            input {
                attr {
                    name  => 'submit',
                    type  => 'submit',
                    value => 'Submit',
                    style => 'float: left;'
                };
            };
            input {
                attr {
                    name => 'reset',
                    type => 'reset',
                    value =>
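
Since the listing above stops before the indexing itself, here is a minimal sketch of what the indexing stage of a search engine typically does: split the text into terms, drop the stop words, and record in which documents each term occurs. It is not the code of the original script. The stop-word list, the function names and the two sample documents are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Hypothetical stop-word list; a real one would be much longer.
my %stop = map { $_ => 1 } qw(a an and the of in is to);

# Split a text into lower-case terms and drop the stop words.
sub tokenize {
    my ($text) = @_;
    return grep { !$stop{$_} } map { lc $_ } $text =~ /(\w+)/g;
}

# Build an inverted index: term => { document id => term frequency }.
sub build_index {
    my (%docs) = @_;
    my %index;
    while (my ($id, $text) = each %docs) {
        $index{$_}{$id}++ for tokenize($text);
    }
    return \%index;
}

my $index = build_index(
    d1 => 'The camel is the mascot of Perl',
    d2 => 'Perl and the CPAN in a library',
);

# Print each term with its postings (document id and frequency)
for my $term (sort keys %$index) {
    my $postings = $index->{$term};
    printf "%-8s %s\n", $term,
        join ', ', map { "$_ ($postings->{$_})" } sort keys %$postings;
}
```

Answering a query then amounts to looking up each query term in the index and combining the lists of document identifiers, which is why the index is built once, ahead of any search.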

Reinventing the wheel with MARC::Record

9 February, 21:20, by manu, machine translated from French

Recently, on the perl4lib mailing list, a question was raised about whether a solution exists for splitting an overly large MARC file into several smaller files.

Like many users of MARC records, and programmers manipulating them, I had faced this problem and developed a quick little solution:

#!/usr/bin/env perl

use strict;
use warnings;

use MARC::File::USMARC;
use MARC::Record;

use Getopt::Long;

# By default, output files are named after the input file
my $config = { output => 'input' };

GetOptions($config, 'input=s', 'chunk=s', 'output=s', 'max=s');

# Both the input file and the chunk size are required
if (not exists $config->{input} or not exists $config->{chunk}) {
    die "Usage: $0 --input file --chunk size [--output file]\n";
}
else {
    run($config->{input}, $config->{output}, $config->{chunk}, $config->{max});
}

sub run {
    my ($input, $output, $chunk, $max) = @_;

    my $marcfile = MARC::File::USMARC->in($input);

    my $fh = $output eq 'input' ? create_file($input) : create_file($output);
    my $cpt   = 1;    # position of the next record within the current chunk
    my $count = 0;    # total number of records read
    while (my $record = $marcfile->next) {
        $count++;

        if (defined $max) {
            last if $count > $max;
        }
        if ($cpt++ > $chunk) {
            close $fh;
            $fh = $output eq 'input' ? create_file($input) : create_file($output);
            $cpt = 2;    # the current record is the first one of the new chunk
        }

        print $fh $record->as_usmarc;
    }
    close $fh;
}

# Open the first file named "<output>.NNN" that does not exist yet
sub create_file {
    my ($output) = @_;
    my $cpt = 0;

    my $filename = sprintf('%s.%03d', $output, $cpt++);
    while (-e $filename) {
        $filename = sprintf('%s.%03d', $output, $cpt++);
    }

    open my $fh, '>', $filename or die "Cannot open $filename: $!\n";
    return $fh;
}

This tool is an example of a librarian's solution: a librarian should be able to program (if he or she wishes to, of course). The algorithm used is far from complicated (although it is a good exercise), and it builds on existing modules (long live CPAN!) for which documentation exists. In short, a fine example of laziness and specialization.

Nevertheless, a better solution than this MARC-splitting script is undoubtedly the yaz-marcdump utility (-C option)

Happy New Year - Happy New Year 2010

4 January, 03:51, by Yann, machine translated from French

We wish you all the best for the new year.

Our best wishes for the new year.


Eh! Maelys, Caroline & Yann

Noel 2009 - I couldn't manage a better picture this year. I suck


Baby pictures are available on my Flickr photostream as usual.


A glance at the calendar

10 December, 18:33, by manu, machine translated from French

For several years now, an initiative of the Perl community, which can now be called a tradition, has been organized at this time of year: the advent calendar. From the first day of December until Christmas (hence Advent), an article is published each day to explain sometimes a module, sometimes a technique, sometimes an element of Perl Mongers culture.

To my knowledge, there are four advent calendars in the Perl community this year. Each focuses on a particular project in the community:

I did a brief search to see whether the communities of other programming languages had taken up the initiative, but I found nothing really sustained:

  • Ruby had one in 2006 and 2008, but apparently nothing this year;
  • PHP had one in 2007, and has one this year;
  • I have found nothing for Python;
  • and I stopped there! Feel free to leave a comment to help me discover other such calendars.

Personally, I think it is an interesting initiative: first, it helps people learn a language or a particular piece of software, and it also provides a kind of promotion for the subject of the calendar.

In short, as far as I am concerned, much more interesting than the calendar of a famous brand of tires, for example ;-)

A long time ago

30 November, 09:23, by manu, machine translated from French

It's been a while since I published anything on this blog, but that does not mean Perl has not been helpful in the meantime. In fact, quite the opposite! At the moment, I find it especially useful for building prototypes to better understand certain technologies.

For example, in the November 2009 issue of GNU/Linux Magazine France there is an article ("Put a sphinx in your search engine!") about a tool at which I had only taken a quick glance: Sphinx. I have not had time to finish the article (I'm behind on everything, I tell you!), but the principle is to give Sphinx an entry point into a database via a query. The results of this query are then indexed by the external software (Sphinx in this case). Then, when we want to do a full-text search on this database, we first query Sphinx to get the identifiers of the matching data, which we can then retrieve directly from the database.

I do not have databases large enough at hand to see whether the performance promises are kept, but according to the author of the article, they are!

I therefore wonder about the possibility of using Sphinx to create "new generation" OPACs. The advantage is that simply adding Sphinx to the mix makes it possible to use a much richer query language than that of conventional OPACs.

I have already used this technique of indexing by an external program to make a Winisis database available via the Z39.50 protocol. I used Perl for this, with the following modules:

One more tool: CPAN:: Mini:: Webserver

10 November, 22:52, by manu, machine translated from French

Perl is really a tool that I enjoy having in my toolbox, and like many other Perl programmers, what I particularly appreciate is being able to count on the work of many others to help me solve my everyday problems. Sometimes the service rendered is not so much help in completing a job as simply making life a little more comfortable.

Relying on CPAN as often as I do, I ended up keeping a copy on my hard drive, which I keep synchronized as regularly as possible. This work is handled by the excellent minicpan (CPAN::Mini). So I am in the pleasant situation of always having a CPAN archive at hand, whether I am connected to the Internet or not. But CPAN is not limited to this: it is also the tool I consult to read documentation (yes, I know, perldoc is there for that), so what do I do when I'm on the road? It's simple: use minicpan webserver (CPAN::Mini::Webserver). The latter relies on the minicpan configuration file and serves everything via a web server. I can now search, read the documentation and the tests, and I can even start an installation from this interface. In short, a supercharged CPAN! Give it a try!

Am I connected?

31 October, 22:40, by manu, machine translated from French

Most of the tools I develop are based on LWP, so I need web access to run them (nice platitude!). Some of these tools are meant to be started via cron, which raises the question of what happens to the program when it starts and I am not connected. One way to address the problem is to modify the program so that it starts processing only when web access is operational. This is where LWP::Online comes to my aid. It lets me import a function, online(), which checks whether web access is present and, if so, returns a true value. LWP::Online checks web access by checking for the presence of the copyright notices on a few sites such as Google and Yahoo!, so this adds some latency to your program, but for some tools it is still worthwhile.
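
The guard described above could look like the following minimal sketch. The run_when_online function is a hypothetical helper, not part of LWP::Online. The connectivity check is passed in as a code reference so that the sketch runs without the module and without a network; in a real cron script one would presumably pass \&online after importing it from LWP::Online.

```perl
use strict;
use warnings;

# Run $job only if $is_online->() reports a working connection.
# Returns 1 when the job ran, 0 when it was skipped.
sub run_when_online {
    my ($is_online, $job) = @_;
    unless ($is_online->()) {
        warn "No web access, skipping this run\n";
        return 0;
    }
    $job->();
    return 1;
}

# Stub checks stand in for the real online() function here.
my @log;
run_when_online(sub { 1 }, sub { push @log, 'fetched while online' });
run_when_online(sub { 0 }, sub { push @log, 'should never happen' });
print "$_\n" for @log;
```

Skipping the run quietly, rather than dying, keeps cron from sending an error mail every time the machine happens to be offline.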

A cookie for another

22 October, 21:13, by manu, machine translated from French

Recently, I found myself faced with a rather silly but nonetheless annoying problem. I have been using the services of a certain site for some years now. I had to log in to this site recently, but:

  1. I could not remember my password!
  2. resetting the password was difficult, because there was no way to remember the email address used to create the account!

Damn! What to do?

Well, laziness to the rescue! Indeed, to use the services of this site, I had written a small robot in Perl with LWP. This tool still worked fine, since it relied on a cookie. This cookie was in HTTP::Cookies' own format, so I had to convert it to the Mozilla Firefox format.

No sooner said than done. A little research on the web, and I found an article by the Mongueurs explaining cookie conversion, but from Mozilla to LWP. After some experimentation, I eventually adapted the program. Here it is:

#!/usr/bin/env perl

use strict;
use warnings;

use HTTP::Cookies;
use HTTP::Cookies::Mozilla;
use Getopt::Long;

my $config = {};

GetOptions($config, 'from=s', 'to=s');

die _usage() unless _valid_config($config);

# Load the jar in LWP format, then re-bless it so that save() writes the Mozilla format
my $input_jar = HTTP::Cookies->new(file => $config->{from});
bless $input_jar, 'HTTP::Cookies::Mozilla';

$input_jar->save($config->{to});

sub _usage {
    return "Usage: $0 --from cookies.lwp --to cookies.mozilla\n";
}

sub _valid_config {
    my $config = shift;

    if (exists $config->{from} and exists $config->{to}) {
        return 1;
    }
    else {
        return 0;
    }
}

Then simply call this program as follows: $ ./cookies_converter -f my_lwp_cookie.txt -t /path/to/mozilla/profile/cookies.sqlite

Then I could go on my account page to change my password and put an email address

Help me fill my buffer

19 October, 15:26, by manu, machine translated from French

Thanks to a tweet, I discovered a little tool that complements Emacs: perl-completion.el. It provides smart auto-completion, that is to say, completion that is not limited to the contents of the open buffers. You can get help with module names, but also with the methods exported by these modules. In short, a must!

If you want to see more, there is a screencast that will let you appreciate the work done. There is also a git repository online.

Enjoy!

Indexing BackPAN

29 July, 00:30, machine translated from French

brian d foy. 15 August 2008