Sciology = Science + Technology

Commonsense in Technology

Posts Tagged ‘software’

Commercial and Open Source OCR Software

Posted by sureshkrishna on November 4, 2009

After testing FineReader, OmniPage, ReadIRIS, SimpleOCR, Aspire, and Tesseract, it is evident that ABBYY FineReader 9 is the best overall value, while ReadIRIS is the best OCR software for under $150.

The main features that differentiate OCR software are:

  • Character recognition accuracy
  • Page layout reconstruction accuracy
  • Support for languages
  • Support for searchable PDF output
  • Speed
  • User interface
  • API / SDK
  • Support / Consulting
  • Stability of the engine when processing large documents
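
As a rough illustration of the first point, character recognition accuracy can be estimated by comparing the OCR output against a hand-typed reference text. The sketch below uses one minus the edit distance divided by the reference length, which is one common definition and not necessarily the one any of the vendors below use. The function names and the sample strings are made up for the example.

```python
def edit_distance(reference: str, recognized: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(
                previous[j] + 1,         # character missing from the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, recognized: str) -> float:
    """Share of reference characters recognized correctly, clamped to [0, 1]."""
    if not reference:
        return 1.0 if not recognized else 0.0
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical sample: two substituted characters out of 29.
print(character_accuracy("Optical character recognition",
                         "0ptical charaoter recognition"))
```

Running the same reference page through each product and comparing the scores gives a simple, repeatable way to rank them on this one criterion.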

The following are some of the software packages that I tried and compared.

SimpleOCR

SimpleOCR is a popular freeware OCR package with hundreds of thousands of users worldwide. It is also available as a royalty-free OCR SDK for developers to use in their custom applications. If you have a scanner and want to avoid retyping your documents, SimpleOCR is a fast, free way to do it. The freeware is 100% free and not limited in any way: anyone can use it, including home users, educational institutions, and corporate users. It provides acceptable accuracy for those who just need to convert a few pages and can’t justify the cost of commercial OCR software. Developers can use the command-line and SDK versions to integrate SimpleOCR with their custom applications.

 

ABBYY FineReader

FineReader Professional is highly accurate, easy-to-use OCR software with a host of features, including digital camera OCR, intelligent document layouts, image enhancement, barcode recognition, and command-line integration. FineReader 9 is our pick for OCR software because its document layout retention will save you much time in reformatting the documents you convert for editing.

IRIS ReadIRIS

Affordable OCR software for business and home users. ReadIRIS Pro provides an extremely accurate recognition rate at a low cost, yet still has some of the advanced features found in higher-priced professional OCR software.

Nuance OmniPage

OmniPage is widely considered the fastest, most accurate, and most fully featured OCR software. OmniPage 17 Professional has a new feature that lets you convert any type of document to searchable PDF or Word. However, OmniPage does not have a downloadable demo, and Nuance does not provide free technical support after the first call. For these reasons we recommend the ABBYY and IRIS products instead.

OmniPage is an optical character recognition application available from Nuance Communications. Nuance Communications was acquired by ScanSoft, which also took over its name in October 2005. OmniPage converts images, such as scanned paper documents and PDF files, into file formats used by computer applications such as Microsoft Word, Excel, Adobe Acrobat, or HTML files. OmniPage competes with ExperVision (TypeReader), Readiris, and ABBYY FineReader, as well as with free software such as GOCR and Tesseract.

http://code.google.com/p/tesseract-ocr
Tesseract is a free optical character recognition engine. It was originally developed as proprietary software at Hewlett-Packard between 1985 and 1995. After ten years without any development taking place, Hewlett-Packard and UNLV released it as open source in 2005. Tesseract is currently developed by Google and released under the Apache License, Version 2.0.
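
For a feel of how Tesseract is driven, here is a minimal Python sketch that shells out to the tesseract command-line program. The program takes an input image, an output base name, and an optional -l language code, and writes the recognized text to the output base name plus ".txt". The helper names and file names are hypothetical, and the sketch assumes the tesseract executable is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_tesseract_command(image_path: str, output_base: str,
                            language: str = "eng") -> List[str]:
    """Build the argument list: tesseract <image> <output base> -l <language>."""
    return ["tesseract", image_path, output_base, "-l", language]


def run_ocr(image_path: str, output_base: str, language: str = "eng") -> str:
    """Run Tesseract on one image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, language)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    # Tesseract appends ".txt" to the output base name.
    return Path(output_base + ".txt").read_text(encoding="utf-8")
```

With a scanned page saved as, say, scan.png, calling run_ocr("scan.png", "scan") would leave the text in scan.txt and return it as a string.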

http://jmagick.wiki.sourceforge.net
JMagick is an open source Java interface to ImageMagick. It is implemented as a Java Native Interface (JNI) layer over the ImageMagick API. JMagick does not attempt to make the ImageMagick API object-oriented; it is merely a thin interface layer. JMagick currently implements only a subset of the ImageMagick APIs. Should you require features that JMagick does not yet implement, please join the mailing list and make a request. JMagick is released under the LGPL (GNU Lesser General Public License).

http://www.expervision.com
The award-winning TypeReader converts scanned documents into electronic files at a speed of 8,000 pages per hour with maximum reliability. Desktop 7.0 offers added flexibility to handle color and grayscale images, with duplex scanning support, and processes documents in English, French, German, Italian, Portuguese, Spanish, Dutch, Danish, Swedish, Norwegian, Finnish, Polish, Hungarian and Polynesian. It employs an unparalleled recognition technology to support 2618 fonts. Users can choose to output to various formats including PDF, MS Word, Excel, Lotus 1-2-3, HTML, etc.

http://www.edocfile.com
Tiff to Text is designed to perform Optical Character Recognition (OCR) in a batch process. The program uses the OCR engine from Nuance (owners of OmniPage, formerly ScanSoft) that is included with Microsoft Office Document Imaging (MODI).

http://www.simpleocr.com/OCR_Software_Guide.asp

Posted in software, Technology

Can you do effective Context Switching ?

Posted by sureshkrishna on July 23, 2009

Everyone in the software and IT industry is exposed to what I call the “Context Switching” problem. Bosses are so adept at handing different kinds of tasks to the “makers” that they often don’t realize what is involved in the context switch. Before I move on: I was reading a very interesting article by Paul Graham, “Maker’s Schedule, Manager’s Schedule”. He is right on point about where programmers (aka makers) and managers spend their time and what “meetings” mean to each of them.

We tend to assume that most programmers have 8 hours of work time in a day and schedule all the work accordingly. What we very often forget to take into account are the obvious and non-obvious tasks. As Paul says, a programmer’s piece of work normally comes in one-day chunks (for some, at least, half-day chunks), and any disturbance in those 4-8 hours proves very costly. We all want to concentrate and keep the entire program in our head until we are done with it. Paul explains this phenomenon very well in his article “Holding a Program in One’s Head”.

When you start a brand new day at the office thinking over a problem or an algorithm, your boss calls and asks you for a status update because his boss asked for a team update. That is a request you need to honor without question. Usually such calls do not last 5-10 minutes but run for at least 30 minutes to an hour, because we end up trying to solve a problem over the phone or in the meeting room.

Other common sources of context switching are:

  • Meetings
  • Weekend vacation talk
  • Extended lunch and coffee breaks
  • Status reports to the manager
  • Status reports to the customer
  • Helping the sales and marketing team
  • Attending to personal calls
  • etc. (I am sure there are hundreds of such things)

The task your manager thinks of as 10 minutes actually takes an hour. After sending him the sweet report or tools comparison, you put your head down again to start writing your program, and an hour later you get a call saying that the report format should be changed so that he can submit it right away to his boss. Phew... you did that one too. Now it is almost 2:00 pm and you really want to concentrate and do the REAL work. All in all, by my analysis, the average programmer probably gets around 5 hours of quality time out of 8 hours in the working day. No wonder we often end up working late nights just before a delivery. Many programmers have a similar habit of working in the dark, late hours. Yes, that works perfectly: no one calls you or asks for reports or for help. The only thing you really think about is the problem in front of you.
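
To make the arithmetic concrete, here is a small sketch of how an 8-hour day shrinks to about 5 hours. The interruption durations and the 10-minute cost of getting back into the problem are illustrative assumptions, not measurements.

```python
def quality_minutes(workday_minutes, interruptions, refocus_minutes):
    """Minutes of focused work left after interruptions.

    Each interruption costs its own duration plus an assumed fixed
    refocus penalty for getting the program back into one's head.
    """
    lost = sum(duration + refocus_minutes for duration in interruptions)
    return max(0, workday_minutes - lost)


# Assumed day: a status call (60 min), the "10 minute" report that took
# an hour (60 min) and the report reformatting (30 min).
remaining = quality_minutes(8 * 60, [60, 60, 30], refocus_minutes=10)
print(remaining)  # 300 minutes, i.e. 5 hours
```

The point of the sketch is that the refocus penalty is paid once per interruption, so three short interruptions can cost more than one long one.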

I work on a project that has a very tight dependency on the environment (the software installed on the machine). Everything installed on a machine matters, and a lot of legacy code has been maintained for the past 15 years or so. The environment is so critical that if you install the required software in any order other than the prescribed one, you may need to burn the midnight oil to track down some non-obvious, strange, and scary system behavior. Initially, of course, I was under the impression that a software system MUST NOT depend on the environment, but as I got into the system I came to believe that sometimes the dependency simply exists (due to several legacy apps and unforeseen integrations of different products).
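
One cheap safeguard against this kind of problem is to check a machine’s install history against the prescribed order before debugging anything else. The sketch below is only an illustration: the package names are invented, and it assumes you can obtain the list of installed packages in the order they were installed.

```python
def first_order_violation(prescribed, installed):
    """Return (late_package, package_installed_before_it) for the first
    package installed out of the prescribed order, or None if all is well.

    Packages that are not in the prescribed list are ignored.
    """
    position = {name: index for index, name in enumerate(prescribed)}
    highest_seen = -1
    highest_name = None
    for name in installed:
        if name not in position:
            continue  # not governed by the prescribed order
        if position[name] < highest_seen:
            return (name, highest_name)
        highest_seen = position[name]
        highest_name = name
    return None


# Hypothetical package names, for illustration only.
prescribed = ["runtime", "database-client", "legacy-app"]
installed = ["runtime", "editor", "legacy-app", "database-client"]
print(first_order_violation(prescribed, installed))
```

Here the check reports that database-client was installed after legacy-app, which is exactly the kind of deviation that otherwise only shows up later as strange system behavior.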

Context switching is one project impediment that Agile methodology and Scrum claim to remove. Scrum recommends that the team have specified times for meetings and tries to reduce bureaucracy in the project, in contrast to traditional ad hoc meetings and untimely calls that force context switches. Of course, one can claim to follow Scrum and still work the traditional way, but in my experience I have seen this work.

Have any of you had such problems with context switching? What do you do to overcome them?

Posted in Technology

Proactive Maintenance is crucial in all industries !!!

Posted by sureshkrishna on June 7, 2009

At the start of my career as a Software Engineer, my first assignment was to maintain a COBOL system that processed approximately 5000 records per hour. It was a very large and challenging system with web and AS/400 integration. At that point, my general idea as a Computer Engineering student was to build software frameworks and systems with fancy programming languages and databases. Once I was thrown into COBOL maintenance, I was rather dejected for the first few weeks. Luckily, my manager noticed this and helped me understand why it is important to maintain software systems and what one can learn from doing so.

I am writing this article to remind all developers and designers of software/hardware systems, in all industries, about the maintenance of critical systems. A problem that everyone thinks is small can become big or critical in certain circumstances. All industries face the same problem: no system can be tested against every real-world scenario. The test data and test cases for any system are limited and time-bound, so they cannot be trusted to provide 100% test coverage or to guarantee the safety of the system.

Very often we encounter the “refactoring” dilemma in the software industry. The question that comes to everyone’s mind is: should we refactor “NOW” or put it off until a later “trigger”? All projects face the following challenges, which force them to decide whether refactoring is necessary at that time.

  • short time
  • limited budget
  • non-availability of resources
  • pressure from sales and marketing
  • and, finally, pressure to deliver

We always tend to postpone and procrastinate on code, design, and architecture refactoring. Very often “shit happens” and the cost of refactoring skyrockets. The customer is angry, the development team gets demotivated, and project stakeholders are unhappy with the system performance. Some of these problems are addressed by agile methodologies (TDD, Scrum, XP, RUP, etc.) and some by the timely action of “experienced” leaders in the industry. However good a methodology or process is, in the end everything depends on the people who implement it. So I often get “upset” when big organizations talk about a “people-independent” process.

Finally, I was moved by the recent Air France disaster (the flight from Rio de Janeiro to Paris), which appears likely to have involved some failed hardware. According to the news, the hardware sensors were due to be replaced some months back and for some reason this was not done. Irrespective of whether this was a hardware failure, it calls for everyone to be more attentive, proactive, and creative when building critical applications and systems. The following is an excerpt from the news on the internet.

Air France issued a statement with details about the monitors hours after the French agency investigating the disaster of Flight 447 said the instruments were not replaced on that aircraft – an A330 – before it crashed last week into the Atlantic Ocean en route from Rio de Janeiro to Paris.

Air France said it began replacing the monitors on the Airbus A330 model on April 27 after an improved version became available.

Pitot tubes, located on the exterior of the aircraft, are used to help measure aerodynamic speed.

Aviation officials have said the crash investigation is increasingly focused on whether external instruments may have iced over, confusing speed sensors and possibly leading computers to set the plane’s speed too fast or slow – a potentially deadly mistake in severe turbulence.

An Air France statement said that icing of the monitors at high altitude has led at times to loss of needed flying information.

However, the Air France statement stressed the recommendation to change the monitor “allows the operator full freedom to totally, partially or not at all apply it.” When safety is at issue the aircraft maker issues, rather than a recommendation, a mandatory service bulletin followed up by an airworthiness directive.

Posted in News, Process, Quality, Reviews, Technology