Sunday, October 25, 2015

Web scraping using SOCKS/HTTP proxies

When extracting data from websites, you will most likely run into some kind of access limiting per IP address. In that case it is a good idea to use proxies. In my scraping projects I've used both HTTP and SOCKS proxies.
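As a rough illustration of the idea, here is a minimal sketch that rotates requests through a list of HTTP proxies using only the Python standard library. The proxy addresses and the helper names `proxy_openers` and `fetch` are made up for this example. The standard library does not speak SOCKS, so SOCKS proxies would need a third-party package such as PySocks.

```python
import itertools
import urllib.request

# Hypothetical proxy addresses; replace them with proxies you actually have.
PROXIES = [
    "http://127.0.0.1:8080",
    "http://127.0.0.1:8081",
]


def proxy_openers(proxies):
    """Yield (proxy, opener) pairs forever, cycling through the proxies."""
    for proxy in itertools.cycle(proxies):
        # Route both plain and TLS traffic through the same proxy.
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        yield proxy, urllib.request.build_opener(handler)


def fetch(url, opener, timeout=10):
    """Download a URL through the given opener and return the raw bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Each call to `next()` on the generator gives the next proxy in the list, so consecutive requests leave from different IP addresses. Building an opener does not touch the network; only `fetch` does.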

Wednesday, October 21, 2015

Using Django models in a standalone application

One of my favorite things about Django is its loosely coupled design. It means that you can use each part independently without affecting the others. One of the parts that I often use in my applications is Django models.

There is nothing difficult about using Django models in a standalone application. You just need to start a new Django project and then remove everything unnecessary, leaving only the models. Below I will show how with an example.
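For orientation, a stripped-down settings module for such a project might look roughly like the sketch below. It is not necessarily the exact file from the full post: the app name `myapp`, the secret key value and the SQLite file name are placeholders assumed for illustration.

```python
# settings.py: a minimal settings module that keeps only what the ORM needs.
# "myapp" and "db.sqlite3" are hypothetical names used for illustration.

# Some Django versions refuse to load settings without a secret key,
# even when nothing in the project uses it.
SECRET_KEY = "replace-me"

# Only the app package that contains models.py is listed.
INSTALLED_APPS = [
    "myapp",
]

# A single SQLite database stored next to the script.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "db.sqlite3",
    }
}
```

A standalone script would then typically set the `DJANGO_SETTINGS_MODULE` environment variable to `settings` and call `django.setup()` before importing anything from `myapp.models`.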

Saturday, October 17, 2015

Wanna quit your job and become an Upwork freelancer? Do not do that!

Almost every day I receive job offers on my 5-star oDesk (Upwork) profile. I do not search for anything manually; clients contact me directly. But most of these job offers are really garbage.

A real example. A couple of days ago some R.D. from the United States contacted me. She wanted a website scraped.

OK, after a quick check of the website, the job seemed straightforward: a huge list of categories/subcategories and product listings. Nothing unusual.

The next part is to set the price for the job. Usually I would say: OK, the data scrape will cost you $75. Then the client approves or declines. But this time I decided to let my client decide how much she would like to pay. So I simply said:

"Just send me an offer with price that is ok for you."

Then she asked me if I could also deliver the script. I said that I could, but it would cost more, since I would need to clean up the code and make it user-friendly and easy to run. And what was the result? How much do you think she valued my work? Below is her answer:

"Since this site is so big I was thinking having the script itself might make sense rather than reaching out each time for every random vertical - otherwise I'd have to ping you again. We have a dev team here and I studied CS, so I don't need it to be beautiful code, just functional. 😄

But for right now, let's start with the data for the food one. I had been paying $3/hr for manual data entry. Would $15 be fair to you for this category?"

Great. She had been paying $3/hr for manual copy-paste to a person without any programming skills. She values my skills the same: $3/hr, even though my profile states $15 per hour. Or perhaps she thinks it's possible to develop the script, scrape the data and deliver functional code to her in just 1 hour at $15? No further discussion.

Where do all these people come from, who want me to work for $1-$3/hr? Guys, this is not serious and not funny. I'm saying NO to free labor!

P.S. I'm not saying that all of the clients on oDesk are like the one described above. But this is a very common example of what I've experienced so far.

Note to the client: if the task is so easy, and you even studied CS and have a dev team, why waste your time searching for, interviewing and hiring a programmer on oDesk? Wouldn't it be faster to write such an "easy" script yourself? It shouldn't take more than 5 minutes for a professional like you.

Friday, October 2, 2015

How to set up nginx+uwsgi with CKAN

Recently I had to deploy a CKAN website on a web server. There is nothing difficult about it if you just follow the official documentation and deploy CKAN with Apache. But my choice was uwsgi+nginx, because I always use this bundle.

After setting everything up the way I usually do with Django, I got the error below in the nginx log files for all static files:

[error] 1057#0: *70 upstream prematurely closed connection while reading response header from upstream