Code in Python Blog

Wednesday, November 21, 2018

Python - How to wait for presence of an element with non empty text?

Hello!

Sometimes it's useful to wait until an element has non-empty text. How do you do that with Selenium in Python? Pretty easy! You can find the answer below.

If the XPath is absolute, you can wait for your element to have non-empty text like this:

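from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# xpath - XPath of your element; driver - an already started WebDriver
WebDriverWait(driver, 30).until(lambda d: d.find_element(By.XPATH, xpath).text.strip() != '')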
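To make the polling idea concrete, here is a minimal sketch of roughly what such a wait does: call a condition again and again until it returns something truthy or the timeout expires. It does not use Selenium and is not Selenium's implementation. `wait_until` and `FakeElement` are hypothetical names made up for this illustration, and `LookupError` only stands in for "element not found yet".

```python
import time


def wait_until(condition, timeout=30.0, poll=0.5):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(poll)


class FakeElement:
    """Hypothetical element whose text shows up only after a few reads."""

    def __init__(self, empty_reads):
        self._empty_reads = empty_reads

    @property
    def text(self):
        if self._empty_reads > 0:
            self._empty_reads -= 1
            return "   "  # whitespace only, so strip() gives ''
        return " ready "


element = FakeElement(empty_reads=3)
found = wait_until(lambda: element.text.strip() != "", timeout=5, poll=0.01)
print(found)
```

The `strip()` call matters: an element that contains only whitespace still counts as empty, which is exactly what the one-liner above checks.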

How to install python 3.5 in CentOS 7 from source

At the time of writing, the CentOS 7 repository contains Python 3.4, which is not the newest version. Sometimes it's useful to have a newer one. For example, I needed Python 3.5, and here I will show you how to install it. We will be installing it from source.

1. Update the repositories and install a couple of packages:

sudo yum update
sudo yum install zlib-devel openssl-devel

Fabric how to set environment variable to fix encoding

Hi there,

Today I ran into a weird problem and decided to share it with you.

Here is what happens. When I ssh to the server manually and run a Python script, everything works fine. But if I run the same script from a Fabric script that connects to the same server, it fails. In particular, I got an encoding error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xaa' in position bb: ordinal not in range(128)
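The excerpt stops at the error, so the following is only a sketch under an assumption: that the non-interactive session leaves Python with an ASCII output encoding, and that setting the `PYTHONIOENCODING` environment variable is the kind of fix the title refers to. The example does not use Fabric at all. It simulates the situation locally by starting a child Python process twice, once with ASCII output and once with UTF-8. `run_with_encoding` is a hypothetical helper name.

```python
import os
import subprocess
import sys

# The child process prints one non-ASCII character, as in the error above.
CHILD_CODE = "print(u'\\xaa')"


def run_with_encoding(encoding):
    """Run a child Python process with PYTHONIOENCODING set to `encoding`."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # keep UTF-8 mode from interfering
    env["PYTHONIOENCODING"] = encoding
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


ascii_run = run_with_encoding("ascii")
utf8_run = run_with_encoding("utf-8")

print("ascii exit code:", ascii_run.returncode)
print("ascii saw UnicodeEncodeError:", b"UnicodeEncodeError" in ascii_run.stderr)
print("utf-8 exit code:", utf8_run.returncode)
```

With Fabric 1.x, the same variable could probably be set for the remote command with the `shell_env` context manager from `fabric.api`, wrapped around the `run(...)` call.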

How to scrape https website with proxies

Hi all,

My last post about scraping with proxies is quite old, so I decided to write a newer version of it. In particular, today I will focus on how to scrape an https website with proxies.

There is also good news about the requests library. For quite a long time requests did not support socks proxies, but in 2016 a new release added that support. So now requests supports both http and socks proxies.

So let's get started. Below I will show you 4 different examples of how to scrape a single https page. First, we will scrape it with requests using socks and http proxies. Second, we will do the same using the urllib3 library.
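The four examples are not part of this excerpt, so here is only a minimal sketch of the requests variant. It assumes proxies listening on `127.0.0.1:1080` (socks5) and `127.0.0.1:8080` (http), which are placeholder addresses, and `build_proxies` and `fetch` are hypothetical helper names. For socks proxies, requests needs the extra dependency installed with `pip install requests[socks]`.

```python
def build_proxies(scheme, host, port):
    """Return a requests-style proxies mapping for one proxy server."""
    proxy_url = "{}://{}:{}".format(scheme, host, port)
    # The same proxy handles both plain http and https traffic.
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxies, timeout=10):
    """Fetch url through the given proxies and return the response body."""
    import requests  # imported lazily so build_proxies works without it

    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.text


# Placeholder addresses: replace them with real proxies before calling fetch().
socks_proxies = build_proxies("socks5", "127.0.0.1", 1080)
http_proxies = build_proxies("http", "127.0.0.1", 8080)
print(socks_proxies)
print(http_proxies)
```

Calling `fetch("https://example.com/", socks_proxies)` would then send the request through the socks proxy, provided one is really running at that address.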

How to build uwsgi from source and run as a service

As you might have already noticed, I always deploy Python applications using uwsgi and nginx. But I haven't written yet about how to install uwsgi. There is probably some version in the repository, but I don't really care about it. I know it's old and not what I actually need.

I need to build uwsgi with plugin support, which allows me to work with both Python 3 and Python 2. Then I will create a service config file, so that you can run it as a service. Sounds good, doesn't it? Let's do it then.

1. Download uwsgi and extract it

cd ~
curl -O http://projects.unbit.it/downloads/uwsgi-2.0.12.tar.gz
tar -xvzf uwsgi-2.0.12.tar.gz

Why Python is not good for multi-threading?

Recently I was asked this question during a screening interview at Yandex (the Russian search engine), and I got screened out. They said: you're a cool guy, but try again in a year. You're OK for a junior position, but not for a senior one.

A long time ago I read on some blog that multi-threading is not a good idea in Python. That was the only thing that came to my mind at the interview. So I only answered that it's not a good idea because it requires a lot of memory. Quite a silly answer.

Then the interviewer said that it's somehow related to the GIL. What's the GIL??? It sounded like a familiar and clever word to me.

After that, I googled and found this blog, which explained to me why Python is not good for multi-threading. In short, all the problems come from the GIL, the Global Interpreter Lock. As a result, CPython (the standard interpreter) can only execute Python code in one thread at a time. If you start many threads, all of them will compete for a single lock (the GIL). Just remember that: threads don't run Python code in parallel, although they can still overlap while waiting on I/O. That's one of Python's disadvantages and a popular interview question.
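A small experiment can make this visible. The sketch below runs the same CPU-bound loop four times in a row and then in four threads, and prints how long each variant took. The function names and the loop size `N` are arbitrary choices for this illustration. On a standard CPython build with the GIL, the threaded run is usually not faster than the sequential one, but the exact numbers depend on the machine and interpreter, so none are quoted here.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop: it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def time_sequential(n, workers):
    """Run count_down `workers` times one after another; return seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, workers):
    """Run count_down in `workers` threads at once; return seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


N = 1000000  # arbitrary amount of work per worker
sequential = time_sequential(N, workers=4)
threaded = time_threaded(N, workers=4)
print("sequential: {:.3f}s".format(sequential))
print("threaded:   {:.3f}s".format(threaded))
```

For CPU-bound work like this, separate processes (for example via the `multiprocessing` module) are the usual way around the lock, since each process has its own interpreter.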

Tricky questions: function parameters with default mutable values

Today I discovered a confusing feature of Python. I also decided to open a new category on my blog, which will be called "tricky python questions".

def append(num, to=[]):
    to.append(num)
    return to

aa = append(1)
print(aa)

bb = append(2)
print(bb)
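The reason this is tricky: a default value is evaluated once, when the function is defined, so every call that omits the argument shares the same list. Below is a short sketch that shows the sharing and the common workaround of using `None` as the default. The names `append_shared` and `append_fresh` are made up for this example.

```python
def append_shared(num, to=[]):
    # The default list is created once and reused by every call.
    to.append(num)
    return to


def append_fresh(num, to=None):
    # None is a safe default; a new list is created on each call.
    if to is None:
        to = []
    to.append(num)
    return to


first = append_shared(1)
second = append_shared(2)
print(first is second)   # True: both names point to the same list
print(second)            # [1, 2]

print(append_fresh(1))   # [1]
print(append_fresh(2))   # [2]
```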

Using Python passlib in Java applications

Some of my readers might ask why a Python developer would need to do something in Java. I never thought I would need to code in Java either. But if you saw my last post, I was talking about how to implement SSO with a Django website. One of its main components is a Java-based application, which I wanted to customize.

So here is the problem: I needed to verify password hashes in a Java application (Java 7), but those hashes were generated with the Python passlib library (pbkdf2_sha512). At first I even tried to implement password verification in Java, but then I gave up and decided to do it the easy way.
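To show what verifying such a hash involves, here is a rough standard-library sketch. It assumes the `$pbkdf2-sha512$rounds$salt$checksum` layout and the "adapted base64" encoding ('.' instead of '+', no padding) described in the passlib documentation. It was not checked against real passlib output here, and it is not the approach from the post. All function names are hypothetical.

```python
import base64
import hashlib
import hmac


def ab64_decode(text):
    """Decode adapted base64: '.' replaces '+', '=' padding is stripped."""
    data = text.replace(".", "+").encode("ascii")
    return base64.b64decode(data + b"=" * (-len(data) % 4))


def ab64_encode(raw):
    """Encode bytes as adapted base64 (inverse of ab64_decode)."""
    return base64.b64encode(raw).decode("ascii").rstrip("=").replace("+", ".")


def verify_pbkdf2_sha512(password, stored):
    """Check password against a '$pbkdf2-sha512$rounds$salt$checksum' string."""
    _, scheme, rounds, salt, checksum = stored.split("$")
    if scheme != "pbkdf2-sha512":
        raise ValueError("unsupported scheme: " + scheme)
    expected = ab64_decode(checksum)
    derived = hashlib.pbkdf2_hmac(
        "sha512", password.encode("utf-8"), ab64_decode(salt), int(rounds), len(expected)
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(derived, expected)


# Build a sample hash in the assumed layout (made-up salt and round count).
salt = b"0123456789abcdef"
rounds = 1000
digest = hashlib.pbkdf2_hmac("sha512", b"secret", salt, rounds)
sample = "$pbkdf2-sha512${}${}${}".format(rounds, ab64_encode(salt), ab64_encode(digest))

print(verify_pbkdf2_sha512("secret", sample))
print(verify_pbkdf2_sha512("wrong", sample))
```

The same three steps (split the string, decode salt and checksum, run PBKDF2-HMAC-SHA512 with the stored round count) are what a Java port would have to reproduce.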

Web scraping using socks/http proxies

While extracting data from websites, you will most probably run into some kind of access limit for a single IP address. In that case it is a good idea to use proxies. In my scraping projects I've been using both http and socks proxies.

Using django models in standalone application

One of my favorite things about Django is its loosely coupled design. It means that you can use each part independently without affecting the others. One of the parts that I often use in my applications is Django models.

There is nothing difficult about using Django models in a standalone application. You just need to start a new Django project and then remove all the unnecessary stuff, leaving only the models. Below I will show it with an example.

How to setup nginx+uwsgi with CKAN

Recently I had to deploy a CKAN website on a web server. There is nothing difficult about it if you just follow the official documentation and deploy CKAN with Apache. But my choice was uwsgi+nginx, because I always use this bundle.

After setting everything up the way I usually do with Django, I got the error below in the nginx log files for all static files:

[error] 1057#0: *70 upstream prematurely closed connection while reading response header from upstream

s3 upload large files to amazon using boto

Recently I had to upload large files (more than 10 GB) to Amazon S3 using boto. But when I tried to use the standard upload function set_contents_from_filename, it always failed with: ERROR 104 Connection reset by peer.

After a quick search I figured out that Amazon does not allow direct upload of files larger than 5 GB. To upload a file larger than 5 GB we must use multipart upload, i.e. divide the large file into smaller pieces and upload each piece separately. But I didn't want to split my file physically, because I didn't have much disk space. Luckily there is a great solution: we can use a file pointer and set the number of bytes we want to upload at a time. Below is my function that you can use to upload large files to Amazon:
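The function itself is not included in this excerpt, and the sketch below is not that function. It only illustrates the file pointer idea with the standard library: compute the offset and length of every part, then seek and read just that many bytes, so the file never has to be split on disk. `iter_parts` and `read_part` are hypothetical names, and the tiny 10-byte file with a 4-byte part size is made up for the demonstration.

```python
import os
import tempfile


def iter_parts(file_size, part_size):
    """Yield (part_number, offset, length) for each piece of the file."""
    part_number = 1
    offset = 0
    while offset < file_size:
        length = min(part_size, file_size - offset)
        yield part_number, offset, length
        part_number += 1
        offset += length


def read_part(fp, offset, length):
    """Move the file pointer to offset and read exactly one part."""
    fp.seek(offset)
    return fp.read(length)


# Tiny temporary file standing in for a multi-gigabyte upload.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij")
    path = tmp.name

try:
    size = os.path.getsize(path)
    with open(path, "rb") as fp:
        pieces = [read_part(fp, off, n) for _, off, n in iter_parts(size, 4)]
finally:
    os.remove(path)

print(pieces)  # [b'abcd', b'efgh', b'ij']
```

In a real upload with boto 2, each part would presumably go to a multipart upload object instead of a list: start it with `bucket.initiate_multipart_upload(key_name)`, send each piece with `upload_part_from_file(fp, part_num=..., size=...)`, and finish with `complete_upload()`. Note that S3 requires every part except the last to be at least 5 MB.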

Wednesday, November 21, 2018

Python - How to wait for presence of an element with non empty text?

Saturday, January 6, 2018

How to install python 3.5 in CentOS 7 from source

Friday, January 20, 2017

Fabric how to set environment variable to fix encoding

Thursday, January 5, 2017

How to scrape https website with proxies

Monday, December 5, 2016

How to build uwsgi from source and run as a service

Sunday, November 27, 2016

Why Python is not good for multi-threading?

Tuesday, February 2, 2016

Tricky questions: function parameters with default mutable values

Sunday, November 8, 2015

Using Python passlib in Java applications

Sunday, October 25, 2015

Web scraping using socks/http proxies

Wednesday, October 21, 2015

Using django models in standalone application

Friday, October 2, 2015

How to setup nginx+uwsgi with CKAN

Monday, August 17, 2015

s3 upload large files to amazon using boto